
LM head module #19

Merged: dbaranchuk merged 7 commits into main on Jul 12, 2022
Conversation

@dbaranchuk (Collaborator) commented on Jul 10, 2022:

  • BloomForCausalLM now matches BloomForCausalLM from HF transformers;
  • Added an LMHead module that does not duplicate the word embeddings;
  • This module also allows for an efficient chunked forward pass when the embeddings are kept in half precision on CPU (see the sketch after this list). The chunk size can be set in DistributedBloomConfig (default: 10000). The default chunk_size gives reasonable performance and consumes an extra ~0.6 GB of memory instead of ~7.2 GB for the 176B model;
  • Moved set_requires_grad from BloomModel to DistributedBloomModel so that BloomModel matches BloomModel from HF transformers.

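A minimal sketch of how such a chunked LM head could look, assuming PyTorch: the module reuses the model's word-embedding weight instead of keeping a separate copy, and, when that weight sits on CPU in half precision, upcasts and multiplies it one slice of rows at a time. The class name ChunkedLMHead and everything inside it are illustrative, not the PR's actual code; only the chunk_size default of 10000 is taken from the description above.

```python
# Illustrative sketch, not the PR's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChunkedLMHead(nn.Module):  # hypothetical name
    def __init__(self, word_embeddings: nn.Embedding, chunk_size: int = 10_000):
        super().__init__()
        self.word_embeddings = word_embeddings  # shared with the model, not duplicated
        self.chunk_size = chunk_size  # rows of the vocab matrix processed per step

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        weight = self.word_embeddings.weight  # [vocab_size, hidden_size]
        if weight.dtype in (torch.float16, torch.bfloat16) and weight.device.type == "cpu":
            # CPU matmuls in half precision are slow or unsupported, so upcast
            # the weight chunk by chunk instead of materializing a full fp32 copy.
            hs = hidden_states.float()
            logits = [
                F.linear(hs, chunk.float())  # [*, chunk_rows]
                for chunk in weight.split(self.chunk_size, dim=0)
            ]
            return torch.cat(logits, dim=-1)  # [*, vocab_size]
        return F.linear(hidden_states, weight)
```

The numbers in the description line up with this scheme, assuming BLOOM-176B's vocabulary of 250880 and hidden size of 14336: one fp32 chunk of 10000 vocabulary rows takes 10000 × 14336 × 4 bytes ≈ 0.57 GB, matching the ~0.6 GB figure, while a separate fp16 copy of the full 250880 × 14336 embedding matrix would take ≈ 7.2 GB.
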
@justheuristic (Collaborator) left a comment:

LGTM, let's merge

@dbaranchuk merged commit ac7df18 into main on Jul 12, 2022