Add support for Cerebras-GPT for training #2276
Conversation
model/model_training/utils.py
@@ -189,6 +190,9 @@ def get_tokenizer(conf) -> transformers.AutoTokenizer:
         # explicitly specify LLaMATokenizer class until AutoTokenizer works
         # assumes that the tokenizer config is stored in the same directory as the model weights
         tokenizer = LLaMATokenizer.from_pretrained(conf.model_name)
+    elif "cerebras" in conf.model_name:
+        # Cerebras tokenizer for 13B is the tokenizer for all sizes
Does this have to be specified? Are the other models released without a tokenizer config? Otherwise this could be removed; the special handling for LLaMA was also removed already.
13B is the only size with an accompanying tokenizer uploaded to HuggingFace, so if we remove this line and run with any size other than 13B we get an error along the lines of: could not find tokenizer cerebras/Cerebras-GPT-6.7B
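The fallback described above can be sketched as a small helper that maps every Cerebras-GPT size to the 13B tokenizer (all sizes share one vocabulary, but only 13B ships with a tokenizer on the Hub). The function name `resolve_tokenizer_name` is hypothetical, not part of the PR:

```python
def resolve_tokenizer_name(model_name: str) -> str:
    """Return the Hub repo to load a tokenizer from.

    All Cerebras-GPT sizes share one vocabulary, but only the 13B
    checkpoint has a tokenizer uploaded to HuggingFace, so every
    Cerebras size is mapped to the 13B tokenizer repo.
    """
    if "cerebras" in model_name.lower():
        # Hypothetical fallback mirroring the PR's special-casing.
        return "cerebras/Cerebras-GPT-13B"
    return model_name
```

The result would then be passed to `AutoTokenizer.from_pretrained(...)`, so loading `cerebras/Cerebras-GPT-6.7B` no longer fails with a missing-tokenizer error.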
As part of resolving merge conflicts with the LLaMA change, I have tweaked this and added a comment explaining why it is done.
LGTM!
This adds configs for the 13B and 6.7B Cerebras models; with the necessary code changes in place, it should be easy enough to add configs for the smaller models too if desired. The tokenizer appears to be the GPT-2 fast tokenizer from HuggingFace, so the special tokens have been configured for that.
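For illustration, a training config for one of the new sizes might look roughly like the fragment below. This is a hedged sketch only: the key names (`model_name`, `special_tokens`, etc.) are illustrative and may not match the repo's actual config schema; the `<|endoftext|>` token is the GPT-2 tokenizer's standard end-of-text marker.

```yaml
# Hypothetical config sketch for a Cerebras-GPT model entry.
cerebras-gpt-6.7b:
  model_name: cerebras/Cerebras-GPT-6.7B
  # GPT-2 fast tokenizer defines a single special token; reusing it
  # for padding is a common choice when no dedicated pad token exists.
  special_tokens:
    eos_token: "<|endoftext|>"
    pad_token: "<|endoftext|>"
```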