prompt format? #30
Carl, this is just a base model (for now), so there is no fancy prompting except for the BOS and EOS tokens, which are <|startoftext|> and <|endoftext|> respectively.
LOL THX
Actually, just EOS (<|endoftext|>) in the base models :)
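For reference, marking document boundaries for a base model comes down to appending the EOS token; a minimal sketch (my own illustration, not code from this repo), assuming plain-text concatenation of training documents:

```python
# The base model's EOS token, per the maintainers above.
EOS = "<|endoftext|>"

def format_pretraining_example(text: str) -> str:
    """Append the EOS token so the model learns document boundaries."""
    return text + EOS

print(format_pretraining_example("Hello, world."))
```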
The tokenizer has a unique BOS token. Should it not be used for fine-tuning?
I think it's OK to use it in fine-tuning.
@loofahcus I carefully checked the tokenizer's token map and found three special tokens
They're reserved for chat models, but in the end we did not use them for some reason
Would you mind explaining it in a little more detail? I am going to SFT a chat model with this format. Will it lead to poor conversation performance? Or should I use another general chat format, e.g. ChatML?
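For context, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers; a minimal sketch of that layout (the role names and tokens follow the ChatML convention, not necessarily this model's own format):

```python
def chatml(messages: list[dict]) -> str:
    """Render a list of {"role", "content"} messages in ChatML layout,
    ending with an open assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = chatml([{"role": "user", "content": "Hi"}])
print(prompt)
```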
@ericzhou571 It would not lead to poor conversation performance once you re-initialize the embed and lm_head rows at those positions~
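A rough sketch of the re-initialization idea above, using numpy arrays in place of real model weights (the sizes and token ids are made up; with a real model you would redraw the corresponding rows of the embedding and lm_head matrices the same way):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 1000, 64  # toy sizes, not the real model's
embed = rng.normal(0.0, 0.02, size=(vocab_size, hidden))
lm_head = rng.normal(0.0, 0.02, size=(vocab_size, hidden))

reserved_ids = [6, 7, 8]  # hypothetical ids of the reserved chat tokens

# Redraw those rows from the same init distribution used at pretraining,
# so the never-trained reserved tokens start from a sane initialization
# before SFT instead of whatever stale values the checkpoint carries.
for w in (embed, lm_head):
    w[reserved_ids] = rng.normal(0.0, 0.02, size=(len(reserved_ids), hidden))

print(embed.shape, lm_head.shape)
```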
This is not an issue, but I did not know where else to put it. Is there a specific prompt format to use?