
prompt format? #30

Closed
silvacarl2 opened this issue Nov 6, 2023 · 9 comments
Labels: doc, doc-not-needed, question, regression

Comments

@silvacarl2

This is not really an issue, but I did not know where else to put it. Is there a specific prompt format to use?

@mallorbc commented Nov 6, 2023

Carl,

This is just a base model (for now), so there is no fancy prompting except for the BOS and EOS tokens, which are <|startoftext|> and <|endoftext|> respectively.
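A minimal sketch for verifying those tokens with Hugging Face transformers (the repo id below is an assumption; substitute the checkpoint you are actually using):

```python
# Minimal sketch: inspect the tokenizer's BOS/EOS tokens.
# "01-ai/Yi-6B" is an assumed repo id; replace it with your own checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
print(tokenizer.bos_token, tokenizer.bos_token_id)  # expected: <|startoftext|>
print(tokenizer.eos_token, tokenizer.eos_token_id)  # expected: <|endoftext|>
```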

@silvacarl2 (Author)

LOL THX

@loofahcus (Contributor) commented Nov 7, 2023

> Carl,
>
> This is just a base model (for now), so there is no fancy prompting except for the BOS and EOS tokens, which are <|startoftext|> and <|endoftext|> respectively.

Actually, just EOS (<|endoftext|>) in the base models :)

ZhaoFancy added the question, regression, and doc labels on Nov 7, 2023
@mallorbc commented Nov 8, 2023

> Carl,
> This is just a base model (for now), so there is no fancy prompting except for the BOS and EOS tokens, which are <|startoftext|> and <|endoftext|> respectively.
>
> Actually, just EOS (<|endoftext|>) in the base models :)

The tokenizer has a distinct BOS token. Should it not be used for finetuning?

@loofahcus (Contributor)

> Carl,
> This is just a base model (for now), so there is no fancy prompting except for the BOS and EOS tokens, which are <|startoftext|> and <|endoftext|> respectively.
>
> Actually, just EOS (<|endoftext|>) in the base models :)
>
> The tokenizer has a distinct BOS token. Should it not be used for finetuning?

I think it's OK to use it in finetuning.

@ericzhou571

@loofahcus I carefully checked the tokenizer's token map and found three special tokens: <|System|>, <|Human|>, and <|Assistant|>.
Are these the special tokens you plan to use to build multi-round conversations?
e.g.:
<|Human|>repeat "this is a multi-turn conversation1" pls<|Assistant|>this is a multi-turn conversation1<|endoftext|><|Human|>repeat "this is a multi-turn conversation2" pls<|Assistant|>this is a multi-turn conversation2<|endoftext|><|Human|>repeat "this is a multi-turn conversation3" pls<|Assistant|>this is a multi-turn conversation3<|endoftext|>
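A small helper that simply restates the format sketched above (the function name and structure are hypothetical, not from the repo):

```python
# Hypothetical helper that assembles the multi-turn format from the example above,
# using the <|Human|>/<|Assistant|> tokens found in the tokenizer map.
def build_prompt(turns):
    """turns: list of (human_text, assistant_text) pairs."""
    prompt = ""
    for human, assistant in turns:
        prompt += f"<|Human|>{human}<|Assistant|>{assistant}<|endoftext|>"
    return prompt

print(build_prompt([('repeat "this is a multi-turn conversation1" pls',
                     "this is a multi-turn conversation1")]))
```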

@loofahcus (Contributor)

> @loofahcus I carefully checked the tokenizer's token map and found three special tokens: <|System|>, <|Human|>, and <|Assistant|>. Are these the special tokens you plan to use to build multi-round conversations? e.g.: <|Human|>repeat "this is a multi-turn conversation1" pls<|Assistant|>this is a multi-turn conversation1<|endoftext|><|Human|>repeat "this is a multi-turn conversation2" pls<|Assistant|>this is a multi-turn conversation2<|endoftext|><|Human|>repeat "this is a multi-turn conversation3" pls<|Assistant|>this is a multi-turn conversation3<|endoftext|>

They're reserved for the chat models, but in the end we did not use them for certain reasons.

@ericzhou571

> @loofahcus I carefully checked the tokenizer's token map and found three special tokens: <|System|>, <|Human|>, and <|Assistant|>. Are these the special tokens you plan to use to build multi-round conversations? e.g.: <|Human|>repeat "this is a multi-turn conversation1" pls<|Assistant|>this is a multi-turn conversation1<|endoftext|><|Human|>repeat "this is a multi-turn conversation2" pls<|Assistant|>this is a multi-turn conversation2<|endoftext|><|Human|>repeat "this is a multi-turn conversation3" pls<|Assistant|>this is a multi-turn conversation3<|endoftext|>
>
> They're reserved for the chat models, but in the end we did not use them for certain reasons.

Would you mind explaining that in a little more detail? I am going to SFT a chat model with this format. Will it lead to poor conversation performance, or should I use another general chat format, e.g., ChatML?

@loofahcus (Contributor) commented Nov 9, 2023

@ericzhou571 It would not lead to poor conversation performance as long as you re-initialize the embedding and lm_head rows at those token positions~
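A rough sketch of what re-initializing those rows before SFT could look like with transformers (the repo id, the init scale, and the assumption that the input and output embeddings are untied are all assumptions, not from this thread):

```python
# Sketch: re-initialize the embedding and lm_head rows of the reserved chat
# tokens before finetuning. Repo id and init scale are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B"  # assumption: substitute your base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

special = ["<|System|>", "<|Human|>", "<|Assistant|>"]
ids = tokenizer.convert_tokens_to_ids(special)

embed = model.get_input_embeddings().weight      # [vocab_size, hidden]
lm_head = model.get_output_embeddings().weight   # assumes an untied output head
with torch.no_grad():
    for tok_id in ids:
        embed[tok_id].normal_(mean=0.0, std=0.02)
        lm_head[tok_id].normal_(mean=0.0, std=0.02)
```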

Yimi81 added the doc-not-needed label on Mar 8, 2024