
why keep "###" before instruct text? #44

Closed
Cescfangs opened this issue Jun 27, 2023 · 6 comments

@Cescfangs
```python
for tokenized_len, speaker in zip(tokenized_lens, speakers):
    if speaker == "human":
        target[cur_idx + 2 : cur_idx + tokenized_len] = IGNORE_INDEX
```

I was reading _mask_targets(). I guess this function uses the mask to ignore the loss on the instruction text, but why do you deliberately keep [cur_idx : cur_idx + 2], which is the "###" before the actual instruction text?

@hangzhang-nlp (Collaborator)

The assistant will learn to generate "###" when it wants to end the current round, so "###" can be understood as the EOS flag of each round.

@Cescfangs (Author)

> The assistant will learn to generate "###" if it wants to end the current round. So "###" can be understood as the EOS flag of each session.

Thanks for the reply. I understand that "###" works like an EOS here, and I agree the "###" in the assistant text acts like an EOS. However, why do we want the assistant to learn to emit an EOS before the instruction text?

@hangzhang-nlp (Collaborator)

During inference, we stop generating tokens once the assistant outputs "###".
Suppose the training data is " ###Human: Hi. ###Assistant: Hi, can I help you. ###Human: yes".
The content on which the loss is calculated is "Hi, can I help you. ###", so the model learns to generate "###" as the end flag of its reply.
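The supervised span described above can be illustrated with a toy character-level helper (not the repository's code; the tag strings and function name are made up for this example). It finds each assistant reply and extends it through the closing "###", which is exactly the region that carries loss:

```python
def loss_span_chars(conv, assistant_tag="###Assistant:", sep="###"):
    """Return (start, end) character spans of `conv` that receive loss:
    each assistant reply plus its terminating '###' separator."""
    spans = []
    idx = 0
    while True:
        a = conv.find(assistant_tag, idx)
        if a == -1:
            break
        start = a + len(assistant_tag)   # reply begins right after the tag
        nxt = conv.find(sep, start)      # the next "###" ends the reply
        end = (nxt + len(sep)) if nxt != -1 else len(conv)
        spans.append((start, end))       # include the closing "###"
        idx = end
    return spans


conv = " ###Human: Hi. ###Assistant: Hi, can I help you. ###Human: yes"
spans = loss_span_chars(conv)
print([conv[s:e] for s, e in spans])  # -> [' Hi, can I help you. ###']
```

Everything outside these spans (the human turns and their prompts) would be masked with `IGNORE_INDEX` during training.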

@Cescfangs (Author)

> During inference, we will stop generating tokens once the assistant outputs "###". Suppose that the training data is " ###Human: Hi. ###Assistant: Hi, can I help you. ###Human: yes". The content which needs to calculate loss is "Hi, can I help you. ###". So the model can learn to generate "###" as the end flag of his reply.

I mean that you also keep the first "###" (the one before "Human: Hi...") unmasked.

@hangzhang-nlp (Collaborator)

Oh, I see. That is just for convenience; you could add a check to also mask the first "###".

@Cescfangs (Author)

Okay, thanks for the confirmation.
