Set add_special_tokens=False to not add EOS unexpectedly #287

Conversation
I just realised this breaks the T5 example via:

```
/home/a/trlx/examples/summarize_daily_cnn/t5_summarize_daily_cnn.py:26 in reward_fn

  23 if __name__ == "__main__":
  24
  25     def reward_fn(samples: List[str], prompts: List[str], outputs: List[str]):
❱ 26         original_summaries = [prompt_label[prompt.strip()] for prompt in prompts]
  27         scores = [
  28             meteor.compute(predictions=[output.strip()], references=[original])["meteor"
  29             for (original, output) in zip(original_summaries, outputs)

/home/a/trlx/examples/summarize_daily_cnn/t5_summarize_daily_cnn.py:26 in <listcomp>

  23 if __name__ == "__main__":
  24
  25     def reward_fn(samples: List[str], prompts: List[str], outputs: List[str]):
❱ 26         original_summaries = [prompt_label[prompt.strip()] for prompt in prompts]
  27         scores = [
  28             meteor.compute(predictions=[output.strip()], references=[original])["meteor"
  29             for (original, output) in zip(original_summaries, outputs)

KeyError: 'Summarize: (CNN) -- This week marks one of the most-exciting non-major events of the golf season -- the Players Championship at the famed TPC Sawgrass. With a deep field and a great course, you won\'t want to miss any of the action. Before the tournament tees off, we had a chance to catch up with TPC Sawgrass PGA Head Professional Matt Borocz, who provided some inside insight on the home of the PGA Tour. PGA.com: Thanks for joining us. This week presents one of the most exciting on …
```
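(For context, a minimal sketch of why the lookup fails, assuming an HF T5 tokenizer; the prompt text and `prompt_label` dict below are stand-ins for the example's real data:)

```python
# Minimal sketch of the failure mode, assuming a HF T5 tokenizer; the prompt
# text and prompt_label dict are stand-ins for the example's real data.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

prompt = "Summarize: the quick brown fox jumps over the lazy dog"
prompt_label = {prompt.strip(): "a reference summary"}

# T5's default tokenization (add_special_tokens=True) appends </s> ...
ids = tokenizer(prompt)["input_ids"]
decoded = tokenizer.decode(ids)           # "Summarize: ...</s>"
# ... so the round-tripped prompt no longer matches the dict key:
# prompt_label[decoded.strip()]           # -> KeyError

# With add_special_tokens=False the round trip matches again:
ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
assert tokenizer.decode(ids).strip() == prompt.strip()
```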
@cat-state Wait, where is this happening in the trainers? It looks like it's in trlx/trlx/trainer/accelerate_ilql_trainer.py (lines 38 to 52 in e139ca4). Let me loop @reciprocated in here because I vaguely recall he brought this up before. Also, if these methods are dead we should probably remove them.
Neither trlx/trlx/pipeline/offline_pipeline.py (line 20 in f115eea) nor trlx/trlx/orchestrator/ppo_orchestrator.py (line 117 in f115eea). Only the last needs to have <eos> appended (for the reward model & to make indexing work), and I think none really need <bos>.
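(A hypothetical sketch of that scheme — this is illustrative, not trlx's actual code:)

```python
# Hypothetical sketch (not trlx's actual code) of the scheme described above:
# a sample split into [prompt, ..., output] segments gets <eos> only on the
# final segment (for the reward model & indexing), and no segment gets <bos>.
def tokenize_segments(segments, tokenizer):
    ids = [tokenizer(s, add_special_tokens=False)["input_ids"] for s in segments]
    ids[-1] = ids[-1] + [tokenizer.eos_token_id]  # <eos> only on the last segment
    return ids
```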
Oh right! So this was just dead code 😅
With 20B or J or GPT-2, the tokenizers don't add BOS/EOS by default, it seems; however, the original issue was when using a tokenizer that does.
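(A quick way to check this, assuming HF tokenizers; the model names are examples:)

```python
# Quick check of default special-token behaviour; model names are examples.
from transformers import AutoTokenizer

gpt2 = AutoTokenizer.from_pretrained("gpt2")    # GPT-2/J/NeoX-style BPE
t5 = AutoTokenizer.from_pretrained("t5-small")  # appends </s> by default

print(gpt2("hello world")["input_ids"])   # no BOS/EOS added
print(t5("hello world")["input_ids"])     # last id == t5.eos_token_id
print(t5("hello world", add_special_tokens=False)["input_ids"])  # no </s>
```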
Thanks, @cat-state! This looks good 👍 Just leaving some questions for small edits.
trlx/pipeline/offline_pipeline.py (Outdated)

```python
model_inputs = tokenizer(prompts, truncation=True, padding=False, max_length=max_prompt_length)
prompts = model_inputs["input_ids"]

# manually prepend bos token if not already present to match RL trainers tokenization
```
This comment might be misleading since we no longer prepend BOS tokens from the trainers.
Oh yeah, I'll change it. We still do add BOS and EOS for ILQL, btw.
Seems like in your report there are 5 different runs, all with slightly different results; however, they all use gpt2-imdb, for which this PR shouldn't change anything. Do you think the differences are due only to the added BOS's influence?
trlx/pipeline/offline_pipeline.py (Outdated)

```python
# default tokenizer behavior for PPO
if tokenizer.bos_token is not None:
    prompts = [
        tokenizer.bos_token + prompt if not prompt.startswith(tokenizer.bos_token) else prompt
        for prompt in prompts
    ]
```
Do we have to force adding BOS here? In ILQL it is only added when a single string is passed, to make sure the loss starts from the 0-index of the output (action_ixs[0] = 0) by prepending a "prompt" as BOS, and not when a real [prompt, output] pair is passed (unless truncation comes into play). I don't want to change behaviour here as I'm finalizing the HH PR with already shabby results as they are, which may or may not take a hit due to this change 🙂
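(A hypothetical sketch of the convention described here — the helper name is made up, not trlx's exact code:)

```python
# Hypothetical sketch of the ILQL convention described above (helper name made up).
# A bare completion string gets BOS prepended as a stand-in "prompt", so the loss
# can start at index 0 of the output (action_ixs[0] == 0); a real
# [prompt, output] pair is left untouched.
def as_dialogue(sample, tokenizer):
    if isinstance(sample, str):
        return [tokenizer.bos_token, sample]  # BOS stands in for the prompt
    return list(sample)                       # already [prompt, output]
```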
I see; that makes sense that the BOS is only added to the prompt for a completion if no prompt is given for that completion.
@reciprocated As an aside: could you clarify why you believe "none [of the tokenizations] really need <bos>"? The …
@jon-tow There I've meant that … see section C.4.: …
@reciprocated Yeah... I also haven't seen any experiments comparing results from …
@cat-state Just noting that this subtly changes results for the T5 DailyMail/CNN summarization example (PPO): https://api.wandb.ai/links/jon-tow/p3kg4ejf. Reward seems to improve more with this fix, since removing the EOS token before passing ids to generate is (probably?) the right thing to do.
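(Illustrating the point — a minimal sketch of stripping a trailing EOS from prompt ids before generation; the helper is hypothetical:)

```python
# Minimal sketch of the idea: drop a trailing EOS from prompt ids before
# calling generate, since an in-prompt EOS can read as "sequence finished".
def strip_trailing_eos(input_ids, eos_token_id):
    if input_ids and input_ids[-1] == eos_token_id:
        return input_ids[:-1]
    return input_ids
```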
@jon-tow Yeah, but the first logprob of interest in PPO is at the beginning of the output, not of the prompt, so unless … (lines 36 to 37 in 34817b6) …
Thanks for checking! I hope so, and it's on the same random seed too. It seems not to affect GPT-2/NeoX-tokenizer-based models, as those don't add special tokens by default.
This PR addresses #253. Set add_special_tokens=False and instead add BOS manually, matching what is done in the trainers. Comparison report with ppo_sentiments: https://wandb.ai/carperai/trlx/reports/PromptPipeline-Add-BOS---VmlldzozNTA2MTM4
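(Putting the two pieces together — a sketch of the approach, with variable names taken from the snippets quoted above:)

```python
# Sketch of the PR's approach, combining the snippets quoted above: tokenize
# without special tokens and prepend BOS manually when the tokenizer has one,
# matching the trainers' tokenization.
if tokenizer.bos_token is not None:
    prompts = [
        tokenizer.bos_token + prompt if not prompt.startswith(tokenizer.bos_token) else prompt
        for prompt in prompts
    ]
model_inputs = tokenizer(
    prompts,
    truncation=True,
    padding=False,
    max_length=max_prompt_length,
    add_special_tokens=False,  # don't let the tokenizer inject BOS/EOS
)
prompts = model_inputs["input_ids"]
```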