
Set add_special_tokens=False to not add EOS unexpectedly #287

Merged: 12 commits, Feb 10, 2023

Conversation

cat-state (Collaborator)

This PR addresses #253 . Set add_special_tokens=False and instead add BOS manually, matching what is done in the trainers.

comparison report with ppo_sentiments: https://wandb.ai/carperai/trlx/reports/PromptPipeline-Add-BOS---VmlldzozNTA2MTM4
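
Roughly, the prompt pipeline tokenization then looks like the following (a minimal sketch, not the exact diff; the model name, prompts, and max_prompt_length below are only illustrative):

from transformers import AutoTokenizer

# Sketch: prepend BOS to the raw prompt strings ourselves and pass
# add_special_tokens=False, so tokenizers that append EOS by default
# (e.g. T5's) don't sneak one onto the prompt.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
prompts = ["Summarize: some article text", "Summarize: another article"]
max_prompt_length = 128

if tokenizer.bos_token is not None:
    prompts = [
        p if p.startswith(tokenizer.bos_token) else tokenizer.bos_token + p
        for p in prompts
    ]

model_inputs = tokenizer(
    prompts,
    truncation=True,
    padding=False,
    max_length=max_prompt_length,
    add_special_tokens=False,
)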

cat-state marked this pull request as draft February 8, 2023 00:15
cat-state (Collaborator, Author)

I just realised this breaks the T5 example via:

/home/a/trlx/examples/summarize_daily_cnn/t5_summarize_daily_cnn.py:26 in reward_fn

  23 if __name__ == "__main__":
  24
  25     def reward_fn(samples: List[str], prompts: List[str], outputs: List[str]):
❱ 26         original_summaries = [prompt_label[prompt.strip()] for prompt in prompts]
  27         scores = [
  28             meteor.compute(predictions=[output.strip()], references=[original])["meteor"]
  29             for (original, output) in zip(original_summaries, outputs)

/home/a/trlx/examples/summarize_daily_cnn/t5_summarize_daily_cnn.py:26 in <listcomp>

  23 if __name__ == "__main__":
  24
  25     def reward_fn(samples: List[str], prompts: List[str], outputs: List[str]):
❱ 26         original_summaries = [prompt_label[prompt.strip()] for prompt in prompts]
  27         scores = [
  28             meteor.compute(predictions=[output.strip()], references=[original])["meteor"]
  29             for (original, output) in zip(original_summaries, outputs)

KeyError: 'Summarize: (CNN) -- This week marks one of the most-exciting non-major events of the golf season -- the Players Championship at the famed TPC Sawgrass. With a deep field and a great course, you won\'t want to miss any of the action. Before the tournament tees off, we had a chance to catch up with TPC Sawgrass PGA Head Professional Matt Borocz, who provided some inside insight on the home of the PGA Tour. PGA.com: Thanks for joining us. This week presents one of the most exciting on

cat-state marked this pull request as ready for review February 8, 2023 22:54
jon-tow (Collaborator) commented Feb 9, 2023

Set add_special_tokens=False and instead add BOS manually, matching what is done in the trainers.

@cat-state wait, where is this happening in the trainers? It looks like the <>Trainer.tokenize() methods that manually add BOS tokens never actually get called anymore, e.g. the base method:

def tokenize(self, texts: Union[Sequence[str], Sequence[torch.LongTensor]]):
    if isinstance(texts[0], torch.LongTensor):
        return texts
    tokenized = self.tokenizer(
        [self.tokenizer.bos_token + x + self.tokenizer.eos_token for x in texts],
        max_length=self.max_length,
        truncation=True,
        # NOTE: We manually add special tokens (bos) above so we set this False
        # to avoid models that automatically add special tokens (e.g. OPT)
        # adding them twice more.
        add_special_tokens=False,
    )
    input_ids = list(map(torch.as_tensor, tokenized.input_ids))
    return input_ids

Let me loop @reciprocated in here because I vaguely recall he brought this up before. Also, if these methods are dead we should probably remove them.

maxreciprocate (Collaborator)

Neither AccelerateRLTrainer.tokenize, AccelerateILQLTrainer.tokenize, nor NeMoILQLTrainer.tokenize seems to be used anywhere at the moment. Actually, there are only three places where tokenization happens: in both orchestrators and in the prompt pipeline (apart from the examples):

model_inputs = tokenizer(prompts, truncation=True, padding=False, max_length=max_prompt_length)

tokens = tokenizer(phrase).input_ids[-ctx_length:]

outputs = self.trainer.tokenizer(str_outputs).input_ids

Only the last needs to have <eos> appended (for the reward model & to make indexing work), and I think none really need <bos>.
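
A minimal sketch of what that could look like for the last site (illustrative only, not the exact trlx code):

from transformers import AutoTokenizer

# Append <eos> only to the tokenized ILQL outputs, so the reward model sees
# terminated sequences and indexing into the final token works.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
str_outputs = [" it was a great movie", " it was awful"]

outputs = tokenizer(str_outputs).input_ids
if tokenizer.eos_token_id is not None:
    outputs = [ids + [tokenizer.eos_token_id] for ids in outputs]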

cat-state (Collaborator, Author)

are only three places where tokenization happens: in both orchestrators and in the prompt pipeline (apart from examples)

Oh right! So this was just dead code 😅
It seems like we still add EOS and BOS for ILQL in:

dialogue = [tokenizer.bos_token, dialogue]

With the 20B, J, or GPT-2 tokenizers, it seems BOS/EOS don't get added by default; however, the original issue was when using a tokenizer that does add them.
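
A quick way to see the difference in default behaviour (an illustrative check; the model names are just examples):

from transformers import AutoTokenizer

gpt2 = AutoTokenizer.from_pretrained("gpt2")
t5 = AutoTokenizer.from_pretrained("t5-small")

print(gpt2("hello world").input_ids)                          # no BOS/EOS added
print(t5("hello world").input_ids)                            # ends with </s> (id 1)
print(t5("hello world", add_special_tokens=False).input_ids)  # no </s> appended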

jon-tow (Collaborator) left a comment


Thanks, @cat-state! This looks good 👍 Just leaving some questions for small edits.

trlx/trainer/accelerate_ilql_trainer.py
model_inputs = tokenizer(prompts, truncation=True, padding=False, max_length=max_prompt_length)
prompts = model_inputs["input_ids"]

# manually prepend bos token if not already present to match RL trainers tokenization
Collaborator

This comment might be misleading since we no longer prepend BOS tokens from the trainers.

Collaborator (Author)

Oh yeah, I'll change it. We still do add BOS and EOS for ILQL btw

maxreciprocate (Collaborator) left a comment

It seems like in your report there are 5 different runs, all with slightly different results; however, they all use gpt2-imdb, for which this PR shouldn't change anything. Do you think this is due only to the added BOS's influence?

# default tokenizer behavior for PPO
if tokenizer.bos_token is not None:
    prompts = [
        tokenizer.bos_token + prompt if not prompt.startswith(tokenizer.bos_token) else prompt
Collaborator

Do we have to force adding BOS here? In ILQL it is only added in the case where a single string is passed, to make sure the loss starts from the 0-index of the output (action_ixs[0] = 0) by prepending a "prompt" as BOS, and not when a real [prompt, output] pair is passed (unless truncation comes into play). I don't want to change behaviour here, as I'm finalizing the HH PR with results that are already shabby as they are, which may or may not take a hit due to this change 🙂
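
For reference, a hedged sketch of the two ILQL cases being described (variable names are illustrative, not the exact trlx code):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Completion-only case: a lone BOS stands in for the prompt so the loss can
# start at output index 0 (action_ixs[0] = 0).
completion_only = "the movie was great"
dialogue = [tokenizer.bos_token, completion_only]

# Real [prompt, output] case: no BOS is forced onto the prompt.
prompt, output = "Review:", " the movie was great"
dialogue = [prompt, output]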

Collaborator (Author)

I see, that makes sense: the BOS is only added as the prompt for a completion when no real prompt is given for that completion.

jon-tow (Collaborator) commented Feb 10, 2023

@reciprocated As an aside: could you clarify why you believe "none [of the tokenizations] really need <bos>"? Is this from empirical results you've noticed with RL? The "natural" thing to do with LMs is to include <bos> for inference (e.g. for PPO).

cat-state (Collaborator, Author) commented Feb 10, 2023

It seems like in your report there are 5 different runs, all with slightly different results; however, they all use gpt2-imdb, for which this PR shouldn't change anything. Do you think this is due only to the added BOS's influence?

The main-branch runs ran with the same seeds and have the same results; I included 2 runs with each seed, for both main and with-BOS, to see how they change.

cat-state changed the title from "Fix PromptPipeline tokenization" to "Set add_special_tokens=False to not add EOS unexpectedly" Feb 10, 2023
maxreciprocate (Collaborator) commented Feb 10, 2023

could you clarify why you believe "none [of the tokenizations] really need <bos>"?

@jon-tow There I meant that <bos> is not really needed algorithmically (unlike in the single-output ILQL case); however, I don't have any evidence or counter-evidence on whether adding it is advantageous in general. Yet there is such evidence for adding <eos>, something we actually don't do currently but I feel we ought to, from Anthropic's 2021 HHH paper:

We also found that appending a special ‘end-of-context’ token to each sequence to unambiguously delineate the end of passage sometimes improves performance, as discussed in section C.4.

section C.4.:

Here we outline a technical detail that improves the overall performance of preference models. We designate a special “end-of-context” token (EOC) which is included as the final token of each sample context. The preference model score is also predicted directly on top of this token. For our experiments we used the token, but in principle many other choices are possible.

We compare finetuning experiments with and without the EOC token. For experiments with, we consistently apply the same EOC token throughout both the PMP and fine-tuning stages; and for experiments without, we consistently do not apply the EOC token. From figure 31 we see that the EOC clearly improves performance.

We hypothesize that the improvement comes from two factors:
• Sometimes the sentiment behind a natural language statement can be altered or reversed significantly by the addition of one or two words, and so knowing where the context ends can be helpful for the preference model to predict a sensible score.
• Without an EOC token, the preference model must not only predict a score, but also try to anticipate where the context ends. As a result, the model is forced to predict a score at multiple tokens where the context may end, rather than at a single token where it definitely ends. This adds a level of ambiguity which may cause the model to under-perform.
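
To make the quoted EOC setup concrete, here is a hedged sketch (illustrative names only, not trlx or Anthropic code) of scoring on top of an appended end-of-context token, assuming a GPT-2 backbone with a linear value head and reusing <eos> as the EOC token:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
backbone = AutoModel.from_pretrained("gpt2")
value_head = torch.nn.Linear(backbone.config.hidden_size, 1)

def score_with_eoc(samples, eoc=tokenizer.eos_token):
    # Append a fixed end-of-context token and predict the preference score
    # from the hidden state at that final token only.
    batch = tokenizer([s + eoc for s in samples], return_tensors="pt", padding=True)
    hidden = backbone(**batch).last_hidden_state        # (batch, seq, hidden)
    eoc_pos = batch["attention_mask"].sum(dim=1) - 1    # index of the EOC token
    return value_head(hidden[torch.arange(hidden.size(0)), eoc_pos]).squeeze(-1)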

jon-tow (Collaborator) commented Feb 10, 2023

@reciprocated Yeah... I also haven't seen any experiments comparing results from <bos> vs. no-<bos> for decoder models, but folks make the argument that you should include it because it gets you logprobs for the first token (however that may steer inference 🤷).
Thanks for linking the EOC stuff! (Off-topic, but I wonder if we did this for the HH RM training 🤔)

jon-tow (Collaborator) left a comment


@cat-state Just noting that this subtly changes results for the T5 dailymail/cnn summarization example (PPO): https://api.wandb.ai/links/jon-tow/p3kg4ejf
Reward seems to improve more with this fix, since removing the EOS token before passing ids to generate is (probably?) the right thing to do.

maxreciprocate (Collaborator) commented Feb 10, 2023

@jon-tow Yeah, but the first logprob of interest in PPO is at the beginning of the output, not of the prompt, so unless len(prompt) == 0 there is no reason to add <bos>. Regarding <eos> with the HH RM: yes, we did, in https://github.com/Dahoas/reward-modeling/blob/deba81c2e9dab1514032800109325258caa470db/reward-modeling/rm_datasets.py#L93 @Dahoas. I also add it manually in:

def reward_fn(samples, prompts, outputs):
    samples = [s + reward_tokenizer.eos_token for s in samples]

cat-state (Collaborator, Author) commented Feb 10, 2023

improve more with this fix, since removing the EOS token before passing ids to generate is (probably?) the right thing to do.

Thanks for checking! I hope so, and it's on the same random seed too. It seems not to affect GPT-2/NeoX-tokenizer-based models, as those don't add special tokens by default.
