Motivation
How to modify the special token set? I would like to keep <ref> and </ref> in the output, but skip other special tokens.
I modified the added_tokens.json, special_tokens_map.json by deleting the <ref> and </ref>. I also set the "special" attribute in <ref> and </ref> from tokenizer_config.json to be false. These approaches did not work. It worked when I modified the "responses = tokenizer.batch_decode(generation_output, skip_special_tokens=false)" from modeling_internvl_chat.py, but I want to skip other special tokens.
UPDADE: It works after I remove all <ref> and </ref> in tokenizer_config.json. However, model outputs <ref> and </ref> with white space around them.
Model Output: 1 <ref> car </ref> ({<30.47><63.77><6.42><2.90>|<68>})
Ground Truth: 1 <ref>car</ref>({<30.37><64.16><6.53><3.16>|<68>})
What should I do?
Thanks.
Related resources
No response
Additional context
No response
Motivation
How to modify the special token set? I would like to keep <ref> and </ref> in the output, but skip other special tokens.
I modified the added_tokens.json, special_tokens_map.json by deleting the <ref> and </ref>. I also set the "special" attribute in <ref> and </ref> from tokenizer_config.json to be false. These approaches did not work. It worked when I modified the "responses = tokenizer.batch_decode(generation_output, skip_special_tokens=false)" from modeling_internvl_chat.py, but I want to skip other special tokens.
UPDADE: It works after I remove all <ref> and </ref> in tokenizer_config.json. However, model outputs <ref> and </ref> with white space around them.
Model Output: 1 <ref> car </ref> ({<30.47><63.77><6.42><2.90>|<68>})
Ground Truth: 1 <ref>car</ref>({<30.37><64.16><6.53><3.16>|<68>})
What should I do?
Thanks.
Related resources
No response
Additional context
No response