Skip to content

[Feature] How to modify the special token? I would like to keep <ref> and </ref> in the output, but skip other special tokens. #803

@zilunzhang

Description

@zilunzhang

Motivation

How to modify the special token set? I would like to keep <ref> and </ref> in the output, but skip other special tokens.

I modified the added_tokens.json, special_tokens_map.json by deleting the <ref> and </ref>. I also set the "special" attribute in <ref> and </ref> from tokenizer_config.json to be false. These approaches did not work. It worked when I modified the "responses = tokenizer.batch_decode(generation_output, skip_special_tokens=false)" from modeling_internvl_chat.py, but I want to skip other special tokens.

UPDADE: It works after I remove all <ref> and </ref> in tokenizer_config.json. However, model outputs <ref> and </ref> with white space around them.

Model Output: 1 <ref> car </ref> ({<30.47><63.77><6.42><2.90>|<68>})

Ground Truth: 1 <ref>car</ref>({<30.37><64.16><6.53><3.16>|<68>})

What should I do?

Thanks.

Related resources

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions