AssertionError: Non-consecutive added token '<obj>' found. Should have index 50272 but has index 50265 in saved vocabulary. #1

Closed
rahul765 opened this issue Nov 14, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@rahul765

Traceback:
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/script_runner.py", line 338, in _run_script
    exec(code, module.__dict__)
  File "/home/rahulpal/Documents/rebel-main/demo.py", line 57, in <module>
    tokenizer, model, dataset = load_models()
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/caching.py", line 555, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/rahulpal/Documents/rebel-main/demo.py", line 18, in load_models
    tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 416, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1705, in from_pretrained
    resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1811, in _from_pretrained
    f"Non-consecutive added token '{token}' found. "

@LittlePea13
Collaborator

Hi, there seems to be a bug in the transformers version used for training Rebel (4.4.0) in how it handles added tokens. I will check whether I can update it in the requirements file without breaking anything. If you just want to load the model and tokenizer as in the demo.py file, update to a newer transformers version, e.g. 4.12.4, and the issue should go away.
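For anyone who wants to check their environment before loading the model, here is a minimal sketch of a version guard. The helper names (`version_tuple`, `needs_upgrade`, `load_rebel_tokenizer`) are hypothetical and not part of the repository, and the 4.12.4 threshold is simply the version suggested in this thread; the exact release in which the bug was fixed is not stated here.

```python
import re

# Version suggested in this thread. Assumption: the exact release that
# fixed the added-token bug is not stated, so treat this as a safe floor.
MIN_TRANSFORMERS = "4.12.4"


def version_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component, e.g. "dev0".
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed, minimum=MIN_TRANSFORMERS):
    """True if the installed version is older than the suggested minimum."""
    return version_tuple(installed) < version_tuple(minimum)


def load_rebel_tokenizer():
    """Load the tokenizer as demo.py does, after checking the version."""
    # Imported lazily so the helpers above work without transformers installed.
    import transformers

    if needs_upgrade(transformers.__version__):
        raise RuntimeError(
            "transformers %s is older than %s; try "
            "'pip install --upgrade transformers'"
            % (transformers.__version__, MIN_TRANSFORMERS)
        )
    return transformers.AutoTokenizer.from_pretrained("Babelscape/rebel-large")
```

Calling `load_rebel_tokenizer()` in place of the direct `AutoTokenizer.from_pretrained(...)` call in demo.py would then fail early with an upgrade hint instead of the assertion above.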

@LittlePea13 self-assigned this Nov 17, 2021
@LittlePea13 added the bug (Something isn't working) label Nov 17, 2021
@dxlong2000

Hi @LittlePea13, thanks for your great work. I am using the newest version of transformers and this bug is still there. Could you reopen the thread and help us solve the bug?

Thanks!

@LittlePea13
Collaborator

Hi @dxlong2000, could you give some more context on how the error happened? Thanks.

@dxlong2000

For some reason I can run it now. Thanks very much for your reply; we can close the issue now.

@zozni

zozni commented Mar 30, 2022

How did you solve it? I have the same issue and haven't been able to solve it. ㅠㅠ

@LittlePea13
Collaborator

Hi @zozni, did you update the transformers library?

@zozni

zozni commented Mar 30, 2022

After reinstalling the latest version, it worked. Thank you!


4 participants