AssertionError: Non-consecutive added token '<obj>' found. Should have index 50272 but has index 50265 in saved vocabulary. #1
Hi, there seems to be a bug with the transformers version used for training REBEL (4.4.0) regarding the added tokens. I will check whether I can update it in the requirements file without breaking anything, but if you just want to load the model and tokenizer as in the demo.py file, update to a newer transformers version, e.g. 4.12.4, and the issue should be gone.
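The workaround above can be sketched as a small version guard run before loading the tokenizer. This is a minimal sketch: the 4.12.4 threshold comes from the comment above, and the helper names are illustrative, not part of transformers itself.

```python
# Minimal sketch: check that the installed transformers release postdates
# the added-token bug before loading the REBEL tokenizer.
# The 4.12.4 threshold is taken from the maintainer's comment above;
# parse_version/transformers_needs_upgrade are hypothetical helpers.

def parse_version(v: str) -> tuple:
    """Turn a plain 'X.Y.Z' version string into (X, Y, Z) for comparison."""
    return tuple(int(part) for part in v.split(".")[:3])

def transformers_needs_upgrade(installed: str, minimum: str = "4.12.4") -> bool:
    """Return True if the installed version predates the fix."""
    return parse_version(installed) < parse_version(minimum)
```

In demo.py one could call `transformers_needs_upgrade(transformers.__version__)` before `AutoTokenizer.from_pretrained("Babelscape/rebel-large")` and prompt the user to run `pip install -U transformers` if it returns True.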
Hi @LittlePea13, thanks for your great work. So far I have used the newest version of transformers and this bug is still there. Could you reopen the thread and help us solve the bug? Thanks!
Hi @dxlong2000, could you give some more context on how the error happened? Thanks.
For some reason I can run it now. Thanks very much for your reply; we can close it now.
How did you solve it? (same issue)
Hi @zozni, did you update the transformers library?
After reinstalling with the latest version, it worked successfully. Thank you!
Traceback:

```
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/script_runner.py", line 338, in _run_script
    exec(code, module.__dict__)
  File "/home/rahulpal/Documents/rebel-main/demo.py", line 57, in <module>
    tokenizer, model, dataset = load_models()
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/caching.py", line 555, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/rahulpal/Documents/rebel-main/demo.py", line 18, in load_models
    tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 416, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1705, in from_pretrained
    resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1811, in _from_pretrained
    f"Non-consecutive added token '{token}' found. "
```