
RecursionError: maximum recursion depth exceeded #442

Closed
philwee opened this issue Apr 27, 2023 · 10 comments
Labels: bug (Something isn't working.)

@philwee
Contributor

philwee commented Apr 27, 2023

.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py:1142 in unk_token_id

  1139 │        """
  1140 │        if self._unk_token is None:
  1141 │            return None
❱ 1142 │        return self.convert_tokens_to_ids(self.unk_token)
  1143 │
  1144 │    @property
  1145 │    def sep_token_id(self) -> Optional[int]:
RecursionError: maximum recursion depth exceeded

Weird bug that happens when using hf-causal-experimental with a model and a PEFT adapter.

@haileyschoelkopf
Collaborator

Thanks for raising this! I have a suspicion this might be fixable by setting the environment variable TOKENIZERS_PARALLELISM=false.

Do you have a model + task combination / command that can replicate this consistently?
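For reference, a minimal sketch of that workaround (an assumption about placement: the variable has to be set before tokenizers is ever imported, e.g. at the very top of main.py, or exported in the shell before launching):

import os

# Disable the Rust tokenizers' internal thread pool before transformers is imported,
# so forked worker processes don't contend with it.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import transformers  # this import must come after the environment variable is set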

@philwee
Contributor Author

philwee commented Apr 28, 2023

Yup, I was using the gpt4all-lora adapter on llama-7b, running arc_easy, arc_challenge (acc), piqa (acc), sciq, mnli and truthful_qa_mc:

python main.py --model hf-causal-experimental --model_args pretrained=decapoda-research/llama-7b-hf,peft=nomic-ai/gpt4all-lora --tasks piqa,wikitext,mnli,arc_easy,arc_challenge,openbookqa,truthfulqa_mc,sciq --device cuda:0

If relevant: it ran perfectly fine, but when it came time for the results to show up, it crashed with the error message in the issue.

@haileyschoelkopf
Collaborator

Thanks, I’ll give this a try in a minute!

This does sound like an interaction between the multiprocessing used for bootstrap stderr estimation and tokenizers in this case.
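A hedged illustration of that suspected interaction, not the harness's actual code: once a fast tokenizer has been used in the parent process, forking worker pools (roughly what bootstrap resampling of per-example metrics does) can trip the tokenizers library's fork handling, which is what TOKENIZERS_PARALLELISM=false silences.

import multiprocessing as mp
from transformers import AutoTokenizer

# Warm up a fast tokenizer in the parent process (the model name is just a stand-in).
tokenizer = AutoTokenizer.from_pretrained("gpt2", use_fast=True)
tokenizer("warm up in the parent process")

def score(_seed: int) -> int:
    # Forked workers inherit the already-initialized tokenizer state.
    return len(tokenizer("a bootstrap resample")["input_ids"])

if __name__ == "__main__":
    # Assumes the default 'fork' start method on Linux.
    with mp.Pool(4) as pool:
        print(pool.map(score, range(8)))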

@haileyschoelkopf
Collaborator

  File "/home/hailey/anaconda3/envs/new-harness/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 699, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

Since it seems you've been able to get this running--what's the recommended fix for this LLaMA upload? @philwee

@philwee
Contributor Author

philwee commented Apr 28, 2023

You can change the transformers package you're on via one of these two:

pip install git+https://github.com/mbehm/transformers (old one where it worked)
pip install git+https://github.com/huggingface/transformers (I think they fixed it at some point, but I'm not sure how it's going right now)
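After either install, a quick sanity check that the installed build actually exposes the renamed class (a minimal sketch; LlamaTokenizer assumes a sufficiently recent transformers, roughly 4.28+):

import transformers
print(transformers.__version__)

# Older builds only know the misspelled LLaMATokenizer (or nothing at all);
# if this import fails, the install didn't pick up LLaMA support.
from transformers import LlamaTokenizer
print(LlamaTokenizer)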

Please let me know if this helps! :)

@StellaAthena
Member

Closing this issue as it seems to be a bug in the HF library that has now been fixed. Anyone encountering this issue should make sure they’ve updated to the latest version of transformers before reporting the bug.

@StellaAthena added the bug label Apr 30, 2023
@haileyschoelkopf
Collaborator

Do you have a link for the bug in transformers that raises + fixes this? The bug being raised in this issue is not the LLamaTokenizer class name error, if that's what you're referring to.

@StellaAthena reopened this May 1, 2023
@StellaAthena
Member

Do you have a link for the bug in transformers that raises + fixes this? The bug being raised in this issue is not the LLamaTokenizer class name error, if that's what you're referring to.

Oh you’re right, I misread the end of the convo. The issue you’re having is that it’s LlamaTokenizer now, not LLaMATokenizer.
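For anyone stuck on an older LLaMA upload that still spells the class LLaMATokenizer, a hedged sketch of one common local workaround: patch the tokenizer_class field in the checkpoint's tokenizer_config.json. The local path is a placeholder, and this assumes the upload stores the class name in that file.

import json
from pathlib import Path

# Placeholder path to a locally downloaded checkpoint.
config_path = Path("llama-7b-hf/tokenizer_config.json")

config = json.loads(config_path.read_text())
if config.get("tokenizer_class") == "LLaMATokenizer":
    # Recent transformers releases register the class as "LlamaTokenizer".
    config["tokenizer_class"] = "LlamaTokenizer"
    config_path.write_text(json.dumps(config, indent=2))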

@upunaprosk

upunaprosk commented May 15, 2023

Had the same issue with Llama models. The problem stems from tokenizer initialization.
Put exact bos, eos and unk tokens in your tokenizer config:
{"bos_token": "<s>", "eos_token": "</s>", ... "unk_token": "<unk>"}
and pass the tokenizer path explicitly:
python main.py --model hf-causal-experimental --model_args pretrained="hf-format-llama-7B",tokenizer="hf-format-llama-7B" --device cuda:0 --tasks crows_pairs_english
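A hedged sketch of the same idea from Python, overriding the special tokens at load time and checking that the property from the traceback now resolves instead of recursing (the checkpoint path is a placeholder):

from transformers import AutoTokenizer

# Placeholder local checkpoint; the special-token overrides are forwarded to the tokenizer's init.
tokenizer = AutoTokenizer.from_pretrained(
    "hf-format-llama-7B",
    bos_token="<s>",
    eos_token="</s>",
    unk_token="<unk>",
)

# unk_token_id is the property that recursed in the original traceback.
print(tokenizer.unk_token_id)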

@StellaAthena
Member

@upunaprosk if correcting the tokenizer solves the problem, it seems like this issue should be opened on the HF transformers repo instead of this one. We are loading the model the way we are told to, it’s just that the transformers library doesn’t know how to load the model.

@philwee @haileyschoelkopf if one of you can verify that this patch solves the problem, I’m happy to mark this as closed and open a corresponding issue on the transformers repo.
