
RecursionError: maximum recursion depth exceeded #86

Closed
nkcsjxd opened this issue Aug 8, 2023 · 2 comments

Comments


nkcsjxd commented Aug 8, 2023

  File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
RecursionError: maximum recursion depth exceeded
This is the error output I get when running the code; could you explain why it occurs?
Earlier I hit ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
That was resolved by changing the tokenizer class parameter in the base LLaMA model to LlamaTokenizer.
Then I hit TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
    Resolved by downgrading protobuf to 3.20.1.
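As a sketch of the choice between the two documented workarounds (the helper function below is hypothetical, not part of protobuf or transformers):

```python
# Hypothetical helper illustrating the two protobuf workarounds quoted above:
# protobuf >= 4.x with stale generated _pb2.py files raises the
# "Descriptors cannot not be created directly" TypeError unless either the
# package is downgraded or the pure-Python implementation is forced.
import os

def protobuf_workaround_hint(version: str) -> str:
    major = int(version.split(".")[0])
    pure_python = os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION") == "python"
    if major >= 4 and not pure_python:
        return ('Downgrade: pip install "protobuf==3.20.1", or set '
                "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (slower)")
    return "ok"

print(protobuf_workaround_hint("4.23.0"))
print(protobuf_workaround_hint("3.20.1"))  # the version that resolved it here
```

Downgrading is usually preferred over the environment variable, since the pure-Python parser is much slower.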
s65b40 (Collaborator) commented Aug 9, 2023

Hello, could you please describe in full the code you ran, and provide the complete error output? Thank you.

nkcsjxd (Author) commented Aug 10, 2023

Command run:
/home/gfr/jxd/Huatuo-Llama-Med-Chinese/finetune.py --base_model ./model/llama-7b-hf --data_path ./data/llama_data.json --output_dir ./lora-llama-l1 --prompt_template_name med_template --micro_batch_size 128 --batch_size 128 --wandb_run_name l1
I think this may be the cause:
Hey! The main issue is that they did not update the tokenizer files at "decapoda-research/llama-7b-hf" but they are using the latest version of transformers. The tokenizer was fixed see huggingface/transformers#22402 and corrected. Nothing we can do on our end...
I tried modifying the base model's tokenizer_config.json file to:
{
  "add_prefix_space": false,
  "bos_token": "<s>",
  "eos_token": "</s>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<pad>",
  "padding_side": "right",
  "special_tokens_map_file": null,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>"
}
After that it runs, though I'm not entirely sure of the exact reason.
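The fix is consistent with the traceback at the top of the issue. A minimal sketch (a hypothetical simplification, not the actual transformers source) of how unk_token_id and convert_tokens_to_ids can recurse into each other when the unk token is missing from the loaded vocabulary:

```python
# Hypothetical simplification of the mutual recursion in the traceback:
# looking up an unknown token falls back to unk_token_id, which resolves
# the unk token through the same lookup; if "<unk>" itself is not in the
# vocabulary, the two calls bounce between each other until RecursionError.
class BrokenTokenizer:
    def __init__(self, vocab):
        self.vocab = vocab          # token -> id mapping
        self.unk_token = "<unk>"

    def convert_tokens_to_ids(self, token):
        token_id = self.vocab.get(token)
        if token_id is None:        # unknown token: fall back to the unk id
            return self.unk_token_id
        return token_id

    @property
    def unk_token_id(self):
        # Resolves the unk token via the same lookup path, re-entering
        # convert_tokens_to_ids -- this is where the cycle closes.
        return self.convert_tokens_to_ids(self.unk_token)

tok = BrokenTokenizer(vocab={"hello": 1})   # note: "<unk>" is NOT in the vocab
try:
    tok.convert_tokens_to_ids("goodbye")
except RecursionError:
    print("RecursionError: maximum recursion depth exceeded")
```

Restoring "<unk>" in the tokenizer config (as in the edited tokenizer_config.json above) would break the cycle, because the unk-token lookup then succeeds instead of recursing.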
