Tokenizer error after running the script generate_chatllama.py #8

Open
tianmala opened this issue Mar 31, 2023 · 10 comments

Comments

@tianmala

Traceback (most recent call last):
  File "scripts/generate_chatllama.py", line 82, in <module>
    args.tokenizer = str2tokenizer[args.tokenizer](args)
  File "/home/mo/llama/TencentPretrain/tencentpretrain/utils/tokenizers.py", line 255, in __init__
    super().__init__(args, is_src)
  File "/home/mo/llama/TencentPretrain/tencentpretrain/utils/tokenizers.py", line 30, in __init__
    self.sp_model.Load(spm_model_path)
  File "/home/mo/miniconda3/envs/llm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/mo/miniconda3/envs/llm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

After running the script I get this error. Has anyone else run into this problem?

@davikl

davikl commented Apr 4, 2023

Same here; I'm also looking for help.

@rayguo01

rayguo01 commented Apr 6, 2023

Same error here.

@guanlinz

guanlinz commented Apr 7, 2023

Subscribing to this issue, since I've hit the same problem.

@lylcst

lylcst commented Apr 8, 2023

Same problem. How can it be fixed?

@2775919186

Could the tokenizer model file at spm_model_file = '../ChatLLaMA-zh-7B/tokenizer.model' be corrupted?
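The `ParseFromArray` failure means SentencePiece could not decode the file's bytes as a model. One possibility worth ruling out (an assumption here, not a confirmed diagnosis) is that `tokenizer.model` is an un-fetched Git LFS pointer: a short text stub that Git leaves in place of the real binary when Git LFS is not installed. A minimal check, assuming only the standard Git LFS pointer header line; the helper names are hypothetical:

```python
# Every Git LFS pointer file starts with this header line (Git LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer_bytes(head: bytes) -> bool:
    """True if the leading bytes of a file match the Git LFS pointer header."""
    return head.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer(path: str) -> bool:
    """True if the file at `path` looks like an un-fetched Git LFS pointer.

    A real SentencePiece tokenizer.model is a binary protobuf, while a pointer
    is plain text, so reading just the header-length prefix is enough.
    """
    with open(path, "rb") as f:
        return is_lfs_pointer_bytes(f.read(len(LFS_POINTER_PREFIX)))
```

For example, `is_lfs_pointer("../ChatLLaMA-zh-7B/tokenizer.model")` returning `True` would indicate the weights were never actually downloaded.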

@Data2Me

Data2Me commented Apr 12, 2023

Same error here.

@ydli-ai
Member

ydli-ai commented Apr 12, 2023

> Could the tokenizer model file at spm_model_file = '../ChatLLaMA-zh-7B/tokenizer.model' be corrupted?

I tested it and did not run into this problem. Could you check your SentencePiece version? Mine is 0.1.97.
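To compare against the 0.1.97 mentioned above, the installed SentencePiece version can be read from package metadata without importing the extension module. A small sketch using only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints e.g. "sentencepiece: 0.1.97", or "sentencepiece: None" if not installed.
print("sentencepiece:", installed_version("sentencepiece"))
```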

@Data2Me

Data2Me commented Apr 12, 2023

> I tested it and did not run into this problem. Could you check your SentencePiece version? Mine is 0.1.97.

My SentencePiece version is also 0.1.97, and I just tried again but still get the error:
  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

@Data2Me

Data2Me commented Apr 13, 2023

Solved: I re-downloaded the model weight files. Git LFS must be installed before running git clone.
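A clone made without Git LFS leaves every large file as a small pointer stub, not just the tokenizer. The sketch below walks a checkout and lists files that are still pointers, along with the real size recorded in each pointer's `size` line. It assumes only the standard `key value` line format of Git LFS pointer files; the function names are hypothetical:

```python
import os
from typing import Dict, List, Tuple

# Header line that every Git LFS pointer file begins with.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if sep:
            fields[key] = value
    return fields


def find_lfs_pointers(root: str) -> List[Tuple[str, int]]:
    """Return (path, expected_size) for every file under `root` that is still an LFS pointer."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git's internal store
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                head = f.read(1024)  # pointer stubs are only a few hundred bytes
            if head.startswith(LFS_POINTER_PREFIX):
                fields = parse_lfs_pointer(head.decode("utf-8", errors="replace"))
                found.append((path, int(fields.get("size", "-1"))))
    return found
```

Running `find_lfs_pointers("../ChatLLaMA-zh-7B")` before and after `git lfs install` followed by `git lfs pull` inside the clone should show the list shrinking to empty once the real files have been fetched.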

@YYForReal

> Solved: I re-downloaded the model weight files. Git LFS must be installed before running git clone.

After installing it, downloading the model weight files is extremely slow. Is there a good way to speed it up?
