RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] #770
Comments
I'm not running in Docker, but I hit the same problem. I compared my file against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model and they are identical.
I hit the same problem. The model was downloaded from the main branch on Hugging Face, and the source code is also from the main branch. When starting web_demo2.py I get "RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]".
Same here while fine-tuning: running train.sh fails with RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(),
I found that the SHA-256 of the ice_text.model downloaded from this path does not match the SHA-256 listed on Hugging Face. Could the wrong file have been uploaded at some point?
The SHA-256 mismatches reported above were caused by not using git-lfs for the download.
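To confirm whether this is your problem, you can hash the local file and compare the digest against the SHA-256 shown on the Hugging Face file page for ice_text.model. A minimal sketch (the local path in the usage example below is an assumption; point it at wherever you cloned the model):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `print(sha256_of("THUDM/chatglm-6b/ice_text.model"))` should print the same hash as the one displayed next to the file on Hugging Face; if it differs, the download is corrupt or incomplete.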
I downloaded the model directory with git lfs, but then also downloaded model files from the Tsinghua cloud drive, which left my ice_text.model inconsistent;
Why would an ice_text.model downloaded via git lfs differ? A network issue? Was the download incomplete?
The model was never actually cloned. How do I fix that?
The bin file that got cloned is only 100-odd KB.
You can download it manually from the Tsinghua cloud drive and replace the file.
Still failing after replacing it. Any idea what's going on?
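A tiny "bin" file usually means the clone was done without git-lfs, so the working tree still contains git-lfs pointer stubs (small text files beginning with `version https://git-lfs.github.com/spec/v1`) instead of the real weights. A quick way to scan a model directory for leftover stubs (the directory path you pass is up to you):

```python
import os

# Git LFS pointer files always start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def find_lfs_stubs(model_dir: str) -> list[str]:
    """Return filenames in model_dir that are still git-lfs pointer stubs."""
    stubs = []
    for name in sorted(os.listdir(model_dir)):
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(name)
    return stubs
```

If `find_lfs_stubs("THUDM/chatglm-6b")` reports ice_text.model or any of the .bin shards, re-fetch them with git-lfs installed (or download the real files manually) before retrying.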
Is there an existing issue for this?
Current Behavior
Running python cli_demo.py throws an error:
root@4uot40mdrplpv-0:/yx/ChatGLM-6B# python mycli_demo.py
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "/yx/ChatGLM-6B/mycli_demo.py", line 6, in &lt;module&gt;
tokenizer = AutoTokenizer.from_pretrained("/yx/ChatGLM-6B/THUDM/chatglm-6b", trust_remote_code=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 679, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 205, in __init__
self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 61, in __init__
self.text_tokenizer = TextTokenizer(vocab_file)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 22, in __init__
self.sp.Load(model_path)
File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
I'm running this inside Docker. Could you take a look at what's going wrong? Thanks.
Expected Behavior
No response
Steps To Reproduce
help
Environment
Anything else?
No response