
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] #770

Closed
yiyanxiyin opened this issue Apr 23, 2023 · 13 comments

Comments

@yiyanxiyin

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Running python cli_demo.py raises an error:

root@4uot40mdrplpv-0:/yx/ChatGLM-6B# python mycli_demo.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "/yx/ChatGLM-6B/mycli_demo.py", line 6, in <module>
tokenizer = AutoTokenizer.from_pretrained("/yx/ChatGLM-6B/THUDM/chatglm-6b", trust_remote_code=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 679, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 205, in __init__
self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 61, in __init__
self.text_tokenizer = TextTokenizer(vocab_file)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 22, in __init__
self.sp.Load(model_path)
File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

I'm running this inside Docker. Could you take a look at what's going wrong? Thanks.

Expected Behavior

No response

Steps To Reproduce

help

Environment

- OS: Red Hat 4.8.5-44
- Python: 3.11
- Transformers: 4.27.1
- PyTorch: 2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response

@duzx16
Member

duzx16 commented Apr 23, 2023

The ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model
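One way to do that comparison is to hash the local file and check the result against the SHA-256 shown on the Hugging Face file page. A minimal sketch (the example path is the one from the traceback above and may differ on your machine):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (path taken from the traceback above; adjust to your checkout):
#   print(sha256_of("/yx/ChatGLM-6B/THUDM/chatglm-6b/ice_text.model"))
```

If the digest differs from the one listed on Hugging Face, the file is corrupt or incomplete and needs to be re-downloaded.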

@zyr-NULL

zyr-NULL commented May 5, 2023

I'm not running in Docker, but I hit the same problem. I compared my file against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model and they are identical.

@lonly197

lonly197 commented May 5, 2023

I'm hitting the same problem. The model was downloaded from the main branch on Hugging Face, and the source code is also from the main branch. When starting web_demo2.py I get "RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]".

@22zhangqian

I also hit RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] while fine-tuning, when running the train.sh script. What's going on?

@MRuAyAN

MRuAyAN commented May 9, 2023

> The ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model

I found that the SHA-256 of the ice_text.model downloaded from that link does not match the SHA-256 listed on Hugging Face. Could the wrong file have been uploaded at some point?

@Vincent-Huang-2000

The SHA-256 mismatches reported above happen because git-lfs was not used for the download:

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/THUDM/chatglm-6b

@wangjiaqiys

wangjiaqiys commented May 23, 2023

I fetched the model directory with git lfs and also downloaded the model files from the Tsinghua cloud drive, which left me with an inconsistent ice_text.model; after replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

@qq516249940

> I fetched the model directory with git lfs and also downloaded the model files from the Tsinghua cloud drive, which left me with an inconsistent ice_text.model; after replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

Why would the git lfs download produce a different ice_text.model? A network issue? Was the download incomplete?

@Hzzhang-nlp

The model was never actually cloned. How do I fix this?

@Hzzhang-nlp

The .bin file that got cloned is only about 100 KB.
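A .bin file of only ~100 KB is the telltale sign of a git-lfs pointer stub: a tiny text file beginning with a fixed version line, left in place of the real weights when the repo is cloned without git-lfs. A quick check along these lines can confirm it (a sketch; the 1 KB size cutoff is my own assumption):

```python
import os

# git-lfs pointer stubs are tiny text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: str) -> bool:
    """Return True if `path` looks like a git-lfs pointer stub
    rather than a real model file."""
    # Real model shards are hundreds of MB; pointer stubs are ~130 bytes.
    if os.path.getsize(path) > 1024:
        return False
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

If this returns True for ice_text.model or the .bin shards, run `git lfs install` followed by `git lfs pull` inside the repo to fetch the real files, or download them manually.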

@Hzzhang-nlp

(screenshot attached)

@RedNoseJJN

> The model was never actually cloned. How do I fix this?

You can download it manually from the Tsinghua University cloud drive and replace the broken file:
Tsinghua University cloud drive

@RedNoseJJN

> I fetched the model directory with git lfs and also downloaded the model files from the Tsinghua cloud drive, which left me with an inconsistent ice_text.model; after replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

I replaced it but it still doesn't work. Any ideas what's going on?


13 participants