
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] #770

Closed
yiyanxiyin opened this issue Apr 23, 2023 · 13 comments

Comments

@yiyanxiyin

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Running python cli_demo.py raises an error:

root@4uot40mdrplpv-0:/yx/ChatGLM-6B# python mycli_demo.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "/yx/ChatGLM-6B/mycli_demo.py", line 6, in <module>
tokenizer = AutoTokenizer.from_pretrained("/yx/ChatGLM-6B/THUDM/chatglm-6b", trust_remote_code=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 679, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 205, in __init__
self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 61, in __init__
self.text_tokenizer = TextTokenizer(vocab_file)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 22, in __init__
self.sp.Load(model_path)
File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

I'm running this inside Docker. Could you take a look at what's going wrong? Thanks.

Expected Behavior

No response

Steps To Reproduce

help

Environment

- OS: Red Hat 4.8.5-44
- Python: 3.11
- Transformers: 4.27.1
- PyTorch: 2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response

@duzx16
Member

duzx16 commented Apr 23, 2023

The ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model
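One way to do that comparison is to hash the local file and check the result against the SHA-256 shown on the Hugging Face file page. A minimal sketch (the example path is the one from the traceback above and may differ on your machine):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (path taken from the traceback above; adjust to your checkout):
#   print(sha256_of("/yx/ChatGLM-6B/THUDM/chatglm-6b/ice_text.model"))
```

If the digest differs from the one listed on Hugging Face, the file is corrupt or incomplete and needs to be re-downloaded.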

@zyr-NULL

zyr-NULL commented May 5, 2023

I'm not running in Docker, but I hit the same problem. I compared my file against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model and they are identical.

@lonly197

lonly197 commented May 5, 2023

I'm hitting the same problem. The model was downloaded from the main branch on Hugging Face, and the source code is also from the main branch. When starting web_demo2.py I get "RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]".

@22zhangqian

I also hit RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] while fine-tuning, when running the train.sh script. What's going on?

@MRuAyAN

MRuAyAN commented May 9, 2023

> The ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model

I found that the SHA-256 of the ice_text.model downloaded from that link does not match the SHA-256 listed on Hugging Face. Could the wrong file have been uploaded at some point?

@Vincent-Huang-2000

The SHA-256 mismatches reported above happen because git-lfs was not used for the download:

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/THUDM/chatglm-6b

@wangjiaqiys

wangjiaqiys commented May 23, 2023

I fetched the model directory with git lfs and also downloaded the model files from the Tsinghua cloud drive, which left me with an inconsistent ice_text.model; after replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

@qq516249940

> I fetched the model directory with git lfs and also downloaded the model files from the Tsinghua cloud drive, which left me with an inconsistent ice_text.model; after replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

Why would the git lfs download produce a different ice_text.model? A network issue? Was the download incomplete?

@Hzzhang-nlp

The model was never actually cloned. How do I fix this?

@Hzzhang-nlp

The .bin file that got cloned is only about 100 KB.
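A .bin file of only ~100 KB is the telltale sign of a git-lfs pointer stub: a tiny text file beginning with a fixed version line, left in place of the real weights when the repo is cloned without git-lfs. A quick check along these lines can confirm it (a sketch; the 1 KB size cutoff is my own assumption):

```python
import os

# git-lfs pointer stubs are tiny text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: str) -> bool:
    """Return True if `path` looks like a git-lfs pointer stub
    rather than a real model file."""
    # Real model shards are hundreds of MB; pointer stubs are ~130 bytes.
    if os.path.getsize(path) > 1024:
        return False
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

If this returns True for ice_text.model or the .bin shards, run `git lfs install` followed by `git lfs pull` inside the repo to fetch the real files, or download them manually.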

@Hzzhang-nlp

(screenshot attached)

@RedNoseJJN

> The model was never actually cloned. How do I fix this?

You can download it manually from the Tsinghua University cloud drive and replace the broken file:
Tsinghua University cloud drive

@RedNoseJJN

> I fetched the model directory with git lfs and also downloaded the model files from the Tsinghua cloud drive, which left me with an inconsistent ice_text.model; after replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

I replaced it but it still doesn't work. Any ideas what's going on?


13 participants