The following error is reported when running on Linux #28

Closed
guangyuanyu opened this issue Jun 1, 2023 · 12 comments

Comments

@guangyuanyu

failed to tokenize string!

@guangyuanyu
Author

Runtime environment: CentOS 7
Build environment: cmake-3.26.3 gcc-11.2.1 make-4.3
Model: chatglm-6b, downloaded directly from Hugging Face

@chenqy4933
Collaborator

What was the input?

@guangyuanyu
Author

image

@chenqy4933
Collaborator

image
With the same input, it works fine in my test.

@guangyuanyu
Author

Awkward. How do I track down where the problem is? Are there any logs?
P.S. It compiles and runs fine on my Mac; it only fails on CentOS.

@chenqy4933
Collaborator

Build a debug version with CMake and take a look with gdb.

@guangyuanyu
Author

OK, I'll look into it. I'm actually not very familiar with C++ 🤣

@guangyuanyu
Author

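I found the problem but don't know how to fix it. It is in the std::map used for vocabulary loading: words that have already been loaded get lost. I'm not sure whether it's because my CentOS machine is too old and its standard library hasn't been upgraded.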
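
The thread never states the root cause or the eventual fix, so the following is only a minimal diagnostic sketch, not InferLLM's code. It assumes the vocabulary is loaded into a `std::map<std::string, int>` with `operator[]`, and it reports tokens that no longer map back to the id they were given, which is what happens when two entries end up with the same key. `VocabCheck` and `check_vocab` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical report type for this sketch; not part of InferLLM.
struct VocabCheck {
    std::size_t tokens_read = 0;    // entries seen in the input
    std::size_t tokens_in_map = 0;  // distinct keys left after loading
    std::vector<std::string> lost;  // tokens whose id no longer round-trips
};

// Load tokens into a token -> id map (operator[] overwrites on duplicate
// keys), then verify that every token still maps back to its own id.
inline VocabCheck check_vocab(const std::vector<std::string>& tokens) {
    std::map<std::string, int> token_to_id;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        token_to_id[tokens[i]] = static_cast<int>(i);
    }

    VocabCheck report;
    report.tokens_read = tokens.size();
    report.tokens_in_map = token_to_id.size();
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        auto it = token_to_id.find(tokens[i]);
        if (it == token_to_id.end() || it->second != static_cast<int>(i)) {
            report.lost.push_back(tokens[i]);
        }
    }
    return report;
}
```

If `tokens_in_map` is smaller than `tokens_read`, some entries collided during loading. Comparing the result of the same check on the Mac build and the CentOS build could help narrow down whether the difference lies in how the token strings are read rather than in `std::map` itself.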

@guangyuanyu
Author

Problem solved, closing.

@Mignet

Mignet commented Nov 26, 2023

Problem solved, closing.

How did you fix it? I get an error like this on CentOS 7:
failed to tokenize string!

[root@VM-0-15-centos build]# ./llama -m chinese-alpaca-7b-q4.bin -t 2
main: seed = 1700961097
model is new , version = 1
load: n_vocab = 49954
load: n_ctx = 2048
load: n_embd = 4096
load: n_mult = 256
load: n_head = 32
load: n_layer = 32
load: n_rot = 128
load: model ftype = 2
total weight length = 4304332800
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

== Running in chat mode. ==

  • Press Ctrl+C to interject at any time.
  • If you want to submit another line, end your input in ''.
failed to tokenize string!
Killed

@Mignet

Mignet commented Nov 26, 2023

failed to tokenize string!
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000000000040dc49 in inferllm::ModelImp::tokenize(std::string const&, bool) at /root/InferLLM/src/core/model_imp.cpp:83
        breakpoint already hit 1 time
(gdb) bt
#0 inferllm::ModelImp::tokenize (this=0x7191b0, text=" Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n", bos=true) at /root/InferLLM/src/core/model_imp.cpp:83
#1 0x000000000040d57c in inferllm::ModelImp::prefill (this=0x7191b0, promote=" Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n") at /root/InferLLM/src/core/model_imp.cpp:23
#2 0x0000000000409344 in inferllm::Model::prefill (this=0x719180, promote=" Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n") at /root/InferLLM/src/core/model.cpp:33
#3 0x0000000000407131 in main (argc=5, argv=0x7fffffffe268) at /root/InferLLM/application/llama/llama.cpp:216
(gdb) c
Continuing.

@cedar33

cedar33 commented Jan 26, 2024

failed to tokenize string

How was this finally resolved?
