[Bug] InternLM2 cannot be converted and quantized with llama.cpp #612
The architecture of InternLM2 is different from InternLM's: the former adopts GQA and has no attention bias. Our project LMDeploy already supports InternLM2, including 200K context length inference and 4-bit inference. Offline inference is pretty simple; you may give it a try:

```python
import lmdeploy

pipe = lmdeploy.pipeline("internlm/internlm-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
```
Thanks for the explanation. Will look into fixing this in llama.cpp.
@gaord I fixed the first convert error by simply skipping the non-existent lazy tensors in convert.py.
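A sketch of what that skip could look like, against the list comprehension at convert.py line 736 that the issue's traceback quotes; this is a guess at the described workaround, not the actual patch:

```diff
-        lazy_tensors: list[LazyTensor] = [model[name] for model in models]
+        lazy_tensors: list[LazyTensor] = [model[name] for model in models if name in model]
```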
After the above two fixes, I suppose I hit the unsupported-architecture issue.
Use tools/convert2llama.py to convert your keys into Llama format.
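For reference, an invocation might look like the following; the destination path here is an assumption, so check the script's --help in the InternLM repo for the exact arguments:

```
python tools/convert2llama.py ./internlm2-chat-20b ./internlm2-chat-20b-llama
```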
As of today, InternLM has released a tool that can convert the model to the Llama architecture. However, convert.py still fails for me:

```
python convert.py ./build/models/internlm/target
Loading model file build/models/internlm/target/pytorch_model-00001-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00001-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00002-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00003-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00004-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00005-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00006-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00007-of-00008.bin
Loading model file build/models/internlm/target/pytorch_model-00008-of-00008.bin
Traceback (most recent call last):
File "/Users/xiaobai/dev/llama.cpp/convert.py", line 1295, in <module>
main()
File "/Users/xiaobai/dev/llama.cpp/convert.py", line 1234, in main
params = Params.load(model_plus)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/xiaobai/dev/llama.cpp/convert.py", line 318, in load
params = Params.loadHFTransformerJson(model_plus.model, hf_config_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/xiaobai/dev/llama.cpp/convert.py", line 230, in loadHFTransformerJson
raise NotImplementedError(f'Unknown rope scaling type: {typ}')
NotImplementedError: Unknown rope scaling type: dynamic
```
Discussion is here on the llama.cpp side.
You can try manually converting the configuration, i.e. setting the rope_scaling parameter in config.json to null. After that conversion you will hit another problem: llama.cpp does not support the \u0000 token, which causes the assert at https://github.com/ggerganov/llama.cpp/blob/77bc1bbd05f0c31cb45773eb5eb59b9ff2b07e1b/llama.cpp#L3005 to fail. There is a temporary workaround for that, too.
To completely solve this issue, the code in llama.cpp will need to be updated; we will try to open a PR later to fix it properly.
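A minimal sketch of the config edit described above; the checkpoint path is an assumption, and it is worth backing up config.json before modifying it:

```python
import json
from pathlib import Path

# Assumed location of the HF checkpoint; adjust as needed.
cfg_path = Path("../internlm2-chat-20b/config.json")

cfg = json.loads(cfg_path.read_text())
# Remove the dynamic rope scaling that convert.py rejects with
# "Unknown rope scaling type: dynamic".
cfg["rope_scaling"] = None  # serialized as null in JSON

cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
```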
As the workaround hinted at above, I just uploaded the converted GGUF file (internlm2-chat-20b-no-ropescaling-q5_0.gguf) for a fresh try:
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 7 days if the stale label is not removed or if there is no further response.
Resolved in #627 after converting the model to Llama format.
Describe the bug
Converting the model with the latest llama.cpp code (b1874) fails with an error:

```
python3 convert.py ../internlm2-chat-20b --outtype f16
/Users/pom/AIGC/llama.cpp-3/gguf-py
Loading model file ../internlm2-chat-20b/pytorch_model-00001-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00001-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00002-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00003-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00004-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00005-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00006-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00007-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00008-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00009-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00010-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00011-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00012-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00013-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00014-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00015-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00016-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00017-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00018-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00019-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00020-of-00021.bin
Loading model file ../internlm2-chat-20b/pytorch_model-00021-of-00021.bin
Traceback (most recent call last):
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 1658, in
main(sys.argv[1:]) # Exclude the first element (script name) from sys.argv
^^^^^^^^^^^^^^^^^^
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 1577, in main
model_plus = load_some_model(args.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 1354, in load_some_model
model_plus = merge_multifile_models(models_plus)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 782, in merge_multifile_models
model = merge_sharded([mp.model for mp in models_plus])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 761, in merge_sharded
return {name: convert(name) for name in names}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 761, in
return {name: convert(name) for name in names}
^^^^^^^^^^^^^
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 736, in convert
lazy_tensors: list[LazyTensor] = [model[name] for model in models]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pom/AIGC/llama.cpp-3/convert.py", line 736, in
lazy_tensors: list[LazyTensor] = [model[name] for model in models]
~~~~~^^^^^^
KeyError: 'model.tok_embeddings.weight'
```
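For context on the KeyError: InternLM2 checkpoints use their own tensor layout and names (e.g. model.tok_embeddings.weight instead of Llama's model.embed_tokens.weight), which llama.cpp's convert.py does not understand; converting to Llama format first, as suggested above, avoids this. A rough sketch of the kind of renaming tools/convert2llama.py performs; the mapping entries here are illustrative assumptions, not the script's actual table:

```python
# Illustrative InternLM2 -> Llama tensor-name mapping (assumed, partial).
# The real conversion also has to split the fused GQA wqkv projection,
# which a simple rename cannot express.
RENAMES = {
    "model.tok_embeddings.weight": "model.embed_tokens.weight",  # embeddings
    "output.weight": "lm_head.weight",  # output head
}

def to_llama_name(name: str) -> str:
    """Return the (assumed) Llama-format name for an InternLM2 tensor."""
    return RENAMES.get(name, name)

print(to_llama_name("model.tok_embeddings.weight"))  # model.embed_tokens.weight
```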
Environment
Mac m2 ultra
pytorch-lightning 2.1.0
torch 2.1.2
torchaudio 2.1.0
torchmetrics 1.2.0
torchvision 0.16.0
Other information
No response