
Support qwen1.5-*-AWQ model inference in turbomind #1430

Merged: 2 commits into InternLM:main on Apr 15, 2024

Conversation

@lvhan028 (Collaborator) commented on Apr 12, 2024

Tested the following case; a Python-API equivalent is sketched after it:

  • lmdeploy chat turbomind Qwen/Qwen1.5-7B-Chat-AWQ --model-format awq
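
For reference, the same smoke test can be driven from Python instead of the CLI. A minimal sketch using lmdeploy's pipeline API; the prompt is illustrative:

```python
# Minimal sketch: the CLI smoke test above, expressed via lmdeploy's
# Python pipeline API. The prompt is illustrative.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline('Qwen/Qwen1.5-7B-Chat-AWQ',
                backend_config=TurbomindEngineConfig(model_format='awq'))
print(pipe(['hello']))
```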

@irexyc (Collaborator) left a comment

Tested:
lmdeploy chat turbomind Qwen/Qwen1.5-1.8B-Chat-AWQ --model-format awq

@AllentDan (Collaborator) left a comment

Tested lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq

@lvhan028 merged commit cf8938b into InternLM:main on Apr 15, 2024
5 checks passed
@726663676 commented on Apr 15, 2024

@irexyc @AllentDan @lvhan028 Hi, does AWQ-quantized Qwen1.5 inference currently support tensor parallelism? I built the source with qwen-awq support; launching lmdeploy on multiple GPUs errors out, while a single GPU runs fine.
Single-GPU test passes: lmdeploy chat turbomind Qwen/Qwen1.5-1.8B-Chat-AWQ --model-format awq

Multi-GPU tests:
1. Launching qwen1.5-1.8b-awq:
lmdeploy chat turbomind /root/.cache/modelscope/hub/qwen/Qwen1___5-1___8B-Chat-AWQ --model-format awq --tp 2
Error output:
lmdeploy/turbomind/deploy/target_model/base.py", line 240, in save_split
assert tensor.shape[split_dim] % tp == 0
AssertionError
2. Launching qwen1.5-32b-awq:
lmdeploy chat turbomind /root/.cache/modelscope/hub/qwen/Qwen1___5-32B-Chat-AWQ --model-format awq --tp 2
Error output:
lmdeploy/turbomind/deploy/target_model/base.py", line 223, in export_weight
tm_tensor.copy_from(torch_tensor)
RuntimeError: [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/python/bind.cpp:294
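
The assertion in case 1 guards turbomind's tensor-parallel weight split: a weight can only be sharded across tp ranks if its split dimension divides evenly. A simplified sketch of that check (not the actual base.py implementation):

```python
import torch

def save_split(tensor: torch.Tensor, split_dim: int, tp: int):
    # turbomind shards each weight across tp ranks along split_dim;
    # an uneven split would leave ranks with differently shaped chunks.
    assert tensor.shape[split_dim] % tp == 0, (
        f'dim {split_dim} of shape {tuple(tensor.shape)} '
        f'is not divisible by tp={tp}')
    return torch.chunk(tensor, tp, dim=split_dim)

# An AWQ tensor whose split dimension is odd fails for tp=2:
# save_split(torch.empty(4096, 43), split_dim=1, tp=2)  # AssertionError
```

Case 2's copy_from failure may be related (it is a runtime assertion on the C++ side), though the exact check at bind.cpp:294 is not shown here.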

@AllentDan (Collaborator)

@726663676 I tried lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq --tp 2 and it ran without problems.

@726663676

@AllentDan Hi, I just tried the command above, lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq --tp 2, and it also runs fine. But the Qwen1.5 1.8B and 32B AWQ models still fail as before.

The qwen1.5-4B-awq test below works; the other model sizes still raise the exceptions above:
CUDA_VISIBLE_DEVICES=0,4 lmdeploy chat turbomind /root/.cache/modelscope/hub/qwen/Qwen1___5-4B-Chat-AWQ --model-format awq --tp 2
...
2024-04-15 03:30:26,884 - lmdeploy - WARNING - get 883 model params
2024-04-15 03:30:27,886 - lmdeploy - WARNING - Input chat template with model_name is None. Forcing to use qwen
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1

double enter to end input >>> hello

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
2024-04-15 03:30:37,819 - lmdeploy - WARNING - kwargs ignore_eos is deprecated for inference, use GenerationConfig instead.
2024-04-15 03:30:37,819 - lmdeploy - WARNING - kwargs random_seed is deprecated for inference, use GenerationConfig instead.
Hello! How can I help you today? If you have any questions or need assistance, please feel free to ask.
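
If these failures do stem from the divisibility constraint above, a checkpoint can be screened before launch by checking the shape-related fields of its config.json against the intended tp. A hypothetical pre-check; which dimensions turbomind actually shards, and the AWQ group size of 128, are assumptions here:

```python
import json

def precheck_tp(config_path: str, tp: int, group_size: int = 128):
    # Hypothetical screen: verify that dimensions turbomind plausibly
    # shards (attention heads, FFN width, AWQ scale groups) divide evenly
    # by tp. Field names follow the Hugging Face Qwen1.5 config;
    # group_size=128 is the common AWQ default, not read from the model.
    with open(config_path) as f:
        cfg = json.load(f)
    checks = {
        'num_attention_heads': cfg['num_attention_heads'],
        'num_key_value_heads': cfg.get('num_key_value_heads',
                                       cfg['num_attention_heads']),
        'intermediate_size': cfg['intermediate_size'],
        'intermediate_size // group_size':
            cfg['intermediate_size'] // group_size,
    }
    for name, value in checks.items():
        status = 'ok' if value % tp == 0 else 'NOT divisible'
        print(f'{name} = {value}: {status} by tp={tp}')

# e.g. precheck_tp('/root/.cache/modelscope/hub/qwen/'
#                  'Qwen1___5-1___8B-Chat-AWQ/config.json', tp=2)
```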

@lvhan028 (Collaborator, Author)

@726663676 I'll download the AWQ models of the other sizes and take a look.
Sorry, I had only verified the 7B-AWQ model before.

Labels: enhancement (New feature or request)