
Support qwen1.5-*-AWQ model inference in turbomind #1430

Merged: 2 commits into InternLM:main on Apr 15, 2024

Conversation

@lvhan028 (Collaborator) commented on Apr 12, 2024

Tested the following case; a Python-API equivalent is sketched after it:

  • lmdeploy chat turbomind Qwen/Qwen1.5-7B-Chat-AWQ --model-format awq
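
For reference, the same smoke test can be driven from Python instead of the CLI. A minimal sketch using lmdeploy's pipeline API; the prompt is illustrative:

```python
# Minimal sketch: the CLI smoke test above, expressed via lmdeploy's
# Python pipeline API. The prompt is illustrative.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline('Qwen/Qwen1.5-7B-Chat-AWQ',
                backend_config=TurbomindEngineConfig(model_format='awq'))
print(pipe(['hello']))
```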

@irexyc (Collaborator) left a comment

Tested:
lmdeploy chat turbomind Qwen/Qwen1.5-1.8B-Chat-AWQ --model-format awq

@AllentDan (Collaborator) left a comment

Tested lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq

@lvhan028 merged commit cf8938b into InternLM:main on Apr 15, 2024
5 checks passed
@726663676 commented on Apr 15, 2024

@irexyc @AllentDan @lvhan028 Hi, does AWQ-quantized Qwen1.5 inference currently support tensor parallelism? I built the source with qwen-awq support; launching lmdeploy on multiple GPUs errors out, while a single GPU runs fine.
Single-GPU test passes: lmdeploy chat turbomind Qwen/Qwen1.5-1.8B-Chat-AWQ --model-format awq

Multi-GPU tests:
1. Launching qwen1.5-1.8b-awq:
lmdeploy chat turbomind /root/.cache/modelscope/hub/qwen/Qwen1___5-1___8B-Chat-AWQ --model-format awq --tp 2
Error output:
lmdeploy/turbomind/deploy/target_model/base.py", line 240, in save_split
assert tensor.shape[split_dim] % tp == 0
AssertionError
2. Launching qwen1.5-32b-awq:
lmdeploy chat turbomind /root/.cache/modelscope/hub/qwen/Qwen1___5-32B-Chat-AWQ --model-format awq --tp 2
Error output:
lmdeploy/turbomind/deploy/target_model/base.py", line 223, in export_weight
tm_tensor.copy_from(torch_tensor)
RuntimeError: [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/python/bind.cpp:294
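
The assertion in case 1 guards turbomind's tensor-parallel weight split: a weight can only be sharded across tp ranks if its split dimension divides evenly. A simplified sketch of that check (not the actual base.py implementation):

```python
import torch

def save_split(tensor: torch.Tensor, split_dim: int, tp: int):
    # turbomind shards each weight across tp ranks along split_dim;
    # an uneven split would leave ranks with differently shaped chunks.
    assert tensor.shape[split_dim] % tp == 0, (
        f'dim {split_dim} of shape {tuple(tensor.shape)} '
        f'is not divisible by tp={tp}')
    return torch.chunk(tensor, tp, dim=split_dim)

# An AWQ tensor whose split dimension is odd fails for tp=2:
# save_split(torch.empty(4096, 43), split_dim=1, tp=2)  # AssertionError
```

Case 2's copy_from failure may be related (it is a runtime assertion on the C++ side), though the exact check at bind.cpp:294 is not shown here.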

@AllentDan (Collaborator)

@726663676 I tried lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq --tp 2 and it ran without problems.

@726663676

@AllentDan Hi, I just tried the command above, lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq --tp 2, and it also runs fine. But the Qwen1.5 1.8B and 32B AWQ models still fail as before.

The qwen1.5-4B-awq test below works; the other model sizes still raise the exceptions above:
CUDA_VISIBLE_DEVICES=0,4 lmdeploy chat turbomind /root/.cache/modelscope/hub/qwen/Qwen1___5-4B-Chat-AWQ --model-format awq --tp 2
...
2024-04-15 03:30:26,884 - lmdeploy - WARNING - get 883 model params
2024-04-15 03:30:27,886 - lmdeploy - WARNING - Input chat template with model_name is None. Forcing to use qwen
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1

double enter to end input >>> hello

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
2024-04-15 03:30:37,819 - lmdeploy - WARNING - kwargs ignore_eos is deprecated for inference, use GenerationConfig instead.
2024-04-15 03:30:37,819 - lmdeploy - WARNING - kwargs random_seed is deprecated for inference, use GenerationConfig instead.
Hello! How can I help you today? If you have any questions or need assistance, please feel free to ask.
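
If these failures do stem from the divisibility constraint above, a checkpoint can be screened before launch by checking the shape-related fields of its config.json against the intended tp. A hypothetical pre-check; which dimensions turbomind actually shards, and the AWQ group size of 128, are assumptions here:

```python
import json

def precheck_tp(config_path: str, tp: int, group_size: int = 128):
    # Hypothetical screen: verify that dimensions turbomind plausibly
    # shards (attention heads, FFN width, AWQ scale groups) divide evenly
    # by tp. Field names follow the Hugging Face Qwen1.5 config;
    # group_size=128 is the common AWQ default, not read from the model.
    with open(config_path) as f:
        cfg = json.load(f)
    checks = {
        'num_attention_heads': cfg['num_attention_heads'],
        'num_key_value_heads': cfg.get('num_key_value_heads',
                                       cfg['num_attention_heads']),
        'intermediate_size': cfg['intermediate_size'],
        'intermediate_size // group_size':
            cfg['intermediate_size'] // group_size,
    }
    for name, value in checks.items():
        status = 'ok' if value % tp == 0 else 'NOT divisible'
        print(f'{name} = {value}: {status} by tp={tp}')

# e.g. precheck_tp('/root/.cache/modelscope/hub/qwen/'
#                  'Qwen1___5-1___8B-Chat-AWQ/config.json', tp=2)
```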

@lvhan028 (Collaborator, Author)

@726663676 I'll download the AWQ models of the other sizes and take a look.
Sorry, I had only verified the 7B-AWQ model before.

Labels: enhancement (New feature or request)