Support qwen1.5-*-AWQ model inference in turbomind #1430
Conversation
Tested lmdeploy chat turbomind Qwen/Qwen1.5-1.8B-Chat-AWQ --model-format awq
Tested lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq
@irexyc @AllentDan @lvhan028 Hello, does AWQ-quantized qwen1.5 currently support parallel inference? I built from source with qwen-awq support; lmdeploy throws an error on startup in the multi-GPU case, while a single GPU runs fine. Multi-GPU test:
@726663676 I gave it a try.
@AllentDan Hello, I just tried the command above, lmdeploy chat turbomind Qwen/Qwen1.5-4B-Chat-AWQ --model-format awq --tp 2, and it also runs normally. However, the qwen1.5 1.8B and 32B AWQ models still fail. The qwen1.5-4B-awq test below works; the other model sizes still hit the exception above: double enter to end input >>> hello <|im_start|>system
@726663676 I will also download the AWQ models of the other sizes and take a look.
test the following cases
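The cases above could be driven by a small sweep script, a sketch along these lines. Only the 1.8B and 4B model names and the `--tp 2` flag appear verbatim earlier in this thread; the 32B checkpoint name and the idea of looping over tp values are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical sweep over the Qwen1.5 AWQ checkpoints discussed in this thread.
# Requires lmdeploy installed and enough GPUs for the chosen --tp values.
for model in Qwen/Qwen1.5-1.8B-Chat-AWQ \
             Qwen/Qwen1.5-4B-Chat-AWQ \
             Qwen/Qwen1.5-32B-Chat-AWQ; do   # 32B name is assumed, not confirmed above
  for tp in 1 2; do
    echo "=== $model (tp=$tp) ==="
    lmdeploy chat turbomind "$model" --model-format awq --tp "$tp"
  done
done
```

This just makes the single-GPU vs. multi-GPU comparison reported in the comments repeatable across model sizes.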