Support qwen1.5 in turbomind engine #1406
Conversation
Hi @lvhan028 Does this PR support Qwen1.5-0.5B and Qwen1.5-1.8B? Thanks.

0.5B no, since its head_dim is 64, while the TurboMind engine hardcodes head_dim to 128.

Thanks for your reply. Maybe we could also update the supported-models table. Thanks.

sure. updated.

OpenCompass results look OK
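The rejection above comes down to a simple ratio: head_dim = hidden_size / num_attention_heads, and the hardcoded kernel path only accepts 128. A minimal sketch of that check; the hidden_size and head-count values below are assumed from the models' `config.json` files and are illustrative, not part of this PR:

```python
# Why Qwen1.5-0.5B is rejected while 1.8B is accepted (sketch).
def head_dim(hidden_size: int, num_attention_heads: int) -> int:
    """Per-head dimension of a multi-head attention layer."""
    return hidden_size // num_attention_heads

# Assumed config.json values:
# Qwen1.5-0.5B: hidden_size=1024, 16 heads -> head_dim 64 (unsupported)
# Qwen1.5-1.8B: hidden_size=2048, 16 heads -> head_dim 128 (supported)
print(head_dim(1024, 16))  # 64
print(head_dim(2048, 16))  # 128
```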
LGTM
Have you tried `lmdeploy convert qwen <qwen1.5-model-path>`?

not yet. I'll test it asap

What's the command?
@lvhan028 Does this support int8 KV cache for qwen1.5-14b-chat?
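For context on what int8 KV cache buys you: the K/V activations are stored as int8 plus a scale instead of fp16, roughly halving cache memory at a small accuracy cost. A toy per-tensor sketch of the idea (not lmdeploy's actual implementation):

```python
# Illustrative int8 quantize/dequantize round-trip for a KV tensor.
def quantize_int8(values):
    # Per-tensor symmetric scale so the largest magnitude maps to 127.
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

kv = [0.5, -1.25, 3.0, -0.01]
q, scale = quantize_int8(kv)
restored = dequantize(q, scale)
# Round-trip error is bounded by the quantization step (the scale).
print(all(abs(a - b) <= scale for a, b in zip(kv, restored)))  # True
```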
hi, @xiaoxiaoyuwen |
Note: window attention is not supported.
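That note suggests a pre-flight compatibility check before deploying a Qwen1.5 model on TurboMind. A hypothetical helper; the `use_sliding_window` field name follows the Hugging Face Qwen2 config convention and is an assumption here, not something defined in this PR:

```python
# Hypothetical pre-flight check based on the constraints in this thread:
# window (sliding-window) attention is not supported by the engine.
def turbomind_compatible(config: dict) -> bool:
    """Return True if the model config avoids sliding-window attention."""
    return not config.get("use_sliding_window", False)

print(turbomind_compatible({"use_sliding_window": False}))  # True
print(turbomind_compatible({"use_sliding_window": True}))   # False
```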