
Support qwen1.5 in turbomind engine #1406

Merged: 12 commits merged into InternLM:main on Apr 9, 2024

Conversation

@lvhan028 (Collaborator) commented Apr 7, 2024

Note: window attention is not supported.
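(For context, a minimal sketch of loading a Qwen1.5 chat model on the turbomind backend through lmdeploy's pipeline API; the model path and config values below are illustrative, not taken from this PR.)

from lmdeploy import pipeline, TurbomindEngineConfig

# Illustrative model path; tp/session_len are placeholder values, not from this PR.
backend_config = TurbomindEngineConfig(tp=1, session_len=8192)
pipe = pipeline('Qwen/Qwen1.5-7B-Chat', backend_config=backend_config)
print(pipe(['Hello, who are you?']))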

@zhyncs (Contributor) commented Apr 8, 2024

Hi @lvhan028, does this PR support Qwen1.5-0.5B and Qwen1.5-1.8B? Thanks.

@lvhan028 (Collaborator, Author) commented Apr 8, 2024

0.5B: no, since its head_dim is 64 while the turbomind engine hardcodes head_dim to 128.
1.8B: yes.
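(A quick way to check which Qwen1.5 variants fit this constraint is to read head_dim off the HuggingFace config; the helper below is a sketch for illustration, not part of this PR.)

from transformers import AutoConfig

def head_dim(model_id: str) -> int:
    # For Qwen2-style configs, head_dim = hidden_size / num_attention_heads.
    cfg = AutoConfig.from_pretrained(model_id)
    return cfg.hidden_size // cfg.num_attention_heads

# At this point the turbomind engine only handles head_dim == 128.
for m in ('Qwen/Qwen1.5-0.5B-Chat', 'Qwen/Qwen1.5-1.8B-Chat'):
    d = head_dim(m)
    print(m, d, 'ok for turbomind' if d == 128 else 'not supported')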

@zhyncs (Contributor) commented Apr 8, 2024

0.5B: no, since its head_dim is 64 while the turbomind engine hardcodes head_dim to 128. 1.8B: yes.

Thanks for your reply. Maybe we could also update the supported-models table. Thanks.

@lvhan028 (Collaborator, Author) commented Apr 8, 2024

0.5B: no, since its head_dim is 64 while the turbomind engine hardcodes head_dim to 128. 1.8B: yes.

Thanks for your reply. Maybe we could also update the supported-models table. Thanks.

sure. updated.

@lvhan028 added the "enhancement" (New feature or request) label on Apr 8, 2024
@RunningLeon (Collaborator) commented

OpenCompass results look OK:

Model               mmlu   gsm8k
qwen1.5-7b-chat-HF  61.48  55.65
qwen1.5-7b-chat-TB  61.47  54.74

@RunningLeon (Collaborator) left a comment

LGTM

@irexyc (Collaborator) commented Apr 8, 2024

Have you tried the lmdeploy convert command to convert the model?

@lvhan028 (Collaborator, Author) commented Apr 8, 2024

Have you tried the lmdeploy convert command to convert the model?

Not yet. I'll test it ASAP.

@lvhan028 (Collaborator, Author) commented Apr 8, 2024

lmdeploy convert qwen <qwen1.5-model-path>

@irexyc (Collaborator) commented Apr 9, 2024

Traceback (most recent call last):
  File "/home/chenxin/miniconda3/envs/38/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
  File "/home/chenxin/ws3/vl/lmdeploy/cli/entrypoint.py", line 26, in run
    args.run(args)
  File "/home/chenxin/ws3/vl/lmdeploy/cli/cli.py", line 151, in convert
    main(**kwargs)
  File "/home/chenxin/ws3/vl/lmdeploy/turbomind/deploy/converter.py", line 265, in main
    tokenizer_path = get_tokenizer_path(model_path, tokenizer_path)
  File "/home/chenxin/ws3/vl/lmdeploy/turbomind/deploy/converter.py", line 45, in get_tokenizer_path
    assert tokenizer_path, 'please supply tokenizer path by --tokenizer-path'
AssertionError: please supply tokenizer path by --tokenizer-path

@lvhan028 (Collaborator, Author) commented Apr 9, 2024

Traceback (most recent call last):
  File "/home/chenxin/miniconda3/envs/38/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
  File "/home/chenxin/ws3/vl/lmdeploy/cli/entrypoint.py", line 26, in run
    args.run(args)
  File "/home/chenxin/ws3/vl/lmdeploy/cli/cli.py", line 151, in convert
    main(**kwargs)
  File "/home/chenxin/ws3/vl/lmdeploy/turbomind/deploy/converter.py", line 265, in main
    tokenizer_path = get_tokenizer_path(model_path, tokenizer_path)
  File "/home/chenxin/ws3/vl/lmdeploy/turbomind/deploy/converter.py", line 45, in get_tokenizer_path
    assert tokenizer_path, 'please supply tokenizer path by --tokenizer-path'
AssertionError: please supply tokenizer path by --tokenizer-path

What's the command?

@irexyc (Collaborator) commented Apr 9, 2024

What's the command?

lmdeploy convert qwen /mnt/140/Qwen/Qwen1.5-7B-Chat/
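(For readers who hit the same assertion: the error message itself points at a possible workaround, passing the tokenizer location explicitly, e.g. lmdeploy convert qwen /mnt/140/Qwen/Qwen1.5-7B-Chat/ --tokenizer-path <path-to-tokenizer-file>; whether that is the intended fix or the converter needed a code change is not settled in this exchange.)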

@lvhan028 lvhan028 merged commit edca3d3 into InternLM:main Apr 9, 2024
3 of 5 checks passed
@xiaoxiaoyuwen commented Apr 10, 2024

@lvhan028 does this support qwen1.5-14b-chat with int8 kv cache?
Running calibration raises an error:

RuntimeError: Currently, quantification and calibration of Qwen2ForCausalLM are not supported.

@lvhan028 (Collaborator, Author) commented

Hi @xiaoxiaoyuwen,
We are going to remove offline kv int8 and adopt online kv int8 in PR #1377.
PR #1412 provides the guide.
This feature will be released in the next version. Stay tuned.
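(Once the online kv int8 path lands, enabling it should roughly look like the sketch below; the quant_policy value and its availability are assumptions based on the referenced PRs, not something confirmed in this thread.)

from lmdeploy import pipeline, TurbomindEngineConfig

# Assumption: quant_policy=8 selects the online int8 kv cache once #1377 lands;
# verify against the guide in #1412 before relying on it.
engine_config = TurbomindEngineConfig(quant_policy=8)
pipe = pipeline('Qwen/Qwen1.5-14B-Chat', backend_config=engine_config)
print(pipe(['hello']))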
