Update rewritings for qwen #1351
layer_outputs = decoder_layer(
context = self.context.context
max_kv_seq_length = context.max_kv_seq_length
# do not support use_dynamic_ntk
why?
For these reasons, we could postpone support for this feature for now and see whether users really need it:
- dynamic_ntk only takes effect in the prefill phase, when input_len is greater than seq_length in config.json, which is 8192 for Qwen-7B.
- dynamic_ntk should assign different ntk_alphas to different sequences in a batch, which is complicated to implement in the PyTorch engine.
- Qwen2 has removed dynamic_ntk, which suggests it is not that important.
- Evaluation results with OpenCompass suggest that it won't change the results much.
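To make the first two points concrete, here is a minimal sketch of how a per-sequence NTK alpha could be derived. The formula is modeled on my understanding of the Hugging Face modeling_qwen.py and should be treated as an assumption, not as the reference implementation; the names `ntk_alpha` and `batch_ntk_alphas` are hypothetical.

```python
import math


def ntk_alpha(true_seq_len: int, seq_length: int) -> int:
    """Assumed dynamic-NTK scaling factor for one sequence.

    seq_length is the value from config.json (8192 for Qwen-7B).
    The result stays at 1 until true_seq_len exceeds seq_length.
    """
    context_value = math.log2(true_seq_len / seq_length) + 1
    alpha = 2 ** math.ceil(context_value) - 1
    return max(alpha, 1)


def batch_ntk_alphas(seq_lens, seq_length):
    """Each sequence in a batch gets its own alpha, hence its own rotary base."""
    return [ntk_alpha(n, seq_length) for n in seq_lens]


# Inputs no longer than seq_length are unaffected.
print(ntk_alpha(4096, 8192))   # 1
print(ntk_alpha(8192, 8192))   # 1
# A mixed batch needs a different alpha per sequence.
print(batch_ntk_alphas([1000, 9000, 20000], 8192))  # [1, 3, 7]
```

Under this assumption, a batch that mixes short and long prompts would need separate rotary embeddings per sequence, which is where the implementation cost in a batched engine comes from.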
But seq_length for Qwen-14B is 2048
https://huggingface.co/Qwen/Qwen-14B/blob/main/config.json
> But seq_length for Qwen-14B is 2048 https://huggingface.co/Qwen/Qwen-14B/blob/main/config.json
Yes, you are right. How do you use lmdeploy with Qwen-14B on the PyTorch engine? Is dynamic NTK useful for you? Why not use Qwen2?
Motivation
Update the rewritings to match the latest modeling_qwen.py on Hugging Face.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
None
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist