Update rewritings for qwen #1351

Merged: 2 commits into InternLM:main on Mar 28, 2024
Conversation

RunningLeon (Collaborator):

Motivation

Update the rewritings to match the latest modeling_qwen.py on Hugging Face.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

None

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Review thread on the rewritten model code (diff context):

    layer_outputs = decoder_layer(
    context = self.context.context
    max_kv_seq_length = context.max_kv_seq_length
    # do not support use_dynamic_ntk
Collaborator:
why?

RunningLeon (Collaborator, Author):
For these reasons, we could postpone support for this feature for now and see if users really need it:

  1. dynamic_ntk only works in the prefilling phase, where input_len is greater than seq_length in config.json (8192 for Qwen-7B).
  2. dynamic_ntk would have to assign a different ntk_alpha to each sequence in a batch, which is complicated to implement in the PyTorch engine (see the sketch after this list).
  3. Qwen2 has removed dynamic_ntk, which suggests it is not that important.
  4. Evaluation results with OpenCompass suggest that it does not change the results much:
[screenshot: OpenCompass evaluation results]
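
For context on points 1 and 2, here is a minimal sketch of how Qwen-style dynamic NTK scales the rotary base with sequence length. The alpha formula mirrors the one in Hugging Face's modeling_qwen.py; the function names, the train_seq_length default, and the batch loop are illustrative assumptions, not lmdeploy code.

    import math

    def qwen_ntk_alpha(seq_len: int, train_seq_length: int = 8192) -> float:
        # NTK scaling factor as in Qwen's modeling_qwen.py: stays at 1.0
        # until seq_len exceeds train_seq_length, then grows in powers of two.
        context_value = math.log(seq_len / train_seq_length, 2) + 1
        ntk_alpha = 2 ** math.ceil(context_value) - 1
        return max(ntk_alpha, 1.0)

    def scaled_rope_base(base: float, ntk_alpha: float, rotary_dim: int) -> float:
        # Stretch the RoPE base by the NTK factor (same exponent as Qwen uses).
        return base * ntk_alpha ** (rotary_dim / (rotary_dim - 2))

    # Point 2 in a nutshell: sequences of different lengths in one batch
    # need different alphas, so the rotary cos/sin tables cannot be shared.
    # (Hypothetical batch, for illustration only.)
    for seq_len in (4096, 8192, 16384, 32768):
        alpha = qwen_ntk_alpha(seq_len)
        print(seq_len, alpha, round(scaled_rope_base(10000.0, alpha, rotary_dim=128), 1))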

Contributor:
But seq_length for Qwen-14B is 2048: https://huggingface.co/Qwen/Qwen-14B/blob/main/config.json

RunningLeon (Collaborator, Author):
Yes, you are right. How do you use lmdeploy with Qwen-14B on the PyTorch engine? Is dynamic NTK useful to you? Why not use Qwen2?
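
To make the Qwen-14B point concrete: with a training seq_length of 2048, dynamic NTK would already activate for fairly modest prompts. Reusing the sketch above (the input length of 6000 is a made-up example):

    # Qwen-14B trains with seq_length = 2048, so NTK kicks in much earlier.
    print(qwen_ntk_alpha(6000, train_seq_length=2048))  # -> 7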

@lvhan028 merged commit 69207f0 into InternLM:main on Mar 28, 2024
5 checks passed