support qwen2 1.5b #1782

Merged: 9 commits merged into InternLM:main on Jun 17, 2024

Conversation

lvhan028 (Collaborator)

qwen2 1.5b sets tie_word_embeddings=True.
In this case, the output layer and the token embedding layer share the same weight.
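
A minimal PyTorch sketch of what weight tying means (illustrative only, not lmdeploy code): with tie_word_embeddings=True, the output projection reuses the embedding matrix, so the checkpoint stores that tensor only once.

```python
import torch.nn as nn

class TiedLMHead(nn.Module):
    """Toy illustration of tie_word_embeddings=True."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # Share a single weight tensor between the input embedding
        # and the output projection (the "tied" weight).
        self.lm_head.weight = self.embed_tokens.weight
```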

lvhan028 requested review from lzhangzz and irexyc on Jun 14, 2024 at 14:06.
Comment on lines 172 to 173:

      ret = self.Reader(new_params, unused_params,
    -                   i == self.nmgrs - 1, self.model_info())
    +                   i == self.nmgrs - 1, self.model_config)
Collaborator:

This will affect many models, such as internlm2 and internvl.

lvhan028 (Collaborator, Author) replied on Jun 14, 2024:


Yes, I understand. However, I believe it is necessary to ensure that the original model configuration is accessible to all source models. Otherwise, the model_info() function should be capable of handling all edge cases.
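
A hypothetical sketch of the point being made here (the class and field names below are illustrative, not lmdeploy's actual converter code): if each source-model reader receives the raw Hugging Face config, it can consult fields such as tie_word_embeddings directly, instead of relying on a pre-digested model_info() dict that would have to anticipate every edge case.

```python
# Hypothetical reader for illustration; lmdeploy's real Reader classes differ.
class Qwen2Reader:
    def __init__(self, new_params, unused_params, last, model_config: dict):
        # Keep the raw HF config so model-specific fields remain accessible.
        self.model_config = model_config

    def output_weight(self, params: dict):
        # With tied embeddings there is no separate lm_head tensor in the
        # checkpoint; reuse the token embedding weight instead.
        if self.model_config.get('tie_word_embeddings', False):
            return params['model.embed_tokens.weight']
        return params['lm_head.weight']
```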

lvhan028 (Collaborator, Author):

TODO: fully test all supported models.

zhyncs (Collaborator) commented on Jun 15, 2024:

ref https://qwenlm.github.io/blog/qwen2/#model-information

| Models         | Qwen2-0.5B | Qwen2-1.5B | Qwen2-7B | Qwen2-57B-A14B | Qwen2-72B |
|----------------|------------|------------|----------|----------------|-----------|
| Params         | 0.49B      | 1.54B      | 7.07B    | 57.41B         | 72.71B    |
| Non-Emb Params | 0.35B      | 1.31B      | 5.98B    | 56.32B         | 70.21B    |
| GQA            | True       | True       | True     | True           | True      |
| Tie Embedding  | True       | True       | False    | False          | False     |
| Context Length | 32K        | 32K        | 128K     | 64K            | 128K      |

For small models, we prefer the application of tying embedding as the large sparse embeddings take up a large proportion of the total model parameters.
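
A rough back-of-the-envelope check of that claim (assuming Qwen2's vocabulary size of 151,936 and hidden sizes of 896 and 1536 for the 0.5B and 1.5B variants) shows the embedding table alone accounts for the gap between Params and Non-Emb Params in the table above:

```python
# Estimate embedding parameter counts for the small Qwen2 models.
# Vocab and hidden sizes are assumptions taken from the models' config.json.
vocab_size = 151936

for name, hidden_size in [("Qwen2-0.5B", 896), ("Qwen2-1.5B", 1536)]:
    emb_params = vocab_size * hidden_size
    print(f"{name}: {emb_params / 1e9:.2f}B embedding parameters")

# Qwen2-0.5B: 0.14B  (matches 0.49B - 0.35B in the table)
# Qwen2-1.5B: 0.23B  (matches 1.54B - 1.31B in the table)
```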

dawnranger:

Any plan to support qwen2-0.5b? @lvhan028

lvhan028 merged commit 9dcae9b into InternLM:main on Jun 17, 2024 (5 checks passed).
lvhan028 (Collaborator, Author):

> Any plan to support qwen2-0.5b? @lvhan028

qwen2-0.5b defines head_dim=64, but the lmdeploy turbomind engine currently requires head_dim=128, so turbomind does not support qwen2-0.5b yet. You may use the PyTorch engine for the qwen2-0.5b model instead.
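
A minimal sketch of that workaround, assuming the Hugging Face model id Qwen/Qwen2-0.5B-Instruct and lmdeploy's pipeline API with PytorchEngineConfig to select the PyTorch backend:

```python
# Run a head_dim=64 model on the PyTorch engine instead of turbomind.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline('Qwen/Qwen2-0.5B-Instruct',
                backend_config=PytorchEngineConfig(session_len=8192))
print(pipe(['Hello, who are you?']))
```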
