support qwen2 1.5b #1782
Conversation
```diff
     ret = self.Reader(new_params, unused_params,
-                      i == self.nmgrs - 1, self.model_info())
+                      i == self.nmgrs - 1, self.model_config)
```
This will affect many models, such as internlm2 and internvl.
Yes, I understand. However, I believe it is necessary to ensure that the original model configuration is accessible to all source models. Otherwise, the model_info() function should be capable of handling all edge cases.
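To make the intent of the change concrete, here is a minimal sketch of a reader that receives the raw config dict (as `self.model_config` now provides) and branches on model-specific fields. The `ExampleReader` class and the tensor key names are illustrative assumptions, not the actual lmdeploy API:

```python
# Hypothetical sketch (class and key names are illustrative, not lmdeploy's
# actual code): a reader that gets the raw config.json dict so it can branch
# on model-specific fields such as tie_word_embeddings.
class ExampleReader:

    def __init__(self, new_params: dict, unused_params: dict,
                 last: bool, model_cfg: dict):
        self.params = new_params
        self.unused = unused_params
        self.last = last
        # Raw HF config content, e.g. {"tie_word_embeddings": true, ...}
        self.tie_word_embeddings = model_cfg.get('tie_word_embeddings', False)

    def output_weight(self):
        # With tied embeddings the checkpoint may ship no lm_head.weight,
        # so fall back to the token embedding weight.
        key = ('model.embed_tokens.weight' if self.tie_word_embeddings
               else 'lm_head.weight')
        return self.params.get(key)
```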
TODO: fully test all supported models
ref https://qwenlm.github.io/blog/qwen2/#model-information
Any plan to support qwen2 1.5b?
Qwen2 1.5B sets `tie_word_embeddings=True`. In this case, the output layer and the token embedding layer share the same weight.
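For illustration, a minimal PyTorch sketch of what `tie_word_embeddings=True` implies for a converter; `TinyLM` and its layer names are generic stand-ins used only for this example:

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Toy model showing weight tying between embedding and output layer."""

    def __init__(self, vocab_size: int, hidden_size: int,
                 tie_word_embeddings: bool = True):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if tie_word_embeddings:
            # Reuse the embedding matrix as the output projection; saved
            # checkpoints of such models typically store only the embedding
            # weight and no separate lm_head.weight tensor.
            self.lm_head.weight = self.embed_tokens.weight

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embed_tokens(input_ids)   # (batch, seq, hidden)
        return self.lm_head(hidden)             # (batch, seq, vocab)


model = TinyLM(vocab_size=1000, hidden_size=64)
# Both layers point to the same storage, so a converter must not assume a
# standalone lm_head weight exists for models with tied embeddings.
assert model.lm_head.weight.data_ptr() == model.embed_tokens.weight.data_ptr()
```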