We should add support for mutransfer: https://github.com/microsoft/mup

Appears non-trivial, but not as difficult as MoE. We'd have to modify the model itself. https://github.com/microsoft/mup/blob/main/examples/Transformer/model.py appears especially relevant. A good workflow would be:

1. Modify `gpt-neox/megatron/model/` to use mup. Probably mostly in `transformer.py`.
2. Modify `gpt-neox/megatron/optimizers.py`.
3. Modify `gpt-neox/megatron/training.py` to allow previous features to be selected during training.

A rough sketch of what the mup hookup typically looks like follows this list.
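To make the scope concrete, here is a minimal, untested sketch of the three pieces mup's README asks for: a `MuReadout` output layer, a `set_base_shapes` call with base/delta models, and one of the mu-optimizers. `TinyTransformerLM` and its arguments are placeholders for illustration, not actual NeoX classes, and the README has further details (e.g. re-initializing weights via `mup.init` and scaling attention logits by 1/d instead of 1/sqrt(d)) that are omitted here.

```python
# Minimal sketch based on the microsoft/mup README -- not the actual
# gpt-neox integration. TinyTransformerLM is a placeholder model.
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class TinyTransformerLM(nn.Module):
    def __init__(self, hidden_size, vocab_size=50304, nlayers=2, nheads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, nheads, batch_first=True)
            for _ in range(nlayers)
        )
        # mup: the output/unembedding layer should be MuReadout so its init
        # and learning rate scale correctly with width.
        self.readout = MuReadout(hidden_size, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        for layer in self.layers:
            x = layer(x)
        return self.readout(x)

# The base/delta models only need to differ in the width being transferred;
# mup uses them to infer which dimensions scale with width.
base = TinyTransformerLM(hidden_size=256)
delta = TinyTransformerLM(hidden_size=512)
model = TinyTransformerLM(hidden_size=2048)   # the target (wide) model
set_base_shapes(model, base, delta=delta)

# mup ships width-aware optimizers (MuAdam, MuSGD) that rescale per-parameter
# learning rates so an LR tuned on the small model transfers to the wide one.
optimizer = MuAdam(model.parameters(), lr=1e-3)
```

Roughly mapping this onto the list above: the `MuReadout` swap and attention-logit scaling would presumably land in `transformer.py`, the `MuAdam` wiring in `optimizers.py`, and the switch for turning mup on or off in `training.py`.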
nsarka/mup-support has my changes for this so far. I haven't tested it.
There's one more thing to add to this list. Mup can generate a plot that's helpful for checking the correctness of the implementation. https://github.com/microsoft/mup#checking-correctness-of-parametrization
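For reference, the coord check from that section of the README looks roughly like the sketch below. `make_lm`, `train_loader`, and the width values are placeholders, and the exact keyword arguments should be double-checked against the mup docs.

```python
# Rough sketch of mup's coordinate check (see the README link above): build the
# model at several widths, record activation scales over a few training steps,
# and plot them. Under a correct muP parametrization the curves stay roughly
# flat as width grows; under standard parametrization they blow up or vanish.
# make_lm(width) is a hypothetical model factory; train_loader is any small dataloader.
from mup import set_base_shapes
from mup.coord_check import get_coord_data, plot_coord_data

def lazy_model(width):
    # get_coord_data expects callables so each model can be built lazily
    return lambda: set_base_shapes(make_lm(width), base, delta=delta)

models = {w: lazy_model(w) for w in (128, 256, 512, 1024)}
df = get_coord_data(models, train_loader)        # DataFrame of activation stats
plot_coord_data(df, save_to="coord_check.png")   # the correctness plot referenced above
```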
Great work! Thank you for this contribution ^_^
Don’t forget to add yourself as a library contributor in the readme as well 😉
Thanks Stella! I added myself as a contributor in the draft PR here #704 :)
Closed as completed by #704