
Import dmoe model into other training script? #101

Open
andrewnc opened this issue Apr 2, 2024 · 3 comments


andrewnc commented Apr 2, 2024

Is it possible to import the dMoE model itself into another training script, without training via Megatron?

tgale96 (Collaborator) commented Apr 2, 2024

Hi, yes! The Megatron-LM binding is just what I used for experiments. You should be able to use the dMoE layer from other codebases relatively easily. Is there a particular codebase you had in mind to integrate it into?

Here are a couple of other repos that integrate it. Both actually do something a bit more involved than necessary, because they wanted to add features specific to their frameworks.

andrewnc (Author) commented Apr 2, 2024

I have an internal codebase with a pretty vanilla decoder-only transformer, which I'm hoping to swap out for a dMoE. Thank you for the pointers; it seems like a simpler version of the nanotron example is what I'll try to implement!

@tgale96
Copy link
Collaborator

tgale96 commented Apr 2, 2024

Awesome! You should be able to just use the dMoE class directly, like you would any other layer. Let me know if you run into any issues!
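For readers wiring the layer into their own loop, the routing that makes a dMoE "dropless" can be sketched in plain Python: every token is sent to its top-k experts with no capacity limit, so expert batches are variable-sized and nothing is dropped. This is only an illustration of the routing idea (all names below are made up for the sketch); the real megablocks layer implements this with block-sparse GPU kernels, so consult its README for the actual class and config.

```python
import math
from collections import defaultdict

def dropless_route(logits, top_k):
    """Illustrative top-k gating with no capacity limit.

    logits: one list of per-expert gate logits per token.
    Returns (assignments, weights): for each expert, the token
    indices routed to it; for each token, its renormalized gate
    weights. Because there is no fixed expert capacity, no token
    is ever dropped -- the "dropless" property of dMoE.
    """
    assignments = defaultdict(list)  # expert id -> [token indices]
    weights = []                     # per token: {expert id: weight}
    for tok, row in enumerate(logits):
        # Softmax over experts for this token (shifted for stability).
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        probs = [e / z for e in exps]
        # Select the top_k experts by gate probability.
        top = sorted(range(len(row)), key=lambda e: probs[e],
                     reverse=True)[:top_k]
        # Renormalize the selected gate weights to sum to 1.
        s = sum(probs[e] for e in top)
        weights.append({e: probs[e] / s for e in top})
        for e in top:
            assignments[e].append(tok)
    return dict(assignments), weights
```

A capacity-based router would truncate each expert's list at a fixed size and drop the overflow; the dropless formulation instead lets expert batch sizes vary, which is what the block-sparse kernels in megablocks are built to handle efficiently.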
