implement post training #1407

Open: wants to merge 2 commits into base: scatter_moe

@ehartford
Collaborator

Does this look right?

@casper-hansen
Collaborator

So the purpose of post_training is to recreate the original module with the trained weights. Ideally, you would add an original_class variable to the __init__ and save that for later. Then once you are done training, post_training is called to make your checkpoint compatible with the original model - so the idea here would be that original_class=MixtralSparseMoeBlock.
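The pattern described above can be illustrated with a minimal, framework-free sketch. Everything in it is an assumption made for illustration: `OriginalMoeBlock` is a hypothetical stand-in for MixtralSparseMoeBlock, `FusedMoeBlock` and its `fused_weight` layout are invented here, and plain Python lists stand in for tensors. It is not the code in this PR, only one way the `original_class` idea could look.

```python
class OriginalMoeBlock:
    """Hypothetical stand-in for MixtralSparseMoeBlock: one weight list per expert."""

    def __init__(self, num_experts, hidden_dim):
        self.num_experts = num_experts
        self.hidden_dim = hidden_dim
        self.experts = [[0.0] * hidden_dim for _ in range(num_experts)]


class FusedMoeBlock:
    """Hypothetical fused block that remembers which class it replaced."""

    def __init__(self, original):
        # Save the original class in __init__ so it is available after training.
        self.original_class = type(original)
        self.num_experts = original.num_experts
        self.hidden_dim = original.hidden_dim
        # "Fuse" the per-expert weights into one flat list (assumed layout).
        self.fused_weight = [w for expert in original.experts for w in expert]

    def post_training(self):
        # Recreate the original module and copy the trained weights back,
        # splitting the flat list into one slice per expert.
        restored = self.original_class(self.num_experts, self.hidden_dim)
        d = self.hidden_dim
        restored.experts = [
            self.fused_weight[i * d:(i + 1) * d] for i in range(self.num_experts)
        ]
        return restored


original = OriginalMoeBlock(num_experts=2, hidden_dim=3)
fused = FusedMoeBlock(original)
# Pretend training updated the fused weights.
fused.fused_weight = [w + 1.0 for w in fused.fused_weight]
restored = fused.post_training()
```

Here `restored` is an instance of the original class again, so a checkpoint built from it would have the original layout rather than the fused one.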

@ehartford
Collaborator Author

Is this closer?

@winglian
Collaborator

Can you please provide more details about what this solves? Sorry that I'm not understanding from the provided changeset.

@casper-hansen
Collaborator

@winglian this is supposed to solve the conversion back from the fused MoE into the original architecture.

@ehartford
Collaborator Author

I am trying to help get the branch to a usable state because I wanna start training a model with it


3 participants