Prepare repo for release #356

Closed
4 of 5 tasks
epwalsh opened this issue Nov 2, 2023 · 8 comments

@epwalsh
Member

epwalsh commented Nov 2, 2023

@Muennighoff
Collaborator

@epwalsh Is it fine if I tackle the two remaining items in a PR? 🙂

@epwalsh
Member Author

epwalsh commented Nov 21, 2023

@Muennighoff that would be great, go ahead.

@Muennighoff
Collaborator

Great - did the first one here: #375

For splitting the components, I was thinking of moving the Attention & MLP out of OlmoBlock, so there'd be something like class Attention & class MLP. To make it work with the other Blocks, I will probably also have class LlamaAttention and class LlamaMLP, similar to how there are different LayerNorm classes that inherit from the same base. Is that what you were thinking of as well? @epwalsh

@epwalsh
Member Author

epwalsh commented Nov 21, 2023

@Muennighoff separating out attention certainly makes sense from an API perspective. I think we used to have it that way, actually, but then our different block implementations became highly coupled with how attention was implemented, so we did away with that.

There might be a clean way to decouple them again, but that makes me a bit nervous because at this point we need to make sure any code changes to the model do not change:

  1. The fully qualified parameter names (i.e. model.state_dict() keys should remain the same).
  2. How FSDP shards things, since that could impact our ability to restart existing runs. Adding new submodules could affect that.
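
The first constraint can be checked mechanically. Below is a minimal sketch, not code from this repo: it compares two collections of parameter names and reports which ones disappeared or appeared. The helper name `diff_param_names` and the example parameter names are hypothetical, chosen only to show how moving attention into its own submodule would rename a key.

```python
from typing import Iterable, List, Tuple


def diff_param_names(old: Iterable[str], new: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names, each sorted.

    `missing` holds names present before the refactor but gone after it;
    `unexpected` holds names that only exist after the refactor.
    """
    old_set, new_set = set(old), set(new)
    missing = sorted(old_set - new_set)
    unexpected = sorted(new_set - old_set)
    return missing, unexpected


# Hypothetical names before the refactor (illustrative only).
before = ["blocks.0.att_proj.weight", "blocks.0.ff_proj.weight"]
# Hypothetical names after moving attention into a submodule called `attn`.
after = ["blocks.0.attn.att_proj.weight", "blocks.0.ff_proj.weight"]

missing, unexpected = diff_param_names(before, after)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real model, the inputs would presumably be the keys of `model.state_dict()` taken before and after the change; both lists coming back empty is what the first constraint asks for. This check says nothing about the second constraint, since FSDP sharding depends on module structure and wrapping, not only on parameter names.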

@Muennighoff
Collaborator

I see, maybe it's not worth splitting them out then? Are there other parts you want to change to be releasable?

I find the current modeling code not that bad, but I would maybe clean up some of the comments and simplify a few things without changing the logic.

@epwalsh
Member Author

epwalsh commented Nov 21, 2023

@Muennighoff let's try to avoid code changes unless there's low-hanging fruit that won't have side effects on training. I think it's still worth reorganizing and doing the cleanup you're suggesting.

@Muennighoff
Collaborator

Should we close this issue? I think the 5 points are now addressed, but let me know if not & I can work on it :)

@epwalsh
Member Author

epwalsh commented Jan 9, 2024

I think we're good!

@epwalsh epwalsh closed this as completed Jan 9, 2024