Prepare repo for release #356

epwalsh · 2023-11-02T18:24:26Z

consolidate Python configs into pyproject.toml, other clean up #353
remove data team's stuff #357
Code cleanup: move torch-related util functions out of util.py and into their own module. (Separate torch utils #375)
Code cleanup: split up components in model.py into submodules. (Small simplifications #377)
Fix stale links in README, scripts cleanup #359

Muennighoff · 2023-11-21T02:48:56Z

@epwalsh Is it fine if I tackle the two remaining items in a PR? 🙂

epwalsh · 2023-11-21T03:53:19Z

@Muennighoff that would be great, go ahead.

Muennighoff · 2023-11-21T20:22:34Z

Great - did the first one here: #375

For splitting the components, I was thinking of moving the Attention & MLP out of OlmoBlock, so there'd be something like class Attention & class MLP. To make it work with the other Blocks, I will probably also have class LlamaAttention and class LlamaMLP, similar to how there are different LayerNorm classes that inherit from the same base. Is that what you were thinking of as well? @epwalsh

epwalsh · 2023-11-21T22:14:52Z

@Muennighoff separating out attention certainly makes sense from an API perspective. I think we used to have it that way, actually, but then our different block implementations became highly coupled with how attention was implemented, so we did away with that.

There might be a clean way to decouple them again, but that makes me a bit nervous because at this point we need to make sure any code changes to the model do not change:

The fully qualified parameter names (i.e. model.state_dict() keys should remain the same).
How FSDP shards things, since that could impact our ability to restart existing runs. Adding new submodules could affect that.
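As a rough illustration of the first constraint, a refactor can be checked by comparing parameter names before and after the change. This is only a sketch: the helper `diff_state_dict_keys`, the `KeyDiff` type, and the parameter names below are hypothetical and not taken from the repo. In practice the two key lists would come from `model.state_dict().keys()` on the old and new code.

```python
from typing import Iterable, List, NamedTuple


class KeyDiff(NamedTuple):
    missing: List[str]     # names present before the refactor but gone afterwards
    unexpected: List[str]  # names that only exist after the refactor


def diff_state_dict_keys(old_keys: Iterable[str], new_keys: Iterable[str]) -> KeyDiff:
    """Compare two collections of fully qualified parameter names."""
    old, new = set(old_keys), set(new_keys)
    return KeyDiff(missing=sorted(old - new), unexpected=sorted(new - old))


# Hypothetical parameter names, for illustration only.
before = ["blocks.0.att_proj.weight", "blocks.0.ff_proj.weight"]
# Moving att_proj under a new `attention` submodule changes its qualified name.
after = ["blocks.0.attention.att_proj.weight", "blocks.0.ff_proj.weight"]

diff = diff_state_dict_keys(before, after)
print("missing:", diff.missing)
print("unexpected:", diff.unexpected)
```

An empty diff only shows that the names match. It says nothing about how FSDP wraps or shards the modules, so the second constraint would still need to be verified separately, for example by restarting from an existing checkpoint.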

Muennighoff · 2023-11-21T23:03:30Z

I see, maybe it's not worth splitting them out then? Are there other parts you want to change before the release?

I find the current modeling code not that bad, but I would maybe clean up some of the comments and simplify a few things without changing the logic.

epwalsh · 2023-11-21T23:15:42Z

@Muennighoff let's try to avoid code changes unless there's low-hanging fruit that won't have side effects on training. I think it's still worth reorganizing and doing the cleanup that you're suggesting.

Muennighoff · 2024-01-07T23:58:46Z

Should we close this issue? I think the 5 points are now addressed, but let me know if not & I can work on it :)

epwalsh · 2024-01-09T00:22:18Z

I think we're good!

epwalsh self-assigned this Nov 2, 2023

epwalsh mentioned this issue Nov 2, 2023

consolidate Python configs into pyproject.toml, other clean up #353

Merged

Muennighoff mentioned this issue Nov 22, 2023

Small simplifications #377

Merged

epwalsh closed this as completed Jan 9, 2024


epwalsh commented Nov 2, 2023 •

edited by Muennighoff
