fix for torch 2.1 inits #290

Merged: 7 commits into main from torch2.1init on Sep 26, 2023
Conversation

@saurabh111233212 (Contributor) commented on Sep 26, 2023

In torch 2.1, FSDP now only calls reset_parameters() on modules that directly manage parameters (i.e., not the top-level Olmo module).

Very simple fix: call olmo_model.reset_parameters() after the FSDP wrap. This recovers the loss curve from torch 2.0.1.
Nothing changes when using torch versions < 2.1.0.
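A minimal sketch of the fix described above (assuming `olmo_model` is the unwrapped OLMo model and the FSDP arguments are whatever `scripts/train.py` already passes; the variable names here are illustrative, not necessarily the ones in the PR):

```python
from packaging import version

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Wrap the model as before (other FSDP settings elided for brevity).
fsdp_model = FSDP(olmo_model)

# Starting with torch 2.1, FSDP only calls reset_parameters() on modules that
# directly own parameters, so the top-level module's init logic is skipped.
# Re-run the full initialization once after wrapping to recover the torch 2.0.1
# behavior; earlier torch versions are left untouched.
if version.parse(torch.__version__) >= version.parse("2.1.0"):
    olmo_model.reset_parameters()
```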

@epwalsh (Member) left a comment

> Note that this does mean the model will be initialized twice for earlier versions, leading to slightly different inits for the same seed than we had before.

This isn't true given that we have the version check, right?

But for later versions, some params might be initialized twice, right? If so, is there a way to avoid that, like forcing FSDP to materialize params without applying the init fn?
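One possible way to avoid the double init (a sketch under assumptions, not necessarily what this PR does): give FSDP a `param_init_fn` that only materializes parameters on the target device via `Module.to_empty()`, without calling any module's `reset_parameters()`, and then run the model's own initialization exactly once after wrapping. The `materialize_only` helper and variable names below are hypothetical.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def materialize_only(module: torch.nn.Module) -> None:
    # Allocate real storage for (possibly meta-device) parameters on the
    # current CUDA device without running any initialization function.
    module.to_empty(device=torch.cuda.current_device())

fsdp_model = FSDP(
    olmo_model,                      # unwrapped model (illustrative name)
    param_init_fn=materialize_only,  # materialize, but do not initialize
)

# Run the model's own init exactly once, after the FSDP wrap.
fsdp_model.reset_parameters()
```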

@epwalsh (Member) left a comment

LGTM, just a suggestion to add some comments explaining what's going on

@saurabh111233212 merged commit 012e97f into main on Sep 26, 2023
10 checks passed
@saurabh111233212 deleted the torch2.1init branch on September 26, 2023 at 21:42