Skip to content

Conversation

@emersodb
Copy link
Collaborator

@emersodb emersodb commented Jul 9, 2023

Quick PR to fix the way that Scaffold control variates are initialized. Part of this change was originally rolled into the pull request here. I realized that if we eliminated the control variate initialization of the client side (which is flawed but actually doesn't fail due to the way zip combines lists of unequal sizes) then a bug arises on the client side model initialization with the unpacking class.

This fix addresses this by making initialization of the control variates a mandatory input to the strategy and handles it thereafter.

Also fixing an issue inside modify_grad where the tensors wouldn't end up on a GPU if one was being used.

Finally, this PR also addresses a bit of a thorny issue where model state and model parameters might differ depending on the underlying model. If this is true then a lot of the underlying assumptions about scaffold break. This occurs, for example, when the layers of a model are frozen or there are state carrying layers like Batch Normalization. So we separate this notion in the code. This requires modifications to the packing class as well, as the number of control variates and the number of model state tensors is not necessarily equal.

emersodb added 2 commits July 9, 2023 10:51
…dients do not properly align. This occurs, for example, when the layers of a model are frozen or there are state carrying layers like Batch Normalization.
Copy link
Contributor

@jewelltaylor jewelltaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing these bugs up!

@emersodb emersodb merged commit a37c159 into main Jul 12, 2023
@emersodb emersodb deleted the dbe/fix_scaffold branch July 12, 2023 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants