First shot at fixing the server-side parameter initialization issue. #37
Quick PR to fix the way that SCAFFOLD control variates are initialized. Part of this change was originally rolled into the pull request here. I realized that if we eliminated the client-side initialization of the control variates (which is flawed, but doesn't actually fail because `zip` silently truncates lists of unequal length), then a bug arises in the client-side model initialization with the unpacking class. This fix addresses the problem by making the initial control variates a mandatory input to the strategy and handling them from there on.
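The sketch below shows why the flawed client-side initialization never raised an error, and one way a packing class can carry control variates alongside model weights without depending on the two lists having the same length. `ControlVariatePacker` and its methods are hypothetical names for illustration, not the classes changed in this PR; it assumes weights and control variates travel together as one flat list of NumPy arrays.

```python
from typing import List, Tuple

import numpy as np

NDArrays = List[np.ndarray]


class ControlVariatePacker:
    """Hypothetical packer: model tensors first, then control variates, in one flat list."""

    def __init__(self, num_model_tensors: int) -> None:
        # The split point is stored explicitly, so unpacking never assumes
        # that the two halves have the same number of tensors.
        self.num_model_tensors = num_model_tensors

    def pack(self, model_weights: NDArrays, control_variates: NDArrays) -> NDArrays:
        if len(model_weights) != self.num_model_tensors:
            raise ValueError("Unexpected number of model tensors.")
        return model_weights + control_variates

    def unpack(self, packed: NDArrays) -> Tuple[NDArrays, NDArrays]:
        # Everything after the split point is treated as control variates.
        return packed[: self.num_model_tensors], packed[self.num_model_tensors :]


if __name__ == "__main__":
    weights = [np.zeros(3), np.zeros(2), np.zeros(1)]
    variates = [np.ones(2), np.ones(1)]  # one fewer tensor than weights

    # zip stops at the shorter list, so a length mismatch goes unnoticed.
    print(len(list(zip(weights, variates))))

    packer = ControlVariatePacker(num_model_tensors=len(weights))
    unpacked_weights, unpacked_variates = packer.unpack(packer.pack(weights, variates))
    print(len(unpacked_weights), len(unpacked_variates))
```

Storing the split index, rather than assuming a 50/50 split, is what lets the packed list contain unequal numbers of model tensors and control variates.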
Also fixes an issue inside `modify_grad` where the tensors wouldn't end up on the GPU when one was being used.
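For context, SCAFFOLD corrects each local gradient with the difference between the server and client control variates, g + c − c_i. The following is a minimal PyTorch sketch of such a correction that moves the correction term onto the gradient's device before adding it. The function signature is an assumption for illustration and may differ from the actual `modify_grad` in this codebase.

```python
from typing import List

import torch
import torch.nn as nn


def modify_grad(
    model: nn.Module,
    server_control_variates: List[torch.Tensor],
    client_control_variates: List[torch.Tensor],
) -> None:
    """Sketch: add (c - c_i) to each trainable parameter's gradient in place."""
    # Only trainable parameters have control variates (frozen ones are skipped).
    params = [p for p in model.parameters() if p.requires_grad]
    if not (len(params) == len(server_control_variates) == len(client_control_variates)):
        raise ValueError("Control variates must match the trainable parameters.")

    for param, c, c_i in zip(params, server_control_variates, client_control_variates):
        if param.grad is None:
            continue
        # Move the correction to the gradient's device and dtype, so this works
        # whether training happens on CPU or GPU.
        correction = (c - c_i).to(device=param.grad.device, dtype=param.grad.dtype)
        param.grad.add_(correction)
```

Calling `.to(device=...)` on the correction, rather than assuming the control variates already live with the model, keeps the function correct when the variates are deserialized on the CPU but training happens on a GPU.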
Finally, this PR also addresses a somewhat thorny issue: the model state and the model parameters can differ depending on the underlying model. When they do, many of the assumptions underlying SCAFFOLD break. This happens, for example, when some layers of a model are frozen or when the model has state-carrying layers such as Batch Normalization. So we separate these two notions in the code. This also requires modifying the packing class, since the number of control variates and the number of model state tensors are not necessarily equal.
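To make the state-versus-parameters distinction concrete, here is a small PyTorch illustration with a frozen layer and a Batch Normalization layer. `init_control_variates` is a hypothetical helper, not code from this PR; it assumes control variates are kept only for trainable parameters and initialized to zero.

```python
from typing import List

import torch
import torch.nn as nn


def init_control_variates(model: nn.Module) -> List[torch.Tensor]:
    """Hypothetical helper: one zero tensor per trainable parameter."""
    return [torch.zeros_like(p) for p in model.parameters() if p.requires_grad]


model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 2))
model[0].requires_grad_(False)  # freeze the first layer

# The state dict holds every tensor needed to restore the model, including
# frozen weights and BatchNorm's running_mean, running_var and num_batches_tracked.
state_tensors = list(model.state_dict().values())

# Control variates only make sense for tensors that receive gradients.
control_variates = init_control_variates(model)

print(len(state_tensors), len(control_variates))
```

Here the state dict contains 9 tensors but only 4 trainable parameters (the BatchNorm weight and bias and the last layer's weight and bias), so any packing scheme that assumes a one-to-one match between the two would misalign them.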