Fix FLamby FedAdam NaN Issue on Eval #39

emersodb · 2023-07-11T21:41:31Z

This PR addresses the issue where Batch Normalization layers end up with negative variance estimates because of the momentum-based aggregation in FedAdam. A negative variance estimate produces NaNs in the forward pass during evaluation. The fix is to configure the batch normalization layers so that they do not use the mean and variance estimates tracked during training when evaluating, which means those tracked estimates have no effect.
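To illustrate how a momentum-based server update can push a variance estimate below zero, here is a minimal sketch of a single FedAdam-style step applied to one scalar. The function name `fedadam_step`, the hyperparameter values, and the numbers are assumptions chosen for illustration; they are not taken from this repository.

```python
import math


def fedadam_step(x, client_avg, m, v, lr=0.1, beta1=0.9, beta2=0.99, tau=1e-9):
    """One FedAdam-style server update on a single scalar (illustrative only)."""
    delta = client_avg - x                       # pseudo-gradient from the clients
    m = beta1 * m + (1 - beta1) * delta          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * delta ** 2     # second-moment estimate
    x = x + lr * m / (math.sqrt(v) + tau)        # adaptive step, roughly lr in size
    return x, m, v


# The server holds a small positive running variance and every client reports a
# positive value, yet the adaptive step overshoots past zero.
server_var = 0.05
new_var, m, v = fedadam_step(server_var, client_avg=0.01, m=0.0, v=0.0)
print(new_var)  # negative, even though both inputs were positive
```

The step size is governed by the server learning rate rather than by the distance to the clients' average, so a parameter that must stay non-negative, such as a running variance, has no such guarantee under this kind of update.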

I've tested this fix on the Fed-ISIC2019 EfficientNet model, and it no longer produces NaN values in the eval stage at any point during training. I'll post-process the hyperparameter search and measure performance tomorrow, provided the scripts run cleanly overnight.


yuchongzhang

David and I had a call to discuss the math behind this and why setting the track_running_stats boolean to False is insufficient on its own (the official PyTorch documentation is somewhat misleading on this point). Changes look good to me.
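The point about the flag being insufficient can be sketched as follows. This is a hedged illustration, not the code from this PR: the helper name `use_batch_stats_in_eval` is hypothetical, and the example assumes a layer that was constructed with the default `track_running_stats=True` and whose flag is flipped afterwards. In that case the running buffers still exist, so eval mode keeps normalizing with them until they are also set to `None`.

```python
import torch
from torch import nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def use_batch_stats_in_eval(model: nn.Module) -> None:
    """Hypothetical helper: make BN layers normalize with batch statistics in eval."""
    for layer in model.modules():
        if isinstance(layer, BN_TYPES):
            layer.track_running_stats = False
            # Flipping the flag is not enough: while these buffers exist,
            # eval mode still normalizes with them.
            layer.running_mean = None
            layer.running_var = None
            layer.num_batches_tracked = None


torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
with torch.no_grad():
    bn.running_var.fill_(-1.0)  # simulate a negative aggregated variance
bn.eval()
x = torch.randn(4, 3, 5, 5)

bn.track_running_stats = False
flag_only_output = bn(x)        # still uses the negative running variance

use_batch_stats_in_eval(bn)
fixed_output = bn(x)            # now normalizes with the batch statistics

print(torch.isnan(flag_only_output).any().item())
print(torch.isfinite(fixed_output).all().item())
```

A side effect of setting the buffers to `None` is that they no longer appear in `state_dict()`, so they would not be exchanged if parameters are gathered from the state dict.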


emersodb added 2 commits July 11, 2023 17:31

Updating the model for FedAdam to not track state in its batch normal…

e9067e5

…ization layers. This should alleviate the issue we were seeing where nans crept into the model due to negative variance coming from momentum on the server side aggregation.

Fixing comment.

031845e

emersodb requested review from fatemetkl, jewelltaylor, sanaAyrml and yuchongzhang July 11, 2023 21:41

emersodb added 2 commits July 12, 2023 09:00

Merge branch 'dbe/expand_basic_client' into dbe/fix_flamby_fedadam_bn…

45fc480

…_issue

Need to further modify some of the underlying state of BN layers to e…

12c3fa4

…nsure that mean and var state are not applied during eval.

yuchongzhang approved these changes Jul 13, 2023


emersodb changed the base branch from dbe/expand_basic_client to main July 13, 2023 21:40

Merge branch 'main' into dbe/fix_flamby_fedadam_bn_issue

04b1ddb

emersodb changed the base branch from main to dbe/expand_basic_client July 13, 2023 21:41

Merge branch 'dbe/expand_basic_client' into dbe/fix_flamby_fedadam_bn…

a8bf774

…_issue

Base automatically changed from dbe/expand_basic_client to main July 14, 2023 15:17

Merge branch 'main' into dbe/fix_flamby_fedadam_bn_issue

58735c4

emersodb merged commit 212d698 into main Jul 14, 2023

emersodb deleted the dbe/fix_flamby_fedadam_bn_issue branch July 14, 2023 15:27
