Ensure EMA checkpoints are also deleted when normal checkpoints are #5724
Conversation
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
@@ -1,6 +1,6 @@
 hydra-core>=1.2.0,<1.3
 omegaconf>=2.2,<2.3
-pytorch-lightning>=1.8.3
+pytorch-lightning>=1.8.6
Hope it's ok to upgrade the version here @titu1994
Should be fine, r1.15 has been cut already so it affects only main.
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
@@ -248,6 +248,27 @@ def test_exp_manager_ema_weights(self, tmpdir):
         for saved_weight, ema_weight in zip(duplicate_model.state_dict().values(), ema_weights):
             assert torch.allclose(saved_weight.cpu(), ema_weight.cpu())

+    @pytest.mark.unit
+    def test_exp_manager_ema_weights_topk(self, tmpdir):
Confirmed this test fails when using pytorch-lightning==1.8.3
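The behaviour under test — removing an EMA checkpoint whenever its base checkpoint is deleted, e.g. when `save_top_k` rotates old checkpoints out — can be sketched in isolation. This is an illustrative reconstruction, not the actual NeMo/Lightning code: the `-EMA` filename convention and both function names are assumptions.

```python
from pathlib import Path


def ema_path_for(ckpt_path: str) -> str:
    """Derive the EMA checkpoint filename for a base checkpoint.

    Assumes the convention 'name.ckpt' -> 'name-EMA.ckpt' (illustrative).
    """
    p = Path(ckpt_path)
    return str(p.with_name(p.stem + "-EMA" + p.suffix))


def remove_checkpoint_with_ema(ckpt_path: str) -> None:
    """Delete a checkpoint and, if present, its paired EMA checkpoint.

    Without the EMA half of this, top-k rotation deletes only the base
    checkpoint and orphaned -EMA files accumulate on disk.
    """
    base = Path(ckpt_path)
    if base.exists():
        base.unlink()
    ema = Path(ema_path_for(ckpt_path))
    if ema.exists():
        ema.unlink()
```

A test like the one added above would save more than `save_top_k` checkpoints, then assert that the surviving base and EMA files stay in one-to-one correspondence.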
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Looks great! Thanks!
…5724)

* Ensure EMA checkpoints are also deleted when normal checkpoints are
* Simplify test
* Remove comment
* Fix bug where `save_best_model` caused a crash
* Swap to logging only on rank 0

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
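One item in the commit message above swaps checkpoint logging to rank 0 only, so multi-GPU runs don't emit one copy of each message per process. A minimal sketch of that pattern, assuming the global rank is exposed via the `RANK` environment variable (PyTorch Lightning also ships helpers such as `rank_zero_info` for this):

```python
import logging
import os

log = logging.getLogger(__name__)


def log_rank_zero(msg: str) -> bool:
    """Log `msg` only on global rank 0; return True if it was logged.

    Illustrative sketch: reads the rank from the RANK environment
    variable, defaulting to 0 for single-process runs.
    """
    if int(os.environ.get("RANK", "0")) != 0:
        return False
    log.info(msg)
    return True
```

The return value is only there to make the behaviour easy to assert in a test; a framework helper would normally return nothing.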
What does this PR do?
Closes #5631
Collection: Common
Changelog
Before your PR is "Ready for review"
Pre checks:
PR Type: