test_loss_scale_decrease fails with some random seeds #695

Open
hartb opened this issue Jan 29, 2020 · 2 comments
hartb commented Jan 29, 2020

Test test_loss_scale_decrease in run_amp/test_checkpointing.py fails consistently with certain random seeds:

$ diff -u test_checkpointing.py.orig test_checkpointing.py
--- test_checkpointing.py.orig  2020-01-29 23:28:10.266063356 +0000
+++ test_checkpointing.py       2020-01-29 23:28:33.493162829 +0000
@@ -162,6 +162,7 @@
                             continue
 
     def test_loss_scale_decrease(self):
+        torch.manual_seed(2)
         num_losses = 3
         nb_decrease_loss_scales = [0, 1, 2]
         for opt_level in self.test_opt_levels:

$ python -m pytest -v test_checkpointing.py::TestCheckpointing::test_loss_scale_decrease
...
>               self.assertEqual(update_ls, init_ls / 2**factor)
E               AssertionError: 32768.0 != 16384.0

test_checkpointing.py:213: AssertionError

The failure always seems to occur when opt_level = O1, and always with the same values failing the assertion:

update_ls = 32768.0
init_ls   = 65536.0
factor    = 2
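
For reference, here is a minimal sketch of the arithmetic the assertion encodes (the helper name below is hypothetical, not from the test suite): after factor overflow-triggered halvings the expected loss scale is init_ls / 2**factor, so the observed 32768.0 corresponds to one fewer halving than the test expects.

# Hypothetical helper illustrating the assertion `update_ls == init_ls / 2**factor`.
def expected_loss_scale(init_ls, factor):
    """Loss scale expected after `factor` overflow-triggered halvings."""
    return init_ls / 2 ** factor

init_ls = 65536.0
factor = 2
print(expected_loss_scale(init_ls, factor))  # 16384.0 -- value the test expects
print(init_ls / 2 ** 1)                      # 32768.0 -- value observed (one halving short)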

The failure is consistent for me (same failing seeds, opt_level, and values) across:

  • x86_64 and ppc64le
  • CUDA 10.1 and 10.2
  • PyTorch 1.2.0 and 1.3.1
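
For context, a minimal sketch (assuming apex and a CUDA device are available; the model and data below are placeholders, not taken from the test) of the O1 dynamic loss-scaling path the test exercises. With dynamic loss scaling, amp halves the loss scale and skips the optimizer step whenever a gradient overflow is detected, which is the decrease the assertion above counts.

import torch
from apex import amp

# Placeholder model/optimizer; dynamic loss scaling is the default for O1.
model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 16, device="cuda")
target = torch.randn(8, 4, device="cuda")

for _ in range(3):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), target)
    # amp scales the loss; on overflow it halves the loss scale and skips the step.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()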
@HangJie720

I also see this error. Could you tell me how you solved it?


hartb commented Jan 4, 2022

I'm afraid we just noted the failure and moved on, hoping the NVIDIA team would be able to recreate and resolve.
