Fix LR scheduler behaviour with AMP #16229
base: master
Conversation
In the process of fixing tests I discovered and fixed a bug where the scheduler wouldn't match its optimizer when multiple optimizers are instantiated with frequencies. Now the optimizers and schedulers match and alternate as they should, resetting the cycle every epoch.
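For context, the frequency-based alternation described here can be sketched in plain Python (an illustrative reimplementation, not Lightning's actual code): with frequencies `[2, 1]`, batches should cycle as optimizer 0, optimizer 0, optimizer 1, and each scheduler should step only on its own optimizer's batches.

```python
def active_optimizer_index(batch_idx, frequencies):
    """Return the index of the optimizer active at this batch.

    The cycle position is batch_idx modulo the sum of frequencies;
    we then walk the frequency list to find which slot it falls in.
    """
    cycle_pos = batch_idx % sum(frequencies)
    for idx, freq in enumerate(frequencies):
        if cycle_pos < freq:
            return idx
        cycle_pos -= freq

# With frequencies [2, 1] the optimizers alternate 0, 0, 1, 0, 0, 1, ...
schedule = [active_optimizer_index(i, [2, 1]) for i in range(6)]
# -> [0, 0, 1, 0, 0, 1]
```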
@carmocca Ready for final review
@@ -390,7 +391,7 @@ def update_lr_schedulers(self, interval: str, update_plateau_schedulers: bool) -
         if interval == "step" and self._should_accumulate():
             return
         active_optimizers = _get_active_optimizers(
-            self.trainer.optimizers, self.trainer.optimizer_frequencies, self.total_batch_idx
+            self.trainer.optimizers, self.trainer.optimizer_frequencies, self.batch_idx
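A hedged sketch of why this one-word change matters (an illustrative simulation, not Lightning's `_get_active_optimizers` itself): a global batch counter lets the frequency cycle spill across epoch boundaries, while the per-epoch counter restarts the cycle each epoch, so the epochs beyond the first pick different optimizers.

```python
def active_optimizer_index(batch_idx, frequencies):
    # Position within the repeating frequency cycle.
    cycle_pos = batch_idx % sum(frequencies)
    for idx, freq in enumerate(frequencies):
        if cycle_pos < freq:
            return idx
        cycle_pos -= freq

frequencies, batches_per_epoch = [2, 1], 4

# Global counter: epoch 2 starts mid-cycle (batch 4 has cycle position 1).
global_counter = [active_optimizer_index(i, frequencies) for i in range(8)]

# Per-epoch counter: the cycle restarts at the top of every epoch.
per_epoch = [active_optimizer_index(i % batches_per_epoch, frequencies)
             for i in range(8)]

# global_counter -> [0, 0, 1, 0, 0, 1, 0, 0]
# per_epoch      -> [0, 0, 1, 0, 0, 0, 1, 0]  (differs from batch 5 on)
```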
Could you add a test to verify this works properly ?
I modified the third case of test_step_scheduling_for_multiple_optimizers_with_frequency so that it tests this.
Can you check the failing tests?
setup.cfg (outdated)
@@ -34,6 +34,7 @@ markers =
     cloud:Run the cloud tests for example
 filterwarnings =
     error::FutureWarning
+    error:Detected call of `lr_scheduler.step\(\)` before `optimizer.step\(\)`:UserWarning
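This `filterwarnings` entry escalates only the matching `UserWarning` into an error, which is what makes CI fail if the warning reappears. A runnable sketch of the equivalent Python-level filter (the message field is a regex matched against the start of the warning message):

```python
import warnings

# Equivalent of the setup.cfg filter: turn this specific UserWarning
# into a raised exception while leaving other warnings untouched.
warnings.filterwarnings(
    "error",
    message=r"Detected call of `lr_scheduler\.step\(\)` before `optimizer\.step\(\)`",
    category=UserWarning,
)

try:
    warnings.warn(
        "Detected call of `lr_scheduler.step()` before `optimizer.step()`",
        UserWarning,
    )
    escalated = False
except UserWarning:
    escalated = True
# escalated is True: the warning now raises instead of printing
```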
I added this line so that our CI fails if this warning appears. This way it tests that your patch works as expected.
Thanks, but this also makes the IPU tests fail, and this PR is focused on GPU. I'm not sure where to fix the issue on IPUs.
for more information, see https://pre-commit.ci
The way I fixed the
I also modified
Hi @Borda, I also encounter the same issue. Will this be merged?
Let me check what is missing here...
Is this PR merged already? I'm still having this issue.
There were some failing tests, @milesial mind having a look?
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| - | Generic High Entropy Secret | 78fa3af | tests/tests_app/utilities/test_login.py | View secret |
| - | Base64 Basic Authentication | 78fa3af | tests/tests_app/utilities/test_login.py | View secret |
Codecov Report

@@            Coverage Diff            @@
##           master   #16229      +/-   ##
==========================================
- Coverage      83%      29%      -54%
==========================================
  Files         450      442        -8
  Lines       38089    37941      -148
==========================================
- Hits        31803    11015    -20788
- Misses       6286    26926    +20640
What does this PR do?
When training with native AMP and an LR scheduler, we get a warning indicating that an LR scheduler step was taken while the optimizer step was skipped (skipped steps are expected at the beginning of training with native AMP): "Detected call of `lr_scheduler.step()` before `optimizer.step()`".
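The failure mode can be simulated in plain Python (a hypothetical sketch, not the actual `torch.cuda.amp.GradScaler` code): the scaler skips `optimizer.step()` when it finds inf/NaN gradients while the loss scale calibrates, and stepping the LR scheduler on those skipped iterations is what triggers PyTorch's warning. The fix is to step the scheduler only on iterations where the optimizer actually stepped.

```python
def train_steps(grad_is_finite_per_step):
    """Count optimizer and scheduler steps over a sequence of iterations.

    Each element of grad_is_finite_per_step says whether the (simulated)
    AMP scaler found finite gradients on that iteration.
    """
    optimizer_steps = scheduler_steps = 0
    for finite in grad_is_finite_per_step:
        if finite:
            optimizer_steps += 1   # scaler would call optimizer.step()
            scheduler_steps += 1   # guard: step the scheduler only here
        # else: the scaler skips the step, so the scheduler must skip too
    return optimizer_steps, scheduler_steps

# First two steps skipped while the loss scale backs off, then training proceeds.
opt, sched = train_steps([False, False, True, True, True])
# -> opt == 3 and sched == 3; the scheduler never runs ahead of the optimizer
```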
Fixes #16228 #5558
Does your PR introduce any breaking changes? If yes, please list them.
No
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines.