
Skip scheduler.step() if optimizer.step() isn't called in the iteration #9923

Closed
wants to merge 36 commits

Conversation

akihironitta
Contributor

@akihironitta akihironitta commented Oct 14, 2021

What does this PR do?

Fixes #5558

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short, check the following:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @carmocca @justusschock @awaelchli @akihironitta

@akihironitta akihironitta added the bug Something isn't working label Oct 14, 2021
@akihironitta akihironitta changed the title [wip] Fix scheduler.step() called in the wrong order with precision=16 Fix scheduler.step() called in the wrong order with precision=16 Oct 15, 2021
@akihironitta
Contributor Author

akihironitta commented Oct 17, 2021

When the warning is raised

This issue only happens when lr_scheduler.step() runs every few training steps, i.e. when the scheduler is configured with "interval": "step" and a small "frequency":

def configure_optimizers(self):
    optimizer = ...
    scheduler = {
        "scheduler": ...,
        "interval": "step",
        "frequency": 1,  # another small number may also cause this issue.
    }
    return {"optimizer": optimizer, "lr_scheduler": scheduler}
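For concreteness, here is a runnable, filled-in version of the configuration above. The optimizer and scheduler choices (SGD, StepLR) are hypothetical stand-ins, not taken from the PR; any optimizer/scheduler pair triggers the same behavior.

```python
import torch
from torch import nn

# Hypothetical model just to have parameters to optimize.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Lightning-style scheduler configuration: step the scheduler every
# training step ("interval": "step"), once per step ("frequency": 1).
scheduler_config = {
    "scheduler": torch.optim.lr_scheduler.StepLR(optimizer, step_size=10),
    "interval": "step",
    "frequency": 1,
}
config = {"optimizer": optimizer, "lr_scheduler": scheduler_config}
```

With this configuration and native AMP, the scheduler would previously step even on iterations where the scaler skipped the optimizer.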

Cause of the warning
As documented in the PyTorch source code (torch/cuda/amp/grad_scaler.py), scaler.step(optimizer) (which is called when using native AMP) is likely to skip optimizer.step() for the first few iterations, and as a result lr_scheduler.step() can be called before optimizer.step() has ever run.

EDIT (2021-10-28): native AMP skips optimizer.step() not only for the first few iterations but also occasionally later during training, so we need to skip lr_scheduler.step() whenever optimizer.step() isn't called by the scaler.

pytorch/pytorch#44511
https://discuss.pytorch.org/t/optimizer-step-before-lr-scheduler-step-error-using-gradscaler/92930
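The skip can be detected because GradScaler.update() lowers the scale whenever the preceding step found inf/NaN gradients and was skipped. A minimal sketch of this pattern (not the PR's actual implementation; the model and optimizer here are hypothetical, and the scaler is disabled so the demo runs on CPU):

```python
import torch
from torch import nn

# Hypothetical training setup.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
# enabled=False makes the scaler a pass-through so this sketch runs on CPU;
# with AMP on GPU you would use GradScaler() as usual.
scaler = torch.cuda.amp.GradScaler(enabled=False)

loss = model(torch.randn(8, 4)).sum()
scaler.scale(loss).backward()

scale_before = scaler.get_scale()
scaler.step(optimizer)  # may silently skip optimizer.step() on inf/NaN grads
scaler.update()         # reduces the scale when the step was skipped
optimizer_was_skipped = scaler.get_scale() < scale_before

if not optimizer_was_skipped:
    scheduler.step()    # only advance the LR schedule after a real optimizer step
```

Comparing get_scale() before and after update() is a commonly suggested workaround in the linked PyTorch discussion; relying on the scale decrease is an implementation detail of GradScaler's inf/NaN handling.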

@akihironitta akihironitta changed the title Fix scheduler.step() called in the wrong order with precision=16 Fix scheduler.step() called before optimizer.step() with native amp and "interval": "step" Oct 17, 2021
@akihironitta akihironitta changed the title Fix scheduler.step() called before optimizer.step() with native amp and "interval": "step" Skip scheduler.step() if optimizer.step() is never called Oct 18, 2021
@awaelchli awaelchli added this to the 1.5.x milestone Nov 3, 2021
@akihironitta akihironitta changed the title [NEED HELP] Skip scheduler.step() when optimizer.step() isn't called Skip scheduler.step() when optimizer.step() isn't called Nov 17, 2021
@akihironitta akihironitta changed the title Skip scheduler.step() when optimizer.step() isn't called Skip scheduler.step() if optimizer.step() isn't called in the iteration Nov 17, 2021
@mergify mergify bot removed the has conflicts label Nov 17, 2021
@akihironitta akihironitta added lr scheduler precision: amp Automatic Mixed Precision labels Nov 17, 2021
@akihironitta akihironitta marked this pull request as draft November 17, 2021 07:06
@akihironitta akihironitta marked this pull request as ready for review November 19, 2021 06:22
@akihironitta
Contributor Author

still wip

@akihironitta akihironitta deleted the bugfix/scheduler-before-optimizer branch January 11, 2022 03:22
Labels
bug, has conflicts, lr scheduler, precision: amp
Development

Successfully merging this pull request may close these issues.

Mixed precision: scheduler and optimizer are called in the wrong order
4 participants