HPU & TPU doesn't support torch.inference_mode #13014

Merged
merged 9 commits into from May 9, 2022

@jerome-habana jerome-habana commented May 9, 2022

Signed-off-by: Jerome <janand@habana.ai>

What does this PR do?

#12715 introduced torch.inference_mode without verifying that every backend supports it. On Habana (HPU) backends this caused the following error:
"RuntimeError: Cannot set version_counter for inference tensor"

This PR disables inference_mode on the HPU backend and falls back to no_grad for the eval/predict stages. The check will be removed once the HPU implementation supports inference_mode.
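The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the names `UNSUPPORTED_BACKENDS`, `evaluation_context_name`, and `evaluation_context` are hypothetical, and the list of unsupported backends is an assumption taken from the PR title. Only `torch.no_grad` and `torch.inference_mode` are real PyTorch APIs.

```python
# Hypothetical sketch: choose a gradient-free context per backend.
# Assumption (from the PR title): HPU and TPU do not support inference_mode.
UNSUPPORTED_BACKENDS = frozenset({"hpu", "tpu"})


def evaluation_context_name(accelerator: str, has_inference_mode: bool = True) -> str:
    """Return the name of the torch context manager to use for eval/predict."""
    if not has_inference_mode or accelerator.lower() in UNSUPPORTED_BACKENDS:
        # Fall back to no_grad where inference tensors are not supported.
        return "no_grad"
    return "inference_mode"


def evaluation_context(accelerator: str):
    """Build the actual context manager; torch is imported lazily."""
    import torch

    name = evaluation_context_name(accelerator, hasattr(torch, "inference_mode"))
    # torch.no_grad() and torch.inference_mode() are both context managers.
    return getattr(torch, name)()
```

A caller would then wrap its evaluation loop in `with evaluation_context("hpu"): ...`, which resolves to `torch.no_grad()` under the assumptions above. Keeping the decision in a separate function makes it easy to delete the special case once a backend gains inference_mode support.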

Fixes #<issue_number>

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

jerome-habana and others added 3 commits May 9, 2022 07:31
Signed-off-by: Jerome <janand@habana.ai>
Signed-off-by: Jerome <janand@habana.ai>
@kaushikb11 kaushikb11 added the labels priority: 0 (High priority task), accelerator: hpu (external) (Habana Processing Unit), and bug (Something isn't working) May 9, 2022
@kaushikb11 kaushikb11 added this to the 1.6.x milestone May 9, 2022
@mergify mergify bot added the label ready (PRs ready to be merged) May 9, 2022
@kaushikb11
Contributor

@awaelchli @tchaton

pytorch_lightning/trainer/trainer.py (outdated, resolved)
docs/source/accelerators/hpu_basic.rst (outdated, resolved)
Signed-off-by: Jerome <janand@habana.ai>
pytorch_lightning/trainer/trainer.py (outdated, resolved)
pytorch_lightning/trainer/trainer.py (outdated, resolved)
@kaushikb11 kaushikb11 enabled auto-merge (squash) May 9, 2022 10:27
Member

@Borda Borda left a comment

go

@kaushikb11 kaushikb11 changed the title HPU doesn't support torch.inference_mode HPU & TPU doesn't support torch.inference_mode May 9, 2022
@lexierule lexierule disabled auto-merge May 9, 2022 14:27
@lexierule lexierule merged commit fb40cbc into Lightning-AI:master May 9, 2022
@carmocca carmocca modified the milestones: 1.6.x, 1.7 May 9, 2022
@jerome-habana jerome-habana deleted the hpu_infer branch May 10, 2022 02:52
Labels
accelerator: hpu (external), bug, priority: 0, ready
6 participants