HPU & TPU doesn't support torch.inference_mode #13014

jerome-habana · 2022-05-09T04:39:07Z

What does this PR do?

#12715 introduced torch.inference_mode without checking all backends. This resulted in the following error on Habana backends:
"RuntimeError: Cannot set version_counter for inference tensor"

This PR disables inference_mode and uses no_grad mode for the eval/predict stages with the HPU backend. Once the HPU implementation adds support for inference_mode, this check will be removed.
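The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the names `UNSUPPORTED_ACCELERATORS`, `use_inference_mode`, and `evaluation_context` are hypothetical, and the accelerator is assumed to be identified by a plain string rather than by Lightning's accelerator classes.

```python
from contextlib import contextmanager
from typing import Iterator

# Hypothetical set of accelerator names assumed to lack torch.inference_mode
# support (taken from the PR title; not Lightning's actual data structure).
UNSUPPORTED_ACCELERATORS = frozenset({"hpu", "tpu"})


def use_inference_mode(accelerator: str) -> bool:
    """Return True when torch.inference_mode is assumed safe for this accelerator."""
    return accelerator.lower() not in UNSUPPORTED_ACCELERATORS


@contextmanager
def evaluation_context(accelerator: str) -> Iterator[None]:
    """Disable gradient tracking, falling back to torch.no_grad on HPU and TPU."""
    # Imported lazily so the decision helper above has no torch dependency.
    import torch

    context_cls = torch.inference_mode if use_inference_mode(accelerator) else torch.no_grad
    with context_cls():
        yield
```

Under these assumptions, an eval or predict loop would wrap its forward passes in `with evaluation_context("hpu"):`, which resolves to `torch.no_grad()`; any other accelerator name resolves to `torch.inference_mode()`.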

Fixes #<issue_number>

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

kaushikb11 · 2022-05-09T07:56:03Z

@awaelchli @tchaton

Borda

go

jerome-habana and others added 3 commits May 9, 2022 07:31

HPU doesn't support torch.inference_mode

673223c

Signed-off-by: Jerome <janand@habana.ai>

[pre-commit.ci] auto fixes from pre-commit.com hooks

23f9315

for more information, see https://pre-commit.ci

Update doc and changelog

f9d4621

Signed-off-by: Jerome <janand@habana.ai>

jerome-habana marked this pull request as ready for review May 9, 2022 04:49

jerome-habana requested review from williamFalcon, tchaton, awaelchli, edenlightning, Borda, SeanNaren, carmocca, justusschock, kaushikb11 and rohitgr7 as code owners May 9, 2022 04:49

kaushikb11 reviewed May 9, 2022

pytorch_lightning/trainer/trainer.py (outdated, resolved)

Update pytorch_lightning/trainer/trainer.py

03248c9

kaushikb11 added the labels "priority: 0" (High priority task), "accelerator: hpu (external)" (Habana Processing Unit), and "bug" (Something isn't working) May 9, 2022

kaushikb11 added this to the 1.6.x milestone May 9, 2022

kaushikb11 reviewed May 9, 2022

pytorch_lightning/trainer/trainer.py (outdated, resolved)

Revert back to HPU available

1f05e59

kaushikb11 approved these changes May 9, 2022

Borda approved these changes May 9, 2022

mergify bot added the "ready" label (PRs ready to be merged) May 9, 2022

rohitgr7 reviewed May 9, 2022

pytorch_lightning/trainer/trainer.py (outdated, resolved)

docs/source/accelerators/hpu_basic.rst (outdated, resolved)

Address reviews

1fb7964

Signed-off-by: Jerome <janand@habana.ai>

jerome-habana requested a review from rohitgr7 May 9, 2022 09:00

rohitgr7 approved these changes May 9, 2022

pytorch_lightning/trainer/trainer.py (outdated, resolved)

pytorch_lightning/trainer/trainer.py (outdated, resolved)

rohitgr7 added 2 commits May 9, 2022 14:56

Update pytorch_lightning/trainer/trainer.py

843f4e6

Update pytorch_lightning/trainer/trainer.py

9fdbb50

kaushikb11 enabled auto-merge (squash) May 9, 2022 10:27

Borda reviewed May 9, 2022

Add TPU accelerator condition

36fdef1

kaushikb11 changed the title from "HPU doesn't support torch.inference_mode" to "HPU & TPU doesn't support torch.inference_mode" May 9, 2022

lexierule disabled auto-merge May 9, 2022 14:27

lexierule merged commit fb40cbc into Lightning-AI:master May 9, 2022

carmocca modified the milestones: 1.6.x, 1.7 May 9, 2022

kaushikb11 mentioned this pull request May 10, 2022

Patch release v1.6.4 #13019

Merged

jerome-habana deleted the hpu_infer branch May 10, 2022 02:52
