Fix Torch tensor locality with autoray-registered `coerce` method #5438
Hello. You may have forgotten to update the changelog!
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files: coverage diff

| Metric | master | #5438 | +/- |
|---|---|---|---|
| Coverage | 99.68% | 99.67% | -0.01% |
| Files | 402 | 402 | |
| Lines | 37527 | 37246 | -281 |
| Hits | 37407 | 37125 | -282 |
| Misses | 120 | 121 | +1 |
Thanks again so much @mlxd !
Should we add a little changelog entry for this improvement/bug fix? Else looks good with everything green 👍
Thanks @Qottmann and @mudit2812. Will update the changelog and push this through.
Looks good to me!
Before submitting

Please complete the following checklist when submitting a PR:

- All new features must include a unit test.
  If you've fixed a bug or added code that should be tested, add a test to the test directory!
- All new functions and code must be clearly commented and documented.
  If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`.
- Ensure that the test suite passes, by running `make test`.
- Add a new entry to the `doc/releases/changelog-dev.md` file, summarizing the change, and including a link back to the PR.
- The PennyLane source code conforms to PEP8 standards.
  We check all of our code against Pylint.
  To lint modified files, simply `pip install pylint`, and then run `pylint pennylane/path/to/file.py`.

When all the above are checked, delete everything above the dashed line and fill in the pull request template.
Context: When Torch has a GPU-backed data buffer, failures can occur when making autoray-dispatched calls to Torch methods with paired CPU data. In this case, with probabilities on the GPU and eigenvalues on the host (read from the observables), failures appeared with `qml.dot`, and can be reproduced from:

This PR modifies the registered `coerce` method for Torch to automatically migrate mixed CPU-GPU data to the associated GPU. In addition, the method now detects multi-GPU data, where tensors do not reside on the same device index, and fails outright. As a longer-term solution, moving the Torch GPU dispatch calls earlier in the stack would be more sound, but this change fixes the aforementioned issue at the expense of always migrating from CPU to GPU.

Description of the Change: As above.
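The device-selection rule described above can be sketched in plain Python. This is only an illustration, not the code merged in this PR: the function name `resolve_target_device` is hypothetical, and plain strings such as `"cpu"` and `"cuda:0"` stand in for the `.device` of each Torch tensor so that the sketch runs without Torch or a GPU.

```python
from typing import Iterable, Optional


def resolve_target_device(devices: Iterable[str]) -> Optional[str]:
    """Pick the device that mixed operands should be migrated to.

    `devices` holds plain strings such as "cpu" or "cuda:0"; these are
    hypothetical stand-ins for the device of each tensor operand.
    Returns None when all operands already share one device.
    """
    unique = set(devices)
    gpus = sorted(d for d in unique if d != "cpu")

    # Data spread over more than one GPU is treated as an error:
    # there is no single "associated GPU" to favour.
    if len(gpus) > 1:
        raise RuntimeError(f"Operands reside on multiple GPUs: {gpus}")

    # Mixed CPU-GPU data: always favour the GPU.
    if gpus and "cpu" in unique:
        return gpus[0]

    # Everything is already co-located; nothing needs to move.
    return None


print(resolve_target_device(["cuda:0", "cpu"]))  # mixed data resolves to the GPU
```

With real tensors, a coerce step following this rule would move each operand to the returned device (for example with `tensor.to(device)`) before handing them to the dispatched Torch call. The cost noted in the description is visible here: any CPU operand paired with a GPU operand is always migrated.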
Benefits: Allows automatic data migration from host to device when using a GPU-backed tensor. In addition, multi-GPU tensor data is now caught when using Torch, and the call fails because the tensors are not co-located on one device.
Possible Drawbacks: Automatic migration may not always be wanted. The alternative is to always be explicit about locality and move the eigenvalue data onto the device at a higher layer in the stack.
Related GitHub Issues: #5269 introduced changes that resulted in GPU errors.