
Fix Torch tensor locality with autoray-registered coerce method #5438

Merged
merged 5 commits from bugfix/cuda_tensor_sum into master on Mar 27, 2024

Conversation

@mlxd (Member) commented Mar 26, 2024

Before submitting

Please complete the following checklist when submitting a PR:

  • All new features must include a unit test.
    If you've fixed a bug or added code that should be tested, add a test to the
    test directory!

  • All new functions and code must be clearly commented and documented.
    If you do make documentation changes, make sure that the docs build and
    render correctly by running make docs.

  • Ensure that the test suite passes, by running make test.

  • Add a new entry to the doc/releases/changelog-dev.md file, summarizing the
    change, and including a link back to the PR.

  • The PennyLane source code conforms to
    PEP8 standards.
    We check all of our code against Pylint.
    To lint modified files, simply pip install pylint, and then
    run pylint pennylane/path/to/file.py.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


Context: When a Torch tensor has a GPU-backed data buffer, failures can occur when making autoray-dispatched calls to Torch methods with paired CPU data. In this case, with probabilities on the GPU and eigenvalues on the host (read from the observables), failures appeared with qml.dot, and can be reproduced from:

import pennylane as qml
import torch
import numpy as np

torch_device = "cuda"
dev = qml.device("default.qubit.torch", wires=2, torch_device=torch_device)
# Coefficients default to CPU tensors; the device's probabilities live on the GPU
ham = qml.Hamiltonian(torch.tensor([0.1, 0.2], requires_grad=True), [qml.PauliX(0), qml.PauliZ(1)])

@qml.qnode(dev, diff_method="backprop", interface="torch")
def circuit():
    qml.RX(np.zeros(5), 0)  # Broadcast the state by applying a broadcasted identity
    return qml.expval(ham)

res = circuit()  # Raised a CPU/GPU device-mismatch error before this fix
assert qml.math.allclose(res, 0.2)

This PR modifies the registered coerce method for Torch to automatically migrate mixed CPU-GPU data onto the associated GPU. In addition, the method now catches multi-GPU data, where tensors do not reside on the same device index, and fails outright in that case. As a longer-term solution, moving the Torch GPU dispatch calls earlier in the stack would be more sound, but this fixes the aforementioned issue, at the expense of always migrating from CPU to GPU. A sketch of the behaviour is given below.
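
To illustrate the idea, here is a minimal sketch of a coerce override of this kind, assuming autoray's register_function hook; the helper name _torch_coerce_sketch, its exact signature, and the error message are hypothetical and are not the code in this PR:

import torch
from autoray import register_function

def _torch_coerce_sketch(tensors):
    """Hypothetical coerce: migrate CPU inputs to a single GPU."""
    tensors = [torch.as_tensor(t) for t in tensors]
    # Collect the distinct CUDA devices used by the inputs
    gpu_devices = {t.device for t in tensors if t.is_cuda}
    if len(gpu_devices) > 1:
        # Tensors are spread across GPU indices: fail outright rather
        # than guessing which device should win
        raise RuntimeError(f"Tensors reside on multiple GPUs: {gpu_devices}")
    if gpu_devices:
        (gpu,) = gpu_devices
        # Auto-migrate host-resident tensors to the GPU holding the rest
        tensors = [t.to(gpu) for t in tensors]
    return tensors

# Override Torch's coerce dispatch with the sketch above
register_function("torch", "coerce", _torch_coerce_sketch)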

Description of the Change: As above.

Benefits: Allows automatic data migration from host to device when using a GPU-backed tensor. In addition, multi-GPU tensor data is now caught when using Torch, failing due to the non-local representations.

Possible Drawbacks: Auto-migration may not always be wanted. The alternative solution is to always be explicit about locality, and to move the eigenvalue data onto the device at a higher layer in the stack, as sketched below.
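
For comparison, the explicit-locality alternative would look roughly like the following at a call site; the data here is made up for illustration and is not code from this PR:

import torch

probs = torch.rand(4, device="cuda")            # device-resident probabilities
eigvals = torch.tensor([1.0, -1.0, -1.0, 1.0])  # host-resident eigenvalues
# Migrate explicitly at the point of use instead of relying on coerce
res = torch.dot(probs, eigvals.to(probs.device))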

Related GitHub Issues: #5269 introduced changes that resulted in GPU errors.

A contributor commented:

Hello. You may have forgotten to update the changelog!
Please edit doc/releases/changelog-dev.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.
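
For example, an entry for this PR might read (illustrative wording only):

* Fixed Torch tensor device mismatches by updating the autoray-registered coerce method to automatically migrate mixed CPU/GPU data onto the GPU. (#5438)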

@mlxd added the gpu label Mar 26, 2024
@mlxd changed the title from "Fix Torch tensor locality with scalar prods, H and sums" to "Fix Torch tensor locality with autoray-registered coerce method" Mar 26, 2024
@mlxd marked this pull request as ready for review March 26, 2024 18:58
@mlxd (Member, Author) commented Mar 26, 2024

[sc-59860]

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.67%. Comparing base (30f69b0) to head (c648974).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5438      +/-   ##
==========================================
- Coverage   99.68%   99.67%   -0.01%     
==========================================
  Files         402      402              
  Lines       37527    37246     -281     
==========================================
- Hits        37407    37125     -282     
- Misses        120      121       +1     


@Qottmann (Contributor) left a comment:

Thanks again so much @mlxd !

Should we add a little changelog entry for this improvement/bug fix? Otherwise it looks good, with everything green 👍

@mlxd (Member, Author) commented Mar 27, 2024

Thanks @Qottmann and @mudit2812. Will update the changelog and push this through.

@lillian542 (Contributor) left a comment:

Looks good to me!

@mlxd merged commit 1bb10be into master Mar 27, 2024
42 checks passed
@mlxd deleted the bugfix/cuda_tensor_sum branch March 27, 2024 15:52