
Fix Torch tensor locality with autoray-registered coerce method #5438

Merged
merged 5 commits from bugfix/cuda_tensor_sum into master on Mar 27, 2024

Conversation

@mlxd (Member) commented Mar 26, 2024

Before submitting

Please complete the following checklist when submitting a PR:

  • All new features must include a unit test.
    If you've fixed a bug or added code that should be tested, add a test to the
    test directory!

  • All new functions and code must be clearly commented and documented.
    If you do make documentation changes, make sure that the docs build and
    render correctly by running make docs.

  • Ensure that the test suite passes, by running make test.

  • Add a new entry to the doc/releases/changelog-dev.md file, summarizing the
    change, and including a link back to the PR.

  • The PennyLane source code conforms to
    PEP8 standards.
    We check all of our code against Pylint.
    To lint modified files, simply pip install pylint, and then
    run pylint pennylane/path/to/file.py.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


Context: When a Torch tensor has a GPU-backed data buffer, failures can occur when making autoray-dispatched calls to Torch methods with paired CPU data. In this case, with probabilities on the GPU and eigenvalues on the host (read from the observables), failures appeared with qml.dot, and can be reproduced from:

import pennylane as qml
import torch
import numpy as np

torch_device = "cuda"
dev = qml.device("default.qubit.torch", wires=2, torch_device=torch_device)
# Coefficients default to CPU tensors; the device's probabilities live on the GPU
ham = qml.Hamiltonian(torch.tensor([0.1, 0.2], requires_grad=True), [qml.PauliX(0), qml.PauliZ(1)])

@qml.qnode(dev, diff_method="backprop", interface="torch")
def circuit():
    qml.RX(np.zeros(5), 0)  # Broadcast the state by applying a broadcasted identity
    return qml.expval(ham)

res = circuit()  # Raised a CPU/GPU device-mismatch error before this fix
assert qml.math.allclose(res, 0.2)

This PR modifies the registered coerce method for Torch to automatically migrate mixed CPU-GPU data onto the associated GPU. In addition, the method now catches multi-GPU data, where tensors do not reside on the same device index, and fails outright in that case. As a longer-term solution, moving the Torch GPU dispatch calls earlier in the stack would be more sound, but this fixes the aforementioned issue, at the expense of always migrating from CPU to GPU. A sketch of the behaviour is given below.
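
To illustrate the idea, here is a minimal sketch of a coerce override of this kind, assuming autoray's register_function hook; the helper name _torch_coerce_sketch, its exact signature, and the error message are hypothetical and are not the code in this PR:

import torch
from autoray import register_function

def _torch_coerce_sketch(tensors):
    """Hypothetical coerce: migrate CPU inputs to a single GPU."""
    tensors = [torch.as_tensor(t) for t in tensors]
    # Collect the distinct CUDA devices used by the inputs
    gpu_devices = {t.device for t in tensors if t.is_cuda}
    if len(gpu_devices) > 1:
        # Tensors are spread across GPU indices: fail outright rather
        # than guessing which device should win
        raise RuntimeError(f"Tensors reside on multiple GPUs: {gpu_devices}")
    if gpu_devices:
        (gpu,) = gpu_devices
        # Auto-migrate host-resident tensors to the GPU holding the rest
        tensors = [t.to(gpu) for t in tensors]
    return tensors

# Override Torch's coerce dispatch with the sketch above
register_function("torch", "coerce", _torch_coerce_sketch)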

Description of the Change: As above.

Benefits: Allows automatic data migration from host to device when using a GPU-backed tensor. In addition, multi-GPU tensor data is now caught when using Torch, failing due to the non-local representations.

Possible Drawbacks: Auto-migration may not always be wanted. The alternative solution is to always be explicit about locality, and to move the eigenvalue data onto the device at a higher layer in the stack, as sketched below.
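
For comparison, the explicit-locality alternative would look roughly like the following at a call site; the data here is made up for illustration and is not code from this PR:

import torch

probs = torch.rand(4, device="cuda")            # device-resident probabilities
eigvals = torch.tensor([1.0, -1.0, -1.0, 1.0])  # host-resident eigenvalues
# Migrate explicitly at the point of use instead of relying on coerce
res = torch.dot(probs, eigvals.to(probs.device))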

Related GitHub Issues: #5269 introduced changes that resulted in GPU errors.

A contributor commented:

Hello. You may have forgotten to update the changelog!
Please edit doc/releases/changelog-dev.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.
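
For example, an entry for this PR might read (illustrative wording only):

* Fixed Torch tensor device mismatches by updating the autoray-registered coerce method to automatically migrate mixed CPU/GPU data onto the GPU. (#5438)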

@mlxd added the gpu label Mar 26, 2024
@mlxd changed the title from "Fix Torch tensor locality with scalar prods, H and sums" to "Fix Torch tensor locality with autoray-registered coerce method" Mar 26, 2024
@mlxd marked this pull request as ready for review March 26, 2024 18:58
@mlxd (Member, Author) commented Mar 26, 2024

[sc-59860]

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.67%. Comparing base (30f69b0) to head (c648974).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5438      +/-   ##
==========================================
- Coverage   99.68%   99.67%   -0.01%     
==========================================
  Files         402      402              
  Lines       37527    37246     -281     
==========================================
- Hits        37407    37125     -282     
- Misses        120      121       +1     


@Qottmann (Contributor) left a comment:

Thanks again so much @mlxd !

Should we add a little changelog entry for this improvement/bug fix? Otherwise it looks good, with everything green 👍

@mlxd (Member, Author) commented Mar 27, 2024

Thanks @Qottmann and @mudit2812. Will update the changelog and push this through.

@lillian542 (Contributor) left a comment:

Looks good to me!

@mlxd merged commit 1bb10be into master Mar 27, 2024
42 checks passed
@mlxd deleted the bugfix/cuda_tensor_sum branch March 27, 2024 15:52