
Fix GPU usage with default.qubit.torch for qml.qnn.TorchLayer #1705

Merged
26 commits merged into master from fix_torch_cuda on Oct 8, 2021

Conversation

@antalszava (Contributor) commented Sep 29, 2021

Context

In PennyLane v0.18.0, the default.qubit.torch device was added. This device is used under the hood by QNodes for backpropagation with Torch tensors. The QNode, however, has no logic for deciding what the torch_device keyword argument should be when creating a backprop device, and instantiates default.qubit.torch with its default arguments (torch_device='cpu'). That is, there is no opportunity to specify using the GPU.

Therefore, there may be issues when passing CUDA Torch tensors as QNode arguments (see #1688). What we would like in such cases is the equivalent of dev = qml.device('default.qubit.torch', wires=3, torch_device='cuda').
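
For concreteness, a minimal sketch of the failing pattern (the wire count, gate, and parameter values are illustrative only, and running it requires a CUDA-enabled Torch build):

    import torch
    import pennylane as qml

    # A QNode on default.qubit; with the Torch interface and backprop,
    # default.qubit.torch is created under the hood.
    dev = qml.device("default.qubit", wires=3)

    @qml.qnode(dev, interface="torch", diff_method="backprop")
    def circuit(x):
        qml.RX(x, wires=0)
        return qml.expval(qml.PauliZ(0))

    # Passing a CUDA tensor as the QNode argument triggers the issue,
    # since the internally created device defaults to torch_device='cpu'.
    x = torch.tensor(0.1, requires_grad=True, device="cuda")
    circuit(x)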

In addition to the above, the default.qubit.torch device uses the internal methods of DefaultQubit in many cases. These methods call the _asarray method. One detail of _asarray is that it doesn't take into account the Torch device being used, hence computations might shift from the GPU to the CPU.

Changes

  1. Changes the QNode to specify torch_device='cuda' internally when creating a dev = qml.device('default.qubit.torch', wires=3) device for backpropagation.
  2. Changes the _asarray method to always put the output on the Torch device specified internally (a rough sketch follows this list).
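
A rough sketch of what the second change amounts to, based on the discussion later in this thread (the real method also handles lists of tensors and other edge cases):

    import torch

    def _asarray(self, a, dtype=None):
        # Convert the input to a Torch tensor with the requested dtype...
        res = torch.as_tensor(a, dtype=dtype)
        # ...and place the result on the device's configured Torch device
        # (self._torch_device), so intermediate tensors created via the
        # DefaultQubit machinery don't silently fall back to the CPU.
        return torch.as_tensor(res, device=self._torch_device)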

Related issues

Closes #1688.

@github-actions (Contributor)

Hello. You may have forgotten to update the changelog!
Please edit doc/releases/changelog-dev.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@antalszava changed the title from "Fix GPU usage with default.qubit.torch" to "Fix GPU usage with default.qubit.torch for qml.qnn.TorchLayer" on Sep 29, 2021
@antalszava (Contributor Author)

[ch9374]

@codecov (bot) commented Sep 29, 2021

Codecov Report

Merging #1705 (da1a7e0) into master (e0dc0dc) will decrease coverage by 0.00%.
The diff coverage is 90.90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1705      +/-   ##
==========================================
- Coverage   99.22%   99.21%   -0.01%     
==========================================
  Files         204      204              
  Lines       15424    15429       +5     
==========================================
+ Hits        15304    15308       +4     
- Misses        120      121       +1     
Impacted Files Coverage Δ
pennylane/devices/default_qubit_torch.py 92.30% <90.90%> (-0.63%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0dc0dc...da1a7e0. Read the comment docs.

Comment on lines 188 to 189
phi = torch.tensor([0.011, 0.012], requires_grad=True, device='cuda')
theta = torch.tensor(0.05, requires_grad=True, device='cuda')
Member

@antalszava I don't think we should use device='cuda' explicitly in the documentation, since this will fail for users that don't have CUDA installed or GPU access. For example, if I run this snippet locally, I get:

AssertionError: Torch not compiled with CUDA enabled

Contributor Author

Good point! Made the change, since the text states

with the classical nodes processed on a GPU

and yet the output was not put on CUDA devices. Maybe having

    dev = "cuda" if torch.cuda.is_available() else "cpu"
    phi = torch.tensor([0.011, 0.012], requires_grad=True, device=dev)
    theta = torch.tensor(0.05, requires_grad=True, device=dev)

could help keep the example executable even in such cases.

Member

🤔 This is a case where I would probably suggest removing all references to the GPU in this page altogether. Users can refer to PyTorch docs for using GPUs, but we shouldn't have code examples in our docs that not all users can run

Comment on lines 181 to 186
if torch_device is None:
    self._torch_device = "cpu"
    self._user_def_torch_device = False
else:
    self._torch_device = torch_device
    self._user_def_torch_device = True
Member

A nice feature about the Torch device argument is that it also treats None as the default device :)

>>> x = torch.tensor(0.2, device=None)
>>> x.device
device(type='cpu')

So you can probably remove this branching here, and simply store

self._torch_device = torch_device

Contributor Author

Oh didn't know that, thank you! 🙂 One piece of information that's also being stored is self._user_def_torch_device. We'd like to know if the user has explicitly set a Torch device because the QNode creates a default.qubit.torch device under the hood. So maybe it's good to use the branching logic for that. 🤔
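
For illustration, a hedged sketch of how that flag could be consumed on the QNode side (not the exact code in this PR; any_op_uses_cuda refers to the check quoted in the next snippet):

    # Only override the Torch device of the internally created backprop
    # device when the user did not explicitly choose one themselves.
    if any_op_uses_cuda and not self.device._user_def_torch_device:
        self.device._torch_device = "cuda"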

Comment on lines 667 to 673
# Check if we should be using CUDA
ops_and_obs = self.qtape.operations + self.qtape.observables
any_op_uses_cuda = any(
    data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")
)

if any_op_uses_cuda and self.device._torch_device == "cpu":
Member

@antalszava I must admit, this feels like a check that shouldn't be in the QNode, but instead be part of the default.qubit.torch device.

Instead of passing torch_device=... to the device, could we not move this logic into the device itself? E.g.,

def apply(circuit,...):
    ops_and_obs = circuit.operations + circuit.observables
    any_op_uses_cuda = any(
        data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")
    )
    ...

This way:

  • The QNode avoids having interface and device specific logic
  • The Device simply chooses the torch device to use by checking what torch devices the received circuit uses, which feels much cleaner :)

Member

Essentially: the default.qubit.torch device should check, based on the input circuit, whether it is a GPU circuit. If it is, all tensors the device creates should also be on the same device.
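
As a rough sketch of that idea (the if-body and the super().execute call here are assumptions; only the loop over op.data mirrors the excerpts in this thread):

    def execute(self, circuit, **kwargs):
        # Inspect the tensors carried by the circuit's operations and
        # observables; if any of them live on a CUDA device, make the
        # simulator place its own tensors there too.
        ops_and_obs = circuit.operations + circuit.observables
        if any(
            data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")
        ):
            self._torch_device = "cuda"
        return super().execute(circuit, **kwargs)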

@antalszava (Contributor Author) commented Sep 30, 2021

@josh146 that's a fair idea; frankly, I had this logic in the execute method at first. The reason I ended up putting it in the QNode is that the need for a change came from the QNode itself. At the moment, the QNode "silently" creates a default.qubit.torch device with the default torch_device='cpu' option, while the circuit might be using the GPU. This only happens because the QNode swaps out default.qubit for a default.qubit.torch device. When the user explicitly instantiates a default.qubit.torch device, the responsibility is on them to choose the torch_device argument matching their circuit (GPU or CPU).

The QNode avoids having interface and device specific logic

We'll likely have to include some logic in the QNode sooner or later, because right now the QNode instantiates all backprop devices with default arguments. It feels like we're offloading the QNode's duty of picking the correct device options onto the devices, just to preserve the QNode's workflow of:

a) creating a device with default options, but then
b) having the device itself update those device options under the hood.

The Device simply chooses the torch device to use by checking what torch devices the received circuit uses, which feels much cleaner :)

While this sounds good to have, frankly I think we'd be compensating for the user's incorrect use of the device if we put logic like this in the device.

All in all, I'd argue for keeping the logic in the QNode, as that feels like the place where it belongs, simply because it's a QNode use case. Having said that, there are indeed merits to moving this to the device, so I'm keen to hear more on this (it's likely also a design decision that will apply to other cases in the future). 🙂

@mlxd (Member) left a comment

Nice one @antalszava
I can confirm the issue no longer appears for CUDA or CPU devices:

python ./test_cpu.py 
/home/mlxd/DELME/pyenv/lib/python3.9/site-packages/torch/autograd/__init__.py:147: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at  ../aten/src/ATen/native/Copy.cpp:240.)
  Variable._execution_engine.run_backward(
Average loss over epoch 1: 0.3317
Average loss over epoch 2: 0.2200
python ./test_gpu.py
[W Copy.cpp:240] Warning: Casting complex values to real discards the imaginary part (function operator())
Average loss over epoch 1: 0.3755
Average loss over epoch 2: 0.2203

Though I did notice the GPU version taking quite a long time compared to the CPU.

@albi3ro requested a review from @josh146 on October 7, 2021
@josh146 (Member) left a comment

Looks good on my end! Thanks everyone for fixing this so fast.

Only requirements from my side are:

  • Minor suggestions to the documentation
  • Fixing the tests, linting, and black 🙂

Comment on lines +206 to +211
>>> phi_final
tensor([0.7345, 0.0120], requires_grad=True)
>>> theta_final
tensor(0.8316, requires_grad=True)
>>> circuit4(phi_final, theta_final)
tensor(0.5000, dtype=torch.float64, grad_fn=<SqueezeBackward0>)
Member

👍

Comment on lines +248 to +249
>>> timeit.timeit("circuit_cuda(params)", globals=globals(), number=5)
2.297812332981266
Member

😍

@@ -176,17 +176,27 @@ def circuit(x):
_norm = staticmethod(torch.norm)
_flatten = staticmethod(torch.flatten)

-    def __init__(self, wires, *, shots=None, analytic=None, torch_device="cpu"):
-        self._torch_device = torch_device
+    def __init__(self, wires, *, shots=None, analytic=None, torch_device=None):
Member

Can the torch_device argument be removed now?

Contributor

oh yeah... probably.

def _asarray(a, dtype=None):
def execute(self, circuit, **kwargs):
    ops_and_obs = circuit.operations + circuit.observables
    if any(data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")):
Member

Nice solution!

Member

I guess only one downside is that for a large circuit on the CPU, this for loop might take a while? Maybe something we could consider for the Operator overhaul (e.g., op.is_cuda)
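
Purely a hypothetical sketch of that suggestion (op.is_cuda does not exist at this point; it assumes parameters are stored in self.data, as in the snippets above):

    class Operator:
        # Sketch of a possible addition to the existing Operator class.

        @property
        def is_cuda(self):
            # True if any of this operator's parameters is a CUDA tensor.
            return any(getattr(d, "is_cuda", False) for d in self.data)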

Contributor

Maybe we can make this check on the tape. The torch interface tape mixin could override _update_par_info and the tape itself could have a _torch_device attribute.

return res

_cast = _asarray
Member

How come this was changed?

Contributor Author

I haven't found any reason to distinguish _cast and _asarray and followed the example we have for default.qubit.jax, where they are identical. The reason why the bug arose in the first place was that _asarray was not doing what _cast was doing already: calling torch.as_tensor by also passing device=self._torch_device. Now this conversion has been added to the end of _asarray, so it seemed like we could just make _cast be the same as _asarray.

Contributor Author

DefaultQubit is basically calling _asarray under the hood, hence we needed to add torch.as_tensor + device=self._torch_device there (instead of simply using _cast).

Comment on lines 18 to 20
torch = pytest.importorskip("torch")
if not torch.cuda.is_available():
    pytest.skip("cuda not available")
Member

👍

Member

Oh wait, this line is throwing an error:

Using pytest.skip outside of a test is not allowed.
To decorate a test function, use the @pytest.mark.skip or @pytest.mark.skipif decorators instead,
and to skip a module use `pytestmark = pytest.mark.{skip,skipif}.

@antalszava mentioned this pull request on Oct 7, 2021
@josh146 merged commit 05b1987 into master on Oct 8, 2021
@josh146 deleted the fix_torch_cuda branch on October 8, 2021 06:14
Development

Successfully merging this pull request may close these issues.

[BUG] qml.qnn.TorchLayer + GPU stopped working in v0.18.0
4 participants