Fix GPU usage with default.qubit.torch for qml.qnn.TorchLayer #1705
Conversation
Hello. You may have forgotten to update the changelog!
Codecov Report
@@ Coverage Diff @@
## master #1705 +/- ##
==========================================
- Coverage 99.22% 99.21% -0.01%
==========================================
Files 204 204
Lines 15424 15429 +5
==========================================
+ Hits 15304 15308 +4
- Misses 120 121 +1
Continue to review full report at Codecov.
phi = torch.tensor([0.011, 0.012], requires_grad=True, device='cuda')
theta = torch.tensor(0.05, requires_grad=True, device='cuda')
@antalszava I don't think we should use device='cuda' explicitly in the documentation, since this will fail for users that don't have CUDA installed or GPU access. For example, if I run this snippet locally, I get:
AssertionError: Torch not compiled with CUDA enabled
Good point! Made the change, as the text states "with the classical nodes processed on a GPU" and the output was not put on CUDA devices. Maybe having

dev = "cuda" if torch.cuda.is_available() else "cpu"
phi = torch.tensor([0.011, 0.012], requires_grad=True, device=dev)
theta = torch.tensor(0.05, requires_grad=True, device=dev)

could help make the example executable even in such cases.
🤔 This is a case where I would probably suggest removing all references to the GPU in this page altogether. Users can refer to PyTorch docs for using GPUs, but we shouldn't have code examples in our docs that not all users can run
if torch_device is None:
    self._torch_device = "cpu"
    self._user_def_torch_device = False
else:
    self._torch_device = torch_device
    self._user_def_torch_device = True
A nice feature of the Torch device argument is that it also treats None as the default device :)

>>> x = torch.tensor(0.2, device=None)
>>> x.device
device(type='cpu')

So you can probably remove this branching here, and simply store

self._torch_device = torch_device
Oh, didn't know that, thank you! 🙂 One piece of information that's also being stored is self._user_def_torch_device. We'd like to know if the user has explicitly set a Torch device, because the QNode creates a default.qubit.torch device under the hood. So maybe it's good to use the branching logic for that. 🤔
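The simplification could still keep both pieces of information. A minimal sketch (the FakeDevice class below is a hypothetical stand-in, not the actual PennyLane device): the torch_device argument is stored directly, while a separate flag remembers whether the user set it explicitly.

```python
class FakeDevice:
    """Hypothetical stand-in for the device's __init__ logic under discussion."""

    def __init__(self, torch_device=None):
        # Remember whether the user explicitly chose a torch device.
        self._user_def_torch_device = torch_device is not None
        # torch already treats device=None as "the default device",
        # so no branching is needed for the stored value itself.
        self._torch_device = torch_device


dev = FakeDevice()
print(dev._user_def_torch_device, dev._torch_device)  # False None
```

This preserves the self._user_def_torch_device flag while dropping the if/else around self._torch_device.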
pennylane/qnode.py
Outdated
# Check if we should be using CUDA
ops_and_obs = self.qtape.operations + self.qtape.observables
any_op_uses_cuda = any(
    data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")
)

if any_op_uses_cuda and self.device._torch_device == "cpu":
@antalszava I must admit, this feels like a check that shouldn't be in the QNode, but should instead be part of the default.qubit.torch device.
Instead of passing torch_device=... to the Device, could we not move this logic to the device instead? E.g.,

def apply(circuit, ...):
    ops_and_obs = circuit.operations + circuit.observables
    any_op_uses_cuda = any(
        data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")
    )
    ...
This way:
- The QNode avoids having interface and device specific logic
- The Device simply chooses the torch device to use by checking what torch devices the received circuit uses, which feels much cleaner :)
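A self-contained sketch of this device-side detection (the Param, Op, and Circuit classes below are illustrative stand-ins for torch tensors and PennyLane circuits, not real library types):

```python
class Param:
    """Stand-in for a torch tensor parameter; real tensors expose .is_cuda."""

    def __init__(self, is_cuda):
        self.is_cuda = is_cuda


class Op:
    """Stand-in for an operation/observable carrying parameter data."""

    def __init__(self, data):
        self.data = data


class Circuit:
    """Stand-in for a quantum tape with operations and observables."""

    def __init__(self, operations, observables):
        self.operations = operations
        self.observables = observables


def choose_torch_device(circuit):
    # The device inspects the circuit it receives: if any parameter already
    # lives on the GPU, it selects CUDA, otherwise it stays on the CPU.
    ops_and_obs = circuit.operations + circuit.observables
    any_op_uses_cuda = any(
        data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")
    )
    return "cuda" if any_op_uses_cuda else "cpu"


circ = Circuit([Op([Param(False)]), Op([Param(True)])], [])
print(choose_torch_device(circ))  # cuda
```

With this, the QNode never needs to know which torch device the user's tensors are on.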
Essentially: the default.qubit.torch device should check, based on the input circuit, whether it is a GPU circuit. If it is, all tensors the device creates should also be on the same device.
@josh146 that's a fair idea; frankly, I had this logic in the execute method first. The reason why I ended up putting it in the QNode is that the need for a change came from the QNode itself. At the moment, the QNode "silently" creates a default.qubit.torch device with the default torch_device='cpu' option, while the circuit might be using the GPU. This only happens because the QNode swaps out default.qubit for a default.qubit.torch device. When the user explicitly instantiates a default.qubit.torch device, the responsibility is on them to choose the correct torch_device argument matching their circuit (GPU or CPU).

"The QNode avoids having interface and device specific logic"

We'll likely have to include some logic in the QNode sooner or later, because right now the QNode instantiates all backprop devices with default arguments. It feels like we're offloading the QNode's duty of getting the correct device options onto the devices, just to preserve the workflow of the QNode:
a) creating a device with default options, but then
b) having the device itself update those device options under the hood.

"The Device simply chooses the torch device to use by checking what torch devices the received circuit uses, which feels much cleaner :)"

While this sounds good to have, frankly I'd think we're making up for the user's poor use of the device if we have logic like this in the device.
All in all, I'd argue for keeping the logic in the QNode, as that feels like the place where it belongs, just because it's a QNode use case. Having said that, there are indeed merits to moving this to the device, so I'm keen to hear more on this (likely it's also a design decision that will apply to other cases in the future). 🙂
Nice one @antalszava
I can confirm the issue no longer appears for CUDA or CPU devices:
python ./test_cpu.py
/home/mlxd/DELME/pyenv/lib/python3.9/site-packages/torch/autograd/__init__.py:147: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:240.)
Variable._execution_engine.run_backward(
Average loss over epoch 1: 0.3317
Average loss over epoch 2: 0.2200
python ./test_gpu.py
[W Copy.cpp:240] Warning: Casting complex values to real discards the imaginary part (function operator())
Average loss over epoch 1: 0.3755
Average loss over epoch 2: 0.2203
Though I did notice the GPU version taking quite a long time compared to the CPU.
Looks good on my end! Thanks everyone for fixing this so fast.
Only requirements from my side are:
- Minor suggestions to the documentation
- Fixing the tests, linting, and black 🙂
>>> phi_final
tensor([0.7345, 0.0120], requires_grad=True)
>>> theta_final
tensor(0.8316, requires_grad=True)
>>> circuit4(phi_final, theta_final)
tensor(0.5000, dtype=torch.float64, grad_fn=<SqueezeBackward0>)
👍
>>> timeit.timeit("circuit_cuda(params)", globals=globals(), number=5)
2.297812332981266
😍
@@ -176,17 +176,27 @@ def circuit(x):
     _norm = staticmethod(torch.norm)
     _flatten = staticmethod(torch.flatten)

-    def __init__(self, wires, *, shots=None, analytic=None, torch_device="cpu"):
-        self._torch_device = torch_device
+    def __init__(self, wires, *, shots=None, analytic=None, torch_device=None):
Can the torch_device argument be removed now?
oh yeah... probably.
def _asarray(a, dtype=None):
def execute(self, circuit, **kwargs):
    ops_and_obs = circuit.operations + circuit.observables
    if any(data.is_cuda for op in ops_and_obs for data in op.data if hasattr(data, "is_cuda")):
Nice solution!
I guess the only downside is that for a large circuit on the CPU, this for loop might take a while? Maybe something we could consider for the Operator overhaul (e.g., op.is_cuda).
Maybe we can make this check on the tape. The torch interface tape mixin could override _update_par_info, and the tape itself could have a _torch_device attribute.
return res

_cast = _asarray
How come this was changed?
I haven't found any reason to distinguish _cast and _asarray, and followed the example we have for default.qubit.jax, where they are identical. The reason why the bug arose in the first place was that _asarray was not doing what _cast was doing already: calling torch.as_tensor and also passing device=self._torch_device. Now this conversion has been added to the end of _asarray, so it seemed like we could just make _cast be the same as _asarray.
DefaultQubit is basically calling _asarray under the hood, hence we needed to add torch.as_tensor + device=self._torch_device there (instead of simply using _cast).
tests/gpu/test_torch.py
Outdated
torch = pytest.importorskip("torch")
if not torch.cuda.is_available():
    pytest.skip("cuda not available")
👍
Oh wait, this line is throwing an error:
Using pytest.skip outside of a test is not allowed.
To decorate a test function, use the @pytest.mark.skip or @pytest.mark.skipif decorators instead,
and to skip a module use `pytestmark = pytest.mark.{skip,skipif}`.
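Following the error message's advice, the module-level skip could be rewritten as (a sketch, reusing the test module's existing importorskip pattern):

```python
import pytest

# Skip the whole module gracefully if torch itself is missing.
torch = pytest.importorskip("torch")

# Skip every test in this module when no CUDA device is available,
# instead of calling pytest.skip at import time (which pytest forbids).
pytestmark = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="cuda not available"
)
```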
Co-authored-by: Josh Izaac <josh146@gmail.com>
Context

In PennyLane v0.18.0, the default.qubit.torch device was added. This device is used under the hood by QNodes for backpropagation with Torch tensors.

1. The QNode, however, doesn't have any logic for checking what the torch_device keyword argument should be when creating a backprop device, and instantiates default.qubit.torch with default arguments (torch_device='cpu'). That is, there is no opportunity to specify using the GPU. Therefore, there may be issues with passing Torch tensors using CUDA as QNode arguments (see #1688). What we would like to have in such cases is equivalent to dev = qml.device('default.qubit.torch', wires=3, torch_device='cuda').

2. Further to 1., the default.qubit.torch device uses the internal methods of DefaultQubit in many cases. These methods use the _asarray method. One detail of _asarray is that it doesn't consider the Torch device that's used, hence computations might shift from the GPU to the CPU.

Changes

1. Use torch_device='cuda' internally when creating a dev = qml.device('default.qubit.torch', wires=3) device for backpropagation.
2. Change the _asarray method to always put the output on the Torch device specified internally.

Related issues
Closes #1688.