
Change the default gradient method logic to prefer devices, and to allow backprop on child devices #1008

Merged (18 commits) on Jan 21, 2021

Conversation

josh146 (Member) commented Jan 18, 2021

Context:

When using simulators, backpropagation scales significantly better for computing gradients than the parameter-shift rule as the number of parameters increases:

[Figure: gradient computation time vs. number of parameters, comparing backprop and parameter-shift]

However, to use backprop you currently need to explicitly load default.qubit.tf or default.qubit.autograd, which leaves the ability to perform backprop somewhat 'hidden'.

Description of the Change:

The logic for choosing the 'best' differentiation method has been altered to improve performance.

  • If the device provides its own gradient, this is now the preferred differentiation method.

  • If a device provides child devices that natively support classical backpropagation, this is now preferred over the parameter-shift rule.

    Devices define child devices via their capabilities() dictionary. For example, default.qubit supports child devices for TensorFlow, Autograd, and Jax:

    {
      "passthru_devices": {
          "tf": "default.qubit.tf",
          "autograd": "default.qubit.autograd",
          "jax": "default.qubit.jax"
      },
    }
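To illustrate the lookup, here is a minimal sketch (a hypothetical helper, not the actual PennyLane implementation) of how a requested interface could be mapped to a backprop-capable child device via this dictionary:

```python
# Hypothetical sketch of mapping an interface to a passthru (child) device
# using the capabilities() dictionary shown above.

capabilities = {
    "passthru_devices": {
        "tf": "default.qubit.tf",
        "autograd": "default.qubit.autograd",
        "jax": "default.qubit.jax",
    },
}

def select_passthru_device(capabilities, interface):
    """Return the backprop-capable child device for this interface, or None."""
    return capabilities.get("passthru_devices", {}).get(interface)

print(select_passthru_device(capabilities, "tf"))     # default.qubit.tf
print(select_passthru_device(capabilities, "torch"))  # None (no passthru device)
```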

As a result of this change, if the QNode diff_method is not explicitly provided, the QNode may run on a child device of the device that was explicitly provided:

dev = qml.device("default.qubit", wires=2)
qml.QNode(dev) # will default to backprop on default.qubit.autograd
qml.QNode(dev, interface="tf") # will default to backprop on default.qubit.tf
qml.QNode(dev, interface="jax") # will default to backprop on default.qubit.jax

Benefits:

  • Users will see a speed up during optimization tasks, as backprop will now be the default rather than the parameter-shift if using default.qubit.

Possible Drawbacks:

  • For circuits executed on the CPU, there can be some overhead on the forward pass when using TensorFlow, since intermediate values of the computation are stored.

  • The QNode uses a different device than the one provided by the user (dev). As a result, calling dev.state() after executing the QNode will not give the expected results. However, you can still call qnode.device.state. Since we aim to deprecate the device methods from being user facing, I'm not too worried about this change.

  • Backprop is only supported if analytic=True. If this is not the case, the logic simply falls back to parameter-shift.
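The priority order and fallback described above can be sketched as follows (a hypothetical helper for illustration, not the actual QNode code):

```python
def choose_diff_method(provides_jacobian, analytic, passthru_device):
    """Hypothetical sketch of the new 'best' priority order:
    device gradient > backprop (analytic + passthru device) > parameter-shift."""
    if provides_jacobian:
        # the device provides its own gradient: now the preferred method
        return "device"
    if analytic and passthru_device is not None:
        # a child device natively supports classical backpropagation
        return "backprop"
    # otherwise fall back to the parameter-shift rule
    return "parameter-shift"

# analytic=False falls back to parameter-shift even if a passthru device exists
print(choose_diff_method(False, False, "default.qubit.autograd"))  # parameter-shift
print(choose_diff_method(False, True, "default.qubit.autograd"))   # backprop
```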

Related GitHub Issues: n/a

@josh146 josh146 added the review-ready 👌 PRs which are ready for review by someone from the core team. label Jan 18, 2021
josh146 (Member Author) commented Jan 18, 2021

It looks like there are quite a few tests failing, because they don't specify diff_method= when creating the QNode, and just assume a parameter-shift QNode is being created.

These should be easily fixed by adding diff_method="parameter-shift" and making the tests explicit.

josh146 (Member Author) commented Jan 18, 2021

Note: this PR is worth benchmarking, to ensure that it provides a better experience for users using the default settings, e.g.,

dev = qml.device("default.qubit", wires=2)
@qml.qnode(dev)

I imagine some demos might also need to be updated for this change 🤔

Comment on lines +148 to +150
self._tape, self.interface, diff_method, self.device = self.get_tape(
device, interface, diff_method
)
josh146 (Member Author):

Since the chosen tape type can now also change the device :)

@@ -92,6 +92,7 @@ class DefaultQubitAutograd(DefaultQubit):
"CRX": autograd_ops.CRX,
"CRY": autograd_ops.CRY,
"CRZ": autograd_ops.CRZ,
"CRot": autograd_ops.CRot,
josh146 (Member Author):

Noticed this was missing due to tests suddenly failing in the new backprop default!

codecov bot commented Jan 18, 2021

Codecov Report

Merging #1008 (5594a4f) into master (dde4f15) will increase coverage by 0.00%.
The diff coverage is 97.22%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1008   +/-   ##
=======================================
  Coverage   97.91%   97.92%           
=======================================
  Files         151      151           
  Lines       11201    11220   +19     
=======================================
+ Hits        10968    10987   +19     
  Misses        233      233           
Impacted Files Coverage Δ
pennylane/devices/default_qubit.py 100.00% <ø> (ø)
pennylane/tape/qnode.py 97.92% <97.14%> (-0.35%) ⬇️
pennylane/devices/default_qubit_autograd.py 98.21% <100.00%> (ø)
pennylane/devices/autograd_ops.py 100.00% <0.00%> (+2.50%) ⬆️


trbromley (Contributor) left a comment:

Thanks @josh146! Looks great, I had a few questions but would be happy to approve!

co9olguy (Member) commented:

It looks like there are quite a few tests failing, because they don't specify diff_method= when creating the QNode, and just assume a parameter-shift QNode is being created.

These should be easily fixed by adding diff_method="parameter-shift" and making the tests explicit.

Isn't this also an indication that there are some "rough edges" in the codebase itself (not just the test suite)? Shouldn't all standard use-cases work without caring about the specific diff method?

chaserileyroberts (Contributor) commented:

FYI: This change has awesome speedups for autograd, but seems to slow down the circuit evaluation a lot.

$ asv compare master default-gradient

All benchmarks:

       before           after         ratio
     [599e3186]       [b672829e]
     <master>         <default-gradient>
+      6.19±0.3ms       13.1±0.1ms     2.12  asv.core_suite.CircuitEvaluation.time_circuit(10, 3)
+      11.2±0.5ms       25.1±0.4ms     2.24  asv.core_suite.CircuitEvaluation.time_circuit(10, 6)
+      15.8±0.6ms       36.9±0.7ms     2.33  asv.core_suite.CircuitEvaluation.time_circuit(10, 9)
+     1.88±0.07ms       3.83±0.1ms     2.04  asv.core_suite.CircuitEvaluation.time_circuit(2, 3)
+      2.68±0.1ms       5.98±0.2ms     2.23  asv.core_suite.CircuitEvaluation.time_circuit(2, 6)
+      3.51±0.1ms       7.71±0.3ms     2.20  asv.core_suite.CircuitEvaluation.time_circuit(2, 9)
+      3.18±0.1ms       7.11±0.3ms     2.24  asv.core_suite.CircuitEvaluation.time_circuit(5, 3)
+      5.23±0.2ms      12.3±0.04ms     2.35  asv.core_suite.CircuitEvaluation.time_circuit(5, 6)
+      7.37±0.3ms       17.3±0.4ms     2.35  asv.core_suite.CircuitEvaluation.time_circuit(5, 9)
-      7.82±0.4ms       4.93±0.2ms     0.63  asv.core_suite.GradientComputation.time_gradient_autograd(2, 3)
-      21.6±0.7ms       8.09±0.3ms     0.37  asv.core_suite.GradientComputation.time_gradient_autograd(2, 6)
-        40.3±1ms       11.3±0.4ms     0.28  asv.core_suite.GradientComputation.time_gradient_autograd(5, 3)
-         142±2ms       19.4±0.8ms     0.14  asv.core_suite.GradientComputation.time_gradient_autograd(5, 6)
+      10.8±0.1ms       16.3±0.1ms     1.51  asv.core_suite.GradientComputation.time_gradient_tf(2, 3)
       27.0±0.3ms       29.4±0.3ms     1.09  asv.core_suite.GradientComputation.time_gradient_tf(2, 6)
       46.2±0.2ms         47.1±3ms     1.02  asv.core_suite.GradientComputation.time_gradient_tf(5, 3)
-         154±2ms       88.8±0.7ms     0.58  asv.core_suite.GradientComputation.time_gradient_tf(5, 6)
       8.01±0.3ms       8.11±0.2ms     1.01  asv.core_suite.GradientComputation.time_gradient_torch(2, 3)
       22.2±0.9ms       21.3±0.2ms     0.96  asv.core_suite.GradientComputation.time_gradient_torch(2, 6)
       39.3±0.6ms       39.4±0.1ms     1.00  asv.core_suite.GradientComputation.time_gradient_torch(5, 3)
        144±0.6ms          140±2ms     0.97  asv.core_suite.GradientComputation.time_gradient_torch(5, 6)

thisac (Contributor) left a comment:

Looks nice @josh146. Also curious about the things that @trbromley commented on. Only, I'm not sure exactly why the parameter-shift diff method needs to be set for some tests.

Also, I noticed that the docstrings for get_tape, and some of the following methods, need to be updated. The Returns: section should include the returned device, and the priority order for backprop and device should be switched in get_tape.

josh146 (Member Author) commented Jan 19, 2021

Isn't this also an indication that there are some "rough edges" in the codebase itself (not just the test suite)? Shouldn't all standard use-cases work without caring about the specific diff method?

@co9olguy: yes but also no!

  • In the majority of cases (~75), the test assumed parameter-shift without passing diff_method="parameter-shift". That is, the test was mocking out a low-level parameter-shift call, assuming an intermediate interface performing tensor unwrapping, or actually testing the parameter-shift gradient.

    In these cases, the tests should have been specifying diff_method="parameter-shift" anyway, so I added it in. This makes these tests more resilient to future changes of the default.

  • There were ~30 tests where this was not the case. These tests were high-level integration tests that should have worked independently of the differentiation method. In these cases, I tracked down the bugs in the source code and fixed them, so that the tests passed without needing to specify diff_method.

Considering we have almost 6000 test cases, the fact that only 100 failed after changing the default gradient method was very relieving to me 🙂 I thought more would fail.

FYI: This change has awesome speedups for autograd, but seems to slow down the circuit evaluation a lot.

@Thenerdstation: @mariaschuld noticed the same thing. Here are some more benchmarks (master on the left):

A couple of notes:

  • The forward pass evaluation is slower (~2x slower)
  • Computing gradients with Autograd is waaay faster! Almost 10x faster for larger circuits.
  • Computing gradients with TF is about 2-3x faster.

Just from intuition, I'm imagining that the slower forward pass is simply due to the overhead of storing intermediate statevectors in memory. I also imagine we have something like the following:

[Sketch: scaling of backprop vs. parameter-shift cost with number of parameters]

Edit: I labelled the axes backwards, please reverse the labels in your head :)

That is, the forward pass overhead dominates for fewer parameters, but the parameter-shift scaling dominates for more parameters.
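A toy cost model (illustrative numbers only, not measured) makes the crossover explicit: backprop pays a roughly constant, larger forward-pass overhead, while parameter-shift needs two extra circuit evaluations per trainable parameter.

```python
# Toy cost model for the crossover described above. The constants here are
# invented for illustration and are not benchmark results.

def backprop_cost(n_params, forward=2.0):
    # one slower forward pass plus a backward pass; roughly flat in n_params
    return 2 * forward

def param_shift_cost(n_params, forward=1.0):
    # one forward pass, then two shifted evaluations per trainable parameter
    return forward * (1 + 2 * n_params)

# With these toy constants, parameter-shift wins only for very few parameters
crossover = next(n for n in range(1, 100)
                 if param_shift_cost(n) > backprop_cost(n))
print(crossover)  # → 2
```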

Question: Is this a better default?

  • If the user is simply doing forward passes, then no. However, if they do not care about gradients, they can always specify interface=None, which removes all gradient overhead from the computation (this should even be faster than the current parameter-shift default).

  • If the user is doing optimization, then this default should be faster (autograd gradients are ~10x faster)

Also, noticed that the docstrings for get_tape , and some of the following methods, need to be updated. The Returns: should include the returned device and the priority order should be switched for backprop and device in get_tape.

Thanks for catching this @thisac! Have updated the docstrings in 74720de.

josh146 and others added 2 commits January 19, 2021 12:31
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
trbromley (Contributor) left a comment:

Thanks @josh146, I just want to have a quick go at testing this for some simple examples and then I can approve.

Comment on lines +154 to +156
# Store the differentiation method to be passed to JacobianTape.jacobian().
# Note that the tape accepts a different set of allowed methods than the QNode:
# best, analytic, numeric, device
Contributor:

No problem for this PR, but what is the reason for the difference in method names between the QNode and the Jacobian tape?

josh146 (Member Author):

It's purely historical, they've always deviated 🙁 I wish it were not the case, and wouldn't mind changing it.

@@ -249,21 +257,39 @@ def _validate_backprop_method(device, interface):
# determine if the device supports backpropagation
backprop_interface = device.capabilities().get("passthru_interface", None)

# determine if the device has any child devices that support backpropagation
backprop_devices = device.capabilities().get("passthru_devices", None)
Contributor:

Curious how this line wouldn't be covered by tests, when following ones are 🤔 Maybe it's a bug?
[Codecov screenshot highlighting the uncovered line]

josh146 (Member Author):

Yeah, this doesn't seem possible 🤔

trbromley (Contributor) left a comment:

Thanks @josh146 💯

Comment on lines +616 to +618
self._tape, interface, diff_method, self.device = self.get_tape(
self._original_device, "torch", self.diff_method
)
Contributor:

Ohh wait, I think I get these parts now! The idea is if we change the interface, it'll automatically check if it can change the device under the hood to ideally maintain backprop support?

Do we need to do this for torch?

josh146 (Member Author):

Yep, because this allows us to go backward 🙂

E.g., a QNode running on default.qubit.tf with backprop can be converted to a QNode running on default.qubit with parameter-shift!

Even better, if the user explicitly asked for backprop, the QNode remembers that preference (since we save it as self.diff_method), and calling qnode.to_torch() will raise an error since backprop is not supported. If self.diff_method was best, it will work with no issue.
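That behaviour can be sketched as follows (function and variable names are invented for illustration; this is not the actual validation code): an explicit diff_method="backprop" request fails when the target interface has no passthru device, while "best" silently falls back to parameter-shift.

```python
# Hypothetical sketch of diff_method resolution when the interface changes.

PASSTHRU_DEVICES = {
    "tf": "default.qubit.tf",
    "autograd": "default.qubit.autograd",
    "jax": "default.qubit.jax",
}

def diff_method_on_interface_change(requested, new_interface):
    """Resolve the diff method after switching to ``new_interface``."""
    supported = new_interface in PASSTHRU_DEVICES
    if requested == "backprop" and not supported:
        # the user explicitly asked for backprop: raise rather than fall back
        raise ValueError(f"backprop is not supported on the {new_interface} interface")
    if requested in ("best", "backprop") and supported:
        return "backprop"
    return "parameter-shift"

print(diff_method_on_interface_change("best", "torch"))  # parameter-shift
print(diff_method_on_interface_change("best", "tf"))     # backprop
```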
