
Change the default gradient method logic to prefer devices, and to allow backprop on child devices #1008

Merged (18 commits) on Jan 21, 2021

Conversation

josh146 (Member) commented Jan 18, 2021

Context:

When using simulators, backpropagation scales significantly better for computing gradients than the parameter-shift rule as the number of parameters increases:

[Figure: gradient computation time vs. number of parameters, comparing backprop and parameter-shift]

However, to use backprop you currently need to explicitly load default.qubit.tf or default.qubit.autograd, which leaves the ability to perform backprop somewhat 'hidden'.

Description of the Change:

The logic for choosing the 'best' differentiation method has been altered to improve performance.

  • If the device provides its own gradient, this is now the preferred differentiation method.

  • If a device provides child devices that natively support classical backpropagation, this is now preferred over the parameter-shift rule.

    Devices define child devices via their capabilities() dictionary. For example, default.qubit supports child devices for TensorFlow, Autograd, and Jax:

    {
      "passthru_devices": {
          "tf": "default.qubit.tf",
          "autograd": "default.qubit.autograd",
          "jax": "default.qubit.jax"
      },
    }
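To illustrate the lookup, here is a minimal sketch (a hypothetical helper, not the actual PennyLane implementation) of how a requested interface could be mapped to a backprop-capable child device via this dictionary:

```python
# Hypothetical sketch of mapping an interface to a passthru (child) device
# using the capabilities() dictionary shown above.

capabilities = {
    "passthru_devices": {
        "tf": "default.qubit.tf",
        "autograd": "default.qubit.autograd",
        "jax": "default.qubit.jax",
    },
}

def select_passthru_device(capabilities, interface):
    """Return the backprop-capable child device for this interface, or None."""
    return capabilities.get("passthru_devices", {}).get(interface)

print(select_passthru_device(capabilities, "tf"))     # default.qubit.tf
print(select_passthru_device(capabilities, "torch"))  # None (no passthru device)
```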

As a result of this change, if the QNode diff_method is not explicitly provided, the QNode may run on a child device of the device that was explicitly provided:

dev = qml.device("default.qubit", wires=2)
qml.QNode(dev) # will default to backprop on default.qubit.autograd
qml.QNode(dev, interface="tf") # will default to backprop on default.qubit.tf
qml.QNode(dev, interface="jax") # will default to backprop on default.qubit.jax

Benefits:

  • Users will see a speed up during optimization tasks, as backprop will now be the default rather than the parameter-shift if using default.qubit.

Possible Drawbacks:

  • For circuits executed on the CPU, there can be some overhead on the forward pass when using TensorFlow, since intermediate values of the computation are stored.

  • The QNode uses a different device than the one provided by the user (dev). As a result, calling dev.state() after executing the QNode will not give the expected results. However, you can still call qnode.device.state. Since we aim to deprecate the device methods from being user facing, I'm not too worried about this change.

  • Backprop is only supported if analytic=True. If this is not the case, the logic simply falls back to parameter-shift.
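The priority order and fallback described above can be sketched as follows (a hypothetical helper for illustration, not the actual QNode code):

```python
def choose_diff_method(provides_jacobian, analytic, passthru_device):
    """Hypothetical sketch of the new 'best' priority order:
    device gradient > backprop (analytic + passthru device) > parameter-shift."""
    if provides_jacobian:
        # the device provides its own gradient: now the preferred method
        return "device"
    if analytic and passthru_device is not None:
        # a child device natively supports classical backpropagation
        return "backprop"
    # otherwise fall back to the parameter-shift rule
    return "parameter-shift"

# analytic=False falls back to parameter-shift even if a passthru device exists
print(choose_diff_method(False, False, "default.qubit.autograd"))  # parameter-shift
print(choose_diff_method(False, True, "default.qubit.autograd"))   # backprop
```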

Related GitHub Issues: n/a

@josh146 josh146 added the review-ready 👌 PRs which are ready for review by someone from the core team. label Jan 18, 2021
josh146 (Member Author) commented Jan 18, 2021

It looks like there are quite a few tests failing, because they don't specify diff_method= when creating the QNode, and just assume a parameter-shift QNode is being created.

These should be easily fixed by adding diff_method="parameter-shift" and making the tests explicit.

josh146 (Member Author) commented Jan 18, 2021

Note: this PR is worth benchmarking, to ensure that it provides a better experience for users using the default settings, e.g.,

dev = qml.device("default.qubit", wires=2)
@qml.qnode(dev)

I imagine some demos might also need to be updated for this change 🤔

Comment on lines +148 to +150
self._tape, self.interface, diff_method, self.device = self.get_tape(
device, interface, diff_method
)
josh146 (Member Author):

Since the chosen tape type can now also change the device :)

@@ -92,6 +92,7 @@ class DefaultQubitAutograd(DefaultQubit):
"CRX": autograd_ops.CRX,
"CRY": autograd_ops.CRY,
"CRZ": autograd_ops.CRZ,
"CRot": autograd_ops.CRot,
josh146 (Member Author):

Noticed this was missing due to tests suddenly failing in the new backprop default!

codecov bot commented Jan 18, 2021

Codecov Report

Merging #1008 (5594a4f) into master (dde4f15) will increase coverage by 0.00%.
The diff coverage is 97.22%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1008   +/-   ##
=======================================
  Coverage   97.91%   97.92%           
=======================================
  Files         151      151           
  Lines       11201    11220   +19     
=======================================
+ Hits        10968    10987   +19     
  Misses        233      233           
Impacted Files Coverage Δ
pennylane/devices/default_qubit.py 100.00% <ø> (ø)
pennylane/tape/qnode.py 97.92% <97.14%> (-0.35%) ⬇️
pennylane/devices/default_qubit_autograd.py 98.21% <100.00%> (ø)
pennylane/devices/autograd_ops.py 100.00% <0.00%> (+2.50%) ⬆️


trbromley (Contributor) left a comment:

Thanks @josh146! Looks great, I had a few questions but would be happy to approve!

co9olguy (Member) commented:

It looks like there are quite a few tests failing, because they don't specify diff_method= when creating the QNode, and just assume a parameter-shift QNode is being created.

These should be easily fixed by adding diff_method="parameter-shift" and making the tests explicit.

Isn't this also an indication that there are some "rough edges" in the codebase itself (not just the test suite)? Shouldn't all standard use-cases work without caring about the specific diff method?

chaserileyroberts (Contributor) commented:

FYI: This change has awesome speedups for autograd, but seems to slow down the circuit evaluation a lot.

$ asv compare master default-gradient

All benchmarks:

       before           after         ratio
     [599e3186]       [b672829e]
     <master>         <default-gradient>
+      6.19±0.3ms       13.1±0.1ms     2.12  asv.core_suite.CircuitEvaluation.time_circuit(10, 3)
+      11.2±0.5ms       25.1±0.4ms     2.24  asv.core_suite.CircuitEvaluation.time_circuit(10, 6)
+      15.8±0.6ms       36.9±0.7ms     2.33  asv.core_suite.CircuitEvaluation.time_circuit(10, 9)
+     1.88±0.07ms       3.83±0.1ms     2.04  asv.core_suite.CircuitEvaluation.time_circuit(2, 3)
+      2.68±0.1ms       5.98±0.2ms     2.23  asv.core_suite.CircuitEvaluation.time_circuit(2, 6)
+      3.51±0.1ms       7.71±0.3ms     2.20  asv.core_suite.CircuitEvaluation.time_circuit(2, 9)
+      3.18±0.1ms       7.11±0.3ms     2.24  asv.core_suite.CircuitEvaluation.time_circuit(5, 3)
+      5.23±0.2ms      12.3±0.04ms     2.35  asv.core_suite.CircuitEvaluation.time_circuit(5, 6)
+      7.37±0.3ms       17.3±0.4ms     2.35  asv.core_suite.CircuitEvaluation.time_circuit(5, 9)
-      7.82±0.4ms       4.93±0.2ms     0.63  asv.core_suite.GradientComputation.time_gradient_autograd(2, 3)
-      21.6±0.7ms       8.09±0.3ms     0.37  asv.core_suite.GradientComputation.time_gradient_autograd(2, 6)
-        40.3±1ms       11.3±0.4ms     0.28  asv.core_suite.GradientComputation.time_gradient_autograd(5, 3)
-         142±2ms       19.4±0.8ms     0.14  asv.core_suite.GradientComputation.time_gradient_autograd(5, 6)
+      10.8±0.1ms       16.3±0.1ms     1.51  asv.core_suite.GradientComputation.time_gradient_tf(2, 3)
       27.0±0.3ms       29.4±0.3ms     1.09  asv.core_suite.GradientComputation.time_gradient_tf(2, 6)
       46.2±0.2ms         47.1±3ms     1.02  asv.core_suite.GradientComputation.time_gradient_tf(5, 3)
-         154±2ms       88.8±0.7ms     0.58  asv.core_suite.GradientComputation.time_gradient_tf(5, 6)
       8.01±0.3ms       8.11±0.2ms     1.01  asv.core_suite.GradientComputation.time_gradient_torch(2, 3)
       22.2±0.9ms       21.3±0.2ms     0.96  asv.core_suite.GradientComputation.time_gradient_torch(2, 6)
       39.3±0.6ms       39.4±0.1ms     1.00  asv.core_suite.GradientComputation.time_gradient_torch(5, 3)
        144±0.6ms          140±2ms     0.97  asv.core_suite.GradientComputation.time_gradient_torch(5, 6)

thisac (Contributor) left a comment:

Looks nice @josh146. Also curious about the things that @trbromley commented on. Only, I'm not sure exactly why the parameter-shift diff method needs to be set for some tests.

Also, I noticed that the docstrings for get_tape, and some of the following methods, need to be updated. The Returns: section should include the returned device, and the priority order for backprop and device should be switched in get_tape.

josh146 (Member Author) commented Jan 19, 2021

Isn't this also an indication that there are some "rough edges" in the codebase itself (not just the test suite)? Shouldn't all standard use-cases work without caring about the specific diff method?

@co9olguy: yes but also no!

  • In the majority of cases (~75), the test assumed parameter-shift without passing diff_method="parameter-shift". That is, the test was mocking out a low-level parameter-shift call, assuming an intermediate interface performing tensor unwrapping, or actually testing the parameter-shift gradient.

    In these cases, the tests should have been specifying diff_method="parameter-shift" anyway, so I added it in. This makes these tests more resilient to future changes of the default.

  • There were ~30 tests where this was not the case. These tests were high-level integration tests that should have worked independently of the differentiation method. In these cases, I tracked down the bugs in the source code and fixed them, so that the tests passed without needing to specify diff_method.

Considering we have almost 6000 test cases, the fact that only 100 failed after changing the default gradient method was very relieving to me 🙂 I thought more would fail.

FYI: This change has awesome speedups for autograd, but seems to slow down the circuit evaluation a lot.

@Thenerdstation: @mariaschuld noticed the same thing. Here are some more benchmarks (master on the left):

A couple of notes:

  • The forward pass evaluation is slower (~2x slower)
  • Computing gradients with Autograd is waaay faster! Almost 10x faster for larger circuits.
  • Computing gradients with TF is about 2-3x faster.

Just from intuition, I'm imagining that the slower forward pass is simply due to the overhead of storing intermediate statevectors in memory. I also imagine we have something like the following:

[Sketch: scaling of backprop vs. parameter-shift cost with number of parameters]

Edit: I labelled the axes backwards, please reverse the labels in your head :)

That is, the forward pass overhead dominates for fewer parameters, but the parameter-shift scaling dominates for more parameters.
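A toy cost model (illustrative numbers only, not measured) makes the crossover explicit: backprop pays a roughly constant, larger forward-pass overhead, while parameter-shift needs two extra circuit evaluations per trainable parameter.

```python
# Toy cost model for the crossover described above. The constants here are
# invented for illustration and are not benchmark results.

def backprop_cost(n_params, forward=2.0):
    # one slower forward pass plus a backward pass; roughly flat in n_params
    return 2 * forward

def param_shift_cost(n_params, forward=1.0):
    # one forward pass, then two shifted evaluations per trainable parameter
    return forward * (1 + 2 * n_params)

# With these toy constants, parameter-shift wins only for very few parameters
crossover = next(n for n in range(1, 100)
                 if param_shift_cost(n) > backprop_cost(n))
print(crossover)  # → 2
```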

Question: Is this a better default?

  • If the user is simply doing forward passes, then no. However, if they do not care about gradients, they can always specify interface=None, which removes all gradient overhead from the computation (this should even be faster than the current parameter-shift default).

  • If the user is doing optimization, then this default should be faster (autograd gradients are ~10x faster)

Also, noticed that the docstrings for get_tape , and some of the following methods, need to be updated. The Returns: should include the returned device and the priority order should be switched for backprop and device in get_tape.

Thanks for catching this @thisac! Have updated the docstrings in 74720de.

josh146 and others added 2 commits January 19, 2021 12:31
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
trbromley (Contributor) left a comment:

Thanks @josh146, I just want to have a quick go at testing this for some simple examples and then I can approve.

Comment on lines +154 to +156
# Store the differentiation method to be passed to JacobianTape.jacobian().
# Note that the tape accepts a different set of allowed methods than the QNode:
# best, analytic, numeric, device
Contributor:

No problem for this PR, but what is the reason for the difference in method names between the QNode and the Jacobian tape?

josh146 (Member Author):

It's purely historical, they've always deviated 🙁 I wish it were not the case, and wouldn't mind changing it.

@@ -249,21 +257,39 @@ def _validate_backprop_method(device, interface):
# determine if the device supports backpropagation
backprop_interface = device.capabilities().get("passthru_interface", None)

# determine if the device has any child devices that support backpropagation
backprop_devices = device.capabilities().get("passthru_devices", None)
Contributor:

Curious how this line wouldn't be covered by tests, when following ones are 🤔 Maybe it's a bug?
[Codecov screenshot highlighting the uncovered line]

josh146 (Member Author):

Yeah, this doesn't seem possible 🤔

trbromley (Contributor) left a comment:

Thanks @josh146 💯

Comment on lines +616 to +618
self._tape, interface, diff_method, self.device = self.get_tape(
self._original_device, "torch", self.diff_method
)
Contributor:

Ohh wait, I think I get these parts now! The idea is if we change the interface, it'll automatically check if it can change the device under the hood to ideally maintain backprop support?

Do we need to do this for torch?

josh146 (Member Author):

Yep, because this allows us to go backward 🙂

E.g., a QNode running on default.qubit.tf with backprop can be converted to a QNode running on default.qubit with parameter-shift!

Even better, if the user explicitly asked for backprop, the QNode remembers that preference (since we save it as self.diff_method), and calling qnode.to_torch() will raise an error since backprop is not supported. If self.diff_method was best, it will work with no issue.
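That behaviour can be sketched as follows (function and variable names are invented for illustration; this is not the actual validation code): an explicit diff_method="backprop" request fails when the target interface has no passthru device, while "best" silently falls back to parameter-shift.

```python
# Hypothetical sketch of diff_method resolution when the interface changes.

PASSTHRU_DEVICES = {
    "tf": "default.qubit.tf",
    "autograd": "default.qubit.autograd",
    "jax": "default.qubit.jax",
}

def diff_method_on_interface_change(requested, new_interface):
    """Resolve the diff method after switching to ``new_interface``."""
    supported = new_interface in PASSTHRU_DEVICES
    if requested == "backprop" and not supported:
        # the user explicitly asked for backprop: raise rather than fall back
        raise ValueError(f"backprop is not supported on the {new_interface} interface")
    if requested in ("best", "backprop") and supported:
        return "backprop"
    return "parameter-shift"

print(diff_method_on_interface_change("best", "torch"))  # parameter-shift
print(diff_method_on_interface_change("best", "tf"))     # backprop
```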
