Change the default gradient method logic to prefer devices, and to allow backprop on child devices #1008
Conversation
It looks like there are quite a few tests failing, because they don't specify a `diff_method`. These should be easily fixed by adding one explicitly (e.g., `diff_method="parameter-shift"`).
Note: this PR is worth benchmarking, to ensure that it provides a better experience for users using the default settings, e.g., `dev = qml.device("default.qubit", wires=2)` with `@qml.qnode(dev)` (see the sketch below). I imagine some demos might also need to be updated for this change 🤔
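A runnable version of that default-settings example (a minimal sketch; the circuit body and parameter values are arbitrary placeholders):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

# No diff_method specified: with this PR, "best" resolves to backprop
# on the default.qubit.autograd child device for the autograd interface.
@qml.qnode(dev)
def circuit(weights):
    qml.RX(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

weights = np.array([0.1, 0.2], requires_grad=True)
print(circuit(weights))
print(qml.grad(circuit)(weights))
```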
self._tape, self.interface, diff_method, self.device = self.get_tape(
    device, interface, diff_method
)
Since it is now possible for the choice of tape to also change the device :)
@@ -92,6 +92,7 @@ class DefaultQubitAutograd(DefaultQubit):
     "CRX": autograd_ops.CRX,
     "CRY": autograd_ops.CRY,
     "CRZ": autograd_ops.CRZ,
+    "CRot": autograd_ops.CRot,
Noticed this was missing due to tests suddenly failing in the new backprop default!
Codecov Report
@@            Coverage Diff            @@
##           master    #1008    +/-   ##
=========================================
  Coverage   97.91%   97.92%
=========================================
  Files         151      151
  Lines       11201    11220      +19
=========================================
+ Hits        10968    10987      +19
  Misses        233      233
Continue to review full report at Codecov.
Thanks @josh146! Looks great, I had a few questions but would be happy to approve!
Isn't this also an indication that there are some "rough edges" in the codebase itself (not just the test suite)? Shouldn't all standard use-cases work without caring about the specific diff method?
FYI: This change has awesome speedups for autograd, but seems to slow down the circuit evaluation a lot.
Looks nice @josh146. Also curious about the things that @trbromley commented on. Only, I'm not sure exactly why the parameter-shift diff method needs to be set for some tests.

Also, I noticed that the docstrings for `get_tape`, and some of the following methods, need to be updated. The `Returns:` section should include the returned device, and the priority order should be switched for backprop and device in `get_tape`.
@co9olguy: yes but also no!
Considering we have almost 6,000 test cases, the fact that only 100 failed after changing the default gradient method came as a relief to me 🙂 I thought more would fail.
@Thenerdstation: @mariaschuld noticed the same thing. Here are some more benchmarks (master on the left):

[benchmark plots: master vs. this PR]

A couple of notes:

- Just from intuition, I'm imagining that the slower forward pass is simply due to the overhead of storing intermediate statevectors in memory.
- I also imagine we have something like the following:

[sketch: runtime vs. number of parameters for backprop and parameter-shift]

Edit: I labelled the axes backwards, please reverse the labels in your head :)

That is, the forward-pass overhead dominates for fewer parameters, but the parameter-shift scaling dominates for more parameters.

Question: Is this a better default?
Thanks for catching this @thisac! Have updated the docstrings in 74720de.
Thanks @josh146, I just want to have a quick go at testing this for some simple examples and then I can approve.
# Store the differentiation method to be passed to JacobianTape.jacobian().
# Note that the tape accepts a different set of allowed methods than the QNode:
# best, analytic, numeric, device
No problem for this PR, but what is the reason for the difference in method names between the QNode and the Jacobian tape?
It's purely historical, they've always deviated 🙁 I wish it were not the case, and wouldn't mind changing it.
@@ -249,21 +257,39 @@ def _validate_backprop_method(device, interface):
     # determine if the device supports backpropagation
     backprop_interface = device.capabilities().get("passthru_interface", None)

+    # determine if the device has any child devices that support backpropagation
+    backprop_devices = device.capabilities().get("passthru_devices", None)
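For reference, a hypothetical plugin would advertise such a child device roughly as follows (the class and device names are made up; the `capabilities()`/`passthru_devices` pattern follows the diff above, and the other required device methods are omitted):

```python
import pennylane as qml

class MyDevice(qml.QubitDevice):
    """Hypothetical device exposing a TF child device for backprop."""

    name = "My custom device"
    short_name = "my.device"

    @classmethod
    def capabilities(cls):
        capabilities = super().capabilities().copy()
        capabilities.update(
            # maps an interface name to a child device supporting backprop
            passthru_devices={"tf": "my.device.tf"},
        )
        return capabilities
```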
Yeah, this doesn't seem possible 🤔
Thanks @josh146 💯
self._tape, interface, diff_method, self.device = self.get_tape(
    self._original_device, "torch", self.diff_method
)
Ohh wait, I think I get these parts now! The idea is if we change the interface, it'll automatically check if it can change the device under the hood to ideally maintain backprop support?
Do we need to do this for torch?
Yep, because this allows us to go backward 🙂 E.g., a QNode running on `default.qubit.tf` with backprop can be converted to a QNode running on `default.qubit` with parameter-shift! Even better, if the user explicitly asked for backprop, the QNode remembers that preference (since we save it as `self.diff_method`), and calling `qnode.to_torch()` will raise an error, since backprop is not supported there. If `self.diff_method` was `best`, it will work with no issue.
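A sketch of that conversion behaviour, assuming the tape-mode QNode's in-place `to_torch()` (consistent with the `get_tape` call shown above):

```python
import pennylane as qml

dev = qml.device("default.qubit.tf", wires=1)

# diff_method="best" resolves to backprop on default.qubit.tf...
@qml.qnode(dev, interface="tf", diff_method="best")
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

# ...but there is no torch passthru device, so this conversion swaps in
# default.qubit with parameter-shift under the hood.
circuit.to_torch()

# Had the QNode been constructed with diff_method="backprop", the same
# call would raise an error rather than silently changing the method.
```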
Context:
When using simulators, backpropagation scales significantly better for computing gradients than the parameter-shift rule as the number of parameters increases (parameter-shift requires two circuit evaluations per trainable parameter, whereas backprop obtains all gradients from one forward and one backward pass):

[benchmark plot: gradient computation time vs. number of parameters]
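As a concrete illustration of the explicit workflow described next (circuit and values are arbitrary):

```python
import pennylane as qml
import tensorflow as tf

# Backprop requires explicitly loading the passthru simulator:
dev = qml.device("default.qubit.tf", wires=2)

@qml.qnode(dev, interface="tf", diff_method="backprop")
def circuit(x):
    qml.RX(x[0], wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

x = tf.Variable([0.1])
with tf.GradientTape() as tape:
    res = circuit(x)
print(tape.gradient(res, x))
```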
However, to use backprop, you need to explicitly load `default.qubit.tf` or `default.qubit.autograd`, which can cause the ability to perform backprop to be a bit 'hidden'.

Description of the Change:
The logic for choosing the 'best' differentiation method has been altered to improve performance.
If the device provides its own gradient, this is now the preferred differentiation method.
If a device provides child devices that natively support classical backpropagation, this is now preferred over the parameter-shift rule.
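A sketch of the resulting preference order for `diff_method="best"` (the helper name is hypothetical; the capability keys `provides_jacobian`, `passthru_interface`, and `passthru_devices` appear in PennyLane device capabilities and in this PR's diffs):

```python
def get_best_method(device, interface):
    """Sketch: resolve diff_method="best" under the new logic."""
    capabilities = device.capabilities()

    # 1. A device-provided gradient is now always preferred.
    if capabilities.get("provides_jacobian", False):
        return "device"

    # 2. Next comes backprop, either natively or via a child device.
    if capabilities.get("passthru_interface") == interface:
        return "backprop"
    if interface in (capabilities.get("passthru_devices") or {}):
        return "backprop"  # the QNode swaps in the child device

    # 3. Otherwise, fall back to the parameter-shift rule.
    return "parameter-shift"
```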
Devices define child devices via their `capabilities()` dictionary. For example, `default.qubit` supports child devices for TensorFlow, Autograd, and Jax (see the first sketch below).

As a result of this change, if the QNode `diff_method` is not explicitly provided, it is possible that the QNode will run on a child device of the device that was specifically provided (see the second sketch below).
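The first sketch reconstructs the snippet lost in extraction: inspecting the child devices `default.qubit` advertises. The `passthru_devices` key comes from the diff above; the exact mapping shown is an assumption based on the interfaces named in this PR.

```python
>>> import pennylane as qml
>>> dev = qml.device("default.qubit", wires=2)
>>> dev.capabilities()["passthru_devices"]
{'tf': 'default.qubit.tf', 'autograd': 'default.qubit.autograd', 'jax': 'default.qubit.jax'}
```

The second sketch shows the device swap from the user's perspective; `qnode.device` and `qnode.diff_method` are the attributes referenced elsewhere in this PR.

```python
import pennylane as qml

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="tf")  # diff_method defaults to "best"
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

# The QNode has swapped in the TF child device to enable backprop:
print(circuit.device)       # a default.qubit.tf instance, not dev
print(circuit.diff_method)  # the user's preference is remembered
```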
Benefits:

Users of `default.qubit` now get the most performant differentiation method automatically, without having to explicitly load a child device.

Possible Drawbacks:
For circuits executed on the CPU, there can be some overhead on the forward pass when using TensorFlow, since intermediate values of the computation are being stored.
The QNode is using a different device from the one provided by the user (`dev`). As a result, calling `dev.state()` after executing the QNode will not provide the expected results; however, you can still call `qnode.device.state` (see the sketch below). Since we aim to deprecate the device methods from being user facing, I'm not too worried about this change.

Backprop is only supported if `analytic=True`. If this is not the case, the logic simply falls back to parameter-shift.
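A sketch of the `state` caveat above (assuming analytic mode on `default.qubit`; exact output reprs omitted):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)  # "best" swaps in default.qubit.autograd under the hood
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

circuit(np.array(0.5, requires_grad=True))

# dev itself never executed the circuit:
print(dev.state)            # not the post-execution state you might expect
print(circuit.device.state) # the statevector after the execution above
```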