Differentiable batch execute using autograd #1501
Hello. You may have forgotten to update the changelog!
pennylane/_qubit_device.py (outdated):

```python
def execute_and_gradients(self, circuits):
    res = []
    jacs = []

    for circuit in circuits:
        # Evaluations and gradients are paired, so that
        # we can re-use the device state for the adjoint method
        res.append(circuit.execute(self))
        jacs.append(self.adjoint_jacobian(circuit, use_device_state=True))

    return res, jacs

def gradients(self, circuits):
    return [self.adjoint_jacobian(circuit) for circuit in circuits]
```
I added these to the device API for consistency and to make the interface code cleaner.
This is helpful, but it bakes into the parent class an assumption about supported gradient methods.
Someone (a plugin developer or a power user) might be tempted to access device gradients, but this method (found in the parent class of basically all plugins) will break (with an obscure error message) when users try to use it on a HW plugin.
Could we use the capabilities dictionary to register & check the support for adjoint, and either give an error, or better yet, use a different "device" gradient method when adjoint is not supported/optimal?
> This is helpful, but it bakes into the parent class an assumption about supported gradient methods.
Yep, I agree. I definitely think the final design decision of these two methods in the Device is TBD.
> use a different "device" gradient method when adjoint is not supported/optimal?
Good idea. Unfortunately, we don't have any other device methods available at the moment, as far as I know 😆 In core, it's either adjoint or nothing. But I can add validation here that checks the capabilities dictionary 👍
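A minimal sketch of what that capabilities check might look like. The class, capability key, and circuit representation below are hypothetical stand-ins for illustration, not PennyLane's actual API:

```python
class Device:
    """Simplified stand-in for the PennyLane Device parent class."""

    def capabilities(self):
        # Plugins override this to advertise supported features.
        return {"supports_adjoint": False}

    def adjoint_jacobian(self, circuit, use_device_state=False):
        # Placeholder: real simulators compute this from the statevector.
        return [0.0 for _ in circuit]

    def gradients(self, circuits):
        # Fail early with a clear message on devices (e.g. hardware
        # plugins) that cannot run the adjoint method.
        if not self.capabilities().get("supports_adjoint", False):
            raise NotImplementedError(
                "This device does not support the adjoint gradient method."
            )
        return [self.adjoint_jacobian(c) for c in circuits]


class SimulatorDevice(Device):
    def capabilities(self):
        caps = super().capabilities().copy()
        caps["supports_adjoint"] = True
        return caps
```

With this pattern, `SimulatorDevice().gradients(...)` succeeds, while the base hardware-like device raises a readable error instead of failing deep inside `adjoint_jacobian`.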
Somewhat larger in scope, but I spent a lot of time thinking about this problem yesterday, and prototyped some approaches with user-defined device methods.

I kept coming back to the following thought: perhaps the best approach for a plugin developer is simply to fully overwrite `Device.execute_and_gradients` and `Device.gradients`?
My thought process was inspired a lot by what happened with batch execute:

- We provide a 'simple' default, `Device.batch_execute`, that just has a for loop over execution.
- The Braket plugin already fully overwrites `batch_execute`, because they need to do a lot of custom parallel/batching.
So the most flexible thing for the future seems to be (a) provide a sensible (albeit slightly stupid) default implementation, and (b) allow devices to overwrite it as needed.
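That default-plus-override pattern can be sketched in miniature as follows. The classes and the toy "execution" here are illustrative stand-ins, not the actual PennyLane or Braket plugin code:

```python
from concurrent.futures import ThreadPoolExecutor


class Device:
    """Simplified stand-in illustrating the sensible-default pattern."""

    def execute(self, circuit):
        # toy "execution": sum the circuit's parameters
        return sum(circuit)

    def batch_execute(self, circuits):
        # (a) the simple (albeit slightly stupid) default:
        # a serial for loop over executions
        return [self.execute(c) for c in circuits]


class ParallelDevice(Device):
    def batch_execute(self, circuits):
        # (b) a plugin override, standing in for custom parallel/batched
        # submission such as the Braket plugin's remote job dispatch
        with ThreadPoolExecutor() as pool:
            return list(pool.map(self.execute, circuits))
```

Both devices return identical results for the same batch; only the execution strategy differs, which is exactly the flexibility the override pathway is meant to provide.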
```python
from .autograd import execute as execute_autograd


def execute(tapes, device, gradient_fn, interface="autograd", accumulation="forward"):
```
This is the `qml.interfaces.batch.execute` wrapper function/dispatcher. It simply dispatches to the correct interface (for now, just autograd), and performs all shared validation.
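In outline, the dispatcher pattern looks something like the following. This is a sketch with a stub autograd backend, not the PR's actual source; the registry name is invented for illustration:

```python
def execute_autograd(tapes, device, gradient_fn, accumulation):
    # stub standing in for the real backend in
    # pennylane/interfaces/batch/autograd.py
    return [("autograd", t) for t in tapes]


# hypothetical registry mapping interface names to backend functions
SUPPORTED_INTERFACES = {"autograd": execute_autograd}


def execute(tapes, device, gradient_fn, interface="autograd", accumulation="forward"):
    # shared validation lives here, before dispatching to any backend
    if accumulation not in ("forward", "backward"):
        raise ValueError(f"Unknown accumulation mode {accumulation!r}")
    try:
        backend = SUPPORTED_INTERFACES[interface]
    except KeyError:
        raise ValueError(f"Unknown interface {interface!r}") from None
    return backend(tapes, device, gradient_fn, accumulation)
```

Adding a new interface (TF, Torch, JAX) then only requires registering another backend function, leaving the shared validation untouched.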
pennylane/gradients/vjp.py (outdated):

```python
def _vector_jacobian_products(dys, jacs, reduction="append"):
    """Compute the vector-Jacobian product for a given
    vector of gradient outputs dys and Jacobians jacs"""
    vjps = []

    for dy, jac in zip(dys, jacs):

        if jac is None:
            # The tape has no trainable parameters; the VJP
            # is simply None.
            vjps.append(None)
            continue

        if math.allclose(dy, 0):
            # If the dy vector is zero, then the
            # corresponding element of the VJP will be zero,
            # and we can avoid a quantum computation.
            dy_row = math.reshape(dy, [-1])
            num_params = math.reshape(jac, [-1, dy_row.shape[0]]).shape[0]
            vjp = math.convert_like(np.zeros([num_params]), dy)
        else:
            vjp = _vector_jacobian_product(dy, jac)

        getattr(vjps, reduction)(vjp)

    return vjps
```
Added this since it was missing in #1494 (and I needed a way to get a list of VJPs, given a list of Jacobians and `dy` vectors).
pennylane/gradients/vjp.py (outdated):

```python
dy_row = math.reshape(dy, [-1])
jac = math.reshape(jac, [dy_row.shape[0], -1])
return math.tensordot(jac, dy_row, [[0], [0]])
```
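The reshape aligns the Jacobian's leading axis with the flattened `dy`, and the `tensordot` contracts over that axis. A standalone NumPy illustration of the same contraction (plain NumPy here, rather than `pennylane.math`):

```python
import numpy as np


def vector_jacobian_product(dy, jac):
    # flatten dy, align it with the Jacobian's output axis, and contract:
    # result[j] = sum_i dy[i] * jac[i, j]
    dy_row = np.reshape(dy, [-1])
    jac = np.reshape(jac, [dy_row.shape[0], -1])
    return np.tensordot(jac, dy_row, [[0], [0]])


# A tape with 2 outputs and 3 trainable parameters:
jac = np.arange(6.0).reshape(2, 3)

# Contracting with dy = e_0 selects the first output's gradient row:
vector_jacobian_product(np.array([1.0, 0.0]), jac)  # → array([0., 1., 2.])
```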
This fixes a bug I discovered while working on this PR.
Thanks @josh146!

Seems nice and fairly clean. The only thing that felt unnatural was the ambiguity of what happens to `gradient_fn` when `execute_fn` already computes the gradient (inside `batch.execute`). It was clear upon reading the docstring of `batch.autograd.execute`, but was not clear what to expect without digging into that.

Would it be clearer to explicitly set `gradient_fn=None` in the `accumulation == "forward"` branch?
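The suggestion could be sketched as a small helper along these lines (a hypothetical name, not something in the PR):

```python
def resolve_gradient_fn(gradient_fn, accumulation):
    """Make the unused gradient_fn explicit: in forward accumulation the
    device computes gradients itself, so gradient_fn plays no role."""
    if accumulation == "forward":
        return None
    return gradient_fn
```

Returning `None` explicitly documents the intent at the call site, instead of silently ignoring a `gradient_fn` the caller passed in.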
Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
Codecov Report

```
@@           Coverage Diff           @@
##           master    #1501   +/-  ##
======================================
  Coverage   98.22%   98.23%
======================================
  Files         185      187      +2
  Lines       13301    13370     +69
======================================
+ Hits        13065    13134     +69
  Misses        236      236
```

Continue to review the full report at Codecov.
@co9olguy: I have taken your suggestions into account, and written tests. This is now ready for proper review 🙂
```python
@pytest.mark.parametrize("execute_kwargs", execute_kwargs)
class TestAutogradExecuteIntegration:
```
Note: This entire test class has been adapted directly from https://github.com/PennyLaneAI/pennylane/blob/master/tests/interfaces/test_tape_autograd.py to work with the new interface function. Thus, the new `autograd.execute` function passes all the same integration tests as the current autograd interface.
Does it make sense to have all these tests depend on `qml.tape.JacobianTape`? Isn't that designated for the chopping block? What will happen to these tests when we remove `JacobianTape`?

Can we use the parent tape class instead for all these tests?
> Can we use the parent tape class instead for all these tests?

Not yet, but that is the plan once all PRs in this story are complete! Currently, there are a few auxiliary methods in the `JacobianTape` (namely, `JacobianTape._grad_method`, `JacobianTape._update_gradient_info`, and `JacobianTape._grad_method_validation`) that are needed for circuit gradients to work.

I actually realized this late -- I initially did try to write these tests using `qml.tape.QuantumTape()`, only to run into a lot of failures.

I still want to delete the `JacobianTape`, so there will just need to be a new ticket in this story to work out where to move the functionality currently in those class methods. We could either move them to the parent class, or turn them into functions.
```python
class TestHigherOrderDerivatives:
    """Test that the autograd execute function can be differentiated"""
```
This is a new test class.
Nice job @josh146!
No real concerns with this PR, but I hope we can get some more devs experienced enough with the core of the codebase to be able to work on these files in future. They're quite intricate!
```python
expected = np.array(
    [
        [-np.sin(x), 0],
        [-np.sin(x) * np.cos(y) / 2, -np.cos(x) * np.sin(y) / 2],
        [np.cos(y) * np.sin(x) / 2, np.cos(x) * np.sin(y) / 2],
    ]
)
```
What will happen when Numpy stops supporting ragged arrays? 🤔
We will need to transition to returning tuples of non-ragged arrays. E.g., for a QNode returning

```python
return expval(obs1), probs(wires=0)
```

we should return a tuple with arrays of shape `tuple[array[0], array[2]]` (rather than what we currently do, which is to return `array[array[0], array[2]]`).

This is also more consistent with how NumPy/TensorFlow/Torch already work: functions that return more than one value simply return a tuple of tensors:

```python
>>> s, u, v = tf.linalg.svd(x)
```
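To make the contrast concrete (plain NumPy, not PennyLane code): a scalar expectation value and a two-outcome probability vector cannot share one rectangular array, but a tuple keeps each result's shape intact:

```python
import numpy as np

expval = np.array(0.54)         # scalar measurement result
probs = np.array([0.25, 0.75])  # probability vector on one wire

# Old style: np.array([expval, probs]) is ragged; NumPy >= 1.24
# raises a ValueError for this kind of inhomogeneous input.

# New style: a tuple of tensors, each keeping its own shape.
result = (expval, probs)
shapes = tuple(r.shape for r in result)  # ((), (2,))
```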
"""Since the adjoint hessian is not a differentiable transform, | ||
higher-order derivatives are not supported.""" |
Could they be? 🤔
I spent a lot of time thinking about this. It is possible to write the adjoint method as a pure function using autodiff, but I don't think it's worth it:

- Mainly, you lose the key benefit of the adjoint method --- the ability to re-use the existing device state
- More minor, but it keeps the adjoint computation in Python

So it is doable, but it would lead to a significant performance regression for users interested in just the first derivative! I think a better approach is to simply write the low-level adjoint method to support nth derivatives manually.
Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
[ch7165]
Context: This PR adds support for differentiable batch execution of circuits using autograd.

Description of the change:

A new subpackage has been added, `pennylane/interfaces/batch`. This is a temporary directory --- once `batch_execute` is the default pathway in PL, the old interfaces will be removed.

This PR adds the following:

- `qml.interfaces.batch.execute()` - a high-level wrapper function that dispatches to the correct autodiff interface. Currently, only autograd is supported. In addition, this high-level wrapper also contains shared code for determining the correct forward/backward execution function.
- `qml.interfaces.autograd` - a module containing an autograd custom gradient function for `batch_execute`.
- `Device.execute_and_gradients`: a new `Device` method for computing tape results alongside tape gradients.
- `Device.gradients`: a new `Device` method for computing tape gradients. Can be considered the gradient equivalent of `Device.batch_execute`.

Benefits:

- Executes tapes in a batch, with the output remaining differentiable.
- Supports gradient transforms and device execution methods.
- Since gradient transforms are differentiable, nth-order derivatives are supported. Compared to PL master, this allows nth-order derivatives of everything, including expval, var, probs, tensor products, non-two-term shifts, etc.
- Supports both forward and backward accumulation modes:
  - Gradient transforms only support `mode="backward"`, due to there not being much of a use-case for computing shift-rule gradients during the forward pass.
  - Device methods support `mode="forward"` and `mode="backward"`. Forward mode is the default when using a device method, since it is often the case that the device state from the forward pass is re-used to compute gradients.

Potential drawbacks:

- The recursive method used to compute higher derivatives is not so smart. For example, when computing the Hessian, all matrix elements are computed, rather than just the upper triangle.
- Device gradient methods are not differentiable, and therefore when using device methods, higher-order derivatives are not accessible unless directly supported by the device method.
- In "Add a simple API for transforms that generate multiple tapes" (#1493), work is being done to create a standardized API for gradient transforms. Until then, we simply assume that any `gradient_fn` within the `pennylane.gradients` module is a transform.
- All gradient transforms, and all device gradients (e.g., adjoint), are supported. The reversible method, however, is not currently supported, since it is neither a transform nor a device method.

Issues: n/a
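The forward/backward accumulation split described above can be sketched as follows. The stub device and the dispatch helper are simplified stand-ins for illustration, not the PR's actual code:

```python
class StubDevice:
    """Toy device: 'executes' a tape by summing it, with a trivial Jacobian."""

    def batch_execute(self, tapes):
        return [sum(t) for t in tapes]

    def execute_and_gradients(self, tapes):
        # forward mode: results and Jacobians are computed together, so a
        # real device can re-use its state for the adjoint method
        return [sum(t) for t in tapes], [[1.0] * len(t) for t in tapes]


def batch_execute_with_mode(tapes, device, mode="forward"):
    if mode == "forward":
        # gradients accumulated during the forward pass (device methods)
        return device.execute_and_gradients(tapes)
    # backward mode: only results now; gradients are deferred to the
    # backward pass (e.g. computed via gradient transforms)
    return device.batch_execute(tapes), None
```

This mirrors the trade-off in the description: forward mode pairs each evaluation with its gradient, while backward mode leaves the gradient computation to a later pass.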