
Differentiable batch execute using autograd #1501

Merged: 27 commits merged into master from batch-autograd on Aug 16, 2021
Conversation

@josh146 (Member) commented Aug 5, 2021

Context: This PR adds support for differentiable batch execution of circuits using autograd.

Description of the change:

A new subpackage has been added, pennylane/interfaces/batch. This is a temporary directory --- once batch_execute is the default pathway in PL, the old interfaces will be removed.

This PR adds the following:

  • qml.interfaces.batch.execute() - a high-level wrapper function that dispatches to the correct autodiff interface. Currently, only autograd is supported. In addition, this high-level wrapper contains the shared code for determining the correct forward/backward execution function. (A usage sketch follows this list.)

  • qml.interfaces.batch.autograd - a module containing an autograd custom gradient function for batch_execute.

  • Device.execute_and_gradients: new Device method for computing tape results alongside tape gradients

  • Device.gradients: new Device method for computing tape gradients. Can be considered the gradient equivalent of Device.batch_execute.
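
To make the intended usage concrete, here is a minimal sketch of the new pathway. This is illustrative only: the device, the tape contents, and the choice of qml.gradients.param_shift as the gradient transform are assumptions for the example, not part of this diff; the execute signature follows the snippet shown later in this PR.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

def cost(params):
    # The batch pathway operates directly on tapes rather than QNodes
    with qml.tape.JacobianTape() as tape:
        qml.RX(params[0], wires=0)
        qml.RY(params[1], wires=1)
        qml.CNOT(wires=[0, 1])
        qml.expval(qml.PauliZ(0))

    # Execute a (length-one) batch of tapes differentiably, using a gradient
    # transform; per the notes below, transforms use backward accumulation
    res = qml.interfaces.batch.execute(
        [tape], dev, gradient_fn=qml.gradients.param_shift, accumulation="backward"
    )
    return res[0]

params = np.array([0.1, 0.2], requires_grad=True)
print(cost(params))
print(qml.jacobian(cost)(params))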

Benefits:

  • Execute tapes in a batch, with the output remaining differentiable.

  • Supports gradient transforms and device execution methods

  • Since gradient transforms are differentiable, nth-order derivatives are supported. Compared to PL master, this allows nth-order derivatives of everything, including expval, var, probs, tensor products, non-two-term shifts, etc. (see the sketch after this list).

  • Supports both forward and backward accumulation modes.

    • Gradient transforms only support mode="backward", due to there not being much of a use-case for computing shift-rule gradients during the forward pass.

    • Device methods support mode="forward" and mode="backward". Forward mode is the default when using a device method, since it is often the case that the device state on the forward pass is re-used to compute gradients.
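
For example, because the transform is itself differentiable, higher derivatives are just nested autograd calls. Again a sketch under the same assumptions as the previous example (tape and device details are illustrative):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

def cost(x):
    with qml.tape.JacobianTape() as tape:
        qml.RX(x[0], wires=0)
        qml.RY(x[1], wires=0)
        qml.probs(wires=0)

    return qml.interfaces.batch.execute(
        [tape], dev, gradient_fn=qml.gradients.param_shift, accumulation="backward"
    )[0]

x = np.array([0.5, 0.1], requires_grad=True)
jac = qml.jacobian(cost)(x)                   # first derivatives of the probabilities
hess = qml.jacobian(qml.jacobian(cost))(x)    # second derivatives via the differentiable transform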

Potential drawbacks:

  • The recursive method used to compute higher derivatives is not especially smart. For example, when computing the Hessian, all matrix elements are computed, rather than just the upper triangle (even though the Hessian is symmetric).

  • Device gradient methods are not differentiable, and therefore when using device methods, higher order derivatives are not accessible unless directly supported by the device method.

  • In #1493 (Add a simple API for transforms that generate multiple tapes), work is being done to create a standardized API for gradient transforms. Until then, we simply assume that any gradient_fn within the pennylane.gradients module is a transform.

  • All gradient transforms and all device gradient methods (e.g., adjoint) are supported. The reversible method, however, is not currently supported, since it is neither a transform nor a device method.

Issues: n/a

@josh146 josh146 requested a review from co9olguy August 5, 2021 17:59
github-actions bot (Contributor) commented Aug 5, 2021

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

Comment on lines 829 to 842
def execute_and_gradients(self, circuits):
    res = []
    jacs = []

    for circuit in circuits:
        # Evaluations and gradients are paired, so that
        # we can re-use the device state for the adjoint method
        res.append(circuit.execute(self))
        jacs.append(self.adjoint_jacobian(circuit, use_device_state=True))

    return res, jacs

def gradients(self, circuits):
    return [self.adjoint_jacobian(circuit) for circuit in circuits]
Member Author:
I added these to the device API for consistency and to make the interface code cleaner.

Member:
This is helpful, but it bakes into the parent class an assumption about supported gradient methods.

Someone (a plugin developer or a power user) might be tempted to access device gradients, but this method (found in the parent class of basically all plugins) will break (with an obscure error message) when users try to use it on a HW plugin.

Could we use the capabilities dictionary to register & check the support for adjoint, and either give an error, or better yet, use a different "device" gradient method when adjoint is not supported/optimal?

Member Author:
> This is helpful, but it bakes into the parent class an assumption about supported gradient methods.

Yep, I agree. I definitely think the final design decision of these two methods in the Device is TBD.

> use a different "device" gradient method when adjoint is not supported/optimal?

Good idea. Unfortunately, we don't have any other available device methods I know of at the moment 😆 In core, it's either adjoint or nothing. But I can add validation here that checks the capabilities dictionary 👍

Somewhat larger in scope, but I spent a lot of time thinking about this problem yesterday, and prototyped up some approaches with user-defined device methods.

I kept coming back to the following thought - perhaps the best approach for a plugin developer is simply to fully overwrite Device.execute_and_gradients and Device.gradients?

My thought process was inspired a lot by what happened with batch execute:

  • We provide a 'simple' default Device.batch_execute, that just has a for loop over execution.

  • The Braket plugin already fully overwrites batch_execute because they need to do a lot of custom parallel/batching.

So the most flexible thing for the future seems to be (a) provide a sensible (albeit slightly stupid) default implementation, and (b) allow devices to overwrite it as needed.
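
To make the two options concrete, a rough sketch; the capability key, the plugin class, and the helper method below are hypothetical and purely for illustration, not how core currently behaves:

import pennylane as qml

# (a) validation in the default implementation via the capabilities dictionary
# (the "provides_adjoint_jacobian" key is hypothetical)
def gradients(self, circuits):
    if not self.capabilities().get("provides_adjoint_jacobian", False):
        raise qml.QuantumFunctionError(
            f"Device {self.short_name} does not support the adjoint gradient method."
        )
    return [self.adjoint_jacobian(circuit) for circuit in circuits]

# (b) a plugin fully overrides the defaults, analogous to how the Braket
# plugin overrides batch_execute
class MyHardwareDevice(qml.Device):  # illustrative plugin class
    def execute_and_gradients(self, circuits):
        results = self.batch_execute(circuits)
        jacs = [self._hardware_jacobian(c) for c in circuits]  # hypothetical helper
        return results, jacs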

from .autograd import execute as execute_autograd


def execute(tapes, device, gradient_fn, interface="autograd", accumulation="forward"):
Member Author:
This is the qml.interfaces.batch.execute wrapper function/dispatcher.

It simply dispatches to the correct interface (for now, just autograd) and performs all shared validation.
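
Roughly, the dispatch takes the form below. This is a simplified sketch of the pattern, not the literal implementation; it assumes execute_autograd is imported as in the snippet above, and the arguments forwarded to it are assumed:

def execute(tapes, device, gradient_fn, interface="autograd", accumulation="forward"):
    # ... shared validation happens here, e.g. deciding whether the device
    # computes gradients on the forward pass or a gradient transform is used ...

    if interface == "autograd":
        return execute_autograd(tapes, device, gradient_fn, accumulation=accumulation)

    raise ValueError(f"Unknown interface: {interface}")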

Comment on lines 31 to 55
def _vector_jacobian_products(dys, jacs, reduction="append"):
    """Compute the vector-Jacobian products for a given
    vector of gradient outputs dys and Jacobians jacs"""
    vjps = []

    for dy, jac in zip(dys, jacs):

        if jac is None:
            # The tape has no trainable parameters; the VJP
            # is simply None.
            vjps.append(None)
            continue

        dy_row = math.reshape(dy, [-1])

        if math.allclose(dy, 0):
            # If the dy vector is zero, then the
            # corresponding element of the VJP will be zero,
            # and we can avoid a quantum computation.
            num_params = math.reshape(jac, [-1, dy_row.shape[0]]).shape[0]
            vjp = math.convert_like(np.zeros([num_params]), dy)
        else:
            vjp = _vector_jacobian_product(dy, jac)

        getattr(vjps, reduction)(vjp)

    return vjps
Member Author:
Added this since it was missing in #1494 (and I needed a way to get a list of VJPs, given a list of Jacobians and dy vectors).
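
For illustration, roughly how the helper is used; the shapes and numbers are made up, and the example assumes the (private) function is in scope:

import numpy as np

# Two tapes: one with 3 trainable parameters and a single expectation value,
# one with 2 trainable parameters and a 2-element probability output
jacs = [np.random.rand(1, 3), np.random.rand(2, 2)]
dys = [np.array([1.0]), np.array([0.5, -0.5])]

vjps = _vector_jacobian_products(dys, jacs)
print([v.shape for v in vjps])  # [(3,), (2,)]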

Comment on lines 26 to 28
dy_row = math.reshape(dy, [-1])
jac = math.reshape(jac, [dy_row.shape[0], -1])
return math.tensordot(jac, dy_row, [[0], [0]])
Member Author:
This fixes a bug I discovered while working on this PR.
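
For context (not part of the diff), the contraction above is the standard vector-Jacobian product. A self-contained NumPy check of the shape convention:

import numpy as np

# If the Jacobian J has shape (m, n) -- m outputs, n trainable parameters --
# and dy has shape (m,), then the VJP is J^T @ dy, with shape (n,)
m, n = 3, 2
J = np.arange(m * n, dtype=float).reshape(m, n)
dy = np.ones(m)
vjp = np.tensordot(np.reshape(J, (dy.shape[0], -1)), dy, axes=[[0], [0]])
assert vjp.shape == (n,)
assert np.allclose(vjp, J.T @ dy)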

@co9olguy (Member) left a comment

Thanks @josh146!

Seems nice and fairly clean. The only thing that felt unnatural was the ambiguity of what happens to gradient_fn when execute_fn already computes the gradient (inside batch.execute). It was clear upon reading the docstring of batch.autograd.execute, but was not clear what to expect without digging into that.

Would it make it more clear to explicitly set gradient_fn=None in the accumulation == "forward" branch?
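
For concreteness, the suggestion would amount to something like the following inside batch.execute (a sketch of the suggested pattern only, not the actual implementation):

if accumulation == "forward":
    # the device computes gradients alongside the results on the forward pass,
    # so no separate gradient function is needed downstream
    execute_fn = device.execute_and_gradients
    gradient_fn = None  # make the "unused" status explicit
else:
    execute_fn = device.batch_execute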

(Additional review threads on pennylane/gradients/vjp.py, pennylane/interfaces/batch/__init__.py, and pennylane/interfaces/batch/autograd.py were resolved.)
Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
codecov bot commented Aug 6, 2021

Codecov Report

Merging #1501 (bf505e9) into master (566ee89) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #1501   +/-   ##
=======================================
  Coverage   98.22%   98.23%           
=======================================
  Files         185      187    +2     
  Lines       13301    13370   +69     
=======================================
+ Hits        13065    13134   +69     
  Misses        236      236           
Impacted Files Coverage Δ
pennylane/_device.py 96.94% <100.00%> (+0.13%) ⬆️
pennylane/gradients/vjp.py 100.00% <100.00%> (ø)
pennylane/interfaces/batch/__init__.py 100.00% <100.00%> (ø)
pennylane/interfaces/batch/autograd.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@josh146 josh146 changed the title [WIP] Differentiable batch execute using autograd Differentiable batch execute using autograd Aug 9, 2021
@josh146 josh146 requested a review from co9olguy August 9, 2021 12:55
@josh146 josh146 added the review-ready 👌 label (PRs which are ready for review by someone from the core team) Aug 9, 2021
@josh146 (Member Author) commented Aug 9, 2021

@co9olguy: have taken into account your suggestions, and written tests. This is now ready for proper review 🙂

Comment on lines +175 to +176
@pytest.mark.parametrize("execute_kwargs", execute_kwargs)
class TestAutogradExecuteIntegration:
Member Author:
Note: This entire test class has been adapted directly from https://github.com/PennyLaneAI/pennylane/blob/master/tests/interfaces/test_tape_autograd.py to work with the new interface function.

Thus, the new autograd.execute function passes all the same integration tests as the current autograd interface.

Member:
Does it make sense to have all these tests depend on qml.tape.JacobianTape? Isn't that designated for the chopping block? What will happen to these tests when we remove JacobianTape?

Can we use the parent tape class instead for all these tests?

Member Author:
> Can we use the parent tape class instead for all these tests?

Not yet, but that is the plan once all PRs in this story are complete! Currently, there are a few auxiliary methods in the JacobianTape (namely, JacobianTape._grad_method, JacobianTape._update_gradient_info, and JacobianTape._grad_method_validation) that are needed for circuit gradients to work.

I actually realized this late -- I initially did try to write these tests using qml.tape.QuantumTape(), only to run into a lot of failures.

I still want to delete the JacobianTape, so there will just need to be a new ticket in this story to work out where to move the functionality currently in those class methods.

We could either move them to the parent class, or turn them into functions.

Comment on lines +516 to +517
class TestHigherOrderDerivatives:
"""Test that the autograd execute function can be differentiated"""
Member Author:
This is a new test class.

@co9olguy (Member) left a comment

Nice job @josh146!

No real concerns with this PR, but I hope we can get some more devs experienced enough with the core of the codebase to be able to work on these files in future. They're quite intricate!

(Additional review threads on pennylane/_device.py, pennylane/interfaces/batch/autograd.py, and tests/interfaces/test_batch_autograd.py were resolved.)
Comment on lines +487 to +493
expected = np.array(
[
[-np.sin(x), 0],
[-np.sin(x) * np.cos(y) / 2, -np.cos(x) * np.sin(y) / 2],
[np.cos(y) * np.sin(x) / 2, np.cos(x) * np.sin(y) / 2],
]
)
Member:
What will happen when Numpy stops supporting ragged arrays? 🤔

@josh146 (Member Author) commented Aug 15, 2021

We will need to transition to returning tuples of non-ragged arrays. E.g., for a QNode returning

return expval(obs1), probs(wires=0)

we should return a tuple of non-ragged arrays --- i.e., a 0-dimensional array for the expectation value and a length-2 array for the probabilities --- rather than what we currently do, which is pack both results into a single ragged array (illustrated below).

This is also more consistent with how NumPy/TensorFlow/Torch already work: functions that return more than one value simply return a tuple of tensors, e.g.,

>>> s, u, v = tf.linalg.svd(x)
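
Illustrating the two return conventions with made-up numbers:

import numpy as np

# current: a single ragged (object) array holding both measurement results
current = np.array([np.array(0.8), np.array([0.6, 0.4])], dtype=object)

# proposed: a plain tuple of non-ragged arrays
proposed = (np.array(0.8), np.array([0.6, 0.4]))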

Comment on lines +562 to +563
"""Since the adjoint hessian is not a differentiable transform,
higher-order derivatives are not supported."""
Member:

Could they be? 🤔

Member Author:

I spent a lot of time thinking about this. It is possible to write the adjoint method as a pure function using autodiff, but I don't think it's worth it:

  • Mainly, you lose the main benefit of the adjoint method --- the ability to re-use the existing device state
  • More minor, but it keeps the adjoint computation in Python

So it is doable, but it would lead to a significant performance regression for users interested in just the first derivative!

I think a better approach is to simply write the low-level adjoint method to support nth derivatives manually.

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
Base automatically changed from vjp-transform to master August 16, 2021 12:57
@josh146 josh146 merged commit 683fbfa into master Aug 16, 2021
@josh146 josh146 deleted the batch-autograd branch August 16, 2021 14:14
@josh146 (Member Author) commented Aug 18, 2021

[ch7165]

Labels: review-ready 👌 (PRs which are ready for review by someone from the core team)
Projects: none yet
Linked issues: none yet
Participants: 3