
Differentiable batch execute using autograd #1501

Merged: 27 commits merged into master from batch-autograd on Aug 16, 2021
Conversation

@josh146 (Member) commented Aug 5, 2021

Context: This PR adds support for differentiable batch execution of circuits using autograd.

Description of the change:

A new subpackage has been added, pennylane/interfaces/batch. This is a temporary directory --- once batch_execute is the default pathway in PL, the old interfaces will be removed.

This PR adds the following:

  • qml.interfaces.batch.execute() - a high-level wrapper function that dispatches to the correct autodiff interface. Currently, only autograd is supported. In addition, this high-level wrapper contains the shared code for determining the correct forward/backward execution function. (A usage sketch follows this list.)

  • qml.interfaces.batch.autograd - a module containing an autograd custom gradient function for batch_execute.

  • Device.execute_and_gradients: new Device method for computing tape results alongside tape gradients

  • Device.gradients: new Device method for computing tape gradients. Can be considered the gradient equivalent of Device.batch_execute.
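
To make the intended usage concrete, here is a minimal sketch of the new pathway. This is illustrative only: the device, the tape contents, and the choice of qml.gradients.param_shift as the gradient transform are assumptions for the example, not part of this diff; the execute signature follows the snippet shown later in this PR.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

def cost(params):
    # The batch pathway operates directly on tapes rather than QNodes
    with qml.tape.JacobianTape() as tape:
        qml.RX(params[0], wires=0)
        qml.RY(params[1], wires=1)
        qml.CNOT(wires=[0, 1])
        qml.expval(qml.PauliZ(0))

    # Execute a (length-one) batch of tapes differentiably, using a gradient
    # transform; per the notes below, transforms use backward accumulation
    res = qml.interfaces.batch.execute(
        [tape], dev, gradient_fn=qml.gradients.param_shift, accumulation="backward"
    )
    return res[0]

params = np.array([0.1, 0.2], requires_grad=True)
print(cost(params))
print(qml.jacobian(cost)(params))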

Benefits:

  • Execute tapes in a batch, with the output remaining differentiable.

  • Supports gradient transforms and device execution methods

  • Since gradient transforms are differentiable, nth-order derivatives are supported. Compared to PL master, this allows nth-order derivatives of everything, including expval, var, probs, tensor products, non-two-term shifts, etc. (see the sketch after this list).

  • Supports both forward and backward accumulation modes.

    • Gradient transforms only support mode="backward", due to there not being much of a use-case for computing shift-rule gradients during the forward pass.

    • Device methods support mode="forward" and mode="backward". Forward mode is the default when using a device method, since it is often the case that the device state on the forward pass is re-used to compute gradients.
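
For example, because the transform is itself differentiable, higher derivatives are just nested autograd calls. Again a sketch under the same assumptions as the previous example (tape and device details are illustrative):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

def cost(x):
    with qml.tape.JacobianTape() as tape:
        qml.RX(x[0], wires=0)
        qml.RY(x[1], wires=0)
        qml.probs(wires=0)

    return qml.interfaces.batch.execute(
        [tape], dev, gradient_fn=qml.gradients.param_shift, accumulation="backward"
    )[0]

x = np.array([0.5, 0.1], requires_grad=True)
jac = qml.jacobian(cost)(x)                   # first derivatives of the probabilities
hess = qml.jacobian(qml.jacobian(cost))(x)    # second derivatives via the differentiable transform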

Potential drawbacks:

  • The recursive method used to compute higher derivatives is not especially smart. For example, when computing the Hessian, all matrix elements are computed, rather than just the upper triangle (even though the Hessian is symmetric).

  • Device gradient methods are not differentiable, and therefore when using device methods, higher order derivatives are not accessible unless directly supported by the device method.

  • In #1493 (Add a simple API for transforms that generate multiple tapes), work is being done to create a standardized API for gradient transforms. Until then, we simply assume that any gradient_fn within the pennylane.gradients module is a transform.

  • All gradient transforms and all device gradient methods (e.g., adjoint) are supported. The reversible method, however, is not currently supported, since it is neither a transform nor a device method.

Issues: n/a

@josh146 josh146 requested a review from co9olguy August 5, 2021 17:59
github-actions bot (Contributor) commented Aug 5, 2021

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

Comment on lines 829 to 842
def execute_and_gradients(self, circuits):
    res = []
    jacs = []

    for circuit in circuits:
        # Evaluations and gradients are paired, so that
        # we can re-use the device state for the adjoint method
        res.append(circuit.execute(self))
        jacs.append(self.adjoint_jacobian(circuit, use_device_state=True))

    return res, jacs

def gradients(self, circuits):
    return [self.adjoint_jacobian(circuit) for circuit in circuits]
Member Author:
I added these to the device API for consistency and to make the interface code cleaner.

Member:
This is helpful, but it bakes into the parent class an assumption about supported gradient methods.

Someone (a plugin developer or a power user) might be tempted to access device gradients, but this method (found in the parent class of basically all plugins) will break (with an obscure error message) when users try to use it on a HW plugin.

Could we use the capabilities dictionary to register & check the support for adjoint, and either give an error, or better yet, use a different "device" gradient method when adjoint is not supported/optimal?

Member Author:
> This is helpful, but it bakes into the parent class an assumption about supported gradient methods.

Yep, I agree. I definitely think the final design decision of these two methods in the Device is TBD.

> use a different "device" gradient method when adjoint is not supported/optimal?

Good idea. Unfortunately, we don't have any other available device methods I know of at the moment 😆 In core, it's either adjoint or nothing. But I can add validation here that checks the capabilities dictionary 👍

Somewhat larger in scope, but I spent a lot of time thinking about this problem yesterday, and prototyped up some approaches with user-defined device methods.

I kept coming back to the following thought - perhaps the best approach for a plugin developer is simply to fully overwrite Device.execute_and_gradients and Device.gradients?

My thought process was inspired a lot by what happened with batch execute:

  • We provide a 'simple' default Device.batch_execute, that just has a for loop over execution.

  • The Braket plugin already fully overwrites batch_execute because they need to do a lot of custom parallel/batching.

So the most flexible thing for the future seems to be (a) provide a sensible (albeit slightly stupid) default implementation, and (b) allow devices to overwrite it as needed.
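
To make the two options concrete, a rough sketch; the capability key, the plugin class, and the helper method below are hypothetical and purely for illustration, not how core currently behaves:

import pennylane as qml

# (a) validation in the default implementation via the capabilities dictionary
# (the "provides_adjoint_jacobian" key is hypothetical)
def gradients(self, circuits):
    if not self.capabilities().get("provides_adjoint_jacobian", False):
        raise qml.QuantumFunctionError(
            f"Device {self.short_name} does not support the adjoint gradient method."
        )
    return [self.adjoint_jacobian(circuit) for circuit in circuits]

# (b) a plugin fully overrides the defaults, analogous to how the Braket
# plugin overrides batch_execute
class MyHardwareDevice(qml.Device):  # illustrative plugin class
    def execute_and_gradients(self, circuits):
        results = self.batch_execute(circuits)
        jacs = [self._hardware_jacobian(c) for c in circuits]  # hypothetical helper
        return results, jacs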

from .autograd import execute as execute_autograd


def execute(tapes, device, gradient_fn, interface="autograd", accumulation="forward"):
Member Author:
This is the qml.interfaces.batch.execute wrapper function/dispatcher.

It simply dispatches to the correct interface (for now, just autograd) and performs all shared validation.
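
Roughly, the dispatch takes the form below. This is a simplified sketch of the pattern, not the literal implementation; it assumes execute_autograd is imported as in the snippet above, and the arguments forwarded to it are assumed:

def execute(tapes, device, gradient_fn, interface="autograd", accumulation="forward"):
    # ... shared validation happens here, e.g. deciding whether the device
    # computes gradients on the forward pass or a gradient transform is used ...

    if interface == "autograd":
        return execute_autograd(tapes, device, gradient_fn, accumulation=accumulation)

    raise ValueError(f"Unknown interface: {interface}")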

Comment on lines 31 to 55
def _vector_jacobian_products(dys, jacs, reduction="append"):
    """Compute the vector-Jacobian products for a given
    vector of gradient outputs dys and Jacobians jacs"""
    vjps = []

    for dy, jac in zip(dys, jacs):

        if jac is None:
            # The tape has no trainable parameters; the VJP
            # is simply None.
            vjps.append(None)
            continue

        dy_row = math.reshape(dy, [-1])

        if math.allclose(dy, 0):
            # If the dy vector is zero, then the
            # corresponding element of the VJP will be zero,
            # and we can avoid a quantum computation.
            num_params = math.reshape(jac, [-1, dy_row.shape[0]]).shape[0]
            vjp = math.convert_like(np.zeros([num_params]), dy)
        else:
            vjp = _vector_jacobian_product(dy, jac)

        getattr(vjps, reduction)(vjp)

    return vjps
Member Author:
Added this since it was missing in #1494 (and I needed a way to get a list of VJPs, given a list of Jacobians and dy vectors).
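
For illustration, roughly how the helper is used; the shapes and numbers are made up, and the example assumes the (private) function is in scope:

import numpy as np

# Two tapes: one with 3 trainable parameters and a single expectation value,
# one with 2 trainable parameters and a 2-element probability output
jacs = [np.random.rand(1, 3), np.random.rand(2, 2)]
dys = [np.array([1.0]), np.array([0.5, -0.5])]

vjps = _vector_jacobian_products(dys, jacs)
print([v.shape for v in vjps])  # [(3,), (2,)]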

Comment on lines 26 to 28
dy_row = math.reshape(dy, [-1])
jac = math.reshape(jac, [dy_row.shape[0], -1])
return math.tensordot(jac, dy_row, [[0], [0]])
Member Author:
This fixes a bug I discovered while working on this PR.
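
For context (not part of the diff), the contraction above is the standard vector-Jacobian product. A self-contained NumPy check of the shape convention:

import numpy as np

# If the Jacobian J has shape (m, n) -- m outputs, n trainable parameters --
# and dy has shape (m,), then the VJP is J^T @ dy, with shape (n,)
m, n = 3, 2
J = np.arange(m * n, dtype=float).reshape(m, n)
dy = np.ones(m)
vjp = np.tensordot(np.reshape(J, (dy.shape[0], -1)), dy, axes=[[0], [0]])
assert vjp.shape == (n,)
assert np.allclose(vjp, J.T @ dy)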

@co9olguy (Member) left a comment

Thanks @josh146!

Seems nice and fairly clean. The only thing that felt unnatural was the ambiguity of what happens to gradient_fn when execute_fn already computes the gradient (inside batch.execute). It was clear upon reading the docstring of batch.autograd.execute, but was not clear what to expect without digging into that.

Would it make it more clear to explicitly set gradient_fn=None in the accumulation == "forward" branch?
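
For concreteness, the suggestion would amount to something like the following inside batch.execute (a sketch of the suggested pattern only, not the actual implementation):

if accumulation == "forward":
    # the device computes gradients alongside the results on the forward pass,
    # so no separate gradient function is needed downstream
    execute_fn = device.execute_and_gradients
    gradient_fn = None  # make the "unused" status explicit
else:
    execute_fn = device.batch_execute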

(Additional review threads on pennylane/gradients/vjp.py, pennylane/interfaces/batch/__init__.py, and pennylane/interfaces/batch/autograd.py were resolved.)
Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
codecov bot commented Aug 6, 2021

Codecov Report

Merging #1501 (bf505e9) into master (566ee89) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #1501   +/-   ##
=======================================
  Coverage   98.22%   98.23%           
=======================================
  Files         185      187    +2     
  Lines       13301    13370   +69     
=======================================
+ Hits        13065    13134   +69     
  Misses        236      236           
Impacted Files Coverage Δ
pennylane/_device.py 96.94% <100.00%> (+0.13%) ⬆️
pennylane/gradients/vjp.py 100.00% <100.00%> (ø)
pennylane/interfaces/batch/__init__.py 100.00% <100.00%> (ø)
pennylane/interfaces/batch/autograd.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@josh146 josh146 changed the title [WIP] Differentiable batch execute using autograd Differentiable batch execute using autograd Aug 9, 2021
@josh146 josh146 requested a review from co9olguy August 9, 2021 12:55
@josh146 josh146 added the review-ready 👌 label (PRs which are ready for review by someone from the core team) Aug 9, 2021
@josh146 (Member Author) commented Aug 9, 2021

@co9olguy: have taken into account your suggestions, and written tests. This is now ready for proper review 🙂

Comment on lines +175 to +176
@pytest.mark.parametrize("execute_kwargs", execute_kwargs)
class TestAutogradExecuteIntegration:
Member Author:
Note: This entire test class has been adapted directly from https://github.com/PennyLaneAI/pennylane/blob/master/tests/interfaces/test_tape_autograd.py to work with the new interface function.

Thus, the new autograd.execute function passes all the same integration tests as the current autograd interface.

Member:
Does it make sense to have all these tests depend on qml.tape.JacobianTape? Isn't that designated for the chopping block? What will happen to these tests when we remove JacobianTape?

Can we use the parent tape class instead for all these tests?

Member Author:
> Can we use the parent tape class instead for all these tests?

Not yet, but that is the plan once all PRs in this story are complete! Currently, there are a few auxiliary methods in the JacobianTape (namely, JacobianTape._grad_method, JacobianTape._update_gradient_info, and JacobianTape._grad_method_validation) that are needed for circuit gradients to work.

I actually realized this late -- I initially did try to write these tests using qml.tape.QuantumTape(), only to run into a lot of failures.

I still want to delete the JacobianTape, so there will just need to be a new ticket in this story to work out where to move the functionality currently in those class methods.

We could either move them to the parent class, or turn them into functions.

Comment on lines +516 to +517
class TestHigherOrderDerivatives:
"""Test that the autograd execute function can be differentiated"""
Member Author:
This is a new test class.

@co9olguy (Member) left a comment

Nice job @josh146!

No real concerns with this PR, but I hope we can get some more devs experienced enough with the core of the codebase to be able to work on these files in future. They're quite intricate!

(Additional review threads on pennylane/_device.py, pennylane/interfaces/batch/autograd.py, and tests/interfaces/test_batch_autograd.py were resolved.)
Comment on lines +487 to +493
expected = np.array(
[
[-np.sin(x), 0],
[-np.sin(x) * np.cos(y) / 2, -np.cos(x) * np.sin(y) / 2],
[np.cos(y) * np.sin(x) / 2, np.cos(x) * np.sin(y) / 2],
]
)
Member:
What will happen when Numpy stops supporting ragged arrays? 🤔

@josh146 (Member Author) commented Aug 15, 2021

We will need to transition to returning tuples of non-ragged arrays. E.g., for a QNode returning

return expval(obs1), probs(wires=0)

we should return a tuple of non-ragged arrays --- i.e., a 0-dimensional array for the expectation value and a length-2 array for the probabilities --- rather than what we currently do, which is pack both results into a single ragged array (illustrated below).

This is also more consistent with how NumPy/TensorFlow/Torch already work: functions that return more than one value simply return a tuple of tensors, e.g.,

>>> s, u, v = tf.linalg.svd(x)
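
Illustrating the two return conventions with made-up numbers:

import numpy as np

# current: a single ragged (object) array holding both measurement results
current = np.array([np.array(0.8), np.array([0.6, 0.4])], dtype=object)

# proposed: a plain tuple of non-ragged arrays
proposed = (np.array(0.8), np.array([0.6, 0.4]))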

Comment on lines +562 to +563
"""Since the adjoint hessian is not a differentiable transform,
higher-order derivatives are not supported."""
Member:

Could they be? 🤔

Member Author:

I spent a lot of time thinking about this. It is possible to write the adjoint method as a pure function using autodiff, but I don't think it's worth it:

  • Mainly, you lose the main benefit of the adjoint method --- the ability to re-use the existing device state
  • More minor, but it keeps the adjoint computation in Python

So it is doable, but it would lead to a significant performance regression for users interested in just the first derivative!

I think a better approach is to simply write the low-level adjoint method to support nth derivatives manually.

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
Base automatically changed from vjp-transform to master August 16, 2021 12:57
@josh146 josh146 merged commit 683fbfa into master Aug 16, 2021
@josh146 josh146 deleted the batch-autograd branch August 16, 2021 14:14
@josh146 (Member Author) commented Aug 18, 2021

[ch7165]

Labels: review-ready 👌 (PRs which are ready for review by someone from the core team)
Projects: none yet
Linked issues: none yet
Participants: 3