Add adjoint differentiation method #1032

trbromley · 2021-01-24T19:49:45Z

This PR adds the diff_method="adjoint" differentiation method, based on the paper here.

trbromley

Thanks @josh146! This device-based method is really nice! (sorry for this long train-of-thought comment)

What are the pros and cons of dev.rewind_jacobian() vs. RewindTape? Here it looks like we might be able to save a bit on QNode-level testing 🤔 Otherwise it seems to be a more philosophical question of when a Jacobian method should be associated to a device vs. a tape. For rewind, we require the device to be statevector-based and have some methods (e.g., _apply_unitary) so it is somewhat device based, but general enough to be in QubitDevice 🤔 Maybe the only true device-independent methods are finite-diff and param-shift, which only require access to expectation values.

On the other hand, one downside of having it as a device-based method is that one cannot access a RewindTape. Although, maybe this doesn't matter, since one could always do dev.rewind_jacobian(tape) for any choice of tape. I find the border between tape and device can be a bit unclear at times 🤔

Overall, maybe we should just see which one is fastest! I don't see any reason for one to be faster than the other though. In which case let's just choose one and see if it causes any problems down the line.

pennylane/_qubit_device.py

pennylane/tape/qnode.py

tests/tape/interfaces/test_qnode_tf.py

pennylane/_qubit_device.py

josh146

> Otherwise it seems to be a more philosophical question of when a Jacobian method should be associated to a device vs. a tape.

I definitely see this as more of a 'philosophical' design decision.

> For rewind, we require the device to be statevector-based and have some methods (e.g., _apply_unitary) so it is somewhat device based, but general enough to be in QubitDevice 🤔 Maybe the only true device-independent methods are finite-diff and param-shift, which only require access to expectation values.

I definitely agree here. The way I see it:

- Parameter-shift and finite-diff inspect the provided circuit, and create new circuits for gradient evaluations (that can be batch evaluated).
  - There is no dependence on device-level control, it is fully device independent.
  - There is also value to end users for being able to access/draw/manipulate the generated tapes.
- Rewind could be written in the above form as well, but there are subtleties.
  - It accesses the low-level device API (including current private methods of default.qubit).
  - It will likely be easier to 'cache' the forward pass evaluation than if we did it at the tape level.
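
The claim that parameter-shift and finite-diff only need expectation values can be illustrated with a small sketch. This is not PennyLane code: `expval` is a hypothetical stand-in for a device that is treated as a black box, and it models a single RX(theta) gate on |0> followed by a PauliZ measurement, whose expectation value is cos(theta). Both gradient rules below only ever call `expval`.

```python
import math


def expval(theta):
    """Hypothetical black-box device: <Z> after RX(theta) on |0>, i.e. cos(theta)."""
    return math.cos(theta)


def parameter_shift(f, theta):
    """Two-term parameter-shift rule with shift pi/2 (valid for RX-type gates)."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2


def finite_diff(f, theta, h=1e-6):
    """Central finite difference; approximate, with error depending on h."""
    return (f(theta + h) - f(theta - h)) / (2 * h)


theta = 0.4
grad_shift = parameter_shift(expval, theta)
grad_fd = finite_diff(expval, theta)
```

Neither function needs access to the statevector or to any device internals, which is the sense in which these methods are device independent.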

The big one for me though: since we are modifying the state of a device, rather than generating device independent tapes for later evaluation, this feels much more like it should live on the device. Is there a use case for being able to generate rewind tapes independent of devices?

Note: regardless of which approach we go with, we will need to make device._apply_operation and device._apply_unitary proper abstract methods of our device API in order to support external devices and not just default.qubit.

> On the other hand, one downside of having it as a device-based method is that one cannot access a RewindTape. Although, maybe this doesn't matter, since one could always do dev.rewind_jacobian(tape) for any choice of tape. I find the border between tape and device can be a bit unclear at times 🤔

Think of it like this:

- The tape is simply a data-structure for representing a quantum circuit.
- A device accepts a tape, and executes it, returning numeric results.

If we need a serializable representation of the circuit, then it should be a tape. Otherwise, it doesn't need to be.
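
As a rough illustration of that split, here is a toy sketch in which the tape is pure, serializable data and the device is the only thing that produces numbers. `ToyTape` and `ToyDevice` are hypothetical names invented for this example and are unrelated to the real `QuantumTape` and device classes; the toy device handles a single qubit, only RX and RY gates, and always measures PauliZ by tracking the Bloch vector.

```python
import json
import math
from dataclasses import asdict, dataclass, field


@dataclass
class ToyTape:
    """Hypothetical tape: a plain record of operations, with no execution logic."""

    ops: list = field(default_factory=list)  # e.g. [("RX", 0.3), ("RY", 0.7)]

    def to_json(self):
        # Because the tape is only data, serializing it is trivial.
        return json.dumps(asdict(self))


class ToyDevice:
    """Hypothetical device: accepts a tape and returns a numeric result."""

    def execute(self, tape):
        # Bloch vector of a single qubit starting in |0>.
        x, y, z = 0.0, 0.0, 1.0
        for name, theta in tape.ops:
            c, s = math.cos(theta), math.sin(theta)
            if name == "RX":  # rotation about the x axis
                y, z = c * y - s * z, s * y + c * z
            elif name == "RY":  # rotation about the y axis
                x, z = c * x + s * z, -s * x + c * z
            else:
                raise ValueError(f"unsupported operation: {name}")
        return z  # expectation value of PauliZ


tape = ToyTape(ops=[("RX", 0.3), ("RY", 0.7)])
result = ToyDevice().execute(tape)
serialized = tape.to_json()
```

In this picture, a device-level Jacobian method mutates or reads device state during `execute`-like calls, whereas a tape-level method would only produce more `ToyTape` objects.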

> Overall, maybe we should just see which one is fastest! I don't see any reason for one to be faster than the other though. In which case let's just choose one and see if it causes any problems down the line.

Let's run ASV on it!

pennylane/_qubit_device.py

pennylane/tape/qnode.py

tests/tape/interfaces/test_qnode_tf.py

trbromley · 2021-01-25T14:36:07Z

Comparison between the last commit on this branch (a426cca - left column) and the last commit for RewindTape (ffac43d - right column):

       before           after         ratio
     [a426cca2]       [ffac43d4]
     <rewind-on-device>       <rewind_tape>
       6.09±0.3ms       5.87±0.2ms     0.96  asv.core_suite.CircuitEvaluation.time_circuit(10, 3)
       10.7±0.7ms       10.5±0.2ms     0.98  asv.core_suite.CircuitEvaluation.time_circuit(10, 6)
       15.2±0.2ms       15.4±0.3ms     1.01  asv.core_suite.CircuitEvaluation.time_circuit(10, 9)
       2.27±0.2ms       2.07±0.1ms    ~0.91  asv.core_suite.CircuitEvaluation.time_circuit(2, 3)
       2.96±0.2ms       2.94±0.1ms     1.00  asv.core_suite.CircuitEvaluation.time_circuit(2, 6)
       4.39±0.3ms       3.61±0.2ms    ~0.82  asv.core_suite.CircuitEvaluation.time_circuit(2, 9)
       3.86±0.2ms       3.36±0.1ms    ~0.87  asv.core_suite.CircuitEvaluation.time_circuit(5, 3)
       5.67±0.3ms       5.35±0.3ms     0.94  asv.core_suite.CircuitEvaluation.time_circuit(5, 6)
       7.56±0.2ms       8.05±0.5ms     1.07  asv.core_suite.CircuitEvaluation.time_circuit(5, 9)
      4.65±0.08ms      4.75±0.08ms     1.02  asv.core_suite.GradientComputation.time_gradient_autograd(2, 3)
       7.54±0.2ms       8.11±0.3ms     1.08  asv.core_suite.GradientComputation.time_gradient_autograd(2, 6)
       10.6±0.2ms       10.2±0.2ms     0.97  asv.core_suite.GradientComputation.time_gradient_autograd(5, 3)
       19.1±0.7ms       19.4±0.8ms     1.01  asv.core_suite.GradientComputation.time_gradient_autograd(5, 6)
       9.62±0.7ms         9.51±1ms     0.99  asv.core_suite.GradientComputation.time_gradient_tf(2, 3)
+     13.7±0.08ms        24.5±10ms     1.78  asv.core_suite.GradientComputation.time_gradient_tf(2, 6)
       16.5±0.1ms        24.6±10ms    ~1.49  asv.core_suite.GradientComputation.time_gradient_tf(5, 3)
         30.6±1ms         38.5±5ms    ~1.26  asv.core_suite.GradientComputation.time_gradient_tf(5, 6)
      4.39±0.06ms         9.27±6ms    ~2.11  asv.core_suite.GradientComputation.time_gradient_torch(2, 3)
       7.20±0.3ms         13.2±6ms    ~1.83  asv.core_suite.GradientComputation.time_gradient_torch(2, 6)
       9.68±0.2ms         13.3±4ms    ~1.37  asv.core_suite.GradientComputation.time_gradient_torch(5, 3)
       17.9±0.7ms       18.1±0.3ms     1.01  asv.core_suite.GradientComputation.time_gradient_torch(5, 6)
          148±5ms         151±10ms     1.02  asv.core_suite.Optimization.time_optimization_autograd
          257±8ms          265±6ms     1.03  asv.core_suite.Optimization.time_optimization_tf
          141±1ms          142±3ms     1.01  asv.core_suite.Optimization.time_optimization_torch
       6.66±0.9ms       5.91±0.1ms    ~0.89  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 10, 3)
       11.7±0.8ms       10.6±0.2ms    ~0.91  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 10, 6)
-       28.0±10ms       15.0±0.2ms     0.53  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 10, 9)
      2.18±0.06ms       2.12±0.1ms     0.97  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 2, 3)
      3.03±0.09ms       2.88±0.2ms     0.95  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 2, 6)
      4.00±0.04ms       3.98±0.3ms     1.00  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 2, 9)
       3.60±0.2ms       3.49±0.2ms     0.97  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 5, 3)
         7.49±2ms       5.69±0.2ms    ~0.76  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 5, 6)
-        11.1±3ms      7.44±0.08ms     0.67  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 5, 9)

Comparison between the last commit on this branch (a426cca - left column) and the last commit on master (cf3cdb1 - right column):

       before           after         ratio
     [a426cca2]       [cf3cdb17]
     <rewind-on-device>       <master>  
+      6.09±0.3ms       15.9±0.6ms     2.61  asv.core_suite.CircuitEvaluation.time_circuit(10, 3)
+      10.7±0.7ms         28.7±2ms     2.69  asv.core_suite.CircuitEvaluation.time_circuit(10, 6)
+      15.2±0.2ms         40.5±1ms     2.67  asv.core_suite.CircuitEvaluation.time_circuit(10, 9)
+      2.27±0.2ms       4.75±0.3ms     2.09  asv.core_suite.CircuitEvaluation.time_circuit(2, 3)
+      2.96±0.2ms       6.59±0.3ms     2.23  asv.core_suite.CircuitEvaluation.time_circuit(2, 6)
+      4.39±0.3ms       8.68±0.2ms     1.98  asv.core_suite.CircuitEvaluation.time_circuit(2, 9)
+      3.86±0.2ms       8.22±0.1ms     2.13  asv.core_suite.CircuitEvaluation.time_circuit(5, 3)
+      5.67±0.3ms       14.2±0.3ms     2.50  asv.core_suite.CircuitEvaluation.time_circuit(5, 6)
+      7.56±0.2ms         20.6±1ms     2.73  asv.core_suite.CircuitEvaluation.time_circuit(5, 9)
+     4.65±0.08ms       6.20±0.4ms     1.33  asv.core_suite.GradientComputation.time_gradient_autograd(2, 3)
+      7.54±0.2ms       9.96±0.5ms     1.32  asv.core_suite.GradientComputation.time_gradient_autograd(2, 6)
+      10.6±0.2ms       13.6±0.5ms     1.28  asv.core_suite.GradientComputation.time_gradient_autograd(5, 3)
+      19.1±0.7ms         25.5±2ms     1.33  asv.core_suite.GradientComputation.time_gradient_autograd(5, 6)
+      9.62±0.7ms       21.3±0.8ms     2.21  asv.core_suite.GradientComputation.time_gradient_tf(2, 3)
+     13.7±0.08ms         36.1±1ms     2.63  asv.core_suite.GradientComputation.time_gradient_tf(2, 6)
+      16.5±0.1ms         52.2±1ms     3.16  asv.core_suite.GradientComputation.time_gradient_tf(5, 3)
+        30.6±1ms          103±8ms     3.38  asv.core_suite.GradientComputation.time_gradient_tf(5, 6)
+     4.39±0.06ms       10.1±0.7ms     2.31  asv.core_suite.GradientComputation.time_gradient_torch(2, 3)
+      7.20±0.3ms       28.5±0.6ms     3.96  asv.core_suite.GradientComputation.time_gradient_torch(2, 6)
+      9.68±0.2ms         51.6±1ms     5.33  asv.core_suite.GradientComputation.time_gradient_torch(5, 3)
+      17.9±0.7ms         207±20ms    11.55  asv.core_suite.GradientComputation.time_gradient_torch(5, 6)
+         148±5ms         217±10ms     1.46  asv.core_suite.Optimization.time_optimization_autograd
+         257±8ms         821±50ms     3.20  asv.core_suite.Optimization.time_optimization_tf
+         141±1ms       1.29±0.05s     9.12  asv.core_suite.Optimization.time_optimization_torch
+      6.66±0.9ms       16.6±0.3ms     2.50  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 10, 3)
+      11.7±0.8ms       31.7±0.6ms     2.71  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 10, 6)
        28.0±10ms       44.7±0.3ms    ~1.59  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 10, 9)
+     2.18±0.06ms         4.67±1ms     2.14  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 2, 3)
+     3.03±0.09ms         7.87±1ms     2.60  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 2, 6)
+     4.00±0.04ms         11.8±2ms     2.95  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 2, 9)
+      3.60±0.2ms         10.8±2ms     2.99  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 5, 3)
+        7.49±2ms         16.7±2ms     2.23  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 5, 6)
+        11.1±3ms         23.8±2ms     2.14  asv.device_suite.CircuitEvaluation.time_circuit('default.qubit', 5, 9)

.github/CHANGELOG.md

pennylane/_qubit_device.py

trbromley · 2021-01-25T22:47:53Z

> Since adjoint_jacobian is in QubitDevice, but _apply_operation and _apply_unitary are in DefaultQubit, we should at least define _apply_operation and _apply_unitary in QubitDevice, then just raise NotImplementedError

That's a good point! However, are we sure that these are the methods we want to "make official" going forward? Would it be worth waiting in case we decide to do a more major refactor of the inner workings of default.qubit (with potentially some parts being renamed and/or moved to QubitDevice)?
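
For reference, the proposal of defining the hooks on the base class and raising NotImplementedError could look roughly like the sketch below. Everything here is hypothetical: `ToyQubitDevice`, `ToyDefaultQubit`, and `supports_adjoint` are invented names, the `_apply_unitary(state, mat, wires)` signature is an assumption, and an "operation" is modelled as a plain `(matrix, wires)` pair rather than a PennyLane operation.

```python
class ToyQubitDevice:
    """Hypothetical base class; not PennyLane's actual QubitDevice."""

    def _apply_operation(self, state, operation):
        raise NotImplementedError(
            f"{type(self).__name__} must implement _apply_operation"
        )

    def _apply_unitary(self, state, mat, wires):
        raise NotImplementedError(
            f"{type(self).__name__} must implement _apply_unitary"
        )

    def supports_adjoint(self):
        """Return True only if the subclass overrides both low-level hooks."""
        cls = type(self)
        return (
            cls._apply_operation is not ToyQubitDevice._apply_operation
            and cls._apply_unitary is not ToyQubitDevice._apply_unitary
        )


class ToyDefaultQubit(ToyQubitDevice):
    """Hypothetical single-qubit simulator that provides both hooks."""

    def _apply_unitary(self, state, mat, wires):
        # Single-qubit toy: `wires` is accepted but ignored.
        return [
            mat[0][0] * state[0] + mat[0][1] * state[1],
            mat[1][0] * state[0] + mat[1][1] * state[1],
        ]

    def _apply_operation(self, state, operation):
        mat, wires = operation  # toy convention: (matrix, wires) pair
        return self._apply_unitary(state, mat, wires)
```

A check like `supports_adjoint` would let a QNode reject the adjoint method early for devices that have not implemented the hooks, instead of failing partway through a gradient computation.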

pennylane/operation.py

thisac

Really great work! 🚀 Only a small number of comments/thoughts, but it seems to be working very well. Haven't looked through the tests yet, but I'll do that right away.

pennylane/tape/qnode.py

pennylane/_qubit_device.py

pennylane/operation.py

pennylane/_qubit_device.py

.github/CHANGELOG.md

pennylane/operation.py

thisac · 2021-01-25T23:27:16Z

pennylane/tape/qnode.py

+                QNode._get_parameter_shift_tape(device),
+                interface,
+                device,
+                {"method": "analytic"},


What's the difference between method and jacobian_method and why would both be needed?

pennylane/tape/qnode.py

trbromley

Thanks @thisac!

.github/CHANGELOG.md

pennylane/_qubit_device.py

pennylane/operation.py

pennylane/tape/qnode.py

thisac · 2021-01-26T01:52:32Z

Didn't really have any comments on the tests. I think they look great (although I'm a bit tired right now, so I might have been a bit quick on checking the final tests). 😆

josh146

Looks great @trbromley, @albi3ro, et al.! I left comments and suggestions, but nothing that would block merging.

One thought that crossed my mind - if this is very specific to default.qubit, should this just be a method on default.qubit instead?

pennylane/_qubit_device.py

josh146 · 2021-01-26T03:44:25Z

pennylane/_qubit_device.py

+
+        phi = self._reshape(self.state, [2] * self.num_wires)
+
+        lambdas = [self._apply_operation(phi, obs) for obs in tape.observables]


Just double checking, but we can't do the following:

    new_tape = some_func(old_tape)
    self.execute(new_tape)

because here we are applying hermitian matrices to the statevector, which the tape/device won't understand how to do without going low level?

The new_tape for a specific obs would look like:

    with qml.QuantumTape() as new_tape:
        qml.QubitStateVector(phi, wires=range(wires))
        obs  # actually not sure how this line would look
        qml.state()

I don't see a problem if it's PauliX but for an arbitrary Hermitian this wouldn't (?) work. Maybe we could hack it in to work though, but we'd lose the state being normalized.

I also feel like the "many-tapes" approach might be inefficient, e.g., to make lots of tapes and then do device execution (which includes lots of postprocessing), when we just need to evolve the state by one gate.
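
To make the "evolve the state by one gate" point concrete, here is a minimal single-qubit sketch of the adjoint (rewind) recursion in plain Python. It is not the code in this PR: the helper names (`matvec`, `dagger`, `inner`, `adjoint_gradient`) and the way gates are passed in are made up for illustration, and it only handles one qubit, RX/RY gates, and a single observable. After one forward pass, each gradient entry costs a couple of 2x2 matrix-vector products, with no extra circuit executions.

```python
import math


def matvec(m, v):
    """Apply a 2x2 matrix to a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [
        [m[0][0].conjugate(), m[1][0].conjugate()],
        [m[0][1].conjugate(), m[1][1].conjugate()],
    ]


def inner(a, b):
    """Inner product <a|b>."""
    return a[0].conjugate() * b[0] + a[1].conjugate() * b[1]


def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


def d_rx(t):  # derivative of rx with respect to t
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[-s / 2, -1j * c / 2], [-1j * c / 2, -s / 2]]


def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


def d_ry(t):  # derivative of ry with respect to t
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[-s / 2, -c / 2], [c / 2, -s / 2]]


PAULI_Z = [[1, 0], [0, -1]]


def adjoint_gradient(gates, params, obs):
    """Gradient of <psi|obs|psi> for gates given as (unitary_fn, derivative_fn) pairs."""
    phi = [1 + 0j, 0j]
    for (u, _), t in zip(gates, params):  # single forward pass
        phi = matvec(u(t), phi)
    lam = matvec(obs, phi)  # apply the observable to the final state
    grads = [0.0] * len(params)
    for k in reversed(range(len(params))):  # rewind one gate at a time
        u, du = gates[k]
        u_dag = dagger(u(params[k]))
        phi = matvec(u_dag, phi)  # undo gate k on the state
        mu = matvec(du(params[k]), phi)
        grads[k] = 2 * inner(lam, mu).real
        lam = matvec(u_dag, lam)  # undo gate k on the observable side
    return grads


grads = adjoint_gradient([(rx, d_rx), (ry, d_ry)], [0.3, 0.7], PAULI_Z)
```

For this circuit the expectation value is cos(a)·cos(b), so the result can be checked against the analytic gradient or against finite differences, in the same spirit as the `grad_F` / `grad_D` comparison in the tests.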

pennylane/tape/qnode.py

tests/test_qubit_device.py

josh146 · 2021-01-26T04:00:49Z

tests/test_qubit_device.py

+        grad_F = tape.jacobian(dev, method="numeric")
+        grad_D = dev.adjoint_jacobian(tape)


After the functional refactor, this will be a lot more consistent.

tests/test_qubit_device.py

pennylane/operation.py

co9olguy · 2021-01-26T16:45:55Z

🚀

josh146 and others added 4 commits January 24, 2021 23:41

Move rewind diff method to device

9fc4301

Move rewind diff method to device

994a732

more

f402118

Fix import

a426cca

trbromley commented Jan 24, 2021

josh146 reviewed Jan 25, 2021

trbromley mentioned this pull request Jan 25, 2021

Add new differentiation method based on rewinding the tape [PR1] #1029

Closed

trbromley added 4 commits January 25, 2021 10:07

Merge branch 'master' into rewind-on-device

0c62766

Apply suggestions

ccd080b

Add to changelog

d823c52

Rename rewind to adjoint

8cccca8

albi3ro reviewed Jan 25, 2021

.github/CHANGELOG.md (outdated, resolved)

albi3ro reviewed Jan 25, 2021

pennylane/_qubit_device.py (outdated, resolved)

albi3ro reviewed Jan 25, 2021

pennylane/_qubit_device.py (outdated, resolved)

trbromley added 10 commits January 25, 2021 13:38

Add tests

89a78fa

Add tests

5b2e094

Add tests

902514f

Remove spacing

1ee5eeb

Add to tests

fe35a8b

Add skips

91c75ad

Fix CI

ef67430

Fix CI

9ad2026

Add docstring

8982090

Respond to comments

55afc93

trbromley changed the title ~~Rewind on device [WIP]~~ Add rewind differentiation method Jan 25, 2021

trbromley self-assigned this Jan 25, 2021

trbromley added the review-ready 👌 label Jan 25, 2021

trbromley changed the title ~~Add rewind differentiation method~~ Add adjoint differentiation method Jan 25, 2021

trbromley marked this pull request as ready for review January 25, 2021 20:09

trbromley and others added 4 commits January 25, 2021 17:21

Update pennylane/_qubit_device.py

dfb6afa

Co-authored-by: antalszava <antalszava@gmail.com>

Move operation_derivative

e0a9d85

Apply black

6c0daa6

Tidy imports

e975cef

trbromley added 2 commits January 25, 2021 17:50

Reword

21ba5c7

Move position

4cf10fc

trbromley commented Jan 25, 2021

pennylane/operation.py (resolved)

Update docstring:

523997e

thisac reviewed Jan 25, 2021

Apply suggestions from code review

ad042a1

Co-authored-by: Theodor <theodor@xanadu.ai>

trbromley commented Jan 26, 2021

Add test for coverage

ce7865f

josh146 approved these changes Jan 26, 2021

josh146 and others added 12 commits January 26, 2021 12:06

Merge branch 'master' into rewind-on-device

d1e24b5

Merge branch 'master' into rewind-on-device

ff5123c

Add note

d098434

Update import

8235ca6

Update pennylane/_qubit_device.py

46daf2a

Co-authored-by: Josh Izaac <josh146@gmail.com>

Move tests

0501bf2

Merge branch 'rewind-on-device' of github.com:XanaduAI/pennylane into rewind-on-device

561b473

tidy

6f9d1d8

Update

83c47ae

Fix test

c431279

Update docstring

f5f20b7

Update docstring

e585da5

trbromley merged commit 16e3080 into master Jan 26, 2021

trbromley deleted the rewind-on-device branch January 26, 2021 15:39
