
Refactor the tape.jacobian method to batch execute tapes #840

Merged: 93 commits merged into master from gradient_sprint on Oct 20, 2020

Conversation

@mariaschuld (Contributor) commented Oct 7, 2020

Note: the description below was written by @josh146

Context:

In the current core, the gradient methods freely intersperse quantum computation and classical processing. That is, the quantum device is executed internally and directly whenever a quantum result is needed.

It is advantageous to instead split up the quantum and classical processing:

  1. Generate all the quantum tapes corresponding to the required quantum processing

  2. Execute all tapes

  3. Classically post-process the evaluated tape results.

This gives us finer control over the quantum execution --- for example, if a device supports execution of multiple circuits at once.

Description of changes:

  • The Operation class now has the __copy__ special method defined. This ensures that, when a shallow copy of an operation is performed, the mutable list storing the operation parameters is also shallow copied. Both the old operation and the copied operation will continue to share the same parameter data:

    >>> import copy
    >>> op = qml.RX(0.2, wires=0)
    >>> op2 = copy.copy(op)
    >>> op.data[0] is op2.data[0]
    True

    However, the list container itself is not shared:

    >>> op.data is op2.data
    False

    This allows us to modify the parameters of the copied operation, without mutating
    the parameters of the original operation:

    >>> op2.data[0] = 1
    >>> op
    RX(0.2, wires=[0])
    >>> op2
    RX(1, wires=[0])
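
    In outline, such a __copy__ might look like the following; this is a
    minimal illustrative sketch, not the exact PennyLane implementation,
    assuming the parameters live in the mutable data attribute shown above:

    def __copy__(self):
        cls = self.__class__
        copied_op = cls.__new__(cls)
        # shallow copy all attributes onto the new instance...
        copied_op.__dict__.update(self.__dict__)
        # ...but give the copy its own parameter list, so that the list
        # container is not shared (the parameter objects still are)
        copied_op.data = list(self.data)
        return copied_op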
  • Likewise, the Tensor.__copy__ and MeasurementProcess.__copy__ methods are defined to make sure that the obs list is shallow copied in each case.

  • The QuantumTape.copy method has been tweaked so that:

    • Optionally, the tape's operations are shallow copied in addition to the tape itself by passing the copy_operations=True boolean flag. This allows the copied tape's parameters to be mutated without affecting the original tape's parameters. (Note: the two tapes will share parameter data until one of them has its parameter list modified.)

    • Copied tapes continue to share the same caching dictionary as the original tape.

    • Copied tapes can be cast to another QuantumTape subclass by passing the tape_cls keyword argument.
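
    For example, an illustrative usage sketch (the tape construction and the
    get_parameters/set_parameters calls here are assumed, following the test
    snippets quoted further down):

    >>> with qml.tape.QuantumTape() as tape:
    ...     qml.RX(0.2, wires=0)
    ...     qml.expval(qml.PauliZ(0))
    >>> copied_tape = tape.copy(copy_operations=True)
    >>> copied_tape.set_parameters([0.5])
    >>> tape.get_parameters()  # the original tape is unaffected
    [0.2]
    >>> copied_tape.get_parameters()
    [0.5]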

  • All gradient methods, including finite-differences, the qubit/CV parameter-shift rules, and reversible differentiation, have been modified. Rather than returning the evaluated gradients directly, they now return a tuple containing the required quantum and classical processing steps:

    def gradient_method(idx, param, **options):
        # generate the quantum tapes that must be computed
        # to determine the quantum gradient
        tapes = quantum_gradient_tapes(self)
    
        def processing_fn(results):
            # perform classical processing on the evaluated tapes
            # returning the evaluated quantum gradient
            return classical_processing(results)
    
        return tapes, processing_fn

    This is similar in structure to TensorFlow's custom_gradient API.

    In addition, we make use of the ability to cast copied tapes to different QuantumTape subclasses to ensure that any copied tape created by the gradient methods no longer has an interface. This is because we are currently behind the interface boundary; we can safely assume that all tape parameters have already been unwrapped where needed.

  • Finally, the tape.jacobian() method has been restructured as follows (a rough sketch is given after this list):

    • Loop over all parameters, and accumulate the (a) quantum tapes, and (b) classical post-processing functions required.

    • Evaluate the accumulated quantum tapes

    • Apply the post-processing functions to the evaluated tape results
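
    A rough sketch of this flow (illustrative only; batch_execute is the
    batched device execution method added in this PR, while the
    _grad_method_tapes helper and the bookkeeping here are hypothetical
    stand-ins for the real logic):

    import numpy as np

    def jacobian(self, device, **options):
        all_tapes, fns, spans = [], [], []

        # 1. Loop over trainable parameters, accumulating (a) the quantum
        #    tapes and (b) the classical post-processing function for each
        for idx in self.trainable_params:
            tapes, fn = self._grad_method_tapes(idx, **options)  # hypothetical
            spans.append(slice(len(all_tapes), len(all_tapes) + len(tapes)))
            all_tapes.extend(tapes)
            fns.append(fn)

        # 2. Evaluate all accumulated quantum tapes in a single batch
        results = device.batch_execute(all_tapes)

        # 3. Apply each post-processing function to its slice of the results
        return np.stack([fn(results[s]) for fn, s in zip(fns, spans)], axis=-1)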

Benefits:

  • Quantum and classical processing are cleanly separated; in the future, we can take advantage of devices that support batched execution.

  • Tapes can now be safely shallow copied and parameters mutated without affecting all tapes.

  • The new form of the gradient methods suggests a rough API that we might want to standardize at some point, to make it easier for developers to contribute new gradient methods.

  • When the tapes are copied, we are essentially doing two shallow copies: a shallow copy of the tape structure, and a shallow copy of the operation list. This is much more efficient than attempting a recursive deep copy.

  • The reversible method is now very cleanly written as a tape transform with state output, rather than a list of operations and calls to private device methods.

  • The gradient methods are now written in the form tape -> tape(s) + classical processing. In the future, we might want to elevate them as user-facing QNode transforms, for use by advanced users that want to see the generated quantum gradient tapes.

Drawbacks:

  • While removing the device/execution-specific logic from the gradient methods was generally pretty smooth, both the CV parameter-shift and reversible backprop methods continue to require access to the device.

    • The CV parameter-shift rule requires the device in order to check that the PolyXP observable is supported when second-order shifts are required. If not, it falls back to finite differences.

    • The reversible method requires access to the device's pre-rotated statevector.

    Both methods also require the device wire labeling, as they manipulate observables.

    Currently, we get around this by having the tape.jacobian method simply pass the device as a keyword argument to the gradient methods. We propose leaving this for now; while inelegant, there is no clear and quick solution. (A long-term solution would perhaps be to have the gradient methods register the operations/conditions they support.)

  • The gradient methods continue to act on single parameters in isolation, with tape.jacobian() performing a loop over parameters and falling back to other methods where required. This is not ideal, for two reasons:

    • If a particular gradient method has an unusual constraint that determines fallback (e.g., the CV parameter-shift rule), the method needs to take it into account, or override tape.jacobian(). This spreads the fallback logic across both tape.jacobian and the gradient method.

    • More importantly, many gradient methods require knowledge from previous parameter gradient calculations; for example, the gradient of parameter n can be made more efficient by re-using information from the calculations for parameters n-1, n-2, etc. The current approach, where each gradient is computed in isolation per gradient-method call, does not allow for this.

      First-order finite differences (requiring a pre-execution to determine y0) and the variance parameter-shift methods are two current examples that are hampered by the existing approach.

    Thus, there are flexibility and efficiency gains in allowing gradient methods to generate tapes for all parameters at once, rather than for single parameters in isolation. Of course, a downside is that we lose the ability for users to fine-tune the method manually per parameter. I'm not convinced this is a common or wanted use case, though; it is more likely that users want to fine-tune the method per gate, not per parameter.

codecov bot commented Oct 7, 2020

Codecov Report

Merging #840 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #840      +/-   ##
==========================================
+ Coverage   97.79%   97.82%   +0.02%     
==========================================
  Files         137      137              
  Lines        9207     9358     +151     
==========================================
+ Hits         9004     9154     +150     
- Misses        203      204       +1     
Impacted Files Coverage Δ
pennylane/tape/qnode.py 98.17% <ø> (-0.61%) ⬇️
pennylane/_device.py 96.00% <100.00%> (+0.14%) ⬆️
pennylane/_qubit_device.py 98.87% <100.00%> (+0.04%) ⬆️
pennylane/operation.py 95.69% <100.00%> (+0.24%) ⬆️
pennylane/tape/measure.py 97.14% <100.00%> (+0.26%) ⬆️
pennylane/tape/tapes/cv_param_shift.py 99.34% <100.00%> (+0.13%) ⬆️
pennylane/tape/tapes/jacobian_tape.py 99.21% <100.00%> (+0.23%) ⬆️
pennylane/tape/tapes/qubit_param_shift.py 100.00% <100.00%> (ø)
pennylane/tape/tapes/reversible.py 100.00% <100.00%> (ø)
pennylane/tape/tapes/tape.py 99.25% <100.00%> (+0.04%) ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 939b80f...8250b54


vec1_indices = ABC[: device.num_wires]
mat = np.reshape(obs.matrix, [2] * len(obs.wires) * 2)
mariaschuld (Contributor, Author) commented:
But that now makes sense; I was wondering why the device is used.

One thing I was also wondering about is hardware compatibility. Since we can always write a qubit circuit with PauliZ observables (by rotating the basis), and PauliZ is unitary, can't we implement reversible diff by applying the original unitary, the observable, and the inverse of the differentiated unitary, and measuring the overlap with the zero state?

mariaschuld and others added 2 commits October 15, 2020 10:15
# Since they have unique operations, mutating the parameters
# on one tape will *not* affect the parameters on another tape
new_params = [np.array([0, 0]), 0.2]
tape.set_parameters(new_params)
A member commented:

I wonder if we'll ever hit a point where this nonstandard copying behaviour impacts developers or power users?

A member replied:

Good question. By default, tape.copy() will perform a single shallow copy --- as users or developers would expect. It is only when copy_operations=True (something the user/developer has to explicitly request) that the second shallow copy of operations takes place. Since this is not default behaviour, I'm not too worried.

I'm more worried about the non-standard behaviour of deep copying. Users/developers expect deep copy to copy everything; however, the parameters are not copied (since PyTorch doesn't allow this). I think fixing this is a good thing: it's easy to get everything working with copy.deepcopy(tape), and only realize later that while autograd and TF work, Torch doesn't.

Perhaps the deep copy could try and copy the parameters, and only skip them if an exception occurs (e.g., Torch)?
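
Concretely, that suggestion might look something like this (a minimal illustrative sketch, not part of the PR; it reuses the data attribute from above, and the broad except would need narrowing in practice):

    import copy

    def __deepcopy__(self, memo):
        cls = self.__class__
        copied_op = cls.__new__(cls)
        for attribute, value in self.__dict__.items():
            if attribute == "data":
                try:
                    # attempt to deep copy the parameters...
                    copied_op.data = copy.deepcopy(value, memo)
                except Exception:
                    # ...falling back to sharing them if the interface
                    # forbids it (e.g., Torch)
                    copied_op.data = list(value)
            else:
                copied_op.__dict__[attribute] = copy.deepcopy(value, memo)
        return copied_op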

res2 = copied_tape.execute(dev)

assert np.allclose(res1, res2, atol=tol, rtol=0)
spy.assert_called_once()
A member commented:

Awesome tests 💪


return copied_op

def __deepcopy__(self, memo):
A contributor commented:

Is there an issue/forum post to see more context?

"""
res0 = np.array(results[0])
res1 = np.array(results[1])
return (res0 - res1) / h
A contributor commented:

Would this be 2h rather than h?

A member replied:

No, it should be h here, because the shift is ±h/2 😆

I have to admit, I did a double take, since I'm so used to seeing 2h in the denominator of centered finite differences.
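
For reference: with shifts of ±h/2 the centered difference is (f(x + h/2) - f(x - h/2)) / h, i.e. the familiar (f(x + h) - f(x - h)) / (2h) with h halved. A quick illustrative check (not part of the PR):

    import numpy as np

    f, x, h = np.sin, 0.5, 1e-6
    approx = (f(x + h / 2) - f(x - h / 2)) / h
    assert np.isclose(approx, np.cos(x))  # matches the exact derivative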

josh146 and others added 6 commits October 20, 2020 08:43
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
* add batch_execute functions to device and qubit_device

* black

* backup

* polish

* make tests pass

* add test for batch_execute

* polish

* polish2

* skip tests if interface not imported

* fix test

* Update pennylane/_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Josh review

* backup

* reset fix implemented, still 4 cache tests not passing

* backup

* improve tests

* polish

* Remove tests and update docstrings

* Update pennylane/_qubit_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Update pennylane/_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Update tests/test_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Update tests/test_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_qubit_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_qubit_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_qubit_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* improve tests

* add empty tape tests

* newline

Co-authored-by: Josh Izaac <josh146@gmail.com>
Co-authored-by: trbromley <brotho02@gmail.com>
Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
@josh146 josh146 merged commit 2f2ea15 into master Oct 20, 2020
@josh146 josh146 deleted the gradient_sprint branch October 20, 2020 11:43