
Refactor the tape.jacobian method to batch execute tapes #840

Merged: 93 commits merged into master from gradient_sprint on Oct 20, 2020

Conversation

@mariaschuld (Contributor) commented Oct 7, 2020

Note: the description below was written by @josh146

Context:

In the current core, the gradient methods freely intersperse quantum computation and classical processing. That is, the quantum device is executed internally and directly whenever a quantum result is needed.

It is advantageous to instead split up the quantum and classical processing:

  1. Generate all the quantum tapes corresponding to the required quantum processing

  2. Execute all tapes

  3. Classically post-process the evaluated tape results.

This gives us finer control over the quantum execution --- for example, if a device supports execution of multiple circuits at once.

Description of changes:

  • The Operation class now has the __copy__ special method defined. This ensures that, when a shallow copy of an operation is performed, the mutable list storing the operation parameters is also shallow copied. Both the old operation and the copied operation will continue to share the same parameter data:

    >>> import copy
    >>> op = qml.RX(0.2, wires=0)
    >>> op2 = copy.copy(op)
    >>> op.data[0] is op2.data[0]
    True

    However, the list container itself is not shared:

    >>> op.data is op2.data
    False

    This allows us to modify the parameters of the copied operation, without mutating
    the parameters of the original operation:

    >>> op2.data[0] = 1
    >>> op
    RX(0.2, wires=[0])
    >>> op2
    RX(1, wires=[0])
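
    In outline, such a __copy__ might look like the following; this is a
    minimal illustrative sketch, not the exact PennyLane implementation,
    assuming the parameters live in the mutable data attribute shown above:

    def __copy__(self):
        cls = self.__class__
        copied_op = cls.__new__(cls)
        # shallow copy all attributes onto the new instance...
        copied_op.__dict__.update(self.__dict__)
        # ...but give the copy its own parameter list, so that the list
        # container is not shared (the parameter objects still are)
        copied_op.data = list(self.data)
        return copied_op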
  • Likewise, the Tensor.__copy__ and MeasurementProcess.__copy__ methods are defined to make sure that the obs list is shallow copied in each case.

  • The QuantumTape.copy method has been tweaked so that:

    • Optionally, the tape's operations are shallow copied in addition to the tape itself by passing the copy_operations=True boolean flag. This allows the copied tape's parameters to be mutated without affecting the original tape's parameters. (Note: the two tapes will share parameter data until one of them has its parameter list modified.)

    • Copied tapes continue to share the same caching dictionary as the original tape.

    • Copied tapes can be cast to another QuantumTape subclass by passing the tape_cls keyword argument.
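
    For example, an illustrative usage sketch (the tape construction and the
    get_parameters/set_parameters calls here are assumed, following the test
    snippets quoted further down):

    >>> with qml.tape.QuantumTape() as tape:
    ...     qml.RX(0.2, wires=0)
    ...     qml.expval(qml.PauliZ(0))
    >>> copied_tape = tape.copy(copy_operations=True)
    >>> copied_tape.set_parameters([0.5])
    >>> tape.get_parameters()  # the original tape is unaffected
    [0.2]
    >>> copied_tape.get_parameters()
    [0.5]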

  • All gradient methods, including finite-differences, the qubit/CV parameter-shift rules, and reversible differentiation, have been modified. Rather than returning the evaluated gradients directly, they now return a tuple containing the required quantum and classical processing steps:

    def gradient_method(idx, param, **options):
        # generate the quantum tapes that must be computed
        # to determine the quantum gradient
        tapes = quantum_gradient_tapes(self)
    
        def processing_fn(results):
            # perform classical processing on the evaluated tapes
            # returning the evaluated quantum gradient
            return classical_processing(results)
    
        return tapes, processing_fn

    This is similar in structure to TensorFlow's custom_gradient API.

    In addition, we make use of the ability to cast copied tapes to different QuantumTape subclasses to ensure that any copied tape created by the gradient methods no longer has an interface. This is because we are currently behind the interface boundary; we can safely assume that all tape parameters have already been unwrapped where needed.

  • Finally, the tape.jacobian() method has been restructured as follows (a rough sketch is given after this list):

    • Loop over all parameters, and accumulate the (a) quantum tapes, and (b) classical post-processing functions required.

    • Evaluate the accumulated quantum tapes

    • Apply the post-processing functions to the evaluated tape results
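
    A rough sketch of this flow (illustrative only; batch_execute is the
    batched device execution method added in this PR, while the
    _grad_method_tapes helper and the bookkeeping here are hypothetical
    stand-ins for the real logic):

    import numpy as np

    def jacobian(self, device, **options):
        all_tapes, fns, spans = [], [], []

        # 1. Loop over trainable parameters, accumulating (a) the quantum
        #    tapes and (b) the classical post-processing function for each
        for idx in self.trainable_params:
            tapes, fn = self._grad_method_tapes(idx, **options)  # hypothetical
            spans.append(slice(len(all_tapes), len(all_tapes) + len(tapes)))
            all_tapes.extend(tapes)
            fns.append(fn)

        # 2. Evaluate all accumulated quantum tapes in a single batch
        results = device.batch_execute(all_tapes)

        # 3. Apply each post-processing function to its slice of the results
        return np.stack([fn(results[s]) for fn, s in zip(fns, spans)], axis=-1)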

Benefits:

  • Quantum and classical processing are cleanly separated; in the future, we can take advantage of devices that support batched execution.

  • Tapes can now be safely shallow copied and parameters mutated without affecting all tapes.

  • The new form of the gradient methods suggests a rough API that we might want to standardize at some point, to make it easier for developers to contribute new gradient methods.

  • When the tapes are copied, we are essentially doing two shallow copies: a shallow copy of the tape structure, and a shallow copy of the operation list. This is much more efficient than attempting a recursive deep copy.

  • The reversible method is now very cleanly written as a tape transform with state output, rather than a list of operations and calls to private device methods.

  • The gradient methods are now written in the form tape -> tape(s) + classical processing. In the future, we might want to elevate them as user-facing QNode transforms, for use by advanced users that want to see the generated quantum gradient tapes.

Drawbacks:

  • While removing the device/execution-specific logic from the gradient methods was generally pretty smooth, both the CV parameter-shift and reversible backprop methods continue to require access to the device.

    • The CV parameter-shift rule requires the device in order to check that the PolyXP observable is supported when second-order shifts are required. If not, it falls back to finite differences.

    • The reversible method requires access to the device's pre-rotated statevector.

    Both methods also require the device wire labeling, as they manipulate observables.

    Currently, we get around this by having the tape.jacobian method simply pass the device as a keyword argument to the gradient methods. We propose leaving this for now; while inelegant, there is no clear and quick solution. (A long-term solution would perhaps be to have the gradient methods register the operations/conditions they support.)

  • The gradient methods continue to act on single parameters in isolation, with tape.jacobian() performing a loop over parameters and falling back to other methods where required. This is not ideal, for two reasons:

    • If a particular gradient method has an unusual constraint that determines fallback (e.g., the CV parameter-shift rule), the method needs to take it into account, or override tape.jacobian(). This spreads the fallback logic across both tape.jacobian and the gradient method.

    • More importantly, many gradient methods require knowledge from previous parameter gradient calculations; for example, the gradient of parameter n can be made more efficient by re-using information from the calculations for parameters n-1, n-2, etc. The current approach, where each gradient is computed in isolation per gradient-method call, does not allow for this.

      First-order finite differences (requiring a pre-execution to determine y0) and the variance parameter-shift methods are two current examples that are hampered by the existing approach.

    Thus, there are flexibility and efficiency gains in allowing gradient methods to generate tapes for all parameters at once, rather than for single parameters in isolation. Of course, a downside is that we lose the ability for users to fine-tune the method manually per parameter. I'm not convinced this is a common or wanted use case, though; it is more likely that users want to fine-tune the method per gate, not per parameter.

codecov bot commented Oct 7, 2020

Codecov Report

Merging #840 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #840      +/-   ##
==========================================
+ Coverage   97.79%   97.82%   +0.02%     
==========================================
  Files         137      137              
  Lines        9207     9358     +151     
==========================================
+ Hits         9004     9154     +150     
- Misses        203      204       +1     
Impacted Files Coverage Δ
pennylane/tape/qnode.py 98.17% <ø> (-0.61%) ⬇️
pennylane/_device.py 96.00% <100.00%> (+0.14%) ⬆️
pennylane/_qubit_device.py 98.87% <100.00%> (+0.04%) ⬆️
pennylane/operation.py 95.69% <100.00%> (+0.24%) ⬆️
pennylane/tape/measure.py 97.14% <100.00%> (+0.26%) ⬆️
pennylane/tape/tapes/cv_param_shift.py 99.34% <100.00%> (+0.13%) ⬆️
pennylane/tape/tapes/jacobian_tape.py 99.21% <100.00%> (+0.23%) ⬆️
pennylane/tape/tapes/qubit_param_shift.py 100.00% <100.00%> (ø)
pennylane/tape/tapes/reversible.py 100.00% <100.00%> (ø)
pennylane/tape/tapes/tape.py 99.25% <100.00%> (+0.04%) ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 939b80f...8250b54


vec1_indices = ABC[: device.num_wires]
mat = np.reshape(obs.matrix, [2] * len(obs.wires) * 2)
mariaschuld (Contributor, Author) commented:
But that now makes sense; I was wondering why the device is used.

One thing I was also wondering about is hardware compatibility. Since we can always write a qubit circuit with PauliZ observables (by rotating the basis), and PauliZ is unitary, can't we implement reversible diff by applying the original unitary, the observable, and the inverse of the differentiated unitary, and measuring the overlap with the zero state?

mariaschuld and others added 2 commits October 15, 2020 10:15
# Since they have unique operations, mutating the parameters
# on one tape will *not* affect the parameters on another tape
new_params = [np.array([0, 0]), 0.2]
tape.set_parameters(new_params)
A member commented:

I wonder if we'll ever hit a point where this nonstandard copying behaviour impacts developers or power users?

A member replied:

Good question. By default, tape.copy() will perform a single shallow copy --- as users or developers would expect. It is only when copy_operations=True (something the user/developer has to explicitly request) that the second shallow copy of operations takes place. Since this is not default behaviour, I'm not too worried.

I'm more worried about the non-standard behaviour of deep copying. Users/developers expect deep copy to copy everything; however, the parameters are not copied (since PyTorch doesn't allow this). I think fixing this is a good thing: it's easy to get everything working with copy.deepcopy(tape), and only realize later that while autograd and TF work, Torch doesn't.

Perhaps the deep copy could try and copy the parameters, and only skip them if an exception occurs (e.g., Torch)?
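
Concretely, that suggestion might look something like this (a minimal illustrative sketch, not part of the PR; it reuses the data attribute from above, and the broad except would need narrowing in practice):

    import copy

    def __deepcopy__(self, memo):
        cls = self.__class__
        copied_op = cls.__new__(cls)
        for attribute, value in self.__dict__.items():
            if attribute == "data":
                try:
                    # attempt to deep copy the parameters...
                    copied_op.data = copy.deepcopy(value, memo)
                except Exception:
                    # ...falling back to sharing them if the interface
                    # forbids it (e.g., Torch)
                    copied_op.data = list(value)
            else:
                copied_op.__dict__[attribute] = copy.deepcopy(value, memo)
        return copied_op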

res2 = copied_tape.execute(dev)

assert np.allclose(res1, res2, atol=tol, rtol=0)
spy.assert_called_once()
A member commented:

Awesome tests 💪


return copied_op

def __deepcopy__(self, memo):
A contributor commented:

Is there an issue/forum post to see more context?

"""
res0 = np.array(results[0])
res1 = np.array(results[1])
return (res0 - res1) / h
A contributor commented:

Would this be 2h rather than h?

A member replied:

No, it should be h here, because the shift is ±h/2 😆

I have to admit, I did a double take, since I'm so used to seeing 2h in the denominator of centered finite differences.
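
For reference: with shifts of ±h/2 the centered difference is (f(x + h/2) - f(x - h/2)) / h, i.e. the familiar (f(x + h) - f(x - h)) / (2h) with h halved. A quick illustrative check (not part of the PR):

    import numpy as np

    f, x, h = np.sin, 0.5, 1e-6
    approx = (f(x + h / 2) - f(x - h / 2)) / h
    assert np.isclose(approx, np.cos(x))  # matches the exact derivative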

josh146 and others added 6 commits October 20, 2020 08:43
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
* add batch_execute functions to device and qubit_device

* black

* backup

* polish

* make tests pass

* add test for batch_execute

* polish

* polish2

* skip tests if interface not imported

* fix test

* Update pennylane/_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Josh review

* backup

* reset fix implemented, still 4 cache tests not passing

* backup

* improve tests

* polish

* Remove tests and update docstrings

* Update pennylane/_qubit_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Update pennylane/_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Update tests/test_device.py

Co-authored-by: Josh Izaac <josh146@gmail.com>

* Update tests/test_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_qubit_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_qubit_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_qubit_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* Update tests/test_device.py

Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>

* improve tests

* add empty tape tests

* newline

Co-authored-by: Josh Izaac <josh146@gmail.com>
Co-authored-by: trbromley <brotho02@gmail.com>
Co-authored-by: Nathan Killoran <co9olguy@users.noreply.github.com>
@josh146 josh146 merged commit 2f2ea15 into master Oct 20, 2020
@josh146 josh146 deleted the gradient_sprint branch October 20, 2020 11:43