
Add support for the parameter-shift hessian to the Torch interface #1129

Merged: 17 commits into master from hessian_torch on Mar 26, 2021

Conversation

@josh146 (Member) commented Mar 9, 2021

Context:

The Torch custom gradient interface currently returns a non-differentiable gradient.

Description of the Change:

Modifies the Torch interface so that the Jacobian itself defines a custom gradient, the vector-Hessian product (VHP), with the Hessian computed by an efficient implementation of the parameter-shift rule.

Torch QNodes can now be doubly differentiated, on both simulators and hardware:

import pennylane as qml
import torch

dev = qml.device('default.qubit', wires=1)

@qml.qnode(dev, interface='torch', diff_method="parameter-shift")
def circuit(p):
    qml.RY(p[0], wires=0)
    qml.RX(p[1], wires=0)
    return qml.probs(0)

x = torch.tensor([1.0, 2.0], requires_grad=True)
jac_fn = lambda x: torch.autograd.functional.jacobian(circuit, x, create_graph=True)
>>> circuit(x)
tensor([0.3876, 0.6124], dtype=torch.float64, grad_fn=<SqueezeBackward0>)
>>> jac_fn(x)
tensor([[ 0.1751, -0.2456],
        [-0.1751,  0.2456]], grad_fn=<ViewBackward>)
>>> torch.autograd.functional.jacobian(jac_fn, x)
tensor([[[ 0.1124,  0.3826],
         [ 0.3826,  0.1124]],
        [[-0.1124, -0.3826],
         [-0.3826, -0.1124]]])
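(Quick sanity check, not part of the PR: assuming the first 2x2 slice above is the Hessian of the first output probability, p0(a, b) = (1 + cos(a) cos(b)) / 2, it matches the analytic result at a = 1, b = 2.)

import numpy as np

a, b = 1.0, 2.0
# analytic Hessian of p0(a, b) = (1 + cos(a) cos(b)) / 2; p1 is its negative
analytic = 0.5 * np.array([
    [-np.cos(a) * np.cos(b), np.sin(a) * np.sin(b)],
    [np.sin(a) * np.sin(b), -np.cos(a) * np.cos(b)],
])
print(np.round(analytic, 4))  # approx. [[0.1124 0.3826] [0.3826 0.1124]]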

In addition, a slight tweak was made to the Jacobian/Hessian evaluation logic in order to avoid redundant Jacobian/Hessian computations.

Benefits:

  • Torch QNodes now support double derivatives on both hardware and simulators

  • The Jacobian/Hessian is now computed only once for a given set of input parameters, and re-used across multiple VJPs/VHPs.

Possible Drawbacks:

  • Third and higher derivatives are not supported. We could potentially support them using recursion.

  • Currently, the parameter-shift evaluations used for the Jacobian are not re-used for the Hessian parameter-shifts. This requires more thought.

  • I realised while writing this PR that the parameter-shift Hessian logic is based on the formula given in https://arxiv.org/abs/2008.06517. However, this formula assumes that all gates support the two-term parameter-shift rule, which is not the case for controlled rotations. As a result, computing the Hessian of controlled rotations will give incorrect results! (A sketch of the two-term formula in use appears after this list.)

  • Long term, we should consider implementing the parameter-shift logic as follows:

    • At a low level, provide functions for computing the VJP and VHP directly. This avoids redundant parameter-shift evaluations.
    • Cache and re-use parameter-shift evaluations at the same parameter values.
    • Use recursion to support arbitrary-order derivatives.
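For concreteness, a minimal standalone sketch of the two-term second-order shift rule referenced above (a hypothetical helper, not PennyLane's implementation; it assumes every parameter obeys the two-term rule, which is exactly the controlled-rotation caveat noted above):

import numpy as np

def param_shift_hessian_entry(f, theta, i, j, s=np.pi / 2):
    """Hypothetical helper: the (i, j) Hessian entry of a scalar cost f,
    assuming every parameter supports the two-term parameter-shift rule."""
    def shifted(si, sj):
        t = np.array(theta, dtype=float)
        t[i] += si
        t[j] += sj
        return f(t)

    # Second-order two-term shift rule (cf. arXiv:2008.06517). For i == j the
    # shifts combine into theta +/- pi and the expression reduces to the usual
    # diagonal formula (f(theta + pi) - 2 f(theta) + f(theta - pi)) / 4.
    num = shifted(s, s) - shifted(s, -s) - shifted(-s, s) + shifted(-s, -s)
    return num / (2 * np.sin(s)) ** 2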

Related GitHub Issues: n/a

@github-actions bot (Contributor) commented Mar 9, 2021

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

Comment on lines 733 to 738
if self.output_dim is not None:
    hessian = np.zeros(
        (len(params), len(params), np.prod(self.output_dim)), dtype=float
    )
else:
    hessian = np.zeros((len(params), len(params)), dtype=float)
Member Author:

A small modification to allow Hessians to be computed for vector-valued QNodes

@codecov bot commented Mar 9, 2021

Codecov Report

Merging #1129 (388a0be) into master (2069021) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #1129   +/-   ##
=======================================
  Coverage   98.11%   98.12%           
=======================================
  Files         144      144           
  Lines       10794    10813   +19     
=======================================
+ Hits        10591    10610   +19     
  Misses        203      203           
Impacted Files Coverage Δ
pennylane/interfaces/torch.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@trbromley (Contributor) left a comment:

Thanks @josh146! I have just left some questions mainly for my understanding, once clarified I'll be happy to approve.

>>> jacobian(circuit, x)
tensor([[ 0.1751, -0.2456],
        [-0.1751,  0.2456]], grad_fn=<ViewBackward>)
>>> hessian(circuit, x)
Contributor:

This is really cool!
So when it's the hessian of a vector output, are we just lining up the individual Hessian matrices? E.g. like here. I guess this is pretty standard.

Member Author:

Yep! With the disclaimer that I'm not entirely sure how PyTorch is ordering the dimensions here.

Member Author:

The output below agrees with TF and Autograd, but we should double-check this.

@@ -2,6 +2,39 @@

<h3>New features since last release</h3>

* Computing second derivatives and Hessians of QNodes is now supported when
  using the PyTorch interface.
Contributor:

So for my understanding, we'd also like to add this for the other interfaces, but Torch is the first one we're doing (maybe because it's easier)?

So this is a continuation of #961?

Member Author:

We've got #1131 (autograd) and #1110 (tf) also open :) Just decided to split it into three to help with code review. But feel free to review the others as well if interested!

Comment on lines 65 to 67
* Takes advantage of closure, to cache computed gradient matrices via
the ctx.saved_grad_matrices attribute, to avoid gradient matrices being
computed multiple redundant times.
Contributor:

Cool!

and *not* the full GradMatrix, differentiating vector-valued
functions will result in multiple backward passes.
"""
if grad_matrix_fn in ctx.saved_grad_matrices:
Contributor:

Why do we make saved_grad_matrices an attribute of ctx rather than standalone? Is ctx.saved_grad_matrices used somewhere behind the scenes?

Member Author:

I tried this; for some reason, without using the context, PyTorch doesn't 'see' the saved_grad_matrices closure variable 🤔 I chalked it up to PyTorch doing something funky with the class behind the scenes.
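For illustration only, a toy torch.autograd.Function (not the PennyLane code) showing the general pattern being discussed: a dictionary is attached to ctx so that a quantity computed during one backward pass is re-used by subsequent backward passes on the same graph.

import torch

class Square(torch.autograd.Function):
    """Toy example: cache a computed matrix on ctx and re-use it."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        ctx.saved_grad_matrices = {}  # closure-like cache attached to ctx
        return x ** 2

    @staticmethod
    def backward(ctx, dy):
        (x,) = ctx.saved_tensors
        if "jacobian" not in ctx.saved_grad_matrices:
            # pretend this is expensive (e.g. parameter-shift evaluations)
            ctx.saved_grad_matrices["jacobian"] = torch.diag(2 * x)
        return dy @ ctx.saved_grad_matrices["jacobian"]

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = Square.apply(x)

# two VJPs against the same forward pass; the Jacobian is only built once
y.backward(torch.tensor([1.0, 0.0]), retain_graph=True)
y.backward(torch.tensor([0.0, 1.0]))
print(x.grad)  # gradients accumulate: tensor([2., 4.])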

return jacobian

@staticmethod
def backward(ctx_, ddy): # pragma: no cover
Contributor:

Why do we turn off coverage?

Member Author:

Even though we have gradient tests for the Torch interface, for some reason pycoverage can't see that the backward() method is called 🤔 so it would otherwise show up as uncovered.

pennylane/tape/interfaces/torch.py (resolved; outdated)
@@ -527,6 +528,149 @@ def circuit():
        assert isinstance(res[0], torch.Tensor)
        assert isinstance(res[1], torch.Tensor)

    def test_hessian(self, dev_name, diff_method, mocker, tol):
        """Test hessian calculation of a scalar valued QNode"""
        if diff_method not in {"parameter-shift", "backprop"}:
Contributor:

So if diff_method="backprop" it still calculates the hessian using param-shift? Or it just uses a backprop hessian?

Member Author:

The torch interface doesn't yet support backprop, but this should be possible soon (since Torch1.8 supports differentiating complex matrix multiplication, see #929).

Note that this class is not parametrized over diff_method="backprop", so that case never occurs here. I left the if statement in as a reminder that backprop should also be tested once #929 is merged.

Do you think we should delete this if statement for now?

Contributor:

Ok, so in this test we'll currently never see diff_method="backprop", but hopefully we will soon? Kind of feels like the case where we should set up a separate test using "backprop" and mark it as a strict xfail.

Member Author:

Can we set the xfail flag inside an if statement?
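(For reference, a sketch of one way to do it, with a hypothetical test name and not taken from this test suite: pytest.xfail() can be called imperatively inside an if statement, although it stops the test immediately and a strict xfail would still require the marker form.)

import pytest

@pytest.mark.parametrize("diff_method", ["parameter-shift", "backprop"])
def test_hessian_backprop_placeholder(diff_method):
    # hypothetical example of applying xfail inside an if statement
    if diff_method == "backprop":
        pytest.xfail("Torch backprop support is pending #929")
    assert diff_method == "parameter-shift"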

Comment on lines 658 to 671
expected_hess = [
[
[-np.cos(a) * np.cos(b), np.sin(a) * np.sin(b)],
[np.sin(a) * np.sin(b), -np.cos(a) * np.cos(b)]
],
[
[-0.5 * np.cos(a) * np.cos(b), 0.5 * np.sin(a) * np.sin(b)],
[0.5 * np.sin(a) * np.sin(b), -0.5 * np.cos(a) * np.cos(b)]
],
[
[0.5 * np.cos(a) * np.cos(b), -0.5 * np.sin(a) * np.sin(b)],
[-0.5 * np.sin(a) * np.sin(b), 0.5 * np.cos(a) * np.cos(b)]
]
]
Contributor:

So the qnode is ragged, but the Hessian is a more standard (3, 2, 2) shape? Is this what PyTorch would expect?
E.g., not something like [hessian1, hessian2] with hessian1 of shape (2, 2) and hessian2 of shape (2, 2, 2)?

I don't mind either way, just wondering.

Member Author:

This is a very good point.

Currently, if the QNode returns 'ragged' output (e.g., expval + probs, or multiple probs on different wires), we flatten the output due to a very old restriction that QNodes always return NumPy arrays. This restriction was made back when we only had an Autograd interface to support, and no probs.

However, it's probably time we updated this:

  • If the QNode return statement returns a tuple, the user should get a tuple as output, not a NumPy array.
  • This avoids the whole issue of ragged arrays altogether

Member Author:

This would actually be a cool thing to sort out now-ish, before usage of return expval, probs etc. becomes entrenched (at the moment, it's quite rare)

@josh146 (Member Author) commented Mar 19, 2021:

I got curious, and coded this up.

TensorFlow

import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="tf")
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0)), qml.probs(wires=[0, 1])

x = tf.Variable(0.5, dtype=tf.float64)

with tf.GradientTape() as tape:
    res = circuit(x)

where:

>>> print(res)
(<tf.Tensor: shape=(), dtype=float64, numpy=0.8775825618903726>, <tf.Tensor: shape=(4,), dtype=float64, numpy=array([0.93879128, 0.        , 0.06120872, 0.        ])>)

Unfortunately, TF does not let you call Jacobian on a tuple :(

Traceback (most recent call last):
  File "test.py", line 54, in <module>
    grad = tape.jacobian(res, x)
  File "/home/josh/miniconda3/envs/37/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py", line 1166, in jacobian
    target_static_shape = target.shape
AttributeError: 'tuple' object has no attribute 'shape'

You can do grad = [tape.jacobian(r, x) for r in res], but it is slower since you lose vectorization.

If your cost function is scalar valued though, everything works well:

with tf.GradientTape() as tape:
    expval, probs = circuit(x)
    res = expval + tf.reduce_sum(probs)

>>> print(res)
tf.Tensor(1.8775825618903725, shape=(), dtype=float64)
>>> grad = tape.jacobian(res, x)
>>> print(grad)
tf.Tensor(-0.479425538604203, shape=(), dtype=float64)

Torch

I think this is fully possible, but I can't get the backward call working. It sounds like it would result in a much more complicated torch backward(grad_output) method, since we can no longer assume that grad_output is flat: it would now have to match the 'shape' of the forward pass's output tuple.
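(To illustrate the tuple-shaped grad_output point, a toy sketch, not the proposed implementation: when a torch.autograd.Function returns a tuple, backward receives one cotangent per output and has to combine them itself.)

import torch

class TwoOutputs(torch.autograd.Function):
    """Toy example: forward returns a tuple, e.g. an expval-like scalar
    and a probs-like vector."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sum(), 2 * x

    @staticmethod
    def backward(ctx, grad_scalar, grad_vector):
        (x,) = ctx.saved_tensors
        # grad_output is no longer flat: one cotangent arrives per output
        return grad_scalar * torch.ones_like(x) + 2 * grad_vector

x = torch.tensor([0.1, 0.2, 0.3], requires_grad=True)
s, v = TwoOutputs.apply(x)
(s + v.sum()).backward()
print(x.grad)  # tensor([3., 3., 3.])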

Contributor:

Thanks @josh146, interesting investigation! Looks like returning a tuple is more expected, but harder to support with the different interfaces. In which case, I'm happy to keep it as is here and maybe we come back to think about this question.

Do we expect this use case to become common, e.g., users regularly requesting return expval, probs? In practice, I've typically requested probs in isolation.

@josh146 (Member Author) commented Mar 22, 2021:

I think returning a tuple can be supported on all interfaces; it just requires a bit more work. But it's definitely doable.

I've typically requested probs in isolation.

agree.

There is one case I am worried about though:

return expval(..), expval(..)

Currently this returns a length-2 numpy array. Under the new suggestion above, this would return a tuple of numpy scalars. I think this is more intuitive from a UI perspective, but would be a very invasive change.

If the user wants an array as output, they could do something like:

return np.array([expval(...), expval(...)])

Contributor:

There is one case I am worried about though

Good point, although I'd worry that making such a change may be annoying since probably most use cases involve treating the output as a numpy array even if it was technically specified as a tuple return.

Member Author:

I'd worry that making such a change may be annoying

It might not be... indexing res[0] will continue to work with a tuple, and NumPy functions such as np.sum(res) also work with tuples. It might... be okay?

Definitely, having the QNode always return a tuple will be better for long-term flexibility.

@trbromley (Contributor) left a comment:

Thanks @josh146! 💯

josh146 and others added 2 commits March 26, 2021 00:07
@albi3ro (Contributor) left a comment:

Very excited to have this functionality!

It passes tests, adds functionality, and has no obvious issues, so I approve.

@josh146 josh146 merged commit ede35f4 into master Mar 26, 2021
@josh146 josh146 deleted the hessian_torch branch March 26, 2021 16:39