### https://pennylane.ai/qml/demos/tutorial_backprop.html

### parameter shift

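the gradient of the expectation value of a measured observable, on a circuit composed of parameterised Pauli rotations, can be computed exactly with the parameter-shift rule. the closed-form expression is a difference of two expectation values of the same circuit, each evaluated with one parameter shifted: for a shift of π/2, ∂f/∂θᵢ = [f(θ + (π/2)eᵢ) − f(θ − (π/2)eᵢ)] / 2, where eᵢ is the unit vector that shifts only parameter i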
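As a quick numerical check of the rule, here is a numpy-only sketch (no PennyLane involved) for a single qubit: apply RX(θ) to |0⟩ and measure ⟨Z⟩, which is cos θ analytically, so the exact derivative is −sin θ. The helper names are hypothetical. The general form (f(θ + s) − f(θ − s)) / (2 sin s) holds for Pauli rotations and reduces to the π/2 version above when s = π/2.

```python
import numpy as np

# Pauli-Z observable
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rx(theta):
    """Matrix of the Pauli rotation RX(theta) = exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

def expval_z(theta):
    """<Z> after applying RX(theta) to |0>; analytically equal to cos(theta)."""
    state = rx(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(np.conj(state) @ Z @ state))

def parameter_shift(f, theta, shift=np.pi / 2):
    """Derivative from two shifted expectation values (exact for Pauli rotations)."""
    return (f(theta + shift) - f(theta - shift)) / (2 * np.sin(shift))

theta = 0.3
print(parameter_shift(expval_z, theta))  # agrees with the analytic derivative below
print(-np.sin(theta))
```

Unlike a finite difference, the shift here is large (π/2) and the result has no truncation error, which is what makes the rule usable with noisy hardware estimates.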

In [2]:
import numpy as np
import pennylane as qml


qml.enable_tape()

In [2]:
np.random.seed(42)
dev = qml.device("default.qubit", wires=3)

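@qml.qnode(dev, diff_method="parameter-shift")
def circuit(params):
    # first layer of single-qubit Pauli rotations
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RZ(params[2], wires=2)

    # CNOTs between neighbouring wires, closing the ring from wire 2 back to wire 0
    qml.broadcast(qml.CNOT, wires=[0, 1, 2], pattern="ring")

    # second layer of rotations
    qml.RX(params[3], wires=0)
    qml.RY(params[4], wires=1)
    qml.RZ(params[5], wires=2)

    qml.broadcast(qml.CNOT, wires=[0, 1, 2], pattern="ring")
    # expectation value of the tensor product Y (wire 0) ⊗ Z (wire 2)
    return qml.expval(qml.PauliY(0) @ qml.PauliZ(2))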

In [5]:
params = np.random.random([6])

print("Parameters:", params)
print("Expectation value:", circuit(params))

Parameters: [0.05808361 0.86617615 0.60111501 0.70807258 0.02058449 0.96990985]
Expectation value: -0.014055527571701393


In [6]:
print(circuit.draw())

 0: ──RX(0.0581)──╭C──────────────────╭X──RX(0.708)──╭C──────╭X──╭┤ ⟨Y ⊗ Z⟩ 
 1: ──RY(0.866)───╰X──╭C──RY(0.0206)──│──────────────╰X──╭C──│───│┤         
 2: ──RZ(0.601)───────╰X──────────────╰C──RZ(0.97)───────╰X──╰C──╰┤ ⟨Y ⊗ Z⟩ 

the ring pattern applies the two-qubit gate to each pair of neighbouring wires, (0, 1) then (1, 2), and closes the ring with a final CNOT from the last wire back to the first, (2, 0)
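The order of gates in the ring pattern can be written out with a small hypothetical helper `ring_pairs` (not part of PennyLane). It is a sketch assuming three or more wires; for `wires=[0, 1, 2]` it gives the same (control, target) order as the CNOTs in the drawing.

```python
def ring_pairs(wires):
    """(control, target) pairs for a 'ring' of two-qubit gates (assumes 3+ wires):
    each wire is paired with the next, and the last wire wraps around to the first."""
    n = len(wires)
    return [(wires[i], wires[(i + 1) % n]) for i in range(n)]

print(ring_pairs([0, 1, 2]))  # [(0, 1), (1, 2), (2, 0)]
```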


In [7]:
grad_function = qml.grad(circuit)
print(grad_function(params)[0])

[-1.11146214e-02 -5.87918653e-04 -5.55111512e-17 -1.50168133e-02
 -6.82724681e-01  2.77555756e-17]


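setting diff_method="parameter-shift" tells the QNode to compute its gradient with the parameter-shift rule (QNodes are differentiable without it; the argument selects the method). the entries of order 1e-17 (for params[2] and params[5]) are zero up to floating-point round-off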

### benchmarking

observing how the cost of parameter-shift scales compared with other differentiation methods

In [8]:
dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev, diff_method="parameter-shift", mutable=False)
def circuit(params):
    qml.templates.StronglyEntanglingLayers(params, wires=[0, 1, 2, 3])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1) @ qml.PauliZ(2) @ qml.PauliZ(3))

mutable=False declares that the circuit structure does not change between evaluations, so the QNode can construct the circuit once and reuse it, which reduces processing overhead

In [9]:
params = qml.init.strong_ent_layers_normal(n_wires=4, n_layers=15)
print(params.size)
print(circuit(params))

180
0.9195547012778201
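The 180 printed above is the number of rotation angles: StronglyEntanglingLayers takes weights of shape (n_layers, n_wires, 3), one general rotation with 3 angles per wire per layer. A numpy-only sketch of the count; the normal draw only mimics the shape of `qml.init.strong_ent_layers_normal`, not its exact distribution parameters.

```python
import numpy as np

n_layers, n_wires = 15, 4
# one 3-angle rotation per wire per layer
weights = np.random.normal(size=(n_layers, n_wires, 3))
print(weights.shape)  # (15, 4, 3)
print(weights.size)   # 180 = 15 * 4 * 3
```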


In [10]:
import timeit

reps = 3
num = 10
times = timeit.repeat("circuit(params)", globals=globals(), number=num, repeat=reps)
forward_time = min(times) / num

print(f"Forward pass (best of {reps}): {forward_time} sec per loop")

Forward pass (best of 3): 0.006087124199996197 sec per loop
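The timing pattern in these cells (best of `reps` repetitions, each calling the function `num` times, then dividing by `num`) can be wrapped in a small helper. This is a sketch with a hypothetical name, `best_time_per_call`; it passes a callable to `timeit.repeat` instead of a string plus `globals=`.

```python
import timeit

def best_time_per_call(fn, number=10, repeat=3):
    """Best-of-`repeat` average time per call of `fn`, in seconds.

    timeit.repeat runs `fn` `number` times per repetition and returns one total
    per repetition; the minimum is the least noisy of those totals, and dividing
    by `number` turns it into a per-call time."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number

# usage with a cheap stand-in workload (hypothetical)
print(best_time_per_call(lambda: sum(range(10_000))))
```

With the notebook's objects this would read `best_time_per_call(lambda: circuit(params), number=num, repeat=reps)`.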


In [11]:
grad_fn = qml.grad(circuit)
circuit.qtape = None

times = timeit.repeat("grad_fn(params)", globals=globals(), number=num, repeat=reps)
backward_time = min(times) / num

print(f"Gradient computation (best of {reps}): {backward_time} sec per loop")

Gradient computation (best of 3): 2.498047869000004 sec per loop


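* parameter-shift requires 2p circuit evaluations for p parameters, whereas reverse-mode automatic differentiation (backpropagation) obtains the full gradient from one forward pass plus one backward pass, costing roughly a constant multiple of a single forward pass
* the tradeoff is increased memory usage, since backpropagation has to store the intermediate results of the forward pass
* intermediate quantum states cannot be stored or inspected on quantum hardware, so backpropagation is only possible on simulators
* for now, on NISQ-era devices we are therefore restricted to hardware-compatible methods such as parameter-shift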
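A back-of-the-envelope check using the timings measured above: with p = 180 parameters, parameter-shift needs 2p = 360 circuit evaluations per gradient, so the gradient should cost roughly 360 forward passes. The estimate ignores any classical overhead of building and processing the shifted circuits, which is an assumption about where the remaining gap comes from.

```python
p = 180                                  # params.size printed above
forward_time = 0.006087124199996197      # measured forward pass (s)
measured_grad_time = 2.498047869000004   # measured parameter-shift gradient (s)

evaluations = 2 * p                      # two shifted circuits per parameter
predicted = evaluations * forward_time
print(evaluations)                               # 360
print(round(predicted, 2))                       # 2.19
print(round(measured_grad_time / forward_time))  # 410 forward passes per gradient
```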

diff_method="backprop" is only supported by simulator devices whose computation is written in a differentiable framework, e.g. default.qubit has backends written in TensorFlow and Autograd

In [3]:
import tensorflow as tf

device = qml.device("default.qubit", wires=4)

In [5]:
@qml.qnode(device, diff_method="backprop", interface="tf")
def circuit(params):
    qml.templates.StronglyEntanglingLayers(params, wires=[0, 1, 2, 3])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1) @ qml.PauliZ(2) @ qml.PauliZ(3))

# initialize circuit parameters
params = qml.init.strong_ent_layers_normal(n_wires=4, n_layers=15)
params = tf.Variable(params)
print(circuit(params))

tf.Tensor(0.9302922703918127, shape=(), dtype=float64)


In [6]:
import timeit

reps = 3
num = 10
times = timeit.repeat("circuit(params)", globals=globals(), number=num, repeat=reps)
forward_time = min(times) / num
print(f"Forward pass (best of {reps}): {forward_time} sec per loop")

Forward pass (best of 3): 0.11127507390000062 sec per loop


In [7]:
with tf.GradientTape(persistent=True) as tape:
    res = circuit(params)

times = timeit.repeat("tape.gradient(res, params)", globals=globals(), number=num, repeat=reps)
backward_time = min(times) / num
print(f"Backward pass (best of {reps}): {backward_time} sec per loop")

Backward pass (best of 3): 0.16511801840000118 sec per loop


TensorFlow adds overhead to the forward pass (about 0.11 s vs 0.006 s per evaluation), but the backward pass is more than an order of magnitude faster than the parameter-shift gradient (about 0.17 s vs 2.5 s)
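The ratios behind that comparison, computed from the four timings printed in the cells above. Note that the backprop "backward pass" was timed on its own (`tape.gradient`), with the forward pass recorded beforehand.

```python
# timings measured in the cells above (seconds per call)
ps_forward, ps_grad = 0.006087124199996197, 2.498047869000004
bp_forward, bp_backward = 0.11127507390000062, 0.16511801840000118

print(round(bp_forward / ps_forward, 1))   # TF forward-pass overhead, ~18x slower
print(round(ps_grad / bp_backward, 1))     # backprop gradient speed-up, ~15x
print(round(bp_backward / bp_forward, 2))  # backward pass alone ~1.5 forward passes
```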

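the timing plots at the end of the linked tutorial compare how both methods scale with the number of parameters: relative to a single forward pass, the cost of a backprop gradient stays roughly constant as p grows, whereas a parameter-shift gradient grows linearly (about 2p forward passes). this makes a huge difference when training larger variational circuits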
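A toy cost model makes the difference concrete, measuring one gradient in units of a single forward pass. The backprop constant k = 2.5 (one forward plus a backward pass of about 1.5 forward passes) is an assumption loosely based on the timings above, not a PennyLane constant, and the function name is hypothetical.

```python
def gradient_cost(p, method, k=2.5):
    """Toy cost of one gradient, in units of a single forward pass.

    parameter-shift: 2 circuit evaluations per parameter, so it grows linearly with p.
    backprop: forward plus backward pass, modelled as an assumed constant k."""
    if method == "parameter-shift":
        return 2 * p
    if method == "backprop":
        return k
    raise ValueError(f"unknown method: {method}")

for p in (12, 180, 1800):
    print(p, gradient_cost(p, "parameter-shift"), gradient_cost(p, "backprop"))
```

Because a forward pass itself gets more expensive as layers (and hence parameters) are added, the absolute parameter-shift cost grows roughly quadratically in p for layered circuits like StronglyEntanglingLayers, while backprop stays roughly proportional to a single forward pass.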