# Quantum gradients with backpropagation [[Link]](https://pennylane.ai/qml/demos/tutorial_backprop.html)

In PennyLane, any quantum device, whether a hardware device or a simulator, can be trained using the [parameter-shift rule](https://pennylane.ai/qml/glossary/parameter_shift.html) to compute quantum gradients.

Indeed, the parameter-shift rule is ideally suited to hardware devices, as it does not require any knowledge about the internal workings of the device; 
it is sufficient to treat the device as a 'black box', and to query it with different input values in order to determine the gradient.

However, when working with a simulator, we do have access to the internal (classical) computations being performed. <br>
This allows us to take advantage of other methods of computing the gradient, such as backpropagation, which may be advantageous in certain regimes. 

In this tutorial, we will compare and contrast the parameter-shift rule against backpropagation, using the PennyLane `default.qubit` device.

## The parameter-shift rule
---

The parameter-shift rule states that, given a variational quantum circuit $U(\boldsymbol{\theta})$ composed of parameterized Pauli rotations, and some measured observable $\hat{B}$, <br>
the derivative of the expectation value $\langle\hat{B}\rangle(\boldsymbol{\theta}) = \langle 0|U(\boldsymbol{\theta})^\dagger\hat{B}U(\boldsymbol{\theta})|0\rangle$ with respect to the circuit parameter $\theta_i$ is given by
$$
\nabla_{\theta_i}\langle\hat{B}\rangle(\boldsymbol{\theta}) 
= \frac{1}{2}\left[\langle\hat{B}\rangle\left(\boldsymbol{\theta}+\frac{\pi}{2}\hat{\mathbf{e}}_i\right)-\langle\hat{B}\rangle\left(\boldsymbol{\theta}-\frac{\pi}{2}\hat{\mathbf{e}}_i\right)\right].
$$
where $\hat{\mathbf{e}}_i$ is the $i$th unit vector, so only $\theta_i$ is shifted. Thus, the gradient of the expectation value can be calculated by evaluating the same variational quantum circuit, but with shifted parameter values (hence the name, parameter-shift rule!).
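As a quick sanity check of the formula, consider a single qubit prepared in $|0\rangle$ and rotated by $R_X(\theta)$, for which $\langle\hat{Z}\rangle(\theta) = \cos\theta$. The minimal sketch below is a hand-written NumPy state-vector calculation (not PennyLane); it evaluates this expectation value at the two shifted points and compares the result with the exact derivative $-\sin\theta$.

```python
import numpy as np

# Pauli-Z observable
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rx(theta):
    """Single-qubit rotation RX(theta) = exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)


def expval_z(theta):
    """<0| RX(theta)^dagger Z RX(theta) |0>, which equals cos(theta)."""
    state = rx(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(np.conj(state) @ Z @ state))


def shift_rule(f, theta):
    """Parameter-shift estimate of df/dtheta using shifts of +/- pi/2."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))


for theta in [0.0, 0.3, 1.2, 2.5]:
    # parameter-shift value next to the exact derivative -sin(theta)
    print(theta, shift_rule(expval_z, theta), -np.sin(theta))
```

The two columns agree up to floating-point error, because $\langle\hat{Z}\rangle$ is a sinusoid in $\theta$, which is exactly the structure the shift rule relies on.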

Let's have a go at implementing the parameter-shift rule manually in PennyLane.

```python
import pennylane as qml
from pennylane import numpy as np
```

```python
# set the random seed
np.random.seed(42)
```

```python
# create a device to execute the circuit on
dev = qml.device("default.qubit", wires=3)

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(params):
    # first layer of single-qubit rotations
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RZ(params[2], wires=2)

    # ring of CNOTs: 0->1, 1->2, 2->0
    qml.broadcast(qml.CNOT, wires=[0, 1, 2], pattern="ring")

    # second layer of single-qubit rotations
    qml.RX(params[3], wires=0)
    qml.RY(params[4], wires=1)
    qml.RZ(params[5], wires=2)

    qml.broadcast(qml.CNOT, wires=[0, 1, 2], pattern="ring")
    return qml.expval(qml.PauliY(0) @ qml.PauliZ(2))
```

Let's test the variational circuit by evaluating it on some random input parameters:

```python
# initial parameters
params = np.random.random([6])

print("Parameters:", params)
print("Expectation value:", circuit(params))
```

```
Parameters: [0.37454012 0.95071431 0.73199394 0.59865848 0.15601864 0.15599452]
Expectation value: -0.11971365706871569
```

Here is a drawing of the executed quantum circuit:

```python
print(circuit.draw())
```

```
 0: ──RX(0.375)──╭C─────────────────╭X──RX(0.599)──╭C──────╭X──╭┤ ⟨Y ⊗ Z⟩ 
 1: ──RY(0.951)──╰X──╭C──RY(0.156)──│──────────────╰X──╭C──│───│┤         
 2: ──RZ(0.732)──────╰X─────────────╰C──RZ(0.156)──────╰X──╰C──╰┤ ⟨Y ⊗ Z⟩ 
```

Now that we have defined our variational circuit QNode, we can construct a function that computes the derivative with respect to the $i$th parameter using the parameter-shift rule.

```python
def parameter_shift_term(qnode, params, i):
    """Derivative of qnode with respect to the ith parameter via the parameter-shift rule."""
    shifted = params.copy()
    shifted[i] += np.pi/2
    forward = qnode(shifted)  # forward evaluation at theta_i + pi/2

    shifted[i] -= np.pi
    backward = qnode(shifted)  # backward evaluation at theta_i - pi/2

    return 0.5 * (forward - backward)
```

```python
# gradient with respect to the first parameter
print(parameter_shift_term(circuit, params, 0))
```

```
-0.06518877224958117
```

In order to compute the gradient w.r.t. *all* parameters, we need to loop over the index `i`:

```python
def parameter_shift(qnode, params):
    gradients = np.zeros([len(params)])

    for i in range(len(params)):
        gradients[i] = parameter_shift_term(qnode, params, i)

    return gradients

print(parameter_shift(circuit, params))
```

```
[-6.51887722e-02 -2.72891905e-02  0.00000000e+00 -9.33934621e-02
 -7.61067572e-01  8.32667268e-17]
```

We can compare this to PennyLane's *built-in* parameter-shift feature by using the `qml.grad` function. 
Remember, when we defined the QNode, we specified that we wanted it to be differentiable using the parameter-shift method (`diff_method="parameter-shift"`).

```python
grad_function = qml.grad(circuit)
print(grad_function(params))
```

```
[-6.51887722e-02 -2.72891905e-02  0.00000000e+00 -9.33934621e-02
 -7.61067572e-01  8.32667268e-17]
```

If you count the number of quantum evaluations, you will notice that <br>
we had to evaluate the circuit `2*len(params)` times in order to compute the quantum gradient with respect to all parameters.
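To make the `2*len(params)` count concrete without a quantum device, the sketch below applies the same shift loop to a hypothetical classical stand-in, `toy_expval`, chosen (purely for illustration) as a product of sines and cosines so that it is sinusoidal in each parameter and the shift rule is exact. The wrapper class `CountingFunction` and the function `shift_gradient` are likewise illustrative names, not PennyLane APIs; the wrapper simply records how many times the function is called.

```python
import numpy as np


class CountingFunction:
    """Wraps a function and counts how many times it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, params):
        self.calls += 1
        return self.fn(params)


def toy_expval(params):
    # Hypothetical stand-in for a circuit expectation value: each factor is
    # sinusoidal in exactly one parameter.
    return np.cos(params[0]) * np.sin(params[1]) * np.cos(params[2])


def shift_gradient(f, params):
    """Parameter-shift gradient: two evaluations of f per parameter."""
    grads = np.zeros(len(params))
    for i in range(len(params)):
        shifted = params.copy()
        shifted[i] += np.pi / 2
        forward = f(shifted)
        shifted[i] -= np.pi
        backward = f(shifted)
        grads[i] = 0.5 * (forward - backward)
    return grads


params = np.array([0.1, 0.7, 1.3])
f = CountingFunction(toy_expval)
grads = shift_gradient(f, params)
print("gradient:", grads)
print("evaluations:", f.calls)  # 2 * len(params) = 6
```

Wrapping a real QNode the same way would count device executions in the same manner, since the shift loop only ever treats the function as a black box.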

While this is reasonably fast for a small number of parameters, <br>
as the number of parameters in our quantum circuit grows, so do both
1. the circuit depth (and thus the time taken to evaluate each expectation value or 'forward' pass), and
2. the number of parameter-shift evaluations required.

Both of these factors increase the time taken to compute the gradient with respect to all parameters.

## Benchmarking
---

Consider an example with a significantly larger number of parameters. <br>
We'll use the `StronglyEntanglingLayers` template to make a more complicated QNode.

```python
dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev, diff_method="parameter-shift", mutable=False)
def circuit(params):
    qml.templates.StronglyEntanglingLayers(params, wires=[0, 1, 2, 3])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1) @ qml.PauliZ(2) @ qml.PauliZ(3))
```
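To see how quickly the cost grows, note that `StronglyEntanglingLayers` applies a general three-parameter rotation to every wire in every layer, so its weights have shape `(n_layers, n_wires, 3)`. Assuming two circuit evaluations per parameter, as in the manual implementation above, the plain-Python sketch below tabulates the number of parameters and parameter-shift evaluations needed for one full gradient on the 4-wire device. The helper names and the layer counts are arbitrary illustrative choices.

```python
def strongly_entangling_param_count(n_layers, n_wires):
    # One three-parameter rotation per wire per layer:
    # weights of shape (n_layers, n_wires, 3)
    return n_layers * n_wires * 3


def parameter_shift_evaluations(n_params):
    # Two shifted circuit evaluations per parameter
    return 2 * n_params


n_wires = 4
for n_layers in [1, 2, 5, 10]:
    n_params = strongly_entangling_param_count(n_layers, n_wires)
    print(f"layers={n_layers:2d}  params={n_params:3d}  "
          f"evaluations={parameter_shift_evaluations(n_params):3d}")
```

Each added layer also makes every one of those evaluations a deeper circuit, so the two growth factors compound.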