# Quantum gradients

What values of $\theta$, $\phi$, and $\omega$ *minimize* the value of the measurement?

<img src="fig/circuit_3.svg" width=400>

The output value is a function of the input values,

\begin{equation}
\langle Z \rangle = f(\theta, \phi, \omega),
\end{equation}

so we can take its gradient with respect to each of the input parameters.

In [None]:
import pennylane as qml
from pennylane import numpy as np

dev = qml.device('default.qubit', wires=3)

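def circuit(theta, phi, omega):
    # One parametrized rotation per qubit
    qml.RX(theta, wires=0)
    qml.RY(phi, wires=1)
    qml.RY(omega, wires=2)

    # Ring of CNOTs that entangles the three qubits
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[2, 0])

    # The measured output: expectation value of Pauli-Z on wire 2
    return qml.expval(qml.PauliZ(wires=2))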

qnode = qml.QNode(circuit, dev)

In [None]:
theta = np.array(0.1)
phi = np.array(0.2)
omega = np.array(0.3)

qml.grad(qnode)(theta, phi, omega)
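
For this particular circuit the gradient can be cross-checked by hand. The following is a hand derivation, not something stated elsewhere in this notebook, and it assumes the gates are read exactly as in the code above. The three CNOTs map the measured $Z$ on wire 2 to the parity operator $Z_0 Z_1 Z_2$ acting on the state right after the rotations, and that state is a product state, so

\begin{equation}
\langle Z \rangle = \cos\theta \, \cos\phi \, \cos\omega,
\qquad
\nabla \langle Z \rangle = -\left( \sin\theta \cos\phi \cos\omega,\; \cos\theta \sin\phi \cos\omega,\; \cos\theta \cos\phi \sin\omega \right).
\end{equation}

At $(\theta, \phi, \omega) = (0.1, 0.2, 0.3)$ this evaluates to approximately $(-0.093, -0.189, -0.288)$.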

## Training a quantum circuit

We can learn the optimal values using gradient descent. We first set up a cost function:

In [None]:
def cost(theta, phi, omega):
    return qnode(theta, phi, omega)

Next we set up an optimizer; PennyLane has a number of [built-in optimizers](https://pennylane.readthedocs.io/en/stable/introduction/optimizers.html).

In [None]:
opt = qml.GradientDescentOptimizer(stepsize=0.1)

Now, we iterate...

In [None]:
n_iter = 100

In [None]:
for _ in range(n_iter):
    theta, phi, omega = opt.step(cost, theta, phi, omega)

In [None]:
cost(theta, phi, omega)

Let's take a closer look at what happened by recording the cost at every step:

In [None]:
import matplotlib.pyplot as plt

costs = []

opt = qml.GradientDescentOptimizer(stepsize=0.1)

theta = np.array(0.1)
phi = np.array(0.2)
omega = np.array(0.3)

for _ in range(n_iter):
    (theta, phi, omega), cost_val = opt.step_and_cost(cost, theta, phi, omega)
    costs.append(cost_val)

In [None]:
plt.plot(costs)
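
To make the update rule explicit, here is a minimal sketch of the same loop in plain Python. It applies the basic gradient-descent update $x \leftarrow x - \eta \nabla f(x)$ with $\eta = 0.1$ for 100 steps, as above. The functions `f` and `grad_f` are hypothetical stand-ins for the QNode and `qml.grad`. They assume the closed form $\langle Z \rangle = \cos\theta \cos\phi \cos\omega$, which is a hand derivation for this circuit and not something the notebook states.

```python
import math

def f(theta, phi, omega):
    # Assumed closed form of the circuit output (stand-in for the QNode)
    return math.cos(theta) * math.cos(phi) * math.cos(omega)

def grad_f(theta, phi, omega):
    # Partial derivatives of the assumed closed form
    return (
        -math.sin(theta) * math.cos(phi) * math.cos(omega),
        -math.cos(theta) * math.sin(phi) * math.cos(omega),
        -math.cos(theta) * math.cos(phi) * math.sin(omega),
    )

stepsize = 0.1
params = (0.1, 0.2, 0.3)
history = [f(*params)]  # cost before any update

for _ in range(100):
    grads = grad_f(*params)
    # Plain gradient-descent update: move each parameter against its gradient
    params = tuple(p - stepsize * g for p, g in zip(params, grads))
    history.append(f(*params))

print(f"initial cost = {history[0]:.4f}, final cost = {history[-1]:.4f}")
```

With this step size the recorded cost never increases from one step to the next, which is the behaviour the plot above is meant to show.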

## Parameter-shift rules

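On hardware, gradients cannot be obtained by differentiating through a classical simulation of the circuit. Instead, they are computed from additional circuit evaluations using methods such as parameter-shift rules, which recover the gradient from expectation values at shifted parameter values: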

\begin{equation}
\frac{\partial f}{\partial \theta} = c [ f(\theta + s) - f(\theta - s) ]
\end{equation}

In [None]:
theta = np.array(0.1)
phi = np.array(0.2)
omega = np.array(0.3)

qml.grad(qnode)(theta, phi, omega)

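Let's try applying the parameter-shift rule to $\theta$ instead, with $s = \pi/4$ and $c = 1/\sqrt{2}$. The result should match the first component of the gradient above: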

In [None]:
c = 1 / np.sqrt(2)
s = np.pi / 4

c * (qnode(theta + s, phi, omega) - qnode(theta - s, phi, omega))
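
The values of `c` and `s` are linked. For a gate generated by a Pauli operator, such as `RX` or `RY`, the standard relation is $c = 1 / (2 \sin s)$, so $s = \pi/4$ gives $c = 1/\sqrt{2}$ and the common choice $s = \pi/2$ gives $c = 1/2$. The sketch below checks this in plain Python. It replaces the QNode with the hypothetical stand-in `f`, assuming the closed form $\langle Z \rangle = \cos\theta \cos\phi \cos\omega$ (a hand derivation for this circuit, not part of the notebook).

```python
import math

def f(theta, phi, omega):
    # Assumed closed form of the circuit output (stand-in for the QNode)
    return math.cos(theta) * math.cos(phi) * math.cos(omega)

def parameter_shift(theta, phi, omega, s):
    # Two-term shift rule for theta, with the prefactor tied to the shift
    c = 1 / (2 * math.sin(s))
    return c * (f(theta + s, phi, omega) - f(theta - s, phi, omega))

theta, phi, omega = 0.1, 0.2, 0.3

# Exact derivative of the assumed closed form with respect to theta
exact = -math.sin(theta) * math.cos(phi) * math.cos(omega)

shift_quarter = parameter_shift(theta, phi, omega, math.pi / 4)
shift_half = parameter_shift(theta, phi, omega, math.pi / 2)

print(exact, shift_quarter, shift_half)
```

Both shifts reproduce the exact derivative up to floating-point error. Unlike a finite-difference estimate, the shift does not need to be small.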

It is straightforward in PennyLane to select the gradient method when constructing a QNode:

In [None]:
qnode = qml.QNode(circuit, dev, diff_method='parameter-shift')

qml.grad(qnode)(theta, phi, omega)

## Quantum-aware optimizers

Gradient descent is a general-purpose optimization technique. Other methods are tailored to quantum circuits: they exploit the structure of the circuit, are gradient-free, or make use of information from the parameter-shift rules.

Examples:
 
 - Quantum natural gradient [(``QNGOptimizer``)](https://pennylane.readthedocs.io/en/stable/code/api/pennylane.QNGOptimizer.html)
 - [``RotosolveOptimizer``](https://pennylane.readthedocs.io/en/stable/code/api/pennylane.RotosolveOptimizer.html) / [``RotoselectOptimizer``](https://pennylane.readthedocs.io/en/stable/code/api/pennylane.RotoselectOptimizer.html)
 - [``ShotAdaptiveOptimizer``](https://pennylane.readthedocs.io/en/stable/code/api/pennylane.ShotAdaptiveOptimizer.html)
 

Let's explore the `ShotAdaptiveOptimizer` (a.k.a. the Rosalin optimizer). It is useful in a hardware setting, where it is beneficial to take as few shots and measurements as possible. A running average of the gradient and of the *variance* of the gradient is stored, and this information is used to create a "shot budget" that distributes shots over the different parts of the cost function.

Suppose we want to minimize a cost function involving a linear combination of expectation values:

\begin{equation}
C(\theta, \phi, \omega) = 2 \langle X_1 \rangle + 4 \langle Z_1 \rangle - \langle X_0 X_2 \rangle
\end{equation}


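(Evaluating each term at its extreme value, with the first two expectation values equal to -1 and the third equal to 1, gives a lower bound of -7. That bound cannot be reached, because $X_1$ and $Z_1$ do not commute and so cannot both have expectation value -1 in the same state. The smallest eigenvalue of the corresponding Hamiltonian is $-(1 + 2\sqrt{5}) \approx -5.47$.)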
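
The lowest value this cost can take over all three-qubit states is the smallest eigenvalue of the corresponding Hamiltonian, which can be checked numerically. A minimal sketch, assuming plain NumPy and the tensor-factor ordering wire 0, wire 1, wire 2 (the helper `kron3` is hypothetical):

```python
import numpy as np

# Single-qubit identity and Pauli matrices
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron3(a, b, c):
    # Operator on three qubits, ordered as wire 0, wire 1, wire 2
    return np.kron(np.kron(a, b), c)

# H = 2 X_1 + 4 Z_1 - X_0 X_2
H_matrix = 2 * kron3(I2, X, I2) + 4 * kron3(I2, Z, I2) - kron3(X, I2, X)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
min_eig = np.linalg.eigvalsh(H_matrix)[0]
print(min_eig)
```

The printed value should agree with $-(1 + 2\sqrt{5}) \approx -5.47$. Whether the circuit used below can actually reach it depends on the ansatz.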

In [None]:
coeffs = [2, 4, -1]
obs = [
    qml.PauliX(1),
    qml.PauliZ(1),
    qml.PauliX(0) @ qml.PauliX(2)
]

H = qml.Hamiltonian(coeffs, obs)

In [None]:
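def circuit(params, wires):
    # Same ansatz as before, with the three angles packed into one array.
    # ExpvalCost expects the ansatz to accept a `wires` argument (unused here).
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RY(params[2], wires=2)

    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[2, 0])

# With a finite number of shots, every expectation value is a statistical estimate
dev = qml.device('default.qubit', wires=3, shots=1000)

# Cost function: expectation value of H on the state prepared by the circuit
cost = qml.ExpvalCost(circuit, H, dev)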

In [None]:
cost([theta, phi, omega])

The shot-adaptive optimizer will allocate shots in proportion to the magnitude of the coefficients in the cost function.
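
The idea of distributing a shot budget according to the coefficients can be sketched in plain Python. This illustrates the principle only and is not PennyLane's internal implementation. The function `allocate_shots`, the budget of 700 shots, and the term labels are all hypothetical. Each shot is assigned to a term at random, with probability proportional to the absolute value of that term's coefficient.

```python
import random
from collections import Counter

coeffs = [2, 4, -1]
terms = ["X1", "Z1", "X0 X2"]  # labels only, for printing

def allocate_shots(coeffs, total_shots, rng):
    # Sampling weights are the coefficient magnitudes: here 2/7, 4/7 and 1/7
    weights = [abs(c) for c in coeffs]
    draws = rng.choices(range(len(coeffs)), weights=weights, k=total_shots)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(len(coeffs))]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
shots_per_term = allocate_shots(coeffs, 700, rng)

for label, n in zip(terms, shots_per_term):
    print(f"{label}: {n} shots")
```

On average the three terms receive 200, 400 and 100 of the 700 shots, so the term with the largest coefficient is estimated most precisely.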

In [None]:
params = np.array([theta, phi, omega]).copy()

opt = qml.ShotAdaptiveOptimizer(min_shots=10)

In [None]:
for i in range(50):
    params = opt.step(cost, params)
    print(f"Step {i}: cost = {cost(params):.2f},\t shots_used = {opt.shots_used},\t total_shots_used = {opt.total_shots_used}")