<a href="https://colab.research.google.com/github/MatteoRobbiati/notebooks/blob/main/QTI-QML-tutorial/QML_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# QTI-TH Forum: a snapshot of Quantum Machine Learning

In the first part of the notebook we see how to concretely implement the QML ingredients using `qibo`. We need:

1. A parametric model $\mathcal{M}$;
2. A way to embed input data $x$ into $\mathcal{M}$;
3. A predictor for estimating the output $y$;
4. A loss function $\mathcal{J}$;
5. An optimizer $\mathcal{O}$.

In [None]:
# install qibo
!pip install qibo

In [None]:
# import qibo's packages
import qibo
from qibo import gates, hamiltonians, derivative
from qibo.models import Circuit

# some useful Python packages
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid', font_scale=1.5)

# to interact with the operating system
import os

# numpy backend is enough for a 1-qubit model
qibo.set_backend('numpy')

### 1. Model $\mathcal{M}$: a variational quantum circuit

We are going to use a variational quantum circuit as our Machine Learning model.
To do this, we treat the parameters of the circuit's gates as the variational parameters to be optimized during training.

In [None]:
nqubits = 1
layers = 2

c = Circuit(nqubits)
for q in range(nqubits):
  # a Hadamard gate at the beginning
  c.add(gates.H(q=q))
  # and a sequence of rotation layers as model
  for l in range(layers):
    c.add(gates.RY(q=q, theta=0))
    c.add(gates.RY(q=q, theta=0))
    c.add(gates.RZ(q=q, theta=0))
  c.add(gates.M(0))

print(c.summary())

### 2. Embedding of the input data $x$ into $\mathcal{M}$

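This specific sequence of gates (a Hadamard followed by RY, RY, RZ in each layer) is what we need to implement the following data re-uploading ansatz, in which $x$ enters the first rotation of every layer: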

<center><img src="https://github.com/MatteoRobbiati/notebooks/blob/main/QTI-QML-tutorial/images/re-uploading.png?raw=true" alt="drawing" width="800"/></center>


In [None]:
def inject_data(circuit, parameters, x):
  """Prototype function for the embedding of x."""
  params = []
  index = 0
  
  for q in range(nqubits):
    for l in range(layers):
      # embed x into the first rotation of the layer
      params.append(parameters[index] * x)
      params.append(parameters[index + 1])
      params.append(parameters[index + 2])
      index += 3

  circuit.set_parameters(params)
  return circuit
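
For a single qubit, the same mapping from parameters to gate angles can be written in vectorized form. The sketch below is only an illustration, assuming three angles per layer with the first one multiplied by $x$; `embed_angles` is a hypothetical helper and not part of `qibo`.

```python
import numpy as np

def embed_angles(parameters, x):
    """Return the gate angles: the first parameter of each layer is scaled by x.

    Hypothetical helper, assuming three parameters per layer (RY, RY, RZ).
    """
    # np.array makes a copy, so the variational parameters are left untouched
    angles = np.array(parameters, dtype=float)
    # indices 0, 3, 6, ... are the data-dependent rotations
    angles[0::3] *= x
    return angles

theta = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # two layers
angles = embed_angles(theta, x=0.5)
print(angles)
```

The resulting array could then be passed to `circuit.set_parameters`, exactly as the loop-based version does.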

In [None]:
# setting new parameters
nparams = len(c.get_parameters())
initial_parameters = np.random.randn(nparams) * 5

### 3. Predictor:  $\hat{y} = \langle \hat{O} \rangle$

Applying the circuit $C(x, \theta)$ to the initial state $|q_i\rangle \equiv |0\rangle$, we obtain the final state of the qubit, $|q_f\rangle$. In this example, we use the expectation value of a non-interacting Pauli $Z$ observable on $|q_f\rangle$ as the predictor of $y$.

In [None]:
# define a Hamiltonian
# a Pauli-Z
h = hamiltonians.Z(nqubits)

# a dummy value for x
x = 0.2

# load the parameters into the circuit together with x
c = inject_data(c, initial_parameters, x)

# evaluating E[O]
h.expectation(c.execute(nshots=1000).state())
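
To make the predictor less of a black box, here is a minimal sketch of what the expectation value of $Z$ means for a 1-qubit state vector: the probability of measuring $|0\rangle$ minus the probability of measuring $|1\rangle$. It uses plain `numpy` only; `expectation_z` is a hypothetical helper written for illustration, not a `qibo` function.

```python
import numpy as np

def expectation_z(state):
    """Expectation value of Pauli Z on a normalized 1-qubit state vector."""
    state = np.asarray(state, dtype=complex)
    # Z has eigenvalue +1 on |0> and -1 on |1>
    probs = np.abs(state) ** 2
    return float(probs[0] - probs[1])

zero = np.array([1, 0])               # |0>
plus = np.array([1, 1]) / np.sqrt(2)  # H|0>

print(expectation_z(zero), expectation_z(plus))
```

The predictions are therefore bounded in $[-1, 1]$, which is worth keeping in mind when choosing the target function.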

### 4. MSE loss function: $\mathcal{J}_{mse}$

We will quantify the quality of our model using the mean squared error:

$$ \mathcal{J}_{mse} = \frac{1}{N_{data}} \sum_{i=1}^{N_{data}}\bigl(y_{meas,i} - y_{est,i}\bigr)^2,$$

where $y_{est,i}$ is the expected value of $Z$ over the final 1-qubit state once $C(x_i, \theta)$ has been applied:

$$ y_{est,i} = \langle q_f | Z  | q_f \rangle  = \langle 0 | C(x_i,\theta)^{\dagger} Z C(x_i,\theta) | 0 \rangle.  $$
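
As a quick illustration of the formula, the loss can be sketched in a few lines of `numpy`. The function name `mse_loss` and the sample values are assumptions made for this example only.

```python
import numpy as np

def mse_loss(y_meas, y_est):
    """Mean squared error between measured labels and estimated predictions."""
    y_meas = np.asarray(y_meas, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return float(np.mean((y_meas - y_est) ** 2))

# made-up labels and predictions, just to exercise the function
loss = mse_loss([1.0, 0.0, -1.0], [0.5, 0.0, -0.5])
print(loss)
```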

---


### 5. A gradient descent optimization!

We want to perform a gradient descent optimization. To do this, we need the gradient of the loss function with respect to the variational parameters. We will use the **Parameter Shift Rule** (PSR), a formula that gives these derivatives exactly by evaluating the same circuit twice, each time with a suitably shifted value of the target parameter.

---

#### Quick explanation
Let $f(\mu)$ be the expected value of our observable $Z$ over the final state $|q_f\rangle \equiv C(\theta)|q_i\rangle$, obtained by applying a circuit $C(\theta)$ to an initial state $|q_i\rangle$, where $\mu \in \theta$ is one of the circuit's parameters:

$$ f(\mu) = \langle q_i | C(\theta)^{\dagger} Z C(\theta)| q_i\rangle. $$

In short, if certain conditions are satisfied (see [arXiv:1811.11184](https://arxiv.org/abs/1811.11184)), we can evaluate the derivative of $f(\mu)$ as follows:

$$ \partial_{\mu} f = r\bigl[ f(\mu^+) - f(\mu^-) \bigr],$$

where $\mu^{\pm}$ are two specific shifted values of $\mu$ and $r$ is a constant that depends on the gate; both are defined in the reference above.
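
The rule can be checked numerically on the simplest possible case. The sketch below assumes a single $RY(\theta)$ rotation applied to $|0\rangle$, for which $f(\theta) = \cos\theta$, and uses the values that hold for Pauli rotation gates in the reference: $r = 1/2$ and $\mu^{\pm} = \mu \pm \pi/(4r)$. It relies on `numpy` only; the helper names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Matrix of the RY(theta) rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])

def f(theta):
    """Expectation value of Z after applying RY(theta) to |0>."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ Z @ state)

def psr_derivative(theta, r=0.5):
    """Parameter-shift derivative, assuming r = 1/2 (Pauli rotation gate)."""
    shift = np.pi / (4 * r)
    return r * (f(theta + shift) - f(theta - shift))

theta = 0.7
# the two numbers should agree: d/dtheta cos(theta) = -sin(theta)
print(psr_derivative(theta), -np.sin(theta))
```

Unlike a finite-difference estimate, the shift here is large ($\pi/2$) and the result is exact up to floating-point error.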

---

### 5.1 Last step: derivative of $\mathcal{J}$

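Since we need the derivative of the loss function, we follow this procedure for each data point $x_i$:

- inject $x_i$ into the circuit, obtaining $C(x_i, \theta)$;
- calculate the prediction $f$;
- for each $\mu \in \theta$, calculate $f(\mu^{\pm})$;
- use the PSR to calculate each $\partial_{\mu}f$;
- combine each $\partial_{\mu}f$ with the difference between prediction and label, via the chain rule, to obtain $\partial_{\mu}\mathcal{J}$.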
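
Written out, the derivative of the loss follows from the chain rule applied to the definition of $\mathcal{J}_{mse}$ given earlier (a short derivation using the same symbols, with $y_{est,i} = f$ evaluated at $x_i$):

$$ \partial_{\mu}\mathcal{J}_{mse} = \frac{2}{N_{data}} \sum_{i=1}^{N_{data}} \bigl(y_{est,i} - y_{meas,i}\bigr)\, \partial_{\mu} y_{est,i}. $$

When a parameter enters a gate as the product $\mu x_i$, as in the re-uploading embedding, one more application of the chain rule multiplies the PSR result for the gate angle by $x_i$.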

## Let's put it all together in a variational quantum regressor

In [None]:
class vqregressor:

  def __init__(self, data, labels, layers, nqubits=1):
    """Class constructor."""
    # some general features of the QML model
    self.nqubits = nqubits
    self.layers = layers
    self.data = data
    self.labels = labels

    # initialize the circuit and extract the number of parameters
    self.circuit = self.ansatz(nqubits, layers)
    print(self.circuit.draw())

    # get the number of parameters
    self.nparams = len(self.circuit.get_parameters())
    # set the initial value of the variational parameters
    self.params = np.random.randn(self.nparams)
    # scaling factor for custom parameter shift rule
    self.scale_factors = np.ones(self.nparams)

    # define the observable
    self.h = hamiltonians.Z(nqubits)

# ---------------------------- ANSATZ ------------------------------------------

  def ansatz(self, nqubits, layers):
    """Here we implement the variational model ansatz."""
    c = Circuit(nqubits)
    for q in range(nqubits):
      c.add(gates.H(q=q))
      for l in range(layers):
        c.add(gates.RY(q=q, theta=0))
        c.add(gates.RY(q=q, theta=0))
        c.add(gates.RZ(q=q, theta=0))
    c.add(gates.M(0))

    return c

# --------------------------- RE-UPLOADING -------------------------------------

  def inject_data(self, x):
    """Here we combine x and params in order to perform re-uploading."""
    params = []
    index = 0
    
    for q in range(self.nqubits):
      for l in range(self.layers):
        # embed X
        params.append(self.params[index] * x)
        params.append(self.params[index + 1])
        params.append(self.params[index + 2])
        # update scale factors 
        # equal to x only when x is involved
        self.scale_factors[index] = x
        # we have three parameters per layer
        index += 3

    # update circuit's parameters
    self.circuit.set_parameters(params)


# ------------------------------- PREDICTIONS ----------------------------------

  def one_prediction(self, x):
    """This function calculates one prediction with fixed x."""
    self.inject_data(x)

    return self.h.expectation(self.circuit.execute().state())


  def predict_sample(self):
    """This function returns all predictions."""
    predictions = []
    for x in self.data:
      predictions.append(self.one_prediction(x))

    return predictions


# ------------------------ PERFORMING GRADIENT DESCENT -------------------------


  def circuit_derivative(self):
    """Derivatives of the expected value of the target observable with respect 
    to the variational parameters of the circuit are performed via parameter-shift
    rule (PSR)."""
    dcirc = np.zeros(self.nparams)   
    
    for par in range(self.nparams):
      # read qibo documentation for more information about this PSR implementation
      dcirc[par] = qibo.derivative.parameter_shift(
          circuit = self.circuit, 
          hamiltonian = self.h, 
          parameter_index = par, 
          scale_factor = self.scale_factors[par]
          )
    
    return dcirc


  def evaluate_loss_gradients(self):
    """This function calculates the derivative of the loss function with respect
    to the variational parameters of the model."""

    # we need the derivative of the loss
    # nparams-long vector
    dloss = np.zeros(self.nparams)
    # we also keep track of the loss value
    loss = 0

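    # loop over the whole sample
    for x, y in zip(self.data, self.labels):
      # calculate prediction
      prediction = self.one_prediction(x)
      # residual between prediction and label
      residual = prediction - y
      # accumulate the squared error
      loss += residual**2
      # derivative of E[O] with respect to all thetas
      dcirc = self.circuit_derivative()
      # accumulate the gradient of the summed squared error
      # (not divided by the sample size, unlike the returned loss)
      dloss += 2 * residual * dcirc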

    return dloss, loss/len(self.data)
  

  def gradient_descent(self, learning_rate, epochs):
    """This function performs a full gradient descent strategy."""

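    # we create a folder (no error if it already exists)
    os.makedirs("./live-plotting", exist_ok=True)
    # we remove any old png files left in it
    for fname in os.listdir("./live-plotting"):
      if fname.endswith(".png"):
        os.remove(os.path.join("./live-plotting", fname))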

    # we want to keep track of the loss function
    loss_history = []

    # the gradient descent strategy
    for epoch in range(epochs):
      dloss, loss = self.evaluate_loss_gradients()
      loss_history.append(loss)
      self.params -= learning_rate * dloss
      print(f'Loss at epoch: {epoch + 1} ', loss)

      self.show_predictions(f'Epoch {epoch +1}', save=True)
    
    return loss_history


# ---------------------- PLOTTING FUNCTION -------------------------------------

  def show_predictions(self, title, save=False):
    """This function shows the obtained results through a scatter plot."""

    # calculate prediction
    predictions = self.predict_sample()

    # draw the results
    plt.figure(figsize=(12,8))
    plt.title(title)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.scatter(self.data, self.labels, color='orange', alpha=0.6, label='Original', s=70, marker='o')
    plt.scatter(self.data, predictions, color='purple', alpha=0.6, label='Predictions', s=70, marker='o')

    plt.legend()

    # we save all the images during the training in order to see the evolution
    if save:
      plt.savefig(f'./live-plotting/{title}.png')
      plt.close()
    else:
      plt.show()

### Generate a sample

We are going to fit a very simple function:

$$ y = \sin(2x).$$

In [None]:
ndata = 30
# random data
data = np.random.uniform(-1, 1, ndata)
# labeling them
labels = np.sin(2*data)

In [None]:
# initialize the QML algorithm
VQR = vqregressor(layers=1, data=data, labels=labels)

In [None]:
# show initial (WRONG) predictions
VQR.show_predictions('Without training')

In [None]:
# set the training hyper-parameters
epochs = 50
learning_rate = 0.025

# perform the training
history = VQR.gradient_descent(learning_rate=learning_rate, epochs=epochs)

In [None]:
# showing loss history
plt.figure(figsize=(10,6))
plt.title('Loss history')
plt.xlabel('Epoch')
plt.ylabel('Loss value')
plt.plot(history, lw=3, c='purple', alpha=0.7)
plt.show()

In [None]:
# final results
VQR.show_predictions('After training')

### Let's visualize the training with a gif

In [None]:
from PIL import Image

images = []

for epoch in range(epochs):
  images.append(Image.open("./live-plotting/Epoch " + str(epoch + 1) + ".png"))

first_image = images[0]
# append only the remaining frames, so that the first one is not duplicated
first_image.save("./training.gif", format="GIF", append_images=images[1:],
               save_all=True, duration=100, loop=0)

## A real gradient descent on hardware

<center><img src="https://github.com/MatteoRobbiati/notebooks/blob/main/QTI-QML-tutorial/images/on-hdw.png?raw=true" alt="drawing" width="800"/></center>

Reference: [arXiv:2210.10787](https://arxiv.org/abs/2210.10787).