# Tutorial 3 - Variational classifier

In Tutorial 3 we show how to use OpenQML to implement variational quantum classifiers: quantum circuits that can be trained from labelled data to classify new data samples. The architecture is inspired by Farhi & Neven (2018, arXiv:1802.06002) as well as Schuld, Bocharov, Wiebe and Svore (2018, arXiv:1804.00633).

We will first show that the variational quantum classifier can reproduce the parity function

$$ f: \{0,1\}^{n} \rightarrow \{0,1\}, \quad f(x) = \begin{cases} 1 & \text{if } x \text{ contains an odd number of ones,} \\ 0 & \text{otherwise.} \end{cases} $$
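To make the definition concrete, the sketch below enumerates all $2^n$ bitstrings and labels each one by its parity. The helper `make_parity_data` is hypothetical, and it is only an assumption that the `parity.txt` file loaded later has this layout (the bits in the first columns, the label in the last column).

```python
import itertools

import numpy as np


def make_parity_data(n):
    """Hypothetical helper: every n-bit string followed by its parity label.

    Returns an array of shape (2**n, n + 1); the last column is
    f(x) = 1 if x contains an odd number of ones, else 0.
    """
    rows = []
    for bits in itertools.product([0, 1], repeat=n):
        label = sum(bits) % 2  # 1 for an odd number of ones
        rows.append(list(bits) + [label])
    return np.array(rows, dtype=float)


data = make_parity_data(4)
print(data[:5])
```

The rows come out in lexicographic order (0000, 0001, 0010, ...), which is the same order as the samples printed in the optimization section, where the labels are additionally shifted from $\{0,1\}$ to $\{-1,1\}$.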

This example demonstrates how to encode binary inputs into the initial state of the variational circuit, which is simply a computational basis state.

We then show how to encode real vectors as amplitude vectors (*amplitude encoding*) and train the model to recognise the first two classes of flowers in the Iris dataset.

## 1. Fitting the parity function 

### Imports

```python
import openqml as qm
from openqml import numpy as onp
import numpy as np
from openqml.optimize import AdagradOptimizer
```

### Quantum function

We create a quantum device with four "wires" or qubits.

```python
dev = qm.device('default.qubit', wires=4)
```

Variational classifiers usually define a "layer" or "block", which is an elementary circuit architecture that gets repeated to build the variational circuit.

Our circuit layer consists of an arbitrary rotation on every qubit, as well as CNOTs that entangle each qubit with its neighbour.

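```python
def layer(W):
    """One layer: an arbitrary rotation on every qubit, then a ring of CNOTs."""

    # Arbitrary single-qubit rotation on each wire, with angles W[i, 0:3]
    qm.Rot(W[0, 0], W[0, 1], W[0, 2], [0])
    qm.Rot(W[1, 0], W[1, 1], W[1, 2], [1])
    qm.Rot(W[2, 0], W[2, 1], W[2, 2], [2])
    qm.Rot(W[3, 0], W[3, 1], W[3, 2], [3])

    # Entangle neighbouring qubits; CNOT(3, 0) closes the ring
    qm.CNOT([0, 1])
    qm.CNOT([1, 2])
    qm.CNOT([2, 3])
    qm.CNOT([3, 0])
```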
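To see what one layer does as a matrix, here is a plain-NumPy sketch that builds the $16 \times 16$ unitary of `layer(W)`. It assumes that `Rot(φ, θ, ω)` means $R_Z(\omega) R_Y(\theta) R_Z(\phi)$ (the convention documented by PennyLane, OpenQML's successor) and that wire 0 is the most significant (leftmost) bit; all function names here are hypothetical and not part of OpenQML.

```python
import functools

import numpy as np


def rz(a):
    return np.array([[np.exp(-0.5j * a), 0], [0, np.exp(0.5j * a)]])


def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rot(phi, theta, omega):
    # Assumed convention: Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)
    return rz(omega) @ ry(theta) @ rz(phi)


def cnot_matrix(control, target, n):
    # Permutation matrix that flips the target bit whenever the control bit is 1.
    # Wire 0 is taken to be the most significant (leftmost) bit.
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for i in range(dim):
        bits = [(i >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control] == 1:
            bits[target] ^= 1
        j = int("".join(str(b) for b in bits), 2)
        U[j, i] = 1.0
    return U


def layer_unitary(W, n=4):
    # Rotations on different wires act in parallel: Rot_0 (x) Rot_1 (x) ... (x) Rot_{n-1}
    U = functools.reduce(np.kron, [rot(*W[k]) for k in range(n)])
    # Then the CNOT ring in the same order as layer(W): (0, 1), (1, 2), ..., (n-1, 0)
    for control, target in [(k, (k + 1) % n) for k in range(n)]:
        U = cnot_matrix(control, target, n) @ U
    return U


rng = np.random.default_rng(seed=42)
U = layer_unitary(rng.uniform(0, 2 * np.pi, size=(4, 3)))
print(np.allclose(U.conj().T @ U, np.eye(16)))  # the layer is unitary
```

With all angles set to zero, only the CNOT ring remains; it maps $|1000\rangle$ to $|0111\rangle$, which shows how a single flipped bit spreads through the ring.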

We also need a way to encode data inputs $x$ into the circuit, so that the measured output depends on the inputs. In this first example, the inputs are bitstrings, which we encode into the state of the qubits. The quantum state $|\psi \rangle$ after state preparation is the computational basis state that has ones exactly where $x$ has ones, for example

$$ x = 0101 \rightarrow |\psi \rangle = |0101 \rangle . $$
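As a sketch of what this encoding produces, the snippet below builds the 16-dimensional basis vector for a bitstring in plain NumPy. It assumes that `bits[0]` belongs to wire 0 and is read as the leftmost (most significant) bit; `basis_state` is a hypothetical helper, not OpenQML's `BasisState`.

```python
import numpy as np


def basis_state(bits):
    """Computational basis state |bits>, reading bits[0] as wire 0 (leftmost bit, assumed)."""
    index = int("".join(str(int(b)) for b in bits), 2)
    psi = np.zeros(2 ** len(bits))
    psi[index] = 1.0
    return psi


psi = basis_state([0, 1, 0, 1])
print(np.flatnonzero(psi))  # [5], since 0101 in binary is 5
```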

We use the `BasisState` function, which expects `x` to be a list of zeros and ones, e.g. `[0, 1, 0, 1]`.

*Note: OpenQML wraps the arguments of a quantum node, so their values cannot be inspected inside the quantum function. For example, the following loop, which branches on the value of `x[i]`, will NOT work:*

    for i in range(len(x)):
        if x[i] == 1:
            qm.PauliX([i])


```python
def statepreparation(x):

    qm.BasisState(x, wires=[0, 1, 2, 3])
```

Now we can define the quantum node as a state preparation routine, followed by a repetition of the layer structure. Borrowing from machine learning, we call the parameters `weights`.

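```python
@qm.qnode(dev)
def circuit(weights, x=None):

    # Encode the input x into the initial state of the qubits
    statepreparation(x)

    # Apply one layer per entry of weights (each entry has shape (4, 3))
    for W in weights:
        layer(W)

    # Expectation value of Pauli-Z on wire 0, a number in [-1, 1]
    return qm.expval.PauliZ(0)
```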
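The quantum node returns the expectation value of Pauli-Z on wire 0. The plain-NumPy sketch below computes that quantity from a statevector: each basis state contributes its probability times $+1$ if the qubit on wire 0 is $0$ and $-1$ if it is $1$. Wire 0 is again assumed to be the leftmost bit, and `expval_z` and `basis` are hypothetical helpers.

```python
import numpy as np


def expval_z(state, wire, n):
    """<Z> on `wire` for an n-qubit statevector; wire 0 is the leftmost bit (assumed)."""
    probs = np.abs(np.asarray(state)) ** 2
    value = 0.0
    for index, p in enumerate(probs):
        bit = (index >> (n - 1 - wire)) & 1
        value += p * (1 - 2 * bit)  # eigenvalue +1 for bit 0, -1 for bit 1
    return value


def basis(index, n):
    psi = np.zeros(2 ** n)
    psi[index] = 1.0
    return psi


print(expval_z(basis(0b0101, 4), wire=0, n=4))  # 1.0: the first bit of 0101 is 0
print(expval_z(basis(0b1000, 4), wire=0, n=4))  # -1.0: the first bit of 1000 is 1
```

This is why the output of the classifier lies in $[-1, 1]$, matching the labels after they are shifted to $\{-1, 1\}$.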

Unlike in previous tutorials, the quantum node takes the data as a keyword argument `x` (with the default value `None`). Keyword arguments of a quantum node are treated as fixed when computing gradients, so they are never trained.

If we want to add a "classical" bias parameter, the variational quantum classifier also needs some post-processing. We define the final model as a classical function that uses the first variable as the bias and feeds the remaining variables into the quantum node. Before this, we reshape the list of remaining variables into the layer structure expected by the quantum node.

```python
def variational_classifier(vars, x=None, shape=None):

    # vars[0] is the classical bias; the rest are reshaped into the circuit weights
    weights = onp.reshape(vars[1:], shape)
    outp = circuit(weights, x=x)

    return outp + vars[0]
```
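The bookkeeping between the flat variable vector and the layer structure can be checked with plain NumPy. This sketch uses the same sizes as the optimization section (4 qubits, 2 layers, 3 angles per `Rot`); the name `vars_demo` is hypothetical and only avoids shadowing Python's built-in `vars`.

```python
import numpy as np

num_qubits = 4
num_layers = 2
shp = (num_layers, num_qubits, 3)

rng = np.random.default_rng(seed=0)
vars_demo = 0.01 * rng.standard_normal(num_qubits * 3 * num_layers + 1)

bias = vars_demo[0]                       # classical bias added to the circuit output
weights = np.reshape(vars_demo[1:], shp)  # weights[l, q] = Rot angles of qubit q in layer l

print(len(vars_demo), weights.shape)  # 25 (2, 4, 3)
```

Because NumPy reshapes in row-major order, the three angles of qubit `q` in layer `l` are the consecutive entries `vars_demo[1 + 12*l + 3*q : 4 + 12*l + 3*q]`.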

### Objective

The objective or cost in supervised learning is often the sum of a loss and a regularizer. Here we use only the standard square loss, which measures the distance between target labels and model predictions.

```python
def square_loss(labels, predictions):

    loss = 0
    for l, p in zip(labels, predictions):
        loss += (l-p)**2
    loss = loss/len(labels)

    return loss
```
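For reference, the same mean square loss can be written in vectorized NumPy form and checked on a few made-up numbers. This is only an illustration: the tutorial keeps the loop form, and whether the vectorized form stays differentiable when `predictions` is a list of values wrapped by `openqml.numpy` is not verified here.

```python
import numpy as np


def square_loss_vectorized(labels, predictions):
    """Mean squared difference between labels and predictions."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean((labels - predictions) ** 2)


# (0.5**2 + 0.5**2 + 0**2) / 3 = 1/6
loss = square_loss_vectorized([1, -1, 1], [0.5, -0.5, 1.0])
print(loss)
```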

To monitor how many inputs the current classifier predicted correctly, we also define the accuracy given target labels and model predictions. 

```python
def accuracy(labels, predictions):

    acc = 0
    for l, p in zip(labels, predictions):
        if abs(l-p) < 1e-5:
            acc += 1
    acc = acc/len(labels)

    return acc
```
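In the training loop, the continuous outputs are turned into class predictions by taking their sign before the accuracy is computed. A small sketch with made-up outputs shows the effect:

```python
import numpy as np

outputs = np.array([0.3, -0.2, 0.9, -0.7])  # made-up classifier outputs
labels = np.array([1, 1, 1, -1])

predictions = np.sign(outputs)              # [ 1, -1,  1, -1]
acc = np.mean(np.abs(labels - predictions) < 1e-5)
print(acc)  # 0.75: three of the four samples are classified correctly
```

Note that `np.sign(0.0)` is `0`, so an output of exactly zero never counts as correct for labels in $\{-1, 1\}$.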

For learning tasks, the cost depends on the data: here, the features and labels of the samples considered in the current iteration of the optimization routine.

```python
def cost(weights, features, labels, shape=None):

    predictions = [variational_classifier(weights, x=f, shape=shape) for f in features]

    return square_loss(labels, predictions)
```
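Since the cost only sees the samples it is given, it can be evaluated on a random minibatch instead of the whole data set. The sketch below draws indices with replacement (as `np.random.randint` does in the training loop) and evaluates a mean square loss on them. `toy_model` is a hypothetical classical stand-in for `variational_classifier`, and the data are random placeholders, not the contents of `parity.txt`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Placeholder data: 16 samples with 4 binary features and labels in {-1, 1}
X = rng.integers(0, 2, size=(16, 4)).astype(float)
Y = rng.choice([-1.0, 1.0], size=16)


def toy_model(w, x):
    """Hypothetical stand-in for variational_classifier: output in [-1, 1]."""
    return np.tanh(np.dot(w, x))


def batch_cost(w, X, Y, batch_size):
    # Draw indices with replacement, so a sample can appear twice in one batch
    idx = rng.integers(0, len(X), size=batch_size)
    preds = np.array([toy_model(w, x) for x in X[idx]])
    return np.mean((Y[idx] - preds) ** 2), idx


# With all-zero weights every prediction is 0, so the cost is mean(Y**2) = 1
cost_value, idx = batch_cost(np.zeros(4), X, Y, batch_size=5)
print(cost_value, idx)
```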

### Optimization

Let us load and preprocess some data. 

```python
data = np.loadtxt("parity.txt")
X = data[:, :-1]
Y = data[:, -1]
Y = Y*2 - np.ones(len(Y))  # shift label from {0, 1} to {-1, 1}

for i in range(5):
    print('X = {}, Y = {: d}'.format(X[i], int(Y[i])))
print('...')
```

```
X = [0. 0. 0. 0.], Y = -1
X = [0. 0. 0. 1.], Y =  1
X = [0. 0. 1. 0.], Y =  1
X = [0. 0. 1. 1.], Y = -1
X = [0. 1. 0. 0.], Y =  1
...
```


Initialize the variables randomly. The first variable in the list is used as a bias, while the rest is fed into the gates of the variational circuit.

```python
num_qubits = 4
num_layers = 2
vars_init = 0.01*np.random.randn(num_qubits*3*num_layers+1)
shp = (num_layers, num_qubits, 3)

vars_init
```

```
array([-0.00905118, -0.02874512, -0.00481697, -0.00214992,  0.00365692,
       -0.0042674 ,  0.01175576, -0.02070308,  0.0030833 ,  0.00140951,
       -0.00352826, -0.01086024, -0.00698275, -0.00913977, -0.02521236,
       -0.00460899, -0.01045835, -0.00162891, -0.0111162 , -0.00258407,
       -0.00739962, -0.00221939, -0.006418  ,  0.0116421 ,  0.01146174])
```

Create an optimizer and choose a batch size...

```python
o = AdagradOptimizer(0.5)
batch_size = 5
```
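Roughly, Adagrad divides each parameter's gradient step by the square root of that parameter's accumulated squared gradients, so the effective step size shrinks as training proceeds and adapts per coordinate. The sketch below illustrates this textbook update rule on a toy quadratic; it is not the source of OpenQML's `AdagradOptimizer`, whose details (for example the value of `eps`) may differ.

```python
import numpy as np


def adagrad_minimize(grad_fn, x0, stepsize=0.5, steps=500, eps=1e-8):
    """Plain Adagrad: scale each coordinate's step by its accumulated squared gradients."""
    x = np.array(x0, dtype=float)
    accum = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        accum += g ** 2
        x = x - stepsize * g / np.sqrt(accum + eps)
    return x


# Minimize f(x) = (x0 - 3)^2 + 10 * (x1 + 1)^2, whose gradient is known in closed form
def grad(x):
    return np.array([2 * (x[0] - 3), 20 * (x[1] + 1)])


x_min = adagrad_minimize(grad, [0.0, 0.0])
print(x_min)
```

Although the two coordinates have very different curvatures, the first step moves each of them by the same amount (`stepsize`), because each gradient is divided by its own magnitude.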

...and train the classifier. We track the accuracy, i.e. the share of correctly classified data samples. For this we compute the outputs of the variational classifier and turn them into predictions in $\{-1,1\}$ by taking the sign of the output.

```python
vars = vars_init
for it in range(5):

    # Update the variables by one optimizer step on a random minibatch
    batch_index = np.random.randint(0, len(X), (batch_size, ))
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]
    vars = o.step(lambda v: cost(v, X_batch, Y_batch, shape=shp), vars)

    # Compute accuracy on the full data set
    predictions = [np.sign(variational_classifier(vars, x=x, shape=shp)) for x in X]
    acc = accuracy(Y, predictions)

    print("Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} "
          "".format(it+1, cost(vars, X, Y, shape=shp), acc))
```

```
Iter:     1 | Cost: 1.5312400 | Accuracy: 0.3750000 
Iter:     2 | Cost: 1.1661283 | Accuracy: 0.5000000 
Iter:     3 | Cost: 0.7626209 | Accuracy: 0.8125000 
Iter:     4 | Cost: 0.4389153 | Accuracy: 1.0000000 
Iter:     5 | Cost: 0.0476631 | Accuracy: 1.0000000 
```

The exact numbers vary from run to run, since both the initial variables and the minibatches are drawn at random.


## 2. Iris classification

To encode real-valued vectors into the amplitudes of a quantum state, we simply redefine the `statepreparation` function. Since we use four qubits, the quantum state has $2^4 = 16$ amplitudes; we encode the four-dimensional vector $x$ into the amplitudes of the first two qubits and again into those of the last two qubits.

To avoid simulating a lengthy state preparation routine, we use a shortcut: we compute $x \otimes x$ manually and set the amplitude vector to this value. *Note: Since each $x$ is normalized when the data is loaded below, $x \otimes x$ is also normalized, because $\| x \otimes x \| = \| x \|^2 = 1$.*

```python
def statepreparation(x):

    qm.QubitStateVector(onp.kron(x, x), wires=[0, 1, 2, 3])
```
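A quick NumPy check of this shortcut: for a unit vector with four entries, `np.kron(x, x)` has 16 entries and unit norm, and entry `4*i + j` equals `x[i] * x[j]`. The feature values below are made up and not taken from `iris_scaled.txt`.

```python
import numpy as np

x = np.array([0.4, 0.1, 0.7, 0.2])  # made-up features, not from iris_scaled.txt
x = x / np.linalg.norm(x)           # unit norm, as in the data loading step

psi = np.kron(x, x)                 # 16 amplitudes for 4 qubits
print(psi.shape, np.linalg.norm(psi))
```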

We then load the Iris data set.

```python
data = np.loadtxt("iris_scaled.txt")
X = data[:, :-1]
normalization = np.sqrt(np.sum(X ** 2, -1))
X = (X.T / normalization).T  # normalize each feature vector
Y = data[:, -1]
Y = Y*2 - np.ones(len(Y))  # shift label from {0, 1} to {-1, 1}

for i in range(5):
    print("X = {}, Y = {}".format(X[i], Y[i]))
print("...")
```

We split the data into a training set (75% of the samples) and a validation set.

```python
num_data = len(X)
num_train = int(0.75*num_data)
index = np.random.permutation(num_data)
X_train = X[index[:num_train]]
Y_train = Y[index[:num_train]]
X_val = X[index[num_train:]]
Y_val = Y[index[num_train:]]
```
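Splitting through a single random permutation guarantees that the two sets are disjoint and together cover every sample. The sketch below checks this for 100 samples, assuming the file holds the first two Iris classes (50 samples each); it uses NumPy's `default_rng` instead of the legacy `np.random` functions only to make the check reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
num_data = 100  # assumed: the first two Iris classes, 50 samples each
num_train = int(0.75 * num_data)

index = rng.permutation(num_data)
train_idx, val_idx = index[:num_train], index[num_train:]
print(len(train_idx), len(val_idx))  # 75 25
```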

Again we optimize the cost.

```python
from openqml.optimize import GradientDescentOptimizer

o = GradientDescentOptimizer(0.01)
batch_size = 3

# Start from the random initial variables of the parity example
weights = np.array(vars_init)
for iteration in range(1):

    # Update the weights by one optimizer step on a random minibatch
    batch_index = np.random.randint(0, num_train, (batch_size, ))
    X_train_batch = X_train[batch_index]
    Y_train_batch = Y_train[batch_index]
    weights = o.step(lambda w: cost(w, X_train_batch, Y_train_batch, shape=shp), weights)

    # Compute predictions on train and validation set
    predictions_train = [np.sign(variational_classifier(weights, x=x, shape=shp)) for x in X_train]
    predictions_val = [np.sign(variational_classifier(weights, x=x, shape=shp)) for x in X_val]

    # Compute accuracy on train and validation set
    acc_train = accuracy(Y_train, predictions_train)
    acc_val = accuracy(Y_val, predictions_val)

    print("Iter: {:5d} | Cost: {:0.7f} | Acc train: {:0.7f} | Acc validation: {:0.7f} "
          "".format(iteration+1, cost(weights, X, Y, shape=shp), acc_train, acc_val))
```

We can plot the continuous output of the variational classifier for the first two dimensions of the Iris data set.

```python
import matplotlib.pyplot as plt

plt.figure()
cm = plt.cm.RdBu

# make data for decision regions
xx, yy = np.meshgrid(np.linspace(-1.1, 1.1, 20), np.linspace(-1.1, 1.1, 20))
X_grid = [np.array([x, y]) for x, y in zip(xx.flatten(), yy.flatten())]
predictions_grid = [variational_classifier(weights, x=x, shape=shp) for x in X_grid]
Z = np.reshape(predictions_grid, xx.shape)

# plot decision regions
cnt = plt.contourf(xx, yy, Z, levels=np.arange(-1, 1.1, 0.1), cmap=cm, alpha=.8, extend='both')
plt.colorbar(cnt, ticks=[-1, 0, 1])

# plot data: training points in red, validation points in green,
# triangles for class 1 and circles for class -1
trf0 = [d for i, d in enumerate(X_train) if Y_train[i] == -1]
trf1 = [d for i, d in enumerate(X_train) if Y_train[i] == 1]
plt.scatter([c[0] for c in trf1], [c[1] for c in trf1], c='r', marker='^', edgecolors='k')
plt.scatter([c[0] for c in trf0], [c[1] for c in trf0], c='r', marker='o', edgecolors='k')
tes0 = [d for i, d in enumerate(X_val) if Y_val[i] == -1]
tes1 = [d for i, d in enumerate(X_val) if Y_val[i] == 1]
plt.scatter([c[0] for c in tes1], [c[1] for c in tes1], c='g', marker='^', edgecolors='k')
plt.scatter([c[0] for c in tes0], [c[1] for c in tes0], c='g', marker='o', edgecolors='k')

plt.xlim(-1, 1)
plt.ylim(-1, 1)
plt.show()
```

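The cost used above contains only the square loss. A regularizer, for example an L2 penalty on the weights, could be added to it:

```python
def regularizer(weights):
    """L2 regularizer penalty on weights

    Args:
        weights (array[float]): The array of trainable weights
    Returns:
        float: regularization penalty
    """
    w_flat = weights.flatten()

    # Squared L2 norm of the flattened weights
    reg = onp.inner(w_flat, w_flat)

    return reg
```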
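As a sketch of how such a penalty would enter the objective, the plain-NumPy snippet below adds `lam` times the squared L2 norm of the weights to the mean square loss. The strength `lam` and the function names are hypothetical; the tutorial itself does not fix a regularization strength.

```python
import numpy as np


def l2_penalty(weights):
    w_flat = np.ravel(weights)
    return np.inner(w_flat, w_flat)  # squared L2 norm


def regularized_cost(labels, predictions, weights, lam=0.01):
    loss = np.mean((np.asarray(labels) - np.asarray(predictions)) ** 2)
    return loss + lam * l2_penalty(weights)


weights = np.array([[0.1, -0.2], [0.3, 0.0]])
# loss = (0.25 + 0.25) / 2 = 0.25; penalty = 0.01 + 0.04 + 0.09 = 0.14
value = regularized_cost([1.0, -1.0], [0.5, -0.5], weights, lam=0.1)
print(value)
```

A larger `lam` pulls the trained weights more strongly towards zero, trading training accuracy for smaller rotation angles.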