# Variational classifier with PennyLane

This notebook follows a [Xanadu tutorial](https://pennylane.ai/qml/demos/tutorial_variational_classifier.html) for the PennyLane library, which implements a variational quantum classifier. Here we will train a quantum circuit on labelled data and then use it to classify new data.

It can be shown that the variational quantum classifier can reproduce the parity function

$$f : x\in \{0,1\} ^{\otimes n} \rightarrow y = \left\{\begin{array}{lll}
1& \text{if uneven number of ones in } x\\
0 &\text{otherwise}
\end{array} \right.$$
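
As a classical reference, the parity function above can be written in a couple of lines. This is only a sketch for checking labels by hand; the name `parity` is introduced here for illustration and is not part of the tutorial.

```python
def parity(bits):
    """Return 1 if `bits` contains an odd number of ones, otherwise 0."""
    return sum(bits) % 2


# A few bitstrings and their parity labels
for bits in ([0, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 1], [1, 1, 1, 0]):
    print(bits, "->", parity(bits))
```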

I think the parity function is only one possible choice of target: the same construction could be trained on a different labelling function, depending on what we would like to classify.

The tutorial also shows how to encode real vectors as amplitude vectors (this is called amplitude encoding), which makes it possible to train the model to recognize the first two classes of flowers in the Iris dataset.

--------

We begin by importing the libraries

In [1]:
import pennylane as qml
from pennylane import numpy as np
from pennylane.optimize import NesterovMomentumOptimizer

In [2]:
# We now initialize a simulator device with 4 qubits
dev = qml.device("default.qubit", wires = 4)

Now we have to define our layer, or block. This is an elementary circuit architecture that is repeated to build the variational circuit. These layers can be modified according to what we need. The layer proposed in this guide consists of an arbitrary rotation on each qubit, followed by CNOT gates that entangle each qubit with its neighbour.

In [14]:
def layer(W):
    # qml.Rot(phi, theta, omega, wires): arbitrary single-qubit rotation
    qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
    qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
    qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
    qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)

    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[2, 3])
    qml.CNOT(wires=[3, 0])
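
The entangling pattern in the layer is a ring: each qubit controls a CNOT on the next one, and the last qubit wraps around to the first. The sketch below, in plain Python, lists those pairs and counts the rotation angles for a given number of layers. The helper names `ring_cnot_pairs` and `num_rotation_angles` are hypothetical and only illustrate the structure; they are not PennyLane functions.

```python
def ring_cnot_pairs(num_qubits):
    """Return (control, target) pairs linking each qubit to the next, wrapping around."""
    return [(i, (i + 1) % num_qubits) for i in range(num_qubits)]


def num_rotation_angles(num_layers, num_qubits):
    """Each layer applies one Rot gate (3 angles) per qubit."""
    return num_layers * num_qubits * 3


print(ring_cnot_pairs(4))         # [(0, 1), (1, 2), (2, 3), (3, 0)]
print(num_rotation_angles(2, 4))  # 24
```

Written this way, the same layer could be generalized to any number of qubits by looping over the pairs instead of writing each gate by hand.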

Now we need a way to encode the data inputs $x$ into the circuit, that is, we have to define a quantum state of the qubits that represents the input $x$. In this case the inputs are bitstrings, so they directly define a computational basis state:
$$ x = 0101 \rightarrow |\psi\rangle = |0101\rangle $$
PennyLane provides an operation called `BasisState`, which prepares n wires in the basis state given by a list of zeros and ones.

In [15]:
def statePreparation(x):
    #x is a np.array, for example [0,0,1,1]
    #the function prepares the wires in the state |0011>
    
    qml.BasisState(x, wires=[0, 1, 2, 3])
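
To make the encoding concrete, the following plain-Python sketch builds the state vector that corresponds to a bitstring. It assumes the convention that wire 0 is the most significant bit of the basis-state index, and the helper names are hypothetical, introduced only for this illustration.

```python
def basis_state_index(bits):
    """Index of |bits> in the state vector, reading the first bit as most significant."""
    index = 0
    for b in bits:
        index = 2 * index + int(b)
    return index


def basis_state_vector(bits):
    """State vector with a single 1 at the position of |bits> (assumed bit ordering)."""
    vec = [0.0] * (2 ** len(bits))
    vec[basis_state_index(bits)] = 1.0
    return vec


print(basis_state_index([0, 1, 0, 1]))   # 5
print(basis_state_vector([0, 1, 0, 1]))  # 16 entries, all zero except position 5
```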

Now a QNode is defined as the state-preparation routine followed by repeated applications of the layer structure. Given a set of weights (the analogue of the $\theta$ parameters), the layer is applied once for each weight matrix `W` in `weights`.

In [16]:
@qml.qnode(dev)
def circuit(weights, x):
    
    
    statePreparation(x)
    
    for W in weights:
        layer(W)
        
    return qml.expval(qml.PauliZ(0))
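
The circuit returns the expectation value of Pauli-Z on wire 0, a number between -1 and 1: outcomes where the first qubit is 0 contribute +1 and outcomes where it is 1 contribute -1. The sketch below computes that quantity from a list of basis-state probabilities. It is a classical illustration under the assumption that wire 0 is the most significant bit of the index; `expval_z_first_wire` is a hypothetical helper, not a PennyLane function.

```python
def expval_z_first_wire(probs, num_qubits):
    """<Z> on the first wire, given probabilities over computational basis states."""
    total = 0.0
    for index, p in enumerate(probs):
        # Extract the most significant bit of the index (assumed to be wire 0)
        first_bit = (index >> (num_qubits - 1)) & 1
        total += p * (1.0 if first_bit == 0 else -1.0)
    return total


# Probability 1 on |0101> (index 5): the first qubit is 0, so <Z> = +1
probs = [0.0] * 16
probs[5] = 1.0
print(expval_z_first_wire(probs, 4))  # 1.0
```

This range is the reason the labels are later shifted from {0, 1} to {-1, 1}.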

In this case the data `x` is passed to the QNode together with the weights. The data is held fixed when calculating a gradient (the input arrays are created with `requires_grad=False` when they are loaded), which means that the training is done over the weights only.

We can also add a "classical" bias: a trainable scalar that is added to the output of the circuit, in the same way that a bias term is added in a classical linear model or neural-network neuron. It should not be confused with the statistical bias of an estimator. With this we can build the classifier: `var` holds the variables we need, namely the weights and the bias, and we put everything together with the circuit to generate the classification.

In [17]:
def variational_classifier(var, x):
    
    weights = var[0]
    bias = var[1]
    return circuit(weights, x) + bias


## Cost

Now we define the cost function. As the loss we use the mean squared error between the predictions and the true labels of the data points, that is, the residual sum of squares (RSS) divided by the number of samples. First we calculate the square loss between the true labels and the predictions:

In [18]:
def square_loss(labels, predictions):
    loss = 0
    for l, p in zip(labels, predictions):
        loss = loss + (l - p) ** 2

    loss = loss / len(labels)
    return loss
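
A small worked example may help to see what this loss computes. The sketch below is an equivalent, more compact formulation in plain Python; the name `mean_square_loss` and the sample numbers are made up for illustration.

```python
def mean_square_loss(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    return sum((l - p) ** 2 for l, p in zip(labels, predictions)) / len(labels)


labels = [1, -1, 1, -1]
predictions = [0.5, -1.0, 1.0, 0.0]

# Squared errors: 0.25, 0.0, 0.0, 1.0 -> mean = 1.25 / 4
print(mean_square_loss(labels, predictions))  # 0.3125
```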

Now we can define the accuracy function, which gives the fraction of correctly predicted labels (a value between 0 and 1):


In [19]:
def accuracy(labels, predictions):

    loss = 0
    for l, p in zip(labels, predictions):
        if abs(l - p) < 1e-5:
            loss = loss + 1
    loss = loss / len(labels)

    return loss
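
Since the classifier outputs a continuous value, the predictions have to be mapped to -1 or 1 (by taking the sign) before they are compared with the labels. The following plain-Python sketch shows this on a few invented outputs; the helper `sign` stands in for `np.sign`, and the numbers are illustrative only.

```python
def sign(value):
    """Return 1, -1 or 0 depending on the sign of `value`."""
    return 1 if value > 0 else -1 if value < 0 else 0


def accuracy(labels, predictions, tol=1e-5):
    """Fraction of predictions that match their label within `tol`."""
    correct = sum(1 for l, p in zip(labels, predictions) if abs(l - p) < tol)
    return correct / len(labels)


raw_outputs = [0.8, -0.3, -0.1, 0.6]  # made-up classifier outputs
labels = [1, -1, 1, 1]
predictions = [sign(v) for v in raw_outputs]  # [1, -1, -1, 1]

print(accuracy(labels, predictions))  # 0.75
```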
    

Now we calculate the cost function as the square loss between the real labels and the predictions from the algorithm

In [20]:
def cost(var, X, Y):
    predictions = [variational_classifier(var, x) for x in X]
    return square_loss(Y, predictions)


## Optimization

Here we load the data from the file `variational_classifier/data/parity.txt`. The data has five columns: the first four contain the state, that is, the bitstring $x$ that we defined earlier, and the fifth column is $1$ if $x$ has an odd number of ones and $0$ otherwise.

In [21]:
data = np.loadtxt("variational_classifier/data/parity.txt")
X = np.array(data[:, :-1], requires_grad=False)
Y = np.array(data[:, -1], requires_grad=False)
Y = Y * 2 - np.ones(len(Y))  # shift label from {0, 1} to {-1, 1}

for i in range(5):
    print("X = {}, Y = {: d}".format(X[i], int(Y[i])))

print("...")

X = [0. 0. 0. 0.], Y = -1
X = [0. 0. 0. 1.], Y =  1
X = [0. 0. 1. 0.], Y =  1
X = [0. 0. 1. 1.], Y = -1
X = [0. 1. 0. 0.], Y =  1
...
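
If the data file is not at hand, a dataset with the same shape can be generated directly. The sketch below assumes that the data consists of every 4-bit string together with its parity, which is consistent with the rows printed above but is not confirmed by the source; `parity_dataset` is a hypothetical helper.

```python
from itertools import product


def parity_dataset(num_bits):
    """All bitstrings of length `num_bits` with parity labels shifted to {-1, 1}."""
    X, Y = [], []
    for bits in product([0, 1], repeat=num_bits):
        X.append(list(bits))
        label = sum(bits) % 2    # 1 for an odd number of ones, else 0
        Y.append(2 * label - 1)  # shift label from {0, 1} to {-1, 1}
    return X, Y


X_demo, Y_demo = parity_dataset(4)
for x, y in zip(X_demo[:5], Y_demo[:5]):
    print("X = {}, Y = {: d}".format(x, y))
```

The label shift `2 * label - 1` is the same transformation as `Y * 2 - np.ones(len(Y))` in the cell above.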


Now we initialize the variables in order to test the algorithm. `var_init` is a tuple: its first element holds the weights, drawn at random with a small scale (0.01) and with shape `(num_layers, num_qubits, 3)`, and its second element is the bias, initialized to 0.0.

In [22]:
np.random.seed(0)
num_qubits = 4
num_layers = 2
var_init = (0.01 * np.random.randn(num_layers, num_qubits, 3), 0.0)

print(var_init)

(tensor([[[ 0.01764052,  0.00400157,  0.00978738],
         [ 0.02240893,  0.01867558, -0.00977278],
         [ 0.00950088, -0.00151357, -0.00103219],
         [ 0.00410599,  0.00144044,  0.01454274]],

        [[ 0.00761038,  0.00121675,  0.00443863],
         [ 0.00333674,  0.01494079, -0.00205158],
         [ 0.00313068, -0.00854096, -0.0255299 ],
         [ 0.00653619,  0.00864436, -0.00742165]]], requires_grad=True), 0.0)


The next step is to create the optimizer, which PennyLane provides as [`NesterovMomentumOptimizer`](https://pennylane.readthedocs.io/en/stable/code/api/pennylane.NesterovMomentumOptimizer.html#pennylane.NesterovMomentumOptimizer). This class implements gradient descent with what is called Nesterov momentum. Then we define `batch_size`, the number of data points used in each training step, and train the model.

In [23]:
opt = NesterovMomentumOptimizer(0.5)
batch_size = 5
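
To give an idea of what Nesterov momentum does, here is a minimal plain-Python sketch of the update rule on a one-dimensional function. It is not PennyLane's implementation: the function `nesterov_minimize`, the step size and the momentum value are assumptions chosen for this illustration. The key point is that the gradient is evaluated at a "look-ahead" position, shifted by the accumulated velocity.

```python
def nesterov_minimize(grad, x0, stepsize=0.1, momentum=0.9, steps=200):
    """Minimize a 1D function with gradient descent plus Nesterov momentum."""
    x = x0
    velocity = 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at x itself
        lookahead = x - momentum * velocity
        velocity = momentum * velocity + stepsize * grad(lookahead)
        x = x - velocity
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3
x_min = nesterov_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # close to 3
```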

During training we also track the accuracy, that is, the fraction of correctly classified data points. At each step we update the variables on a random batch and then compute the predictions for the whole dataset by taking the sign of the classifier's output:

In [26]:
var = var_init

for it in range(35):

    # Sample batch_size random row indices (with replacement)
    batch_index = np.random.randint(0, len(X), (batch_size,))
    # Inputs of the batch, each of the form [0, 1, 0, 0]
    X_batch = X[batch_index]

    # Labels corresponding to the batch inputs
    Y_batch = Y[batch_index]

    # Take one optimizer step on the batch cost and update the variables
    var = opt.step(lambda v: cost(v, X_batch, Y_batch), var)

    # Compute the accuracy on the full dataset
    predictions = [np.sign(variational_classifier(var, x)) for x in X]
    acc = accuracy(Y, predictions)

    print(
        "Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} ".format(
            it + 1, cost(var, X, Y), acc
        )
    )
        

Iter:     1 | Cost: 1.9978697 | Accuracy: 0.5000000 
Iter:     2 | Cost: 2.6040860 | Accuracy: 0.5000000 
Iter:     3 | Cost: 1.9514070 | Accuracy: 0.5000000 
Iter:     4 | Cost: 1.2102827 | Accuracy: 0.5000000 
Iter:     5 | Cost: 0.6132715 | Accuracy: 0.8750000 
Iter:     6 | Cost: 0.5385945 | Accuracy: 0.7500000 
Iter:     7 | Cost: 0.1777492 | Accuracy: 1.0000000 
Iter:     8 | Cost: 0.0799840 | Accuracy: 1.0000000 
Iter:     9 | Cost: 0.0952246 | Accuracy: 1.0000000 
Iter:    10 | Cost: 0.0450807 | Accuracy: 1.0000000 
Iter:    11 | Cost: 0.0380732 | Accuracy: 1.0000000 
Iter:    12 | Cost: 0.0224965 | Accuracy: 1.0000000 
Iter:    13 | Cost: 0.0261320 | Accuracy: 1.0000000 
Iter:    14 | Cost: 0.0124017 | Accuracy: 1.0000000 
Iter:    15 | Cost: 0.0289288 | Accuracy: 1.0000000 
Iter:    16 | Cost: 0.0104481 | Accuracy: 1.0000000 
Iter:    17 | Cost: 0.0148988 | Accuracy: 1.0000000 
Iter:    18 | Cost: 0.0103778 | Accuracy: 1.0000000 
Iter:    19 | Cost: 0.0066370 | Accuracy: 1.00

As you can see, the cost function, which measures the difference between the predictions and the true labels, decreases overall along the iterations (with some fluctuations from step to step), showing that the algorithm is making better predictions over time. We can also see the accuracy increasing until it reaches 1, which shows again that the algorithm is working.