# Variational classifier 

## Fitting the parity function

In [1]:
import pennylane as qml
from pennylane import numpy as np
from pennylane.optimize import NesterovMomentumOptimizer

Declare a simulator device with 4 qubits (wires).

In [2]:
dev = qml.device("default.qubit", wires=4)

### Variational circuit
Now we can define a function for the generic layer, the building block of the classifier circuit. In the following, $W$ is the weight matrix of a single layer: each row corresponds to a qubit, and each column to one of the three rotation angles of that qubit's `Rot` gate.

In [3]:
def layer(W):
    # One general single-qubit rotation per qubit; row i of W holds its three angles
    qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
    qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
    qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
    qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)

    # Entangle neighbouring qubits with a closed ring of CNOTs
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[2, 3])
    qml.CNOT(wires=[3, 0])
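
To make the role of the three angles in each row of $W$ concrete, here is a minimal pure-Python sketch of the single-qubit matrix behind one `Rot` gate. It assumes the decomposition given in PennyLane's documentation, $\mathrm{Rot}(\phi, \theta, \omega) = R_Z(\omega)\,R_Y(\theta)\,R_Z(\phi)$. The helper names (`rz`, `ry`, `matmul2`, `rot`, `z_after_rot`) are illustrative and not part of PennyLane.

```python
import cmath
import math


def rz(angle):
    """2x2 matrix of a Z rotation: diag(exp(-i a/2), exp(+i a/2))."""
    return [[cmath.exp(-0.5j * angle), 0j],
            [0j, cmath.exp(0.5j * angle)]]


def ry(angle):
    """2x2 matrix of a Y rotation."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s],
            [s, c]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rot(phi, theta, omega):
    """Assumed decomposition: RZ(phi) acts first, then RY(theta), then RZ(omega)."""
    return matmul2(rz(omega), matmul2(ry(theta), rz(phi)))


def z_after_rot(phi, theta, omega):
    """<Z> after applying the rotation to |0>: |amp0|^2 - |amp1|^2."""
    u = rot(phi, theta, omega)
    amp0, amp1 = u[0][0], u[1][0]  # first column = image of |0>
    return abs(amp0) ** 2 - abs(amp1) ** 2
```

Applied to $|0\rangle$, only the middle angle changes the $Z$ expectation value, which equals $\cos\theta$; the two outer angles contribute phases only.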
    

### State preparation
State preparation is trivial in this example, because we only need to prepare computational basis states from bitstrings. For this purpose, we can use PennyLane's `BasisState` operation.

In [4]:
def statepreparation(x):
    qml.BasisState(x, wires=[0, 1, 2, 3])
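
As a rough illustration of what preparing a bitstring means at the state-vector level, the sketch below maps a bitstring to a one-hot vector of length $2^4$. It assumes the ordering in which wire 0 is the leftmost (most significant) bit; `basis_state_vector` is a hypothetical helper, not a PennyLane function.

```python
def basis_state_vector(bits):
    """Return (index, state) for the computational basis state |bits>.

    Assumption: the first entry of `bits` (wire 0) is the most significant bit.
    """
    index = int("".join(str(int(b)) for b in bits), 2)
    state = [0.0] * (2 ** len(bits))
    state[index] = 1.0  # all amplitude sits on a single basis state
    return index, state


index, state = basis_state_vector([0.0, 1.0, 1.0, 0.0])
print(index, sum(state))
```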

### Circuit
The circuit is wrapped in a 'quantum node' (QNode). Here we put the state preparation routine and the variational layers in sequence, which gives us the complete circuit. Its output is the expectation value of Pauli-Z on qubit 0.

In [5]:
@qml.qnode(dev)
def circuit(weights, x):

    statepreparation(x)

    for W in weights:
        layer(W)

    return qml.expval(qml.PauliZ(0))
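
The returned quantity, $\langle Z_0 \rangle$, always lies in $[-1, 1]$. A minimal sketch of how it follows from a state vector is given below, again assuming that wire 0 is the most significant bit of the basis-state index; `expval_z0` is a hypothetical helper written for illustration.

```python
import math


def expval_z0(state, num_qubits=4):
    """Expectation value of Pauli-Z on wire 0 for a given state vector.

    Each basis state contributes its probability times +1 (wire 0 is 0)
    or -1 (wire 0 is 1).
    """
    total = 0.0
    for index, amplitude in enumerate(state):
        bit0 = (index >> (num_qubits - 1)) & 1  # most significant bit (assumed ordering)
        total += (1 - 2 * bit0) * abs(amplitude) ** 2
    return total


# Basis state |0110>: wire 0 is 0, so the expectation value is +1
plus = [0.0] * 16
plus[6] = 1.0

# Equal superposition of |0000> and |1000>: the two contributions cancel
mixed = [0.0] * 16
mixed[0] = mixed[8] = 1 / math.sqrt(2)

print(expval_z0(plus), expval_z0(mixed))
```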

### Adding the bias
The bias must be added classically. We can see this operation as a classical node that adds the bias $b$ after the quantum node has evaluated the circuit.

In [6]:
def variational_classifier(var, x):
    weights = var[0]
    bias = var[1]

    return circuit(weights, x) + bias

## Cost function and accuracy

In [7]:
def square_loss(labels, predictions):
    loss = 0

    for l,p in zip(labels, predictions):
        loss += (l - p) ** 2

    loss /= len(labels)

    return loss
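
As a quick check of the arithmetic, here is a small worked example of the same mean squared error on made-up numbers. The name `mean_squared_error` and the sample values are illustrative only.

```python
def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    return sum((l - p) ** 2 for l, p in zip(labels, predictions)) / len(labels)


labels = [1, -1, 1]
predictions = [0.8, -0.6, 0.2]

# (0.2^2 + 0.4^2 + 0.8^2) / 3 = (0.04 + 0.16 + 0.64) / 3 = 0.28
loss = mean_squared_error(labels, predictions)
print(round(loss, 6))
```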

In [8]:
def accuracy(labels, predictions):

    acc = 0
    for l, p in zip(labels, predictions):
        if abs(l - p) < 1e-5:
            acc = acc + 1
    acc = acc / len(labels)

    return acc
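
Because the labels are $\pm 1$, the continuous circuit outputs have to be thresholded with a sign function before they can be compared with the labels. The sketch below illustrates this on made-up values; `sign` and `thresholded_accuracy` are hypothetical names.

```python
def sign(value):
    """Return -1, 0 or +1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def thresholded_accuracy(labels, raw_outputs):
    """Fraction of samples whose thresholded output equals the label."""
    hits = sum(1 for l, r in zip(labels, raw_outputs) if sign(r) == l)
    return hits / len(labels)


labels = [1, -1, 1, -1]
raw_outputs = [0.8, -0.6, -0.2, 0.0]

# Thresholded outputs are [1, -1, -1, 0]: the first two match the labels
acc = thresholded_accuracy(labels, raw_outputs)
print(acc)
```

Note that an output of exactly zero is mapped to 0 and therefore never counts as correct.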

As usual in supervised learning (and in PennyLane), the cost depends on the features (data), on the labels, and on the trainable parameters (weights and bias).

In [9]:
def cost(var, X, Y):
    predictions = [variational_classifier(var, x) for x in X]
    return square_loss(Y, predictions)

## Data
The data in `parity.txt` needs some preprocessing. In particular, the last entry of every row is the label and must be separated from the other entries, which form the bitstring. Also, the labels must be mapped from $\{0,1\}$ to $\{-1,1\}$.

In [10]:
data = np.loadtxt("variational_classifier/data/parity.txt")
X = np.array(data[:, :-1], requires_grad=False)
Y = np.array(data[:, -1], requires_grad=False)
Y = Y * 2 - np.ones(len(Y))  # shift label from {0, 1} to {-1, 1}

for i in range(5):
    print("X = {}, Y = {: d}".format(X[i], int(Y[i])))

print("...")

X = [0. 0. 0. 0.], Y = -1
X = [0. 0. 0. 1.], Y =  1
X = [0. 0. 1. 0.], Y =  1
X = [0. 0. 1. 1.], Y = -1
X = [0. 1. 0. 0.], Y =  1
...
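
The printed rows are consistent with labels given by the parity of each bitstring. As a sketch of that relationship, and without assuming anything about the exact contents of `parity.txt`, the following pure-Python snippet enumerates all 4-bit strings and applies the same $\{0,1\} \to \{-1,1\}$ shift. The function `parity_dataset` is hypothetical.

```python
from itertools import product


def parity_dataset(num_bits=4):
    """List of (bitstring, label) pairs with label = 2 * parity - 1."""
    rows = []
    for bits in product([0, 1], repeat=num_bits):
        parity = sum(bits) % 2   # 0 for an even number of ones, 1 for odd
        label = 2 * parity - 1   # shift label from {0, 1} to {-1, 1}
        rows.append((list(bits), label))
    return rows


rows = parity_dataset()
for bits, label in rows[:5]:
    print("X = {}, Y = {: d}".format(bits, label))
```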


## Optimization
### Initialization
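The weights must be stored in a tensor of shape `(num_layers, num_qubits, 3)`, with one triple of rotation angles per qubit and per layer. The bias is initialized to zero.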

In [11]:
np.random.seed(0)
num_qubits = 4
num_layers = 2
var_init = (0.01 * np.random.randn(num_layers, num_qubits, 3), 0.0)

print(var_init)

(tensor([[[ 0.01764052,  0.00400157,  0.00978738],
         [ 0.02240893,  0.01867558, -0.00977278],
         [ 0.00950088, -0.00151357, -0.00103219],
         [ 0.00410599,  0.00144044,  0.01454274]],

        [[ 0.00761038,  0.00121675,  0.00443863],
         [ 0.00333674,  0.01494079, -0.00205158],
         [ 0.00313068, -0.00854096, -0.0255299 ],
         [ 0.00653619,  0.00864436, -0.00742165]]], requires_grad=True), 0.0)


### Run the optimization

Now we can choose an optimizer and run the training. Training uses mini-batches: at every optimizer step, a new batch of `batch_size` samples is drawn at random from the data.

In [12]:
opt = NesterovMomentumOptimizer(0.5)
batch_size = 5
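
The training loop draws its batch indices with `np.random.randint`, which samples with replacement, so the same data point can appear more than once in a batch. A minimal standard-library sketch of that sampling scheme is shown below; `sample_batch` and the toy data are illustrative assumptions.

```python
import random


def sample_batch(features, labels, batch_size, rng):
    """Draw batch_size indices uniformly at random, with replacement."""
    # randrange(n) returns an integer in 0..n-1; independent draws may repeat
    indices = [rng.randrange(len(features)) for _ in range(batch_size)]
    x_batch = [features[i] for i in indices]
    y_batch = [labels[i] for i in indices]
    return indices, x_batch, y_batch


rng = random.Random(0)
features = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [-1, 1, 1, -1]

# Five draws from four points: at least one point must be repeated
indices, x_batch, y_batch = sample_batch(features, labels, 5, rng)
print(indices)
```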

In [13]:
var = var_init
for it in range(25):

    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, len(X), (batch_size,))
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]
    var = opt.step(lambda v: cost(v, X_batch, Y_batch), var)

    # Compute accuracy
    predictions = [np.sign(variational_classifier(var, x)) for x in X]
    acc = accuracy(Y, predictions)

    print(
        "Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} ".format(
            it + 1, cost(var, X, Y), acc
        )
    )

Iter:     1 | Cost: 3.4355534 | Accuracy: 0.5000000 
Iter:     2 | Cost: 1.9287800 | Accuracy: 0.5000000 
Iter:     3 | Cost: 2.0341238 | Accuracy: 0.5000000 
Iter:     4 | Cost: 1.6372574 | Accuracy: 0.5000000 
Iter:     5 | Cost: 1.3025395 | Accuracy: 0.6250000 
Iter:     6 | Cost: 1.4555019 | Accuracy: 0.3750000 
Iter:     7 | Cost: 1.4492786 | Accuracy: 0.5000000 
Iter:     8 | Cost: 0.6510286 | Accuracy: 0.8750000 
Iter:     9 | Cost: 0.0566074 | Accuracy: 1.0000000 
Iter:    10 | Cost: 0.0053045 | Accuracy: 1.0000000 
Iter:    11 | Cost: 0.0809483 | Accuracy: 1.0000000 
Iter:    12 | Cost: 0.1115426 | Accuracy: 1.0000000 
Iter:    13 | Cost: 0.1460257 | Accuracy: 1.0000000 
Iter:    14 | Cost: 0.0877037 | Accuracy: 1.0000000 
Iter:    15 | Cost: 0.0361311 | Accuracy: 1.0000000 
Iter:    16 | Cost: 0.0040937 | Accuracy: 1.0000000 
Iter:    17 | Cost: 0.0004899 | Accuracy: 1.0000000 
Iter:    18 | Cost: 0.0005290 | Accuracy: 1.0000000 
Iter:    19 | Cost: 0.0024304 | Accuracy: 1.00

The training was successful (accuracy = 1). However, overfitting cannot be ruled out in this case, since we did not use a test set.
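
One way to probe for overfitting would be to hold out part of the data before training and evaluate the accuracy on it afterwards. The following is only a sketch of such a split on synthetic 4-bit parity data; `holdout_split` and the 25% test fraction are assumptions, not part of the original notebook.

```python
import random


def holdout_split(features, labels, test_fraction, rng):
    """Shuffle the sample indices and split them into train and test parts."""
    indices = list(range(len(features)))
    rng.shuffle(indices)
    num_test = max(1, round(test_fraction * len(indices)))
    test_idx, train_idx = indices[:num_test], indices[num_test:]
    train = [(features[i], labels[i]) for i in train_idx]
    test = [(features[i], labels[i]) for i in test_idx]
    return train, test


rng = random.Random(0)
# Synthetic stand-in for the data: all 4-bit strings with parity labels in {-1, 1}
features = [[(n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1] for n in range(16)]
labels = [2 * (sum(bits) % 2) - 1 for bits in features]

train, test = holdout_split(features, labels, 0.25, rng)
print(len(train), len(test))
```

The model would then be trained on `train` only, and the accuracy on `test` would indicate how well it generalizes to bitstrings it has not seen.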

## Iris classification