Variational classifiers are trainable models that assign data points to classes.

Classical variational classifiers are in common use today. In this notebook we implement a quantum variational classifier: a parametrised quantum circuit trained on labelled examples so that it can categorise new data. Such a classifier is a supervised machine learning model.

We work with a single dataset, the parity function, which also shows how to use basis encoding.


# Parity

Import necessary libraries:

In [1]:
import pennylane as qml
from pennylane import numpy as np
from pennylane.optimize import AdamOptimizer

Initialise our device:

In [2]:
dev = qml.device("default.qubit")

We first encode our data. The inputs are bit strings, so each one maps directly to a computational basis state (e.g. `[1, 1, 0, 0]` becomes |1100>); this is called basis encoding.

PennyLane's `qml.BasisState` operation prepares this state for us:

In [3]:
def state_prep(x):
    qml.BasisState(x, wires=range(4))
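
To make the encoding concrete, here is a minimal plain-Python sketch (no PennyLane required) of what basis encoding produces: a state vector with a single amplitude of 1 at the index given by the bit string, with wire 0 as the most significant bit. The helper name `basis_state_vector` is hypothetical and used only for illustration.

```python
def basis_state_vector(bits):
    """Return the 2**n amplitudes of |bits> as a list (first bit = most significant)."""
    index = int("".join(str(b) for b in bits), 2)
    state = [0.0] * (2 ** len(bits))
    state[index] = 1.0
    return state


state = basis_state_vector([1, 1, 0, 0])
print(len(state))        # 16 amplitudes for 4 qubits
print(state.index(1.0))  # 12, since binary 1100 is 12
```

Every input in the parity dataset therefore corresponds to exactly one of the 16 basis states of a 4-qubit register.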

Next we define our quantum ansatz.

An ansatz is, in general, an assumption about the form of an unknown function. Here it is a set of parametrised quantum gates whose parameters are trained, as in any variational quantum algorithm (VQA).

The parity function is a simple problem, so a layer of general single-qubit rotations (`qml.Rot`, i.e. RZ-RY-RZ, as the tutorial implements) followed by entangling CNOT gates is enough. Following [this paper](http://arxiv.org/pdf/2111.13730), we also implement an RX-CX-RX ansatz and an RX-RZ-CX ansatz with alternating entanglement. According to the paper, the latter has more effective parameters to train and reduces error by ~95% compared to RX-CX-RX.

We implement all three. Each cell defines a function named `layer`, so the most recently executed cell is the one the circuit uses; run only the cell for the ansatz you want to try:

In [4]:
# R ansatz (tutorial implementation)
def layer(layer_weights):
    for wire in range(4):
        qml.Rot(*layer_weights[wire], wires=wire)

    for wires in ([0, 1], [1, 2], [2, 3], [3, 0]):
        qml.CNOT(wires)
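
One way to see why the entangling gates matter for parity: on computational basis states, a CNOT simply XORs the control bit into the target bit. The sketch below is plain Python with hypothetical helper names (`apply_cnot`, `apply_cnots`). It ignores the rotations entirely and only tracks classical bits through the CNOT ring used in the R ansatz, so it is an illustration of the idea rather than a simulation of the trained circuit.

```python
def apply_cnot(bits, control, target):
    """Return a copy of bits after a CNOT: the target flips when the control is 1."""
    out = list(bits)
    out[target] ^= out[control]
    return out


def apply_cnots(bits, pairs):
    """Apply a sequence of (control, target) CNOTs to a classical bit string."""
    for control, target in pairs:
        bits = apply_cnot(bits, control, target)
    return bits


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]

# After the first three CNOTs, wire 3 holds the XOR (parity) of all four inputs.
print(apply_cnots([0, 1, 1, 1], ring[:3]))  # [0, 1, 0, 1]
# The full ring then XORs that parity into wire 0.
print(apply_cnots([0, 1, 1, 1], ring))      # [1, 1, 0, 1]
```

Without the CNOTs, each wire would only ever see its own input bit, and no single-wire measurement could depend on all four bits.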

In [5]:
# RX-CX-RX ansatz
def layer(layer_weights):
    for wire in range(4):
        qml.RX(layer_weights[wire][0], wires=wire)

    for wires in ([0, 1], [2, 3]):
        qml.CNOT(wires)

    for wire in range(4):
        qml.RX(layer_weights[wire][1], wires=wire)

    for wires in ([1, 2], [3, 0]):
        qml.CNOT(wires)

    for wire in range(4):
        qml.RX(layer_weights[wire][2], wires=wire)

In [6]:
# RX-RZ-CX ansatz
# Note: each RX/RZ pair below shares one parameter, so a layer still uses 3 parameters per qubit.
def layer(layer_weights):
    for wire in range(4):
        qml.RX(layer_weights[wire][0], wires=wire)
        qml.RZ(layer_weights[wire][0], wires=wire)

    for wires in ([0, 1], [2, 3]):
        qml.CNOT(wires)

    for wire in range(4):
        qml.RX(layer_weights[wire][1], wires=wire)
        qml.RZ(layer_weights[wire][1], wires=wire)

    for wires in ([1, 2], [3, 0]):
        qml.CNOT(wires)

    for wire in range(4):
        qml.RX(layer_weights[wire][2], wires=wire)
        qml.RZ(layer_weights[wire][2], wires=wire)

We then define our quantum circuit. In PennyLane a circuit must be wrapped as a "QNode" (here via the `@qml.qnode` decorator), which binds an ordinary Python function to a device so that it can be executed and differentiated.

The circuit prepares the state (basis encoding), applies one ansatz layer per entry in `weights`, and finally measures the expectation value of Pauli-Z on wire 0:

In [7]:
@qml.qnode(dev)
def circuit(weights, x):
    state_prep(x)

    for layer_weights in weights:
        layer(layer_weights)

    return qml.expval(qml.PauliZ(0))
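
The value returned by `qml.expval(qml.PauliZ(0))` always lies in [-1, 1]: it equals the probability of measuring wire 0 as 0 minus the probability of measuring it as 1. The following plain-Python sketch assumes we already have the 2^n measurement probabilities as a list, ordered with wire 0 as the most significant bit; the helper name `expval_z0` is hypothetical.

```python
def expval_z0(probabilities):
    """Expectation of Pauli-Z on wire 0: P(wire 0 = 0) - P(wire 0 = 1)."""
    half = len(probabilities) // 2
    # The first half of the indices has wire 0 = 0, the second half has wire 0 = 1.
    return sum(probabilities[:half]) - sum(probabilities[half:])


# Basis state |1100>: all probability sits on index 12, where wire 0 = 1.
probs = [0.0] * 16
probs[12] = 1.0
print(expval_z0(probs))          # -1.0

# Uniform distribution: wire 0 is equally likely to be 0 or 1.
print(expval_z0([1 / 16] * 16))  # 0.0
```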

Our full model is the circuit from the previous cell plus a trainable "classical bias" term, added as post-processing to the circuit output:

In [8]:
def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias

Next we define the loss used for training and the metric used for evaluation: mean squared error (MSE) and accuracy:

In [9]:
def square_loss(labels, predictions):
    return np.mean((labels - qml.math.stack(predictions)) ** 2)

In [10]:
def accuracy(labels, predictions):
    acc = sum(abs(label - pred) < 1e-4 for label, pred in zip(labels, predictions))
    acc = acc / len(labels)
    return acc
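
As a small worked example of these two functions, the sketch below re-implements them in plain Python (the `_plain` names and the sample numbers are made up for illustration) and applies them to four hypothetical model outputs. Note that the loss is computed on the raw outputs, while accuracy is computed after taking the sign.

```python
def square_loss_plain(labels, predictions):
    """Mean of the squared differences between labels and predictions."""
    return sum((l - p) ** 2 for l, p in zip(labels, predictions)) / len(labels)


def accuracy_plain(labels, predictions):
    """Fraction of predictions that match their label (within 1e-4)."""
    hits = sum(abs(l - p) < 1e-4 for l, p in zip(labels, predictions))
    return hits / len(labels)


def sign(value):
    """Return 1, -1 or 0, mirroring np.sign for a scalar."""
    return (value > 0) - (value < 0)


labels = [1, -1, 1, -1]
raw_outputs = [0.8, -0.6, -0.2, -1.0]      # hypothetical circuit + bias values

print(round(square_loss_plain(labels, raw_outputs), 4))         # 0.41
print(accuracy_plain(labels, [sign(v) for v in raw_outputs]))   # 0.75
```

The third output has the wrong sign, so it costs one accuracy point and also contributes most of the squared error.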

And now our cost function. The loss is defined per prediction; the cost is that loss averaged over all predictions in the batch:

In [11]:
def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)

We load our data and split it into X (inputs) and Y (labels). We also shift the labels from {0, 1} to {-1, 1} so that they match the range of the Pauli-Z expectation value returned by the circuit:

In [12]:
data = np.loadtxt("variational_classifier/data/parity_train.txt", dtype=int)
X = np.array(data[:, :-1])
Y = np.array(data[:, -1])
Y = Y * 2 - 1

for x, y in zip(X, Y):
    print(f"x = {x}, y = {y}")

x = [0 0 0 1], y = 1
x = [0 0 1 0], y = 1
x = [0 1 0 0], y = 1
x = [0 1 0 1], y = -1
x = [0 1 1 0], y = -1
x = [0 1 1 1], y = 1
x = [1 0 0 0], y = 1
x = [1 0 0 1], y = -1
x = [1 0 1 1], y = 1
x = [1 1 1 1], y = -1
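
The labels printed above can be reproduced directly from the inputs. The sketch below assumes the data file stores 1 for an odd number of ones and 0 for an even number, which is consistent with the printed values, and applies the same `2 * y - 1` shift. The function name `parity_label` is hypothetical.

```python
def parity_label(bits):
    """Return +1 for an odd number of ones and -1 for an even number."""
    raw = sum(bits) % 2   # 0 (even) or 1 (odd), the assumed encoding in the data file
    return 2 * raw - 1    # same shift as Y = Y * 2 - 1


# Training inputs and shifted labels, copied from the printed output above
train_inputs = [
    [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0],
    [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 1], [1, 1, 1, 1],
]
train_labels = [1, 1, 1, -1, -1, 1, 1, -1, 1, -1]

computed = [parity_label(x) for x in train_inputs]
print(computed == train_labels)  # True
```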


Now we initialise random weights and the bias. Setting `requires_grad=True` tells the optimiser that these are the parameters we want to train.

The hyperparameters (`num_layers`, `weights_init`, `bias_init`) can be changed freely:

In [13]:
np.random.seed(17)
num_qubits = 4
num_layers = 3
weights_init = 0.02 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.7, requires_grad=True)

print("Weights:", weights_init)
print("Bias: ", bias_init)

Weights: [[[ 0.00552532 -0.03709256  0.01247802]
  [ 0.02290623  0.02074381  0.03773278]
  [-0.00223397 -0.00724203  0.0029735 ]
  [-0.00875566  0.04342514  0.02304621]]

 [[-0.03637625 -0.00276099  0.01079679]
  [-0.03550565  0.02629753 -0.00946896]
  [-0.0218446  -0.00500055 -0.01964589]
  [ 0.02062538  0.00982668 -0.00893293]]

 [[-0.0161272   0.00262536 -0.0242512 ]
  [ 0.00319982 -0.01510446  0.00699792]
  [ 0.01955084 -0.0027717   0.00207713]
  [ 0.00601182  0.01936411  0.01739248]]]
Bias:  0.7
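
To get a feel for the size of the model, the following sketch builds a nested list with the same (layers, qubits, 3) layout using only the standard library and counts the trainable parameters. It uses Python's `random` module instead of PennyLane's NumPy wrapper, so the drawn values differ from the ones printed above; only the shape and scale are meant to match.

```python
import random

random.seed(17)  # any fixed seed makes the draw reproducible

num_layers, num_qubits, params_per_qubit = 3, 4, 3

# Nested lists with the same (3, 4, 3) layout as weights_init,
# drawn from a normal distribution and scaled by 0.02
weights = [
    [[0.02 * random.gauss(0.0, 1.0) for _ in range(params_per_qubit)]
     for _ in range(num_qubits)]
    for _ in range(num_layers)
]

num_weights = num_layers * num_qubits * params_per_qubit
print(num_weights)                                          # 36 weights, plus 1 bias
print(len(weights), len(weights[0]), len(weights[0][0]))    # 3 4 3
```

Adding a layer adds 12 parameters, so `num_layers` is the main knob controlling model capacity here.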


We initialise our optimiser, Adam with a step size of 0.3, and set the batch size used to feed data to the cost function:

In [14]:
opt = AdamOptimizer(0.3)
batch_size = 4
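
For readers unfamiliar with Adam, here is a minimal sketch of the textbook update rule for a single scalar parameter, minimising f(x) = x^2. The function name `adam_step` and the values of `beta1`, `beta2` and `eps` are illustrative assumptions; they are not necessarily the defaults or the exact formulation used by PennyLane's `AdamOptimizer`.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


x, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * x                              # derivative of x**2
    x, m, v = adam_step(x, grad, m, v, t)
    print(t, x)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the step size (here from 1.0 to about 0.7) regardless of the gradient's magnitude. This is one reason a step size as large as 0.3 produces a noisy cost curve.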

And now we train our model. At each iteration the optimiser updates the weights and bias using a random batch, after which we make predictions on the full training set and compute the cost and accuracy. The number of iterations can be increased or decreased to change how long the model trains.
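
The batch in the training loop below is drawn with `np.random.randint`, which picks each index independently, so the same training example can appear more than once in a batch. A minimal standard-library sketch of the same sampling scheme (the helper name `sample_batch_indices` is hypothetical):

```python
import random


def sample_batch_indices(n_samples, batch_size, rng):
    """Draw batch_size indices uniformly from range(n_samples), with replacement."""
    return [rng.randrange(n_samples) for _ in range(batch_size)]


rng = random.Random(0)  # seeded generator for reproducibility
for step in range(3):
    print(step, sample_batch_indices(n_samples=10, batch_size=4, rng=rng))
```

With only 10 training examples and a batch size of 4, each update sees a small and possibly repeated subset of the data, which adds further noise to the per-iteration cost.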

Note that too many iterations might lead to overfitting, while too few cause underfitting:

In [15]:
weights = weights_init
bias = bias_init
for iteration in range(50):

    batch_index = np.random.randint(0, len(X), (batch_size,))  # random batch of data based on batch size
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]

    weights, bias = opt.step(cost, weights, bias, X=X_batch, Y=Y_batch)  # updating weights and bias

    train_preds = [np.sign(variational_classifier(weights, bias, x)) for x in X]  # predictions based on the new weights and bias

    # compute the cost and accuracy on the full training set
    current_cost = cost(weights, bias, X, Y)
    train_acc = accuracy(Y, train_preds)

    print(f"Iter: {iteration+1:4d} | Cost: {current_cost:0.7f} | Accuracy: {train_acc:0.7f}")

Iter:    1 | Cost: 1.4913523 | Accuracy: 0.5000000
Iter:    2 | Cost: 1.2636099 | Accuracy: 0.4000000
Iter:    3 | Cost: 1.2582003 | Accuracy: 0.4000000
Iter:    4 | Cost: 0.9442554 | Accuracy: 0.6000000
Iter:    5 | Cost: 0.9396397 | Accuracy: 0.7000000
Iter:    6 | Cost: 0.8886298 | Accuracy: 0.7000000
Iter:    7 | Cost: 0.9455230 | Accuracy: 0.7000000
Iter:    8 | Cost: 0.8868692 | Accuracy: 0.7000000
Iter:    9 | Cost: 0.8266910 | Accuracy: 0.8000000
Iter:   10 | Cost: 0.8261738 | Accuracy: 0.6000000
Iter:   11 | Cost: 0.8971381 | Accuracy: 0.6000000
Iter:   12 | Cost: 0.9153667 | Accuracy: 0.8000000
Iter:   13 | Cost: 0.8212975 | Accuracy: 0.7000000
Iter:   14 | Cost: 0.7611873 | Accuracy: 0.6000000
Iter:   15 | Cost: 0.8117961 | Accuracy: 0.7000000
Iter:   16 | Cost: 1.0210193 | Accuracy: 0.6000000
Iter:   17 | Cost: 1.1843587 | Accuracy: 0.6000000
Iter:   18 | Cost: 1.1922951 | Accuracy: 0.6000000
Iter:   19 | Cost: 1.0539110 | Accuracy: 0.6000000
Iter:   20 | Cost: 1.0469186 | 

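With the R ansatz, the model reaches perfect accuracy and low cost from roughly iteration 15-25 onwards, which suggests that this choice of ansatz and optimiser suits the problem well. The log printed above is cut off at iteration 20 and does not show this; it appears to come from a run with a different ansatz.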

Results of the different ansatzes:
- R ansatz: final accuracy 100%, final cost 0.0088925 (the faster model; performs very well)
- RX-CX-RX ansatz: final accuracy 50%, final cost 1.0372201 (accuracy alternates between 50% and 60%, cost stays within [0.9, 1.2]; not a good ansatz)
- RX-RZ-CX ansatz: final accuracy 100%, final cost 0.1980340 (mixed results: sometimes better than all the others, sometimes worse than R, but always better than RX-CX-RX)

In summary, the R ansatz looks like the best choice for a good model, but we test the models before drawing a conclusion.

Note that the chosen hyperparameters also affect these results.

Finally we evaluate the model on the test data. We apply the same preprocessing as for the training data, then compute predictions and accuracy. Note that the test file is not entirely unseen: several of its inputs, such as [1 0 0 0] and [1 1 1 1], also appear in the training set.

In [16]:
data = np.loadtxt("variational_classifier/data/parity_test.txt", dtype=int)
X_test = np.array(data[:, :-1])
Y_test = np.array(data[:, -1])
Y_test = Y_test * 2 - 1

test_preds = [np.sign(variational_classifier(weights, bias, x)) for x in X_test]

for x, y, pred in zip(X_test, Y_test, test_preds):
    print(f"x = {x}, y = {y},   pred = {pred}")

test_acc = accuracy(Y_test, test_preds)
print("Accuracy on unseen data:", test_acc)

x = [0 0 0 0], y = -1,   pred = 1.0
x = [0 0 1 1], y = -1,   pred = 1.0
x = [1 0 1 0], y = -1,   pred = 1.0
x = [1 1 1 0], y = 1,   pred = 1.0
x = [1 1 0 0], y = -1,   pred = -1.0
x = [1 1 0 1], y = 1,   pred = 1.0
x = [1 0 0 0], y = 1,   pred = 1.0
x = [1 1 1 1], y = -1,   pred = -1.0
x = [0 1 0 1], y = -1,   pred = 1.0
x = [0 0 1 0], y = 1,   pred = 1.0
x = [0 1 0 0], y = 1,   pred = -1.0
Accuracy on unseen data: 0.5454545454545454
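
The printed results can be re-checked by hand. The sketch below copies the rows from the output above and the training inputs from the earlier cell, recomputes the accuracy, and counts how many test inputs also occur in the training set. The variable names are illustrative only.

```python
# (input bits, true label, prediction), copied from the printed test output
test_rows = [
    ("0000", -1, 1), ("0011", -1, 1), ("1010", -1, 1), ("1110", 1, 1),
    ("1100", -1, -1), ("1101", 1, 1), ("1000", 1, 1), ("1111", -1, -1),
    ("0101", -1, 1), ("0010", 1, 1), ("0100", 1, -1),
]

# Training inputs, copied from the printed training data
train_inputs = {"0001", "0010", "0100", "0101", "0110",
                "0111", "1000", "1001", "1011", "1111"}

correct = sum(label == pred for _, label, pred in test_rows)
print(correct, len(test_rows), correct / len(test_rows))  # 6 11 0.5454545454545454

overlap = [bits for bits, _, _ in test_rows if bits in train_inputs]
print(len(overlap), overlap)  # 5 ['1000', '1111', '0101', '0010', '0100']
```

Six of the eleven test inputs never appear in training, so accuracy on that subset is the stricter measure of generalisation.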


Test accuracy for each ansatz:

- R ansatz: 100% accuracy
- RX-CX-RX ansatz: 45% accuracy
- RX-RZ-CX ansatz: mixed accuracy

To conclude, the R ansatz is our best choice. RX-RZ-CX can reach similar accuracy and cost, but speed should also be taken into account, and R was the fastest of the three.

The RX-CX-RX ansatz performed worst, so it is better avoided.