# Quantum Encoding and Variational Classifier

In this notebook, two different methods of data encoding are compared in their implementation and performance. These encoding methods are Angle and Amplitude Encoding, respectively. These methods are then used in the training and testing of a mock dataset in a Variational Quantum Classifier. This work is done using the PennyLane SDK along with Pandas and SKLearn for the initial data processing.

In [1]:
# Initialisation Cell

import pandas as pd
from pennylane import numpy as np
from sklearn.preprocessing import normalize

import pennylane as qml
from pennylane.templates.embeddings import BasisEmbedding, AngleEmbedding, AmplitudeEmbedding
from pennylane.optimize import NesterovMomentumOptimizer, AdamOptimizer

import time
start = time.time()

Now that the appropriate libraries have been imported, the data can be loaded.

## Exploratory Data Analysis

Due to the datasets being arbitrary mock sets, there isn't much data analysis required, but rather just the surface-level analysis of the dimensions of the data.

In [2]:
train_set = pd.read_csv('mock_train_set.csv')
test_set = pd.read_csv('mock_test_set.csv')
np.random.seed(42)

In [3]:
train_set.head()

Unnamed: 0,0,1,2,3,4
0,2789.26,1000.0,10.0,20.0,0.0
1,4040.01,1000000.0,1.0,1.0,1.0
2,2931.2,10000.0,10000.0,40.0,1.0
3,3896.54,10000.0,100000.0,30.0,1.0
4,982.06,100.0,1000.0,75.0,0.0


In [4]:
test_set.head()

Unnamed: 0,0,1,2,3,4
0,2988.55,10000.0,10000.0,75.0,1.0
1,3413.8,1.0,100.0,90.0,0.0
2,3891.52,1.0,1.0,5.0,0.0
3,4514.99,10000.0,1000000.0,25.0,1.0
4,752.29,10.0,10.0,90.0,0.0


In [5]:
train_set.shape

(300, 5)

In [6]:
test_set.shape

(120, 5)

In [7]:
x_train = train_set[['0', '1', '2', '3']]
y_train = train_set[['4']]

x_test = test_set[['0', '1', '2', '3']]
y_test = test_set[['4']]

In [8]:
print("Train Shape: {}\nTest Shape : {}".format(x_train.shape, x_test.shape))

Train Shape: (300, 4)
Test Shape : (120, 4)


In [9]:
data = normalize(x_train)
print(data)

[[9.41304389e-01 3.37474595e-01 3.37474595e-03 6.74949190e-03]
 [4.03997703e-03 9.99991839e-01 9.99991839e-07 9.99991839e-07]
 [2.02952793e-01 6.92388075e-01 6.92388075e-01 2.76955230e-03]
 ...
 [9.99879222e-01 2.39044858e-04 2.39044858e-04 1.55379158e-02]
 [2.97282715e-01 9.54741758e-01 9.54741758e-03 9.54741758e-05]
 [4.54021346e-02 9.94010701e-02 9.94010701e-01 8.94609631e-04]]


## Quantum Model

The data has been loaded and formatted in normalised dataframes, so now the quantum part of the project can begin.

Variational quantum classifiers follow a very simple framework to produce the output predictions from input data. The model begins with an encoding block, which is how the input data is converted to be stored in qubits. After the encoding, there is a block of rotation and entangling gates, which contain the parameters (weights) which define the dials of the model and how the qubits communicate through correlation. The final step is to measure the qubits in the computational basis to have an output expectation value corresponding to the classification task. The dataset in this project has a binary target, which makes the measurement a simple correspondence to the prediction.

The first method of encoding used is that of Angle Encoding, which converts normalised data into rotations of a set of qubits by an angle reflecting the magnitude of each data entry. The number of qubits required scales with the number of input dimensions for each data entry. For example, the dataset in this project has 4 dimensions of input data, which requires 4 qubits to encode, with each block being a parallel set of rotation gates of the normalised data, repeated in series for each entry in the set.

In [10]:
# Angle Encoding

num_qubits = data.shape[1]

dev = qml.device('default.qubit', wires = num_qubits)

@qml.qnode(dev)
def circuit(parameters, data):
    # Apply Hadamards to all qubits in the circuit
    for i in range(num_qubits):
        qml.Hadamard(wires = i)
    
    AngleEmbedding(features = data, wires = range(num_qubits), rotation = 'Y')
    
    qml.StronglyEntanglingLayers(weights = parameters, wires = range(num_qubits))
    
    return qml.expval(qml.PauliZ(0))

Following the encoding of data, there is the process of parameterising the circuit. This is done through a set of rotation gates with the parameters of weights which the model needs to train. The number of rotation gates corresponds to the number of qubits used, however that is typically not enough complexity for a good model, so these blocks of rotation gates are repeated. In between these rotation blocks, there are sets of entangling CNOT gates, to fully make use of quantum interference and correlation. The architecture of these parameterisation blocks is arbitrary and depends on the model being built, so in this case the `StronglyEntanglingLayers` format is used, which can be found here: https://pennylane.readthedocs.io/en/stable/code/api/pennylane.StronglyEntanglingLayers.html

Now, the weights and bias of the ML model can be initialised as random values with the appropriate shape for the quantum circuit.

In [11]:
num_layers = 8
weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)

print(weights_init, bias_init)

[[[ 0.00496714 -0.00138264  0.00647689]
  [ 0.0152303  -0.00234153 -0.00234137]
  [ 0.01579213  0.00767435 -0.00469474]
  [ 0.0054256  -0.00463418 -0.0046573 ]]

 [[ 0.00241962 -0.0191328  -0.01724918]
  [-0.00562288 -0.01012831  0.00314247]
  [-0.00908024 -0.01412304  0.01465649]
  [-0.00225776  0.00067528 -0.01424748]]

 [[-0.00544383  0.00110923 -0.01150994]
  [ 0.00375698 -0.00600639 -0.00291694]
  [-0.00601707  0.01852278 -0.00013497]
  [-0.01057711  0.00822545 -0.01220844]]

 [[ 0.00208864 -0.0195967  -0.01328186]
  [ 0.00196861  0.00738467  0.00171368]
  [-0.00115648 -0.00301104 -0.01478522]
  [-0.00719844 -0.00460639  0.01057122]]

 [[ 0.00343618 -0.0176304   0.00324084]
  [-0.00385082 -0.00676922  0.00611676]
  [ 0.01031     0.0093128  -0.00839218]
  [-0.00309212  0.00331263  0.00975545]]

 [[-0.00479174 -0.00185659 -0.01106335]
  [-0.01196207  0.00812526  0.0135624 ]
  [-0.0007201   0.01003533  0.00361636]
  [-0.0064512   0.00361396  0.01538037]]

 [[-0.00035826  0.01564644 -

In [12]:
circuit(weights_init, data[0])

tensor(-0.01118875, requires_grad=True)

The quantum circuit for the Variational Quantum Classifier has now been created, and the classical component of the model needs to be constructed. The output of the quantum circuit needs to be converted back to a classical value, which can then be used as an input to a cost function to measure how close the generated predictions are to the actual training set targets. The cost function used here is a simple square loss, which is standard convention in most machine learning models.

In [13]:
def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias

In [14]:
def square_loss(labels, predictions):
    loss = 0
    for l, p in zip(labels, predictions):
        loss = loss + (l - p) ** 2

    loss = loss / len(labels)
    return loss

In [15]:
def accuracy(labels, predictions):

    loss = 0
    for l, p in zip(labels, predictions):
        if abs(l - p) < 1e-5:
            loss = loss + 1
    loss = loss / len(labels)

    return loss

In [16]:
def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)

To make things easier for the classifier, the target data needs to be shifted to align better with the output of the quantum circuit. The quanutm circuit as a binary classifier will return expectation values of the final qubit states which are projections onto the Z-axis of a Block Sphere, which has a positive or negative value. So, the data can be interpreted as falling into one of two classes defined by 1 and -1, indicating their "spin orientation" which can be expressed in the binary (0, 1) target values by shifting each 0 to become -1.

In [17]:
Y = np.array(y_train.values[:,0] * 2 - np.ones(len(y_train.values[:,0])), requires_grad = False)  # shift label from {0, 1} to {-1, 1}
X = np.array(data, requires_grad=False)

for i in range(5):
    print("X = {}, Y = {: d}".format(list(X[i]), int(Y[i])))

X = [tensor(0.94130439, requires_grad=False), tensor(0.3374746, requires_grad=False), tensor(0.00337475, requires_grad=False), tensor(0.00674949, requires_grad=False)], Y = -1
X = [tensor(0.00403998, requires_grad=False), tensor(0.99999184, requires_grad=False), tensor(9.99991839e-07, requires_grad=False), tensor(9.99991839e-07, requires_grad=False)], Y =  1
X = [tensor(0.20295279, requires_grad=False), tensor(0.69238808, requires_grad=False), tensor(0.69238808, requires_grad=False), tensor(0.00276955, requires_grad=False)], Y =  1
X = [tensor(0.03874291, requires_grad=False), tensor(0.09942901, requires_grad=False), tensor(0.99429008, requires_grad=False), tensor(0.00029829, requires_grad=False)], Y =  1
X = [tensor(0.69790787, requires_grad=False), tensor(0.07106571, requires_grad=False), tensor(0.71065706, requires_grad=False), tensor(0.05329928, requires_grad=False)], Y = -1


For the training of the circuit, an optimisation function is required to tweak the weights of the circuit in the direction which minimises the difference between the predictions and the training targets. The choice of optimiser is once again an arbitrary choice which depends on the model at hand, and in this case the Adam Optimiser was chosen.

In [18]:
opt = AdamOptimizer(stepsize=0.1, beta1=0.9, beta2=0.99, eps=1e-08)
batch_size = 5

The model can now be trained in batches of data, to improve efficiency, while the results are recorded and the best performing weight sets are saved as the model parameters.

In [19]:
weights = weights_init
bias = bias_init

wbest = 0
bbest = 0
abest = 0

for it in range(20):

    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, len(X), (batch_size,))
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, X_batch, Y_batch)

    # Compute accuracy
    predictions = [np.sign(variational_classifier(weights, bias, x)) for x in X]
    
    if accuracy(Y, predictions) > abest:
        wbest = weights
        bbest = bias
        abest = accuracy(Y, predictions)
        print('New best')
    
    acc = accuracy(Y, predictions)

    print(
        "Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} ".format(
            it + 1, cost(weights, bias, X, Y), acc
        )
    )

New best
Iter:     1 | Cost: 0.9293968 | Accuracy: 0.5233333 
Iter:     2 | Cost: 0.8903067 | Accuracy: 0.5233333 
New best
Iter:     3 | Cost: 0.8323824 | Accuracy: 0.8000000 
Iter:     4 | Cost: 0.7628541 | Accuracy: 0.6600000 
Iter:     5 | Cost: 0.7338487 | Accuracy: 0.5733333 
Iter:     6 | Cost: 0.7772821 | Accuracy: 0.5233333 
Iter:     7 | Cost: 0.6650366 | Accuracy: 0.7866667 
Iter:     8 | Cost: 0.6862241 | Accuracy: 0.7833333 
Iter:     9 | Cost: 0.7203829 | Accuracy: 0.6666667 
New best
Iter:    10 | Cost: 0.7064974 | Accuracy: 0.8233333 
New best
Iter:    11 | Cost: 0.7261132 | Accuracy: 0.8300000 
Iter:    12 | Cost: 0.6396183 | Accuracy: 0.8300000 
Iter:    13 | Cost: 0.5921352 | Accuracy: 0.8266667 
Iter:    14 | Cost: 0.5848600 | Accuracy: 0.8133333 
New best
Iter:    15 | Cost: 0.5624528 | Accuracy: 0.8366667 
Iter:    16 | Cost: 0.6011162 | Accuracy: 0.8300000 
Iter:    17 | Cost: 0.5742933 | Accuracy: 0.8300000 
Iter:    18 | Cost: 0.5638761 | Accuracy: 0.8233333 
I

Layers nor optimizer make any difference, accuracy peaks at ~83

In [20]:
Yte = np.array(y_test.values[:,0] * 2 - np.ones(len(y_test.values[:,0])), requires_grad = False)
Xte = np.array(normalize(x_test), requires_grad=False)

In [21]:
predictions = [np.sign(variational_classifier(wbest, bbest, x)) for x in Xte]
pred = [np.sign(variational_classifier(wbest, bbest, x)) for x in X]
acc = accuracy(Yte, predictions)

print(f'Cost: {cost(wbest, bbest, Xte, Yte)}, Accuracy: {np.round(acc, 2) * 100}%')

Cost: 0.2821542255428317, Accuracy: 100.0%


In [22]:
pd.DataFrame((predictions, Yte), ('Predictions', 'Test')).T

Unnamed: 0,Predictions,Test
0,1.0,1.0
1,-1.0,-1.0
2,-1.0,-1.0
3,1.0,1.0
4,-1.0,-1.0
...,...,...
115,1.0,1.0
116,-1.0,-1.0
117,-1.0,-1.0
118,-1.0,-1.0


As can be seen, the training process reached a plateau at an accuracy score of ~83%, which is not ideal but still good enough to work with. The optimal parameters are shown to be very effective when generating predictions for the test set, which has an accuracy of 100%. This shows perfect operation of the trained model on the test set. The model could be improved by batching and testing on different combinations of train/test sets, however this mock dataset is just being used to show the process of building a Variational Quantum Classifier, so it serves its purpose.

For the second model, a nearly identical process is followed, with the only change being in the data encoding scheme which is this time the Amplitude Encoding method. This method makes use of the exponential gain in information size of qubits vs classical bits, as it can encode $2^n$ features into an amplitude vector of $n$ qubits. Each quantum state has a normalised probability of being in each of its component states, so in this method the features of the data set can be encoded into the wavefunction of the qubits as the associated probabilities.

In [23]:
# Amplitude Encoding

num_qubits = int(np.log2(data.ravel().shape)) + 1

dev = qml.device('default.qubit', wires = num_qubits)

@qml.qnode(dev)
def circuit(parameters, data):
    
    AmplitudeEmbedding(features = data, wires = range(num_qubits), normalize = True, pad_with = 0)
    
    qml.StronglyEntanglingLayers(weights = parameters, wires = range(num_qubits))
        
    return qml.expval(qml.PauliZ(0))

In [24]:
num_layers = 4
weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)

print(weights_init, bias_init)

[[[-7.51179425e-03 -1.13042805e-02  7.69977360e-03]
  [ 1.26838952e-02  4.24486244e-03  9.40535585e-03]
  [-8.67641090e-03  1.45861852e-03 -1.36987106e-02]
  [-7.71780749e-03  8.78673548e-03 -2.39594508e-03]
  [ 1.20938197e-02  5.37960004e-03  2.73442216e-02]
  [ 9.37654302e-04 -1.40640527e-02 -3.45296561e-04]
  [-9.63015626e-03  9.77180001e-03  4.19800645e-04]
  [-1.37271817e-03 -1.24132389e-03  7.40340818e-03]
  [-4.52462232e-03  7.77049795e-03  1.04557117e-02]
  [-3.42141713e-03 -9.26046619e-03 -5.12965318e-03]
  [ 7.10109240e-03  9.24788716e-04  6.30074939e-03]]

 [[ 1.76293747e-02  2.30953723e-03 -8.08936891e-03]
  [ 1.05742351e-02  5.13608595e-04  8.72447158e-03]
  [ 1.06619854e-02 -9.59008096e-03  1.38200460e-02]
  [ 9.05121960e-03 -6.03904372e-03  3.04449120e-03]
  [ 2.57207491e-03  2.39318145e-04  8.71913989e-03]
  [ 1.43735633e-02  7.30637274e-05  1.33088133e-02]
  [ 9.88202611e-03  2.32296158e-03  1.76180922e-03]
  [-1.15256537e-02 -1.50076839e-02  1.65022798e-03]
  [-8.5592

In [25]:
circuit(weights_init, data[0])

tensor(0.77070738, requires_grad=True)

In [26]:
opt = AdamOptimizer(stepsize=0.5, beta1=0.9, beta2=0.99, eps=1e-08)
batch_size = 5

In [27]:
weights = weights_init
bias = bias_init

wbest = 0
bbest = 0
abest = 0

for it in range(20):

    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, len(X), (batch_size,))
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, X_batch, Y_batch)

    # Compute accuracy
    predictions = [np.sign(variational_classifier(weights, bias, x)) for x in X]
    
    if accuracy(Y, predictions) > abest:
        wbest = weights
        bbest = bias
        abest = accuracy(Y, predictions)
        print('New best')
    
    acc = accuracy(Y, predictions)

    print(
        "Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} ".format(
            it + 1, cost(weights, bias, X, Y), acc
        )
    )

New best
Iter:     1 | Cost: 1.3722308 | Accuracy: 0.4766667 
Iter:     2 | Cost: 1.0724018 | Accuracy: 0.4766667 
Iter:     3 | Cost: 0.9818780 | Accuracy: 0.4766667 
New best
Iter:     4 | Cost: 1.0404262 | Accuracy: 0.5233333 
Iter:     5 | Cost: 1.0376057 | Accuracy: 0.5233333 
Iter:     6 | Cost: 1.0839862 | Accuracy: 0.5233333 
Iter:     7 | Cost: 1.0133862 | Accuracy: 0.5233333 
Iter:     8 | Cost: 1.0152050 | Accuracy: 0.5233333 
Iter:     9 | Cost: 0.9669390 | Accuracy: 0.5233333 
Iter:    10 | Cost: 0.9753207 | Accuracy: 0.5233333 
Iter:    11 | Cost: 0.9721757 | Accuracy: 0.5233333 
New best
Iter:    12 | Cost: 0.9844758 | Accuracy: 0.5733333 
New best
Iter:    13 | Cost: 0.9722606 | Accuracy: 0.8300000 
Iter:    14 | Cost: 0.9581019 | Accuracy: 0.8133333 
Iter:    15 | Cost: 0.9698022 | Accuracy: 0.4766667 
Iter:    16 | Cost: 1.0200429 | Accuracy: 0.4766667 
Iter:    17 | Cost: 0.9890963 | Accuracy: 0.4766667 
Iter:    18 | Cost: 1.0314200 | Accuracy: 0.4766667 
Iter:    1

In [28]:
Yte = np.array(y_test.values[:,0] * 2 - np.ones(len(y_test.values[:,0])), requires_grad = False)
Xte = np.array(normalize(x_test), requires_grad=False)

In [29]:
predictions = [np.sign(variational_classifier(wbest, bbest, x)) for x in Xte]
pred = [np.sign(variational_classifier(wbest, bbest, x)) for x in X]
acc = accuracy(Yte, predictions)

print(f'Cost: {cost(wbest, bbest, Xte, Yte)}, Accuracy: {np.round(acc, 2) * 100}%')

Cost: 0.9548300323278923, Accuracy: 99.0%


In [30]:
pd.DataFrame((predictions, Yte), ('Predictions', 'Test')).T

Unnamed: 0,Predictions,Test
0,1.0,1.0
1,-1.0,-1.0
2,-1.0,-1.0
3,1.0,1.0
4,-1.0,-1.0
...,...,...
115,1.0,1.0
116,-1.0,-1.0
117,-1.0,-1.0
118,-1.0,-1.0


A very similar performance is seen in this model, although with slight differences. Firstly, more qubits are required as the necessary circuit size grows as $\log_2(n)$ with the data entries. Additionally, less entangling layers and weight parameters are required to achieve similar accuracies in training and test validation to make up for the wider quantum circuit required. Techincially, the Angle Encoding model had slightly better performance, however both performed very well and are satisfactory binary classifiers. In practice, the angle encoding model would also be preferable simply due to it requiring fewer qubits, however this is only really a problem of NISQ devices which have limits of the number of qubits which can effectively be used in circuits of modern devices.

This concludes the bare-bones version of this project, which effectively implements two methods of data encoding into quantum circuits and builds nearly equally successful Variational Quantum Classifiers on a mock dataset. The models can expression of the data/results can be significantly improved in future, but for now this does the job.

In [31]:
end = time.time()
totaltime = end - start

mins = int(np.round(totaltime % 60))
secs = int(np.round((totaltime % 60 - mins) * 60))

print(f'Execution time: {mins}m{secs}s')

Execution time: 6m1s
