# Introduction
Quantum computers use quantum mechanical properties such as superposition and entanglement to perform computations. In some cases, these computations can far outperform those of classical computers. Quantum computers store information in quantum bits, which can be in a state of 0, 1, or a superposition of both. After a circuit is run on a quantum computer, the output can only be seen by measuring the quantum bits. When a quantum bit is measured, the measurement returns either 0 or 1. The probability of either outcome depends on the quantum superposition immediately prior to measurement. The probability of measuring 0 or 1 can be estimated by repeating the computation and recording the measurement result many times. For a circuit with multiple quantum bits, the probability distribution is over all combinations of outcomes (for two quantum bits, those are 00, 01, 10, and 11). Here, the output of a quantum circuit refers to the probability distribution over the possible measurement outcomes found by running that circuit repeatedly.

Quantum computers suffer from errors, also called noise. As a result of imperfect computation and measurement, the output of a quantum circuit is never exactly what the theory predicts. For example, if the expected output of a single quantum bit circuit is to measure 0 with 30% probability and 1 with 70% probability, the actual output might be to measure 0 with 32% probability and 1 with 68% probability. If that circuit were run again, the noise might differ, leading to measuring 0 with 31% probability and 1 with 69% probability. The quantum computer noisy outputs dataset is a collection of outputs from various circuits run on different quantum computers. This dataset can help researchers understand how noise affects quantum computers.
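
The run-to-run variation described above can be illustrated with a minimal sketch. It models only the statistical fluctuation from a finite number of measurements, not hardware noise, and the function name `estimate_probabilities`, the shot count, and the seeds are illustrative assumptions rather than part of the dataset's generation code.

```python
import random

def estimate_probabilities(p_one, shots, seed=None):
    """Estimate P(0) and P(1) for one qubit from repeated simulated measurements."""
    rng = random.Random(seed)
    # Each shot returns 1 with probability p_one, otherwise 0
    ones = sum(1 for _ in range(shots) if rng.random() < p_one)
    return {"0": (shots - ones) / shots, "1": ones / shots}

# Ideal distribution from the example in the text: 30% zeros, 70% ones
ideal = {"0": 0.30, "1": 0.70}

# Two independent runs give slightly different estimates
run_a = estimate_probabilities(ideal["1"], shots=1000, seed=1)
run_b = estimate_probabilities(ideal["1"], shots=1000, seed=2)
print(run_a)
print(run_b)
```

On real hardware, gate and readout errors add a device-dependent bias on top of this sampling spread, which is what the dataset is meant to capture.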

The dataset contains the output data from running nine circuits 2000 times each on seven different quantum computers. The dataset was generated by running code found on the [Quantum Noise Fingerprint GitHub page](https://github.com/trianam/learningQuantumNoiseFingerprint). The quantum computers, all built by IBM, are Santiago, Lima, Quito, Bogota, Casablanca, Yorktown, and Athens. The output from each quantum circuit used in this experiment consists of the probabilities of measuring the four possible outcomes: 00, 01, 10, or 11. All of the numerical data points are probabilities, so they lie between 0 and 1. One row of the dataset consists of the four outcome probabilities for each of the nine circuits, totalling 36 features, plus a target column containing the name of the quantum computer used to run the circuits.
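
As a small sketch of this row layout, the 36 feature names can be generated from the four outcomes and nine circuits. The naming pattern `p<outcome>_<circuit>` matches the columns built later in this notebook; the sample values are copied from the first displayed row of the table below and are used only to check that the four probabilities of one circuit sum to 1.

```python
# Outcome labels and circuit numbers follow the column names used in this notebook
outcomes = ["00", "01", "10", "11"]
circuits = range(1, 10)

# Four probabilities per circuit, nine circuits: 36 feature columns
feature_names = [f"p{o}_{c}" for c in circuits for o in outcomes]
print(len(feature_names))

# Circuit 1 values taken from the first displayed row (quito)
row_circuit_1 = {"p00_1": 0.536, "p01_1": 0.448, "p10_1": 0.007, "p11_1": 0.009}
total = sum(row_circuit_1.values())
print(round(total, 3))
```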

The quantum computers are each affected differently by noise, and this leads to unique outcomes for every computer (and every circuit). By studying the distributions across the nine circuits, a sort of “noise fingerprint” can be found which distinguishes the quantum computers from one another. If a suitable model can be learned, it will be possible to classify which quantum computer is in use just by studying the outputs of the circuits.

**The classification problem is to identify which computer was used to run a set of circuits, based on the measured outcome probabilities. Here, quantum kernel estimation is used to train the model.**

# Pre-processing
## Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC 

from qiskit import QuantumCircuit, Aer, execute, IBMQ
from qiskit.providers.ibmq import least_busy
from qiskit.utils import QuantumInstance
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import QuantumKernel
from qiskit.providers.ibmq.runtime.runtime_job import RuntimeJobFailureError

## Load Data

In [2]:
# read in data
df_cat = pd.read_csv('cleaned_QC_data.csv')

# group by circuit
df_gb = df_cat.groupby('Circuit')

# initialize list of DataFrames, one per circuit
list_dfs = []

# do for all 9 circuits
for i in range(9):
    # get group for the current circuit and reset the index
    df_sub = df_gb.get_group(i+1)[['p00','p01','p10','p11']].reset_index(drop=True)
    
    # create new column names which include the circuit number
    col_names = {'p00':'p00_'+str(i+1),'p01':'p01_'+str(i+1),'p10':'p10_'+str(i+1),'p11':'p11_'+str(i+1)}
    df_sub.rename(columns=col_names, inplace=True)
    
    # add df to list
    list_dfs.append(df_sub)

# concatenate groups from list
df = pd.concat(list_dfs, axis=1)

# append target column
df['QC Name'] = df_gb.get_group(1)['QC Name'].reset_index(drop=True)

# shuffle
df = df.sample(frac=1).reset_index(drop=True)

# show
df

Unnamed: 0,p00_1,p01_1,p10_1,p11_1,p00_2,p01_2,p10_2,p11_2,p00_3,p01_3,...,p11_7,p00_8,p01_8,p10_8,p11_8,p00_9,p01_9,p10_9,p11_9,QC Name
0,0.536,0.448,0.007,0.009,0.342,0.271,0.152,0.235,0.177,0.470,...,0.207,0.374,0.191,0.237,0.198,0.299,0.266,0.244,0.191,quito
1,0.482,0.470,0.018,0.030,0.287,0.236,0.234,0.243,0.129,0.364,...,0.172,0.293,0.219,0.262,0.226,0.275,0.289,0.239,0.197,casablanca
2,0.467,0.519,0.006,0.008,0.274,0.232,0.228,0.266,0.119,0.435,...,0.164,0.310,0.215,0.281,0.194,0.291,0.223,0.285,0.201,bogota
3,0.481,0.508,0.006,0.005,0.337,0.283,0.168,0.212,0.183,0.454,...,0.225,0.323,0.252,0.208,0.217,0.337,0.252,0.226,0.185,quito
4,0.785,0.172,0.033,0.010,0.430,0.382,0.090,0.098,0.181,0.374,...,0.183,0.291,0.206,0.274,0.229,0.318,0.268,0.222,0.192,yorktown
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13995,0.549,0.433,0.007,0.011,0.247,0.271,0.272,0.210,0.108,0.478,...,0.177,0.367,0.192,0.265,0.176,0.275,0.292,0.277,0.156,lima
13996,0.532,0.441,0.020,0.007,0.263,0.250,0.252,0.235,0.097,0.364,...,0.140,0.324,0.199,0.262,0.215,0.301,0.239,0.241,0.219,casablanca
13997,0.499,0.490,0.008,0.003,0.265,0.244,0.250,0.241,0.074,0.460,...,0.166,0.458,0.165,0.225,0.152,0.391,0.236,0.230,0.143,athens
13998,0.492,0.500,0.002,0.006,0.342,0.274,0.149,0.235,0.195,0.448,...,0.195,0.334,0.223,0.230,0.213,0.334,0.233,0.241,0.192,quito


## Batch
There are too many rows to run on a quantum simulator in a reasonable amount of time, so a random batch of 80 rows is selected.

In [3]:
# sample
df = df.sample(n=80).reset_index(drop=True)

## Train-Test Split

In [4]:
# split data (test size is 25%, use random seed for reproducibility)
df_tr,df_va = train_test_split(df, test_size=0.25, random_state=0)

X_tr = df_tr.drop(['QC Name'], axis=1).to_numpy()
X_va = df_va.drop(['QC Name'], axis=1).to_numpy()

Y_tr = df_tr['QC Name'].to_numpy()
Y_va = df_va['QC Name'].to_numpy()

# show shapes of train and test inputs and target
print ('training set ==',X_tr.shape,Y_tr.shape,', validation set ==', X_va.shape,Y_va.shape)

training set == (60, 36) (60,) , validation set == (20, 36) (20,)


## Principal Component Analysis (PCA)
PCA is used for dimensionality reduction from 36 features down to 5 so that the program can be run on 5-qubit quantum hardware.

In [5]:
# instantiate PCA for 5 components
pca = PCA(5)

# fit PCA on training data
pca.fit(X_tr)

# transform training and testing data
X_tr_pca = pca.transform(X_tr)
X_va_pca = pca.transform(X_va)

# show shapes of train and test inputs and target
print ('training set ==',X_tr_pca.shape,Y_tr.shape,', validation set ==', X_va_pca.shape,Y_va.shape)

training set == (60, 5) (60,) , validation set == (20, 5) (20,)
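
To gauge how much information survives the reduction from 36 features to 5, the share of variance retained by the leading components can be inspected. The sketch below computes it directly with NumPy on a synthetic stand-in matrix (random numbers, not the dataset), so the printed value says nothing about the real data; with the fitted scikit-learn object above, the same quantity is available as `pca.explained_variance_ratio_`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in with the same shape as the training matrix (assumption)
X = rng.random((60, 36))

# PCA operates on mean-centered data
X_centered = X - X.mean(axis=0)

# Squared singular values are proportional to the variance along each component
_, s, vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance_ratio = s**2 / np.sum(s**2)

k = 5
# Project onto the first k principal directions
X_reduced = X_centered @ vt[:k].T
retained = explained_variance_ratio[:k].sum()
print(X_reduced.shape, retained)
```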


# Quantum Kernel Estimation

## Runtime Program Creation

In [6]:
# load account
IBMQ.load_account()
provider = IBMQ.get_provider(hub = 'ibm-q')
provider.has_service('runtime')

True

In [7]:
program_id = provider.runtime.upload_program(
    data="quantum_kernel_estimation.py",
    metadata="qke_metadata.json"
)
print(program_id)

quantum-kernel-estimation-e3bqAx6oZm


# Quantum Support Vector Classification (SVC)
SVC is performed using the quantum kernel as the measure of similarity between points.
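
For reference, a common construction in quantum kernel estimation defines the kernel as the overlap between data-encoding states. This is assumed here, because the contents of the uploaded `quantum_kernel_estimation.py` are not shown in this notebook, and the notation $U(x)$ for the encoding circuit on $n$ qubits is introduced only for illustration:
\begin{equation*}
    k(x,x') = \left| \langle 0^{n} | \, U^{\dagger}(x) \, U(x') \, | 0^{n} \rangle \right|^2.
\end{equation*}
Under this assumption, each kernel entry is estimated as the fraction of shots in which the circuit $U^{\dagger}(x)\,U(x')$ returns the all-zeros outcome, and the resulting kernel matrix is handed to a classical SVC.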

In [8]:
backends = provider.backends(filters=lambda x: x.configuration().n_qubits == 5
                                   and not x.configuration().simulator)
backend = least_busy(backends)
print("Backend =", backend)

Backend = ibmq_santiago


In [9]:
# inputs to runtime must be serialized to json
job = provider.runtime.run(program_id, options={"backend_name":backend.name()}, 
                               inputs={"X_tr":X_tr_pca.tolist(), "Y_tr":Y_tr.tolist(), "X_va":X_va_pca.tolist(), "Y_va":Y_va.tolist()})

try:
    q_score = job.result()
    # show validation score
    print("Quantum kernel validation score", q_score)
except RuntimeJobFailureError as ex:
    print("Job failed!: {}".format(ex))

Quantum kernel validation score 0.85


In [10]:
provider.runtime.delete_program(program_id)

# Classical Support Vector Classification (SVC)
SVC is carried out using two different classical kernels for comparison. The linear kernel is simply
\begin{equation}
    k(x,x') = \langle x, x' \rangle
\end{equation}
and the RBF kernel is 
\begin{equation*}
    k(x,x') =e^{-\gamma \|x - x'\|^2}.
\end{equation*}
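
The two kernels can be written out as Gram-matrix computations. This is a minimal NumPy sketch on random stand-in data, not the code path scikit-learn uses internally; the helper names `linear_kernel` and `rbf_kernel` are defined here for illustration. The value of `gamma` mirrors scikit-learn's default `gamma='scale'`, which is `1 / (n_features * X.var())`.

```python
import numpy as np

def linear_kernel(X, Z):
    """k(x, x') = <x, x'> for every pair of rows in X and Z."""
    return X @ Z.T

def rbf_kernel(X, Z, gamma):
    """k(x, x') = exp(-gamma * ||x - x'||^2) for every pair of rows in X and Z."""
    # Squared distances via ||x||^2 + ||z||^2 - 2 <x, z>
    sq_dists = (X**2).sum(axis=1)[:, None] + (Z**2).sum(axis=1)[None, :] - 2 * X @ Z.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))  # stand-in for 6 samples with 5 PCA features

gamma = 1.0 / (X.shape[1] * X.var())  # same rule as gamma='scale'
K_lin = linear_kernel(X, X)
K_rbf = rbf_kernel(X, X, gamma)
print(K_lin.shape, K_rbf.shape)
```

The RBF Gram matrix has ones on its diagonal because every point is at zero distance from itself, while the linear kernel's diagonal holds the squared norms of the samples.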

In [11]:
# do support vector classification with linear kernel
l_model = SVC(kernel='linear')
l_model.fit(X_tr_pca, Y_tr)
l_score = l_model.score(X_va_pca, Y_va)

# show validation score
print("Linear kernel validation score", l_score)

Linear kernel validation score 0.45


In [12]:
# do support vector classification with rbf kernel (default)
rbf_model = SVC(kernel='rbf')
rbf_model.fit(X_tr_pca, Y_tr)
rbf_score = rbf_model.score(X_va_pca, Y_va)

# show validation score
print("RBF kernel validation score", rbf_score)

RBF kernel validation score 0.85
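
Because the validation set holds only 20 samples, each score moves in steps of 0.05 and carries noticeable statistical uncertainty. The sketch below converts the three scores printed in this notebook into counts of correct predictions and a rough binomial standard error; it assumes independent validation samples, and the approximation is crude at this sample size.

```python
import math

n_val = 20  # validation set size reported by the train-test split
scores = {"quantum": 0.85, "linear": 0.45, "rbf": 0.85}  # values printed in this notebook

summary = {}
for name, acc in scores.items():
    # Number of correctly classified validation samples
    correct = round(acc * n_val)
    # Binomial standard error of the accuracy estimate (rough for small n)
    std_err = math.sqrt(acc * (1 - acc) / n_val)
    summary[name] = (correct, std_err)
    print(f"{name}: {correct}/{n_val} correct, std. error ~ {std_err:.3f}")
```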
