## Large-scale quantum machine learning
Supervised learning of datasets with quantum kernels computed on an IBM quantum computer.
Companion script to "Large-scale quantum machine learning" (https://arxiv.org/abs/2108.01039) by Tobias Haug, Chris N. Self and M.S. Kim.
To obtain the quantum kernels, run the scripts as detailed in the README, or use the provided pre-processed data.
The data used in the manuscript are available from https://doi.org/10.5281/zenodo.5211695.
The kernels are used to train a support vector machine (SVM) on labeled training data; the trained SVM then classifies unseen test data.
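
As a minimal sketch of how an SVM consumes a precomputed kernel, the example below trains scikit-learn's `svm.SVC` on a toy Gram matrix. The two-blob dataset and the linear kernel are illustrative stand-ins (assumptions, not the data or quantum kernel of the manuscript). The point to note is the shape convention: fitting takes the train-by-train block, while prediction takes the block with test rows and training columns.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)

# Toy data (hypothetical): two well-separated Gaussian blobs in 2D.
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Linear kernel as a stand-in for a quantum kernel, shape (40, 40).
gram = X @ X.T

# Even indices for training, odd indices for testing.
train = np.arange(0, 40, 2)
test = np.arange(1, 40, 2)

clf = svm.SVC(kernel="precomputed", C=1)
# Fit on the (n_train, n_train) block of the Gram matrix.
clf.fit(gram[np.ix_(train, train)], y[train])
# Predict with the (n_test, n_train) block: test rows, training columns.
y_pred = clf.predict(gram[np.ix_(test, train)])
accuracy = np.mean(y_pred == y[test])
print("Test accuracy on toy data:", accuracy)
```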


In [1]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import svm
import os

In [2]:
def train_gram(y_train,y_test,gram_train,gram_test,regularizationC=1):
    """
    Train an SVM with a precomputed kernel and evaluate its accuracy.
    
    y_train: Training labels
    y_test: Test labels
    gram_train: Kernel between training data, shape (n_train, n_train)
    gram_test: Kernel between test data (rows) and training data (columns), shape (n_test, n_train)
    regularizationC: Regularization parameter C of the SVM
    
    Return:
    accuracy_train: Accuracy predicting labels of training dataset
    accuracy_test: Accuracy predicting labels of test dataset
    """
    clf_gaussian = svm.SVC(kernel='precomputed',C=regularizationC,probability=True)
    clf_gaussian.fit(gram_train, y_train)

    y_pred_test=clf_gaussian.predict(gram_test)
    
    y_pred_train=clf_gaussian.predict(gram_train)

    wrong_index_train=np.nonzero(np.abs(y_train-y_pred_train))[0]
    wrong_index_test=np.nonzero(np.abs(y_test-y_pred_test))[0]
    
    accuracy_train=1-len(wrong_index_train)/len(y_train)
    accuracy_test=1-len(wrong_index_test)/len(y_test)
    return accuracy_train,accuracy_test


def error_mitigation(kernel_matrix):
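    """
    Mitigate error of quantum kernel calculated via randomized measurements.
    Assumes depolarizing model: each entry is divided by the square root of
    the product of the two corresponding diagonal entries, so that the
    mitigated kernel has unit diagonal.
    
    kernel_matrix: Quantum kernel as calculated by randomized measurements
    
    Return:
    kernel_matrix_mitigated: Mitigated quantum kernel
    """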
    dataset_size=np.shape(kernel_matrix)[0]
    kernel_matrix_mitigated=np.zeros([dataset_size,dataset_size])
    for rep in range(dataset_size):
        for rep2 in range(dataset_size):
            kernel_matrix_mitigated[rep,rep2]=kernel_matrix[rep,rep2]/np.sqrt(kernel_matrix[rep,rep]*kernel_matrix[rep2,rep2])
    return kernel_matrix_mitigated
    

def rbf_kernel(dataset,gamma=0.25):
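    """
    Classical radial basis function (RBF) kernel exp(-gamma*|x_i-x_j|**2) between all pairs of data points x_i, x_j.
    
    dataset: Matrix containing features of dataset, one data point per row
    gamma: Width parameter of the RBF kernel
    
    Return:
    gram_matrix: Resulting RBF kernel matrix
    """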
    
    
    nData=len(dataset)
    gram_matrix=np.zeros([nData,nData])
    for i in range(nData):
        for j in range(nData):
            gram_matrix[i,j]=np.exp(-gamma*np.sum(np.abs(dataset[i]-dataset[j])**2))
    return gram_matrix
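
The two double loops above can also be written with NumPy array operations, which is typically much faster for datasets of this size. The sketch below is one possible vectorized form, not part of the original script; the names `error_mitigation_vectorized` and `rbf_kernel_vectorized` are hypothetical. The per-sample damping used in the check is a toy noise model chosen only to illustrate that normalizing by the diagonal undoes it, and is not claimed to represent the hardware noise.

```python
import numpy as np

def error_mitigation_vectorized(kernel_matrix):
    """Divide each entry K[i, j] by sqrt(K[i, i] * K[j, j])."""
    diag = np.diag(kernel_matrix)
    return kernel_matrix / np.sqrt(np.outer(diag, diag))

def rbf_kernel_vectorized(dataset, gamma=0.25):
    """RBF kernel exp(-gamma * |x_i - x_j|**2) for all pairs of rows."""
    dataset = np.asarray(dataset, dtype=float)
    sq_norms = np.sum(dataset**2, axis=1)
    # |x_i - x_j|**2 = |x_i|**2 + |x_j|**2 - 2 * x_i . x_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * dataset @ dataset.T
    # Clip small negative values caused by floating-point round-off.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    return np.exp(-gamma * sq_dists)

# Small check on synthetic data (assumption: toy features, toy noise).
rng = np.random.default_rng(1)
features = rng.normal(size=(5, 3))
ideal = rbf_kernel_vectorized(features)

# Toy noise: entry (i, j) is damped by the product of per-sample factors.
damping = rng.uniform(0.5, 1.0, size=5)
noisy = ideal * np.outer(damping, damping)

mitigated = error_mitigation_vectorized(noisy)
print("Mitigation recovers the ideal kernel:", np.allclose(mitigated, ideal))
```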



In [3]:
#main_path=os.path.join("studies","handwriting-all-digits","YZ","results","processed","raw")
#label_file="y-ibmq_guadalupe,n_qubits8,depth10,n_shots8192,n_unitaries8,crossfid_modeRzRy.csv"
#feature_file="X-ibmq_guadalupe,n_qubits8,depth10,n_shots8192,n_unitaries8,crossfid_modeRzRy.csv"
#kernel_file="GramMatrix-ibmq_guadalupe,n_qubits8,depth10,n_shots8192,n_unitaries8,crossfid_modeRzRy.csv"

main_path=os.path.join("studies","handwriting-all-digits","NPQC","results","processed","raw")
label_file="y-ibmq_guadalupe,n_qubits8,depth8,n_shots8192,n_unitaries8,crossfid_modeRzRy_custom.csv"
feature_file="X-ibmq_guadalupe,n_qubits8,depth8,n_shots8192,n_unitaries8,crossfid_modeRzRy_custom.csv"
kernel_file="GramMatrix-ibmq_guadalupe,n_qubits8,depth8,n_shots8192,n_unitaries8,crossfid_modeRzRy_custom.csv"




label_data=np.loadtxt(os.path.join(main_path,label_file),delimiter=",")
feature_data=np.loadtxt(os.path.join(main_path,feature_file),delimiter=",")
quantum_kernel_data=np.loadtxt(os.path.join(main_path,kernel_file),delimiter=",")

#unmitigated kernel
quantum_kernel_unmitigated=quantum_kernel_data

#mitigate quantum kernel
quantum_kernel_mitigated=error_mitigation(quantum_kernel_unmitigated)

#get classical rbf kernel as reference
rbf_kernel_data=rbf_kernel(feature_data)

dataset_size=len(label_data)

print("Load features, labels and kernel with a size of",dataset_size)



Load features, labels and kernel with a size of 1790


Given the kernels, we now randomly draw subsets of the dataset as training and test data to evaluate how well each kernel can learn the data.
Define the sizes of the training and test datasets, and how often to repeat the training with randomly drawn data.
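
The random draw can be sketched on a small example: splitting an index array with `train_test_split` yields disjoint training and test indices, which then select the matching sub-blocks of a precomputed kernel. The toy labels and the random symmetric matrix below are assumptions for illustration only; `np.ix_` is used here as an equivalent alternative to chained row and column indexing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy setup: 60 samples, 3 balanced classes,
# and a random symmetric matrix standing in for a kernel.
rng = np.random.default_rng(0)
n_samples = 60
labels = np.repeat([0, 1, 2], n_samples // 3)
a = rng.random((n_samples, n_samples))
kernel = (a + a.T) / 2

# Split the indices; stratify keeps the label proportions in both subsets.
index_train, index_test = train_test_split(
    np.arange(n_samples), train_size=30, test_size=15,
    random_state=0, stratify=labels)

# Training block: training rows and training columns.
gram_train = kernel[np.ix_(index_train, index_train)]
# Test block: test rows and training columns.
gram_test = kernel[np.ix_(index_test, index_train)]

print(gram_train.shape, gram_test.shape)
```

Samples that fall in neither subset are simply left out of that repetition, which is why the training and test sizes together may be smaller than the full dataset.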

In [4]:
#number of test data and training data
#randomly shuffle dataset and randomly assign test and training data
n_test=200 #size of test data
n_train=800 #size of training data
n_shuffle_data=10 #randomize training and test data for n_shuffle_data times

regularizationC=1 #regularization for SVM training

Train the SVM on the data.

In [5]:
print("Training with",n_train,"datapoints, testing with",n_test,"datapoints, randomly draw data",n_shuffle_data,"times")

if n_test+n_train>dataset_size:
    raise ValueError("Loaded dataset is smaller than size of training and test data")

stratify=label_data #stratify so that training and test data preserve the label proportions of the dataset
full_range=np.arange(dataset_size)

random_seed_select=0 #seed for random selection of data

accuracy_train_quantum_mitigated=np.zeros(n_shuffle_data)
accuracy_train_quantum_unmitigated=np.zeros(n_shuffle_data)
accuracy_train_rbf=np.zeros(n_shuffle_data)

accuracy_test_quantum_mitigated=np.zeros(n_shuffle_data)
accuracy_test_quantum_unmitigated=np.zeros(n_shuffle_data)
accuracy_test_rbf=np.zeros(n_shuffle_data)

for rep_training in range(n_shuffle_data):
    
    #randomly draw subset of data as training and test dataset
    X_train_select , X_test_select , y_train_select, y_test_select,index_train,index_test = \
        train_test_split(feature_data, label_data, full_range,train_size = n_train,test_size = n_test, random_state = random_seed_select,stratify=stratify)
    
    
    #train SVM with mitigated quantum kernel
    gram_train_select=quantum_kernel_mitigated[index_train,:][:,index_train]
    gram_test_select=quantum_kernel_mitigated[index_test,:][:,index_train]
    accuracy_train,accuracy_test=train_gram(y_train_select,y_test_select,gram_train_select,gram_test_select,regularizationC=regularizationC)
    accuracy_train_quantum_mitigated[rep_training]=accuracy_train
    accuracy_test_quantum_mitigated[rep_training]=accuracy_test

    
    #train SVM with unmitigated quantum kernel
    gram_train_select=quantum_kernel_unmitigated[index_train,:][:,index_train]
    gram_test_select=quantum_kernel_unmitigated[index_test,:][:,index_train]
    accuracy_train,accuracy_test=train_gram(y_train_select,y_test_select,gram_train_select,gram_test_select,regularizationC=regularizationC)
    accuracy_train_quantum_unmitigated[rep_training]=accuracy_train
    accuracy_test_quantum_unmitigated[rep_training]=accuracy_test

    
    #train SVM with classical rbf kernel
    gram_train_select=rbf_kernel_data[index_train,:][:,index_train]
    gram_test_select=rbf_kernel_data[index_test,:][:,index_train]
    accuracy_train,accuracy_test=train_gram(y_train_select,y_test_select,gram_train_select,gram_test_select,regularizationC=regularizationC)
    accuracy_train_rbf[rep_training]=accuracy_train
    accuracy_test_rbf[rep_training]=accuracy_test

    
    
    random_seed_select+=1 #change seed for data selection
    
    
print("Finished Training SVM")

Training with 800 datapoints, testing with 200 datapoints, randomly draw data 10 times
Finished Training SVM


In [6]:
print("Training accuracy mitigated quantum kernel",np.mean(accuracy_train_quantum_mitigated),"±",np.std(accuracy_train_quantum_mitigated))

print("Training accuracy unmitigated quantum kernel",np.mean(accuracy_train_quantum_unmitigated),"±",np.std(accuracy_train_quantum_unmitigated))

print("Training accuracy classical rbf kernel",np.mean(accuracy_train_rbf),"±",np.std(accuracy_train_rbf))



Training accuracy mitigated quantum kernel 0.9686250000000001 ± 0.005574775780244413
Training accuracy unmitigated quantum kernel 0.9247500000000001 ± 0.008474225628339127
Training accuracy classical rbf kernel 0.982375 ± 0.003184434800714267


In [7]:
print("Test accuracy mitigated quantum kernel",np.mean(accuracy_test_quantum_mitigated),"±",np.std(accuracy_test_quantum_mitigated))

print("Test accuracy unmitigated quantum kernel",np.mean(accuracy_test_quantum_unmitigated),"±",np.std(accuracy_test_quantum_unmitigated))

print("Test accuracy classical rbf kernel",np.mean(accuracy_test_rbf),"±",np.std(accuracy_test_rbf))




Test accuracy mitigated quantum kernel 0.9274999999999999 ± 0.019137659209004617
Test accuracy unmitigated quantum kernel 0.885 ± 0.020248456731316606
Test accuracy classical rbf kernel 0.9594999999999999 ± 0.01863464515358422
