# QSVR regression on the Daily Temperature of Major Cities dataset with Qiskit


## Introduction to Quantum Kernel Methods


### Kernel Methods
Kernel methods are a family of algorithms for pattern analysis that rely on a kernel function, a similarity measure between pairs of data points. They let a linear model solve a non-linear problem: the data is implicitly mapped into a higher-dimensional feature space where a linear boundary or regression function is enough. Support vector machines (SVMs), which are widely used for classification and regression, are built on this idea. The SVM uses what is called the “kernel trick”: it only needs inner products between the mapped points, and the kernel supplies them directly, so the mapping itself never has to be computed.
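
To make the kernel trick concrete, here is a minimal sketch using a classical degree-2 polynomial kernel. The feature map `feature_map` and the sample vectors are illustrative choices, not part of this notebook's pipeline. It checks that the kernel value equals the inner product of the explicitly mapped vectors.

```python
import numpy as np

def feature_map(v):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: k(a, b) = (a . b)^2."""
    return float(np.dot(a, b)) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# Inner product computed in the 3-D feature space
explicit = float(np.dot(feature_map(a), feature_map(b)))
# Same quantity computed in the original 2-D space, without the feature map
implicit = poly_kernel(a, b)
print(explicit, implicit)
```

Both numbers agree up to floating-point rounding, which is why an SVM can work in the feature space while only ever evaluating the kernel.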


#### Quantum Kernel
A quantum circuit can embed data into a high-dimensional Hilbert space that is hard to simulate on a classical computer. Kernel methods built on this Hilbert space can therefore use feature maps that are impractical to evaluate classically.

### How to evaluate the inner product in Hilbert space?
Assume $S(x)$ is the unitary that maps a data point $x$ to the state $S(x)|00\cdots0\rangle$ in Hilbert space. To evaluate the overlap between the states for $x$ and $y$, we apply the conjugate transpose $S^\dagger(y)$ after $S(x)$ and measure the probability that the result is $|00\cdots0\rangle$. This probability equals $|\langle 00\cdots0|S^\dagger(y)S(x)|00\cdots0\rangle|^2$.
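
The procedure can be sketched on a single qubit with plain NumPy. This is a toy illustration that assumes $S(x)$ is an RY rotation by angle $x$; the notebook itself uses a `ZZFeatureMap` on ten qubits. The function `fidelity_kernel` is a hypothetical helper written for this example.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]], dtype=complex)

def fidelity_kernel(x, y):
    """Probability of measuring |0> after applying S(x) and then S(y)^dagger."""
    zero = np.array([1.0, 0.0], dtype=complex)
    final_state = ry(y).conj().T @ ry(x) @ zero
    return float(np.abs(final_state[0]) ** 2)

x, y = 0.3, 1.1
k_xy = fidelity_kernel(x, y)
k_yx = fidelity_kernel(y, x)
k_xx = fidelity_kernel(x, x)
print(k_xy, k_yx, k_xx)
```

For this choice of $S$, the kernel reduces to $\cos^2((x-y)/2)$: it is symmetric and equals 1 when the two inputs coincide.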

## Build and train an SVM using Quantum Kernel Methods

### Installation

In [1]:
# !pip install 'qiskit[all]'
# !pip install qiskit_ibm_runtime
# !pip install qiskit_algorithms
# !pip install qiskit-machine-learning

Download and cd to the repo.

### Import the module
`QSVR` is the quantum support vector regressor from `qiskit_machine_learning`. It extends scikit-learn's `SVR`, and we use it to run the support vector machine algorithm with a quantum kernel.

`StandardScaler` scales the data by removing the mean and scaling to unit variance.

`train_test_split` splits the dataset into training and test subsets.

`ZZFeatureMap`, `ComputeUncompute`, and `FidelityQuantumKernel` play the central role in our quantum model. `ZZFeatureMap` is the parameterized circuit that encodes a data point into a quantum state, `ComputeUncompute` estimates the fidelity between two such states using a sampler, and `FidelityQuantumKernel` turns those fidelities into the kernel matrix that `QSVR` consumes. You can see that later.


In [19]:
import numpy as np

from qiskit_machine_learning.algorithms import QSVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
from qiskit_ibm_runtime import QiskitRuntimeService, Sampler
from qiskit.compiler import transpile
from qiskit.quantum_info import Operator
from qiskit.circuit.library import ZZFeatureMap
from qiskit_algorithms.state_fidelities import ComputeUncompute
from qiskit_machine_learning.kernels import FidelityQuantumKernel

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.decomposition import PCA


### Prepare dataset



In [17]:
df = pd.read_csv("/home/mahmoudelkarargy/Desktop/Womanium/city_temperature.csv")
# -99 is the sentinel for a missing reading; drop those rows before converting units
df = df.drop(df.query('AvgTemperature == -99').index).reset_index(drop=True)
# Drop the location columns that are not used as features
df = df.drop(columns=['State', 'Region', 'City'])
# Convert Fahrenheit to Celsius
df['AvgTemperature'] = (df['AvgTemperature'] - 32) * (5 / 9)
# Keep only the two countries studied here and renumber the rows
df = df[df['Country'].isin(['Egypt', 'Algeria'])].reset_index(drop=True)
df.shape


(18458, 5)

In [18]:
df = df.rename(
    columns= {'Country': 'country', 'Month': 'month', 'Day': 'day', 'Year': 'year', 'AvgTemperature': 'avg_temperature'}
)
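# One-hot encode the country and calendar columns
df = pd.get_dummies(df, columns=['country', 'month', 'day', 'year'])
X = df.drop(columns=['avg_temperature'])
y = df['avg_temperature']
X = X.astype(int)  # cast the indicator columns to integers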

### PCA to reduce features from 71 to 10

In [20]:
pca = PCA(n_components=10)
X = pca.fit_transform(X)
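
One way to judge how much information the projection keeps is to sum `explained_variance_ratio_`. The sketch below runs on a synthetic binary matrix with the same number of columns (71), not on the temperature data, so the printed figure says nothing about this dataset. `X_demo` and `pca_demo` are names introduced only for this example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the one-hot matrix: 200 rows, 71 binary columns
X_demo = rng.integers(0, 2, size=(200, 71)).astype(float)

pca_demo = PCA(n_components=10)
X_reduced = pca_demo.fit_transform(X_demo)

# Fraction of the total variance captured by the 10 retained components
ratios = pca_demo.explained_variance_ratio_
retained = float(ratios.sum())
print(X_reduced.shape)
print(f"Variance retained by 10 components: {retained:.3f}")
```

Running the same two lines on the fitted `pca` object from the cell above would report the retained variance for the real features.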

In [21]:
X

array([[ 0.70663703, -0.54118828,  0.77768511, ..., -0.0097965 ,
        -0.06767613,  0.00566058],
       [ 0.70663703, -0.54118828,  0.77768511, ..., -0.0097965 ,
        -0.06767613,  0.00566058],
       [ 0.70663703, -0.54118828,  0.77768511, ..., -0.0097965 ,
        -0.06767613,  0.00566058],
       ...,
       [-0.70730528, -0.03793927, -0.17211521, ..., -0.04757115,
        -0.14808166,  0.00979586],
       [-0.70722333, -0.03795553, -0.17216213, ..., -0.04738476,
        -0.14682606,  0.00882487],
       [-0.70738729, -0.03740171, -0.17289658, ..., -0.04737893,
        -0.14697779,  0.00885795]])

In [22]:
y

0        17.888889
1         9.666667
2         9.333333
3         8.000000
4         8.833333
           ...    
18453    22.000000
18454    22.277778
18455    23.444444
18456    25.722222
18457    20.555556
Name: avg_temperature, Length: 18458, dtype: float64

In [23]:
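# Hold out 20% of the data for testing
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.8, test_size=0.2)
# Split the remaining 80% again: 64% of the data for training, 16% for validation
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, train_size=0.8, test_size=0.2)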

# Print information about the splits
print(f"Total dataset length: {len(X)}")
print(f"Training set length: {len(x_train)}")
print(f"Validation set length: {len(x_val)}")
print(f"Test set length: {len(x_test)}")
print(x_train)
print(y_train)

Total dataset length: 18458
Training set length: 11812
Validation set length: 2954
Test set length: 3692
[[ 0.70681552 -0.03669474 -0.15195435 ... -0.46537334 -0.24418361
   0.01275145]
 [-0.7073168  -0.52964951  0.82551147 ... -0.01811019 -0.08266317
   0.00651505]
 [ 0.70671689 -0.01410361 -0.04988903 ...  0.01304584  0.20279304
  -0.03493842]
 ...
 [ 0.70681676 -0.02241435 -0.08615752 ...  0.02600249  0.43523244
  -0.42166702]
 [-0.7072313  -0.03994103 -0.17202831 ... -0.1378068  -0.20437388
   0.0092364 ]
 [-0.70724337 -0.54102905  0.7779793  ... -0.01030364 -0.06851299
   0.00340074]]
4589     24.611111
18341    14.000000
5150      9.333333
15365    18.833333
6897     10.833333
           ...    
2607     13.222222
7778     23.722222
2495     12.944444
14247    24.444444
13968    13.611111
Name: avg_temperature, Length: 11812, dtype: float64
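
The split sizes printed above follow from applying the 80/20 split twice. As a quick check, the sketch below repeats the two splits on a placeholder index array of the same length. The `random_state=0` argument is an addition for reproducibility and is not used in the notebook's own cell.

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(18458)  # one placeholder entry per row of the dataset

# First split: 80% train+validation, 20% test
train_val, test = train_test_split(indices, train_size=0.8, test_size=0.2, random_state=0)
# Second split: 80% of the remainder for training, 20% for validation
train, val = train_test_split(train_val, train_size=0.8, test_size=0.2, random_state=0)

print(len(train), len(val), len(test))
```

The training set ends up with roughly 64% of the rows, the validation set with 16%, and the test set with 20%.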


In [24]:
import os
# Read the API token from the environment instead of hard-coding it (the variable name IBM_QUANTUM_TOKEN is an assumption)
tokens = os.environ["IBM_QUANTUM_TOKEN"]
ibm_quantum_service = QiskitRuntimeService(channel="ibm_quantum", token=tokens)
QiskitRuntimeService.save_account(channel="ibm_quantum", token=tokens, overwrite=True)

In [11]:
service = QiskitRuntimeService(channel="ibm_quantum")
available_backends = service.backends()
print([backend.name for backend in available_backends])

['ibm_brisbane', 'ibm_kyoto', 'ibm_osaka', 'ibm_sherbrooke']


In [12]:
service.least_busy(operational=True, min_num_qubits=5)

<IBMBackend('ibm_osaka')>

In [13]:
backend = service.backend("ibm_osaka")

In [25]:
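# Encode the 10 PCA features on 10 qubits, one feature per qubit
feature_map = ZZFeatureMap(feature_dimension=10, reps=1, entanglement="linear")
# Map the feature-map circuit onto the backend's native gates and qubit layout
adhoc_feature_map = transpile(feature_map, backend=backend, optimization_level=0)
sampler = Sampler(backend=backend)
# Fidelity via compute-uncompute: run S(x), then the inverse of S(y), and sample the all-zeros outcome
fidelity = ComputeUncompute(sampler=sampler)
adhoc_kernel = FidelityQuantumKernel(fidelity=fidelity, feature_map=adhoc_feature_map)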


In [26]:
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(11812, 10)
(3692, 10)
(11812,)
(3692,)


### Train the SVR model based on our quantum kernel

The quantum kernel defined above supplies the kernel matrix.

Pass the kernel to `QSVR`, call `.fit(x_train, y_train)` and the regressor starts training.

Predict on the test set and check the score. For a regressor, `score` returns the coefficient of determination $R^2$ rather than a classification accuracy. The score looks pretty good.
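
To show how a kernel matrix function plugs into a support vector regressor, here is a purely classical sketch with scikit-learn's `SVR`, which accepts a callable kernel. The RBF function `rbf_kernel_matrix` stands in for the quantum kernel, and the sine-wave data is a toy example. None of these names come from the notebook.

```python
import numpy as np
from sklearn.svm import SVR

def rbf_kernel_matrix(A, B, gamma=1.0):
    """Classical stand-in for a quantum kernel: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X_demo = rng.uniform(-3.0, 3.0, size=(80, 1))  # toy 1-D inputs
y_demo = np.sin(X_demo[:, 0])                  # toy regression target

# SVR calls the function with two sample arrays and expects the kernel matrix back
model = SVR(kernel=rbf_kernel_matrix, C=10.0, epsilon=0.01)
model.fit(X_demo, y_demo)
preds = model.predict(X_demo)
r2_demo = model.score(X_demo, y_demo)
print(f"R^2 on the toy data: {r2_demo:.3f}")
```

`QSVR` follows the same pattern, except that the kernel matrix entries are fidelities estimated by running circuits rather than values of a closed-form function.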


In [30]:
qsvr = QSVR(quantum_kernel=adhoc_kernel)
qsvr.fit(x_train, y_train)
predictions = qsvr.predict(x_test)

In [29]:
r2 = qsvr.score(x_test, y_test)  # for a regressor, score() returns R^2
print(f"R^2: {r2}")

R^2: 0.93215454211545
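
Because `QSVR` is a regressor, the number reported by `score` is the coefficient of determination $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes it by hand on made-up values to show the two reference points of the scale. `r2_score_manual` is a helper written for this example.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of predicting the mean
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([10.0, 12.0, 14.0, 16.0])  # made-up temperatures
perfect = r2_score_manual(y_true, y_true)
mean_only = r2_score_manual(y_true, np.full(4, y_true.mean()))
print(perfect, mean_only)
```

A perfect prediction gives 1, and always predicting the mean temperature gives 0, so a value near 0.93 means the model explains most of the variance in the test targets.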
