# Pegasos Quantum Support Vector Classifier

Another SVM-based algorithm that benefits from the quantum kernel method is Pegasos. Here, we introduce `PegasosQSVC`, an implementation of this classification algorithm that serves as an alternative to the `QSVC` available in Qiskit Machine Learning and shown in the ["Quantum Kernel Machine Learning"](./03_quantum_kernel.ipynb) tutorial. It implements the Pegasos algorithm from the paper "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM" by Shalev-Shwartz et al., see: https://home.ttic.edu/~nati/Publications/PegasosMPB.pdf.

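This algorithm is an alternative to the dual optimization used by the `scikit-learn` package (on which `QSVC` builds). It benefits from the kernel trick, and its training cost is governed by the number of training steps rather than by the size of the training set: each step evaluates the kernel only between one randomly drawn sample and the samples already selected as support vectors, whereas the dual approach needs the kernel matrix over all pairs of training samples. Thus, `PegasosQSVC` is expected to train faster than `QSVC` for sufficiently large training sets.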
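For reference, the primal problem that Pegasos minimizes, as stated in the paper, is shown below, with $m$ training samples, feature map $\phi$ and regularization parameter $\lambda$ (a larger $\lambda$ means stronger regularization, so it plays the role of $1/C$ up to scaling; the exact mapping used by `PegasosQSVC` is an assumption not covered here):

$$
\min_{\mathbf{w}} \; \frac{\lambda}{2}\lVert \mathbf{w} \rVert^2 + \frac{1}{m}\sum_{i=1}^{m} \max\bigl\{0,\; 1 - y_i \langle \mathbf{w}, \phi(\mathbf{x}_i) \rangle\bigr\}
$$

In the kernelized variant, at step $t$ a sample $i_t$ is drawn uniformly at random and only its coefficient may change:

$$
\alpha_{t+1}[i_t] =
\begin{cases}
\alpha_t[i_t] + 1 & \text{if } y_{i_t} \dfrac{1}{\lambda t} \sum_{j} \alpha_t[j]\, y_j\, K(\mathbf{x}_j, \mathbf{x}_{i_t}) < 1,\\[4pt]
\alpha_t[i_t] & \text{otherwise.}
\end{cases}
$$

Predictions use $\operatorname{sign}\bigl(\sum_j \alpha[j]\, y_j\, K(\mathbf{x}_j, \mathbf{x})\bigr)$. Only samples with $\alpha_t[j] > 0$ contribute to the sum, which is why each step needs kernel evaluations only against previously selected samples.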

The algorithm can be used as a direct replacement for `QSVC`, with some tuning of its hyperparameters.

Let's download the data:

In [32]:
!wget https://aq5efd7d2644dd406cb3ec2d.blob.core.windows.net/dga/BotnetDgaDataset_1000.csv


--2022-12-29 06:26:33--  https://aq5efd7d2644dd406cb3ec2d.blob.core.windows.net/dga/BotnetDgaDataset_1000.csv
Resolving aq5efd7d2644dd406cb3ec2d.blob.core.windows.net (aq5efd7d2644dd406cb3ec2d.blob.core.windows.net)... 52.239.169.4
Connecting to aq5efd7d2644dd406cb3ec2d.blob.core.windows.net (aq5efd7d2644dd406cb3ec2d.blob.core.windows.net)|52.239.169.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75901 (74K) [text/csv]
Saving to: ‘BotnetDgaDataset_1000.csv’


2022-12-29 06:26:33 (595 KB/s) - ‘BotnetDgaDataset_1000.csv’ saved [75901/75901]



In [19]:
import os
files = os.listdir(os.curdir)
for file in files:
    print(file)

.bash_logout
.bashrc
.profile
azurequantumtoken.json
.cache
.ipython
.config
.jupyter
.local
BotnetDgaDataset_10.csv
.gitconfig
.dotnet
.nuget
.templateengine
.azure
.packages


In [34]:
import csv
import os

import numpy as np

datafilename = "BotnetDgaDataset_1000.csv"
resultname = "result_BotnetDgaDataset_pegasos_1000.txt"
mycsv = os.path.join(os.getcwd(), datafilename)
print(mycsv)


def load_data(filepath, n_samples=1000, n_features=7):
    """Load a CSV with a header row whose last column is the integer class label."""
    with open(filepath) as csv_file:
        data_file = csv.reader(csv_file)
        next(data_file)  # skip the header row
        data = np.empty((n_samples, n_features))
        target = np.empty((n_samples,), dtype=int)

        for i, ir in enumerate(data_file):
            data[i] = np.asarray(ir[:-1], dtype=np.float64)  # feature columns
            target[i] = int(ir[-1])  # label column

    return data, target


features, labels = load_data(mycsv)
print(features)
print(labels)


/home/jovyan/BotnetDgaDataset_1000.csv
[[2.75000000e+00 2.02395145e+00 1.68833892e+00 ... 1.20000000e+01
  8.28927094e-01 6.10289861e+01]
 [2.75000000e+00 1.34932404e+00 1.27629343e+00 ... 1.10000000e+01
  1.07095312e-01 5.03986659e+01]
 [2.94770278e+00 1.18356393e+00 1.04314566e+00 ... 1.20000000e+01
  1.07095312e-01 4.38894660e+01]
 ...
 [2.75000000e+00 2.27882381e+00 1.59106724e+00 ... 1.10000000e+01
  8.28927094e-01 2.20945536e+01]
 [3.03063906e+00 2.57658904e+00 6.44391656e-01 ... 2.00000000e+01
  8.28927094e-01 4.25845531e+01]
 [3.11827516e+00 9.13557521e-01 8.23459962e-01 ... 2.50000000e+01
  1.07095312e-01 1.78458415e+02]]
[0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0
 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0 1 0 0 1 1
 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1
 1 1 1 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 0 1 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 1 0
 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 
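The loader above hard-codes 1,000 rows and 7 features. A minimal alternative sketch, assuming the file is a plain comma-separated table with one header row and numeric values only, lets NumPy infer the shape instead:

```python
import numpy as np


def load_data(filepath):
    """Load a headered CSV whose last column is an integer class label.

    Assumes purely numeric, comma-separated values; the number of samples
    and features is inferred from the file instead of being hard-coded.
    """
    table = np.loadtxt(filepath, delimiter=",", skiprows=1, ndmin=2)
    data = table[:, :-1]  # every column except the last holds a feature
    target = table[:, -1].astype(int)  # last column holds the 0/1 label
    return data, target
```

With `ndmin=2` even a single-row file yields a 2-D table, so the column slicing stays valid.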

In [8]:
!pip install qiskit-machine-learning

Defaulting to user installation because normal site-packages is not writeable


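We pre-process the data by scaling every feature to the range $[0, \pi]$, so that the values can serve as rotation angles in the feature map, and split it into training and test datasets. With `shuffle=False`, the first 700 samples are used for training and the remaining 300 for testing.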

In [35]:
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

features = MinMaxScaler(feature_range=(0, np.pi)).fit_transform(features)

train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, train_size=700, shuffle=False
)
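To make the pre-processing explicit, here is a NumPy sketch of the same two steps: per-column min-max scaling onto $[0, \pi]$, i.e. $x' = \pi \, (x - x_{\min}) / (x_{\max} - x_{\min})$, and an ordered split without shuffling. The function names are hypothetical; the sketch mirrors what `MinMaxScaler(feature_range=(0, np.pi))` and `train_test_split(..., shuffle=False)` do, but is not their implementation.

```python
import numpy as np


def minmax_to_angles(X, low=0.0, high=np.pi):
    """Rescale each column of X linearly onto [low, high] (hypothetical helper)."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to `low` instead of dividing by zero
    return low + (X - col_min) / col_range * (high - low)


def ordered_split(X, y, train_size):
    """First `train_size` rows for training, the rest for testing (no shuffling)."""
    return X[:train_size], X[train_size:], y[:train_size], y[train_size:]
```

Note that, as in the cell above, the minimum and maximum are computed over the whole dataset before splitting.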


The dataset has seven features, so we set the number of qubits equal to the number of features.

Then we set $\tau$, the number of steps performed during the training procedure. Note that the algorithm has no early stopping criterion: it always iterates over all $\tau$ steps.

Finally, there is the hyperparameter $C$, a positive regularization parameter. The strength of the regularization is inversely proportional to $C$: a smaller $C$ induces smaller weights, which generally helps prevent overfitting. However, due to the nature of this algorithm, some of the computation steps become trivial for larger $C$, so a larger $C$ can speed up training drastically. If the data is linearly separable in feature space, $C$ should be chosen to be large. If the separation is not perfect, $C$ should be chosen smaller to prevent overfitting.
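To make the roles of $\tau$ and $C$ concrete, here is a minimal NumPy sketch of the kernelized Pegasos loop from the paper, run on a precomputed kernel matrix. It assumes labels in $\{-1, +1\}$ (the dataset's $\{0, 1\}$ labels would first be mapped, e.g. with `2 * labels - 1`) and takes $\lambda = 1/C$, so the step size is $C/t$. That mapping and the function names are assumptions for illustration, not the `PegasosQSVC` internals.

```python
import numpy as np


def pegasos_kernel_fit(K, y, C, num_steps, seed=0):
    """Kernelized Pegasos sketch, assuming lambda = 1 / C.

    K: precomputed (m, m) training kernel matrix; y: labels in {-1, +1}.
    Returns integer counts alpha[j] of how often sample j violated the margin.
    """
    rng = np.random.default_rng(seed)
    m = len(y)
    alpha = np.zeros(m, dtype=int)
    for t in range(1, num_steps + 1):  # no early stopping: always num_steps iterations
        i = rng.integers(m)  # draw one training sample uniformly at random
        # margin of sample i under the current implicit weight vector (step size C / t)
        margin = y[i] * (C / t) * np.sum(alpha * y * K[:, i])
        if margin < 1:
            alpha[i] += 1  # hinge loss active: increase the weight of sample i
    return alpha


def pegasos_kernel_predict(K_test_train, y, alpha):
    """Sign of sum_j alpha_j y_j K(x_j, x); the positive 1/(lambda*T) factor is dropped."""
    scores = K_test_train @ (alpha * y)
    return np.where(scores >= 0, 1, -1)
```

Only entries with `alpha > 0` contribute to the sum, and for a large `C` the margin test tends to pass (so no update happens) once the current model already classifies the drawn sample correctly, which keeps the set of support vectors small.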

In [36]:
# number of qubits is equal to the number of features
num_qubits = 7

# number of steps performed during the training procedure
tau = 100

# regularization parameter
C = 1000

The algorithm will run using:

- The default fidelity instantiated in `FidelityQuantumKernel`
- A quantum kernel created from `ZFeatureMap`

In [37]:
from qiskit.circuit.library import ZFeatureMap
from qiskit.utils import algorithm_globals

from qiskit_machine_learning.kernels import FidelityQuantumKernel

# fix the global random seed for reproducibility
algorithm_globals.random_seed = 12345

# one qubit per feature, a single repetition of the Z-rotation layer
feature_map = ZFeatureMap(feature_dimension=num_qubits, reps=1)

# no fidelity is passed, so FidelityQuantumKernel instantiates its default one
qkernel = FidelityQuantumKernel(feature_map=feature_map)
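Since `ZFeatureMap` contains no entangling gates, its fidelity kernel can be checked classically. Assuming that with `reps=1` it applies a Hadamard followed by a phase gate $P(2x_k)$ to qubit $k$ (the circuit depicted in Qiskit's documentation), each qubit contributes a factor $\cos^2(x_k - x'_k)$, so $K(\mathbf{x}, \mathbf{x}') = \prod_k \cos^2(x_k - x'_k)$. The NumPy sketch below builds the statevectors explicitly and compares them with this closed form; it is an illustration under that assumption, not how `FidelityQuantumKernel` evaluates the kernel.

```python
import numpy as np


def z_feature_state(x):
    """Statevector of an assumed ZFeatureMap (reps=1): H then P(2*x_k) on each qubit."""
    state = np.array([1.0 + 0j])
    for xk in x:
        # single-qubit state after H|0> followed by a phase gate P(2*xk)
        qubit = np.array([1.0, np.exp(2j * xk)]) / np.sqrt(2)
        state = np.kron(state, qubit)
    return state


def fidelity_kernel(X, Y):
    """Kernel matrix K[i, j] = |<psi(X[i]) | psi(Y[j])>|^2 from explicit statevectors."""
    SX = np.array([z_feature_state(x) for x in X])
    SY = np.array([z_feature_state(y) for y in Y])
    return np.abs(SX.conj() @ SY.T) ** 2


def closed_form_kernel(X, Y):
    """For this product-state map the fidelity factorises to prod_k cos^2(x_k - y_k)."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.prod(np.cos(diff) ** 2, axis=-1)
```

For seven features the statevectors have $2^7 = 128$ amplitudes, so this brute-force check stays cheap.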

The implementation `PegasosQSVC` is compatible with the `scikit-learn` interfaces and has a pretty standard way of training a model. In the constructor we pass the parameters of the algorithm: here, the quantum kernel, the regularization hyperparameter $C$, and the number of steps.

Then we pass the training features and labels to the `fit` method, which trains the model and returns a fitted classifier.

Afterwards, we score our model using test features and labels.
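Assuming `score` follows scikit-learn's `ClassifierMixin.score` convention, as its scikit-learn compatibility suggests, it returns the mean accuracy on the test set. A minimal sketch of that metric (the `accuracy` helper is hypothetical):

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (mean accuracy)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```

With 300 test samples, the accuracy of 0.7966666666666666 reported below corresponds to 239 correctly classified samples.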

In [44]:
import datetime
import time

from qiskit_machine_learning.algorithms import PegasosQSVC

pegasos_qsvc = PegasosQSVC(quantum_kernel=qkernel, C=C, num_steps=tau)

start = time.perf_counter()
# log the run configuration to the result file
with open(resultname, "a") as f:
    f.write(
        datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S_")
        + f"   training_size = {len(train_labels)}"
        + f"  test_size = {len(test_labels)}"
        + f"  feature_dim = {train_features.shape[1]}\n\n"
    )

# training
pegasos_qsvc.fit(train_features, train_labels)

# testing
pegasos_score = pegasos_qsvc.score(test_features, test_labels)
finish = time.perf_counter()
print(f"PegasosQSVC Accuracy: {pegasos_score}")
print("time=", finish - start)

PegasosQSVC Accuracy: 0.7966666666666666
time= 57.733982165998896


In [45]:
import qiskit.tools.jupyter

%qiskit_version_table


| Qiskit Software | Version |
| --- | --- |
| qiskit-terra | 0.22.3 |
| qiskit-aer | 0.10.4 |
| qiskit-ignis | 0.7.0 |
| qiskit | 0.36.0 |
| qiskit-machine-learning | 0.5.0 |
| System information | |
| Python version | 3.9.15 |
| Python compiler | GCC 10.2.1 20210110 |
| Python build | main, Nov 15 2022 21:44:41 |
| OS | Linux |
