# Pegasos Quantum Support Vector Classifier

Besides the `QSVC` shown in the ["Quantum Kernel Machine Learning"](./03_quantum_kernel.ipynb) tutorial, Qiskit Machine Learning offers another SVM-based classifier that benefits from the quantum kernel method. Here, we introduce this alternative, which implements the Pegasos algorithm from the paper "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM" by Shalev-Shwartz et al.; see https://home.ttic.edu/~nati/Publications/PegasosMPB.pdf.

This algorithm is an alternative to the dual optimization from the `scikit-learn` package, benefits from the kernel trick, and yields a training complexity that is independent of the size of the training set. Thus, `PegasosQSVC` is expected to train faster than `QSVC` for sufficiently large training sets.
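To get a feel for the complexity claim, the following back-of-the-envelope sketch counts kernel evaluations during training. It rests on two assumptions that are not stated in this notebook: that the dual solver needs every distinct off-diagonal entry of the symmetric training kernel matrix, and that Pegasos only evaluates the kernel between the sampled point and the support vectors collected so far (at most one new support vector per step). The function names are hypothetical, and the sizes match the settings used later in this notebook (700 training samples, 100 steps).

```python
def dual_training_evaluations(n_train: int) -> int:
    """Distinct off-diagonal entries of a symmetric n_train x n_train kernel matrix."""
    return n_train * (n_train - 1) // 2


def pegasos_training_evaluations_bound(num_steps: int) -> int:
    """Upper bound: at step t there are at most t - 1 support vectors to compare with."""
    return sum(t - 1 for t in range(1, num_steps + 1))


n_train, num_steps = 700, 100
print("dual solver:", dual_training_evaluations(n_train))
print("Pegasos (upper bound):", pegasos_training_evaluations_bound(num_steps))
```

Under these assumptions the Pegasos bound depends only on the number of steps, not on the number of training samples, while the dual count grows quadratically with the training set.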

The algorithm can be used as a direct replacement for `QSVC`, given suitable hyperparameter settings.

Let's load some data:

In [1]:
import os
files = os.listdir(os.curdir)
for file in files:
    print(file)

.config
BotnetDgaDataset_1000.csv
sample_data


In [2]:

import csv
import os

import numpy as np

datafilename = "BotnetDgaDataset_1000.csv"
resultname = "result_BotnetDgaDataset_pegasos_1000.txt"
mycsv = os.path.join(os.getcwd(), datafilename)
print(mycsv)


def load_data(filepath, n_samples=1000, n_features=7):
    """Read a CSV file whose last column holds the integer class label."""
    with open(filepath) as csv_file:
        data_file = csv.reader(csv_file)
        next(data_file)  # skip the first row (assumed to be a header)
        data = np.empty((n_samples, n_features))
        target = np.empty((n_samples,), dtype=int)

        for i, ir in enumerate(data_file):
            data[i] = np.asarray(ir[:-1], dtype=np.float64)
            target[i] = np.asarray(ir[-1], dtype=int)

    return data, target


features, labels = load_data(mycsv)
print(len(features))
print(len(labels))

/content/BotnetDgaDataset_1000.csv
1000
1000


In [4]:
!pip install qiskit-machine-learning


In [5]:
import qiskit.tools.jupyter

%qiskit_version_table

Qiskit Software,Version
qiskit-terra,0.22.3
qiskit-machine-learning,0.5.0
System information,
Python version,3.8.16
Python compiler,GCC 7.5.0
Python build,"default, Dec 7 2022 01:12:13"
OS,Linux
CPUs,6
Memory (Gb),83.48347091674805
Mon Jan 02 20:12:27 2023 UTC,Mon Jan 02 20:12:27 2023 UTC


We pre-process the data to ensure compatibility with the rotation encoding and split it into the training and test datasets.

In [6]:
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

features = MinMaxScaler(feature_range=(0, np.pi)).fit_transform(features)

train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, train_size=700, shuffle=False
)
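
To make the scaling step concrete, here is a minimal NumPy sketch of a column-wise min-max rescaling to $[0, \pi]$ on a toy array. The helper `scale_to_range` and the toy values are hypothetical and only illustrate the arithmetic; the sketch assumes no column is constant (otherwise the denominator would be zero).

```python
import numpy as np


def scale_to_range(x, low=0.0, high=np.pi):
    """Column-wise min-max scaling to [low, high]; assumes no constant columns."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    return low + (x - col_min) / (col_max - col_min) * (high - low)


# toy data: 3 samples, 2 features on very different scales
toy = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
scaled = scale_to_range(toy)
print(scaled)
```

After scaling, every column spans exactly $[0, \pi]$, so each feature can be used directly as a rotation angle.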


We have seven features in the dataset, so we set the number of qubits to seven, one per feature.

Then we set $\tau$, the number of steps performed during the training procedure. Note that the algorithm has no early stopping criterion; it always iterates over all $\tau$ steps.

The last hyperparameter is $C$, a positive regularization parameter. The strength of the regularization is inversely proportional to $C$. Smaller values of $C$ induce smaller weights, which generally helps prevent overfitting. However, due to the nature of this algorithm, some of the computation steps become trivial for larger $C$, so larger values of $C$ improve the runtime performance of the algorithm drastically. If the data is linearly separable in feature space, $C$ should be chosen large. If the separation is not perfect, $C$ should be chosen smaller to prevent overfitting.

In [7]:
# number of qubits is equal to the number of features
num_qubits = 7

# number of steps performed during the training procedure
tau = 100

# regularization parameter
C = 1000
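
To illustrate how the number of steps and $C$ enter the training loop, here is a minimal sketch of a kernelized Pegasos update in plain NumPy. It is not the Qiskit implementation: it assumes that $C$ plays the role of $1/\lambda$ from the Pegasos paper, it uses a classical RBF kernel as a stand-in for the quantum kernel, and the names `rbf_kernel`, `pegasos_fit`, and `pegasos_predict` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Classical stand-in for the quantum kernel (an assumption of this sketch)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))


def pegasos_fit(X, y, C, num_steps, kernel=rbf_kernel, seed=0):
    """Kernelized Pegasos sketch; y holds labels in {-1, +1}.

    Returns a dict mapping support-vector index -> update count.
    """
    rng = np.random.default_rng(seed)
    alphas = {}
    for step in range(1, num_steps + 1):  # no early stopping: always num_steps steps
        i = int(rng.integers(0, len(y)))
        # weighted kernel sum over the current support vectors only
        value = sum(a * y[j] * kernel(X[i], X[j]) for j, a in alphas.items())
        # margin violated: x_i becomes (or is reinforced as) a support vector
        if y[i] * C / step * value < 1:
            alphas[i] = alphas.get(i, 0) + 1
    return alphas


def pegasos_predict(X_train, y, alphas, x, kernel=rbf_kernel):
    """Sign of the weighted kernel sum between x and the support vectors."""
    value = sum(a * y[j] * kernel(x, X_train[j]) for j, a in alphas.items())
    return 1 if value > 0 else -1


# toy data: two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

alphas_toy = pegasos_fit(X_toy, y_toy, C=1000, num_steps=50)
print("support vectors:", sorted(alphas_toy))
print("predictions:", [pegasos_predict(X_toy, y_toy, alphas_toy, x) for x in X_toy])
```

In this sketch, a large $C$ makes the margin test `y[i] * C / step * value < 1` fail more often for correctly classified points, so fewer support vectors are added and later steps need fewer kernel evaluations.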

The algorithm will run using:

- The default fidelity instantiated in `FidelityQuantumKernel`
- A quantum kernel created from `ZFeatureMap`

In [8]:
from qiskit.circuit.library import ZFeatureMap
from qiskit.utils import algorithm_globals

from qiskit_machine_learning.kernels import FidelityQuantumKernel

algorithm_globals.random_seed = 12345

feature_map = ZFeatureMap(feature_dimension=num_qubits, reps=1)

qkernel = FidelityQuantumKernel(feature_map=feature_map)
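
For intuition about what the kernel measures, its value can be written in closed form under a few assumptions: that a one-repetition `ZFeatureMap` applies a Hadamard followed by a phase rotation by $2x_i$ on qubit $i$, that the kernel is the squared overlap of the two encoded states, and that the evaluation is exact (no sampling noise). The resulting product state gives $K(x, y) = \prod_i \cos^2(x_i - y_i)$. The helper name `z_feature_map_kernel` is hypothetical and not part of Qiskit.

```python
import numpy as np


def z_feature_map_kernel(x, y):
    """Closed-form fidelity for a one-repetition Z feature map (product state).

    Assumes each qubit is prepared as H followed by a phase rotation by 2 * x_i.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.prod(np.cos(x - y) ** 2))


a = np.array([0.3, 1.2, 2.0])
b = np.array([0.5, 1.0, 2.4])
print(z_feature_map_kernel(a, a))  # identical inputs overlap perfectly
print(z_feature_map_kernel(a, b))
print(z_feature_map_kernel(b, a))  # the kernel is symmetric
```

Under these assumptions the kernel factorizes over features and is periodic with period $\pi$ in each feature difference.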

The `PegasosQSVC` implementation is compatible with the `scikit-learn` interfaces and is trained in the standard way. In the constructor we pass the parameters of the algorithm; in this case, these are the regularization hyperparameter $C$ and the number of steps.

Then we pass the training features and labels to the `fit` method, which trains a model and returns a fitted classifier.

Afterwards, we score our model using test features and labels.

In [9]:
from qiskit_machine_learning.algorithms import PegasosQSVC
import time
pegasos_start=time.perf_counter()
pegasos_qsvc = PegasosQSVC(quantum_kernel=qkernel, C=C, num_steps=tau)

# training
pegasos_qsvc.fit(train_features, train_labels)

# testing
pegasos_score = pegasos_qsvc.score(test_features, test_labels)
pegasos_end=time.perf_counter()

print(f"PegasosQSVC Accuracy: {pegasos_score}")
print(pegasos_end-pegasos_start)

PegasosQSVC Accuracy: 0.85
48.881030916999975




In [None]:
!pip install pylatexenc



In [None]:
import time

from qiskit.circuit.library import ZFeatureMap
from qiskit_machine_learning.algorithms import QSVC

num_features = features.shape[1]

# draw the feature map that underlies `qkernel`
feature_map = ZFeatureMap(feature_dimension=num_features, reps=1)
feature_map.decompose().draw(output="mpl", fold=20)

# baseline for comparison: QSVC with the same quantum kernel
qsvc = QSVC(quantum_kernel=qkernel)

QSVC_start = time.perf_counter()
# training
qsvc.fit(train_features, train_labels)

# testing
qsvc_score = qsvc.score(test_features, test_labels)
QSVC_end = time.perf_counter()

print(f"QSVC Accuracy: {qsvc_score}")
print(QSVC_end - QSVC_start)

QSVC Accuracy: 0.8633333333333333
3090.653390765001
