# Practice 2: Quantum Support Vector Machines

MQIST 2025/26: Quantum Computing and Machine Learning

Alfredo Chavert Sancho

Pedro Herrero Maldonado

## Data loading and preprocessing

The `load_breast_cancer` function from `sklearn.datasets` loads the breast cancer dataset. It contains features computed from breast cancer biopsy images, along with labels indicating whether each tumor is malignant or benign, so this is a binary classification problem. There are 30 features and 569 samples, divided into:

- 357 benign samples labeled as 1
- 212 malignant samples labeled as 0

To evaluate the performance of our classifiers, we will use the **recall** of the malignant class (label 0), which we treat as the positive class. It is the ratio of correctly predicted malignant cases to all actual malignant cases in the dataset (true positives plus false negatives).
$$
\text{Recall} = \frac{\text{True Malignant}}{\text{True Malignant} + \text{False Benign}}
$$

This is particularly important in medical diagnosis tasks, where minimizing false negatives (i.e., malignant cases that are classified as benign) is crucial. Therefore, our goal in this work is to maximize this metric in our experiments.
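As a minimal sketch of the definition above, the recall of the malignant class can be computed directly from two label lists. The helper name `recall_for_label` and the toy labels are illustrative assumptions, not part of the original notebook; `sklearn.metrics.recall_score(y_true, y_pred, pos_label=0)` computes the same quantity.

```python
def recall_for_label(y_true, y_pred, positive_label=0):
    """Recall of `positive_label`: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: actual positives that were predicted as positive
    true_pos = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    # False negatives: actual positives that were predicted as something else
    false_neg = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    if true_pos + false_neg == 0:
        raise ValueError("y_true contains no samples of the positive class")
    return true_pos / (true_pos + false_neg)


# Toy labels (hypothetical): 0 = malignant, 1 = benign
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 0, 1]

toy_recall = recall_for_label(y_true_toy, y_pred_toy)
print(toy_recall)  # 3 of the 4 malignant samples are detected -> 0.75
```

Note that the benign sample wrongly predicted as malignant does not change the recall; it would only lower the precision of class 0.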

In [1]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data # features
y = data.target # labels

print("Original shape:", X.shape)

Original shape: (569, 30)


To preprocess the data for the quantum SVM, we use ``MinMaxScaler`` from ``sklearn.preprocessing`` to rescale every feature to the range [0, 1], which is possible because all the data is numeric. ``StandardScaler`` is left commented out in the code as an alternative.

We will also use ``PCA`` from ``sklearn.decomposition`` to reduce the dimensionality of the data, and with it the number of qubits needed by the quantum feature maps, which use one qubit per feature. This transformation keeps the directions of largest variance in the data while reducing its complexity.
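The preprocessing cell below rescales the features with ``MinMaxScaler(feature_range=(0, 1))``, which maps each feature column to $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. The following pure-Python sketch shows that mapping for a single column; the helper name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


toy_column = [10.0, 15.0, 20.0, 30.0]  # hypothetical feature values
scaled_column = min_max_scale(toy_column)
print(scaled_column)  # [0.0, 0.25, 0.5, 1.0]
```

Unlike standardization, this keeps every feature inside a fixed interval, which is convenient when the values are later used as rotation angles in a feature map.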

In [13]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Rescale every feature to [0, 1] before applying PCA
# scaler = StandardScaler()  # alternative: zero mean, unit variance
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)

# Apply PCA to reduce to 4 dimensions
pca = PCA(n_components = 4)
X_pca = pca.fit_transform(X_scaled)

print("Shape after PCA:", X_pca.shape) # (569, 4)
print("Explained variance:", pca.explained_variance_ratio_)
print("Total variance explained:", pca.explained_variance_ratio_.sum())

Shape after PCA: (569, 4)
Explained variance: [0.53097689 0.1728349  0.07114442 0.06411259]
Total variance explained: 0.8390687984671878
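To see how much each additional principal component contributes, the printed ratios can be accumulated. This sketch simply reuses the values printed above; the variable names are illustrative.

```python
from itertools import accumulate

# Explained variance ratios copied from the PCA output above
explained_variance_ratio = [0.53097689, 0.1728349, 0.07114442, 0.06411259]

# Running total: variance captured by the first n components
cumulative = list(accumulate(explained_variance_ratio))

for n_components, total in enumerate(cumulative, start=1):
    print(f"{n_components} component(s): {total:.4f}")
```

The first two components already capture about 70% of the variance, and the last two add roughly 7% and 6% each.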


Finally, we will split the dataset into training and testing sets using ``train_test_split`` from ``sklearn.model_selection``.

In [3]:
from sklearn.model_selection import train_test_split
from qiskit_algorithms.utils import algorithm_globals

# Set the seed so results are reproducible
algorithm_globals.random_seed = 42

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.2, random_state=algorithm_globals.random_seed
)
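Because the split above is not stratified, it is worth checking that the test set keeps roughly the same share of malignant samples as the full dataset. The sketch below uses only counts reported in this document: 212 malignant samples out of 569 overall, and 43 malignant samples out of 114 in the test set (the `support` column of the classification reports). The variable names are illustrative.

```python
# Counts taken from the dataset description and the classification reports
total_malignant, total_samples = 212, 569
test_malignant, test_samples = 43, 114

overall_share = total_malignant / total_samples
test_share = test_malignant / test_samples

print(f"malignant share overall: {overall_share:.3f}")
print(f"malignant share in test: {test_share:.3f}")
```

The two shares differ by less than one percentage point here. If they differed noticeably, passing `stratify=y` to `train_test_split` would be one way to enforce the same class proportions in both sets.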

## Classical SVM

The classical SVM is implemented using ``SVC`` from ``sklearn.svm`` with three kernels: linear, polynomial, and radial basis function (RBF). Each model is trained on the training set and evaluated on the test set, and the recall of the malignant class is used to assess its performance.
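Since only the recall of class 0 is compared across models, it can also be extracted directly with `recall_score(..., pos_label=0)` instead of reading it from the full report. The following self-contained sketch repeats the preprocessing described earlier (assuming `MinMaxScaler`, 4 principal components, and seed 42) and loops over the three kernels. The dictionary name `recalls` is illustrative, and the printed values are not guaranteed to match the reports in this document exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Same pipeline as in the notebook: scale to [0, 1], reduce to 4 components, split
X, y = load_breast_cancer(return_X_y=True)
X_pca = PCA(n_components=4).fit_transform(MinMaxScaler().fit_transform(X))
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.2, random_state=42
)

recalls = {}
for kernel_name in ("linear", "poly", "rbf"):
    model = SVC(kernel=kernel_name, gamma="auto").fit(X_train, y_train)
    # pos_label=0 selects the malignant class as the positive class
    recalls[kernel_name] = recall_score(y_test, model.predict(X_test), pos_label=0)

for kernel_name, value in recalls.items():
    print(f"{kernel_name}: malignant recall = {value:.2f}")
```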

### Linear kernel

In [4]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report

svc_linear = SVC(kernel = 'linear', gamma='auto') # gamma is ignored by the linear kernel
svc_linear.fit(X_train, y_train)

y_svc_linear = svc_linear.predict(X_test)
report_svc_linear = classification_report(y_test, y_svc_linear)
print(report_svc_linear)

              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114



### Polynomial kernel

In [5]:
svc_poly = SVC(kernel = 'poly', gamma='auto')
svc_poly.fit(X_train, y_train)

y_svc_poly = svc_poly.predict(X_test)
report_svc_poly = classification_report(y_test, y_svc_poly)
print(report_svc_poly)

              precision    recall  f1-score   support

           0       1.00      0.58      0.74        43
           1       0.80      1.00      0.89        71

    accuracy                           0.84       114
   macro avg       0.90      0.79      0.81       114
weighted avg       0.87      0.84      0.83       114



### RBF kernel

In [6]:
svc_rbf = SVC(kernel = 'rbf', gamma='auto')
svc_rbf.fit(X_train, y_train)

y_svc_rbf = svc_rbf.predict(X_test)
report_svc_rbf = classification_report(y_test, y_svc_rbf)
print(report_svc_rbf)

              precision    recall  f1-score   support

           0       0.95      0.93      0.94        43
           1       0.96      0.97      0.97        71

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114



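The recall of the malignant class (label 0) for the classical SVM with linear, polynomial, and RBF kernels is as follows:

| Kernel     | Recall (class 0) |
|------------|------------------|
| Linear     | 0.95             |
| Polynomial | 0.58             |
| RBF        | 0.93             |

We can observe that the linear kernel achieves the best recall on this dataset (0.95), closely followed by the RBF kernel (0.93), while the polynomial kernel misses a large share of the malignant cases (0.58).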
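To make these recall values more concrete, they can be converted back into counts of detected and missed malignant cases. The sketch below assumes the 43 malignant test samples given by the `support` column of the reports above. The recalls are printed with two decimals, so the counts are recovered by rounding; with 43 samples each additional detection changes the recall by about 0.023, which makes the rounding unambiguous. The variable names are illustrative.

```python
test_malignant = 43  # support of class 0 in the classification reports
reported_recall = {"Linear": 0.95, "Polynomial": 0.58, "RBF": 0.93}

missed = {}
for kernel_name, recall in reported_recall.items():
    # Number of malignant test samples the model identified correctly
    detected = round(recall * test_malignant)
    missed[kernel_name] = test_malignant - detected
    print(f"{kernel_name}: {detected} detected, {missed[kernel_name]} missed")
```

Under these assumptions the linear kernel misses 2 malignant cases, the RBF kernel 3, and the polynomial kernel 18.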



## Quantum SVM

### Z Feature Map

In [14]:
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import QSVC
from qiskit.circuit.library import ZFeatureMap

num_features = X_pca.shape[1]

kernel = FidelityQuantumKernel(feature_map = ZFeatureMap(feature_dimension = num_features, reps = 2))
qsvc_z = QSVC(quantum_kernel = kernel)
qsvc_z.fit(X_train, y_train)
y_pred_qsvc_z = qsvc_z.predict(X_test)

report_qsvc_z = classification_report(y_test, y_pred_qsvc_z)
print(report_qsvc_z)

              precision    recall  f1-score   support

           0       0.59      0.40      0.47        43
           1       0.69      0.83      0.76        71

    accuracy                           0.67       114
   macro avg       0.64      0.61      0.61       114
weighted avg       0.65      0.67      0.65       114



### ZZ Feature Map

In [9]:
from qiskit.circuit.library import ZZFeatureMap

kernel = FidelityQuantumKernel(feature_map = ZZFeatureMap(feature_dimension = num_features, reps = 2))
qsvc_zz = QSVC(quantum_kernel = kernel)
qsvc_zz.fit(X_train, y_train)
y_pred_qsvc_zz = qsvc_zz.predict(X_test)

report_qsvc_zz = classification_report(y_test, y_pred_qsvc_zz)
print(report_qsvc_zz)

              precision    recall  f1-score   support

           0       0.52      0.33      0.40        43
           1       0.67      0.82      0.73        71

    accuracy                           0.63       114
   macro avg       0.59      0.57      0.57       114
weighted avg       0.61      0.63      0.61       114



### Pauli Feature Map

In [10]:
from qiskit.circuit.library import PauliFeatureMap

kernel = FidelityQuantumKernel(feature_map = PauliFeatureMap(feature_dimension = num_features, reps = 2, paulis = ['Z', 'XX', 'ZXZ']))
qsvc_pauli = QSVC(quantum_kernel = kernel)
qsvc_pauli.fit(X_train, y_train)
y_pred_qsvc_pauli = qsvc_pauli.predict(X_test)

report_qsvc_pauli = classification_report(y_test, y_pred_qsvc_pauli)
print(report_qsvc_pauli)

              precision    recall  f1-score   support

           0       0.60      0.21      0.31        43
           1       0.66      0.92      0.76        71

    accuracy                           0.65       114
   macro avg       0.63      0.56      0.54       114
weighted avg       0.64      0.65      0.59       114

