In [None]:
# Install required packages (runs automatically in Colab, fast no-op in Binder)
!pip install -q qiskit qiskit-aer qiskit-ibm-runtime pylatexenc category-encoders numpy pandas scipy scikit-learn tqdm

# Enhance feature classification using projected quantum kernels

*Usage estimate: 80 minutes on a Heron r3 processor (NOTE: This is an estimate only. Your runtime might vary.)*

In this tutorial, we demonstrate how to run a [projected quantum kernel](https://www.nature.com/articles/s41467-021-22539-9) (PQK) with Qiskit on a real-world biological dataset, based on the paper [Enhanced Prediction of CAR T-Cell Cytotoxicity with Quantum-Kernel Methods](https://arxiv.org/abs/2507.22710) [[1]](#references).

PQK is a method used in quantum machine learning (QML) to encode classical data into a quantum feature space and project it back into the classical domain, using quantum computers to enhance feature selection. It encodes classical data into quantum states with a quantum circuit, typically through a process called feature mapping, in which the data is transformed into a high-dimensional Hilbert space. The "projected" aspect refers to extracting classical information from the quantum states by measuring specific observables, in order to construct a kernel matrix that can be used in classical kernel-based algorithms such as support vector machines. This approach leverages the computational advantages of quantum systems to potentially achieve better performance on certain tasks than classical methods.
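
To make the encode, project, and compare steps concrete, here is a minimal single-qubit sketch in NumPy. It is a toy illustration, not the circuit used later in this tutorial: the `RY` encoding and the helper names `ry_state`, `project`, and `toy_pqk` are assumptions chosen for brevity.

```python
import numpy as np

# Single-qubit Pauli matrices
PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULI_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def ry_state(x):
    """Toy feature map (assumption): RY(x)|0> = [cos(x/2), sin(x/2)]."""
    return np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)


def project(x):
    """Project the encoded state back to classical data: (<X>, <Y>, <Z>)."""
    psi = ry_state(x)
    return np.array(
        [np.real(np.vdot(psi, p @ psi)) for p in (PAULI_X, PAULI_Y, PAULI_Z)]
    )


def toy_pqk(x1, x2, gamma=1.0):
    """Gaussian kernel evaluated on the projected expectation values."""
    diff = project(x1) - project(x2)
    return float(np.exp(-gamma * diff @ diff))


print(project(np.pi / 2))
print(toy_pqk(0.3, 0.3), toy_pqk(0.0, np.pi))
```

Identical inputs give a kernel value of 1, and inputs whose projections lie further apart give smaller values.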

This tutorial also assumes general familiarity with QML methods. To explore QML further, see the [Quantum machine learning](/learning/courses/quantum-machine-learning) course on IBM Quantum Learning.

### Requirements

Before starting this tutorial, make sure you have the following installed:

- Qiskit SDK v2.0 or later, with [visualization](https://docs.quantum.ibm.com/api/qiskit/visualization) support
- Qiskit Runtime v0.40 or later (`pip install qiskit-ibm-runtime`)
- Category encoders 2.8.1 (`pip install category-encoders`)
- NumPy 2.3.2 (`pip install numpy`)
- Pandas 2.3.2 (`pip install pandas`)
- Scikit-learn 1.7.1 (`pip install scikit-learn`)
- Tqdm 4.67.1 (`pip install tqdm`)

### Setup

In [None]:
import warnings

# Standard libraries
import os
import numpy as np
import pandas as pd

# Machine learning and data processing
import category_encoders as ce
from scipy.linalg import inv, sqrtm
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Qiskit and Qiskit Runtime
from qiskit import QuantumCircuit
from qiskit.circuit import ParameterVector
from qiskit.circuit.library import UnitaryGate, ZZFeatureMap
from qiskit.quantum_info import SparsePauliOp, random_unitary
from qiskit.transpiler import generate_preset_pass_manager
from qiskit_ibm_runtime import (
    Batch,
    EstimatorOptions,
    EstimatorV2 as Estimator,
    QiskitRuntimeService,
)

# Progress bar
import tqdm

warnings.filterwarnings("ignore")

## Step 1: Map classical inputs to a quantum problem
### Dataset preparation
In this tutorial, we use a real-world biological dataset for a binary classification task, generated by Daniels et al. (2022) and available for download from the [supplementary material](https://www.science.org/doi/full/10.1126/science.abq0225#supplementary-materials) included with the paper. The data consists of CAR T-cells, which are genetically engineered T-cells used in immunotherapy to treat certain cancers. T-cells, a type of immune cell, are modified in the laboratory to express chimeric antigen receptors (CARs) that target specific proteins on cancer cells. These modified T-cells can recognize and destroy cancer cells more effectively. The data features are the motifs of the CAR T-cells, which refer to the specific structural or functional components of the CAR engineered into the T-cells. Based on these motifs, our task is to predict the cytotoxicity of a given CAR T-cell, labeling it as either toxic or non-toxic.
Below are the helper functions for preprocessing this dataset.

In [2]:
def preprocess_data(dir_root, args):
    """
    Preprocess the training and test data.
    """
    # Read from the csv files
    train_data = pd.read_csv(
        os.path.join(dir_root, args["file_train_data"]),
        encoding="unicode_escape",
        sep=",",
    )
    test_data = pd.read_csv(
        os.path.join(dir_root, args["file_test_data"]),
        encoding="unicode_escape",
        sep=",",
    )

    # Fix the last motif ID
    train_data[train_data == 17] = 14
    train_data.columns = [
        "Cell Number",
        "motif",
        "motif.1",
        "motif.2",
        "motif.3",
        "motif.4",
        "Nalm 6 Cytotoxicity",
    ]
    test_data[test_data == 17] = 14
    test_data.columns = [
        "Cell Number",
        "motif",
        "motif.1",
        "motif.2",
        "motif.3",
        "motif.4",
        "Nalm 6 Cytotoxicity",
    ]

    # Adjust motif at the third position
    if args["filter_for_spacer_motif_third_position"]:
        train_data = train_data[
            (train_data["motif.2"] == 14) | (train_data["motif.2"] == 0)
        ]
        test_data = test_data[
            (test_data["motif.2"] == 14) | (test_data["motif.2"] == 0)
        ]

    train_data = train_data[
        args["motifs_to_use"] + [args["label_name"], "Cell Number"]
    ]
    test_data = test_data[
        args["motifs_to_use"] + [args["label_name"], "Cell Number"]
    ]

    # Adjust motif at the last position
    if not args["allow_spacer_motif_last_position"]:
        last_motif = args["motifs_to_use"][len(args["motifs_to_use"]) - 1]
        train_data = train_data[
            (train_data[last_motif] != 14) & (train_data[last_motif] != 0)
        ]
        test_data = test_data[
            (test_data[last_motif] != 14) & (test_data[last_motif] != 0)
        ]

    # Get the labels
    train_labels = np.array(train_data[args["label_name"]])
    test_labels = np.array(test_data[args["label_name"]])

    # For the classification task use the threshold to binarize labels
    train_labels[train_labels > args["label_binarization_threshold"]] = 1
    train_labels[train_labels < 1] = args["min_label_value"]
    test_labels[test_labels > args["label_binarization_threshold"]] = 1
    test_labels[test_labels < 1] = args["min_label_value"]

    # Reduce data to just the motifs of interest
    train_data = train_data[args["motifs_to_use"]]
    test_data = test_data[args["motifs_to_use"]]

    # Get the class and motif counts
    min_class = np.min(np.unique(np.concatenate([train_data, test_data])))
    max_class = np.max(np.unique(np.concatenate([train_data, test_data])))

    num_class = max_class - min_class + 1
    num_motifs = len(args["motifs_to_use"])
    print(str(max_class) + ":" + str(min_class) + ":" + str(num_class))

    train_data = train_data - min_class
    test_data = test_data - min_class

    return (
        train_data,
        test_data,
        train_labels,
        test_labels,
        num_class,
        num_motifs,
    )


def data_encoder(args, train_data, test_data, num_class, num_motifs):
    """
    Use one-hot or binary encoding for classical data representation.
    """
    if args["encoder"] == "one-hot":
        # Transform to one-hot encoding
        train_data = np.eye(num_class)[train_data]
        test_data = np.eye(num_class)[test_data]

        train_data = train_data.reshape(
            train_data.shape[0], train_data.shape[1] * train_data.shape[2]
        )
        test_data = test_data.reshape(
            test_data.shape[0], test_data.shape[1] * test_data.shape[2]
        )

    elif args["encoder"] == "binary":
        # Transform to binary encoding
        encoder = ce.BinaryEncoder()

        base_array = np.unique(np.concatenate([train_data, test_data]))
        base = pd.DataFrame(base_array).astype("category")
        base.columns = ["motif"]
        for motif_name in args["motifs_to_use"][1:]:
            base[motif_name] = base.loc[:, "motif"]
        encoder.fit(base)

        train_data = encoder.transform(train_data.astype("category"))
        test_data = encoder.transform(test_data.astype("category"))

        train_data = np.reshape(
            train_data.values, (train_data.shape[0], num_motifs, -1)
        )
        test_data = np.reshape(
            test_data.values, (test_data.shape[0], num_motifs, -1)
        )

        train_data = train_data.reshape(
            train_data.shape[0], train_data.shape[1] * train_data.shape[2]
        )
        test_data = test_data.reshape(
            test_data.shape[0], test_data.shape[1] * test_data.shape[2]
        )

    else:
        raise ValueError("Invalid encoding type.")

    return train_data, test_data
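
The one-hot branch of `data_encoder` relies on a compact NumPy idiom: indexing the identity matrix with an integer array. The sketch below applies it to two made-up cells (the motif IDs are illustrative, not taken from the dataset), assuming 15 motif classes and four motif positions as used in this tutorial.

```python
import numpy as np

num_class = 15  # motif IDs 0..14

# Two hypothetical CAR T-cells, each described by four motif IDs
motifs = np.array([[0, 3, 14, 7], [1, 1, 0, 12]])

# Row lookup in the identity matrix yields one one-hot vector per motif
one_hot = np.eye(num_class)[motifs]  # shape (2, 4, 15)

# Concatenate the four one-hot vectors of each cell into a single row
features = one_hot.reshape(one_hot.shape[0], -1)  # shape (2, 60)

print(features.shape)
print(np.flatnonzero(features[0]))
```

Each row has exactly four nonzero entries, one per motif position, which is why four motifs with 15 classes produce 60 features and, later, a 60-qubit feature map.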

You can run this tutorial by executing the following cell, which automatically creates the required folder structure and downloads the training and test files directly into your environment. If you already have these files locally, this step safely overwrites them to ensure version consistency.

In [None]:
## Download dataset

# Create data directory if it doesn't exist
!mkdir -p data_tutorial/pqk

# Download the training and test sets from the official Qiskit documentation repo
!wget -q --show-progress -O data_tutorial/pqk/train_data.csv \
  https://raw.githubusercontent.com/Qiskit/documentation/main/datasets/tutorials/pqk/train_data.csv

!wget -q --show-progress -O data_tutorial/pqk/test_data.csv \
  https://raw.githubusercontent.com/Qiskit/documentation/main/datasets/tutorials/pqk/test_data.csv

!wget -q --show-progress -O data_tutorial/pqk/projections_train.csv \
  https://raw.githubusercontent.com/Qiskit/documentation/main/datasets/tutorials/pqk/projections_train.csv

!wget -q --show-progress -O data_tutorial/pqk/projections_test.csv \
  https://raw.githubusercontent.com/Qiskit/documentation/main/datasets/tutorials/pqk/projections_test.csv

# Check the files have been downloaded
!echo "Dataset files downloaded:"
!ls -lh data_tutorial/pqk/*.csv

In [None]:
args = {
    "file_train_data": "train_data.csv",
    "file_test_data": "test_data.csv",
    "motifs_to_use": ["motif", "motif.1", "motif.2", "motif.3"],
    "label_name": "Nalm 6 Cytotoxicity",
    "label_binarization_threshold": 0.62,
    "filter_for_spacer_motif_third_position": False,
    "allow_spacer_motif_last_position": True,
    "min_label_value": -1,
    "encoder": "one-hot",
}
dir_root = "./data_tutorial/pqk"  # folder created by the download cell above

# Preprocess data
train_data, test_data, train_labels, test_labels, num_class, num_motifs = (
    preprocess_data(dir_root=dir_root, args=args)
)

# Encode the data
train_data, test_data = data_encoder(
    args, train_data, test_data, num_class, num_motifs
)

14:0:15
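
The printed line is the largest motif ID, the smallest motif ID, and the resulting number of classes, in that order. The labels, meanwhile, are binarized in `preprocess_data` using `label_binarization_threshold` and `min_label_value`. The sketch below reproduces that thresholding on a few made-up cytotoxicity scores (the scores are illustrative assumptions, not dataset values).

```python
import numpy as np

threshold = 0.62      # args["label_binarization_threshold"]
min_label_value = -1  # args["min_label_value"]

# Made-up cytotoxicity scores, for illustration only
scores = np.array([0.10, 0.62, 0.63, 0.95])

labels = scores.copy()
labels[labels > threshold] = 1        # strictly above the threshold -> +1
labels[labels < 1] = min_label_value  # everything else -> -1

print(labels)
```

Note that a score exactly equal to the threshold falls into the negative class, because the comparison is strict.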


We also transform the dataset such that $1$ is represented as $\pi/2$ for scaling purposes.

In [4]:
# Change 1 to pi/2
angle = np.pi / 2

tmp = pd.DataFrame(train_data).astype("float64")
tmp[tmp == 1] = angle
train_data = tmp.values

tmp = pd.DataFrame(test_data).astype("float64")
tmp[tmp == 1] = angle
test_data = tmp.values
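
One way to see why $\pi/2$ is a convenient value: the single-qubit part of each `ZZFeatureMap` layer applies a Hadamard gate followed by a phase gate $P(2x)$. The sketch below looks only at that single-qubit part (the entangling gates are ignored, which is a simplifying assumption) and evaluates $\langle X \rangle = \cos(2x)$ for the raw value $1$ and the rescaled value $\pi/2$. The helper names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)


def phase_gate(phi):
    """Single-qubit phase gate P(phi) = diag(1, e^{i phi})."""
    return np.diag([1.0, np.exp(1j * phi)])


def expval_x(x):
    """<X> after H followed by P(2x) acting on |0> (entanglers ignored)."""
    psi = phase_gate(2 * x) @ H @ np.array([1, 0], dtype=complex)
    return float(np.real(np.vdot(psi, X @ psi)))


for value in (0.0, 1.0, np.pi / 2):
    print(f"x = {value:.4f} -> <X> = {expval_x(value):+.4f}")
```

In this simplified picture, a feature of $0$ and a feature of $\pi/2$ map to the orthogonal states $|+\rangle$ and $|-\rangle$, whereas a raw value of $1$ would land somewhere in between.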


In [5]:
print(train_data.shape, train_labels.shape)
print(test_data.shape, test_labels.shape)

(172, 60) (172,)
(74, 60) (74,)


We check the sizes and shapes of the training and test datasets.

In [6]:
feature_dimension = train_data.shape[1]
reps = 24
insert_barriers = True
entanglement = "pairwise"

# ZZFeatureMap with pairwise entanglement and `reps` repetitions
embed = ZZFeatureMap(
    feature_dimension=feature_dimension,
    reps=reps,
    entanglement=entanglement,
    insert_barriers=insert_barriers,
    name="ZZFeatureMap",
)
embed.decompose().draw(output="mpl", style="iqp", fold=-1)

<Image src="../docs/images/tutorials/projected-quantum-kernels/extracted-outputs/45956df4-5472-4394-a3e1-5514c456791d-0.avif" alt="Output of the previous code cell" />
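
For reference, the Qiskit circuit library describes `"pairwise"` entanglement as two layers per repetition: qubit $i$ is coupled to qubit $i+1$ first for all even $i$, then for all odd $i$. The helper below (`pairwise_pairs` is a hypothetical name, not a Qiskit function) lists those pairs so you can see which qubits interact.

```python
def pairwise_pairs(num_qubits):
    """Return the (even-layer, odd-layer) qubit pairs of pairwise entanglement."""
    even_layer = [(i, i + 1) for i in range(0, num_qubits - 1, 2)]
    odd_layer = [(i, i + 1) for i in range(1, num_qubits - 1, 2)]
    return even_layer, odd_layer


even_layer, odd_layer = pairwise_pairs(6)
print(even_layer)  # [(0, 1), (2, 3), (4, 5)]
print(odd_layer)   # [(1, 2), (3, 4)]
```

Each qubit interacts only with its nearest neighbors, so one repetition needs two layers of two-qubit interactions regardless of the qubit count. For the 60 features used here, that is 30 + 29 = 59 pairs per repetition.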

Another quantum embedding option is the 1D-Heisenberg Hamiltonian evolution ansatz. You can skip running this section if you would like to continue with the `ZZFeatureMap`.

In [7]:
feature_dimension = train_data.shape[1]
num_qubits = feature_dimension + 1
embed2 = QuantumCircuit(num_qubits)
num_trotter_steps = 6
pv_length = feature_dimension * num_trotter_steps
pv = ParameterVector("theta", pv_length)

# Add Haar random single qubit unitary to each qubit as initial state
np.random.seed(42)
seeds_unitary = np.random.randint(0, 100, num_qubits)
for i in range(num_qubits):
    rand_gate = UnitaryGate(random_unitary(2, seed=seeds_unitary[i]))
    embed2.append(rand_gate, [i])


def trotter_circ(feature_dimension, num_trotter_steps):
    num_qubits = feature_dimension + 1
    circ = QuantumCircuit(num_qubits)
    # Even
    for i in range(0, feature_dimension, 2):
        circ.rzz(2 * pv[i] / num_trotter_steps, i, i + 1)
    for i in range(0, feature_dimension, 2):
        circ.rxx(2 * pv[i] / num_trotter_steps, i, i + 1)
    for i in range(0, feature_dimension, 2):
        circ.ryy(2 * pv[i] / num_trotter_steps, i, i + 1)
    # Odd
    for i in range(1, feature_dimension, 2):
        circ.rzz(2 * pv[i] / num_trotter_steps, i, i + 1)
    for i in range(1, feature_dimension, 2):
        circ.rxx(2 * pv[i] / num_trotter_steps, i, i + 1)
    for i in range(1, feature_dimension, 2):
        circ.ryy(2 * pv[i] / num_trotter_steps, i, i + 1)
    return circ


# Hamiltonian evolution ansatz
for step in range(num_trotter_steps):
    circ = trotter_circ(feature_dimension, num_trotter_steps)
    if step % 2 == 0:
        embed2 = embed2.compose(circ)
    else:
        reverse_circ = circ.reverse_ops()
        embed2 = embed2.compose(reverse_circ)


embed2.draw(output="mpl", style="iqp", fold=-1)

<Image src="../docs/images/tutorials/projected-quantum-kernels/extracted-outputs/659dbf23-fd3f-4e01-94b4-33e6d672172c-0.avif" alt="Output of the previous code cell" />
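
Each group of `rzz`, `rxx`, and `ryy` gates in `trotter_circ` acts on the same qubit pair with the same angle. Because $XX$, $YY$, and $ZZ$ on a given pair commute, their product equals the exact evolution under the two-site Heisenberg term $XX + YY + ZZ$; the Trotter approximation only enters between the even and odd layers, which do not commute with each other. The NumPy check below verifies the first statement, using the convention $R_{PP}(\theta) = e^{-i \theta PP / 2}$ that Qiskit uses for these rotation gates. The helper `evolve` is a hypothetical name.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
XX, YY, ZZ = np.kron(X, X), np.kron(Y, Y), np.kron(Z, Z)


def evolve(hermitian, angle):
    """Return exp(-1j * angle * hermitian) via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(hermitian)
    return eigvecs @ np.diag(np.exp(-1j * angle * eigvals)) @ eigvecs.conj().T


a = 0.37  # arbitrary angle; corresponds to rzz(2a), rxx(2a), ryy(2a)

# Product of the three two-qubit rotations, in the order used in trotter_circ
product = evolve(YY, a) @ evolve(XX, a) @ evolve(ZZ, a)

# Exact evolution under the two-site Heisenberg term
exact = evolve(XX + YY + ZZ, a)

print(np.allclose(product, exact))  # True
```

In `trotter_circ` the rotation angle is `2 * pv[i] / num_trotter_steps`, so `a` plays the role of `pv[i] / num_trotter_steps`.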

## Step 2: Optimize problem for quantum hardware execution
### Quantum circuit
We now construct the feature map that embeds our classical dataset into a higher-dimensional feature space. For this embedding, we use the [``ZZFeatureMap``](https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.ZZFeatureMap) from Qiskit.

In [None]:
service = QiskitRuntimeService()
backend = service.least_busy(
    operational=True, simulator=False, min_num_qubits=133
)
target = backend.target



In [None]:
# Let's select the ZZFeatureMap embedding for this example
qc = embed
num_qubits = feature_dimension

# Identity operator on all qubits
id = "I" * num_qubits

# Let's select the first training datapoint as an example
parameters = train_data[0]

# Bind parameter to the circuit and simplify it
qc_bound = qc.assign_parameters(parameters)
transpiler = generate_preset_pass_manager(
    optimization_level=3, basis_gates=["u3", "cz"]
)
transpiled_circuit = transpiler.run(qc_bound)

# Transpile for hardware
transpiler = generate_preset_pass_manager(optimization_level=3, target=target)
transpiled_circuit = transpiler.run(transpiled_circuit)

# We group all commuting observables
# These groups are the Pauli X, Y and Z operators on individual qubits
observables_x = [
    SparsePauliOp(id[:i] + "X" + id[(i + 1) :]).apply_layout(
        transpiled_circuit.layout
    )
    for i in range(num_qubits)
]
observables_y = [
    SparsePauliOp(id[:i] + "Y" + id[(i + 1) :]).apply_layout(
        transpiled_circuit.layout
    )
    for i in range(num_qubits)
]
observables_z = [
    SparsePauliOp(id[:i] + "Z" + id[(i + 1) :]).apply_layout(
        transpiled_circuit.layout
    )
    for i in range(num_qubits)
]

# We define the primitive unified blocs (PUBs) consisting of the embedding circuit,
# set of observables and the circuit parameters
pub_x = (transpiled_circuit, observables_x)
pub_y = (transpiled_circuit, observables_y)
pub_z = (transpiled_circuit, observables_z)

# Experiment options for error mitigation
num_randomizations = 300
shots_per_randomization = 100
noise_factors = [1, 3, 5]

experimental_opts = {}
experimental_opts["resilience"] = {
    "measure_mitigation": True,
    "zne_mitigation": True,
    "zne": {
        "noise_factors": noise_factors,
        "amplifier": "gate_folding",
        "extrapolated_noise_factors": [0] + noise_factors,
    },
}
experimental_opts["twirling"] = {
    "num_randomizations": num_randomizations,
    "shots_per_randomization": shots_per_randomization,
    "strategy": "active-accum",
}

# We define and run the estimator to obtain <X>, <Y> and <Z> on all qubits
estimator = Estimator(mode=backend, options=experimental_opts)

job = estimator.run([pub_x, pub_y, pub_z])
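
A note on indexing: Qiskit Pauli labels are written with qubit 0 as the rightmost character. The string `id[:i] + "X" + id[(i + 1) :]` therefore places `X` on qubit `num_qubits - 1 - i` of the abstract circuit, not on qubit `i`. Because the loop covers every position and the kernel later sums over all qubits, this does not change the result, but it matters if you want to attribute an expectation value to a specific qubit. The plain-Python sketch below (no Qiskit required; both helpers are hypothetical names) makes the mapping explicit.

```python
def single_qubit_labels(pauli, num_qubits):
    """Build one label per position, as done for observables_x/y/z above."""
    identity = "I" * num_qubits
    return [
        identity[:i] + pauli + identity[i + 1 :] for i in range(num_qubits)
    ]


def acted_qubit(label):
    """Index of the non-identity qubit, using Qiskit's right-to-left ordering."""
    position = next(k for k, char in enumerate(label) if char != "I")
    return len(label) - 1 - position


labels = single_qubit_labels("X", 4)
for i, label in enumerate(labels):
    print(i, label, "-> qubit", acted_qubit(label))
```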


## Step 3: Execute using Qiskit primitives
### Measure the 1-RDMs
The main building blocks of projected quantum kernels are the reduced density matrices (RDMs), which are obtained through projective measurements of the quantum feature map. In this step, we obtain all single-qubit reduced density matrices (1-RDMs), which are later passed to the classical exponential kernel function.
Let us see how to compute the 1-RDMs for a single data point before looping over the whole dataset. The 1-RDMs are a collection of single-qubit measurements of the Pauli ``X``, ``Y``, and ``Z`` operators on all qubits. This is because a single-qubit RDM can be fully expressed as: $$\rho = \frac{1}{2} \big( I + \langle \sigma_x \rangle \sigma_x + \langle \sigma_y \rangle \sigma_y + \langle \sigma_z \rangle \sigma_z \big)$$
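
As a quick numerical check of this decomposition, the sketch below builds a random two-qubit pure state, computes the reduced density matrix of one qubit by tracing out the other, and reconstructs it from the three Pauli expectation values alone. The state and the seed are arbitrary illustrative choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

# Random normalized two-qubit state, stored as psi[a, b]
rng = np.random.default_rng(7)
amplitudes = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = (amplitudes / np.linalg.norm(amplitudes)).reshape(2, 2)

# 1-RDM of the first qubit: trace out the second index
rho = psi @ psi.conj().T

# Pauli expectation values <P> = Tr[P rho]
expvals = {name: np.real(np.trace(p @ rho)) for name, p in PAULIS.items()}

# Reconstruct rho from the expectation values alone
rho_rebuilt = 0.5 * (I2 + sum(expvals[name] * PAULIS[name] for name in PAULIS))

print(np.allclose(rho, rho_rebuilt))  # True
```

This is why measuring $\langle X \rangle$, $\langle Y \rangle$, and $\langle Z \rangle$ on each qubit is enough to determine every 1-RDM.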
We first select the backend to use.

In [11]:
job_result_x = job.result()[0].data.evs
job_result_y = job.result()[1].data.evs
job_result_z = job.result()[2].data.evs

In [12]:
print(job_result_x)
print(job_result_y)
print(job_result_z)

[ 3.67865951e-03  1.01158571e-02 -3.95790878e-02  6.33984326e-03
  1.86035759e-02 -2.91533268e-02 -1.06374793e-01  4.48873518e-18
  4.70201764e-02  3.53997968e-02  2.53130819e-02  3.23903401e-02
  6.06327843e-03  1.16313667e-02 -1.12387504e-02 -3.18457725e-02
 -4.16445718e-04 -1.45609602e-03 -4.21737114e-01  2.83705669e-02
  6.91332890e-03 -7.45363001e-02 -1.20139326e-02 -8.85566135e-02
 -3.22648394e-02 -3.24228074e-02  6.20431299e-04  3.04225434e-03
  5.72795792e-03  1.11288428e-02  1.50395861e-01  9.18380197e-02
  1.02553163e-01  2.98312847e-02 -3.30298912e-01 -1.13979648e-01
  4.49159340e-03  8.63861493e-02  3.05666566e-02  2.21463145e-04
  1.45946735e-02  8.54537275e-03 -8.09805979e-02 -2.92608104e-02
 -3.91243644e-02 -3.96632760e-02 -1.41187613e-01 -1.07363243e-01
  1.81089440e-02  2.70778895e-02  1.45139414e-02  2.99480458e-02
  4.99137134e-02  7.08789852e-02  4.30565759e-02  8.71287156e-02
  1.04334798e-01  7.72191962e-02  7.10059720e-02  1.04650403e-01]
[-7.31765102e-05  7.4266

Next, we retrieve the results.

In [13]:
print(f"qubits: {qc.num_qubits}")
print(
    f"2q-depth: {transpiled_circuit.depth(lambda x: x.operation.num_qubits==2)}"
)
print(
    f"2q-size: {transpiled_circuit.size(lambda x: x.operation.num_qubits==2)}"
)
print(f"Operator counts: {transpiled_circuit.count_ops()}")
transpiled_circuit.draw("mpl", fold=-1, style="clifford", idle_wires=False)

qubits: 60
2q-depth: 64
2q-size: 1888
Operator counts: OrderedDict({'rz': 6016, 'sx': 4576, 'cz': 1888, 'x': 896, 'barrier': 31})


<Image src="../docs/images/tutorials/projected-quantum-kernels/extracted-outputs/4f573436-ec5c-451b-976c-ad718b3c201d-1.avif" alt="Output of the previous code cell" />

We can now loop over the entire training dataset to obtain all 1-RDMs.

We also provide the results from an experiment that we ran on quantum hardware. You can either run the training yourself by setting the flag below to `True`, or use the projection results that we provide.

In [None]:
# Set this to True if you want to run the training on hardware
run_experiment = False

In [None]:
# Identity operator on all qubits
id = "I" * num_qubits

# projections_train[i][j][k] will be the expectation value of the j-th Pauli operator (0: X, 1: Y, 2: Z)
# of datapoint i on qubit k
projections_train = []
jobs_train = []

# Experiment options for error mitigation
num_randomizations = 300
shots_per_randomization = 100
noise_factors = [1, 3, 5]

experimental_opts = {}
experimental_opts["resilience"] = {
    "measure_mitigation": True,
    "zne_mitigation": True,
    "zne": {
        "noise_factors": noise_factors,
        "amplifier": "gate_folding",
        "return_all_extrapolated": True,
        "return_unextrapolated": True,
        "extrapolated_noise_factors": [0] + noise_factors,
    },
}
experimental_opts["twirling"] = {
    "num_randomizations": num_randomizations,
    "shots_per_randomization": shots_per_randomization,
    "strategy": "active-accum",
}
options = EstimatorOptions(experimental=experimental_opts)

if run_experiment:
    with Batch(backend=backend):
        for i in tqdm.tqdm(
            range(len(train_data)), desc="Training data progress"
        ):
            # Get training sample
            parameters = train_data[i]

            # Bind parameter to the circuit and simplify it
            qc_bound = qc.assign_parameters(parameters)
            transpiler = generate_preset_pass_manager(
                optimization_level=3, basis_gates=["u3", "cz"]
            )
            transpiled_circuit = transpiler.run(qc_bound)

            # Transpile for hardware
            transpiler = generate_preset_pass_manager(
                optimization_level=3, target=target
            )
            transpiled_circuit = transpiler.run(transpiled_circuit)

            # We group all commuting observables
            # These groups are the Pauli X, Y and Z operators on individual qubits
            observables_x = [
                SparsePauliOp(id[:i] + "X" + id[(i + 1) :]).apply_layout(
                    transpiled_circuit.layout
                )
                for i in range(num_qubits)
            ]
            observables_y = [
                SparsePauliOp(id[:i] + "Y" + id[(i + 1) :]).apply_layout(
                    transpiled_circuit.layout
                )
                for i in range(num_qubits)
            ]
            observables_z = [
                SparsePauliOp(id[:i] + "Z" + id[(i + 1) :]).apply_layout(
                    transpiled_circuit.layout
                )
                for i in range(num_qubits)
            ]

            # We define the primitive unified blocs (PUBs) consisting of the embedding circuit,
            # set of observables and the circuit parameters
            pub_x = (transpiled_circuit, observables_x)
            pub_y = (transpiled_circuit, observables_y)
            pub_z = (transpiled_circuit, observables_z)

            # We define and run the estimator to obtain <X>, <Y> and <Z> on all qubits
            estimator = Estimator(options=options)

            job = estimator.run([pub_x, pub_y, pub_z])
            jobs_train.append(job)

Training data progress: 100%|██████████| 172/172 [13:03<00:00,  4.55s/it]


We print the circuit size and the two-qubit gate depth.

In [None]:
if run_experiment:
    for i in tqdm.tqdm(
        range(len(train_data)), desc="Retrieving training data results"
    ):
        # Completed job
        job = jobs_train[i]

        # Job results
        job_result_x = job.result()[0].data.evs
        job_result_y = job.result()[1].data.evs
        job_result_z = job.result()[2].data.evs

        # Record <X>, <Y> and <Z> on all qubits for the current datapoint
        projections_train.append([job_result_x, job_result_y, job_result_z])

We repeat this for the test set.

In [None]:
# Identity operator on all qubits
id = "I" * num_qubits

# projections_test[i][j][k] will be the expectation value of the j-th Pauli operator (0: X, 1: Y, 2: Z)
# of datapoint i on qubit k
projections_test = []
jobs_test = []

# Experiment options for error mitigation
num_randomizations = 300
shots_per_randomization = 100
noise_factors = [1, 3, 5]

experimental_opts = {}
experimental_opts["resilience"] = {
    "measure_mitigation": True,
    "zne_mitigation": True,
    "zne": {
        "noise_factors": noise_factors,
        "amplifier": "gate_folding",
        "return_all_extrapolated": True,
        "return_unextrapolated": True,
        "extrapolated_noise_factors": [0] + noise_factors,
    },
}
experimental_opts["twirling"] = {
    "num_randomizations": num_randomizations,
    "shots_per_randomization": shots_per_randomization,
    "strategy": "active-accum",
}
options = EstimatorOptions(experimental=experimental_opts)

if run_experiment:
    with Batch(backend=backend):
        for i in tqdm.tqdm(range(len(test_data)), desc="Test data progress"):
            # Get test sample
            parameters = test_data[i]

            # Bind parameter to the circuit and simplify it
            qc_bound = qc.assign_parameters(parameters)
            transpiler = generate_preset_pass_manager(
                optimization_level=3, basis_gates=["u3", "cz"]
            )
            transpiled_circuit = transpiler.run(qc_bound)

            # Transpile for hardware
            transpiler = generate_preset_pass_manager(
                optimization_level=3, target=target
            )
            transpiled_circuit = transpiler.run(transpiled_circuit)

            # We group all commuting observables
            # These groups are the Pauli X, Y and Z operators on individual qubits
            observables_x = [
                SparsePauliOp(id[:i] + "X" + id[(i + 1) :]).apply_layout(
                    transpiled_circuit.layout
                )
                for i in range(num_qubits)
            ]
            observables_y = [
                SparsePauliOp(id[:i] + "Y" + id[(i + 1) :]).apply_layout(
                    transpiled_circuit.layout
                )
                for i in range(num_qubits)
            ]
            observables_z = [
                SparsePauliOp(id[:i] + "Z" + id[(i + 1) :]).apply_layout(
                    transpiled_circuit.layout
                )
                for i in range(num_qubits)
            ]

            # We define the primitive unified blocs (PUBs) consisting of the embedding circuit,
            # set of observables and the circuit parameters
            pub_x = (transpiled_circuit, observables_x)
            pub_y = (transpiled_circuit, observables_y)
            pub_z = (transpiled_circuit, observables_z)

            # We define and run the estimator to obtain <X>, <Y> and <Z> on all qubits
            estimator = Estimator(options=options)

            job = estimator.run([pub_x, pub_y, pub_z])
            jobs_test.append(job)

Test data progress: 100%|██████████| 74/74 [00:13<00:00,  5.56it/s]



In [None]:
if run_experiment:
    for i in tqdm.tqdm(
        range(len(test_data)), desc="Retrieving test data results"
    ):
        # Completed job
        job = jobs_test[i]

        # Job results
        job_result_x = job.result()[0].data.evs
        job_result_y = job.result()[1].data.evs
        job_result_z = job.result()[2].data.evs

        # Record <X>, <Y> and <Z> on all qubits for the current datapoint
        projections_test.append([job_result_x, job_result_y, job_result_z])

## Step 4: Post-process and return result in desired classical format

### Define the projected quantum kernel

The projected quantum kernel is defined with the following kernel function: $$k^{\textrm{PQ}}(x_i, x_j) = \textrm{exp} \Big(-\gamma \sum_k \sum_{P \in \{ X,Y,Z \}} (\textrm{Tr}[P \rho_k(x_i)] - \textrm{Tr}[P \rho_k(x_j)])^2 \Big) $$
In the above equation, $\gamma>0$ is a tunable hyperparameter. The $K^{\textrm{PQ}}_{ij} = k^{\textrm{PQ}}(x_i, x_j)$ are the entries of the kernel matrix $K^{\textrm{PQ}}$.

Using the definition of 1-RDMs, we can see that the individual terms within the kernel function can be evaluated as $\textrm{Tr}[P \rho_k (x_i)] = \langle P \rangle$, where $P \in \{ X,Y,Z \}$. These expectation values are precisely what we measured above.

By using ``scikit-learn``, we can in fact compute the kernel even more easily. This is due to the readily available radial basis function (``'rbf'``) kernel: $\textrm{exp} (-\gamma \lVert x - x' \rVert^2)$. First, we simply need to reshape the new projected training and test datasets into two-dimensional arrays.
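
The equivalence with the RBF kernel can be checked directly. The sketch below uses random numbers in place of measured expectation values (an assumption for illustration), arranged like `projections_train` with shape `(n_samples, 3, num_qubits)`. It evaluates the kernel once from the sums in the definition and once as an RBF kernel on the flattened array.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, num_qubits, gamma = 5, 4, 0.3

# Stand-in for measured <X>, <Y>, <Z>: projections[i, p, k]
projections = rng.uniform(-1, 1, size=(n_samples, 3, num_qubits))

# Kernel from the definition: explicit sums over Paulis p and qubits k
k_def = np.zeros((n_samples, n_samples))
for i in range(n_samples):
    for j in range(n_samples):
        total = 0.0
        for p in range(3):
            for k in range(num_qubits):
                total += (projections[i, p, k] - projections[j, p, k]) ** 2
        k_def[i, j] = np.exp(-gamma * total)

# Same kernel as an RBF kernel on the flattened features
flat = projections.reshape(n_samples, -1)
sq_dists = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
k_rbf = np.exp(-gamma * sq_dists)

print(np.allclose(k_def, k_rbf))  # True
```

With scikit-learn installed, `rbf_kernel(flat, flat, gamma=gamma)` from `sklearn.metrics.pairwise` computes the same matrix as `k_rbf`.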

Note that going over the entire dataset can take about 80 minutes on the QPU. To make sure that the rest of the tutorial is easily executable, we additionally provide projections from a previously run experiment (which are included in the files you downloaded in the `Download dataset` code block). If you performed the training yourself, you can continue the tutorial with your own results.

In [None]:
if run_experiment:
    projections_train = np.array(projections_train).reshape(
        len(projections_train), -1
    )
    projections_test = np.array(projections_test).reshape(
        len(projections_test), -1
    )
else:
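    # File locations follow the download cell above. If the files are
    # comma-separated, pass delimiter="," to np.loadtxt.
    projections_train = np.loadtxt("data_tutorial/pqk/projections_train.csv")
    projections_test = np.loadtxt("data_tutorial/pqk/projections_test.csv")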

### Support Vector Machine (SVM)

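We can now run a classical SVM on the projected features. Because the PQK is an RBF kernel on these features, `SVC(kernel="rbf")` evaluates it internally, both on the training set and between test and training points at prediction time.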

In [10]:
# Range of 'C' and 'gamma' values as SVC hyperparameters
C_range = [0.001, 0.005, 0.007]
C_range.extend([x * 0.01 for x in range(1, 11)])
C_range.extend([x * 0.25 for x in range(1, 60)])
C_range.extend(
    [
        20,
        50,
        100,
        200,
        500,
        700,
        1000,
        1100,
        1200,
        1300,
        1400,
        1500,
        1700,
        2000,
    ]
)

gamma_range = ["auto", "scale", 0.001, 0.005, 0.007]
gamma_range.extend([x * 0.01 for x in range(1, 11)])
gamma_range.extend([x * 0.25 for x in range(1, 60)])
gamma_range.extend([20, 50, 100])

param_grid = dict(C=C_range, gamma=gamma_range)

# Support vector classifier
svc = SVC(kernel="rbf")

# Define the cross validation
cv = StratifiedKFold(n_splits=10)

# Grid search for hyperparameter tuning (q: quantum)
grid_search_q = GridSearchCV(
    svc, param_grid, cv=cv, verbose=1, n_jobs=-1, scoring="f1_weighted"
)
grid_search_q.fit(projections_train, train_labels)

# Best model with best parameters
best_svc_q = grid_search_q.best_estimator_
print(
    f"The best parameters are {grid_search_q.best_params_} with a score of {grid_search_q.best_score_:.4f}"
)

# Test accuracy
accuracy_q = best_svc_q.score(projections_test, test_labels)
print(f"Test accuracy with best model: {accuracy_q:.4f}")

Fitting 10 folds for each of 6622 candidates, totalling 66220 fits
The best parameters are {'C': 8.5, 'gamma': 0.01} with a score of 0.6980
Test accuracy with best model: 0.8108
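
The number of candidates reported in the output follows directly from the sizes of the two hyperparameter lists. The sketch below rebuilds the lists as in the cell above and checks the arithmetic; the names `c_values` and `gamma_values` are introduced only for this illustration.

```python
# Rebuild the hyperparameter lists from the cell above
c_values = [0.001, 0.005, 0.007]
c_values.extend([x * 0.01 for x in range(1, 11)])
c_values.extend([x * 0.25 for x in range(1, 60)])
c_values.extend(
    [20, 50, 100, 200, 500, 700, 1000, 1100, 1200, 1300, 1400, 1500, 1700, 2000]
)

gamma_values = ["auto", "scale", 0.001, 0.005, 0.007]
gamma_values.extend([x * 0.01 for x in range(1, 11)])
gamma_values.extend([x * 0.25 for x in range(1, 60)])
gamma_values.extend([20, 50, 100])

n_candidates = len(c_values) * len(gamma_values)
n_fits = n_candidates * 10  # 10 stratified folds per candidate

print(len(c_values), len(gamma_values))  # 86 77
print(n_candidates, n_fits)  # 6622 66220
```

Each fit is independent of the others, which is why the cell above sets `n_jobs=-1` to use all available processors.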


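For comparison, we run the same grid search with a classical RBF-kernel SVM trained directly on the original features, without the quantum projection. This result serves as our classical benchmark.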

In [11]:
# Support vector classifier
svc = SVC(kernel="rbf")

# Grid search for hyperparameter tuning (c: classical)
grid_search_c = GridSearchCV(
    svc, param_grid, cv=cv, verbose=1, n_jobs=-1, scoring="f1_weighted"
)
grid_search_c.fit(train_data, train_labels)

# Best model with best parameters
best_svc_c = grid_search_c.best_estimator_
print(
    f"The best parameters are {grid_search_c.best_params_} with a score of {grid_search_c.best_score_:.4f}"
)

# Test accuracy
accuracy_c = best_svc_c.score(test_data, test_labels)
print(f"Test accuracy with best model: {accuracy_c:.4f}")

Fitting 10 folds for each of 6622 candidates, totalling 66220 fits
The best parameters are {'C': 10.75, 'gamma': 0.04} with a score of 0.7830
Test accuracy with best model: 0.7432
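
The grid also contains the two string options `"scale"` and `"auto"` for `gamma`. According to the scikit-learn documentation, `"scale"` corresponds to `1 / (n_features * X.var())` and `"auto"` to `1 / n_features`. The sketch below evaluates both on a random stand-in feature matrix; `demo_features` is a hypothetical placeholder, not the tutorial's data.

```python
import numpy as np

rng = np.random.default_rng(1)
demo_features = rng.normal(size=(20, 6))  # stand-in: 20 samples, 6 features

n_features = demo_features.shape[1]
gamma_scale = 1.0 / (n_features * demo_features.var())  # gamma="scale"
gamma_auto = 1.0 / n_features  # gamma="auto"

print(f"gamma='scale' -> {gamma_scale:.4f}")
print(f"gamma='auto'  -> {gamma_auto:.4f}")
```

Both options depend on the feature matrix, so they generally resolve to different numeric values for the projected and the original features.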


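The next cell evaluates the geometric separation between the classical and quantum kernels used above:

$$ g_{cq} = \sqrt{\left\lVert \sqrt{K_q} \sqrt{K_c} (K_c + \lambda I)^{-2} \sqrt{K_c} \sqrt{K_q} \right\rVert_\infty} $$

Here $K_c$ and $K_q$ are the classical and projected quantum kernel matrices on the training set, and $\lambda = 1/C$ is the regularization parameter of the best classical model. For reference, the cell also prints $\sqrt{N}$, the square root of the number of training samples.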

In [12]:
# Gamma values used in best models above
gamma_c = grid_search_c.best_params_["gamma"]
gamma_q = grid_search_q.best_params_["gamma"]

# Regularization parameter used in the best classical model above
C_c = grid_search_c.best_params_["C"]
l_c = 1 / C_c

# Classical and quantum kernels used above
K_c = rbf_kernel(train_data, train_data, gamma=gamma_c)
K_q = rbf_kernel(projections_train, projections_train, gamma=gamma_q)

# Intermediate matrices in the equation
K_c_sqrt = sqrtm(K_c)
K_q_sqrt = sqrtm(K_q)
K_c_inv = inv(K_c + l_c * np.eye(K_c.shape[0]))
K_multiplication = (
    K_q_sqrt @ K_c_sqrt @ K_c_inv @ K_c_inv @ K_c_sqrt @ K_q_sqrt
)

# Geometric separation
norm = np.linalg.norm(K_multiplication, ord=np.inf)
g_cq = np.sqrt(norm)
print(
    f"Geometric separation between classical and quantum kernels is {g_cq:.4f}"
)

print(np.sqrt(len(train_data)))

Geometric separation between classical and quantum kernels is 1.5440
13.114877048604
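
The computation above can be wrapped in a small helper. The sketch below is a variant under stated assumptions, not the tutorial's implementation: it takes the matrix square root from an eigendecomposition (valid for symmetric positive semidefinite kernels) instead of `scipy.linalg.sqrtm`, and it evaluates the result for both the matrix infinity norm used in the cell above (`ord=np.inf`, the maximum absolute row sum) and the spectral norm (`ord=2`, the largest singular value). The names `psd_sqrt` and `geometric_separation` and the small stand-in kernels are hypothetical.

```python
import numpy as np


def psd_sqrt(k):
    """Square root of a symmetric positive semidefinite matrix."""
    eigvals, eigvecs = np.linalg.eigh(k)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def geometric_separation(k_c, k_q, lam, norm_ord=np.inf):
    """sqrt(|| sqrt(K_q) sqrt(K_c) (K_c + lam I)^-2 sqrt(K_c) sqrt(K_q) ||)."""
    k_c_sqrt = psd_sqrt(k_c)
    k_q_sqrt = psd_sqrt(k_q)
    k_c_inv = np.linalg.inv(k_c + lam * np.eye(k_c.shape[0]))
    product = k_q_sqrt @ k_c_sqrt @ k_c_inv @ k_c_inv @ k_c_sqrt @ k_q_sqrt
    return np.sqrt(np.linalg.norm(product, ord=norm_ord))


# Small hand-written positive definite stand-in kernels
demo_k_c = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]])
demo_k_q = np.array([[1.0, 0.1, 0.3], [0.1, 1.0, 0.1], [0.3, 0.1, 1.0]])

g_inf = geometric_separation(demo_k_c, demo_k_q, lam=0.1, norm_ord=np.inf)
g_spec = geometric_separation(demo_k_c, demo_k_q, lam=0.1, norm_ord=2)
g_same = geometric_separation(demo_k_c, demo_k_c, lam=0.0, norm_ord=2)
print(g_inf, g_spec, g_same)
```

For identical kernels and `lam=0`, the product reduces to the identity, so `g_same` is 1 up to round-off. For a symmetric matrix the infinity norm is an upper bound on the spectral norm, so `g_inf` is never smaller than `g_spec`.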


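Model complexity is defined as follows (M1 in [[2]](#references)):

$$ s_{K, \lambda}(N) = \sqrt{\frac{\lambda^2 \sum_{i=1}^N \sum_{j=1}^N (K+\lambda I)^{-2}_{ij} y_i y_j}{N}} + \sqrt{\frac{\sum_{i=1}^N \sum_{j=1}^N ((K+\lambda I)^{-1}K(K+\lambda I)^{-1})_{ij} y_i y_j}{N}}$$

Here $N$ is the number of training samples, $K$ is the kernel matrix on the training set, and $\lambda$ is the regularization parameter. The cells below set $\lambda = 1/C$ for the corresponding best model and use the labels predicted by that model on the training set for $y_i$.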

In [13]:
# Model complexity of the classical kernel

# Number of training data
N = len(train_data)

# Predicted labels
pred_labels = best_svc_c.predict(train_data)
pred_matrix = np.outer(pred_labels, pred_labels)

# Intermediate terms
K_c_inv = inv(K_c + l_c * np.eye(K_c.shape[0]))

# First term
first_sum = np.sum((K_c_inv @ K_c_inv) * pred_matrix)
first_term = l_c * np.sqrt(first_sum / N)

# Second term
second_sum = np.sum((K_c_inv @ K_c @ K_c_inv) * pred_matrix)
second_term = np.sqrt(second_sum / N)

# Model complexity
s_c = first_term + second_term
print(f"Classical model complexity is {s_c:.4f}")

Classical model complexity is 1.3578


In [14]:
# Model complexity of the projected quantum kernel

# Number of training data
N = len(projections_train)

# Predicted labels
pred_labels = best_svc_q.predict(projections_train)
pred_matrix = np.outer(pred_labels, pred_labels)

# Regularization parameter used in the best quantum model above
C_q = grid_search_q.best_params_["C"]
l_q = 1 / C_q

# Intermediate terms
K_q_inv = inv(K_q + l_q * np.eye(K_q.shape[0]))

# First term
first_sum = np.sum((K_q_inv @ K_q_inv) * pred_matrix)
first_term = l_q * np.sqrt(first_sum / N)

# Second term
second_sum = np.sum((K_q_inv @ K_q @ K_q_inv) * pred_matrix)
second_term = np.sqrt(second_sum / N)

# Model complexity
s_q = first_term + second_term
print(f"Quantum model complexity is {s_q:.4f}")

Quantum model complexity is 1.5806
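
Cells 13 and 14 repeat the same computation with different inputs. As a sketch, the shared logic could be factored into one function; the name `model_complexity` and the toy inputs below are hypothetical and serve only as a sanity check of the formula.

```python
import numpy as np


def model_complexity(kernel, labels, lam):
    """Evaluate s_{K, lambda}(N) for a kernel matrix and a label vector."""
    labels = np.asarray(labels, dtype=float)
    n = len(labels)
    k_inv = np.linalg.inv(kernel + lam * np.eye(n))

    # y^T M y equals sum_ij M_ij y_i y_j
    first_sum = labels @ (k_inv @ k_inv) @ labels
    second_sum = labels @ (k_inv @ kernel @ k_inv) @ labels

    return lam * np.sqrt(first_sum / n) + np.sqrt(second_sum / n)


# Sanity check: for K = I and labels in {-1, +1} the two terms are
# lam / (1 + lam) and 1 / (1 + lam), which sum to 1 for any lam > 0
demo_labels = np.array([1, -1, 1, -1])
s_identity = model_complexity(np.eye(4), demo_labels, lam=0.5)
print(f"{s_identity:.4f}")  # 1.0000
```

With the notebook's variables, the equivalent calls would be `model_complexity(K_c, best_svc_c.predict(train_data), l_c)` and `model_complexity(K_q, best_svc_q.predict(projections_train), l_q)`.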


## Step 4: Post-process and return result in desired classical format

### Define the projected quantum kernel

The projected quantum kernel is defined by the following kernel function:

$$k^{\textrm{PQ}}(x_i, x_j) = \exp \Big(-\gamma \sum_k \sum_{P \in \{ X,Y,Z \}} (\textrm{Tr}[P \rho_k(x_i)] - \textrm{Tr}[P \rho_k(x_j)])^2 \Big) $$

In the equation above, $\gamma>0$ is a tunable hyperparameter. The $K^{\textrm{PQ}}_{ij} = k^{\textrm{PQ}}(x_i, x_j)$ are the entries of the kernel matrix $K^{\textrm{PQ}}$.

Using the definition of the 1-RDMs, we can see that the individual terms in the kernel function can be evaluated as $\textrm{Tr}[P \rho_k (x_i)] = \langle P \rangle$, where $P \in \{ X,Y,Z \}$. These expectation values are precisely what we measured above.
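
As a minimal illustration of the identity $\textrm{Tr}[P \rho_k] = \langle P \rangle$, the sketch below builds a single-qubit density matrix from an arbitrarily chosen Bloch vector and recovers its components as Pauli expectation values. The Bloch vector values and the variable names are assumptions for illustration only.

```python
import numpy as np

# Identity and Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Arbitrary Bloch vector with norm <= 1 (a valid single-qubit state)
bloch = np.array([0.3, -0.4, 0.5])

# Single-qubit density matrix: rho = (I + x X + y Y + z Z) / 2
rho = 0.5 * (I2 + bloch[0] * X + bloch[1] * Y + bloch[2] * Z)

# Tr[P rho] for each Pauli recovers the Bloch-vector components
expectations = np.array([np.trace(P @ rho).real for P in (X, Y, Z)])
print(expectations)
```

Each sample therefore contributes three real numbers per qubit, and these are the features that enter the kernel.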