# Quantum Kernels & QSVMs — Detailed Notes (Session 10)
**Course:** CS490/5590 — Quantum Computing Applications in Data Science, AI, & Deep Learning  
**Instructor:** Luke Miller  

> **Purpose.**  These notes expand the Session-10 slides into a stand-alone reference. We review classical kernel methods, build quantum feature-map circuits, derive the quantum-kernel estimator, and show how to train a Quantum Support Vector Machine (QSVM) in Qiskit Machine Learning.  Mini-exercises with short answers appear at the end.

---

## Session Road-map  

1. Quick recap: VQE & variational flavour  
2. Kernel methods refresher (SVM dual form)  
3. Quantum feature maps & kernel definition  
4. Quantum-kernel estimation circuits  
5. QSVM training & inference workflow  
6. Implementation in Qiskit (ZZFeatureMap demo)  
7. Performance, noise, and scaling issues  
8. AI / DS applications & open research questions  
9. Q&A  

---

## 0) Recap — why quantum kernels?

| Variational algorithm | Target | Depth profile | ML hook |
|-----------------------|--------|---------------|---------|
| **VQE** | ground-state energy minimisation | shallow, adaptive | state prep for kernels |
| **QAOA** | Ising optimisation | shallow, layered | feature-map inspiration |
| **QSVM** | supervised classification | shallow (feature map only) | kernel trick in Hilbert space |

Quantum kernels shift focus **from** finding optimal parameters **to** embedding data into a hard-to-simulate space where *classical* optimisation (SVM) finishes the job.

---

## 1) Classical kernel methods refresher  

- **Feature map** $\phi:\mathbb{R}^d\to\mathcal{H}$.  
- **Kernel** $K(x,x')=\langle\phi(x),\phi(x')\rangle$.  
- Dual SVM optimisation (hard margin):
  $$
  \max_{\boldsymbol{\alpha}}\; \sum_i\alpha_i -\frac12\sum_{i,j}\alpha_i\alpha_j y_i y_j K(x_i,x_j)
  \quad\text{s.t. } \alpha_i\ge0, \, \sum_i \alpha_i y_i=0.
  $$
- Solution uses kernel matrix $K$ only; no explicit $\phi(x)$ needed → **kernel trick**.
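
As a concrete illustration of the kernel trick, the sketch below uses the homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$ (an example chosen here, not one from the slides). The function names are illustrative. The explicit feature map and the input-space formula $K(x,x')=(x\cdot x')^2$ should agree up to rounding.

```python
import math

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def kernel_explicit(x, xp):
    """Inner product computed in the 3-dimensional feature space."""
    return sum(a * b for a, b in zip(phi(x), phi(xp)))

def kernel_trick(x, xp):
    """Same value from the input-space dot product: K(x, x') = (x . x')^2."""
    return sum(a * b for a, b in zip(x, xp)) ** 2

x, xp = (1.0, 2.0), (3.0, -1.0)
print("explicit feature map:", kernel_explicit(x, xp))
print("kernel trick        :", kernel_trick(x, xp))
```

The SVM dual only ever calls something like `kernel_trick`, which is why a quantum device can supply $K$ without anyone writing down $\phi(x)$.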

---

## 2) Quantum feature maps  

### 2.1 Definition  
Prepare $n$-qubit state
$$
|\phi(x)\rangle = U_\phi(x)\,|0\rangle^{\otimes n},
$$
where $U_\phi(x)$ is a data-dependent circuit (often diagonal in computational basis with entangling terms).

### 2.2 Examples  

| Map | Circuit sketch | Non-linearity source |
|-----|----------------|----------------------|
| **ZZFeatureMap** | layers of $H$ + $R_Z(x_i)$ + entangling $R_{ZZ}(x_i x_j)$ | pairwise products & entanglement |
| **PauliFeatureMap** | generalised $e^{-i\sum_k f_k(x) P_k}$ | higher-order monomials |
| **ZFeatureMap** | $H$ + single-qubit $Z$ rotations only | no feature interactions (unentangled baseline) |

Circuit depth scales as (number of layers) × (one single-qubit layer + one entangling layer); on NISQ devices the number of layers is typically kept at ≤ 3.
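
To make the state $|\phi(x)\rangle$ concrete, here is a minimal pure-Python sketch of a one-layer, 2-qubit ZZ-style map. The circuit is an assumption made for illustration (Hadamards, then $R_Z(x_i)$, then $R_{ZZ}(x_0 x_1)$); it is a simplification and not Qiskit's exact `ZZFeatureMap`, and `zz_style_state` is a hypothetical helper name.

```python
import cmath
from itertools import product

def zz_style_state(x):
    """Statevector of a one-layer, 2-qubit ZZ-style map applied to |00>.

    Assumed circuit: H on both qubits, R_Z(x_i) on qubit i, then
    R_ZZ(x_0 * x_1). The rotations are diagonal, so after the Hadamards
    each basis state only picks up a data-dependent phase.
    """
    x0, x1 = x
    amps = []
    for b0, b1 in product((0, 1), repeat=2):
        z0, z1 = 1 - 2 * b0, 1 - 2 * b1   # Z eigenvalue: +1 for bit 0, -1 for bit 1
        angle = -0.5 * (x0 * z0 + x1 * z1 + x0 * x1 * z0 * z1)
        amps.append(0.5 * cmath.exp(1j * angle))   # 0.5 = amplitude after H on both qubits
    return amps

state = zz_style_state((0.3, 1.2))
print("amplitudes:", [f"{a:.3f}" for a in state])
print("norm      :", sum(abs(a) ** 2 for a in state))
```

Every amplitude has magnitude 1/2; the data enter only through the phases, and the product term $x_0 x_1$ is what couples the two features.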

---

## 3) Quantum kernel  

$$
K(x,x') = |\langle\phi(x)|\phi(x')\rangle|^2
        = \bigl|\langle 0|\,U_\phi^\dagger(x) U_\phi(x')\,|0\rangle\bigr|^2.
$$
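
As a worked instance of this definition, consider an unentangled map that applies $H$ and then $R_Z(x_i)$ to each qubit. Its per-qubit overlap is $\cos\bigl((x_i-x_i')/2\bigr)$, so the kernel factorises as $K(x,x')=\prod_i\cos^2\bigl((x_i-x_i')/2\bigr)$. The sketch below checks this closed form against a direct overlap computation; the function names are illustrative.

```python
import cmath
import math

def qubit_state(x):
    """Amplitudes of |0> and |1> after H followed by R_Z(x) on |0>."""
    s = 1 / math.sqrt(2)
    return (s * cmath.exp(-0.5j * x), s * cmath.exp(0.5j * x))

def z_map_kernel(x, xp):
    """K(x, x') = |<phi(x)|phi(x')>|^2 for the product (unentangled) map."""
    overlap = 1.0 + 0j
    for xi, xpi in zip(x, xp):
        a, b = qubit_state(xi), qubit_state(xpi)
        # Overlap of a product state is the product of per-qubit overlaps.
        overlap *= a[0].conjugate() * b[0] + a[1].conjugate() * b[1]
    return abs(overlap) ** 2

def closed_form(x, xp):
    """Product of cos^2 terms derived by hand for the same map."""
    return math.prod(math.cos((xi - xpi) / 2) ** 2 for xi, xpi in zip(x, xp))

x, xp = (0.4, -1.1), (1.3, 0.2)
print("overlap computation:", z_map_kernel(x, xp))
print("closed form        :", closed_form(x, xp))
print("K(x, x)            :", z_map_kernel(x, x))
```

Because this kernel is a product of single-feature terms, it is cheap to evaluate classically, which is why entangling maps are the interesting case.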

**Estimation circuit**  
```
|0…0> ──Uφ(x')──Uφ†(x)── Measure Z…Z
```
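The probability of the all-zeros outcome equals $K(x,x')$. With $M$ shots, the fraction of all-zeros outcomes is an unbiased estimator $\hat K$ with variance $K(1-K)/M$, so the standard error falls as $1/\sqrt{M}$.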
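
The variance formula can be checked numerically. The sketch below assumes noise-free hardware and models each shot as a Bernoulli trial that returns all zeros with probability $K$; the values of `k_true`, `shots`, and `repeats` are arbitrary choices for illustration.

```python
import random
import statistics

def estimate_kernel(k_true, shots, rng):
    """Fraction of all-zeros outcomes in `shots` runs of the overlap circuit.

    Each shot is modelled as a Bernoulli trial that yields all zeros
    with probability k_true (no hardware noise).
    """
    hits = sum(1 for _ in range(shots) if rng.random() < k_true)
    return hits / shots

rng = random.Random(0)                 # fixed seed for repeatability
k_true, shots, repeats = 0.3, 200, 2000

estimates = [estimate_kernel(k_true, shots, rng) for _ in range(repeats)]
empirical_var = statistics.pvariance(estimates)
predicted_var = k_true * (1 - k_true) / shots

print("mean estimate:", statistics.fmean(estimates))
print("empirical var:", empirical_var)
print("K(1-K)/M     :", predicted_var)
```

The empirical variance over many repetitions should sit close to $K(1-K)/M$, and quadrupling `shots` should roughly halve the standard error.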

---

## 4) QSVM workflow  

1. **Choose feature map** $U_\phi$.  
2. **Estimate kernel matrix** $K_{ij}=K(x_i,x_j)$ for the training set. By symmetry, a set of size $N$ needs at most $N(N+1)/2$ circuits, or $N(N-1)/2$ if the diagonal is fixed to $K_{ii}=1$.  
3. **Train classical SVM** (e.g., `sklearn.svm.SVC`) with `kernel='precomputed'`.  
4. **Predict** new sample $x_{*}$: estimate kernel vector $K(x_{*},x_i)$ against support vectors.
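
Steps 2 and 4 can be sketched without any quantum hardware by swapping in a classical placeholder kernel. In the sketch below, `stand_in_kernel` is a hypothetical stand-in (a classical RBF value) for the circuit-based estimate, and the helper names are illustrative; the point is the bookkeeping: only the upper triangle is evaluated, and the diagonal is fixed to 1.

```python
import math

def stand_in_kernel(x, xp):
    """Placeholder for a quantum-kernel estimate (here: a classical RBF value)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xp)))

def gram_matrix(points, kernel):
    """Build the symmetric training kernel matrix from the upper triangle only."""
    n = len(points)
    K = [[1.0] * n for _ in range(n)]      # diagonal stays 1 by normalisation
    evaluations = 0
    for i in range(n):
        for j in range(i + 1, n):
            K[i][j] = K[j][i] = kernel(points[i], points[j])
            evaluations += 1               # one circuit per unordered pair
    return K, evaluations

def kernel_vector(x_new, reference_points, kernel):
    """Kernel values between a new sample and the stored training points."""
    return [kernel(x_new, p) for p in reference_points]

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
K, n_circuits = gram_matrix(train, stand_in_kernel)
print("pairs evaluated:", n_circuits, "of", len(train) ** 2, "matrix entries")
print("test row       :", kernel_vector((0.5, 0.5), train, stand_in_kernel))
```

The matrix `K` is what would be passed to `SVC(kernel='precomputed')` in step 3, and the prediction is $f(x_{*})=\operatorname{sign}\bigl(\sum_i \alpha_i y_i K(x_i,x_{*})+b\bigr)$, where only support vectors have $\alpha_i\neq 0$.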

---

## 5) Qiskit implementation demo (iris binary subset)

```python
# NOTE: legacy API. `BasicAer` was removed in Qiskit 1.0, and later
# qiskit-machine-learning releases replace `QuantumKernel` with
# `FidelityQuantumKernel`; run this listing with matching older versions.
from qiskit import BasicAer
from qiskit_machine_learning.kernels import QuantumKernel
from qiskit.circuit.library import ZZFeatureMap
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# --- dataset: 2-class subset of iris ---
X, y = load_iris(return_X_y=True)
X = X[y != 2][:, :2]         # keep classes 0/1 and the first 2 features (one per qubit)
y = y[y != 2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)   # fixed seed for repeatability
scaler = StandardScaler().fit(X_train)                  # fit on training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# --- quantum kernel ---
num_qubits = 2
feature_map = ZZFeatureMap(feature_dimension=num_qubits)
backend = BasicAer.get_backend('statevector_simulator')   # exact, noise-free
quantum_kernel = QuantumKernel(feature_map=feature_map, quantum_instance=backend)

# --- QSVM: SVC calls quantum_kernel.evaluate(A, B) to obtain each kernel block ---
svm = SVC(kernel=quantum_kernel.evaluate)
svm.fit(X_train, y_train)
pred = svm.predict(X_test)
print("QSVM accuracy:", accuracy_score(y_test, pred))
```

Replace `statevector_simulator` with `AerSimulator(noise_model=…)` or a hardware backend for realistic, shot-based runs.

---

## 6) Performance & noise  

| Factor | Effect | Mitigation |
|--------|--------|------------|
| Two-qubit error | Lowers kernel fidelity → bias | Dynamical decoupling, gate-folding ZNE |
| Shot noise | Kernel variance | Importance-weighted shot allocation |
| $O(N^2)$ circuits | Training bottleneck | Batching, kernel interpolation, random subsampling |
| Effective dimension | Overfitting risk | Regularisation $C$ in SVM |
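
The gate-folding ZNE entry can be illustrated with a toy model. The sketch below assumes a global depolarising channel whose survival probability decays exponentially with the noise scale factor (an assumption for illustration, not a hardware model); folding each gate $G$ into $G G^\dagger G$ corresponds to scale factor 3. The extrapolation is a plain linear fit through the two measured points.

```python
def noisy_kernel(k_true, scale, survival=0.9, n_qubits=2):
    """Toy noise model (an assumption): global depolarising channel.

    With probability survival**scale the ideal circuit runs; otherwise the
    output is maximally mixed and all zeros appears with probability 1/2^n.
    """
    s = survival ** scale
    return s * k_true + (1 - s) / 2 ** n_qubits

def linear_zne(k_at_1, k_at_3):
    """Linear extrapolation to scale 0 from measurements at scales 1 and 3."""
    return (3 * k_at_1 - k_at_3) / 2

k_true = 0.6
k1 = noisy_kernel(k_true, scale=1)   # unfolded circuit
k3 = noisy_kernel(k_true, scale=3)   # every gate folded once
k_zne = linear_zne(k1, k3)

print("ideal      :", k_true)
print("scale 1    :", k1)
print("scale 3    :", k3)
print("linear ZNE :", k_zne)
```

In this toy model the extrapolated value lands closer to the ideal kernel than the unfolded measurement does, at the price of running a second, deeper circuit for every pair.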

**Research frontier:** identify data distributions where a quantum feature map yields provably hard-to-simulate kernels (Havlíček et al. 2019: “feature-map hardness conjecture”).

---

## 7) Mini-exercises (answers in Appendix)

1. Show $\langle\phi(x)|\phi(x)\rangle=1$ ⇒ diagonal of kernel matrix is 1.  
2. Implement ZFeatureMap ($U_\phi(x)=\prod_i R_Z(x_i)H$) and compare QSVM accuracy to linear SVM on iris subset.  
3. For a 3-qubit ZZFeatureMap with depth 2, count number of two-qubit gates.  
4. Explain how read-out error biases kernel estimate and propose correction.  
5. Under depolarising error $p=0.01$ per two-qubit gate, approximate kernel shrinkage factor for depth-2 ZZFeatureMap on 3 qubits.

---

## 8) FAQ  

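- **“Do I still need a classical optimiser?”** Yes, but only the SVM solver's convex optimisation; no variational loop over circuit parameters is required. Feature-map hyperparameters (if any) may still need tuning.  
- **“Why the squared overlap?”** The raw overlap $\langle\phi(x)|\phi(x')\rangle$ is complex and its phase is not directly observable, whereas $|\langle\phi(x)|\phi(x')\rangle|^2$ is a real probability that the overlap circuit measures directly. It is still a valid PSD kernel because it equals $\mathrm{Tr}[\rho(x)\rho(x')]$, an inner product of density matrices.  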
- **“Can I reuse shots across pairs?”** No—each $U_\phi^\dagger(x_i)U_\phi(x_j)$ needs its own circuit; but batching on hardware reduces calibration overhead.  
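- **“What about multiclass?”** Use one-vs-rest or one-vs-one SVMs on the same quantum kernel (`SVC` uses one-vs-one internally); dedicated quantum multiclass strategies are a further option.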

---

## 9) Summary (Session 10)

- **Quantum kernels** embed data via feature-map circuits; similarity is squared state overlap.  
- **QSVM** pairs quantum kernel estimation with classical SVM optimisation.  
- Implementation in Qiskit: `ZZFeatureMap` + `QuantumKernel` + `SVC`.  
- Potential advantage: highly non-linear embeddings in $2^n$-dimensional Hilbert space with few qubits, though noise and $O(N^2)$ circuit scaling are practical hurdles.  
- Research aims to pinpoint regimes where quantum kernels beat RBF and polynomial kernels.

---

## 10) Looking ahead  

- **Next Session:** Quantum Feature Selection & data re-uploading circuits.  
- **Homework 3 tasks:**  
  - Evaluate QSVM vs RBF on moons dataset (noise-free & noisy).  
  - Implement shot-adaptive kernel estimator.  
  - Mini-exercise write-ups.

---

## Appendix — mini-exercise solutions (sketch)

1. Normalisation: $U_\phi(x)$ is unitary, so $\langle\phi(x)|\phi(x)\rangle = 1$ and $K_{ii}=|\langle\phi(x_i)|\phi(x_i)\rangle|^2=1$.  
2. Linear SVM accuracy 0.93; ZFeatureMap QSVM 0.93 (no gain) → need entangling terms for advantage.  
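3. Depth-2, 3 qubits: each layer entangles every pair, $\binom{3}{2}=3$ $R_{ZZ}$ gates per layer, so 6 two-qubit $R_{ZZ}$ gates in total (12 CNOTs if each $R_{ZZ}$ is compiled as CX, $R_Z$, CX).  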
4. Read-out flips 0↔1 with matrix $A$; invert $A$ on probability vector before kernel computation.  
5. In a simple model where each two-qubit gate shrinks off-diagonal elements by $(1-2p)$ (the exact factor depends on the depolarising convention), 6 gates give a factor ≈ $(0.98)^6≈0.886$.
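
Solution 4 can be sketched for a single measured qubit. The error rates below are assumed values, and the helper names are illustrative; for $n$ qubits the same idea applies with a $2^n\times 2^n$ matrix (or a tensor product of per-qubit matrices if errors are uncorrelated).

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Matrix-vector product for a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

e01, e10 = 0.02, 0.05        # assumed P(read 1 | prepared 0), P(read 0 | prepared 1)
A = [[1 - e01, e10],
     [e01, 1 - e10]]         # columns: prepared state; rows: read-out result

p_true = [0.7, 0.3]          # p_true[0] plays the role of the kernel value
p_meas = mat_vec(A, p_true)                # what the device would report
p_corr = mat_vec(invert_2x2(A), p_meas)    # read-out-corrected distribution

print("measured P(0) :", p_meas[0])
print("corrected P(0):", p_corr[0])
```

With finite shots the corrected vector can contain small negative entries, so in practice it is usually clipped or fitted back onto valid probabilities.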

