# Quantum Approximate Optimization Algorithm (QAOA) — Detailed Notes (Session 8)
**Course:** CS490/5590 — Quantum Computing Applications in Data Science, AI, & Deep Learning  
**Instructor:** Luke Miller  

> **Purpose.**  These notes expand the Session-8 slides, giving a stand-alone explanation of QAOA, its implementation in Qiskit, and its relevance to combinatorial-optimization tasks in data science. Mini-exercises with concise solutions appear at the end.

---

## Session Road-map  

1. Recap: Phase estimation vs. variational depth  
2. Combinatorial optimisation landscape  
3. QAOA foundations — cost & mixer Hamiltonians  
4. Worked example: MaxCut on a 3-vertex graph  
5. Variational parameter optimisation loop  
6. QAOA in Qiskit (QuadraticProgram → circuit)  
7. Performance, noise, barren plateaus  
8. AI / DS use-cases  
9. Q&A  

---

## 0) Why QAOA after Shor/QPE?

| Method | Strength | Weakness |
|--------|----------|----------|
| **QPE/Shor** | Provable speedups | Requires deep, error-corrected circuits |
| **QAOA** | Shallow variational ansatz, NISQ-friendly | No proven exponential speedup; classical optimiser overhead |

---

## 1) Combinatorial optimisation primer  

| Problem | Objective | Typical ML link |
|---------|-----------|-----------------|
| MaxCut | maximise $\sum_{(i,j)\in E}\tfrac{1}{2}(1-Z_iZ_j)$ (Ising form, $Z_i=\pm 1$) | Graph clustering / community detection |
| TSP | minimise total tour length $\sum_t d_{\pi(t),\pi(t+1)}$ over orderings $\pi$ | Routing, sequence alignment |
| Knapsack | maximise $\sum v_i x_i$ s.t. $\sum w_i x_i \le W$, $x_i\in\{0,1\}$ | Feature subset selection |

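All three are NP-hard, so classical practice relies on heuristics or relaxations. The hope is that QAOA, by sampling from a tuned quantum state, gives better approximate solutions for certain instance classes; this is not established in general.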
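To make the MaxCut objective in the table concrete, here is a minimal brute-force sketch. It evaluates the cut value both by counting edges and through the Ising form with $z_i = 1-2x_i$; the function names are illustrative and not part of any library.

```python
from itertools import product

def cut_value(bits, edges):
    """Number of edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for i, j in edges if bits[i] != bits[j])

def ising_cut_value(bits, edges):
    """Same quantity via the Ising form, with spin z = +1 for bit 0 and -1 for bit 1."""
    z = [1 - 2 * b for b in bits]
    return sum((1 - z[i] * z[j]) / 2 for i, j in edges)

def brute_force_maxcut(n, edges):
    """Enumerate all 2**n assignments; only viable for small n."""
    return max(product((0, 1), repeat=n), key=lambda bits: cut_value(bits, edges))

triangle = [(0, 1), (1, 2), (0, 2)]
best = brute_force_maxcut(3, triangle)
print(best, cut_value(best, triangle), ising_cut_value(best, triangle))
```

The exhaustive search scales as $2^n$, which is exactly why approximate methods such as QAOA are of interest.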

---

## 2) QAOA overview  

- **Input**: problem encoded as cost Hamiltonian $H_C$ (diagonal in computational basis).  
- **Ansatz** of depth $p$:

$$
|\psi_{p}(\boldsymbol{\gamma},\boldsymbol{\beta})\rangle
  := \left[\prod_{k=1}^{p}e^{-i\beta_k H_B}e^{-i\gamma_k H_C}\right]\,|+\rangle^{\otimes n},
$$
  where $H_B=\sum_i X_i$.

- **Objective**: maximise $F(\boldsymbol{\gamma},\boldsymbol{\beta})=\langle\psi_p|H_C|\psi_p\rangle$.

- **Hybrid loop**: evaluate $F$ via sampling; classical optimiser updates parameters.

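- **Depth** $p$ ↔ quality: the optimal $F$ cannot decrease with $p$ (setting $\gamma_p=\beta_p=0$ recovers depth $p-1$), at the cost of deeper circuits and more parameters; $p=1$ is often competitive with simple heuristics.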
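The ansatz and objective above can be evaluated exactly for small $n$ without any quantum SDK. The following is a minimal pure-Python statevector sketch for MaxCut, assuming $H_C=\tfrac12\sum_{(i,j)}(I-Z_iZ_j)$ and the layer order of the formula above (cost layer first, then mixer); `qaoa_expectation` is an illustrative name, not a library function.

```python
import cmath
import math

def qaoa_expectation(n, edges, gammas, betas):
    """Exact <psi_p|H_C|psi_p> for MaxCut on n vertices."""
    dim = 2 ** n
    # Diagonal of H_C: cut value of each bit string (bit q of the index is vertex q).
    cost = [sum(1 for i, j in edges if ((s >> i) & 1) != ((s >> j) & 1))
            for s in range(dim)]
    # Start in the uniform superposition |+>^n.
    state = [complex(1 / math.sqrt(dim))] * dim
    for gamma, beta in zip(gammas, betas):
        # Cost layer: phase exp(-i * gamma * C(s)) on every basis state.
        state = [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, cost)]
        # Mixer layer: exp(-i * beta * X) = cos(beta) I - i sin(beta) X on each qubit.
        cb, sb = math.cos(beta), -1j * math.sin(beta)
        for q in range(n):
            mask = 1 << q
            new = state[:]
            for s in range(dim):
                if not s & mask:
                    a0, a1 = state[s], state[s | mask]
                    new[s] = cb * a0 + sb * a1
                    new[s | mask] = sb * a0 + cb * a1
            state = new
    return sum(abs(a) ** 2 * c for a, c in zip(state, cost))

triangle = [(0, 1), (1, 2), (0, 2)]
print(qaoa_expectation(3, triangle, [0.0], [0.0]))    # no evolution: uniform average
print(qaoa_expectation(3, triangle, [0.7], [0.34]))   # p = 1 with tuned-looking angles
```

In the hybrid loop this exact value would be replaced by a shot-based estimate, and the classical optimiser would call the function repeatedly with updated angles.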

---

## 3) MaxCut: canonical benchmark  

### 3.1 Encoding  

For unweighted graph $G=(V,E)$:

$$
H_C = \frac12\sum_{(i,j)\in E}(I-Z_iZ_j).
$$

Cut value = expectation of $H_C$.

### 3.2 Cost-unitary  

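Each edge $(i,j)$ contributes, up to a global phase:

$$
e^{-i\gamma (I-Z_iZ_j)/2} =
  e^{-i\gamma/2}\,\mathrm{CX}_{ij}\,R_Z^{(j)}(-\gamma)\,\mathrm{CX}_{ij},
$$
where $\mathrm{CX}_{ij}$ is a CNOT with control $i$ and target $j$, and $R_Z(\theta)=e^{-i\theta Z/2}$. A CZ sandwich does not work here: CZ is diagonal and commutes with $R_Z$, so the two CZ gates would cancel and leave only single-qubit rotations.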
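As a sanity check on the edge unitary, the sketch below multiplies out a CX, $R_Z$, CX sandwich as plain 4×4 matrices and compares it with $e^{-i\gamma(I-Z_iZ_j)/2}$. It assumes the basis ordering $|q_i q_j\rangle$ with qubit $i$ as control and $R_Z(\phi)=e^{-i\phi Z/2}$; the helper names are illustrative.

```python
import cmath

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def diag(entries):
    """Diagonal matrix with the given entries."""
    n = len(entries)
    return [[entries[r] if r == c else 0 for c in range(n)] for r in range(n)]

# CNOT with control = first qubit, target = second qubit (basis 00, 01, 10, 11).
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def rz_on_target(phi):
    """I (x) RZ(phi) with RZ(phi) = exp(-i * phi * Z / 2)."""
    lo, hi = cmath.exp(-0.5j * phi), cmath.exp(0.5j * phi)
    return diag([lo, hi, lo, hi])

def cost_edge_unitary(gamma):
    """exp(-i * gamma * (I - ZZ) / 2): phase only where the two bits differ."""
    return diag([1, cmath.exp(-1j * gamma), cmath.exp(-1j * gamma), 1])

gamma = 0.7
sandwich = matmul(CX, matmul(rz_on_target(-gamma), CX))
global_phase = cmath.exp(-0.5j * gamma)
target = cost_edge_unitary(gamma)
max_err = max(abs(global_phase * sandwich[r][c] - target[r][c])
              for r in range(4) for c in range(4))
print(max_err)
```

The same check with any other angle shows that the rotation angle on the target qubit must be $-\gamma$, not $-2\gamma$, for this normalisation of $H_C$.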

### 3.3 Mixer-unitary  

Single-qubit $R_X(2\beta)$ on all qubits.
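The factor of two follows from the rotation-gate convention. A short derivation, assuming the usual definition $R_X(\theta)=e^{-i\theta X/2}$ and using that the $X_i$ commute with each other:

$$
e^{-i\beta H_B} = \prod_{i=1}^{n} e^{-i\beta X_i},
\qquad
e^{-i\beta X} = \cos\beta\, I - i\sin\beta\, X = R_X(2\beta).
$$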

---

## 4) Worked example: triangle graph (3 vertices) with $p=1$

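```python
from qiskit import QuantumCircuit

gamma, beta = 0.7, 0.34
qc = QuantumCircuit(3)

# 1. Prepare |+>^3
qc.h(range(3))

# 2a. Cost unitary exp(-i*gamma*H_C), up to a global phase (edges 01, 12, 02).
#     CX, RZ(phi), CX implements exp(-i*phi*Z_i*Z_j/2); H_C contains -Z_i*Z_j/2,
#     so the rotation angle is phi = -gamma.
edges = [(0, 1), (1, 2), (0, 2)]
for i, j in edges:
    qc.cx(i, j)
    qc.rz(-gamma, j)
    qc.cx(i, j)

# 2b. Mixer unitary exp(-i*beta*X) = RX(2*beta) on every qubit
qc.rx(2 * beta, range(3))

qc.measure_all()
print(qc.draw('text'))
```
These angles sit close to the $p=1$ optimum for the triangle, $(\gamma,\beta)\approx(0.62,\,0.31)$, where the expected cut value reaches the classical optimum of 2. With the values above the expected cut is about 1.98. For comparison, uniformly random bit strings average 1.5, so a sampled mean near 1.5 indicates untuned angles.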
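Turning measurement results into a cut-value estimate is a small post-processing step. The sketch below assumes a `{bitstring: count}` dictionary in Qiskit's printing order (highest-index qubit first); the counts shown are made up purely for illustration and are not output from a real run.

```python
def estimate_cut(counts, edges):
    """Shot-weighted average cut value from a {bitstring: count} dictionary."""
    shots = sum(counts.values())
    total = 0
    for bitstring, count in counts.items():
        bits = bitstring[::-1]  # reverse so that bits[q] belongs to qubit q
        total += count * sum(1 for i, j in edges if bits[i] != bits[j])
    return total / shots

edges = [(0, 1), (1, 2), (0, 2)]

# Hypothetical counts (1024 shots), invented for this example only.
example_counts = {
    "001": 170, "010": 165, "100": 172,
    "011": 168, "101": 163, "110": 174,
    "000": 6, "111": 6,
}
estimate = estimate_cut(example_counts, edges)
print(estimate)
```

Every bit string except `000` and `111` cuts two edges of the triangle, so the estimate is simply twice the fraction of shots that avoid those two outcomes.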

---

## 5) Parameter-optimisation strategies  

| Optimiser | Pros | Cons |
|-----------|------|------|
| COBYLA | Derivative-free, robust | Many iterations |
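| SPSA | Two cost evaluations per step regardless of parameter count; tolerates shot noise | Sensitive to gain hyper-params |
| Bayesian optimisation (external library; Qiskit Aqua is deprecated) | Sample-efficient | Overhead grows with dim |

**Warm-starting**: use analytic optima for small instances, or interpolate the depth-$p$ optimum to initialise depth $p+1$; a good starting point helps the optimiser avoid flat regions of the landscape.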
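To illustrate how a stochastic optimiser drives the hybrid loop, here is a minimal SPSA-style sketch. It is a simplification under stated assumptions: constant gains `a` and `c` instead of the decaying sequences used in practice, a noise-free objective, and the closed-form $p=1$ expectation for a single edge, $F=\tfrac12+\tfrac12\sin 4\beta\,\sin\gamma$, which follows from the conventions of Section 2. All names are illustrative.

```python
import math
import random

def single_edge_objective(gamma, beta):
    """Closed-form p=1 MaxCut expectation for one edge."""
    return 0.5 + 0.5 * math.sin(4 * beta) * math.sin(gamma)

def spsa_maximise(objective, theta, iterations=500, a=0.1, c=0.1, seed=0):
    """Simultaneous-perturbation ascent with constant gains (simplified SPSA)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        # Perturb all parameters at once along a random +/-1 direction.
        delta = [rng.choice((-1, 1)) for _ in theta]
        plus = objective(*[t + c * d for t, d in zip(theta, delta)])
        minus = objective(*[t - c * d for t, d in zip(theta, delta)])
        slope = (plus - minus) / (2 * c)
        # Gradient estimate for component k is slope / delta[k] = slope * delta[k].
        theta = [t + a * slope * d for t, d in zip(theta, delta)]
    return theta

start = (0.5, 0.2)
final = spsa_maximise(single_edge_objective, start)
print(final, single_edge_objective(*final))
```

Each iteration costs two objective evaluations no matter how many parameters there are, which is the property that makes SPSA attractive when every evaluation means a batch of shots.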

---

## 6) QAOA in Qiskit  

```python
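import numpy as np
from qiskit_optimization.applications import Maxcut
from qiskit_optimization.algorithms import MinimumEigenOptimizer
# QAOA and the optimisers live in the separate qiskit-algorithms package;
# Qiskit releases before 1.0 exposed them as qiskit.algorithms instead.
from qiskit_algorithms import QAOA
from qiskit_algorithms.optimizers import COBYLA
from qiskit.primitives import Sampler  # reference sampler; class name varies by release

# Adjacency matrix of the triangle (all weights 1)
w = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
problem = Maxcut(w).to_quadratic_program()

qaoa = QAOA(Sampler(), optimizer=COBYLA(maxiter=50), reps=1)
solver = MinimumEigenOptimizer(qaoa)
result = solver.solve(problem)
print("Best bit string:", result.x)
print("MaxCut value:", result.fval)  # objective value of the returned assignment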
```

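For a NISQ back-end, swap the reference `Sampler()` for a hardware-backed sampler primitive (for example the one in `qiskit_ibm_runtime`) and switch on its read-out mitigation. Treat `Sampler(backend=provider.get_backend('ibmq_quito'), resilience_level=1)` as schematic: the constructor arguments and the available devices depend on the installed release.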

---

## 7) Performance considerations  

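- **Approximation ratio** improves with $p$. At $p=1$ QAOA cuts 3/4 of the edges in expectation on rings with $n\ge 4$ (ratio 0.75 for even $n$) and guarantees at least 0.6924 on 3-regular graphs; empirically 0.7–0.9 on random graphs.  
- **Noise**: two-qubit gate count ∝ $p|E|$ (two CX per edge per layer, before routing). On a 15-edge graph, $p=3$ uses $2\cdot 15\cdot 3 = 90$ CX gates, more once SWAPs for limited connectivity are added → feasible with error-mitigation.  
- **Barren plateaus**: gradient variance decays exponentially in $n$ ($\sim e^{-n}$) for a generic ansatz; structured QAOA appears less susceptible at small $p$.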
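The gate-count estimate above is easy to reproduce. The sketch below assumes two CX gates per edge per layer (the CX, $R_Z$, CX decomposition, ignoring routing overhead) and a crude model in which every CX fails independently with the same probability; both functions are illustrative helpers.

```python
def cx_count(num_edges, p, cx_per_edge=2):
    """Two-qubit gate count of a depth-p MaxCut QAOA circuit, before routing."""
    return cx_per_edge * num_edges * p

def no_error_probability(num_gates, error_rate):
    """Probability that none of the gates fails, assuming independent errors."""
    return (1 - error_rate) ** num_gates

print(cx_count(15, 3))                               # 15-edge graph at p = 3
print(cx_count(3, 3))                                # triangle at p = 3
print(no_error_probability(cx_count(3, 3), 0.01))    # 1 % error per CX
```

Real devices add SWAP gates for non-adjacent qubits, so these counts are lower bounds on the hardware cost.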

---

## 8) AI / DS use-cases  

| Task | QAOA role |
|------|-----------|
| **Graph clustering** | MaxCut / community detection as cost Hamiltonian. |
| **Feature selection** | Binary string selects features; cost = validation loss + penalty. |
| **Portfolio optimisation** | QUBO encoding risk-return trade-off. |
| **Neural architecture search** | Discrete hyper-parameters mapped to Ising variables. |

**Hybrid flow**: classical pre-screening → QAOA on reduced instance → classical post-processing.
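As a toy illustration of the feature-selection row, the sketch below builds a three-variable QUBO and solves it by enumeration. It swaps the validation-loss cost named in the table for a simpler relevance-minus-redundancy proxy, and every number in it is hypothetical; in the hybrid flow, QAOA would replace the brute-force `min` on the reduced instance.

```python
from itertools import product

# Hypothetical per-feature relevance scores and pairwise redundancy.
relevance = [0.9, 0.8, 0.1]
redundancy = {(0, 1): 0.85}   # features 0 and 1 carry overlapping information
penalty_weight = 1.0

def qubo_cost(x):
    """Cost to minimise: negative relevance of the chosen features plus redundancy penalty."""
    gain = sum(r * xi for r, xi in zip(relevance, x))
    overlap = sum(w * x[i] * x[j] for (i, j), w in redundancy.items())
    return -gain + penalty_weight * overlap

best = min(product((0, 1), repeat=len(relevance)), key=qubo_cost)
print(best, qubo_cost(best))
```

The penalty term makes it cheaper to pair the strongest feature with a weak but independent one than with its redundant neighbour.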

---

## 9) Mini-exercises (answers in Appendix)

1. Show that for a single edge, $e^{-i\gamma Z_iZ_j/2}$ equals a CX-sandwiched $R_Z$ rotation as used above.  
2. Derive analytic optimum $(\gamma^*,\beta^*)$ for MaxCut on a 2-vertex graph.  
3. Implement QAOA $p=2$ on a square (4 vertices, ring) in Qiskit; report approximation ratio.  
4. Explain why increasing $p$ mitigates barren plateaus on small graphs but may re-introduce them on large $n$.  
5. Using a depolarising error rate $\epsilon=0.01$ per CNOT, estimate the success-probability loss for $p=3$ QAOA on the triangle (3 edges × 2 CNOTs × 3 layers = 18 CNOTs).

---

## 10) FAQ  

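- **“Why choose $H_B=\sum X_i$?”** For unconstrained problems over binary ($\pm 1$) variables, this mixer connects every bit string to its single-bit-flip neighbours and has the initial state $|+\rangle^{\otimes n}$ as an eigenstate. Constrained problems need custom mixers (e.g., XY-mixer for fixed-weight).  
- **“Does QAOA guarantee better than classical?”** Not proved. At $p=1$ the guaranteed ratio on 3-regular graphs (0.6924) is below the classical Goemans-Williamson guarantee (≈ 0.878), while $p\to\infty$ approaches the optimum.  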
- **“How to initialise parameters?”** Grid search for $p=1$; Fourier heuristic or interpolation for larger $p$.  
- **“What about gradient-based optimisation?”** Parameter-shift rule works but may suffer from shot noise.  
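The grid-search initialisation for $p=1$ can be sketched on a closed-form landscape. The example assumes the per-edge expectation for a ring with $n\ge 4$ under the conventions of Section 2, $F=\tfrac12+\tfrac14\sin 4\beta\,\sin 2\gamma$, and restricts the grid to one period quadrant so the maximum is unique; the names are illustrative.

```python
import math

def ring_edge_expectation(gamma, beta):
    """Per-edge p=1 MaxCut expectation on a ring with at least 4 vertices."""
    return 0.5 + 0.25 * math.sin(4 * beta) * math.sin(2 * gamma)

def grid_search(objective, gamma_max, beta_max, steps):
    """Evaluate the objective on a (steps+1) x (steps+1) grid and keep the best point."""
    best = (-math.inf, None, None)
    for a in range(steps + 1):
        gamma = gamma_max * a / steps
        for b in range(steps + 1):
            beta = beta_max * b / steps
            value = objective(gamma, beta)
            if value > best[0]:
                best = (value, gamma, beta)
    return best

value, gamma_star, beta_star = grid_search(
    ring_edge_expectation, math.pi / 2, math.pi / 4, 32
)
print(value, gamma_star, beta_star)
```

On hardware each grid point costs a batch of shots, so a coarse grid followed by a local optimiser is usually the more economical choice.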

---

## 11) Summary (Session 8)

- QAOA alternates cost-driven and mixer evolutions; a depth-$p$ circuit has $2p$ layers and $2p$ parameters.  
- MaxCut serves as canonical benchmark; cost unitary uses ZZ rotations; mixer uses RX rotations.  
- Hybrid loop tunes $\gamma,\beta$ to maximise expected cost; classical optimiser is key.  
- Shallow circuits make QAOA compatible with NISQ, but noise and barren plateaus remain hurdles.  
- Practical applications span graph partitioning, feature selection, portfolio optimisation.

---

## 12) Looking ahead  

- **Next Session:** Variational Quantum Eigensolver (VQE) for chemistry and kernel-based QML.  
- **Homework 3:** formulate feature-selection QUBO, implement $p=2$ QAOA, compare to greedy feature ranking.

---

## Appendix — mini-exercise solutions (sketch)

1. Conjugation identity: $CX\,(I\otimes R_Z(\phi))\,CX = e^{-i\phi Z_iZ_j/2}$.  
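2. Two-vertex graph: with $H_C=\tfrac12(I-Z_1Z_2)$ the $p=1$ expectation is $F=\tfrac12+\tfrac12\sin 4\beta\,\sin\gamma$, so the optimum $\gamma^*=\pi/2,\; \beta^*=\pi/8$ achieves ratio 1.  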
3. Simulation shows ratio ≥ 0.92 for ring-4 at $p=2$ with tuned params.  
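4. On small graphs a larger $p$ over-parameterises the ansatz ($2p$ parameters), which makes good optima easier to reach; on large $n$ a deep circuit behaves like a generic ansatz and the landscape flattens for random initialisation.  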
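5. With error rate $\epsilon=0.01$ and 18 CNOTs (2 per edge × 3 edges × 3 layers), the success drop is $1-(1-\epsilon)^{18}\approx 16.5\,\%$; counting only 9 CNOTs gives $1-(1-\epsilon)^{9}\approx 8.6\,\%$.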
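
For exercise 2, a sketch of the derivation in the Heisenberg picture, assuming $H_C=\tfrac12(I-Z_1Z_2)$, $H_B=X_1+X_2$ and the ansatz of Section 2. The mixer rotates each $Z$ into the $Z$-$Y$ plane:

$$
e^{i\beta(X_1+X_2)}\,Z_1Z_2\,e^{-i\beta(X_1+X_2)}
 = (\cos 2\beta\,Z_1+\sin 2\beta\,Y_1)(\cos 2\beta\,Z_2+\sin 2\beta\,Y_2).
$$

In the state $e^{-i\gamma H_C}|{+}{+}\rangle$ the $Z_1Z_2$ and $Y_1Y_2$ terms have zero expectation, while $Z_1Y_2$ and $Y_1Z_2$ each contribute $-\sin\gamma$, so

$$
\langle Z_1Z_2\rangle = -2\sin 2\beta\cos 2\beta\,\sin\gamma = -\sin 4\beta\,\sin\gamma,
\qquad
F(\gamma,\beta)=\tfrac12\bigl(1-\langle Z_1Z_2\rangle\bigr)=\tfrac12+\tfrac12\sin 4\beta\,\sin\gamma .
$$

The maximum $F=1$ is reached at $\sin 4\beta=\sin\gamma=1$, i.e. $\gamma=\pi/2$, $\beta=\pi/8$.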

