# QuartumSE Benchmark Suite

This notebook demonstrates the **unified benchmark suite** API for running publication-grade benchmarks.

## The 8 Tasks (Measurements Bible)

| Task | Name | Question |
|------|------|----------|
| 1 | Worst-Case Guarantee | What is the smallest N* at which the maximum SE over all observables is ≤ ε? |
| 2 | Average Target | What is the smallest N* at which the mean SE over all observables is ≤ ε? |
| 3 | Fixed Budget | What is the SE distribution at fixed N? |
| 4 | Dominance | Which protocol wins on more observables? |
| 5 | Pilot Selection | How should the budget be split between the pilot and the main run? |
| 6 | Bias-Variance | How does MSE decompose into bias² + variance? |
| 7 | Noise Sensitivity | How does performance degrade with noise? |
| 8 | Adaptive Efficiency | Can we reallocate budget based on the pilot run? |
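
The decomposition behind Task 6 can be checked numerically. The sketch below is not part of QuartumSE. It uses synthetic ±1 measurement outcomes and a deliberately biased estimator (a sample mean shrunk by 10%) to show that the empirical MSE across replicates equals bias² + variance. The helper `mse_decomposition`, the true value 0.6, and the shot counts are all illustrative assumptions.

```python
import random
import statistics

def mse_decomposition(estimates, true_value):
    """Split the empirical MSE of replicate estimates into bias^2 and variance."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value
    # Population variance (divide by R, not R - 1) so the identity holds exactly.
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return mse, bias ** 2, variance

rng = random.Random(42)
true_value = 0.6                 # hypothetical expectation value of a Pauli observable
p_plus = (1 + true_value) / 2    # probability of measuring +1
n_shots, n_replicates = 200, 500

estimates = []
for _ in range(n_replicates):
    shots = [1 if rng.random() < p_plus else -1 for _ in range(n_shots)]
    # Shrink the sample mean by 10% to inject a known bias of about -0.06.
    estimates.append(0.9 * statistics.fmean(shots))

mse, bias_sq, variance = mse_decomposition(estimates, true_value)
print(f"MSE={mse:.5f}  bias^2={bias_sq:.5f}  variance={variance:.5f}")
```

Using the population variance rather than the sample variance is what makes the identity exact for a finite number of replicates.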

## Benchmark Modes

| Mode | What it runs | Use case |
|------|--------------|----------|
| `basic` | Tasks 1, 3, 6 + basic report | Quick sanity check |
| `complete` | All 8 tasks + complete report | Publication benchmark |
| `analysis` | Complete + enhanced analysis | Deep dive with statistics |

Each run is saved to its own **unique timestamped directory**, so earlier results are never overwritten.
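
One common way to get non-colliding run directories is to combine a UTC timestamp with a short random suffix. The sketch below illustrates that idea only. The helper `make_run_dir` and the naming scheme are assumptions for illustration, not QuartumSE's actual directory layout.

```python
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

def make_run_dir(base_dir: Path, label: str) -> Path:
    """Create and return a fresh run directory under base_dir (hypothetical helper)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:8]  # guards against two runs within the same second
    run_dir = base_dir / f"{label}_{stamp}_{suffix}"
    # exist_ok=False turns an accidental overwrite into an error.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir

base = Path(tempfile.mkdtemp())  # throwaway base directory for the demo
first = make_run_dir(base, "ghz_4q")
second = make_run_dir(base, "ghz_4q")
print(first.name)
print(second.name)
```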

In [None]:
# --- Setup ---
import sys
sys.path.insert(0, '../src')

from qiskit import QuantumCircuit

from quartumse import (
    run_benchmark_suite,
    BenchmarkMode,
    BenchmarkSuiteConfig,
    generate_observable_set,
    Observable,
    ObservableSet,
)

print("Setup complete!")

---

## 1. Define Circuit and Observables

You can use **any Qiskit circuit** and **any set of observables**.

In [None]:
# --- Circuit ---
N_QUBITS = 4

def build_ghz(n_qubits: int) -> QuantumCircuit:
    """Build GHZ state preparation circuit."""
    qc = QuantumCircuit(n_qubits)
    qc.h(0)
    for i in range(1, n_qubits):
        qc.cx(i - 1, i)
    return qc

circuit = build_ghz(N_QUBITS)
print(circuit.draw('text'))

In [None]:
# --- Observables ---
# Generate random observables with mixed localities
observables = []
for k in range(1, N_QUBITS + 1):
    obs_set = generate_observable_set(
        generator_id='random_pauli',
        n_qubits=N_QUBITS,
        n_observables=5,
        seed=42 + k,
        weight_distribution='fixed',
        fixed_weight=k,
    )
    observables.extend(list(obs_set.observables))

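# Add GHZ stabilizers.
# Note: X⊗N stabilizes the GHZ state for any N, but Z⊗N does so only for
# even N (true here, since N_QUBITS = 4); for odd N its expectation is 0.
observables.extend([
    Observable('Z' * N_QUBITS),
    Observable('X' * N_QUBITS),
])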

obs_set = ObservableSet(
    observables=observables,
    observable_set_id='benchmark_observables',
    generator_id='mixed',
    generator_seed=42,
)

# Build locality map for analysis mode
# (locality = number of non-identity Paulis in the string)
from collections import Counter

locality_map = {
    obs.observable_id: sum(1 for c in obs.pauli_string if c != 'I')
    for obs in observables
}

print(f"Generated {len(obs_set)} observables")
loc_counts = Counter(locality_map.values())
for k in sorted(loc_counts):
    print(f"  K={k}: {loc_counts[k]} observables")

---

## 2. Basic Benchmark

A quick sanity check that runs Tasks 1, 3, and 6.
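
To make the Task 1 question concrete, the sketch below picks the smallest N on a shot grid at which an aggregate of the per-observable SEs drops to ε or below. It is an illustration under stated assumptions, not QuartumSE's implementation: SE is taken to mean standard error, the data are synthetic values following σ/√N with made-up σ, and `smallest_n_meeting_target` is a hypothetical helper. Swapping the aggregate from `max` to a mean gives the Task 2 variant.

```python
import math

def smallest_n_meeting_target(se_by_n, epsilon, aggregate=max):
    """Return the smallest grid N whose aggregated SE is <= epsilon, else None."""
    for n in sorted(se_by_n):
        if aggregate(se_by_n[n]) <= epsilon:
            return n
    return None

def mean(values):
    return sum(values) / len(values)

# Synthetic per-observable SEs: sigma / sqrt(N) with made-up sigmas.
sigmas = [0.5, 1.0, 3.0]
shots_grid = [100, 500, 1000, 5000]
se_by_n = {n: [s / math.sqrt(n) for s in sigmas] for n in shots_grid}

epsilon = 0.05
n_worst = smallest_n_meeting_target(se_by_n, epsilon, aggregate=max)   # Task 1 style
n_mean = smallest_n_meeting_target(se_by_n, epsilon, aggregate=mean)   # Task 2 style
print(f"worst-case N* = {n_worst}, average N* = {n_mean}")
```

The worst-case target is driven by the hardest observable, so it can never be met at a smaller N than the average target.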

In [None]:
%%time
# --- Basic Benchmark ---
config = BenchmarkSuiteConfig(
    mode=BenchmarkMode.BASIC,
    n_shots_grid=[100, 500, 1000],
    n_replicates=5,  # Fewer for quick test
    seed=42,
    output_base_dir="benchmark_results",
)

result = run_benchmark_suite(
    circuit=circuit,
    observable_set=obs_set,
    circuit_id="ghz_4q",
    config=config,
)

In [None]:
# --- View Basic Results ---
print(f"Run ID: {result.run_id}")
print(f"Output: {result.output_dir}")
print(f"Reports: {list(result.reports.keys())}")
print()
print("Protocol summaries:")
for protocol, stats in result.summary.get('protocol_summaries', {}).items():
    # Fall back to NaN so a missing value is not mistaken for a perfect SE of 0
    print(f"  {protocol}: mean_se={stats.get('mean_se', float('nan')):.4f}")

---

## 3. Complete Benchmark (All 8 Tasks)

Publication-grade benchmark with all tasks.

In [None]:
%%time
# --- Complete Benchmark ---
config = BenchmarkSuiteConfig(
    mode=BenchmarkMode.COMPLETE,
    n_shots_grid=[100, 500, 1000, 5000],
    n_replicates=10,
    seed=42,
    epsilon=0.01,
    delta=0.05,
    output_base_dir="benchmark_results",
)

result = run_benchmark_suite(
    circuit=circuit,
    observable_set=obs_set,
    circuit_id="ghz_4q",
    config=config,
)

In [None]:
# --- View Complete Results ---
print(f"Run ID: {result.run_id}")
print(f"Tasks completed: {len(result.all_task_results or {})}")
print()
if result.all_task_results:
    for task_id in sorted(result.all_task_results.keys()):
        print(f"  {task_id}")

---

## 4. Full Analysis Mode

Runs the complete benchmark, then adds enhanced analysis:
- N* interpolation (power-law fitting)
- Per-observable crossover
- Locality correlation
- Bootstrap hypothesis testing
- Cost-normalized metrics
- Multi-pilot fraction analysis
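
The first item, N* interpolation, can be sketched as a log-log least-squares fit. The example below assumes the SE follows a power law SE ≈ a·N^(−b) and solves a·N^(−b) = ε for N. It uses synthetic data with a = 2 and b = 0.5, and the helpers `fit_power_law` and `interpolate_n_star` are hypothetical names, not functions from QuartumSE.

```python
import math

def fit_power_law(ns, ses):
    """Least-squares fit of log(SE) = log(a) - b * log(N); returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(se) for se in ses]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope

def interpolate_n_star(a, b, epsilon):
    """Solve a * N**(-b) = epsilon for N and round up to a whole shot count."""
    return math.ceil((a / epsilon) ** (1.0 / b))

shots_grid = [100, 500, 1000, 5000]
ses = [2.0 / math.sqrt(n) for n in shots_grid]  # synthetic: SE = 2 / sqrt(N)

a, b = fit_power_law(shots_grid, ses)
n_star = interpolate_n_star(a, b, epsilon=0.01)
print(f"a={a:.3f}, b={b:.3f}, N*={n_star}")
```

Here the target ε = 0.01 lies beyond the largest grid point, so the fit extrapolates; with real data that extrapolation is only as trustworthy as the power-law assumption.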

In [None]:
%%time
# --- Full Analysis ---
config = BenchmarkSuiteConfig(
    mode=BenchmarkMode.ANALYSIS,
    n_shots_grid=[100, 500, 1000, 5000],
    n_replicates=20,
    seed=42,
    epsilon=0.01,
    delta=0.05,
    shadows_protocol_id="classical_shadows_v0",
    baseline_protocol_id="direct_grouped",
    output_base_dir="benchmark_results",
)

result = run_benchmark_suite(
    circuit=circuit,
    observable_set=obs_set,
    circuit_id="ghz_4q",
    config=config,
    locality_map=locality_map,
)

In [None]:
# --- Analysis Summary ---
print(f"Run ID: {result.run_id}")
print(f"Output: {result.output_dir}")
print()

if result.analysis:
    print("Analysis Summary:")
    for key, value in result.analysis.summary.items():
        if isinstance(value, float):
            print(f"  {key}: {value:.4f}")
        else:
            print(f"  {key}: {value}")

In [None]:
# --- Display Reports ---
from pathlib import Path

from IPython.display import display, Markdown

print("Generated Reports:")
for name, path in result.reports.items():
    print(f"  {name}: {path}")

# Display the analysis report (Path() also handles a path stored as a string)
if 'analysis' in result.reports:
    report_content = Path(result.reports['analysis']).read_text(encoding='utf-8')
    display(Markdown(report_content))

---

## 5. Custom Circuit Example

Use any circuit you want!

In [None]:
# --- Custom Circuit Example ---
# Example: Random circuit
import numpy as np

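def build_random_circuit(n_qubits: int, depth: int, seed: int = 42) -> QuantumCircuit:
    """Build a random circuit of `depth` layers.

    Each layer applies one random single-qubit gate per qubit, then a CNOT
    on each pair (0, 1), (2, 3), ... with probability 0.5.
    """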
    rng = np.random.default_rng(seed)
    qc = QuantumCircuit(n_qubits)
    
    for _ in range(depth):
        # Random single-qubit gates
        for q in range(n_qubits):
            gate = rng.choice(['h', 'x', 'y', 'z', 's', 't'])
            getattr(qc, gate)(q)
        
        # CNOT on each pair (0, 1), (2, 3), ... with probability 0.5
        for q in range(0, n_qubits - 1, 2):
            if rng.random() > 0.5:
                qc.cx(q, q + 1)
    
    return qc

custom_circuit = build_random_circuit(4, 3)
print(custom_circuit.draw('text'))

# You can now run (keyword arguments, as in the cells above):
# result = run_benchmark_suite(
#     circuit=custom_circuit,
#     observable_set=obs_set,
#     circuit_id="random_4q_d3",
#     config=config,
# )

---

## Summary

The unified benchmark suite provides:

1. **Single entry point**: `run_benchmark_suite(circuit=..., observable_set=..., config=...)`
2. **Three modes**: basic, complete, analysis
3. **Automatic timestamped directories**: No overwrites
4. **Appropriate reports for each mode**:
   - Basic: `basic_report.md`
   - Complete: `complete_report.md` (all 8 tasks)
   - Analysis: `analysis_report.md` + `analysis.json`

All results are fully reproducible with seeded randomness and provenance tracking.