# QuartumSE Complete Benchmark Suite

**The canonical benchmark notebook** for classical shadows vs direct measurement.

This notebook consolidates ALL benchmarking functionality:
- All 8 Measurements Bible tasks
- All circuits from research workstreams (S, C, O, B, M)
- Enhanced statistical analysis (bootstrap, K-S tests, crossover)
- Optional noise sensitivity sweep (Task 7)
- Locality breakdown and cost-normalized metrics
- Cross-circuit consolidated comparison

## Research Workstreams

| Workstream | Focus | Circuits |
|------------|-------|----------|
| **S** | Shadows Core | GHZ, Bell pairs, Clifford, Ising |
| **C** | Chemistry | H2, LiH, BeH2 molecular ansatze |
| **O** | Optimization | QAOA MAX-CUT |
| **B** | Benchmarking | RB/XEB random circuits |
| **M** | Metrology | GHZ phase sensing |

In [1]:
# =============================================================================
# SETUP
# =============================================================================
import sys
sys.path.insert(0, '../src')

import numpy as np
from collections import Counter, defaultdict
from qiskit import QuantumCircuit
from scipy import stats
import json
from pathlib import Path
from datetime import datetime

from quartumse import (
    run_benchmark_suite,
    BenchmarkMode,
    BenchmarkSuiteConfig,
    Observable,
    ObservableSet,
)

# Suite classes and builders
from quartumse.observables.suites import (
    ObservableSuite,
    ObjectiveType,
    SuiteType,
    # Circuit-specific suite builders
    make_ghz_suites,
    make_bell_suites,
    make_ising_suites,
    make_qaoa_ring_suites,
    make_phase_sensing_suites,
    make_chemistry_suites,
    # Generic builders
    make_stress_suite,
    make_posthoc_library,
    make_commuting_suite,
)

print("Setup complete!")
print("Suite types available:", [t.value for t in SuiteType])

Setup complete!
Suite types available: ['workload', 'stress', 'posthoc', 'commuting', 'diagnostic']


---

## 1. Circuit and Observable Definitions

In [2]:
# =============================================================================
# CIRCUIT BUILDERS
# =============================================================================

def build_ghz(n_qubits: int) -> QuantumCircuit:
    """GHZ state: (|00...0> + |11...1>) / sqrt(2)"""
    qc = QuantumCircuit(n_qubits, name=f'GHZ_{n_qubits}q')
    qc.h(0)
    for i in range(1, n_qubits):
        qc.cx(i - 1, i)
    return qc

def build_bell_pairs(n_pairs: int) -> QuantumCircuit:
    """Parallel Bell pairs."""
    n_qubits = 2 * n_pairs
    qc = QuantumCircuit(n_qubits, name=f'Bell_{n_pairs}pairs')
    for i in range(n_pairs):
        qc.h(2 * i)
        qc.cx(2 * i, 2 * i + 1)
    return qc

def build_random_clifford(n_qubits: int, depth: int, seed: int = 42) -> QuantumCircuit:
    """Random Clifford circuit."""
    rng = np.random.default_rng(seed)
    qc = QuantumCircuit(n_qubits, name=f'Clifford_{n_qubits}q_d{depth}')
    clifford_gates = ['h', 's', 'sdg', 'x', 'y', 'z']
    for _ in range(depth):
        for q in range(n_qubits):
            gate = rng.choice(clifford_gates)
            getattr(qc, gate)(q)
        for q in range(0, n_qubits - 1, 2):
            if rng.random() > 0.3:
                qc.cx(q, q + 1)
    return qc

def build_ising_trotter(n_qubits: int, steps: int = 3, dt: float = 0.5) -> QuantumCircuit:
    """Trotterized transverse-field Ising model."""
    qc = QuantumCircuit(n_qubits, name=f'Ising_{n_qubits}q_t{steps}')
    J, h = 1.0, 0.5
    for q in range(n_qubits):
        qc.h(q)
    for _ in range(steps):
        for q in range(n_qubits - 1):
            qc.cx(q, q + 1)
            qc.rz(2 * J * dt, q + 1)
            qc.cx(q, q + 1)
        for q in range(n_qubits):
            qc.rx(2 * h * dt, q)
    return qc

def build_h2_ansatz(theta: float = 0.5) -> QuantumCircuit:
    """H2 molecule ansatz (4 qubits)."""
    qc = QuantumCircuit(4, name='H2_ansatz')
    qc.x(0); qc.x(1)
    qc.cx(1, 2); qc.ry(theta, 2); qc.cx(1, 2)
    qc.cx(0, 3); qc.ry(theta / 2, 3); qc.cx(0, 3)
    return qc

def build_lih_ansatz(theta: float = 0.5) -> QuantumCircuit:
    """LiH molecule ansatz (6 qubits)."""
    qc = QuantumCircuit(6, name='LiH_ansatz')
    for i in range(4): qc.x(i)
    qc.cx(3, 4); qc.ry(theta, 4); qc.cx(3, 4)
    qc.cx(2, 5); qc.ry(theta / 2, 5); qc.cx(2, 5)
    return qc

def build_beh2_ansatz(theta: float = 0.5) -> QuantumCircuit:
    """BeH2 molecule ansatz (8 qubits)."""
    qc = QuantumCircuit(8, name='BeH2_ansatz')
    for i in range(6): qc.x(i)
    qc.cx(5, 6); qc.ry(theta, 6); qc.cx(5, 6)
    qc.cx(4, 7); qc.ry(theta / 2, 7); qc.cx(4, 7)
    return qc

def build_qaoa_maxcut_ring(n_qubits: int, p: int = 1, gamma: float = 0.5, beta: float = 0.5) -> QuantumCircuit:
    """QAOA for MAX-CUT on ring graph."""
    qc = QuantumCircuit(n_qubits, name=f'QAOA_ring_{n_qubits}q_p{p}')
    for q in range(n_qubits): qc.h(q)
    for _ in range(p):
        for q in range(n_qubits):
            q_next = (q + 1) % n_qubits
            qc.cx(q, q_next); qc.rz(2 * gamma, q_next); qc.cx(q, q_next)
        for q in range(n_qubits): qc.rx(2 * beta, q)
    return qc

def build_ghz_phase_sensing(n_qubits: int, phi: float = 0.1) -> QuantumCircuit:
    """GHZ state with phase encoding."""
    qc = build_ghz(n_qubits)
    qc.name = f'GHZ_phase_{n_qubits}q'
    for q in range(n_qubits): qc.rz(phi, q)
    return qc

def build_xeb_circuit(n_qubits: int, depth: int, seed: int = 42) -> QuantumCircuit:
    """Cross-Entropy Benchmarking random circuit."""
    rng = np.random.default_rng(seed)
    qc = QuantumCircuit(n_qubits, name=f'XEB_{n_qubits}q_d{depth}')
    gates_1q = ['h', 'x', 'y', 'z', 's', 't', 'sdg', 'tdg']
    for d in range(depth):
        for q in range(n_qubits):
            getattr(qc, rng.choice(gates_1q))(q)
        for q in range(d % 2, n_qubits - 1, 2):
            qc.cx(q, q + 1)
    return qc

print("Circuit builders defined!")

Circuit builders defined!
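
As a sanity check on `build_ghz`, its gate sequence (one Hadamard, then a chain of CNOTs) can be replayed on a plain state vector. This is a minimal illustrative simulator, not part of QuartumSE or Qiskit: `apply_h`, `apply_cx`, and `ghz_amplitudes` are hypothetical helpers, and bit `q` of the basis-state index is taken to stand for qubit `q`.

```python
import math

def apply_h(state, q):
    """Apply a Hadamard to qubit q (qubit q = bit q of the basis-state index)."""
    s = 1 / math.sqrt(2)
    out = list(state)
    for i in range(len(state)):
        if not (i >> q) & 1:              # visit each (bit=0, bit=1) pair once
            j = i | (1 << q)
            a, b = state[i], state[j]
            out[i] = s * (a + b)
            out[j] = s * (a - b)
    return out

def apply_cx(state, control, target):
    """Apply a CNOT: flip the target bit wherever the control bit is 1."""
    out = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            out[i ^ (1 << target)] = state[i]
    return out

def ghz_amplitudes(n_qubits):
    """Replay the build_ghz gate sequence: H on qubit 0, then CX(i-1, i)."""
    state = [0.0] * (2 ** n_qubits)
    state[0] = 1.0
    state = apply_h(state, 0)
    for i in range(1, n_qubits):
        state = apply_cx(state, i - 1, i)
    return state

amps = ghz_amplitudes(4)
nonzero = {format(i, "04b"): round(a, 4) for i, a in enumerate(amps) if abs(a) > 1e-12}
print(nonzero)  # {'0000': 0.7071, '1111': 0.7071}
```

Only the all-zeros and all-ones basis states survive, each with amplitude 1/sqrt(2), which is the state the docstring describes.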


In [3]:
# =============================================================================
# OBSERVABLE SUITES (from quartumse.observables.suites module)
# =============================================================================
# 
# The suite system provides task-aligned observable sets for each circuit family.
# Each circuit gets multiple suites:
#   - workload: What practitioners actually measure (energy, cost, fidelity)
#   - stress: Large sets (1000+ where the Pauli space allows; 255 = 4^4 - 1 at 4 qubits) for testing protocol scaling
#   - commuting: All-commuting baselines (where grouped measurement wins)
#   - posthoc: Libraries for "measure once, query later" tests
#
# Suite builders:
#   make_ghz_suites(n)           -> stabilizers, stress, commuting, posthoc
#   make_bell_suites(n_pairs)    -> pair correlations, diagnostics, stress
#   make_ising_suites(n)         -> energy (weighted), correlations, stress
#   make_qaoa_ring_suites(n)     -> cost (weighted, with wrap edge!), stress, posthoc
#   make_phase_sensing_suites(n) -> phase signal (X^n, Y^n), stabilizers, stress
#   make_chemistry_suites(n)     -> energy (weighted), stress

# Demo: Show what suites are generated for each circuit type
print("="*70)
print("AVAILABLE SUITES BY CIRCUIT TYPE")
print("="*70)

demo_configs = [
    ("GHZ-4", make_ghz_suites(4)),
    ("Bell-2pairs", make_bell_suites(2)),
    ("Ising-4", make_ising_suites(4)),
    ("QAOA-5-ring", make_qaoa_ring_suites(5)),
    ("Phase-3", make_phase_sensing_suites(3)),
]

for name, suites in demo_configs:
    print(f"\n{name}:")
    for suite_name, suite in suites.items():
        obj = "weighted" if suite.objective == ObjectiveType.WEIGHTED_SUM else "per-obs"
        comm = suite.commutation_analysis()
        comm_str = "FULLY COMMUTING" if comm['fully_commuting'] else f"{comm['n_commuting_groups']} groups"
        print(f"  {suite_name:30s} | {suite.n_observables:4d} obs | {obj:8s} | {comm_str}")

print("\n" + "="*70)
print("KEY INSIGHT: Commuting suites (e.g., QAOA cost) favor grouped measurement")
print("             Non-commuting suites (e.g., stress) may favor shadows")
print("="*70)

AVAILABLE SUITES BY CIRCUIT TYPE

GHZ-4:
  workload_stabilizers           |    7 obs | per-obs  | 2 groups
  stress_random_1000             |  255 obs | per-obs  | 81 groups
  commuting_z_only               |   11 obs | per-obs  | FULLY COMMUTING
  posthoc_library                |  255 obs | per-obs  | 81 groups

Bell-2pairs:
  workload_pair_correlations     |    6 obs | per-obs  | 3 groups
  diagnostics_single_qubit       |    4 obs | per-obs  | FULLY COMMUTING
  diagnostics_cross_pair         |    1 obs | per-obs  | FULLY COMMUTING
  stress_random_1000             |  255 obs | per-obs  | 81 groups

Ising-4:
  workload_energy                |    7 obs | weighted | 2 groups
  workload_correlations          |    6 obs | per-obs  | FULLY COMMUTING
  stress_random_1000             |  255 obs | per-obs  | 81 groups

QAOA-5-ring:
  workload_cost                  |    5 obs | weighted | FULLY COMMUTING
  commuting_cost                 |    5 obs | weighted | FULLY COMMUTING
  stress_random_1
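
The "81 groups" reported for the 255-observable suites is what qubit-wise commuting grouping would give, since 3^4 = 81 full-weight Pauli strings exist on 4 qubits and no two of them can share a group. The notebook does not show how `commutation_analysis()` groups observables, so the sketch below is an assumption about the rule. `qubitwise_commute` and `greedy_qwc_groups` are hypothetical helpers.

```python
from itertools import product

def qubitwise_commute(p, q):
    """Two Pauli strings qubit-wise commute if, on every qubit, the letters
    are equal or at least one of them is the identity."""
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))

def greedy_qwc_groups(paulis):
    """Greedy grouping: put each string into the first group it is compatible with."""
    groups = []
    for p in paulis:
        for group in groups:
            if all(qubitwise_commute(p, member) for member in group):
                group.append(p)
                break
        else:
            groups.append([p])
    return groups

n_qubits = 4
all_paulis = ["".join(s) for s in product("IXYZ", repeat=n_qubits)]
non_identity = [p for p in all_paulis if p != "I" * n_qubits]

# Heaviest strings first, so each full-weight string seeds its own group.
non_identity.sort(key=lambda p: p.count("I"))
groups = greedy_qwc_groups(non_identity)

print(len(non_identity), "observables ->", len(groups), "groups")
# 255 observables -> 81 groups
```

Every lower-weight string fits inside the group of some full-weight string, which is why the count stops at 81. A suite made only of Z-type strings collapses to a single group under the same rule.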

---

## 2. Configuration

In [4]:
# =============================================================================
# CIRCUIT AND SUITE SELECTION
# =============================================================================

# Which circuits to benchmark
CIRCUITS_TO_RUN = {
    # WORKSTREAM S: SHADOWS CORE
    'S-GHZ-4':    True,    # 4-qubit GHZ
    'S-GHZ-5':    False,   # 5-qubit GHZ
    'S-BELL-2':   True,    # 2 Bell pairs (4 qubits)
    'S-BELL-3':   False,   # 3 Bell pairs (6 qubits)
    'S-ISING-4':  False,   # 4-qubit Ising
    'S-ISING-6':  False,   # 6-qubit Ising
    # WORKSTREAM C: CHEMISTRY
    'C-H2':       False,   # H2 molecule (4 qubits)
    'C-LiH':      False,   # LiH molecule (6 qubits)
    # WORKSTREAM O: OPTIMIZATION
    'O-QAOA-5':   False,   # QAOA 5q ring
    'O-QAOA-7':   False,   # QAOA 7q ring
    # WORKSTREAM M: METROLOGY
    'M-PHASE-3':  False,   # 3-qubit phase sensing
    'M-PHASE-4':  False,   # 4-qubit phase sensing
}

# Which suite types to run for each circuit.
# Keys are matched against SuiteType values first, then as substrings of the suite name,
# so 'diagnostics' selects the 'diagnostics_*' suites even though the SuiteType value is 'diagnostic'.
SUITES_TO_RUN = {
    'workload': False,     # Task-aligned (energy, cost, fidelity)
    'stress': True,        # Large random sets (1000+ observables)
    'commuting': False,    # All-commuting baseline
    'posthoc': True,       # Post-hoc query library
    'diagnostics': True,   # System diagnostics
}

# Count enabled
enabled_circuits = [k for k, v in CIRCUITS_TO_RUN.items() if v]
enabled_suites = [k for k, v in SUITES_TO_RUN.items() if v]

print(f"Circuits to run: {len(enabled_circuits)} / {len(CIRCUITS_TO_RUN)}")
for c in enabled_circuits:
    print(f"  + {c}")

print(f"\nSuite types to run: {enabled_suites}")

Circuits to run: 2 / 12
  + S-GHZ-4
  + S-BELL-2

Suite types to run: ['stress', 'posthoc', 'diagnostics']


In [5]:
# =============================================================================
# BENCHMARK CONFIGURATION
# =============================================================================

CONFIG = BenchmarkSuiteConfig(
    mode=BenchmarkMode.ANALYSIS,      # Full analysis with all features
    n_shots_grid=[100, 200, 500],
    n_replicates=10,                  # Increase to 20+ for publication
    seed=42,
    epsilon=0.05,                     # Target precision
    delta=0.05,                       # Failure probability
    shadows_protocol_id="classical_shadows_v0",
    baseline_protocol_id="direct_grouped",
    output_base_dir="benchmark_results",
)

# Optional: Enable noise sweep for Task 7
RUN_NOISE_SWEEP = False  # Set True to run with multiple noise profiles
NOISE_PROFILES = ['ideal', 'readout_1e-2', 'depol_1e-3']  # If enabled

print(f"Mode: {CONFIG.mode.value}")
print(f"Shots: {CONFIG.n_shots_grid}")
print(f"Replicates: {CONFIG.n_replicates}")
print(f"Noise sweep: {RUN_NOISE_SWEEP}")

Mode: analysis
Shots: [100, 200, 500]
Replicates: 10
Noise sweep: False
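
To put `epsilon=0.05` and `delta=0.05` next to the shot grid, the sketch below computes two textbook shot counts for a single ±1-valued observable: one from the standard-error criterion, one from the two-sided Hoeffding bound. How QuartumSE itself uses `epsilon` and `delta` is not shown in this notebook, so both helpers are hypothetical and only give a rough scale.

```python
import math

def shots_for_target_se(epsilon, mean=0.0):
    """Shots needed so that one +/-1-valued observable with the given true
    mean has standard error <= epsilon, using SE = sqrt((1 - mean^2) / N)."""
    # The small offset guards against float round-off pushing ceil() up by one.
    return math.ceil((1.0 - mean ** 2) / epsilon ** 2 - 1e-9)

def shots_hoeffding(epsilon, delta):
    """Shots needed so that |estimate - truth| <= epsilon with probability
    >= 1 - delta for a +/-1-valued observable (two-sided Hoeffding bound):
    N >= 2 * ln(2 / delta) / epsilon^2."""
    return math.ceil(2.0 * math.log(2.0 / delta) / epsilon ** 2 - 1e-9)

epsilon, delta = 0.05, 0.05
print("SE criterion, worst case mean=0:", shots_for_target_se(epsilon))
print("Hoeffding (epsilon, delta) bound:", shots_hoeffding(epsilon, delta))
# SE criterion, worst case mean=0: 400
# Hoeffding (epsilon, delta) bound: 2952
```

The 400-shot figure applies to one observable measured on every shot. If a protocol splits the total budget across several measurement settings, each observable sees fewer shots than `N_total`.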


In [6]:
# =============================================================================
# BUILD CIRCUITS AND SUITES
# =============================================================================

def filter_suites(all_suites: dict, enabled_types: dict) -> dict:
    """Filter suites based on enabled suite types."""
    filtered = {}
    for name, suite in all_suites.items():
        suite_type = suite.suite_type.value
        # Check if this suite type is enabled
        if enabled_types.get(suite_type, False):
            filtered[name] = suite
        # Also check for partial matches (e.g., 'workload_energy' matches 'workload')
        elif any(enabled_types.get(t, False) and t in name for t in enabled_types):
            filtered[name] = suite
    return filtered

# Circuit definitions: (built circuit, dict of suites). All entries are built eagerly, even for disabled circuits.
CIRCUIT_DEFS = {
    # Workstream S: Shadows Core
    'S-GHZ-4':   (build_ghz(4), make_ghz_suites(4)),
    'S-GHZ-5':   (build_ghz(5), make_ghz_suites(5)),
    'S-BELL-2':  (build_bell_pairs(2), make_bell_suites(2)),
    'S-BELL-3':  (build_bell_pairs(3), make_bell_suites(3)),
    'S-ISING-4': (build_ising_trotter(4, 3), make_ising_suites(4)),
    'S-ISING-6': (build_ising_trotter(6, 3), make_ising_suites(6)),
    # Workstream C: Chemistry
    'C-H2':      (build_h2_ansatz(), make_chemistry_suites(4, molecule_name='H2')),
    'C-LiH':     (build_lih_ansatz(), make_chemistry_suites(6, molecule_name='LiH')),
    # Workstream O: Optimization
    'O-QAOA-5':  (build_qaoa_maxcut_ring(5, p=1), make_qaoa_ring_suites(5)),
    'O-QAOA-7':  (build_qaoa_maxcut_ring(7, p=1), make_qaoa_ring_suites(7)),
    # Workstream M: Metrology
    'M-PHASE-3': (build_ghz_phase_sensing(3, 0.1), make_phase_sensing_suites(3)),
    'M-PHASE-4': (build_ghz_phase_sensing(4, 0.1), make_phase_sensing_suites(4)),
}

# Build selected circuits with filtered suites
circuits = {}
for cid, run in CIRCUITS_TO_RUN.items():
    if run and cid in CIRCUIT_DEFS:
        circ, all_suites = CIRCUIT_DEFS[cid]
        filtered = filter_suites(all_suites, SUITES_TO_RUN)
        
        if filtered:
            circuits[cid] = {
                'circuit': circ,
                'suites': filtered,
                'n_qubits': circ.num_qubits,
            }

# Display what was built
print(f"\nBuilt {len(circuits)} circuits:")
print("="*80)

total_benchmarks = 0
for cid, info in circuits.items():
    print(f"\n{cid} ({info['n_qubits']} qubits):")
    for suite_name, suite in info['suites'].items():
        comm = suite.commutation_analysis()
        comm_str = "COMMUTING" if comm['fully_commuting'] else f"{comm['n_commuting_groups']} groups"
        obj_str = "[weighted]" if suite.objective == ObjectiveType.WEIGHTED_SUM else ""
        print(f"  • {suite_name:30s} {suite.n_observables:4d} obs  {comm_str:15s} {obj_str}")
        total_benchmarks += 1

print(f"\n{'='*80}")
print(f"TOTAL: {total_benchmarks} benchmark runs (circuit × suite combinations)")


Built 2 circuits:

S-GHZ-4 (4 qubits):
  • stress_random_1000              255 obs  81 groups       
  • posthoc_library                 255 obs  81 groups       

S-BELL-2 (4 qubits):
  • diagnostics_single_qubit          4 obs  COMMUTING       
  • diagnostics_cross_pair            1 obs  COMMUTING       
  • stress_random_1000              255 obs  81 groups       

TOTAL: 5 benchmark runs (circuit × suite combinations)
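
A quick estimate of how many long-form rows each run should produce helps when sizing a sweep. The sketch assumes one row per (protocol, shot budget, replicate, observable) combination and three protocols (shadows, grouped, optimized, the columns of the task summary table). Both are assumptions, although the result for a 255-observable suite agrees with the "Completed: 22950 rows" line in the run log.

```python
N_SHOT_SETTINGS = 3   # len([100, 200, 500]), the configured shot grid
N_REPLICATES = 10     # configured number of replicates
N_PROTOCOLS = 3       # ASSUMPTION: shadows, grouped, optimized

# Suite sizes copied from the build summary above.
suite_sizes = {
    "S-GHZ-4__stress_random_1000": 255,
    "S-GHZ-4__posthoc_library": 255,
    "S-BELL-2__diagnostics_single_qubit": 4,
    "S-BELL-2__diagnostics_cross_pair": 1,
    "S-BELL-2__stress_random_1000": 255,
}

def expected_rows(n_observables):
    """One row per (protocol, shot budget, replicate, observable)."""
    return n_observables * N_SHOT_SETTINGS * N_REPLICATES * N_PROTOCOLS

rows = {run_key: expected_rows(n) for run_key, n in suite_sizes.items()}
for run_key, n_rows in rows.items():
    print(f"{run_key:40s} {n_rows:6d}")
print(f"{'total':40s} {sum(rows.values()):6d}")
```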


---

## 3. Run Benchmarks

In [None]:
%%time
# =============================================================================
# RUN ALL BENCHMARKS (Circuit × Suite combinations)
# =============================================================================

all_results = {}  # Keyed by (circuit_id, suite_name)

run_count = 0
total_runs = sum(len(info['suites']) for info in circuits.values())

for cid, info in circuits.items():
    for suite_name, suite in info['suites'].items():
        run_count += 1
        # Use __ separator instead of : for Windows path compatibility
        run_key = f"{cid}__{suite_name}"
        
        print(f"\n{'='*80}")
        print(f"BENCHMARK {run_count}/{total_runs}: {cid} / {suite_name}")
        print(f"  Circuit: {cid} ({info['n_qubits']}q)")
        print(f"  Suite: {suite_name} ({suite.n_observables} observables)")
        print(f"  Type: {suite.suite_type.value}, Objective: {suite.objective.value}")
        print(f"{'='*80}")
        
        # Build locality map from suite
        loc_map = {obs.observable_id: obs.locality for obs in suite.observables}
        
        # Run benchmark
        result = run_benchmark_suite(
            circuit=info['circuit'],
            observable_set=suite.observable_set,
            circuit_id=run_key,
            config=CONFIG,
            locality_map=loc_map,
        )
        
        # Store result with suite metadata
        all_results[run_key] = {
            'result': result,
            'circuit_id': cid,
            'suite_name': suite_name,
            'suite': suite,
            'n_qubits': info['n_qubits'],
        }

print(f"\n\n{'='*80}")
print(f"ALL BENCHMARKS COMPLETE: {len(all_results)} runs")
print(f"{'='*80}")


BENCHMARK 1/5: S-GHZ-4 / stress_random_1000
  Circuit: S-GHZ-4 (4q)
  Suite: stress_random_1000 (255 observables)
  Type: stress, Objective: per_observable
BENCHMARK SUITE: ANALYSIS
Run ID: S-GHZ-4__stress_random_1000_20260123_150156_7197e209
Output: benchmark_results\S-GHZ-4__stress_random_1000_20260123_150156_7197e209
Mode: analysis

Step 1: Running base benchmark...




  Completed: 22950 rows

Step 2: Running all 8 tasks...
  Completed: 12 task evaluations

Step 3: Running comprehensive analysis...




  Comprehensive analysis complete

Step 4: Generating reports...
  Basic report: benchmark_results\S-GHZ-4__stress_random_1000_20260123_150156_7197e209\basic_report.md
  Complete report: benchmark_results\S-GHZ-4__stress_random_1000_20260123_150156_7197e209\complete_report.md
  Analysis report: benchmark_results\S-GHZ-4__stress_random_1000_20260123_150156_7197e209\analysis_report.md
  Analysis JSON: benchmark_results\S-GHZ-4__stress_random_1000_20260123_150156_7197e209\analysis.json

BENCHMARK COMPLETE
Output directory: benchmark_results\S-GHZ-4__stress_random_1000_20260123_150156_7197e209
Reports generated: ['basic', 'complete', 'analysis', 'analysis_json', 'config', 'manifest']


BENCHMARK 2/5: S-GHZ-4 / posthoc_library
  Circuit: S-GHZ-4 (4q)
  Suite: posthoc_library (255 observables)
  Type: posthoc, Objective: per_observable
BENCHMARK SUITE: ANALYSIS
Run ID: S-GHZ-4__posthoc_library_20260123_151503_74ad1eec
Output: benchmark_results\S-GHZ-4__posthoc_library_20260123_151503_74ad1ee



  Completed: 22950 rows

Step 2: Running all 8 tasks...
  Completed: 12 task evaluations

Step 3: Running comprehensive analysis...




  Comprehensive analysis complete

Step 4: Generating reports...
  Basic report: benchmark_results\S-GHZ-4__posthoc_library_20260123_151503_74ad1eec\basic_report.md
  Complete report: benchmark_results\S-GHZ-4__posthoc_library_20260123_151503_74ad1eec\complete_report.md
  Analysis report: benchmark_results\S-GHZ-4__posthoc_library_20260123_151503_74ad1eec\analysis_report.md
  Analysis JSON: benchmark_results\S-GHZ-4__posthoc_library_20260123_151503_74ad1eec\analysis.json

BENCHMARK COMPLETE
Output directory: benchmark_results\S-GHZ-4__posthoc_library_20260123_151503_74ad1eec
Reports generated: ['basic', 'complete', 'analysis', 'analysis_json', 'config', 'manifest']


BENCHMARK 3/5: S-BELL-2 / diagnostics_single_qubit
  Circuit: S-BELL-2 (4q)
  Suite: diagnostics_single_qubit (4 observables)
  Type: diagnostic, Objective: per_observable
BENCHMARK SUITE: ANALYSIS
Run ID: S-BELL-2__diagnostics_single_qubit_20260123_152916_5beab050
Output: benchmark_results\S-BELL-2__diagnostics_single_qub



---

## 4. Complete Results Analysis

This section displays ALL analysis features from the enhanced benchmarking system.

In [8]:
# =============================================================================
# TASK SUMMARY FOR EACH (Circuit, Suite) PAIR (All 8 Measurements Bible Tasks)
# =============================================================================

def compute_full_task_summary(bench_result, suite, config):
    """Compute all 8 task answers for a single benchmark run."""
    long_form = bench_result.long_form_results
    truth = bench_result.ground_truth.truth_values if bench_result.ground_truth else {}
    max_n = max(config.n_shots_grid)
    eps = config.epsilon
    
    # Group by protocol and N
    by_pn = defaultdict(lambda: defaultdict(list))
    for row in long_form:
        by_pn[row.protocol_id][row.N_total].append(row)
    protocols = sorted(by_pn.keys())
    
    tasks = {}
    
    # Task 1: Worst-case N*
    tasks['1'] = {}
    for p in protocols:
        n_star = None
        for n in sorted(by_pn[p].keys()):
            # default=inf keeps max() from raising when no row at this N carries an SE
            max_se = max((r.se for r in by_pn[p][n] if r.se is not None), default=float('inf'))
            if max_se <= eps:
                n_star = n
                break
        tasks['1'][p] = f"N*={n_star}" if n_star else f"N*>{max_n}"
    
    # Task 2: Average N*
    tasks['2'] = {}
    for p in protocols:
        n_star = None
        for n in sorted(by_pn[p].keys()):
            mean_se = np.mean([r.se for r in by_pn[p][n] if r.se is not None])
            if mean_se <= eps:
                n_star = n
                break
        tasks['2'][p] = f"N*={n_star}" if n_star else f"N*>{max_n}"
    
    # Task 3: SE distribution at max N
    tasks['3'] = {}
    for p in protocols:
        ses = [r.se for r in by_pn[p][max_n] if r.se is not None]
        tasks['3'][p] = {'mean': np.mean(ses), 'median': np.median(ses), 'max': np.max(ses)}
    
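    # Task 4: Dominance (lowest SE per observable at max N; ties go to the protocol that sorts first)
    obs_best = {}
    for p in protocols:
        for r in by_pn[p][max_n]:
            if r.se is None:
                continue  # skip rows without an SE, as Tasks 1-3 do
            if r.observable_id not in obs_best or r.se < obs_best[r.observable_id][1]:
                obs_best[r.observable_id] = (p, r.se)
    wins = defaultdict(int)
    for oid, (p, _) in obs_best.items():
        wins[p] += 1
    total = len(obs_best)
    # Guard against division by zero when no observable has a usable SE
    tasks['4'] = {p: f"{wins[p]}/{total} ({100*wins[p]/total:.0f}%)" if total else "N/A"
                  for p in protocols}
    tasks['4']['winner'] = max(wins, key=wins.get) if wins else "N/A"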
    
    # Task 5: Pilot selection
    if bench_result.analysis and hasattr(bench_result.analysis, 'pilot_analysis') and bench_result.analysis.pilot_analysis:
        tasks['5'] = f"{bench_result.analysis.pilot_analysis.optimal_fraction*100:.0f}%"
    else:
        tasks['5'] = "N/A"
    
    # Task 6: Bias-variance
    tasks['6'] = {}
    if truth:
        for p in protocols:
            by_obs = defaultdict(list)
            for r in by_pn[p][max_n]:
                if r.observable_id in truth:
                    by_obs[r.observable_id].append(r.estimate)
            biases_sq, vars_ = [], []
            for oid, ests in by_obs.items():
                biases_sq.append((np.mean(ests) - truth[oid])**2)
                vars_.append(np.var(ests))
            if biases_sq:
                tasks['6'][p] = {'bias2': np.mean(biases_sq), 'var': np.mean(vars_),
                                 'mse': np.mean(biases_sq) + np.mean(vars_)}
    
    # Task 7: Noise sensitivity (placeholder)
    tasks['7'] = "Requires noise sweep" if not RUN_NOISE_SWEEP else "See noise analysis"
    
    # Task 8: Adaptive efficiency (placeholder)
    tasks['8'] = "See Task 5 pilot analysis"
    
    return tasks, protocols

# Generate and display for each (circuit, suite) pair
for run_key, run_data in all_results.items():
    bench_result = run_data['result']
    suite = run_data['suite']
    cid = run_data['circuit_id']
    n_qubits = run_data['n_qubits']
    
    tasks, protocols = compute_full_task_summary(bench_result, suite, CONFIG)
    
    # Suite metadata
    comm = suite.commutation_analysis()
    comm_str = "FULLY COMMUTING" if comm['fully_commuting'] else f"{comm['n_commuting_groups']} groups"
    obj_str = f"[{suite.objective.value}]" if suite.objective != ObjectiveType.PER_OBSERVABLE else ""
    
    print(f"\n{'='*100}")
    print(f"{run_key}")
    print(f"  {n_qubits}q, {suite.n_observables} obs, {suite.suite_type.value} {obj_str}")
    print(f"  Commutation: {comm_str}")
    print(f"{'='*100}")
    
    # Header
    col_w = 24
    hdr = f"{'Task':<6} {'Question':<40}"
    for p in protocols:
        short = p.replace('classical_shadows_v0', 'shadows').replace('direct_', '')
        hdr += f" {short:>{col_w}}"
    print(hdr)
    print("-" * len(hdr))
    
    # Task 1
    row = f"{'1':<6} {'Worst-case N* (max SE <= eps)?':<40}"
    for p in protocols: row += f" {tasks['1'][p]:>{col_w}}"
    print(row)
    
    # Task 2
    row = f"{'2':<6} {'Average N* (mean SE <= eps)?':<40}"
    for p in protocols: row += f" {tasks['2'][p]:>{col_w}}"
    print(row)
    
    # Task 3
    print(f"{'3':<6} {'SE distribution at max N?':<40}")
    for m in ['mean', 'median', 'max']:
        row = f"{'':.<6} {'  ' + m:<40}"
        for p in protocols: row += f" {tasks['3'][p][m]:>{col_w}.4f}"
        print(row)
    
    # Task 4
    row = f"{'4':<6} {'Dominance (wins)?':<40}"
    for p in protocols: row += f" {tasks['4'][p]:>{col_w}}"
    print(row)
    print(f"{'':.<6} {'  WINNER:':<40} {tasks['4']['winner']}")
    
    # Task 5
    print(f"{'5':<6} {'Optimal pilot fraction?':<40} {tasks['5']}")
    
    # Task 6
    if tasks['6']:
        print(f"{'6':<6} {'Bias-variance decomposition?':<40}")
        for m in ['bias2', 'var', 'mse']:
            row = f"{'':.<6} {'  ' + m:<40}"
            for p in protocols:
                if p in tasks['6']:
                    row += f" {tasks['6'][p][m]:>{col_w}.6f}"
                else:
                    row += f" {'N/A':>{col_w}}"
            print(row)
    else:
        print(f"{'6':<6} {'Bias-variance?':<40} (requires ground truth)")
    
    # Task 7 & 8
    print(f"{'7':<6} {'Noise sensitivity?':<40} {tasks['7']}")
    print(f"{'8':<6} {'Adaptive efficiency?':<40} {tasks['8']}")
    print("-" * len(hdr))


S-GHZ-4__workload_stabilizers
  4q, 7 obs, workload 
  Commutation: 2 groups
Task   Question                                                  shadows                  grouped                optimized
--------------------------------------------------------------------------------------------------------------------------
1      Worst-case N* (max SE <= eps)?                             N*>500                   N*=100                   N*=100
2      Average N* (mean SE <= eps)?                               N*>500                   N*=100                   N*=100
3      SE distribution at max N?               
......   mean                                                     0.1631                   0.0000                   0.0000
......   median                                                   0.1298                   0.0000                   0.0000
......   max                                                      0.5071                   0.0000                   0.0000
4      Domina
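
Task 6 relies on the identity MSE = bias² + variance. It holds exactly for a set of replicate estimates when the variance is the population variance (`ddof=0`, the `np.var` default used in the cell above). The sketch checks this on made-up numbers; `bias_variance` is a hypothetical helper and the estimates are not benchmark output.

```python
from statistics import fmean, pvariance

def bias_variance(estimates, truth):
    """Split the empirical MSE of replicate estimates into bias^2 and variance.

    Uses the population variance (ddof=0), so that
    MSE = bias^2 + variance holds exactly for the sample at hand.
    """
    mean_est = fmean(estimates)
    bias2 = (mean_est - truth) ** 2
    var = pvariance(estimates, mu=mean_est)
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return bias2, var, mse

# Made-up replicate estimates of an observable whose true value is 1.0
estimates = [0.90, 1.10, 0.95, 1.05, 0.80]
bias2, var, mse = bias_variance(estimates, truth=1.0)
print(f"bias^2={bias2:.4f}  var={var:.4f}  mse={mse:.4f}")
# bias^2=0.0016  var=0.0114  mse=0.0130
```

The notebook averages bias² and variance over observables before adding them, which gives the same total because the identity holds per observable.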

In [9]:
# =============================================================================
# ENHANCED ANALYSIS (Bootstrap, K-S Tests, Crossover, Locality)
# =============================================================================

def display_enhanced_analysis(bench_result, run_key, suite):
    """Display enhanced statistical analysis from result.analysis."""
    if not bench_result.analysis:
        print(f"  No enhanced analysis available")
        return
    
    analysis = bench_result.analysis
    
    # N* Interpolation
    if hasattr(analysis, 'n_star_interpolation') and analysis.n_star_interpolation:
        print(f"\n  N* INTERPOLATION (power-law fit):")
        for protocol, data in analysis.n_star_interpolation.items():
            if hasattr(data, 'n_star'):
                print(f"    {protocol}: N* = {data.n_star:.0f}")
    
    # Statistical Tests
    if hasattr(analysis, 'statistical_tests') and analysis.statistical_tests:
        print(f"\n  STATISTICAL TESTS:")
        st = analysis.statistical_tests
        if hasattr(st, 'ks_statistic'):
            print(f"    K-S statistic: {st.ks_statistic:.4f}")
            print(f"    K-S p-value: {st.ks_pvalue:.4f}")
            sig = "YES" if st.ks_pvalue < 0.05 else "NO"
            print(f"    Distributions significantly different: {sig}")
        if hasattr(st, 'ssf_estimate'):
            print(f"    SSF estimate: {st.ssf_estimate:.2f}x")
        if hasattr(st, 'ssf_ci_low') and hasattr(st, 'ssf_ci_high'):
            print(f"    SSF 95% CI: [{st.ssf_ci_low:.2f}, {st.ssf_ci_high:.2f}]")
    
    # Crossover Analysis
    if hasattr(analysis, 'crossover_analysis') and analysis.crossover_analysis:
        print(f"\n  CROSSOVER ANALYSIS:")
        ca = analysis.crossover_analysis
        if hasattr(ca, 'crossover_n') and ca.crossover_n:
            print(f"    Crossover N: {ca.crossover_n:.0f}")
        if hasattr(ca, 'shadows_wins_above'):
            print(f"    Shadows wins above crossover: {ca.shadows_wins_above}")
    
    # Locality Breakdown
    if hasattr(analysis, 'locality_analysis') and analysis.locality_analysis:
        print(f"\n  LOCALITY BREAKDOWN:")
        la = analysis.locality_analysis
        if hasattr(la, 'by_locality'):
            for k, data in sorted(la.by_locality.items()):
                if hasattr(data, 'shadows_mean_se') and hasattr(data, 'baseline_mean_se'):
                    ratio = data.shadows_mean_se / data.baseline_mean_se if data.baseline_mean_se > 0 else float('inf')
                    winner = "shadows" if ratio < 1 else "baseline"
                    print(f"    K={k}: ratio={ratio:.2f}x ({winner})")
    
    # Pilot Analysis
    if hasattr(analysis, 'pilot_analysis') and analysis.pilot_analysis:
        print(f"\n  PILOT ANALYSIS:")
        pa = analysis.pilot_analysis
        print(f"    Optimal fraction: {pa.optimal_fraction*100:.0f}%")
        if hasattr(pa, 'results'):
            print(f"    Fractions tested: {list(pa.results.keys())}")

print("\n" + "="*100)
print("ENHANCED STATISTICAL ANALYSIS")
print("="*100)

for run_key, run_data in all_results.items():
    bench_result = run_data['result']
    suite = run_data['suite']
    
    # Suite context
    comm = suite.commutation_analysis()
    comm_str = "commuting" if comm['fully_commuting'] else f"{comm['n_commuting_groups']} groups"
    
    print(f"\n--- {run_key} ({suite.n_observables} obs, {comm_str}) ---")
    display_enhanced_analysis(bench_result, run_key, suite)


ENHANCED STATISTICAL ANALYSIS

--- S-GHZ-4__workload_stabilizers (7 obs, 2 groups) ---

  CROSSOVER ANALYSIS:

  LOCALITY BREAKDOWN:

  PILOT ANALYSIS:
    Optimal fraction: 2%
    Fractions tested: [0.02, 0.05, 0.1, 0.2]

--- S-GHZ-4__commuting_z_only (11 obs, commuting) ---

  CROSSOVER ANALYSIS:

  LOCALITY BREAKDOWN:

  PILOT ANALYSIS:
    Optimal fraction: 2%
    Fractions tested: [0.02, 0.05, 0.1, 0.2]

--- S-BELL-2__workload_pair_correlations (6 obs, 3 groups) ---

  CROSSOVER ANALYSIS:

  LOCALITY BREAKDOWN:

  PILOT ANALYSIS:
    Optimal fraction: 2%
    Fractions tested: [0.02, 0.05, 0.1, 0.2]
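
When the target precision is not reached inside the shot grid (the `N*>500` case), a power-law fit of SE against N gives an extrapolated N*. The internals of `analysis.n_star_interpolation` are not shown in this notebook, so the sketch below is one plausible reading of "power-law fit": ordinary least squares in log-log space. `fit_power_law` and `interpolate_n_star` are hypothetical helpers, and the data are synthetic.

```python
import math

def fit_power_law(shots, ses):
    """Least-squares fit of log(SE) = log(c) + slope * log(N)."""
    xs = [math.log(n) for n in shots]
    ys = [math.log(se) for se in ses]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    log_c = y_bar - slope * x_bar
    return math.exp(log_c), slope

def interpolate_n_star(shots, ses, epsilon):
    """Solve c * N**slope = epsilon for N (requires a negative fitted slope)."""
    c, slope = fit_power_law(shots, ses)
    if slope >= 0:
        raise ValueError("SE does not decrease with N; no finite N* exists")
    return (epsilon / c) ** (1.0 / slope)

# Synthetic data following SE = 2 / sqrt(N) on the notebook's shot grid
shots = [100, 200, 500]
ses = [2.0 / math.sqrt(n) for n in shots]
n_star = interpolate_n_star(shots, ses, epsilon=0.05)
print(f"N* ~ {n_star:.0f}")  # N* ~ 1600
```

For shot-noise-limited estimators the fitted slope should sit near -0.5. A slope far from that value suggests the SE estimates are dominated by something other than sampling noise.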


In [10]:
# =============================================================================
# OBJECTIVE-LEVEL ANALYSIS (Work Item 4: Task-Level Metrics)
# =============================================================================
# For suites with weighted objectives (QAOA cost, chemistry energy), compute
# the error in the OBJECTIVE, not individual observables.
#
# This is the actual metric practitioners care about:
#   - QAOA: C = Σ (1 - ⟨ZZ⟩) / 2   (MAX-CUT cost)
#   - Chemistry: E = Σ c_k ⟨P_k⟩   (ground state energy)

from quartumse.analysis.objective_metrics import (
    compute_objective_metrics,
    format_objective_analysis,
)

print("\n" + "="*100)
print("OBJECTIVE-LEVEL ANALYSIS (Weighted Suites Only)")
print("="*100)

weighted_runs = []
for run_key, run_data in all_results.items():
    suite = run_data['suite']
    if suite.objective == ObjectiveType.WEIGHTED_SUM and suite.weights:
        weighted_runs.append((run_key, run_data))

if not weighted_runs:
    print("\nNo weighted suites found. Enable QAOA workload or Chemistry suites to see objective metrics.")
else:
    print(f"\nFound {len(weighted_runs)} weighted suite(s):")
    
    for run_key, run_data in weighted_runs:
        bench_result = run_data['result']
        suite = run_data['suite']
        
        print(f"\n{'='*80}")
        print(f"{run_key}")
        print(f"  Objective type: {suite.objective.value}")
        print(f"  Weighted observables: {len(suite.weights)}")
        print(f"{'='*80}")
        
        # Determine objective type for computation
        obj_type = "qaoa_cost" if "qaoa" in run_key.lower() or "cost" in suite.name.lower() else "weighted_sum"
        
        # Compute objective metrics
        obj_analysis = compute_objective_metrics(
            long_form_results=bench_result.long_form_results,
            weights=suite.weights,
            objective_type=obj_type,
            true_objective=None,  # Could add ground truth if available
            target_epsilon=CONFIG.epsilon,
            n_bootstrap=500,
            seed=CONFIG.seed,
        )
        
        # Display formatted results
        print(format_objective_analysis(obj_analysis))
        
        # Store in results for later
        run_data['objective_analysis'] = obj_analysis

print("\n" + "="*100)
print("KEY INSIGHT: For weighted objectives, what matters is the TOTAL error,")
print("             not individual observable errors. This may change which protocol wins!")
print("="*100)


OBJECTIVE-LEVEL ANALYSIS (Weighted Suites Only)

No weighted suites found. Enable QAOA workload or Chemistry suites to see objective metrics.

KEY INSIGHT: For weighted objectives, what matters is the TOTAL error,
             not individual observable errors. This may change which protocol wins!
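
The distinction between per-observable error and objective error can be illustrated without the QuartumSE API. The following is a minimal sketch, not the implementation behind `compute_objective_metrics`: the function names, the 4-edge ring, and the numerical values are all hypothetical, and the error propagation assumes the per-observable estimates are independent (grouped measurement and shadows both produce correlated estimates in practice, so treat the result as a rough guide).

```python
import numpy as np

def qaoa_maxcut_cost(zz_expectations):
    """MAX-CUT cost C = sum over edges of (1 - <ZZ>) / 2."""
    zz = np.asarray(zz_expectations, dtype=float)
    return float(np.sum((1.0 - zz) / 2.0))

def weighted_sum_se(weights, standard_errors):
    """SE of sum_k w_k * estimate_k, ASSUMING independent estimates."""
    w = np.asarray(weights, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    return float(np.sqrt(np.sum((w * se) ** 2)))

# Hypothetical 4-edge ring: estimated <ZZ> per edge and its standard error.
zz_hat = [-0.8, -0.6, -0.7, -0.9]
se_zz = [0.05, 0.05, 0.05, 0.05]

cost = qaoa_maxcut_cost(zz_hat)
# Each edge enters the cost with weight -1/2; the constant term carries no error.
cost_se = weighted_sum_se([-0.5] * len(zz_hat), se_zz)
print(f"cost = {cost:.3f}, SE = {cost_se:.3f}")
```

Under the independence assumption the objective SE grows like the square root of the number of terms, so a protocol with a slightly worse per-observable SE can still be competitive on the total.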


In [11]:
# =============================================================================
# POST-HOC QUERYING BENCHMARK (Work Item 3)
# =============================================================================
# This simulates the core advantage of classical shadows:
#   "Measure once, decide observables later"
#
# Cost accounting:
#   - Shadows: quantum cost = ONE acquisition; all new queries are FREE
#   - Direct: pay for each new basis not already measured
#
# This quantifies the "option value" of shadows.

from quartumse.analysis.posthoc_benchmark import (
    run_posthoc_benchmark_from_suite,
    format_posthoc_result,
)

print("\n" + "="*100)
print("POST-HOC QUERYING BENCHMARK")
print("="*100)

# Find posthoc suites (if any were run)
posthoc_runs = []
for run_key, run_data in all_results.items():
    suite = run_data['suite']
    if suite.suite_type == SuiteType.POSTHOC:
        posthoc_runs.append((run_key, run_data))

# Also check for posthoc libraries in circuit definitions (even if not benchmarked)
posthoc_available = []
for cid, info in circuits.items():
    for suite_name, suite in info.get('suites', {}).items():
        if 'posthoc' in suite_name.lower() or suite.suite_type == SuiteType.POSTHOC:
            posthoc_available.append((cid, suite_name, suite))

if not posthoc_available and not posthoc_runs:
    print("\nNo post-hoc suites found.")
    print("Enable 'posthoc': True in SUITES_TO_RUN to run post-hoc benchmarks.")
    print("\nExample configuration:")
    print("  SUITES_TO_RUN = {")
    print("      'workload': True,")
    print("      'posthoc': True,  # <-- Enable this")
    print("  }")
else:
    # Run post-hoc simulation on available posthoc suites
    print(f"\nFound {len(posthoc_available)} post-hoc suite(s):")
    
    for cid, suite_name, suite in posthoc_available:
        print(f"\n{'='*80}")
        print(f"POST-HOC SIMULATION: {cid}:{suite_name}")
        print(f"  Library size: {suite.n_observables} observables")
        print(f"{'='*80}")
        
        # Configure simulation
        n_rounds = 5
        obs_per_round = max(10, suite.n_observables // 10)  # ~10% of the library per round, at least 10
        
        # Run simulation
        result = run_posthoc_benchmark_from_suite(
            posthoc_suite=suite,
            n_rounds=n_rounds,
            observables_per_round=obs_per_round,
            shadows_shots=max(CONFIG.n_shots_grid),  # Use max shot budget
            direct_shots_per_basis=100,  # Shots per basis for direct
            seed=CONFIG.seed,
        )
        
        # Display results
        print(format_posthoc_result(result))
        print()
        
        # Cumulative cost curves
        print("\nCUMULATIVE COST CURVES:")
        print(f"{'Round':<8} {'Cum Shadows':>15} {'Cum Direct':>15} {'Cum Obs':>12} {'Savings':>10}")
        print("-" * 65)
        
        shadows = result.shadows_costs
        direct = result.direct_costs
        
        if shadows and direct:
            for i in range(result.n_rounds):
                savings = direct.cumulative_shots[i] / shadows.cumulative_shots[i] if shadows.cumulative_shots[i] > 0 else float('inf')
                print(
                    f"{i:<8} {shadows.cumulative_shots[i]:>15,} {direct.cumulative_shots[i]:>15,} "
                    f"{shadows.cumulative_observables_answered[i]:>12} {savings:>9.1f}x"
                )

print("\n" + "="*100)
print("KEY INSIGHT: Shadows' quantum cost is FIXED after acquisition.")
print("             Direct measurement cost GROWS with each new query round.")
print("             The more observables you query later, the more shadows saves.")
print("="*100)


POST-HOC QUERYING BENCHMARK

No post-hoc suites found.
Enable 'posthoc': True in SUITES_TO_RUN to run post-hoc benchmarks.

Example configuration:
  SUITES_TO_RUN = {
      'workload': True,
      'posthoc': True,  # <-- Enable this
  }

KEY INSIGHT: Shadows' quantum cost is FIXED after acquisition.
             Direct measurement cost GROWS with each new query round.
             The more observables you query later, the more shadows saves.
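
The cost accounting described in the cell above can be sketched in a few lines. This is an illustrative model only, not the logic inside `run_posthoc_benchmark_from_suite`: the function name, the basis labels, and the shot budgets are hypothetical, and the sketch counts shots without checking whether either protocol reaches a target precision.

```python
def cumulative_costs(rounds, shadows_shots, direct_shots_per_basis):
    """Return cumulative (shadows, direct) shot counts per query round.

    rounds: list of sets, each holding the measurement bases a round needs.
    """
    seen_bases = set()
    shadows_cum, direct_cum = [], []
    direct_total = 0
    for bases in rounds:
        new_bases = set(bases) - seen_bases  # direct pays only for unseen bases
        direct_total += len(new_bases) * direct_shots_per_basis
        seen_bases |= new_bases
        shadows_cum.append(shadows_shots)    # one acquisition, reused every round
        direct_cum.append(direct_total)
    return shadows_cum, direct_cum

# Hypothetical query rounds; "XXXX" repeats, so direct does not pay for it twice.
rounds = [{"ZZZZ", "XXXX"}, {"XXXX", "YYXX"}, {"XYXY", "ZXZX", "YYYY"}]
shadows_cum, direct_cum = cumulative_costs(
    rounds, shadows_shots=500, direct_shots_per_basis=100
)
for i, (s, d) in enumerate(zip(shadows_cum, direct_cum)):
    print(f"round {i}: shadows={s:,} direct={d:,} ratio={d / s:.1f}x")
```

With these made-up numbers direct measurement is cheaper in the first two rounds and only overtakes the fixed shadows cost in the third, which is the crossover behavior the cumulative cost table is meant to expose.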


In [12]:
# =============================================================================
# CROSS-CIRCUIT CONSOLIDATED COMPARISON (Suite-Aware)
# =============================================================================

print("\n" + "="*100)
print("CROSS-CIRCUIT COMPARISON BY SUITE TYPE")
print("="*100)

# Group results by suite type for analysis
by_suite_type = defaultdict(list)
for run_key, run_data in all_results.items():
    suite_type = run_data['suite'].suite_type.value
    by_suite_type[suite_type].append((run_key, run_data))

# Summary table for each suite type
for suite_type, runs in by_suite_type.items():
    print(f"\n{'='*80}")
    print(f"SUITE TYPE: {suite_type.upper()}")
    print(f"{'='*80}")
    
    print(f"{'Run Key':<35} {'Q':>3} {'Obs':>5} {'Comm?':>6} {'Shadows SE':>12} {'Grouped SE':>12} {'Ratio':>8} {'Winner':>10}")
    print("-" * 105)
    
    shadows_wins = 0
    total_runs = 0
    
    for run_key, run_data in runs:
        bench_result = run_data['result']
        suite = run_data['suite']
        n_qubits = run_data['n_qubits']
        
        # Check commutation
        comm = suite.commutation_analysis()
        comm_str = "YES" if comm['fully_commuting'] else "no"
        
        summaries = bench_result.summary.get('protocol_summaries', {})
        
        shadows_se = summaries.get('classical_shadows_v0', {}).get('mean_se', float('inf'))
        grouped_se = summaries.get('direct_grouped', {}).get('mean_se', float('inf'))
        
        ratio = shadows_se / grouped_se if grouped_se > 0 else float('inf')
        winner = 'Shadows' if ratio < 1 else 'Grouped'
        
        if ratio < 1:
            shadows_wins += 1
        total_runs += 1
        
        print(f"{run_key:<35} {n_qubits:>3} {suite.n_observables:>5} {comm_str:>6} "
              f"{shadows_se:>12.4f} {grouped_se:>12.4f} {ratio:>8.2f}x {winner:>10}")
    
    print("-" * 105)
    print(f"  {suite_type.upper()}: Shadows wins {shadows_wins}/{total_runs} runs")

# Overall summary
print("\n" + "="*100)
print("OVERALL SUMMARY")
print("="*100)

total_wins_shadows = 0
total_runs = 0
commuting_shadows_wins = 0
commuting_total = 0
noncommuting_shadows_wins = 0
noncommuting_total = 0

for run_key, run_data in all_results.items():
    bench_result = run_data['result']
    suite = run_data['suite']
    comm = suite.commutation_analysis()
    
    summaries = bench_result.summary.get('protocol_summaries', {})
    shadows_se = summaries.get('classical_shadows_v0', {}).get('mean_se', float('inf'))
    grouped_se = summaries.get('direct_grouped', {}).get('mean_se', float('inf'))
    
    shadows_won = shadows_se < grouped_se
    
    total_runs += 1
    if shadows_won:
        total_wins_shadows += 1
    
    if comm['fully_commuting']:
        commuting_total += 1
        if shadows_won:
            commuting_shadows_wins += 1
    else:
        noncommuting_total += 1
        if shadows_won:
            noncommuting_shadows_wins += 1

print(f"\nTotal runs: {total_runs}")
if total_runs > 0:
    print(f"Shadows wins overall: {total_wins_shadows}/{total_runs} ({100*total_wins_shadows/total_runs:.1f}%)")
if commuting_total > 0:
    print(f"Shadows wins on COMMUTING suites: {commuting_shadows_wins}/{commuting_total} ({100*commuting_shadows_wins/commuting_total:.1f}%)")
if noncommuting_total > 0:
    print(f"Shadows wins on NON-COMMUTING suites: {noncommuting_shadows_wins}/{noncommuting_total} ({100*noncommuting_shadows_wins/noncommuting_total:.1f}%)")

print("\nKEY INSIGHT:")
print("  - For COMMUTING suites (e.g., QAOA cost), grouped measurement should dominate")
print("  - For NON-COMMUTING suites (e.g., stress), shadows may become competitive")
print("  - The crossover depends on K (number of observables) and locality distribution")


CROSS-CIRCUIT COMPARISON BY SUITE TYPE

SUITE TYPE: WORKLOAD
Run Key                               Q   Obs  Comm?   Shadows SE   Grouped SE    Ratio     Winner
---------------------------------------------------------------------------------------------------------
S-GHZ-4__workload_stabilizers         4     7     no       0.1631       0.0000      infx    Grouped
S-BELL-2__workload_pair_correlations   4     6     no       0.1260       0.0000      infx    Grouped
---------------------------------------------------------------------------------------------------------
  WORKLOAD: Shadows wins 0/2 runs

SUITE TYPE: COMMUTING
Run Key                               Q   Obs  Comm?   Shadows SE   Grouped SE    Ratio     Winner
---------------------------------------------------------------------------------------------------------
S-GHZ-4__commuting_z_only             4    11    YES       0.1312       0.0163     8.07x    Grouped
----------------------------------------------------------------
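
An SE ratio can be translated into a shot-budget ratio. The sketch below assumes the standard error scales as $N^{-1/2}$ for both protocols (an assumption, not something this notebook verifies), in which case closing an SE gap of factor $r$ takes roughly $r^2$ times as many shots. The helper name is hypothetical.

```python
def shot_multiplier(se_ratio):
    """Extra shots needed to close an SE gap, ASSUMING SE ~ N**-0.5."""
    return se_ratio ** 2

# SE ratio reported for S-GHZ-4__commuting_z_only in the table above.
print(f"{shot_multiplier(8.07):.1f}x more shots")  # prints "65.1x more shots"
```

The multiplier is undefined for rows where the grouped SE is reported as 0.0000, since the ratio there is infinite.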

In [13]:
# =============================================================================
# SAVE CONSOLIDATED RESULTS (with Suite Metadata)
# =============================================================================

output_dir = Path(CONFIG.output_base_dir)
output_dir.mkdir(parents=True, exist_ok=True)

consolidated = {
    'timestamp': datetime.now().isoformat(),
    'n_runs': len(all_results),
    'config': {
        'mode': CONFIG.mode.value,
        'n_shots_grid': CONFIG.n_shots_grid,
        'n_replicates': CONFIG.n_replicates,
        'epsilon': CONFIG.epsilon,
    },
    'suites_enabled': {k: v for k, v in SUITES_TO_RUN.items() if v},
    'circuits_enabled': [k for k, v in CIRCUITS_TO_RUN.items() if v],
    'runs': {},
}

for run_key, run_data in all_results.items():
    bench_result = run_data['result']
    suite = run_data['suite']
    comm = suite.commutation_analysis()
    
    consolidated['runs'][run_key] = {
        'circuit_id': run_data['circuit_id'],
        'suite_name': run_data['suite_name'],
        'n_qubits': run_data['n_qubits'],
        # Suite metadata
        'suite_metadata': {
            'suite_type': suite.suite_type.value,
            'objective': suite.objective.value,
            'n_observables': suite.n_observables,
            'fully_commuting': comm['fully_commuting'],
            'n_commuting_groups': comm['n_commuting_groups'],
            'has_weights': suite.weights is not None,
            'description': suite.description,
        },
        # Benchmark results
        'run_id': bench_result.run_id,
        'output_dir': str(bench_result.output_dir),
        'summary': bench_result.summary,
    }

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
consolidated_path = output_dir / f'consolidated_{timestamp}.json'
with open(consolidated_path, 'w') as f:
    json.dump(consolidated, f, indent=2, default=str)

print(f"Consolidated results saved: {consolidated_path}")
print(f"\nSummary:")
print(f"  Total runs: {len(all_results)}")
print(f"  Suites enabled: {list(consolidated['suites_enabled'].keys())}")
print(f"  Circuits: {consolidated['circuits_enabled']}")
print(f"\nIndividual run directories:")
for run_key, run_data in all_results.items():
    print(f"  {run_key}: {run_data['result'].output_dir}")

Consolidated results saved: benchmark_results\consolidated_20260123_143543.json

Summary:
  Total runs: 3
  Suites enabled: ['workload', 'commuting']
  Circuits: ['S-GHZ-4', 'S-BELL-2']

Individual run directories:
  S-GHZ-4__workload_stabilizers: benchmark_results\S-GHZ-4__workload_stabilizers_20260123_142753_1e3ac94c
  S-GHZ-4__commuting_z_only: benchmark_results\S-GHZ-4__commuting_z_only_20260123_142827_6ab5f0ce
  S-BELL-2__workload_pair_correlations: benchmark_results\S-BELL-2__workload_pair_correlations_20260123_142850_aaa891e0


---

## Summary

This notebook provides **complete benchmarking** of classical shadows vs direct measurement:

### Tasks Evaluated (Measurements Bible)

| Task | Question | Output |
|------|----------|--------|
| 1 | Worst-case N* (all obs)? | N* per protocol |
| 2 | Average N* (mean)? | N* per protocol |
| 3 | SE distribution at fixed N? | mean, median, max |
| 4 | Dominance (% wins)? | Winner + breakdown |
| 5 | Optimal pilot fraction? | % of budget |
| 6 | Bias-variance decomposition? | Bias², Var, MSE |
| 7 | Noise sensitivity? | (with sweep) |
| 8 | Adaptive efficiency? | (from pilot) |
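
The Task 6 output relies on the identity MSE = Bias² + Var. A minimal sketch of that decomposition over estimator replicates follows; the function name and the synthetic replicates are hypothetical, and this is not how QuartumSE necessarily computes it.

```python
import numpy as np

def bias_variance(estimates, true_value):
    """Return (bias squared, variance, MSE) of replicate estimates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    var = est.var()  # population variance (ddof=0) so the identity holds exactly
    mse = np.mean((est - true_value) ** 2)
    return float(bias ** 2), float(var), float(mse)

# Synthetic replicates of an estimator centered at 0.95 when the truth is 1.0.
rng = np.random.default_rng(0)
replicates = 0.95 + 0.1 * rng.standard_normal(200)

bias_sq, var, mse = bias_variance(replicates, true_value=1.0)
print(f"Bias^2 = {bias_sq:.5f}, Var = {var:.5f}, MSE = {mse:.5f}")
```

Using the sample variance (`ddof=1`) instead would break the exact equality by a factor of n/(n-1) on the variance term.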

### Enhanced Analysis

- Power-law N* interpolation
- K-S distribution tests
- Bootstrap confidence intervals
- Per-observable crossover analysis
- Locality breakdown (k=1,2,3,...,n)
- Cost-normalized metrics
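
As an illustration of the first item, power-law N* interpolation can be sketched as a log-log fit of SE against shot count, followed by solving for the shot count that reaches a target ε. The function names and the synthetic data below are hypothetical, and the fitting procedure used by QuartumSE may differ.

```python
import numpy as np

def fit_power_law(shots, standard_errors):
    """Fit SE = a * N**(-b) by least squares in log-log space."""
    log_n = np.log(np.asarray(shots, dtype=float))
    log_se = np.log(np.asarray(standard_errors, dtype=float))
    slope, intercept = np.polyfit(log_n, log_se, 1)
    return float(np.exp(intercept)), float(-slope)

def n_star(a, b, epsilon):
    """Smallest integer N with a * N**(-b) <= epsilon."""
    return int(np.ceil((a / epsilon) ** (1.0 / b)))

# Synthetic shot grid with ideal 1/sqrt(N) scaling.
shots = [100, 500, 1000, 5000]
se = [1.0 / np.sqrt(n) for n in shots]

a, b = fit_power_law(shots, se)
print(f"a = {a:.3f}, b = {b:.3f}, N* = {n_star(a, b, epsilon=0.01):,}")
```

The fit takes logarithms, so observables with a reported SE of exactly zero have to be excluded before fitting.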

### Obsolete Notebooks

This notebook supersedes:
- `benchmark_shadows_vs_baselines.ipynb`
- `notebook_j_full_publication_benchmark_ghz_shadows_v0.ipynb`
- `notebook_k_locality_benchmark.ipynb`
- `notebook_l_random_bloch_benchmark.ipynb`
- `notebook_l_comprehensive_benchmark.ipynb`
- `notebook_benchmark_suite.ipynb`