# PennyLane Classical Shadows Benchmark

**Benchmark notebook** that uses PennyLane as the reference library for comparing classical shadows implementations.

This notebook benchmarks:
1. **PennyLane's classical shadows** implementation
2. **QuartumSE's classical shadows** (our implementation)
3. **Direct measurement baseline** (one observable at a time, uniform shot allocation)

## Purpose

Compare our QuartumSE implementation against PennyLane's established library to:
- Validate correctness of our implementation
- Compare performance (accuracy, variance)
- Identify any differences in methodology

## Requirements

```bash
pip install pennylane pennylane-qiskit
```

Section 7 additionally imports `qiskit` and the `quartumse` package. The setup cell adds `../src` to the import path for the latter.

In [None]:
# =============================================================================
# SETUP
# =============================================================================
import sys
sys.path.insert(0, '../src')

import numpy as np
from collections import defaultdict
from datetime import datetime
from pathlib import Path
import json
import warnings
warnings.filterwarnings('ignore')

# PennyLane imports
import pennylane as qml
from pennylane import numpy as pnp

# QuartumSE imports (for comparison)
from quartumse import Observable, ObservableSet
from quartumse.observables.suites import (
    ObservableSuite,
    ObjectiveType,
    SuiteType,
    make_ghz_suites,
    make_bell_suites,
    make_ising_suites,
    make_qaoa_ring_suites,
)

print(f"PennyLane version: {qml.__version__}")
print(f"NumPy version: {np.__version__}")
print("\nSetup complete!")

---

## 1. PennyLane Circuit Definitions

All circuits are implemented with PennyLane's native operations. Each builder only queues gates, so it must be called inside a QNode.
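
As a sanity check on what the GHZ builder should produce, the expectation value of any Pauli string on the ideal GHZ state can be worked out by hand. The sketch below does this without PennyLane. The helper name `ghz_pauli_expectation` is hypothetical and not part of the notebook, and it assumes the notebook's string convention (character `i` acts on qubit `i`).

```python
def ghz_pauli_expectation(pauli_string: str) -> float:
    """<GHZ|P|GHZ> for the ideal state (|0...0> + |1...1>) / sqrt(2)."""
    n = len(pauli_string)
    n_flip = sum(p in "XY" for p in pauli_string)  # X and Y flip the bit
    n_y = pauli_string.count("Y")
    n_z = pauli_string.count("Z")

    if n_flip == 0:
        # Diagonal string: +1 on |0...0> and (-1)^(#Z) on |1...1>.
        return 0.5 * (1 + (-1) ** n_z)
    if n_flip < n:
        # Only some bits flip, so P|0...0> is orthogonal to both branches.
        return 0.0
    # Every bit flips: P swaps the branches with phases i^(#Y) and (-i)^(#Y),
    # so the expectation is Re(i^(#Y)).
    return [1.0, 0.0, -1.0, 0.0][n_y % 4]


# A few GHZ-4 values that an exact simulation should reproduce.
for label in ["ZZII", "XXXX", "YYXX", "ZIII"]:
    print(label, ghz_pauli_expectation(label))
```

Values from `compute_ground_truth_pennylane` for a GHZ circuit can be compared against this function when a suite result looks suspicious.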

In [None]:
# =============================================================================
# PENNYLANE CIRCUIT BUILDERS
# =============================================================================

def pl_ghz_circuit(n_qubits: int):
    """GHZ state preparation in PennyLane.
    
    |GHZ> = (|00...0> + |11...1>) / sqrt(2)
    """
    qml.Hadamard(wires=0)
    for i in range(1, n_qubits):
        qml.CNOT(wires=[i - 1, i])


def pl_bell_pairs_circuit(n_pairs: int):
    """Parallel Bell pairs in PennyLane.
    
    Creates n_pairs independent Bell states.
    """
    for i in range(n_pairs):
        qml.Hadamard(wires=2 * i)
        qml.CNOT(wires=[2 * i, 2 * i + 1])


def pl_random_clifford_circuit(n_qubits: int, depth: int, seed: int = 42):
    """Random Clifford circuit in PennyLane."""
    rng = np.random.default_rng(seed)
    clifford_gates = [qml.Hadamard, qml.S, qml.adjoint(qml.S), 
                      qml.PauliX, qml.PauliY, qml.PauliZ]
    
    for _ in range(depth):
        for q in range(n_qubits):
            gate = rng.choice(clifford_gates)
            gate(wires=q)
        for q in range(0, n_qubits - 1, 2):
            if rng.random() > 0.3:
                qml.CNOT(wires=[q, q + 1])


def pl_ising_trotter_circuit(n_qubits: int, steps: int = 3, dt: float = 0.5):
    """Trotterized transverse-field Ising model in PennyLane.
    
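    Each step applies exp(-i * dt * J * sum(ZZ)) followed by
    exp(-i * dt * h * sum(X)), i.e. first-order Trotter evolution under
    H = J * sum(ZZ) + h * sum(X) with the rotation angles used below.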
    """
    J, h = 1.0, 0.5
    
    # Initial state: |+>^n
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
    
    # Trotter steps
    for _ in range(steps):
        # ZZ interactions
        for q in range(n_qubits - 1):
            qml.CNOT(wires=[q, q + 1])
            qml.RZ(2 * J * dt, wires=q + 1)
            qml.CNOT(wires=[q, q + 1])
        # X field
        for q in range(n_qubits):
            qml.RX(2 * h * dt, wires=q)


def pl_h2_ansatz_circuit(theta: float = 0.5):
    """H2 molecule VQE ansatz in PennyLane (4 qubits)."""
    # HF reference
    qml.PauliX(wires=0)
    qml.PauliX(wires=1)
    # Double excitation
    qml.CNOT(wires=[1, 2])
    qml.RY(theta, wires=2)
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[0, 3])
    qml.RY(theta / 2, wires=3)
    qml.CNOT(wires=[0, 3])


def pl_qaoa_maxcut_ring_circuit(n_qubits: int, p: int = 1, 
                                gamma: float = 0.5, beta: float = 0.5):
    """QAOA for MAX-CUT on ring graph in PennyLane."""
    # Initial superposition
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
    
    # p QAOA layers
    for _ in range(p):
        # Cost unitary (ZZ on ring edges)
        for q in range(n_qubits):
            q_next = (q + 1) % n_qubits
            qml.CNOT(wires=[q, q_next])
            qml.RZ(2 * gamma, wires=q_next)
            qml.CNOT(wires=[q, q_next])
        # Mixer unitary
        for q in range(n_qubits):
            qml.RX(2 * beta, wires=q)


def pl_ghz_phase_sensing_circuit(n_qubits: int, phi: float = 0.1):
    """GHZ state with phase encoding for metrology."""
    # Create GHZ
    pl_ghz_circuit(n_qubits)
    # Encode phase
    for q in range(n_qubits):
        qml.RZ(phi, wires=q)


print("PennyLane circuit builders defined!")

---

## 2. PennyLane Classical Shadows

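PennyLane provides built-in classical shadows functionality via:
- `qml.classical_shadow()` - Measurement that collects shadow snapshots (bits and recipes)
- `qml.shadow_expval()` - Measurement that estimates expectation values directly inside a QNode
- `qml.ClassicalShadow` - Post-processing class whose `expval()` method estimates observables from collected snapshots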

In [None]:
# =============================================================================
# PENNYLANE CLASSICAL SHADOWS UTILITIES
# =============================================================================

def create_pennylane_shadow_device(n_qubits: int, shots: int, seed: int = 42):
    """Create a PennyLane device for classical shadows."""
    return qml.device('default.qubit', wires=n_qubits, shots=shots, seed=seed)


def pauli_string_to_pennylane(pauli_string: str):
    """Convert Pauli string to PennyLane observable.
    
    Args:
        pauli_string: e.g., 'XZIY' (qubit 0 = X, qubit 1 = Z, qubit 2 = I, qubit 3 = Y)
    
    Returns:
        PennyLane observable (tensor product)
    """
    pauli_map = {
        'I': qml.Identity,
        'X': qml.PauliX,
        'Y': qml.PauliY,
        'Z': qml.PauliZ,
    }
    
    terms = []
    for i, p in enumerate(pauli_string):
        if p != 'I':
            terms.append(pauli_map[p](i))
    
    if not terms:
        # All identity - return identity on first qubit
        return qml.Identity(0)
    elif len(terms) == 1:
        return terms[0]
    else:
        # Tensor product
        result = terms[0]
        for t in terms[1:]:
            result = result @ t
        return result


def collect_pennylane_shadows(circuit_fn, n_qubits: int, n_shadows: int, seed: int = 42):
    """Collect classical shadow snapshots using PennyLane.
    
    Args:
        circuit_fn: Function that applies the state preparation circuit
        n_qubits: Number of qubits
        n_shadows: Number of shadow snapshots to collect
        seed: Random seed
    
    Returns:
        Tuple of (bits, recipes) arrays
    """
    dev = qml.device('default.qubit', wires=n_qubits, shots=n_shadows, seed=seed)
    
    @qml.qnode(dev)
    def shadow_circuit():
        circuit_fn()
        return qml.classical_shadow(wires=range(n_qubits))
    
    bits, recipes = shadow_circuit()
    return bits, recipes


def estimate_from_pennylane_shadows(bits, recipes, observables: list, coefficient: float = 1.0):
    """Estimate expectation values from PennyLane shadow data.
    
    Args:
        bits: Shadow measurement bits (n_shadows, n_qubits)
        recipes: Shadow basis recipes (n_shadows, n_qubits)
        observables: List of PennyLane observables
        coefficient: Coefficient to multiply estimates by
    
    Returns:
        List of (expectation, std_error) tuples
    """
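    # Build the post-processing object once and reuse it for every observable.
    shadow = qml.ClassicalShadow(bits, recipes)
    results = []
    for obs in observables:
        expval = shadow.expval(obs, k=1)
        # No standard error is computed here; PennyLaneShadowsEstimator
        # provides a bootstrap estimate.
        results.append((float(expval) * coefficient, None))
    return results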


class PennyLaneShadowsEstimator:
    """Wrapper class for PennyLane classical shadows.
    
    Provides interface compatible with QuartumSE benchmarking.
    """
    
    def __init__(self, n_qubits: int, n_shadows: int, seed: int = 42):
        self.n_qubits = n_qubits
        self.n_shadows = n_shadows
        self.seed = seed
        self.bits = None
        self.recipes = None
        self._collected = False
    
    def collect(self, circuit_fn):
        """Collect shadow snapshots."""
        self.bits, self.recipes = collect_pennylane_shadows(
            circuit_fn, self.n_qubits, self.n_shadows, self.seed
        )
        self._collected = True
        return self
    
    def estimate(self, pauli_string: str, coefficient: float = 1.0) -> tuple:
        """Estimate expectation value for a Pauli observable.
        
        Args:
            pauli_string: e.g., 'XZIY'
            coefficient: Observable coefficient
        
        Returns:
            Tuple of (expectation_value, standard_error)
        """
        if not self._collected:
            raise ValueError("Must call collect() before estimate()")
        
        obs = pauli_string_to_pennylane(pauli_string)
        
        # Use PennyLane's shadow post-processing class
        shadow = qml.ClassicalShadow(self.bits, self.recipes)
        expval = shadow.expval(obs, k=1)
        
        # Compute standard error via bootstrap
        se = self._bootstrap_se(obs, n_bootstrap=100)
        
        return float(expval) * coefficient, se * abs(coefficient)
    
    def _bootstrap_se(self, obs, n_bootstrap: int = 100) -> float:
        """Estimate standard error via bootstrap."""
        rng = np.random.default_rng(self.seed + 1000)
        n_shadows = self.bits.shape[0]
        
        estimates = []
        for _ in range(n_bootstrap):
            # Resample snapshots with replacement, keeping bits and recipes paired
            idx = rng.choice(n_shadows, size=n_shadows, replace=True)
            boot_shadow = qml.ClassicalShadow(self.bits[idx], self.recipes[idx])
            estimates.append(float(boot_shadow.expval(obs, k=1)))
        
        return float(np.std(estimates))
    
    def estimate_all(self, observable_set) -> dict:
        """Estimate all observables in a set.
        
        Args:
            observable_set: QuartumSE ObservableSet
        
        Returns:
            Dict mapping observable_id to (estimate, se)
        """
        results = {}
        for obs in observable_set.observables:
            est, se = self.estimate(obs.pauli_string, obs.coefficient)
            results[obs.observable_id] = {'estimate': est, 'se': se}
        return results


print("PennyLane shadows utilities defined!")

---

## 3. PennyLane Direct Measurement

For comparison, this section implements a direct measurement baseline in PennyLane: each observable is measured separately with its own share of the shot budget.
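
For a single Pauli string, every shot returns +1 or -1, so the standard error of the sample mean follows directly from the mean itself. The sketch below shows that closed form. The function name `pauli_standard_error` is hypothetical and not part of the notebook.

```python
import math


def pauli_standard_error(mean: float, shots: int) -> float:
    """Standard error of the mean of `shots` independent +/-1 outcomes."""
    # Each outcome squares to 1, so the per-shot variance is 1 - mean^2.
    variance = max(0.0, 1.0 - mean ** 2)
    return math.sqrt(variance / shots)


print(pauli_standard_error(0.0, 100))
print(pauli_standard_error(1.0, 100))
```

This needs only one run per observable, whereas estimating the spread from repeated runs multiplies the number of shots actually consumed by the number of repetitions.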

In [None]:
# =============================================================================
# PENNYLANE DIRECT MEASUREMENT
# =============================================================================

class PennyLaneDirectEstimator:
    """Direct measurement estimator using PennyLane.
    
    Measures each observable in its native basis.
    """
    
    def __init__(self, n_qubits: int, shots_per_observable: int, seed: int = 42):
        self.n_qubits = n_qubits
        self.shots_per_observable = shots_per_observable
        self.seed = seed
        self._circuit_fn = None
    
    def set_circuit(self, circuit_fn):
        """Set the state preparation circuit."""
        self._circuit_fn = circuit_fn
        return self
    
    def estimate(self, pauli_string: str, coefficient: float = 1.0) -> tuple:
        """Estimate expectation value via direct measurement.
        
        Args:
            pauli_string: e.g., 'XZIY'
            coefficient: Observable coefficient
        
        Returns:
            Tuple of (expectation_value, standard_error)
        """
        if self._circuit_fn is None:
            raise ValueError("Must call set_circuit() before estimate()")
        
        obs = pauli_string_to_pennylane(pauli_string)
        
        # One run with the full per-observable budget, so the shots consumed
        # match the allocation made in run_pennylane_benchmark.
        dev = qml.device('default.qubit', wires=self.n_qubits, 
                        shots=self.shots_per_observable, seed=self.seed)
        
        @qml.qnode(dev)
        def measure_circuit():
            self._circuit_fn()
            return qml.expval(obs)
        
        mean = float(measure_circuit())
        
        # A Pauli string has eigenvalues +/-1, so the per-shot variance
        # is 1 - <P>^2 and the standard error follows analytically.
        variance = max(0.0, 1.0 - mean ** 2)
        se = float(np.sqrt(variance / self.shots_per_observable))
        
        return mean * coefficient, se * abs(coefficient)
    
    def estimate_all(self, observable_set) -> dict:
        """Estimate all observables in a set."""
        results = {}
        for obs in observable_set.observables:
            est, se = self.estimate(obs.pauli_string, obs.coefficient)
            results[obs.observable_id] = {'estimate': est, 'se': se}
        return results


def compute_ground_truth_pennylane(circuit_fn, n_qubits: int, observable_set) -> dict:
    """Compute exact ground truth using PennyLane statevector.
    
    Args:
        circuit_fn: State preparation function
        n_qubits: Number of qubits
        observable_set: QuartumSE ObservableSet
    
    Returns:
        Dict mapping observable_id to exact expectation value
    """
    dev = qml.device('default.qubit', wires=n_qubits)
    
    truth = {}
    for obs in observable_set.observables:
        pl_obs = pauli_string_to_pennylane(obs.pauli_string)
        
        @qml.qnode(dev)
        def exact_expval():
            circuit_fn()
            return qml.expval(pl_obs)
        
        truth[obs.observable_id] = float(exact_expval()) * obs.coefficient
    
    return truth


print("PennyLane direct measurement utilities defined!")

---

## 4. Benchmark Configuration

In [None]:
# =============================================================================
# BENCHMARK CONFIGURATION
# =============================================================================

# Shot budgets to test
N_SHOTS_GRID = [100, 500, 1000, 2000]

# Number of independent runs for statistics
N_REPLICATES = 5

# Random seed
SEED = 42

# Circuits to benchmark
CIRCUITS_CONFIG = {
    'GHZ-4': {
        'enabled': True,
        'n_qubits': 4,
        'circuit_fn': lambda: pl_ghz_circuit(4),
        'suites': make_ghz_suites(4),
    },
    'GHZ-6': {
        'enabled': False,
        'n_qubits': 6,
        'circuit_fn': lambda: pl_ghz_circuit(6),
        'suites': make_ghz_suites(6),
    },
    'Bell-2': {
        'enabled': True,
        'n_qubits': 4,
        'circuit_fn': lambda: pl_bell_pairs_circuit(2),
        'suites': make_bell_suites(2),
    },
    'Ising-4': {
        'enabled': False,
        'n_qubits': 4,
        'circuit_fn': lambda: pl_ising_trotter_circuit(4, steps=3),
        'suites': make_ising_suites(4),
    },
    'QAOA-5': {
        'enabled': False,
        'n_qubits': 5,
        'circuit_fn': lambda: pl_qaoa_maxcut_ring_circuit(5, p=1),
        'suites': make_qaoa_ring_suites(5),
    },
}

# Which suite types to run
SUITE_TYPES_ENABLED = ['workload', 'commuting']

# Filter enabled circuits
enabled_circuits = {k: v for k, v in CIRCUITS_CONFIG.items() if v['enabled']}

print(f"Shots grid: {N_SHOTS_GRID}")
print(f"Replicates: {N_REPLICATES}")
print(f"Enabled circuits: {list(enabled_circuits.keys())}")
print(f"Suite types: {SUITE_TYPES_ENABLED}")

---

## 5. Run PennyLane Benchmarks
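
The benchmark gives both methods the same nominal budget `n_shots`. Shadows use every shot as one snapshot, while the direct baseline splits the budget evenly with `max(1, n_shots // n_obs)`. The sketch below reproduces that arithmetic to show how many shots the split actually consumes. The helper name `uniform_allocation` is hypothetical.

```python
def uniform_allocation(total_shots: int, n_observables: int) -> dict:
    """Split a shot budget evenly, with at least one shot per observable."""
    per_obs = max(1, total_shots // n_observables)
    used = per_obs * n_observables
    return {
        "shots_per_observable": per_obs,
        "shots_used": used,
        # Negative when the one-shot floor pushes usage over the budget.
        "leftover": total_shots - used,
    }


print(uniform_allocation(1000, 7))
print(uniform_allocation(100, 150))
```

When a suite has more observables than shots, the one-shot floor makes the direct baseline exceed the nominal budget, which is worth keeping in mind when reading the smallest entries of `N_SHOTS_GRID`.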

In [None]:
# =============================================================================
# RUN PENNYLANE BENCHMARKS
# =============================================================================

def run_pennylane_benchmark(
    circuit_fn,
    n_qubits: int,
    observable_set,
    n_shots: int,
    seed: int,
):
    """Run benchmark comparing PennyLane shadows vs direct measurement.
    
    Returns:
        Dict with results for each method
    """
    results = {}
    
    # 1. PennyLane Classical Shadows
    shadows = PennyLaneShadowsEstimator(n_qubits, n_shots, seed)
    shadows.collect(circuit_fn)
    results['pennylane_shadows'] = shadows.estimate_all(observable_set)
    
    # 2. PennyLane Direct Measurement (uniform allocation)
    n_obs = len(observable_set.observables)
    shots_per_obs = max(1, n_shots // n_obs)
    direct = PennyLaneDirectEstimator(n_qubits, shots_per_obs, seed)
    direct.set_circuit(circuit_fn)
    results['pennylane_direct'] = direct.estimate_all(observable_set)
    
    return results


def run_full_benchmark_suite():
    """Run complete benchmark suite across all configurations."""
    all_results = {}
    
    for circuit_name, config in enabled_circuits.items():
        print(f"\n{'='*70}")
        print(f"CIRCUIT: {circuit_name} ({config['n_qubits']} qubits)")
        print(f"{'='*70}")
        
        circuit_fn = config['circuit_fn']
        n_qubits = config['n_qubits']
        
        # Filter suites by type
        suites = {}
        for suite_name, suite in config['suites'].items():
            if suite.suite_type.value in SUITE_TYPES_ENABLED:
                suites[suite_name] = suite
            elif any(t in suite_name for t in SUITE_TYPES_ENABLED):
                suites[suite_name] = suite
        
        for suite_name, suite in suites.items():
            print(f"\n  Suite: {suite_name} ({suite.n_observables} observables)")
            
            # Compute ground truth
            print("    Computing ground truth...")
            truth = compute_ground_truth_pennylane(
                circuit_fn, n_qubits, suite.observable_set
            )
            
            suite_results = {
                'circuit': circuit_name,
                'suite': suite_name,
                'n_qubits': n_qubits,
                'n_observables': suite.n_observables,
                'ground_truth': truth,
                'runs': [],
            }
            
            for n_shots in N_SHOTS_GRID:
                print(f"    N={n_shots}: ", end="", flush=True)
                
                for rep in range(N_REPLICATES):
                    seed = SEED + rep * 1000 + n_shots
                    
                    results = run_pennylane_benchmark(
                        circuit_fn, n_qubits, suite.observable_set, n_shots, seed
                    )
                    
                    # Compute errors
                    for method, estimates in results.items():
                        for obs_id, data in estimates.items():
                            true_val = truth[obs_id]
                            error = abs(data['estimate'] - true_val)
                            
                            suite_results['runs'].append({
                                'n_shots': n_shots,
                                'replicate': rep,
                                'method': method,
                                'observable_id': obs_id,
                                'estimate': data['estimate'],
                                'se': data['se'],
                                'true_value': true_val,
                                'abs_error': error,
                            })
                    
                    print(".", end="", flush=True)
                print(" done")
            
            key = f"{circuit_name}__{suite_name}"
            all_results[key] = suite_results
    
    return all_results


print("Starting PennyLane benchmark suite...")
print(f"This will run {len(enabled_circuits)} circuits x {len(N_SHOTS_GRID)} shots x {N_REPLICATES} replicates")

pennylane_results = run_full_benchmark_suite()

print(f"\n{'='*70}")
print(f"BENCHMARK COMPLETE: {len(pennylane_results)} suite runs")
print(f"{'='*70}")

---

## 6. Results Analysis

In [None]:
# =============================================================================
# ANALYZE PENNYLANE BENCHMARK RESULTS
# =============================================================================

def analyze_benchmark_results(results: dict):
    """Analyze and summarize benchmark results."""
    
    for key, data in results.items():
        print(f"\n{'='*80}")
        print(f"{key}")
        print(f"  {data['n_qubits']}q, {data['n_observables']} observables")
        print(f"{'='*80}")
        
        # Group by method and n_shots
        by_method_shots = defaultdict(lambda: defaultdict(list))
        for run in data['runs']:
            method = run['method']
            n_shots = run['n_shots']
            by_method_shots[method][n_shots].append(run)
        
        methods = sorted(by_method_shots.keys())
        
        # Header
        print(f"\n{'N_shots':>10}", end="")
        for method in methods:
            short = method.replace('pennylane_', 'PL_')
            print(f" {short + '_SE':>15} {short + '_err':>15}", end="")
        print("  Winner")
        print("-" * (10 + len(methods) * 32 + 10))
        
        # Results by N
        for n_shots in N_SHOTS_GRID:
            print(f"{n_shots:>10}", end="")
            
            method_stats = {}
            for method in methods:
                runs = by_method_shots[method][n_shots]
                if runs:
                    ses = [r['se'] for r in runs if r['se'] is not None]
                    errors = [r['abs_error'] for r in runs]
                    mean_se = np.mean(ses) if ses else float('nan')
                    mean_err = np.mean(errors)
                    method_stats[method] = {'se': mean_se, 'err': mean_err}
                    print(f" {mean_se:>15.4f} {mean_err:>15.4f}", end="")
                else:
                    print(f" {'N/A':>15} {'N/A':>15}", end="")
            
            # Determine winner by mean absolute error
            if method_stats:
                winner = min(method_stats, key=lambda m: method_stats[m]['err'])
                short_winner = winner.replace('pennylane_', '')
                print(f"  {short_winner}")
            else:
                print()
        
        print()


analyze_benchmark_results(pennylane_results)

In [None]:
# =============================================================================
# SHADOWS VS DIRECT COMPARISON SUMMARY
# =============================================================================

def compute_method_comparison(results: dict):
    """Compute head-to-head comparison between methods."""
    
    print("\n" + "="*80)
    print("METHOD COMPARISON SUMMARY")
    print("="*80)
    
    shadows_wins = 0
    direct_wins = 0
    ties = 0
    
    for key, data in results.items():
        # Get results at max N
        max_n = max(N_SHOTS_GRID)
        
        shadows_errors = []
        direct_errors = []
        
        for run in data['runs']:
            if run['n_shots'] == max_n:
                if run['method'] == 'pennylane_shadows':
                    shadows_errors.append(run['abs_error'])
                elif run['method'] == 'pennylane_direct':
                    direct_errors.append(run['abs_error'])
        
        if shadows_errors and direct_errors:
            shadows_mean = np.mean(shadows_errors)
            direct_mean = np.mean(direct_errors)
            
            if shadows_mean < direct_mean * 0.95:  # 5% margin
                winner = "SHADOWS"
                shadows_wins += 1
            elif direct_mean < shadows_mean * 0.95:
                winner = "DIRECT"
                direct_wins += 1
            else:
                winner = "TIE"
                ties += 1
            
            ratio = shadows_mean / direct_mean if direct_mean > 0 else float('inf')
            
            print(f"\n{key}:")
            print(f"  Shadows mean error: {shadows_mean:.4f}")
            print(f"  Direct mean error:  {direct_mean:.4f}")
            print(f"  Ratio (S/D):        {ratio:.2f}x")
            print(f"  Winner:             {winner}")
    
    print(f"\n{'='*80}")
    print(f"OVERALL: Shadows wins {shadows_wins}, Direct wins {direct_wins}, Ties {ties}")
    print(f"{'='*80}")


compute_method_comparison(pennylane_results)

---

## 7. Comparison with QuartumSE

Now compare PennyLane's results with our QuartumSE implementation.
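
One methodological difference to keep in mind is the integer encoding of measurement bases. PennyLane's documented recipe encoding is 0 = X, 1 = Y, 2 = Z, whereas the simulation in this section draws bases as 0 = Z, 1 = X, 2 = Y (inferred from the rotations it applies, not from QuartumSE's documentation). The sketch below converts between the two. The names are hypothetical and not part of either library.

```python
# Assumed encodings: PennyLane per its documentation, the second as used by
# the rotation logic in this notebook's QuartumSE simulation.
PENNYLANE_BASIS = ("X", "Y", "Z")
NOTEBOOK_QSE_BASIS = ("Z", "X", "Y")


def pennylane_to_qse_recipes(recipes):
    """Re-encode a 2D list of PennyLane recipe integers for the other convention."""
    return [
        [NOTEBOOK_QSE_BASIS.index(PENNYLANE_BASIS[r]) for r in row]
        for row in recipes
    ]


print(pennylane_to_qse_recipes([[0, 1, 2], [2, 2, 0]]))
```

A conversion like this only matters when snapshots collected by one library are fed to the other. In the comparison here each library samples its own bases, so the encodings never mix.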

In [None]:
# =============================================================================
# RUN QUARTUMSE FOR COMPARISON
# =============================================================================

from qiskit import QuantumCircuit
from quartumse.shadows import RandomLocalCliffordShadows
from quartumse.shadows.config import ShadowConfig
from quartumse.shadows.core import Observable as ShadowObservable


def build_qiskit_ghz(n_qubits: int) -> QuantumCircuit:
    """GHZ circuit in Qiskit (for QuartumSE)."""
    qc = QuantumCircuit(n_qubits, name=f'GHZ_{n_qubits}q')
    qc.h(0)
    for i in range(1, n_qubits):
        qc.cx(i - 1, i)
    return qc


def build_qiskit_bell_pairs(n_pairs: int) -> QuantumCircuit:
    """Bell pairs circuit in Qiskit."""
    n_qubits = 2 * n_pairs
    qc = QuantumCircuit(n_qubits, name=f'Bell_{n_pairs}pairs')
    for i in range(n_pairs):
        qc.h(2 * i)
        qc.cx(2 * i, 2 * i + 1)
    return qc


def run_quartumse_shadows(circuit, observable_set, n_shots: int, seed: int):
    """Run QuartumSE classical shadows."""
    from qiskit.quantum_info import Statevector
    
    n_qubits = circuit.num_qubits
    
    # Configure shadows
    config = ShadowConfig(
        num_shadows=n_shots,
        random_seed=seed,
        median_of_means=False,
    )
    shadows = RandomLocalCliffordShadows(config)
    
    # Generate random bases (0 = Z, 1 = X, 2 = Y, matching the rotations below)
    rng = np.random.default_rng(seed)
    measurement_bases = rng.integers(0, 3, size=(n_shots, n_qubits))
    
    # Simulate measurements shot by shot
    outcomes = np.zeros((n_shots, n_qubits), dtype=int)
    
    for shot_idx in range(n_shots):
        bases = measurement_bases[shot_idx]
        rotated_circuit = circuit.copy()
        
        for q in range(n_qubits):
            if bases[q] == 1:  # X basis
                rotated_circuit.h(q)
            elif bases[q] == 2:  # Y basis
                rotated_circuit.sdg(q)
                rotated_circuit.h(q)
        
        rotated_sv = Statevector.from_instruction(rotated_circuit)
        probs = rotated_sv.probabilities()
        outcome_int = rng.choice(len(probs), p=probs)
        
        for q in range(n_qubits):
            outcomes[shot_idx, q] = (outcome_int >> q) & 1
    
    # Reconstruct shadow
    shadows.reconstruct_classical_shadow(outcomes, measurement_bases)
    
    # Estimate observables
    results = {}
    for obs in observable_set.observables:
        shadow_obs = ShadowObservable(
            pauli_string=obs.pauli_string,
            coefficient=obs.coefficient,
        )
        estimate = shadows.estimate_observable(shadow_obs)
        
        # Compute SE from variance
        se = np.sqrt(estimate.variance / n_shots) if estimate.variance > 0 else 0.0
        
        results[obs.observable_id] = {
            'estimate': estimate.expectation_value,
            'se': se,
        }
    
    return results


print("QuartumSE comparison functions defined!")

In [None]:
# =============================================================================
# CROSS-LIBRARY COMPARISON: PennyLane vs QuartumSE
# =============================================================================

def run_cross_library_comparison():
    """Compare PennyLane and QuartumSE shadows implementations."""
    
    print("\n" + "="*80)
    print("CROSS-LIBRARY COMPARISON: PennyLane vs QuartumSE")
    print("="*80)
    
    # Test configuration
    n_qubits = 4
    n_shots = 1000
    seed = 42
    
    # Use GHZ-4 workload suite
    suites = make_ghz_suites(4)
    suite = suites.get('workload_stabilizers') or list(suites.values())[0]
    observable_set = suite.observable_set
    
    print(f"\nTest configuration:")
    print(f"  Circuit: GHZ-{n_qubits}")
    print(f"  N_shots: {n_shots}")
    print(f"  Observables: {len(observable_set.observables)}")
    
    # Compute ground truth
    print("\nComputing ground truth...")
    truth = compute_ground_truth_pennylane(
        lambda: pl_ghz_circuit(n_qubits), 
        n_qubits, 
        observable_set
    )
    
    # Run PennyLane shadows
    print("Running PennyLane shadows...")
    pl_shadows = PennyLaneShadowsEstimator(n_qubits, n_shots, seed)
    pl_shadows.collect(lambda: pl_ghz_circuit(n_qubits))
    pl_results = pl_shadows.estimate_all(observable_set)
    
    # Run QuartumSE shadows
    print("Running QuartumSE shadows...")
    qiskit_circuit = build_qiskit_ghz(n_qubits)
    qse_results = run_quartumse_shadows(qiskit_circuit, observable_set, n_shots, seed)
    
    # Compare results
    print("\n" + "="*80)
    print("RESULTS COMPARISON")
    print("="*80)
    
    print(f"\n{'Observable':<20} {'True':>10} {'PL_est':>10} {'QSE_est':>10} {'PL_err':>10} {'QSE_err':>10} {'Diff':>10}")
    print("-" * 82)
    
    pl_errors = []
    qse_errors = []
    
    for obs in observable_set.observables:
        obs_id = obs.observable_id
        true_val = truth[obs_id]
        
        pl_est = pl_results[obs_id]['estimate']
        qse_est = qse_results[obs_id]['estimate']
        
        pl_err = abs(pl_est - true_val)
        qse_err = abs(qse_est - true_val)
        diff = abs(pl_est - qse_est)
        
        pl_errors.append(pl_err)
        qse_errors.append(qse_err)
        
        print(f"{obs_id:<20} {true_val:>10.4f} {pl_est:>10.4f} {qse_est:>10.4f} "
              f"{pl_err:>10.4f} {qse_err:>10.4f} {diff:>10.4f}")
    
    print("-" * 82)
    print(f"{'MEAN':<20} {'':<10} {'':<10} {'':<10} "
          f"{np.mean(pl_errors):>10.4f} {np.mean(qse_errors):>10.4f} ")
    print(f"{'MAX':<20} {'':<10} {'':<10} {'':<10} "
          f"{np.max(pl_errors):>10.4f} {np.max(qse_errors):>10.4f} ")
    
    # Statistical comparison
    print("\n" + "="*80)
    print("STATISTICAL SUMMARY")
    print("="*80)
    
    pl_mean = np.mean(pl_errors)
    qse_mean = np.mean(qse_errors)
    ratio = pl_mean / qse_mean if qse_mean > 0 else float('inf')
    
    print(f"\nPennyLane mean absolute error: {pl_mean:.4f}")
    print(f"QuartumSE mean absolute error:  {qse_mean:.4f}")
    print(f"Ratio (PL / QSE):               {ratio:.2f}x")
    
    if ratio < 0.95:
        print("\n--> PennyLane performs BETTER")
    elif ratio > 1.05:
        print("\n--> QuartumSE performs BETTER")
    else:
        print("\n--> Implementations are COMPARABLE")
    
    return {
        'pennylane_errors': pl_errors,
        'quartumse_errors': qse_errors,
        'truth': truth,
    }


comparison_results = run_cross_library_comparison()

In [None]:
# =============================================================================
# SCALING COMPARISON: Error vs N_shots
# =============================================================================

def run_scaling_comparison():
    """Compare how error scales with shot budget."""
    
    print("\n" + "="*80)
    print("SCALING COMPARISON: Error vs N_shots")
    print("="*80)
    
    n_qubits = 4
    shot_budgets = [100, 200, 500, 1000, 2000]
    n_reps = 3
    
    # Use GHZ-4 stabilizers
    suites = make_ghz_suites(4)
    suite = suites.get('workload_stabilizers') or list(suites.values())[0]
    observable_set = suite.observable_set
    
    # Ground truth
    truth = compute_ground_truth_pennylane(
        lambda: pl_ghz_circuit(n_qubits),
        n_qubits,
        observable_set
    )
    
    # Qiskit circuit for QuartumSE
    qiskit_circuit = build_qiskit_ghz(n_qubits)
    
    scaling_results = {'pennylane': {}, 'quartumse': {}}
    
    for n_shots in shot_budgets:
        print(f"\nN_shots = {n_shots}: ", end="", flush=True)
        
        pl_errors_all = []
        qse_errors_all = []
        
        for rep in range(n_reps):
            seed = SEED + rep * 1000 + n_shots
            
            # PennyLane
            pl_shadows = PennyLaneShadowsEstimator(n_qubits, n_shots, seed)
            pl_shadows.collect(lambda: pl_ghz_circuit(n_qubits))
            pl_results = pl_shadows.estimate_all(observable_set)
            
            # QuartumSE
            qse_results = run_quartumse_shadows(qiskit_circuit, observable_set, n_shots, seed)
            
            # Compute errors
            for obs_id in truth:
                pl_err = abs(pl_results[obs_id]['estimate'] - truth[obs_id])
                qse_err = abs(qse_results[obs_id]['estimate'] - truth[obs_id])
                pl_errors_all.append(pl_err)
                qse_errors_all.append(qse_err)
            
            print(".", end="", flush=True)
        
        scaling_results['pennylane'][n_shots] = {
            'mean_error': np.mean(pl_errors_all),
            'std_error': np.std(pl_errors_all),
        }
        scaling_results['quartumse'][n_shots] = {
            'mean_error': np.mean(qse_errors_all),
            'std_error': np.std(qse_errors_all),
        }
        print(" done")
    
    # Display scaling results
    print("\n" + "="*80)
    print("SCALING RESULTS")
    print("="*80)
    
    print(f"\n{'N_shots':>10} {'PL_error':>12} {'QSE_error':>12} {'Ratio':>10}")
    print("-" * 48)
    
    for n_shots in shot_budgets:
        pl_err = scaling_results['pennylane'][n_shots]['mean_error']
        qse_err = scaling_results['quartumse'][n_shots]['mean_error']
        ratio = pl_err / qse_err if qse_err > 0 else float('inf')
        
        print(f"{n_shots:>10} {pl_err:>12.4f} {qse_err:>12.4f} {ratio:>10.2f}x")
    
    return scaling_results


scaling_results = run_scaling_comparison()

---

## 8. Save Results

In [None]:
# =============================================================================
# SAVE BENCHMARK RESULTS
# =============================================================================

output_dir = Path('benchmark_results/pennylane')
output_dir.mkdir(parents=True, exist_ok=True)

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

# Save PennyLane benchmark results
pennylane_output = {
    'timestamp': datetime.now().isoformat(),
    'config': {
        'n_shots_grid': N_SHOTS_GRID,
        'n_replicates': N_REPLICATES,
        'seed': SEED,
    },
    'results': {k: {
        'circuit': v['circuit'],
        'suite': v['suite'],
        'n_qubits': v['n_qubits'],
        'n_observables': v['n_observables'],
        'runs': v['runs'],
    } for k, v in pennylane_results.items()},
}

with open(output_dir / f'pennylane_benchmark_{timestamp}.json', 'w') as f:
    json.dump(pennylane_output, f, indent=2, default=str)

# Save scaling comparison
scaling_output = {
    'timestamp': datetime.now().isoformat(),
    'scaling_results': scaling_results,
}

with open(output_dir / f'scaling_comparison_{timestamp}.json', 'w') as f:
    json.dump(scaling_output, f, indent=2, default=str)

print(f"Results saved to: {output_dir}")
print(f"  - pennylane_benchmark_{timestamp}.json")
print(f"  - scaling_comparison_{timestamp}.json")

---

## Summary

This notebook benchmarks the methods listed below.

### Methods Compared

| Method | Library | Description |
|--------|---------|-------------|
| `pennylane_shadows` | PennyLane | Built-in classical shadows |
| `pennylane_direct` | PennyLane | Direct measurement baseline |
| `quartumse_shadows` | QuartumSE | Our shadows implementation |

### What the Notebook Checks

1. **Implementation Validation**: QuartumSE and PennyLane estimates are compared against exact statevector values
2. **Performance Comparison**: Mean absolute error of each method at each shot budget
3. **Scaling Behavior**: Whether the error falls as 1/sqrt(N) with the number of shots N
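
One way to check the 1/sqrt(N) scaling is to fit a straight line to log(error) against log(N) and confirm the slope is close to -0.5. The sketch below does this with an ordinary least-squares fit. The function name `loglog_slope` is hypothetical, and the error values are synthetic, generated from an exact 1/sqrt(N) law rather than taken from a benchmark run.

```python
import math


def loglog_slope(shots, errors):
    """Least-squares slope of log(error) versus log(shots)."""
    xs = [math.log(n) for n in shots]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic errors that follow c / sqrt(N) exactly (not benchmark output).
shots = [100, 200, 500, 1000, 2000]
synthetic_errors = [1.5 / math.sqrt(n) for n in shots]
slope = loglog_slope(shots, synthetic_errors)
print(f"fitted slope: {slope:.3f}")
```

With real data, the `mean_error` values stored in `scaling_results` for each shot budget would replace `synthetic_errors`. With only a few replicates the fitted slope will scatter around -0.5.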

### Next Steps

- Add noise models for realistic comparison
- Test on larger circuits
- Compare computational efficiency (runtime)