## PTSBE accuracy verification: noisy-reference XEB

This notebook validates that the PTSBE (Pre-Trajectory Sampling with Batch Execution) simulation implementation produces the same measurement distribution as the trusted density-matrix (DM) simulator for identical noisy circuits.

Histogram-based metrics like Hellinger fidelity break down at scale: with $D = 2^n$ possible bitstrings and a fixed shot budget $N$, most bins are empty once $n > 20$, so bin-by-bin comparisons are dominated by sampling noise. XEB avoids this by scoring each sample against the reference probability for that single outcome, giving an estimator that converges at $O(1/\sqrt{N})$ regardless of $D$. For a detailed introduction to XEB theory, see [Google's XEB tutorial](https://quantumai.google/cirq/noise/qcvv/xeb_theory).
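As a rough illustration of the empty-bin problem, the sketch below estimates what fraction of the $2^n$ bins receives at least one shot. It assumes a uniform output distribution, which is a simplification, and the helper name `expected_occupied_fraction` is ours rather than part of any library.

```python
import numpy as np

def expected_occupied_fraction(n_qubits, shots):
    """Expected fraction of the 2^n bins hit at least once by uniform samples."""
    D = 2.0 ** n_qubits
    # P(bin empty) = (1 - 1/D)^shots; log1p/expm1 keep this stable for large D.
    return -np.expm1(shots * np.log1p(-1.0 / D))

for n in (8, 16, 20, 24):
    frac = expected_occupied_fraction(n, shots=100_000)
    print(f"n={n:2d}: {frac:.4f} of bins occupied")
```

With a fixed budget of 100,000 shots the occupied fraction drops quickly as $n$ grows, which is why per-bin comparisons become dominated by sampling noise.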

We adapt Google's [cross-entropy benchmarking (XEB)](https://quantumai.google/cirq/noise/qcvv/xeb_theory) to use a noisy reference distribution instead of an ideal statevector. Because a noisy reference is not Porter-Thomas distributed, the usual XEB normalization no longer applies and must be computed explicitly from the reference. The resulting metric, $F_{\text{noisy}}$, directly measures whether PTSBE samples from the same distribution as the DM simulator.

*Scalability note.* Computing $p_{\text{DM}}(x)$ requires the exact diagonal of the $2^n \times 2^n$ density matrix, which limits this approach to moderate qubit counts. For larger simulations where the full density matrix is infeasible, a sample-only metric like [Maximum Mean Discrepancy (MMD)](https://www.kaggle.com/code/onurtunali/maximum-mean-discrepancy) could compare the two sampling distributions directly, without access to exact probabilities. MMD measures the distance between two distributions using only samples from each, making it applicable at any scale where both PTSBE and standard noisy simulation can produce shots.
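A minimal sketch of what a sample-only MMD comparison could look like is given here. It is not part of this notebook's protocol: the function name `mmd2_biased`, the Gaussian kernel, and the bandwidth `sigma` are all illustrative assumptions. It uses the fact that, on bit vectors, squared Euclidean distance equals Hamming distance.

```python
import numpy as np

def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between two sets of bitstring samples.

    X, Y: integer arrays of shape (num_samples, n_bits) with entries 0/1.
    """
    def gram(A, B):
        # Hamming distance from inner products: d(a, b) = |a| + |b| - 2 a.b
        a = A.sum(axis=1)[:, None]
        b = B.sum(axis=1)[None, :]
        d = a + b - 2 * (A @ B.T)
        return np.exp(-d / (2.0 * sigma ** 2))

    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

rng = np.random.default_rng(0)
n_bits, m = 12, 500
same_a = (rng.random((m, n_bits)) < 0.5).astype(int)
same_b = (rng.random((m, n_bits)) < 0.5).astype(int)
shifted = (rng.random((m, n_bits)) < 0.8).astype(int)

mmd_same = mmd2_biased(same_a, same_b)
mmd_diff = mmd2_biased(same_a, shifted)
print(f"same distribution: {mmd_same:.4f}, different: {mmd_diff:.4f}")
```

Two sample sets from the same distribution give a value near zero, while a shifted distribution gives a clearly larger one. The Gram matrices are quadratic in the number of samples, so a practical version would subsample or batch.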

### Circuit family

We use XEB-style random circuits following Google's structure ([Arute et al., Nature 2019](https://www.nature.com/articles/s41586-019-1666-5)). Each circuit has $n$ qubits arranged in a linear chain and $m$ depth cycles. One cycle consists of:

- Single-qubit layer: A random gate from $\{\sqrt{X},\, \sqrt{Y},\, \sqrt{W}\}$ applied to each qubit, where $W = (X + Y)/\sqrt{2}$. No gate repeats on the same qubit in consecutive cycles.
- Two-qubit layer: CX (CNOT) gates in an alternating nearest-neighbor tiling (even pairs on even cycles, odd pairs on odd cycles).

This gate set produces distributions that are highly sensitive to per-gate errors, making it a strong probe for correctness. The implementation below simplifies this family in two ways: it uses $R_y(\theta)$ with uniformly random $\theta$ in place of the discrete single-qubit gate set, and it applies a full nearest-neighbor CX ladder in every cycle rather than the alternating tiling. The validation methodology is independent of the specific gates chosen.
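For reference, the discrete single-qubit gate set can be written down explicitly. The sketch below uses one common convention, $\sqrt{P} = e^{-i\pi P/4}$, which fixes each gate only up to a global phase; the helper name `half_rotation` is ours.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
W = (X + Y) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)

def half_rotation(P):
    """exp(-i*pi/4 * P) for an involutory P (P @ P = I): a pi/2 rotation about P."""
    return (I2 - 1j * P) / np.sqrt(2)

for name, P in [("X", X), ("Y", Y), ("W", W)]:
    G = half_rotation(P)
    # Squaring gives -i*P, i.e. P up to a global phase.
    assert np.allclose(G @ G, -1j * P)
    # Each gate is unitary.
    assert np.allclose(G.conj().T @ G, I2)
    print(f"sqrt({name}) verified")
```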

### Deriving the metric

The linear XEB fidelity estimator from [Arute et al.](https://www.nature.com/articles/s41586-019-1666-5) ([Supplementary Information, Eq. 17](https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-019-1666-5/MediaObjects/41586_2019_1666_MOESM1_ESM.pdf)) relates the experimentally observed mean of $Dp_s(q) - 1$ to the fidelity $F$ and the sum of squared ideal probabilities:

$$\overline{\langle Dp_s(q) - 1\rangle} = F \left(D\sum_q p_s(q)^2 - 1\right)$$

Solving for $F$ and writing $e_U = \sum_q p_s(q)^2$ and $\hat{m}_U = \frac{1}{N}\sum_i p_s(x_i)$:

$$F = \frac{\hat{m}_U - 1/D}{e_U - 1/D}$$

In the standard setting, $p_s = p_{\text{ideal}}$ and the Porter-Thomas distribution gives $e_U \approx 2/D$, which simplifies the above to the familiar $F = D\langle p_{\text{ideal}}(x_i)\rangle - 1$. We cannot use this simplification because our reference is a noisy mixed state whose diagonal is not Porter-Thomas distributed.
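The Porter-Thomas value $e_U \approx 2/D$ can be checked numerically. The sketch below uses a Haar-random pure state (a normalized complex Gaussian vector) as a stand-in for the output of a deep random circuit; that substitution is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
D = 2 ** n

# Haar-random pure state: normalize a complex Gaussian vector.
psi = rng.normal(size=D) + 1j * rng.normal(size=D)
psi /= np.linalg.norm(psi)
p_ideal = np.abs(psi) ** 2

e_U = np.sum(p_ideal ** 2)
print(f"D * e_U = {D * e_U:.3f}  (Porter-Thomas predicts about 2)")
```

A mixed state's diagonal is flatter than this, so $D \cdot e_U$ falls below 2 and the shortcut $F = D\langle p \rangle - 1$ would be mis-normalized.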

We replace $p_s$ with $p_{\text{DM}}(x) = \langle x | \rho | x \rangle$, the diagonal of the density matrix from the trusted DM simulator. We then compute $e_U = \sum_x p_{\text{DM}}(x)^2$ exactly from the density matrix, with no distributional assumption:

$$Z = \sum_x p_{\text{DM}}(x)^2$$

The resulting metric is:

$$F_{\text{noisy}} = \frac{\hat{m}_U - 1/D}{Z - 1/D} = \frac{\frac{1}{N}\sum_i p_{\text{DM}}(x_i) \;-\; 1/D}{Z \;-\; 1/D} \qquad \text{where } D = 2^n$$

- $F_{\text{noisy}} = 1.0$: the PTSBE samples reproduce the exact noisy distribution ($\hat{m}_U = Z$, i.e. the cross-correlation with the reference equals the reference's self-score).
- $F_{\text{noisy}} = 0.0$: the PTSBE samples are indistinguishable from uniform ($\hat{m}_U = 1/D$).
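These two reference points can be checked in the infinite-shot limit, where $\hat{m}_U$ becomes $\sum_x q(x)\,p_{\text{DM}}(x)$ for a sampler with distribution $q$. The sketch below uses a synthetic reference distribution, and `f_noisy_exact` is a name introduced here for illustration. It also shows that a sampler sharper than the reference scores above 1.

```python
import numpy as np

def f_noisy_exact(p_ref, q):
    """F_noisy in the infinite-shot limit when samples are drawn from q."""
    D = p_ref.size
    m = np.dot(q, p_ref)      # E_{x~q}[p_ref(x)]
    Z = np.dot(p_ref, p_ref)  # E_{x~p_ref}[p_ref(x)]
    return (m - 1.0 / D) / (Z - 1.0 / D)

rng = np.random.default_rng(1)
p_ref = rng.dirichlet(np.ones(16))           # synthetic stand-in for p_DM
uniform = np.full(16, 1.0 / 16)
sharper = p_ref ** 2 / np.sum(p_ref ** 2)    # over-concentrated sampler

print(f"matched : {f_noisy_exact(p_ref, p_ref):.4f}")
print(f"uniform : {f_noisy_exact(p_ref, uniform):.4f}")
print(f"sharper : {f_noisy_exact(p_ref, sharper):.4f}")
```

Because the metric is not bounded above by 1, values noticeably greater than 1 signal a mismatch just as values below 1 do.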

### Protocol

For a random circuit $C$ drawn from the family above:

1. **Reference**: Run $C$ with noise on `density-matrix-cpu`. Extract the diagonal of the final density matrix to get $p_{\text{DM}}(x)$ for all $x \in \{0,1\}^n$. Compute $Z = \sum_x p_{\text{DM}}(x)^2$.
2. **PTSBE**: Run the same noisy circuit via `cudaq.ptsbe.sample()` to produce $N$ samples $\{x_1, \ldots, x_N\}$.
3. **Score**: Compute $\hat{m}_U = \frac{1}{N}\sum_i p_{\text{DM}}(x_i)$ and $F_{\text{noisy}} = (\hat{m}_U - 1/D) / (Z - 1/D)$.
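The three steps can be rehearsed without any simulator. In the sketch below a synthetic distribution stands in for the DM diagonal and `rng.choice` stands in for the sampler; both are assumptions made only to show the arithmetic.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
D = 2 ** n

# Step 1 stand-in: synthetic reference distribution in place of the DM diagonal.
p_ref = rng.dirichlet(np.ones(D))
Z = np.sum(p_ref ** 2)

# Step 2 stand-in: draw N outcomes from the same distribution.
N = 200_000
samples = rng.choice(D, size=N, p=p_ref)

# Step 3: score each sample by its reference probability.
m_hat = p_ref[samples].mean()
F = (m_hat - 1.0 / D) / (Z - 1.0 / D)
print(f"F_noisy = {F:.4f}")
```

Since the sampler and the reference agree by construction, the printed value should sit close to 1, with a deviation set by the shot count.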

In [1]:
import os, sys
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count())

# TODO: Remove this sys.path override before pushing the PR.
sys.path.insert(0, os.path.expanduser(
    "~/Devel/cudaq-pstbe/vendor/cuda-quantum/build/python"))

import cudaq
import numpy as np
import time

print(f"cudaq loaded from: {cudaq.__path__[0]}")
assert hasattr(cudaq, "ptsbe"), "cudaq.ptsbe not found -- is the build current?"

DM_TARGET = "density-matrix-cpu"
try:
    cudaq.set_target("nvidia")
    PTSBE_TARGET = "nvidia"
except Exception:
    PTSBE_TARGET = "qpp-cpu"

cudaq.set_target(DM_TARGET)
cudaq.set_random_seed(42)
print(f"PTSBE target: {PTSBE_TARGET}")

cudaq loaded from: /home/talexander/Devel/cudaq-pstbe/vendor/cuda-quantum/build/python/cudaq
PTSBE target: nvidia


### Section A: Single-circuit walkthrough

We walk through the full protocol on one fixed circuit ($n=8$, $m=8$), introducing each utility function as we need it.

**Circuit construction.** `generate_angles` draws random $R_y$ rotation angles. `build_xeb_circuit` assembles cycles of $R_y$ rotations followed by a nearest-neighbor CNOT chain. Pass `with_mz=False` for `get_state()` (no measurement collapse) and `with_mz=True` for `sample()`.

In [2]:
def generate_angles(n, depth, rng):
    """Generate random rotation angles for an XEB circuit."""
    return rng.uniform(0, 2 * np.pi, size=n * depth).tolist()


def build_xeb_circuit(n, depth, angles, with_mz=True):
    """Build a random-rotation + CNOT-ladder circuit via the builder API."""
    kernel = cudaq.make_kernel()
    q = kernel.qalloc(n)
    idx = 0
    for d in range(depth):
        for i in range(n):
            kernel.ry(angles[idx], q[i])
            idx += 1
        for i in range(n - 1):
            kernel.cx(q[i], q[i + 1])
    if with_mz:
        kernel.mz(q)
    return kernel


rng = np.random.default_rng(0)
test_angles = generate_angles(n=3, depth=2, rng=rng)
test_kernel = build_xeb_circuit(3, 2, test_angles)
counts = cudaq.sample(test_kernel, shots_count=1000)
print(f"3-qubit, depth-2 circuit (no noise): {len(counts)} unique bitstrings from 1000 shots")
print(counts)

3-qubit, depth-2 circuit (no noise): 8 unique bitstrings from 1000 shots
{ 000:108 001:14 010:70 011:3 100:23 101:267 110:55 111:460 }



**Noise model.** PTSBE requires unitary-mixture noise channels, in which each Kraus operator is a unitary scaled by the square root of its probability; Pauli channels are one such case. `build_noise_model` applies symmetric Pauli noise on single-qubit gates ($X/Y/Z$ each at $p_1/3$), two-qubit gates (15-term Pauli at $p_2/15$ each), and bit-flip readout error ($p_{\text{meas}}$).
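The unitary-mixture property of the single-qubit channel can be verified directly in NumPy. This is an illustration of the math, not of how `cudaq.Pauli1` is implemented, and it assumes the identity term carries the remaining weight $1 - p_1$; `pauli1_kraus` is a hypothetical helper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

def pauli1_kraus(p1):
    """Probabilities and Kraus operators of a symmetric single-qubit Pauli channel."""
    probs = [1 - p1, p1 / 3, p1 / 3, p1 / 3]
    kraus = [np.sqrt(p) * P for p, P in zip(probs, (I2, X, Y, Z))]
    return probs, kraus

probs, kraus = pauli1_kraus(0.001)

# Completeness: sum_k K_k^dagger K_k = I.
completeness = sum(K.conj().T @ K for K in kraus)
assert np.allclose(completeness, I2)

# Unitary mixture: every Kraus operator is sqrt(p_k) times a unitary.
for p, K in zip(probs, kraus):
    U = K / np.sqrt(p)
    assert np.allclose(U.conj().T @ U, I2)
print("Pauli channel is a valid unitary mixture")
```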

In [3]:
def build_noise_model(p1=0.001, p2=0.01, p_meas=0.01):
    """Build a Pauli noise model with measurement readout error.

    PTSBE requires unitary-mixture channels. Pauli channels satisfy this:
    each Kraus operator is a Pauli unitary scaled by sqrt(p).
    """
    noise = cudaq.NoiseModel()
    noise.add_all_qubit_channel('ry', cudaq.Pauli1([p1 / 3] * 3))
    noise.add_all_qubit_channel('cx', cudaq.Pauli2([p2 / 15] * 15))
    noise.add_all_qubit_channel('mz', cudaq.BitFlipChannel(p_meas))
    return noise


noise_model = build_noise_model()
noisy_counts = cudaq.sample(test_kernel, noise_model=noise_model, shots_count=1000)
print(f"Same circuit with Pauli noise (p1=0.001, p2=0.01, p_meas=0.01):")
print(f"  {len(noisy_counts)} unique bitstrings from 1000 shots")
print(noisy_counts)

Same circuit with Pauli noise (p1=0.001, p2=0.01, p_meas=0.01):
  8 unique bitstrings from 1000 shots
{ 000:111 001:11 010:87 011:13 100:39 101:245 110:56 111:438 }



#### Density matrix reference

We build both kernel variants (with and without measurement) from the same angles up front, ensuring both simulators run the exact same circuit. `get_dm_diagonal` takes the measurement-free kernel, runs it on `density-matrix-cpu` with noise, and extracts the diagonal $p_{\text{DM}}(x) = \langle x|\rho|x\rangle$.

In [4]:
# Diagnostic: verify index mapping between DM diagonal and nvidia bitstrings.
# Prepare |100⟩ (X on qubit 0 only) and check which DM diagonal index is 1.0.
diag_k = cudaq.make_kernel()
diag_q = diag_k.qalloc(3)
diag_k.x(diag_q[0])

cudaq.set_target(DM_TARGET)
dm_state = np.array(cudaq.get_state(diag_k))
dm_diag = np.real(np.diag(dm_state))
dm_idx = int(np.argmax(dm_diag))
print(f"DM diagonal: |1,0,0> is at index {dm_idx} (binary {dm_idx:03b})")

cudaq.set_target(PTSBE_TARGET)
mz_k = cudaq.make_kernel()
mz_q = mz_k.qalloc(3)
mz_k.x(mz_q[0])
mz_k.mz(mz_q)
sample_bs = list(cudaq.sample(mz_k, shots_count=1).items())[0][0]
sample_idx = int(sample_bs, 2)
print(f"nvidia sample: |1,0,0> -> bitstring '{sample_bs}' = index {sample_idx}")
print(f"Indices match: {dm_idx == sample_idx}")
print(f"Reversed match: {dm_idx == int(sample_bs[::-1], 2)}")
print(f"Bit reversal {'IS' if dm_idx != sample_idx else 'is NOT'} needed")

DM diagonal: |1,0,0> is at index 1 (binary 001)
nvidia sample: |1,0,0> -> bitstring '100' = index 4
Indices match: False
Reversed match: True
Bit reversal IS needed
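The mismatch reported above can be reproduced with plain integers. In this sketch the DM index treats qubit 0 as the least significant bit, as the diagnostic output suggests, while the sampled bitstring lists qubit 0 first; the variable names are ours.

```python
import numpy as np

n = 3
lsb_index = 1                               # qubit 0 set, as an LSB-first index (0b001)
bitstring = f"{lsb_index:0{n}b}"[::-1]      # qubit 0 leftmost: '100'
msb_index = int(bitstring, 2)               # parsed the way sampled bitstrings are

# Bit-reversal permutation over all 2^n indices.
perm = np.array([int(f"{i:0{n}b}"[::-1], 2) for i in range(2 ** n)])

diag = np.zeros(2 ** n)
diag[lsb_index] = 1.0
reordered = diag[perm]                      # now indexable by int(bitstring, 2)

print(bitstring, msb_index, int(np.argmax(reordered)))
```

The permutation is its own inverse, so the same index array converts in either direction.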


In [5]:
def _bit_reverse_permutation(n):
    """Return the bit-reversal index array.

    Maps the DM diagonal (qubit 0 = least significant bit, per the
    diagnostic above) to bitstring order (qubit 0 leftmost).
    """
    D = 2 ** n
    return np.array([int(f"{i:0{n}b}"[::-1], 2) for i in range(D)])


def apply_readout_noise(probs, n, p_meas):
    """Apply per-qubit bit-flip measurement noise to a probability vector.

    For each qubit, independently flips the measurement outcome with
    probability p_meas.  This is the classical stochastic map
    [[1-p, p], [p, 1-p]] applied along each qubit axis.
    """
    if p_meas <= 0:
        return probs.copy()
    p = probs.reshape([2] * n).copy()
    for q in range(n):
        p0 = np.take(p, 0, axis=q)
        p1 = np.take(p, 1, axis=q)
        new_p0 = (1 - p_meas) * p0 + p_meas * p1
        new_p1 = p_meas * p0 + (1 - p_meas) * p1
        idx0 = [slice(None)] * n
        idx0[q] = 0
        idx1 = [slice(None)] * n
        idx1[q] = 1
        p[tuple(idx0)] = new_p0
        p[tuple(idx1)] = new_p1
    return p.flatten()


def get_dm_diagonal(kernel, noise_model, n, p_meas=0.0):
    """Run DM simulation with noise and return the diagonal probabilities.

    Per the diagnostic above, the density-matrix-cpu diagonal is indexed with
    qubit 0 as the least significant bit, while cudaq.sample() bitstrings put
    qubit 0 leftmost. We bit-reverse the indices so that p[int(bs, 2)] is the
    probability of bitstring bs.
    If p_meas > 0, per-qubit bit-flip readout noise is applied classically.
    """
    cudaq.set_target(DM_TARGET)
    cudaq.set_noise(noise_model)
    try:
        state = cudaq.get_state(kernel)
    finally:
        # Always clear the global noise model, even if the simulation fails.
        cudaq.unset_noise()
    dm = np.array(state)
    diag = np.real(np.diag(dm))
    probs = diag[_bit_reverse_permutation(n)]
    if p_meas > 0:
        probs = apply_readout_noise(probs, n, p_meas)
    return probs


n, depth = 8, 8
rng = np.random.default_rng(42)
angles = generate_angles(n, depth, rng)
noise_model = build_noise_model()
D = 2 ** n

kernel_no_mz = build_xeb_circuit(n, depth, angles, with_mz=False)
kernel_mz = build_xeb_circuit(n, depth, angles, with_mz=True)

p_dm = get_dm_diagonal(kernel_no_mz, noise_model, n, p_meas=0.01)
Z = np.sum(p_dm ** 2)
print(f"Circuit: n={n}, depth={depth}")
print(f"D = 2^{n} = {D}")
print(f"Z = sum p_DM(x)^2 = {Z:.6e}")
print(f"Uniform baseline 1/D = {1.0/D:.6e}")
print(f"Z - 1/D = {Z - 1.0/D:.6e}")
print(f"Top 5 probabilities: {sorted(p_dm, reverse=True)[:5]}")

Circuit: n=8, depth=8
D = 2^8 = 256
Z = sum p_DM(x)^2 = 6.766674e-03
Uniform baseline 1/D = 3.906250e-03
Z - 1/D = 2.860424e-03
Top 5 probabilities: [0.024827088555991517, 0.018399210799349003, 0.01797522610285463, 0.01662706379473549, 0.01576152828280933]


#### Noisy sampling

`run_noisy_sample` abstracts over the sampling backend. Use `method="standard"` for standard noisy sampling (`cudaq.sample` on `density-matrix-cpu`) and `method="ptsbe"` for PTSBE. This lets us A/B test the two methods on every experiment.

In [6]:
def run_noisy_sample(kernel, noise_model, shots, method="ptsbe",
                     max_traj=1000, seed=42):
    """Sample a noisy circuit using either standard DM simulation or PTSBE.

    method="standard" -- cudaq.sample() with noise on density-matrix-cpu
    method="ptsbe"    -- cudaq.ptsbe.sample() on PTSBE_TARGET

    Note: max_traj and seed are only used by method="ptsbe"; the standard
    path ignores both.
    """
    if method == "standard":
        cudaq.set_target(DM_TARGET)
        return cudaq.sample(kernel, noise_model=noise_model, shots_count=shots)
    elif method == "ptsbe":
        cudaq.set_target(PTSBE_TARGET)
        strategy = cudaq.ptsbe.ProbabilisticSamplingStrategy(seed=seed)
        return cudaq.ptsbe.sample(
            kernel, noise_model=noise_model, shots_count=shots,
            sampling_strategy=strategy, max_trajectories=max_traj,
        )
    else:
        raise ValueError(f"Unknown method: {method}")


N = 100_000
for method in ["standard", "ptsbe"]:
    result = run_noisy_sample(kernel_mz, noise_model, shots=N, method=method)
    top5 = sorted(result.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print(f"{method:>8s}: {len(result)} unique bitstrings, "
          f"top 5: {[(bs, c) for bs, c in top5]}")

standard: 256 unique bitstrings, top 5: [('00010101', 2445), ('11000111', 1796), ('10111000', 1715), ('00001001', 1700), ('11111000', 1578)]
   ptsbe: 256 unique bitstrings, top 5: [('00010101', 3223), ('11000111', 2289), ('10111000', 2252), ('00001001', 2066), ('01000010', 1902)]


#### Scoring

`compute_f_noisy` looks up $p_{\text{DM}}(x_i)$ for each sampled bitstring, computes the mean $\hat{m}_U$, and returns $F_{\text{noisy}} = (\hat{m}_U - 1/D) / (Z - 1/D)$. We score both `standard` (density-matrix noisy sampling via `cudaq.sample`) and `ptsbe` side-by-side. The standard result serves as a sanity check: both methods should produce $F_{\text{noisy}} \approx 1.0$.

In [7]:
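def compute_f_noisy(p_dm, result, n):
    """Compute noisy-reference XEB fidelity F_noisy.

    Returns (F_noisy, m_hat, Z). F_noisy is NaN when the reference is
    numerically uniform (Z == 1/D), where the metric is undefined.
    """
    D = 2 ** n
    N = result.get_total_shots()
    Z = np.sum(p_dm ** 2)
    # Mean reference probability of the sampled bitstrings (m_hat in the text).
    e = sum(count * p_dm[int(bs, 2)] for bs, count in result.items()) / N
    if np.isclose(Z, 1.0 / D, rtol=1e-9, atol=0.0):
        return float("nan"), e, Z
    f = (e - 1.0 / D) / (Z - 1.0 / D)
    return f, e, Z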


N = 100_000
print(f"A/B comparison (n={n}, depth={depth}, N={N} shots):\n")
for method in ["standard", "ptsbe"]:
    result = run_noisy_sample(kernel_mz, noise_model, shots=N, method=method)
    f, e, _ = compute_f_noisy(p_dm, result, n)
    top5 = sorted(result.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print(f"  {method:>8s}: F_noisy = {f:.4f}, E = {e:.6e}")
    print(f"            top 5: {[(bs, c) for bs, c in top5]}")

A/B comparison (n=8, depth=8, N=100000 shots):

  standard: F_noisy = 1.0041, E = 6.778538e-03
            top 5: [('00010101', 2439), ('10111000', 1839), ('11000111', 1834), ('00001001', 1595), ('11111000', 1590)]
     ptsbe: F_noisy = 1.2951, E = 7.610871e-03
            top 5: [('00010101', 3117), ('11000111', 2314), ('10111000', 2289), ('00001001', 1997), ('11111000', 1923)]


#### Trajectory truncation bias

PTSBE generates a finite number of Kraus trajectories (`max_trajectories`) from an exponentially large trajectory space. The proportional shot allocation distributes all requested shots across only the selected trajectories, renormalized by their combined probability mass $P_S = \sum_{k \in S} p_k$. When $P_S < 1$, the selected (predominantly low-error) trajectories receive more weight than they should, making the output distribution sharper than the true density matrix. This drives $F_{\text{noisy}} > 1$.
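A toy model makes the direction of this bias explicit. Suppose the kept trajectories produce a distribution `sharp` and the discarded mass `eps` would have contributed a uniform distribution. Real discarded trajectories are not exactly uniform, so this is only a sketch under that assumption. Renormalizing over the kept set then gives $F_{\text{noisy}} = 1/(1 - \varepsilon) = 1/P_S$.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 64
eps = 0.2                                   # probability mass of discarded trajectories

sharp = rng.dirichlet(np.ones(D))           # stand-in: output of the kept trajectories
uniform = np.full(D, 1.0 / D)               # stand-in: output of the discarded ones

p_true = (1 - eps) * sharp + eps * uniform  # full mixture (the DM reference)
q_trunc = sharp                             # kept trajectories after renormalization

Z = np.dot(p_true, p_true)
F = (np.dot(q_trunc, p_true) - 1.0 / D) / (Z - 1.0 / D)
print(f"F_noisy = {F:.4f}, 1/(1-eps) = {1.0 / (1.0 - eps):.4f}")
```

In this model the excess over 1 depends only on the captured mass, not on the shape of `sharp`.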

The diagnostic below sweeps `max_trajectories` and reports the captured probability mass $P_S$ alongside $F_{\text{noisy}}$; if truncation is the cause, $F_{\text{noisy}}$ should approach 1.0 as $P_S$ approaches 1.

In [8]:
# Diagnostic: trajectory truncation bias.
# When max_trajectories is much smaller than the total trajectory space,
# the proportional shot allocation over-weights the selected (low-error)
# trajectories.  Increasing max_trajectories should bring F_noisy → 1.0.
p_dm_ref = get_dm_diagonal(kernel_no_mz, noise_model, n, p_meas=0.01)

print(f"Trajectory truncation diagnostic (n={n}, depth={depth}, N={N})\n")
print(f"{'max_traj':>10s}  {'n_traj':>8s}  {'P_captured':>10s}  {'F_noisy':>8s}")
for mt in [50, 200, 500, 1000, 2000, 5000]:
    cudaq.set_target(PTSBE_TARGET)
    strategy = cudaq.ptsbe.ProbabilisticSamplingStrategy(seed=42)
    res = cudaq.ptsbe.sample(
        kernel_mz, noise_model=noise_model, shots_count=N,
        sampling_strategy=strategy, max_trajectories=mt,
        return_execution_data=True,
    )
    trajs = res.ptsbe_execution_data.trajectories
    p_captured = sum(t.probability for t in trajs)
    n_traj = len(trajs)
    f, _, _ = compute_f_noisy(p_dm_ref, res, n)
    print(f"{mt:>10d}  {n_traj:>8d}  {p_captured:>10.6f}  {f:>8.4f}")

p_meas=0.0 (readout noise excluded in reference):
  standard: F_noisy = 0.9969
     ptsbe: F_noisy = 1.2528
p_meas=0.01 (readout noise included in reference):
  standard: F_noisy = 0.9989
     ptsbe: F_noisy = 1.3024


### Section B: Negative control

This section demonstrates that $F_{\text{noisy}}$ has discriminating power. Two conditions are compared, each sampled with both methods against the same matched DM reference:
- **Matched**: the sampler uses the same noise model as the DM reference ($p_1=0.001$, $p_2=0.01$, $p_{\text{meas}}=0.01$).
- **Mismatched**: the sampler uses much stronger noise ($p_1=0.05$, $p_2=0.15$, $p_{\text{meas}}=0.10$), driving the output distribution toward uniform.

The mismatched case should score well below 1.0, showing that the metric detects distribution differences.

In [9]:
n, depth = 8, 8
rng = np.random.default_rng(100)
angles = generate_angles(n, depth, rng)
kernel_no_mz = build_xeb_circuit(n, depth, angles, with_mz=False)
kernel_mz = build_xeb_circuit(n, depth, angles, with_mz=True)

noise_matched = build_noise_model()
noise_mismatched = build_noise_model(p1=0.05, p2=0.15, p_meas=0.10)

p_dm = get_dm_diagonal(kernel_no_mz, noise_matched, n, p_meas=0.01)
N = 100_000

print("Negative control (n=8, depth=8)\n")
for label, noise in [("matched", noise_matched), ("mismatched", noise_mismatched)]:
    print(f"  {label}:")
    for method in ["standard", "ptsbe"]:
        result = run_noisy_sample(kernel_mz, noise, shots=N, method=method)
        f, _, _ = compute_f_noisy(p_dm, result, n)
        print(f"    {method:>8s}: F_noisy = {f:.4f}")


### Section C: Width/depth correctness sweep

$F_{\text{noisy}}$ vs depth for each width. All values should cluster near 1.0. This is the primary correctness demonstration.

The CPU density-matrix simulator stores a $2^n \times 2^n$ complex matrix, so practical limits are $n = 10$-$12$. The full test suite (in CI) should use $n$ up to 24 with GPU-backed simulation.
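The memory cost behind this limit is easy to tabulate. The sketch below assumes double-precision complex entries (16 bytes each) and counts only the matrix itself, not simulator workspace; `density_matrix_bytes` is a helper introduced here.

```python
def density_matrix_bytes(n_qubits, bytes_per_entry=16):
    """Bytes needed to store a dense 2^n x 2^n complex matrix."""
    return bytes_per_entry * 4 ** n_qubits

for n in (8, 10, 12, 14):
    mib = density_matrix_bytes(n) / 2 ** 20
    print(f"n={n:2d}: {mib:,.0f} MiB")
```

Each additional qubit multiplies the storage by four, and the cost of applying gates and noise channels grows along with it.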

In [None]:
widths = [4, 6, 8, 10]
depths = [2, 4, 8]
n_instances = 5
N = 50_000
methods = ["standard", "ptsbe"]

sweep_results = {m: {} for m in methods}
rng = np.random.default_rng(42)

for n in widths:
    noise_model = build_noise_model()
    for depth in depths:
        f_by_method = {m: [] for m in methods}
        for inst in range(n_instances):
            angles = generate_angles(n, depth, rng)
            k_no_mz = build_xeb_circuit(n, depth, angles, with_mz=False)
            k_mz = build_xeb_circuit(n, depth, angles, with_mz=True)
            p_dm = get_dm_diagonal(k_no_mz, noise_model, n, p_meas=0.01)
            for m in methods:
                result = run_noisy_sample(k_mz, noise_model,
                                          shots=N, max_traj=500, method=m)
                f, _, _ = compute_f_noisy(p_dm, result, n)
                f_by_method[m].append(f)
        parts = []
        for m in methods:
            mean_f = np.mean(f_by_method[m])
            std_f = np.std(f_by_method[m])
            sweep_results[m][(n, depth)] = (mean_f, std_f)
            parts.append(f"{m}={mean_f:.4f}+/-{std_f:.4f}")
        print(f"n={n:2d}, depth={depth:2d}:  {',  '.join(parts)}")

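for m in methods:
    # Two-sided check: truncation bias can push F_noisy above 1, so a
    # one-sided F >= 0.95 threshold would let over-sharpened output pass.
    ok = all(abs(v[0] - 1.0) <= 0.05 for v in sweep_results[m].values())
    print(f"\n{m}: all (n,depth) pass |F - 1| <= 0.05: {ok}")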

### Section D: Trajectory convergence

$F_{\text{noisy}}$ vs number of trajectories for a fixed circuit and fixed total shots. As trajectories increase, $F_{\text{noisy}}$ should converge toward 1.0. This shows how many trajectories are needed for a given accuracy.

In [None]:
n, depth = 8, 8
N = 100_000
rng = np.random.default_rng(200)
angles = generate_angles(n, depth, rng)
kernel_no_mz = build_xeb_circuit(n, depth, angles, with_mz=False)
kernel_mz = build_xeb_circuit(n, depth, angles, with_mz=True)
noise_model = build_noise_model()
p_dm = get_dm_diagonal(kernel_no_mz, noise_model, n, p_meas=0.01)

traj_counts = [10, 25, 50, 100, 250, 500, 1000]

# Standard DM baseline (no trajectory parameter)
result_std = run_noisy_sample(kernel_mz, noise_model, shots=N, method="standard")
f_std, _, _ = compute_f_noisy(p_dm, result_std, n)

print(f"Trajectory convergence (n={n}, depth={depth}, N={N})")
print(f"  {'standard':>20s}: F_noisy = {f_std:.4f}")
for mt in traj_counts:
    result = run_noisy_sample(kernel_mz, noise_model,
                              shots=N, max_traj=mt, method="ptsbe")
    f, _, _ = compute_f_noisy(p_dm, result, n)
    print(f"  ptsbe (traj={mt:5d}): F_noisy = {f:.4f}")

### Section E: Shot convergence

$F_{\text{noisy}}$ vs number of shots $N$ for a fixed circuit and fixed trajectories. Shows how many samples are needed for the metric itself to stabilize. Error bars from multiple runs at each $N$.
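The expected spread at each $N$ can be predicted from the reference alone. For a sampler that matches the reference, a single score $p_{\text{DM}}(X)$ has variance $\sum_x p_{\text{DM}}(x)^3 - Z^2$, so the standard error of $F_{\text{noisy}}$ follows by dividing by $\sqrt{N}$ and by the normalization. The sketch below assumes independent shots, which may not hold exactly for trajectory-batched sampling; `f_noisy_std_error` is a name introduced here.

```python
import numpy as np

def f_noisy_std_error(p_ref, shots):
    """Predicted standard error of F_noisy for i.i.d. samples drawn from p_ref."""
    D = p_ref.size
    Z = np.sum(p_ref ** 2)
    var_single = np.sum(p_ref ** 3) - Z ** 2   # Var[p_ref(X)] for X ~ p_ref
    return np.sqrt(var_single / shots) / (Z - 1.0 / D)

p_ref = np.array([0.5, 0.25, 0.125, 0.125])
for shots in (1_000, 10_000, 100_000):
    print(f"N={shots:>7d}: predicted std error = {f_noisy_std_error(p_ref, shots):.4f}")
```

Comparing the measured spread across repeats with this prediction separates ordinary shot noise from a systematic offset.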

In [None]:
n, depth = 8, 8
max_traj = 500
rng = np.random.default_rng(300)
angles = generate_angles(n, depth, rng)
kernel_no_mz = build_xeb_circuit(n, depth, angles, with_mz=False)
kernel_mz = build_xeb_circuit(n, depth, angles, with_mz=True)
noise_model = build_noise_model()
p_dm = get_dm_diagonal(kernel_no_mz, noise_model, n, p_meas=0.01)

shot_counts = [1_000, 5_000, 10_000, 50_000, 100_000, 500_000]
n_repeats = 3

print(f"Shot convergence (n={n}, depth={depth}, max_trajectories={max_traj})")
for N in shot_counts:
    parts = []
    for method in ["standard", "ptsbe"]:
        f_values = []
        for rep in range(n_repeats):
            result = run_noisy_sample(kernel_mz, noise_model,
                                      shots=N, max_traj=max_traj,
                                      seed=400 + rep, method=method)
            f, _, _ = compute_f_noisy(p_dm, result, n)
            f_values.append(f)
        parts.append(f"{method}={np.mean(f_values):.4f}+/-{np.std(f_values):.4f}")
    print(f"  N={N:>8d}:  {',  '.join(parts)}")

### Section F: Noise strength sweep

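$F_{\text{noisy}}$ vs noise strength for a fixed circuit. The sweep is parameterized by the single-qubit error rate $p_1$, with $p_2 = 10\,p_1$ and $p_{\text{meas}} = p_2$. Both simulators use the same noise at each rate. $F_{\text{noisy}}$ should stay near 1.0 across all rates, confirming the metric is not regime-dependent. Note that at the largest rate ($p_1 = 0.05$) the readout flip probability reaches $p_{\text{meas}} = 0.5$, which makes the reference distribution uniform, so $Z = 1/D$ and $F_{\text{noisy}}$ is undefined at that point.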

In [None]:
n, depth = 8, 6
N = 100_000
rng = np.random.default_rng(500)
angles = generate_angles(n, depth, rng)
kernel_no_mz = build_xeb_circuit(n, depth, angles, with_mz=False)
kernel_mz = build_xeb_circuit(n, depth, angles, with_mz=True)

p1_values = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05]

print(f"Noise strength sweep (n={n}, depth={depth}, N={N})")
for p1 in p1_values:
    p2 = p1 * 10
    p_meas = p2
    noise_model = build_noise_model(p1=p1, p2=p2, p_meas=p_meas)
    p_dm = get_dm_diagonal(kernel_no_mz, noise_model, n, p_meas=p_meas)
    parts = []
    for method in ["standard", "ptsbe"]:
        result = run_noisy_sample(kernel_mz, noise_model,
                                  shots=N, max_traj=1000, method=method)
        f, _, Z = compute_f_noisy(p_dm, result, n)
        parts.append(f"{method}={f:.4f}")
    print(f"  p1={p1:.4f}:  {',  '.join(parts)}  (Z={Z:.6e})")

### Section G: Performance comparison

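Wall-clock time vs number of shots for a fixed circuit. Two methods: standard noisy sampling (`cudaq.sample` on `density-matrix-cpu`) and PTSBE sampling (`cudaq.ptsbe.sample`). PTSBE amortizes the cost of noise simulation across many shots, so its advantage is expected to grow with shot count. The two methods may also run on different backends (`density-matrix-cpu` vs `PTSBE_TARGET`), so the timings reflect hardware as well as algorithmic differences.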

In [None]:
n, depth = 10, 8
rng = np.random.default_rng(600)
angles = generate_angles(n, depth, rng)
noise_model = build_noise_model()
kernel_mz = build_xeb_circuit(n, depth, angles, with_mz=True)

shot_counts = [1_000, 10_000, 100_000, 1_000_000]

print(f"Performance comparison (n={n}, depth={depth})")
print(f"{'Shots':>10s}  {'Std (s)':>10s}  {'PTSBE (s)':>10s}  {'Speedup':>8s}")
for N in shot_counts:
    t0 = time.perf_counter()
    run_noisy_sample(kernel_mz, noise_model, shots=N, method="standard")
    t_std = time.perf_counter() - t0

    t0 = time.perf_counter()
    run_noisy_sample(kernel_mz, noise_model, shots=N, max_traj=500, method="ptsbe")
    t_ptsbe = time.perf_counter() - t0

    speedup = t_std / t_ptsbe if t_ptsbe > 0 else float('inf')
    print(f"{N:>10d}  {t_std:>10.3f}  {t_ptsbe:>10.3f}  {speedup:>7.1f}x")

### Section H: $F_{\text{noisy}}$ distribution

Summary statistics of $F_{\text{noisy}}$ over the per-$(n, \text{depth})$ means stored in `sweep_results` by Section C (each mean averages `n_instances` random circuits). A correct implementation produces values tightly concentrated near 1.0.

In [None]:
print("F_noisy distribution across per-(n, depth) means from Section C:\n")
for method in ["standard", "ptsbe"]:
    # sweep_results stores (mean, std) per (n, depth); summarize the means.
    vals = [v[0] for v in sweep_results[method].values()]
    f_arr = np.array(vals)
    print(f"  {method}:")
    print(f"    Mean:   {f_arr.mean():.4f}")
    print(f"    Std:    {f_arr.std():.4f}")
    print(f"    Min:    {f_arr.min():.4f}")
    print(f"    Max:    {f_arr.max():.4f}")
    # Two-sided: values well above 1.0 are as suspect as values below it.
    print(f"    Fraction within 0.05 of 1.0: {np.mean(np.abs(f_arr - 1.0) <= 0.05):.1%}")