## PTSBE end-to-end workflow

PTSBE (Pre-Trajectory Sampling with Batch Execution) is a method for sampling from noisy quantum circuits efficiently. Instead of simulating the full density matrix and sampling once per shot, PTSBE:

1. Traces the kernel to get the gate sequence and qubit layout.
2. Extracts noise sites by matching the noise model to the trace (each noisy gate becomes a *noise site* with a set of Kraus outcomes, e.g. $I$, $X$, $Y$, $Z$ for depolarization).
3. Generates trajectories: each trajectory is one possible *realization* of noise (one Kraus outcome per site). A *sampling strategy* decides which trajectories to use (e.g. by sampling trajectories in proportion to their likelihood).
4. Allocates shots across trajectories (e.g. by error likelihood).
5. Runs batches — for each trajectory, the circuit is run as a noiseless circuit with that trajectory's outcomes applied; results are collected.
6. Aggregates all per-trajectory counts into a single `SampleResult`.

Given enough trajectories and shots, PTSBE is equivalent to standard trajectory-based sampling up to sampling noise. It has the additional advantages of batching many shots per trajectory and of controlling simulation cost through the number of trajectories. This notebook runs the full workflow with a single API call: `cudaq.ptsbe.sample()`.
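
To make steps 3 and 4 concrete, here is a minimal plain-Python sketch that does not use CUDA-Q. It assumes two hypothetical single-qubit depolarizing noise sites whose outcomes ($I$, $X$, $Y$, $Z$) occur with probabilities $1-p$, $p/3$, $p/3$, $p/3$. The helper names `depolarizing_site` and `sample_trajectories` are illustrative only and are not part of the `cudaq.ptsbe` API. The sketch draws one Kraus outcome per site for every sample and counts how often each distinct trajectory appears.

```python
import random
from collections import Counter


def depolarizing_site(p):
    """Outcome probabilities for one noise site: index 0 = I (no error), 1..3 = X, Y, Z.

    Assumption: each Pauli error occurs with probability p / 3.
    """
    return [1.0 - p, p / 3, p / 3, p / 3]


def sample_trajectories(sites, n_samples, seed=0):
    """Draw n_samples trajectories (one outcome index per site) and count duplicates."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        trajectory = tuple(
            rng.choices(range(len(probs)), weights=probs)[0] for probs in sites
        )
        counts[trajectory] += 1
    return counts


# Two hypothetical noise sites, using the error rates that appear later in this notebook.
sites = [depolarizing_site(0.05), depolarizing_site(0.03)]
trajectory_counts = sample_trajectories(sites, n_samples=10_000, seed=42)

print(f"Samples drawn: {sum(trajectory_counts.values())}")
print(f"Unique trajectories: {len(trajectory_counts)}")
for trajectory, count in trajectory_counts.most_common(3):
    print(f"  outcomes={trajectory}, samples={count}")
```

Each unique trajectory only needs to be simulated once, and its sample count can serve as the number of shots drawn from that simulation. This is the batching idea behind step 5.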

### Set up the environment

All examples in this notebook use a statevector simulator; the cell below selects the `nvidia` target, which requires a supported GPU. Set a random seed for reproducibility.


In [1]:
import sys, os
_root = os.path.join(os.path.expanduser("~/Devel/cudaq-pstbe/vendor/cuda-quantum"), "build")
sys.path.insert(0, os.path.join(_root, "python"))
_build_lib = os.path.join(_root, "lib")
os.environ["LD_LIBRARY_PATH"] = _build_lib + ":" + os.environ.get("LD_LIBRARY_PATH", "")


In [4]:
import cudaq

cudaq.set_target("nvidia")
cudaq.set_random_seed(42)


### Define the circuit and noise model

Define a kernel and attach a noise model. Each gate you add to the noise model becomes a noise site wherever that gate appears in the circuit. In this example, single-qubit gates (here $H$) get a one-qubit `DepolarizationChannel`, and $CNOT$ gets `Depolarization2` (the two-qubit depolarization channel), registered on `x` with `num_controls=1` so that the channel acts on the **qubit pair** `[control, target]`. We use a small Bell-style circuit to demonstrate the workflow.

In [None]:
depol_1q = 0.05
depol_2q = 0.03

@cudaq.kernel
def bell_with_noise():
    q = cudaq.qvector(2)
    h(q[0])
    x.ctrl(q[0], q[1])
    mz(q)

noise = cudaq.NoiseModel()
noise.add_all_qubit_channel("h", cudaq.DepolarizationChannel(depol_1q))
noise.add_all_qubit_channel("x", cudaq.Depolarization2(depol_2q), num_controls=1)

#### Inline noise with `apply_noise`

Instead of (or in addition to) attaching noise via a `NoiseModel`, you can place noise at specific points in the kernel with `cudaq.apply_noise`. PTSBE traces these calls and includes them as noise sites alongside any model-attached noise.

In [None]:
@cudaq.kernel
def bell_inline_noise():
    q = cudaq.qvector(2)
    h(q[0])
    cudaq.apply_noise(cudaq.DepolarizationChannel, depol_1q, q[0])
    x.ctrl(q[0], q[1])
    cudaq.apply_noise(cudaq.Depolarization2, depol_2q, q[0], q[1])
    mz(q)

result_inline = cudaq.ptsbe.sample(
    bell_inline_noise,
    shots_count=10000,
)
print("Inline apply_noise result:")
print(result_inline)

### Run PTSBE sampling

Call `cudaq.ptsbe.sample()` with the kernel, noise model, and shot count. The optional arguments are:

- `sampling_strategy`: how trajectories are chosen.
  - `ProbabilisticSamplingStrategy(seed=...)` (default): perform Monte Carlo sampling over trajectories.
  - `ExhaustiveSamplingStrategy()`: use all possible trajectories (every combination of Kraus outcomes across the noise sites).
  - `OrderedSamplingStrategy()`: select the top-$k$ trajectories by probability (highest first), up to `max_trajectories`.
- `shot_allocation`: how shots are split across the chosen trajectories.
  - `PROPORTIONAL` (default): allocate shots in proportion to each sampled trajectory's weighting.
  - `UNIFORM`: give each trajectory the same number of shots.
  - `LOW_WEIGHT_BIAS`: bias shots toward low-weight trajectories (those with fewer errors); optional `bias_strength` (default 2.0).
  - `HIGH_WEIGHT_BIAS`: bias shots toward high-weight trajectories (those with more errors); optional `bias_strength` (default 2.0).
- `max_trajectories`: cap the number of trajectories, which bounds the number of circuit simulations when the shot count is large.
- `return_execution_data`: if `True`, the result includes trace instructions and per-trajectory data (`result.ptsbe_execution_data`). This API is experimental and subject to change in future releases. See "Inspecting trajectories with execution data" below.
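
The difference between `PROPORTIONAL` and `UNIFORM` allocation can be illustrated with a short plain-Python sketch. It is not CUDA-Q's implementation: the function names and the largest-remainder rounding rule are assumptions chosen so that the allocated shots always sum to the requested total. The bias strategies are left out because their exact formulas are not given in this notebook.

```python
def allocate_proportional(weights, shots):
    """Split `shots` in proportion to `weights` using largest-remainder rounding."""
    total = sum(weights)
    exact = [shots * w / total for w in weights]
    allocation = [int(x) for x in exact]  # round each share down first
    # Hand the leftover shots to the entries with the largest fractional parts.
    leftover = shots - sum(allocation)
    by_remainder = sorted(
        range(len(weights)), key=lambda i: exact[i] - allocation[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        allocation[i] += 1
    return allocation


def allocate_uniform(n_trajectories, shots):
    """Give every trajectory the same share (up to rounding)."""
    return allocate_proportional([1.0] * n_trajectories, shots)


# Hypothetical trajectory weights, for illustration only.
weights = [0.90, 0.06, 0.03, 0.01]
proportional = allocate_proportional(weights, 1000)
uniform = allocate_uniform(len(weights), 1000)

print("proportional:", proportional)
print("uniform:     ", uniform)
```

Proportional allocation reproduces the trajectory distribution directly, while uniform allocation spends the same effort on rare trajectories as on common ones.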

In [None]:
shots = 1000000

strategy = cudaq.ptsbe.ProbabilisticSamplingStrategy(seed=42)
result = cudaq.ptsbe.sample(
    bell_with_noise,
    noise_model=noise,
    shots_count=shots,
    sampling_strategy=strategy,
)

print("PTSBE sample result:")
print(result)
print(f"Total shots: {result.get_total_shots()}")

#### Compare with standard noisy sampling

To verify that PTSBE matches the usual noisy simulation, run standard `cudaq.sample()` with the same kernel and noise model on the statevector simulator. Standard noisy sampling runs one trajectory per shot. With enough shots, the two outcome distributions should be close.

In [None]:
result_standard = cudaq.sample(bell_with_noise, noise_model=noise, shots_count=shots)

print("Standard noisy sample result:")
print(result_standard)
print(f"Total shots: {result_standard.get_total_shots()}")
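
"Close" can be quantified with the total variation distance (TVD) between the two empirical distributions. The sketch below is a plain-Python helper that works on dictionaries mapping bitstrings to counts; `total_variation_distance` is a hypothetical name, and the example counts are made up for illustration rather than taken from this notebook.

```python
def total_variation_distance(counts_a, counts_b):
    """Half the L1 distance between two empirical distributions given as count dicts."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    outcomes = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a.get(o, 0) / total_a - counts_b.get(o, 0) / total_b)
        for o in outcomes
    )


# Made-up counts for illustration only; these are not results from this notebook.
example_ptsbe = {"00": 480, "11": 470, "01": 25, "10": 25}
example_standard = {"00": 470, "11": 480, "01": 30, "10": 20}

tvd = total_variation_distance(example_ptsbe, example_standard)
print(f"TVD: {tvd:.4f}")
```

In the notebook, the two inputs could be built with `dict(result.items())` and `dict(result_standard.items())`, assuming `SampleResult.items()` yields `(bitstring, count)` pairs. A TVD of 0 means identical distributions; with finite shots a small non-zero value is expected from sampling noise alone.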

In [None]:
n_qubits = 12

@cudaq.kernel
def ghz(n: int):
    q = cudaq.qvector(n)
    h(q[0])
    for i in range(1, n):
        x.ctrl(q[i - 1], q[i])
    mz(q)

ghz_depol_1q = 0.01
ghz_depol_2q = 0.01

ghz_noise = cudaq.NoiseModel()
ghz_noise.add_all_qubit_channel("h", cudaq.DepolarizationChannel(ghz_depol_1q))
ghz_noise.add_all_qubit_channel("x", cudaq.Depolarization2(ghz_depol_2q), num_controls=1)

#### Timing: PTSBE vs standard noisy sampling

PTSBE batches many shots per unique trajectory, so the number of circuit simulations scales with the number of *unique trajectories* rather than the number of shots. Standard noisy sampling on the statevector simulator runs one trajectory per shot, requiring a full statevector simulation for each.

To show the difference we use a larger circuit: a 12-qubit GHZ state with depolarization on every gate. This gives 12 noise sites (1 H + 11 CNOTs) with a huge trajectory space. PTSBE caps trajectories with `max_trajectories` and batches 10M shots across just those trajectories.
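
The size of the trajectory space and the number of statevector simulations each approach needs can be worked out by hand. The sketch below does the arithmetic, assuming 4 Kraus outcomes per single-qubit depolarizing site and 16 per two-qubit site; the variable names are illustrative and deliberately differ from those used in the notebook cells.

```python
num_qubits = 12
single_qubit_sites = 1             # the H gate
two_qubit_sites = num_qubits - 1   # the chain of CNOTs
outcomes_1q = 4                    # I, X, Y, Z
outcomes_2q = 16                   # assumed: all two-qubit Pauli products

# Every combination of one outcome per noise site is a distinct trajectory.
trajectory_space = outcomes_1q**single_qubit_sites * outcomes_2q**two_qubit_sites

num_shots = 10_000_000
trajectory_cap = 256
standard_simulations = num_shots                            # one trajectory per shot
ptsbe_simulations = min(trajectory_cap, trajectory_space)   # at most one per unique trajectory

print(f"Possible trajectories:                    {trajectory_space:,}")
print(f"Statevector simulations, standard:        {standard_simulations:,}")
print(f"Statevector simulations, PTSBE (at most): {ptsbe_simulations:,}")
```

This counts circuit simulations only. Measured wall-clock speedup also depends on the cost of drawing shots from each simulated state, so it will generally differ from the ratio of these two numbers.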

In [None]:
import time

timing_shots = 10_000_000
max_traj = 256

# PTSBE (batches shots across unique trajectories)
cudaq.set_random_seed(42)
t0 = time.perf_counter()
result_ptsbe = cudaq.ptsbe.sample(
    ghz,
    n_qubits,
    noise_model=ghz_noise,
    shots_count=timing_shots,
    max_trajectories=max_traj,
)
t_ptsbe = time.perf_counter() - t0

# Standard noisy sampling (one trajectory per shot)
cudaq.set_random_seed(42)
t0 = time.perf_counter()
result_std = cudaq.sample(ghz, n_qubits, noise_model=ghz_noise, shots_count=timing_shots)
t_std = time.perf_counter() - t0

print(f"Circuit: {n_qubits}-qubit GHZ ({n_qubits} noise sites)")
print(f"Shots: {timing_shots:,}, PTSBE max_trajectories: {max_traj}")
print(f"  PTSBE:              {t_ptsbe:.3f}s")
print(f"  Standard noisy:     {t_std:.3f}s")
print(f"  Speedup:            {t_std / t_ptsbe:.1f}x")

### Inspecting trajectories with execution data

Pass `return_execution_data=True` to get the PTSBE execution data via `result.ptsbe_execution_data`. This reveals *why* PTSBE is efficient: most probability mass concentrates on a few low-error trajectories, so a small number of circuit simulations captures the bulk of the physics. The highest-probability trajectory (no errors at any noise site) typically receives the vast majority of shots, while multi-error trajectories are exponentially suppressed.

Note: this is an experimental API and may change in future releases.
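
The concentration of probability mass on low-error trajectories can be estimated before running anything. The sketch below assumes that all 12 noise sites fail independently with the same total error probability of 0.01, so the mass carried by trajectories with exactly $k$ errors follows a binomial distribution. `mass_by_error_count` is a hypothetical helper, not a CUDA-Q function.

```python
from math import comb


def mass_by_error_count(n_sites, p_error):
    """Probability that exactly k of n_sites independent sites select an error, for k = 0..n_sites."""
    return [
        comb(n_sites, k) * p_error**k * (1 - p_error) ** (n_sites - k)
        for k in range(n_sites + 1)
    ]


masses = mass_by_error_count(n_sites=12, p_error=0.01)

cumulative_mass = 0.0
for k, mass in enumerate(masses[:4]):
    cumulative_mass += mass
    print(f"{k} errors: mass={mass:.6f}, cumulative={cumulative_mass:.6f}")
```

Under these assumptions, trajectories with at most one error already account for more than 99% of the probability mass, which is why a modest `max_trajectories` can cover most of the distribution.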

In [None]:
from collections import Counter

exec_shots = 1_000_000
result_with_data = cudaq.ptsbe.sample(
    ghz,
    n_qubits,
    noise_model=ghz_noise,
    shots_count=exec_shots,
    max_trajectories=max_traj,
    return_execution_data=True,
)
assert result_with_data.has_execution_data()
data = result_with_data.ptsbe_execution_data

trajs = sorted(data.trajectories, key=lambda t: t.probability, reverse=True)
print(f"Trajectories: {len(trajs)}, Total shots: {exec_shots:,}\n")

# Top 5 trajectories by probability
print("Top 5 trajectories (highest probability):")
cumulative = 0
for rank, t in enumerate(trajs[:5], 1):
    cumulative += t.num_shots
    n_errors = sum(1 for s in t.kraus_selections if s.is_error)
    print(f"  #{rank}: prob={t.probability:.6f}, shots={t.num_shots:,}, "
          f"errors={n_errors}, cumulative shots={100 * cumulative / exec_shots:.1f}%")

# Lowest-probability trajectory
lowest = trajs[-1]
n_errors_low = sum(1 for s in lowest.kraus_selections if s.is_error)
print(f"\nLowest-probability trajectory:")
print(f"  prob={lowest.probability:.2e}, shots={lowest.num_shots}, errors={n_errors_low}")

noise_instructions = {i: inst for i, inst in enumerate(data.instructions)
                      if inst.type == cudaq.ptsbe.TraceInstructionType.Noise}

def fmt_selection(sel):
    channel = noise_instructions[sel.circuit_location]
    label = "error" if sel.is_error else "no-error"
    return (f"    site {sel.circuit_location} [{channel.name} on q{channel.targets}]: "
            f"K{sel.kraus_operator_index} ({label})")

highest = trajs[0]
print(f"\nHighest-probability trajectory (id={highest.trajectory_id}):")
print(f"  prob={highest.probability:.6f}, shots={highest.num_shots:,}")
print(f"  Kraus selections:")
for sel in highest.kraus_selections:
    print(fmt_selection(sel))

print(f"\nLowest-probability trajectory (id={lowest.trajectory_id}):")
print(f"  prob={lowest.probability:.2e}, shots={lowest.num_shots}")
print(f"  Kraus selections:")
for sel in lowest.kraus_selections:
    print(fmt_selection(sel))

# Error count histogram: trajectories and total shots per error count
traj_counts = Counter()
shot_counts = Counter()
for t in trajs:
    # Count the error selections once per trajectory and reuse the result
    n_err = sum(1 for s in t.kraus_selections if s.is_error)
    traj_counts[n_err] += 1
    shot_counts[n_err] += t.num_shots

print("\nTrajectories grouped by error count:")
for n_err in sorted(traj_counts):
    total = shot_counts[n_err]
    print(f"  {n_err} errors: {traj_counts[n_err]} trajectories, "
          f"{total:,} shots ({100 * total / exec_shots:.1f}%)")