A focused collection of multi-threaded Python benchmarks designed to
characterize free-threaded Python (no-GIL) performance. 24 workloads
covering every threading synchronization primitive: Lock, RLock,
Event, Condition, Semaphore, BoundedSemaphore, Barrier, plus
queue.Queue / queue.PriorityQueue
(layered on Condition + Lock).
Each workload runs in both parallel and same-total-work serial
mode; the driver reports speedup = serial / parallel.
- run_bench.py: CLI driver that runs workloads and prints metrics to stdout
- workloads/: 24 benchmark scripts, shared helpers, and README.md
- tests/: correctness, determinism, and driver unit tests
Per-workload algorithm references live in workloads/README.md, grouped by Python sync primitive. Contributed benchmark runs from different machines live in reports/ — PRs welcome.
- uv for environment management.
- psutil (declared in pyproject.toml); used for peak-RSS on Windows.
The suite runs on any CPython 3.13+, GIL'd or free-threaded.
Free-threaded is what the benchmark is designed to characterise — on a
GIL'd build most speedups will hover around 1× and that's the expected
result. Each run_bench.py invocation prints the actual interpreter
(implementation, version, free-threaded yes/no) at the top of stdout.
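As an illustration of the interpreter line described above, here is a minimal sketch of how implementation, version, and free-threaded status can be detected with the standard library. The function name `describe_interpreter` and the output wording are assumptions for this example, not the driver's actual code.

```python
import platform
import sys
import sysconfig


def describe_interpreter() -> str:
    """Return a one-line summary: implementation, version, free-threaded yes/no."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL is on elsewhere.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check is not None else True
    # A free-threaded build can still re-enable the GIL at runtime, so check both.
    free_threaded = free_threaded_build and not gil_enabled
    return (
        f"{platform.python_implementation()} {platform.python_version()} "
        f"free-threaded: {'yes' if free_threaded else 'no'}"
    )


if __name__ == "__main__":
    print(describe_interpreter())
```

Checking the runtime flag as well as the build flag matters because a free-threaded build may run with the GIL re-enabled, in which case speedups near 1× are expected.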
The canonical build for the published numbers is CPython 3.14 free-threaded (what CI uses). To get it locally:
uv python install 3.14+freethreaded
uv sync --python 3.14+freethreaded

All uv command examples in this repository pass --python 3.14+freethreaded explicitly so the canonical interpreter is used
regardless of UV_PYTHON / .python-version / shell state. Substitute
another version (e.g. --python 3.13) if you want to compare
interpreter builds.
uv sync --python 3.14+freethreaded
# List all workloads with their BENCH_SPEC:
uv run --python 3.14+freethreaded python run_bench.py --list
# Run the full suite (24 workloads × 5 runs × serial+parallel):
uv run --python 3.14+freethreaded python run_bench.py
# Parallel only, 3 runs, verbose:
uv run --python 3.14+freethreaded python run_bench.py --mode parallel --runs 3 -v
# Smoke run (small problem sizes, ~10× smaller):
uv run --python 3.14+freethreaded python run_bench.py --smoke

In --mode both (the default) the per-workload status line and summary
table compare serial vs parallel:
workload | runs | threads | serial wall (s) | parallel wall (s) ± stdev | speedup | par eff | peak MB | checksum
- serial wall: mean wall time of main_serial().
- parallel wall: mean wall time of main_parallel() with the workload's num_threads.
- speedup = serial_wall_mean / parallel_wall_mean. > 1.0× means parallel beats serial. CPU-bound embarrassingly-parallel workloads on free-threaded Python should approach num_threads.
- par eff = speedup / num_threads. 100% means perfect scaling.
- checksum: deterministic per-workload value. * = unstable across runs. DIVERGENT = serial and parallel disagree (workload bug).
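The column definitions above can be sketched in a few lines of Python. This is an illustrative reconstruction from the formulas in the list, not the driver's code; the function names and the exact rendering of the checksum cell are assumptions.

```python
from statistics import mean


def summarize(serial_walls, parallel_walls, num_threads):
    """Compute (speedup, par_eff) from per-run wall times in seconds."""
    speedup = mean(serial_walls) / mean(parallel_walls)
    par_eff = speedup / num_threads  # 1.0 corresponds to 100% (perfect scaling)
    return speedup, par_eff


def checksum_cell(serial_checksums, parallel_checksums):
    """Render the checksum column (hypothetical formatting)."""
    serial_set, parallel_set = set(serial_checksums), set(parallel_checksums)
    # More than one distinct value within a mode means the workload is unstable.
    if len(serial_set) > 1 or len(parallel_set) > 1:
        return "*"
    # Stable in each mode, but the two modes disagree: a workload bug.
    if serial_set != parallel_set:
        return "DIVERGENT"
    return str(next(iter(serial_set)))


if __name__ == "__main__":
    speedup, par_eff = summarize([8.0, 8.5, 7.5], [2.0, 2.25, 1.75], num_threads=8)
    print(f"speedup {speedup:.2f}x, par eff {par_eff:.0%}")
    print(checksum_cell([42, 42], [42, 42]))
```

In this example the serial mean is 8.0 s and the parallel mean is 2.0 s, so the speedup is 4.0× and, with 8 threads, the parallel efficiency is 50%.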
In --mode parallel the table reverts to per-run metrics (wall mean ±
stdev, cpu mean, cpu eff = cpu_mean / (wall_mean * num_threads), peak
MB, checksum). In --mode serial it shows just wall / peak / checksum.
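To make the cpu eff formula concrete, the sketch below measures wall and CPU time for a callable and applies cpu_mean / (wall_mean * num_threads). Measuring in-process with `time.process_time()` is an assumption made for this example; the driver runs workloads in subprocesses and may collect CPU time differently.

```python
import time
from statistics import mean


def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call to fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads of this process
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def cpu_efficiency(cpu_times, wall_times, num_threads):
    """cpu eff = cpu_mean / (wall_mean * num_threads)."""
    return mean(cpu_times) / (mean(wall_times) * num_threads)


if __name__ == "__main__":
    wall, cpu = measure(lambda: sum(i * i for i in range(200_000)))
    print(f"wall {wall:.4f}s, cpu {cpu:.4f}s")
    # 4 CPU-seconds over 1 wall-second on 8 threads: half the cores were busy.
    print(cpu_efficiency([4.0, 4.0], [1.0, 1.0], num_threads=8))
```

A value near 1.0 suggests every thread was on-CPU for the whole run; lower values point to threads waiting on locks, conditions, or the scheduler.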
run_bench.py [-h] [--runs N] [--benchmarks LIST]
[--mode {parallel,serial,both}] [--smoke]
[--timeout S] [--list] [--verbose]
- --runs N: runs per (workload, mode) cell (default 5).
- --benchmarks LIST: comma-separated subset of workload names.
- --mode: parallel, serial, or both (default both).
- --smoke: sets BENCH_SMOKE=1 in workloads, scaling problem sizes down ~10× for fast iteration.
- --timeout S: per-subprocess timeout in seconds (default 600).
- --list: print discovered workloads with their BENCH_SPEC and exit.
- --verbose / -v: log every run as it completes.
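For readers who want to script against these options, here is a hedged sketch of an argparse parser that mirrors the usage line and defaults listed above. It is a reconstruction from the documented flags only; `build_parser`, the argument types, and the help strings are assumptions, and the real run_bench.py may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented run_bench.py options (illustrative)."""
    parser = argparse.ArgumentParser(prog="run_bench.py")
    parser.add_argument("--runs", type=int, default=5, metavar="N",
                        help="runs per (workload, mode) cell")
    parser.add_argument("--benchmarks", default=None, metavar="LIST",
                        help="comma-separated subset of workload names")
    parser.add_argument("--mode", choices=["parallel", "serial", "both"],
                        default="both")
    parser.add_argument("--smoke", action="store_true",
                        help="set BENCH_SMOKE=1 in workloads")
    parser.add_argument("--timeout", type=float, default=600, metavar="S",
                        help="per-subprocess timeout in seconds")
    parser.add_argument("--list", action="store_true",
                        help="print discovered workloads and exit")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="log every run as it completes")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--mode", "parallel", "--runs", "3", "-v"])
    # An omitted --benchmarks means "all workloads" in this sketch.
    names = args.benchmarks.split(",") if args.benchmarks else None
    print(args.mode, args.runs, args.verbose, names)
```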
Exit code is non-zero if any run failed, any checksum was unstable, or any serial/parallel checksum diverged. A sub-1.0× speedup is not considered a failure. The suite is curated for sync-primitive coverage, not "everything scales".
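The exit-code rule can be summarized in a short sketch. The `WorkloadResult` record and its field names are hypothetical, introduced only to illustrate the rule stated above.

```python
from dataclasses import dataclass


@dataclass
class WorkloadResult:
    """Hypothetical per-workload outcome record."""
    name: str
    failed: bool      # any run crashed or timed out
    unstable: bool    # checksum changed between runs
    divergent: bool   # serial and parallel checksums disagree
    speedup: float    # reported, but never affects the exit code


def exit_code(results) -> int:
    """Return 1 if any workload failed, was unstable, or diverged; else 0."""
    bad = any(r.failed or r.unstable or r.divergent for r in results)
    return 1 if bad else 0


if __name__ == "__main__":
    results = [
        WorkloadResult("lock_heavy", False, False, False, speedup=0.7),
        WorkloadResult("cpu_bound", False, False, False, speedup=7.4),
    ]
    # A 0.7x speedup is slow but correct, so the exit code stays 0.
    print(exit_code(results))
```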
uv run --python 3.14+freethreaded pytest tests/ -v

Tests set BENCH_SMOKE=1 so problem sizes shrink ~10×. Every workload
is checked in both modes against the same golden checksum. CI runs the
same command on every push and PR to main.
- Add workloads/<name>.py. Import common for NUM_THREADS, parallel_for, barrier_workers, run_entry.
- Implement both main_parallel() (multithreaded) and main_serial() (single-thread baseline). They must return the same checksum for the same input.
- Add a module-level BENCH_SPEC dict with name, description, num_threads, sync, work_units.
- End the script with if __name__ == "__main__": common.run_entry(main_parallel, main_serial).
- Honor BENCH_SMOKE: use common.scaled(full, smoke_value) for problem-size constants.
- Run the workload once at smoke size and paste the checksum into tests/test_workloads_correct.py as the golden value.
- Add a section to workloads/README.md with curated tutorial links for the algorithm.
- Run uv run --python 3.14+freethreaded pytest tests/ and uv run --python 3.14+freethreaded pre-commit run --all-files and confirm both pass.
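The steps above can be sketched as a self-contained workload skeleton. Because the signatures of the helpers in common are not shown here, this example uses local stand-ins (`scaled`, `NUM_THREADS`) and plain threading; the workload name `lock_sum`, the problem sizes, and the stand-in behavior are all assumptions. A real workload would import these from common and end with common.run_entry(main_parallel, main_serial).

```python
import os
import threading


def scaled(full, smoke_value):
    """Stand-in for common.scaled: pick the smoke size when BENCH_SMOKE=1 (assumed behavior)."""
    return smoke_value if os.environ.get("BENCH_SMOKE") == "1" else full


NUM_THREADS = 4                      # stand-in for common.NUM_THREADS
N_ITEMS = scaled(200_000, 20_000)    # problem-size constant honoring BENCH_SMOKE

BENCH_SPEC = {
    "name": "lock_sum",
    "description": "Sum of squares; per-thread partials merged under a Lock",
    "num_threads": NUM_THREADS,
    "sync": "Lock",
    "work_units": N_ITEMS,
}


def _partial(start, stop):
    return sum(i * i for i in range(start, stop))


def main_serial():
    """Single-thread baseline doing the same total work."""
    return _partial(0, N_ITEMS)


def main_parallel():
    """Split the range across threads and merge partial sums under a Lock."""
    total = 0
    lock = threading.Lock()
    chunk = (N_ITEMS + NUM_THREADS - 1) // NUM_THREADS  # ceiling division

    def worker(tid):
        nonlocal total
        start = tid * chunk
        stop = min(start + chunk, N_ITEMS)
        part = _partial(start, stop)
        with lock:  # the primitive under test guards the shared accumulator
            total += part

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    # Both modes must produce the same checksum for the same input.
    assert main_parallel() == main_serial()
    print(main_serial())
```

Integer addition makes the checksum independent of thread scheduling order, which is what keeps it deterministic across runs and identical between the two modes.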