A FastAPI app that benchmarks the overhead and output of three Python profilers: cProfile (stdlib), py-spy (sampling), and scalene (line-level CPU + memory).
Built with Python 3.13 and managed with uv.
```
├── main.py               # FastAPI app with CPU / memory / I/O endpoints
├── workloads.py          # Pure workload functions (fibonacci, primes, matrix, memory, I/O)
├── benchmark.py          # Orchestrator: measures latency overhead per profiler
├── profile_workloads.py  # Runs workloads directly (no server) — used for clean profiling
├── profiler.py           # Reusable @cprofile_fn and @scalene_fn decorators
├── pyproject.toml        # Dependencies
├── uv.lock               # Locked dependency versions
├── .python-version       # Pins Python 3.13
└── profiles/             # Generated output (gitignored)
    ├── baseline.json
    ├── cprofile_stats.txt
    ├── py-spy.json / pyspy.svg
    └── scalene.json
```
Install dependencies with:

```
uv sync
```

| Endpoint | Workload |
|---|---|
| `GET /health` | Health check |
| `GET /cpu/fibonacci?n=28` | Recursive Fibonacci — stresses function-call dispatch |
| `GET /cpu/primes?limit=50000` | Sieve of Eratosthenes — stresses integer loops |
| `GET /cpu/matrix?size=40` | Naive O(n³) matrix multiply — stresses nested loops |
| `GET /memory?num_lists=6&list_size=10000` | Large list alloc + sort + reduce |
| `GET /io?count=40` | Repeated file write + read |
| `GET /mixed` | All of the above combined |
| `GET /profiler/stats` | Live cProfile dump (only with `--with-cprofile`) |
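The CPU endpoints wrap deliberately naive implementations. For instance, the fibonacci workload is the classic doubly recursive version (a sketch; `workloads.py` is authoritative):

```python
def fibonacci(n: int) -> int:
    # Doubly recursive on purpose: fibonacci(n) makes ~2*F(n+1)-1 calls,
    # which is exactly what stresses the interpreter's call dispatch
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```

At n=28 that is 1,028,457 calls, which is why cProfile's per-call hook hurts most on this endpoint.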
The benchmark starts the server as a subprocess, hits every endpoint, and measures
latency. Results are saved to profiles/ as JSON for cross-session comparison.
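The measurement loop might look roughly like this (a sketch, not `benchmark.py` verbatim): warmup calls are discarded, then each timed call is measured with `time.perf_counter`.

```python
import statistics
import time

def time_calls(fn, warmup: int = 5, requests: int = 20) -> dict:
    """Time repeated calls to `fn` and return latency stats in ms.

    In the real benchmark, `fn` would be a closure that GETs one endpoint.
    """
    for _ in range(warmup):
        fn()  # warmup calls are not recorded
    samples = []
    for _ in range(requests):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000)  # ms
    return {"p50_ms": statistics.median(samples),
            "mean_ms": statistics.fmean(samples)}

# Stand-in for an HTTP request while sketching
result = time_calls(lambda: sum(range(10_000)), warmup=2, requests=5)
```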
```
uv run python benchmark.py --mode baseline
# saves profiles/baseline.json

uv run python benchmark.py --mode cprofile
# starts server with --with-cprofile, compares against saved baseline
```

```
# Terminal A — server (no profiler)
uv run python main.py

# Terminal B — attach py-spy while traffic runs
py-spy record -o profiles/pyspy.svg --pid $(pgrep -f 'main.py') --duration 90
# macOS may need: sudo py-spy record …

# Terminal C — benchmark and compare
uv run python benchmark.py --mode external
```

```
# Terminal A — server under scalene
uv run python -m scalene run -o profiles/scalene.json -- main.py

# Terminal B — benchmark and compare
uv run python benchmark.py --mode external
```

Usage:

```
uv run python benchmark.py
  --mode      baseline | cprofile | external | all   (default: all)
  --requests  N   timed requests per endpoint        (default: 20)
  --warmup    N   warmup requests per endpoint       (default: 5)
```
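That flag surface could be declared with `argparse` roughly as follows (a sketch; `benchmark.py`'s actual parser may differ):

```python
import argparse

parser = argparse.ArgumentParser(description="Profiler overhead benchmark")
parser.add_argument("--mode", choices=["baseline", "cprofile", "external", "all"],
                    default="all", help="which benchmark pass to run")
parser.add_argument("--requests", type=int, default=20,
                    help="timed requests per endpoint")
parser.add_argument("--warmup", type=int, default=5,
                    help="warmup requests per endpoint")

args = parser.parse_args([])  # no CLI args: falls back to the defaults
```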
```
# Generate scalene report on workloads directly
uv run python -m scalene run -o profiles/scalene_workloads.json profile_workloads.py

# View in browser
uv run python -m scalene view profiles/scalene_workloads.json

# View as standalone HTML file
uv run python -m scalene view profiles/scalene_workloads.json --standalone

# View in terminal (only active lines)
uv run python -m scalene view profiles/scalene_workloads.json --cli --reduced
```

```
# cProfile in one command
uv run python -m cProfile -s cumulative profile_workloads.py | head -30
```

Copy profiler.py to your project and decorate the functions you care about:
```python
from profiler import cprofile_fn, scalene_fn

# cProfile: call counts + cumulative time per function
# → saves report to profiles/compute_nodes.txt automatically
@cprofile_fn(top=15, save_to="profiles/compute_nodes.txt")
def compute_nodes(graph):
    ...

# scalene: line-level CPU% + memory — active only under `scalene run`
@scalene_fn
def build_edge_matrix(nodes):
    ...
```

Then:
```
# cProfile (works immediately)
python your_script.py

# scalene (line-level detail)
python -m scalene run -o profiles/out.json your_script.py
python -m scalene view profiles/out.json --cli --reduced
```

Sample cProfile output:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1028457/1    0.099    0.000    0.099    0.099 workloads.py:17(fibonacci)
```
| Column | Meaning |
|---|---|
| `ncalls` | Total calls / non-recursive calls (slash = recursive) |
| `tottime` | Time in this function only (excludes callees) |
| `cumtime` | Time including everything this function called |
| `percall` | Per-call cost (`tottime` or `cumtime` ÷ `ncalls`) |
Rule: high `cumtime` but low `tottime` → the bottleneck is inside something this function calls.
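That rule can be checked mechanically through the `pstats` API: profile a thin wrapper whose callee does all the work, then compare the wrapper's two times (`busy` and `wrapper` are hypothetical names for illustration):

```python
import cProfile
import pstats

def busy():
    # The callee does all the real work
    return sum(i * i for i in range(200_000))

def wrapper():
    # Thin caller: its own code is trivial, so its tottime stays low
    return busy()

prof = cProfile.Profile()
prof.enable()
wrapper()
prof.disable()

# pstats maps (file, line, name) -> (callcount, ncalls, tottime, cumtime, callers)
stats = pstats.Stats(prof).stats
tottime, cumtime = next(
    (v[2], v[3]) for k, v in stats.items() if k[2] == "wrapper"
)
# wrapper: low tottime, high cumtime → the cost lives in busy()
```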
Sample scalene output:

```
line 19:  return fibonacci(n-1) + fibonacci(n-2)
Python: 94%   Native: 0%   Memory: 0 MB
```
| Column | Meaning | Action |
|---|---|---|
| Python% | Time Python's interpreter spent on this line | High → rewrite or vectorize |
| Native% | Time in C/native libraries called from this line | High → already fast, fix the algorithm |
| Memory (MB) | Allocations on this line | High → reduce copies or intermediate objects |
Key insight: high Native% on a numpy/pandas line means the C library is doing the work — the problem is usually calling it too many times (e.g. inside a loop), not the line itself.
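The "too many calls" pattern is easy to reproduce without numpy: the builtin `sum` is C-implemented, and invoking it thousands of times on tiny chunks loses to one call over the whole list, purely on per-call (and slicing) overhead. A sketch:

```python
import time

data = list(range(100_000))

# Anti-pattern: many small calls — per-call dispatch and slicing dominate
t0 = time.perf_counter()
chunked = sum(sum(data[i:i + 10]) for i in range(0, len(data), 10))
many_calls_s = time.perf_counter() - t0

# Same result from a single call over the whole list
t0 = time.perf_counter()
single = sum(data)
one_call_s = time.perf_counter() - t0
```

Both versions compute the same total; only the call pattern differs.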
| Profiler | Mechanism | fibonacci overhead | memory overhead |
|---|---|---|---|
| py-spy | OS-level sampling, external | ~0% | ~0% |
| scalene | Line-level CPU + allocation tracking | ~0% | +90% |
| cProfile | Hook on every function call | +261% | +19% |
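The overhead columns are relative latency increases over the unprofiled baseline; the exact formula in `benchmark.py` is an assumption, but the arithmetic is presumably:

```python
def overhead_pct(baseline_ms: float, profiled_ms: float) -> float:
    """Relative latency overhead versus the unprofiled baseline."""
    return (profiled_ms - baseline_ms) / baseline_ms * 100

# e.g. a run at 3.61x the baseline latency shows up as +261%
```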
Each profiler has a different Achilles' heel:
- cProfile is slow on recursion-heavy / call-heavy code
- scalene is slow on allocation-heavy code (by design — it's tracking every allocation)
- py-spy has near-zero overhead on everything
1. scalene → find the hot LINE and whether it's CPU-bound or memory-bound
2. cProfile → confirm call counts dropped after your fix
3. py-spy → verify the fix holds under real production load