Summary
Running coverage with branch coverage enabled on PyPy results in a ~20x slowdown compared to CPython (3.5-7 minutes vs ~15 seconds for the same test suite). This makes it impractical to enforce coverage on PyPy in CI.
You can view my experiments in the PR to my repository (commits starting at d91abf2c8fdb6ca7c89cd18c90bcb666c09b951f and ending at 9fbe0d4999b66a53a0dfdf2f0ef1bd5b7e235416).
Environment
- coverage version: 7.6.1
- PyPy versions tested: pypy3.9 (7.3.16), pypy3.10 (7.3.19), pypy3.11 (7.3.21)
- CPython baseline: 3.8-3.15
- OS: macOS, Ubuntu, Windows (all three affected equally)
- Test runner: pytest 8.3.5 + pytest-xdist 3.8.0 (-n auto)
- Coverage config:
[tool.coverage.run]
branch = true
parallel = true
plugins = ["coverage_pyver_pragma"]
source = ["suby"]
The project
suby is a small subprocess wrapper library (~330 statements, ~105 branches). The test suite has 341 tests. Most tests spawn a short-lived child process via subprocess.Popen.
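For context, a minimal sketch of the shape of those tests (a hypothetical helper, not code from suby):

```python
import subprocess
import sys

def run_child(text: str) -> str:
    # Spawn a short-lived child process, as most tests in the suite do.
    # Under subprocess coverage measurement, each of these children would
    # also load and start coverage unless the env var is stripped first.
    proc = subprocess.Popen(
        [sys.executable, "-c", f"print({text!r})"],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate()
    return out.strip()
```

With ~300 such spawns per run, any per-process coverage startup cost multiplies quickly.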
On CPython, the full test suite with branch coverage and xdist parallelization completes in ~15 seconds.
What happens on PyPy
The same test suite with the same coverage configuration takes 3.5-7 minutes on PyPy, depending on the approach used.
Approach 1: coverage run + .pth file (same as CPython)
COVERAGE_PROCESS_START="pyproject.toml" coverage run -m pytest -n auto
coverage combine
coverage report -m --fail-under=100
This run uses a .pth file that bootstraps coverage.process_startup() in xdist workers, plus a pytest_configure hook that removes COVERAGE_PROCESS_START from the worker environment so child processes spawned by the tests don't inherit it.
Result: ~3.5 minutes (was ~15 seconds on CPython).
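One plausible shape of that bootstrap, following the standard coverage subprocess recipe (file names and the hook body are illustrative):

```python
# Contents of a .pth file on the workers' sys.path (a single line), e.g.:
#   import coverage; coverage.process_startup()

# conftest.py: strip the env var inside xdist workers so that child
# processes spawned by the tests under measurement don't inherit it.
import os

def pytest_configure(config):
    # xdist workers carry a `workerinput` attribute; the controller doesn't.
    if hasattr(config, "workerinput"):
        os.environ.pop("COVERAGE_PROCESS_START", None)
```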
Approach 2: pytest-cov
pytest --cov=suby --cov-branch --cov-fail-under=100 -n auto
Result: 5-7 minutes. Even worse because pytest-cov's .pth file (COV_CORE_* env vars) activates coverage in xdist workers at Python startup, and then the pytest-cov plugin starts a second coverage instance in the same workers, resulting in double tracing.
Approach 3: Single-process (no xdist)
We also tried running without xdist to avoid the .pth subprocess overhead entirely. This was still dramatically slower than CPython, confirming that the tracing itself (not subprocess bootstrap) is the bottleneck.
Root cause analysis
We traced the slowdown to coverage's line/branch tracer running as pure Python on PyPy (no C extension). Specifically:
- sys.settrace overhead: On CPython, coverage uses a C-based tracer. On PyPy, it falls back to a pure Python tracer. Every line of executed code (including pytest framework code, not just the measured source) passes through this tracer.
- Branch coverage amplifies the cost: With branch = true, the tracer does additional work per line to track branch transitions. On CPython's C tracer this is negligible; on PyPy's pure Python tracer it's expensive.
- source filtering doesn't help enough: Even with source = ["suby"] limiting coverage measurement to a small package, the sys.settrace callback still fires for ALL executed code. The filtering happens inside the callback, but the call overhead remains.
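The call-overhead point can be demonstrated with a no-op pure-Python tracer: even one that records nothing slows a tight loop by an integer factor. A microbenchmark sketch (exact numbers will vary by interpreter):

```python
import sys
import timeit

def busy():
    # A small pure-Python workload to measure tracing overhead against.
    total = 0
    for i in range(10_000):
        total += i
    return total

def noop_tracer(frame, event, arg):
    # A do-nothing Python tracer. Even with an empty body, the interpreter
    # must call back into Python for every call/line/return event of every
    # frame executed, not just frames in the measured source.
    return noop_tracer

baseline = timeit.timeit(busy, number=50)

sys.settrace(noop_tracer)
traced = timeit.timeit(busy, number=50)
sys.settrace(None)

print(f"tracing slowdown: {traced / baseline:.1f}x")
```

A real tracer that records lines and branch arcs only adds to this floor.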
What we tried to mitigate
- Removed coverage from child subprocesses (conftest hook removing COVERAGE_PROCESS_START / COV_CORE_* from xdist workers' env) - this prevented the ~300 child processes per run from loading coverage, but the xdist worker tracing overhead remained.
- Tried pytest-cov instead of manual coverage run - made things worse due to double coverage instances.
- Disabled pytest-cov with -p no:cov while using coverage run - eliminated double coverage but PyPy tracing is still inherently slow.
Expected behavior
Coverage on PyPy should be within a reasonable factor (2-3x) of CPython performance, not 20x+ slower. PyPy's JIT should in theory be able to optimize a hot tracing function.
Possible directions
- A PyPy-optimized tracer (perhaps using PyPy's JIT-friendly patterns instead of sys.settrace)
- An option to only invoke the trace callback for files matching source, rather than filtering inside the callback
- CFFI-based tracer for PyPy instead of pure Python
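The second direction can be sketched with plain sys.settrace: the global trace function can return None for frames outside the measured source, which suppresses all line events for that frame (only the one-time call event still fires). Names here are illustrative:

```python
import sys

line_events = 0

def local_tracer(frame, event, arg):
    # Expensive per-line work would happen here; we just count events.
    global line_events
    if event == "line":
        line_events += 1
    return local_tracer

def make_global_tracer(prefix):
    # Returning None from the global tracer disables per-line tracing for
    # that entire frame, so framework code never reaches local_tracer.
    # The "call" event itself still fires once per function call.
    def global_tracer(frame, event, arg):
        if frame.f_code.co_filename.startswith(prefix):
            return local_tracer
        return None
    return global_tracer

def measured():
    x = 0
    for i in range(100):
        x += i
    return x

sys.settrace(make_global_tracer(__file__))  # trace only this file
measured()
sys.settrace(None)
print(line_events)  # > 0: line events fired for measured()
```

This skips the per-line cost for non-source frames, but the per-call dispatch into Python remains, which is why a frame-filtering option alone may not close the whole gap on PyPy.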