Summary
Running coverage with branch coverage enabled on PyPy results in a ~20x slowdown compared to CPython (3.5-7 minutes vs ~15 seconds for the same test suite). This makes it impractical to enforce coverage on PyPy in CI.
You can view my experiments in the PR to my repository (commits starting at d91abf2c8fdb6ca7c89cd18c90bcb666c09b951f and ending at 9fbe0d4999b66a53a0dfdf2f0ef1bd5b7e235416).
Environment
- coverage version: 7.6.1
- PyPy versions tested: pypy3.9 (7.3.16), pypy3.10 (7.3.19), pypy3.11 (7.3.21)
- CPython baseline: 3.8-3.15
- OS: macOS, Ubuntu, Windows (all three affected equally)
- Test runner: pytest 8.3.5 + pytest-xdist 3.8.0 (-n auto)
- Coverage config:
[tool.coverage.run]
branch = true
parallel = true
plugins = ["coverage_pyver_pragma"]
source = ["suby"]
The project
suby is a small subprocess wrapper library (~330 statements, ~105 branches). The test suite has 341 tests. Most tests spawn a short-lived child process via subprocess.Popen.
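For context, a minimal sketch of the shape of those tests (a hypothetical helper, not code from suby):

```python
import subprocess
import sys

def run_child(text: str) -> str:
    # Spawn a short-lived child process, as most tests in the suite do.
    # Under subprocess coverage measurement, each of these children would
    # also load and start coverage unless the env var is stripped first.
    proc = subprocess.Popen(
        [sys.executable, "-c", f"print({text!r})"],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate()
    return out.strip()
```

With ~300 such spawns per run, any per-process coverage startup cost multiplies quickly.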
On CPython, the full test suite with branch coverage and xdist parallelization completes in ~15 seconds.
What happens on PyPy
The same test suite with the same coverage configuration takes 3.5-7 minutes on PyPy, depending on the approach used.
Approach 1: coverage run + .pth file (same as CPython)
COVERAGE_PROCESS_START="pyproject.toml" coverage run -m pytest -n auto
coverage combine
coverage report -m --fail-under=100
This run uses a .pth file that bootstraps coverage.process_startup() in xdist workers, plus a pytest_configure hook that removes COVERAGE_PROCESS_START from the worker environment so child processes spawned by the tests don't inherit it.
Result: ~3.5 minutes (was ~15 seconds on CPython).
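One plausible shape of that bootstrap, following the standard coverage subprocess recipe (file names and the hook body are illustrative):

```python
# Contents of a .pth file on the workers' sys.path (a single line), e.g.:
#   import coverage; coverage.process_startup()

# conftest.py: strip the env var inside xdist workers so that child
# processes spawned by the tests under measurement don't inherit it.
import os

def pytest_configure(config):
    # xdist workers carry a `workerinput` attribute; the controller doesn't.
    if hasattr(config, "workerinput"):
        os.environ.pop("COVERAGE_PROCESS_START", None)
```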
Approach 2: pytest-cov
pytest --cov=suby --cov-branch --cov-fail-under=100 -n auto
Result: 5-7 minutes. Even worse because pytest-cov's .pth file (COV_CORE_* env vars) activates coverage in xdist workers at Python startup, and then the pytest-cov plugin starts a second coverage instance in the same workers, resulting in double tracing.
Approach 3: Single-process (no xdist)
We also tried running without xdist to avoid the .pth subprocess overhead entirely. This was still dramatically slower than CPython, confirming that the tracing itself (not subprocess bootstrap) is the bottleneck.
Root cause analysis
We traced the slowdown to coverage's line/branch tracer running as pure Python on PyPy (no C extension). Specifically:
- sys.settrace overhead: On CPython, coverage uses a C-based tracer. On PyPy, it falls back to a pure Python tracer. Every line of executed code (including pytest framework code, not just the measured source) passes through this tracer.
- Branch coverage amplifies the cost: With branch = true, the tracer does additional work per line to track branch transitions. On CPython's C tracer this is negligible; on PyPy's pure Python tracer it's expensive.
- source filtering doesn't help enough: Even with source = ["suby"] limiting coverage measurement to a small package, the sys.settrace callback still fires for ALL executed code. The filtering happens inside the callback, but the call overhead remains.
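The call-overhead point can be demonstrated with a no-op pure-Python tracer: even one that records nothing slows a tight loop by an integer factor. A microbenchmark sketch (exact numbers will vary by interpreter):

```python
import sys
import timeit

def busy():
    # A small pure-Python workload to measure tracing overhead against.
    total = 0
    for i in range(10_000):
        total += i
    return total

def noop_tracer(frame, event, arg):
    # A do-nothing Python tracer. Even with an empty body, the interpreter
    # must call back into Python for every call/line/return event of every
    # frame executed, not just frames in the measured source.
    return noop_tracer

baseline = timeit.timeit(busy, number=50)

sys.settrace(noop_tracer)
traced = timeit.timeit(busy, number=50)
sys.settrace(None)

print(f"tracing slowdown: {traced / baseline:.1f}x")
```

A real tracer that records lines and branch arcs only adds to this floor.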
What we tried to mitigate
- Removed coverage from child subprocesses (conftest hook removing COVERAGE_PROCESS_START / COV_CORE_* from xdist workers' env) - this prevented the ~300 child processes per run from loading coverage, but the xdist worker tracing overhead remained.
- Tried pytest-cov instead of manual coverage run - made things worse due to double coverage instances.
- Disabled pytest-cov with -p no:cov while using coverage run - eliminated double coverage but PyPy tracing is still inherently slow.
Expected behavior
Coverage on PyPy should be within a reasonable factor (2-3x) of CPython performance, not 20x+ slower. PyPy's JIT should in theory be able to optimize a hot tracing function.
Possible directions
- A PyPy-optimized tracer (perhaps using PyPy's JIT-friendly patterns instead of sys.settrace)
- An option to only invoke the trace callback for files matching source, rather than filtering inside the callback
- CFFI-based tracer for PyPy instead of pure Python
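The second direction can be sketched with plain sys.settrace: the global trace function can return None for frames outside the measured source, which suppresses all line events for that frame (only the one-time call event still fires). Names here are illustrative:

```python
import sys

line_events = 0

def local_tracer(frame, event, arg):
    # Expensive per-line work would happen here; we just count events.
    global line_events
    if event == "line":
        line_events += 1
    return local_tracer

def make_global_tracer(prefix):
    # Returning None from the global tracer disables per-line tracing for
    # that entire frame, so framework code never reaches local_tracer.
    # The "call" event itself still fires once per function call.
    def global_tracer(frame, event, arg):
        if frame.f_code.co_filename.startswith(prefix):
            return local_tracer
        return None
    return global_tracer

def measured():
    x = 0
    for i in range(100):
        x += i
    return x

sys.settrace(make_global_tracer(__file__))  # trace only this file
measured()
sys.settrace(None)
print(line_events)  # > 0: line events fired for measured()
```

This skips the per-line cost for non-source frames, but the per-call dispatch into Python remains, which is why a frame-filtering option alone may not close the whole gap on PyPy.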