fix: snapshot data in Collector.flush_data to avoid threading race by alexmv · Pull Request #2165 · coveragepy/coveragepy

alexmv · 2026-04-28T03:23:41Z

Collector.data is a single dict shared across all per-thread tracers, and tracers update its inner sets without taking a lock (one acquire per traced line would be far too expensive). When a switch_context boundary fires on one thread -- which is routine with dynamic_context = test_function -- flush_data runs on that thread while tracers in other threads keep adding lines to the same sets. mapped_file_dict only guarded the outer dict iteration, so add_arcs / add_lines later iterated the live sets and could fail with:

RuntimeError: Set changed size during iteration

...raised from nums_to_numbits in the lines path, or from the arc tuple comprehension in add_arcs. list(packeds) and list(packed_data.items()) in the packed-arcs branch were equally exposed: both go through the Python iterator protocol, which detects concurrent mutation and raises.

Snapshot both the outer dict and each per-file set with .copy(), which is atomic in CPython -- the GIL is held for the duration of the C-level copy, so no other thread can mutate the source mid-copy. Free-threaded builds do them in per-container critical sections, so the same guarantee applies. add_arcs / add_lines now iterate stable copies even while other threads continue recording trace data.

The non-packed-arcs branch passes self.data's sets straight into add_arcs, but the old `cast(dict[str, list[TArc]], self.data)` lied about the runtime type to fit a too-narrow annotation. add_arcs accepts `Mapping[str, Collection[TArc]]`, which covers both the packed-arcs lists and the unpacked sets. Widen arc_data to `dict[str, Collection[TArc]]` and correct the cast to `set[TArc]`.

Collector.data is a single dict shared across all per-thread tracers, and tracers update its inner sets without taking a lock (one acquire per traced line would be far too expensive). When a switch_context boundary fires on one thread -- which is routine with `dynamic_context = test_function` -- flush_data runs on that thread while tracers in other threads keep adding lines to the same sets. mapped_file_dict only guarded the outer dict iteration, so add_arcs / add_lines later iterated the live sets and could fail with: RuntimeError: Set changed size during iteration ...raised from nums_to_numbits in the lines path, or from the arc tuple comprehension in add_arcs. `list(packeds)` and `list(packed_data.items())` in the packed-arcs branch were equally exposed: both go through the Python iterator protocol, which detects concurrent mutation and raises. Snapshot both the outer dict and each per-file set with `.copy()`, which is atomic in CPython -- the GIL is held for the duration of the C-level copy, so no other thread can mutate the source mid-copy. Free-threaded builds do them in per-container critical sections, so the same guarantee applies. add_arcs / add_lines now iterate stable copies even while other threads continue recording trace data.

nedbat · 2026-04-28T10:02:23Z

I can see that a lock for each line would be a lot, though I'm not sure it would be too expensive. Copying the data is also expensive, though it happens infrequently. I'm more concerned about the GIL reasoning, and will it hold up in free-threaded builds?

alexmv · 2026-05-01T15:40:24Z

.copy() is implemented in C -- and as such, it's free from this style of race.

In free-threaded builds, each object has its own mutex, which is held while modifying the object. These are suspended if it pop back to Python (e.g. using list(thing) will check iterator protocols), but .copy() is explicitly listed as being atomic to other threads, as it is entirely implemented in C.

nedbat · 2026-05-10T18:01:46Z

This is now released as part of coverage 7.14.0.

alexmv · 2026-05-10T19:03:25Z

Thanks for the release!

alexmv added 2 commits April 28, 2026 02:55

alexmv force-pushed the fix-flush-data-race branch from a093ba4 to 4c51be1 Compare April 28, 2026 03:53

Merge branch 'main' into fix-flush-data-race

38727cc

nedbat merged commit 8cd392e into coveragepy:main May 10, 2026
44 checks passed

nedbat added a commit that referenced this pull request May 10, 2026

docs: thanks Alex Vandiver for #2165

649e8aa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: snapshot data in Collector.flush_data to avoid threading race#2165

fix: snapshot data in Collector.flush_data to avoid threading race#2165
nedbat merged 3 commits into
coveragepy:mainfrom
alexmv:fix-flush-data-race

alexmv commented Apr 28, 2026

Uh oh!

nedbat commented Apr 28, 2026

Uh oh!

alexmv commented May 1, 2026

Uh oh!

Uh oh!

nedbat commented May 10, 2026

Uh oh!

alexmv commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

alexmv commented Apr 28, 2026

Uh oh!

nedbat commented Apr 28, 2026

Uh oh!

alexmv commented May 1, 2026

Uh oh!

Uh oh!

nedbat commented May 10, 2026

Uh oh!

alexmv commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants