Skip to content

fix: snapshot data in Collector.flush_data to avoid threading race#2165

Merged
nedbat merged 3 commits into
coveragepy:mainfrom
alexmv:fix-flush-data-race
May 10, 2026
Merged

fix: snapshot data in Collector.flush_data to avoid threading race#2165
nedbat merged 3 commits into
coveragepy:mainfrom
alexmv:fix-flush-data-race

Conversation

@alexmv
Copy link
Copy Markdown
Contributor

@alexmv alexmv commented Apr 28, 2026

Collector.data is a single dict shared across all per-thread tracers, and tracers update its inner sets without taking a lock (one acquire per traced line would be far too expensive). When a switch_context boundary fires on one thread -- which is routine with dynamic_context = test_function -- flush_data runs on that thread while tracers in other threads keep adding lines to the same sets. mapped_file_dict only guarded the outer dict iteration, so add_arcs / add_lines later iterated the live sets and could fail with:

RuntimeError: Set changed size during iteration

...raised from nums_to_numbits in the lines path, or from the arc tuple comprehension in add_arcs. list(packeds) and list(packed_data.items()) in the packed-arcs branch were equally exposed: both go through the Python iterator protocol, which detects concurrent mutation and raises.

Snapshot both the outer dict and each per-file set with .copy(), which is atomic in CPython -- the GIL is held for the duration of the C-level copy, so no other thread can mutate the source mid-copy. Free-threaded builds do them in per-container critical sections, so the same guarantee applies. add_arcs / add_lines now iterate stable copies even while other threads continue recording trace data.

alexmv added 2 commits April 28, 2026 02:55
The non-packed-arcs branch passes self.data's sets straight into
add_arcs, but the old `cast(dict[str, list[TArc]], self.data)`
lied about the runtime type to fit a too-narrow annotation.
add_arcs accepts `Mapping[str, Collection[TArc]]`, which covers
both the packed-arcs lists and the unpacked sets.

Widen arc_data to `dict[str, Collection[TArc]]` and correct the
cast to `set[TArc]`.
Collector.data is a single dict shared across all per-thread tracers,
and tracers update its inner sets without taking a lock (one acquire
per traced line would be far too expensive). When a switch_context
boundary fires on one thread -- which is routine with
`dynamic_context = test_function` -- flush_data runs on that
thread while tracers in other threads keep adding lines to the same
sets. mapped_file_dict only guarded the outer dict iteration, so
add_arcs / add_lines later iterated the live sets and could fail with:

    RuntimeError: Set changed size during iteration

...raised from nums_to_numbits in the lines path, or from the arc tuple
comprehension in add_arcs. `list(packeds)` and
`list(packed_data.items())` in the packed-arcs branch were equally
exposed: both go through the Python iterator protocol, which detects
concurrent mutation and raises.

Snapshot both the outer dict and each per-file set with `.copy()`,
which is atomic in CPython -- the GIL is held for the duration of the
C-level copy, so no other thread can mutate the source mid-copy.
Free-threaded builds do them in per-container critical sections, so
the same guarantee applies.  add_arcs / add_lines now iterate stable
copies even while other threads continue recording trace data.
@alexmv alexmv force-pushed the fix-flush-data-race branch from a093ba4 to 4c51be1 Compare April 28, 2026 03:53
@nedbat
Copy link
Copy Markdown
Member

nedbat commented Apr 28, 2026

I can see that a lock for each line would be a lot, though I'm not sure it would be too expensive. Copying the data is also expensive, though it happens infrequently. I'm more concerned about the GIL reasoning, and will it hold up in free-threaded builds?

@alexmv
Copy link
Copy Markdown
Contributor Author

alexmv commented May 1, 2026

.copy() is implemented in C -- and as such, it's free from this style of race.

In free-threaded builds, each object has its own mutex, which is held while modifying the object. These are suspended if it pop back to Python (e.g. using list(thing) will check iterator protocols), but .copy() is explicitly listed as being atomic to other threads, as it is entirely implemented in C.

@nedbat nedbat merged commit 8cd392e into coveragepy:main May 10, 2026
44 checks passed
nedbat added a commit that referenced this pull request May 10, 2026
@nedbat
Copy link
Copy Markdown
Member

nedbat commented May 10, 2026

This is now released as part of coverage 7.14.0.

@alexmv
Copy link
Copy Markdown
Contributor Author

alexmv commented May 10, 2026

Thanks for the release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants