Skip to content

gh-123471: Make itertools.chain thread-safe #135689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Jun 18, 2025

The itertools.chain has two attributes that are mutated duration iteration, making it non-thread safe.

Options to make it thread safe

i) Use a lock (simple, but adds a performance penalty for the single-threaded case)
ii) Make the iterator thread safe using atomics.

For the second option we can stop clearing the lz->source on exhaustion (this is a standard mitigation to avoid setting lz->source to zero (and doing a decref) while another thread is still using lz->source). To signal the iterator that lz->source is exhausted we can either add a new attribute, or replace the while (lz->source != NULL) with while (true). This creates a slower path the the iterator is exhausted, but typically once the iterator is exhausted performance is not relevant any longer. The lz->active is more problematic. Once lz->active is exhausted, we need to update this with a new one (e.g. PyIter_Next(lz->source). But lz->active only has a single reference count. So updating in one thread means having to decref the old lz->active. Possible mitigations: a) atomically load-and-incref the lz->active in chain_next b) add a new attribute that contains all the exhausted values of lz->active and clear this once the chainobject itself is deallocated.

The second option is complex and has some performance issues (e.g. more incref/decrefs, keeping references to exhausted iterators longer in memory), so in this PR we pick the first option.

@rhettinger rhettinger removed their request for review June 18, 2025 20:58
@rhettinger
Copy link
Contributor

rhettinger commented Jun 18, 2025

Options to make it thread safe

i) Use a lock (simple, but adds a performance penalty for the single-threaded case)
ii) Make the iterator thread safe using atomics.

Can you use an #ifdef to have different code paths for the gil/nogil builds? Ideally, the nogil case should be kept in its current form, clean, fast, and highly optimized.

ISTM that most of the "make thread-safe" patches going in everywhere from heapq to itertools adds overhead to the builds that don't get any benefit from it. The code is getting harder to read and maintain. AFAICT most of these edits are going in without tests so it hard to tell whether they are achieving their goal. Also there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are non called concurrently.

Also, have you been testing for performance impacts on the standard nogil build that everyone uses in production? Almost the entire point of itertools is to add minimal overhead to the underying data producers. The chain_next calls are effectively a very tight loop where adding almost anything degrades the performance that people were seeking when they choose to link various itertools together. In the case of chain, the alternative is a nested for-loop, for iterator in iterators: for value in iterator: .... If the chain itertool can't beat the pure python alternative, it loses part of its raison d'être.

@@ -0,0 +1 @@
Make concurrent iterations over :class:`itertools.chain` safe under free-threading.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Make concurrent iterations over :class:`itertools.chain` safe under free-threading.
Make concurrent iterations over :class:`itertools.chain` safe under :term:`free threading`.

barrier.wait()
while True:
try:
_ = next(it)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra assignment to _?

static PyObject *
chain_next(PyObject *op)
{
PyObject * result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PyObject * result;
PyObject *result;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants