-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-123471: Make itertools.chain thread-safe #135689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Can you use an ISTM that most of the "make thread-safe" patches going in everywhere from heapq to itertools adds overhead to the builds that don't get any benefit from it. The code is getting harder to read and maintain. AFAICT most of these edits are going in without tests so it hard to tell whether they are achieving their goal. Also there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are non called concurrently. Also, have you been testing for performance impacts on the standard nogil build that everyone uses in production? Almost the entire point of itertools is to add minimal overhead to the underying data producers. The |
@@ -0,0 +1 @@ | |||
Make concurrent iterations over :class:`itertools.chain` safe under free-threading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make concurrent iterations over :class:`itertools.chain` safe under free-threading. | |
Make concurrent iterations over :class:`itertools.chain` safe under :term:`free threading`. |
barrier.wait() | ||
while True: | ||
try: | ||
_ = next(it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the extra assignment to _
?
static PyObject * | ||
chain_next(PyObject *op) | ||
{ | ||
PyObject * result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyObject * result; | |
PyObject *result; |
The
itertools.chain
has two attributes that are mutated duration iteration, making it non-thread safe.Options to make it thread safe
i) Use a lock (simple, but adds a performance penalty for the single-threaded case)
ii) Make the iterator thread safe using atomics.
For the second option we can stop clearing the
lz->source
on exhaustion (this is a standard mitigation to avoid settinglz->source
to zero (and doing a decref) while another thread is still usinglz->source
). To signal the iterator thatlz->source
is exhausted we can either add a new attribute, or replace thewhile (lz->source != NULL)
withwhile (true)
. This creates a slower path the the iterator is exhausted, but typically once the iterator is exhausted performance is not relevant any longer. Thelz->active
is more problematic. Oncelz->active
is exhausted, we need to update this with a new one (e.g.PyIter_Next(lz->source
). Butlz->active
only has a single reference count. So updating in one thread means having to decref the oldlz->active
. Possible mitigations: a) atomically load-and-incref thelz->active
inchain_next
b) add a new attribute that contains all the exhausted values oflz->active
and clear this once thechainobject
itself is deallocated.The second option is complex and has some performance issues (e.g. more incref/decrefs, keeping references to exhausted iterators longer in memory), so in this PR we pick the first option.