Make PyDict iterator compatible with free-threaded build #4439

Open · wants to merge 11 commits into main from PyDict_next_lock

Conversation

@bschoenmaeckers (Contributor) commented Aug 14, 2024

This pulls in the PyCriticalSection_Begin & PyCriticalSection_End functions new in 3.13 and uses them to lock the PyDict iterators as described here. I'm not sure about the PyCriticalSection struct definition. We cannot use the opaque_struct! macro to define this struct, because we have to allocate enough space on the stack so we can pass the uninitialized pointer to PyCriticalSection_Begin. So some help would be appreciated!

depends on #4421
related to #4265
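
For reference, a sized (non-opaque) definition might look roughly like the following. This is only a sketch; the field names and types are an assumption and would have to be checked against CPython 3.13's Include/cpython/critical_section.h.

// Hypothetical sized mirror of CPython 3.13's PyCriticalSection, so callers
// can reserve space for it on the stack before calling PyCriticalSection_Begin.
#[repr(C)]
pub struct PyCriticalSection {
    _cs_prev: usize,                  // uintptr_t in the C header
    _cs_mutex: *mut std::ffi::c_void, // PyMutex* in the C header
}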

@bschoenmaeckers force-pushed the PyDict_next_lock branch 2 times, most recently from d90dc42 to c0136f7 on August 14, 2024 13:57
@ngoldbaum (Contributor)

I actually have a branch with these changes (more or less) that I was planning to do separately from that PR. Unfortunately the deadlock I found is caused by something else.

If you're planning to work on this stuff I'd appreciate it if you could comment on the tracking issue so we can coordinate work and avoid duplication.

@mejrs (Member) left a comment

This use of the critical section API seems unwise. It allows users to create several critical sections and, worse, allows them to release them in arbitrary order. I don't think I understand the critical section API well, but it seems guaranteed to cause issues.

I can see two obvious solutions:

  1. Replace the implementation with PyObject_GetIter and PyIter_Next (slow?)
  2. Implement some form of internal iteration:
impl PyDict{
    pub fn traverse<B>(&self, f: &mut impl FnMut(Bound<'py, PyAny>, Bound<'py, PyAny>) -> ControlFlow<B>) -> ControlFlow<B> {
        struct Guard { .. };
        impl Drop for Guard { ..release critical section }
        
        let mut cs = std::mem::MaybeUninit::zeroed();
        ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr());
        let mut ma_used = ..;
        let mut di_used = ..;
        let key = ...;
        let value = ..;
        
        while PyDict_Next(...) != 0{
           f(key, value)?;
        }
        ControlFlow::Continue(())
    }
}
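
A slightly more filled-in sketch of this internal-iteration idea, assuming the PyCriticalSection FFI bindings proposed in this PR and PyO3's public Bound APIs (illustrative only, not the final implementation):

use std::ops::ControlFlow;
use pyo3::prelude::*;
use pyo3::types::PyDict;

// Sketch: hold a critical section for the whole traversal and release it on
// drop, even if the callback panics.
fn traverse<'py, B>(
    dict: &Bound<'py, PyDict>,
    mut f: impl FnMut(Bound<'py, PyAny>, Bound<'py, PyAny>) -> ControlFlow<B>,
) -> ControlFlow<B> {
    struct Guard(pyo3::ffi::PyCriticalSection);
    impl Drop for Guard {
        fn drop(&mut self) {
            unsafe { pyo3::ffi::PyCriticalSection_End(&mut self.0) };
        }
    }

    // Begin the section through the guard's final address so the struct is
    // never moved while it is active.
    let mut guard = Guard(unsafe { std::mem::zeroed() });
    unsafe { pyo3::ffi::PyCriticalSection_Begin(&mut guard.0, dict.as_ptr()) };

    let py = dict.py();
    let mut pos: pyo3::ffi::Py_ssize_t = 0;
    let mut key = std::ptr::null_mut();
    let mut value = std::ptr::null_mut();
    while unsafe { pyo3::ffi::PyDict_Next(dict.as_ptr(), &mut pos, &mut key, &mut value) } != 0 {
        // PyDict_Next yields borrowed references; take ownership immediately.
        let key: Bound<'py, PyAny> = unsafe { Bound::from_borrowed_ptr(py, key) };
        let value: Bound<'py, PyAny> = unsafe { Bound::from_borrowed_ptr(py, value) };
        if let ControlFlow::Break(b) = f(key, value) {
            return ControlFlow::Break(b);
        }
    }
    ControlFlow::Continue(())
}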

Comment on lines 559 to 563
let cs = unsafe {
    let mut cs = std::mem::MaybeUninit::zeroed();
    ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr());
    cs.assume_init()
};
Member

This can just assume_init immediately because it is zero-valid. Delaying the assume_init would only be necessary if you used MaybeUninit::uninit().

Suggested change (before → after):

let cs = unsafe {
    let mut cs = std::mem::MaybeUninit::zeroed();
    ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr());
    cs.assume_init()
};

let cs: ffi::PyCriticalSection = unsafe { std::mem::MaybeUninit::zeroed().assume_init() };
unsafe { ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr()) };

Comment on lines 545 to 552
#[cfg(Py_GIL_DISABLED)]
impl Drop for BorrowedDictIter<'_, '_> {
    fn drop(&mut self) {
        unsafe {
            ffi::PyCriticalSection_End(&mut self.cs);
        }
    }
}
Member

It should probably implement Drop unconditionally (or not at all)

Suggested change (before → after):

#[cfg(Py_GIL_DISABLED)]
impl Drop for BorrowedDictIter<'_, '_> {
    fn drop(&mut self) {
        unsafe {
            ffi::PyCriticalSection_End(&mut self.cs);
        }
    }
}

impl Drop for BorrowedDictIter<'_, '_> {
    fn drop(&mut self) {
        #[cfg(Py_GIL_DISABLED)]
        unsafe {
            ffi::PyCriticalSection_End(&mut self.cs);
        }
    }
}

@davidhewitt (Member)

Replace the implementation with PyObject_GetIter and PyIter_Next (slow?)

I think we should seriously consider going this way and benchmark whether it's actually a performance concern. We already made the same change for sets a couple of releases back; it wasn't a major performance impact there compared to the wins from the Bound API. Two reasons why we did it for sets:

  • _PySet_Next (or whatever the API was called) was a private API
  • It doesn't do the right thing for subclasses of sets with custom __iter__ functions

Similarly, our current implementation doesn't respect dict subclasses with custom __iter__ functions. Should it? Probably, in which case we might just want to switch to PyObject_GetIter anyway.
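
For illustration, a fallback through the Python iteration protocol could look roughly like this (a sketch using PyO3's public PyIterator and downcast APIs; the helper name is made up):

use pyo3::prelude::*;
use pyo3::types::{PyDict, PyIterator, PyTuple};

// Sketch: go through items() and the iteration protocol so that dict
// subclasses with custom __iter__/items() behave as they would in Python.
fn for_each_item<'py>(
    dict: &Bound<'py, PyDict>,
    mut f: impl FnMut(Bound<'py, PyAny>, Bound<'py, PyAny>) -> PyResult<()>,
) -> PyResult<()> {
    let items = dict.call_method0("items")?;
    let iter = PyIterator::from_object(&items)?;
    for pair in iter {
        // A subclass could yield anything, so use checked downcasts.
        let pair = pair?;
        let tuple = pair.downcast::<PyTuple>()?;
        f(tuple.get_item(0)?, tuple.get_item(1)?)?;
    }
    Ok(())
}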

@ngoldbaum (Contributor)

I opened #4477 with a different implementation of the FFI bindings.

@bschoenmaeckers (Contributor, Author)

I opened #4477 with a different implementation of the FFI bindings.

Sorry for the late reply. Your implementation looks good, and the opaque_type! use is exactly what I was looking for. I will update my MR after the weekend.

@bschoenmaeckers (Contributor, Author)

This use of the critical section API seems unwise. [...] I can see two obvious solutions:

  1. Replace the implementation with PyObject_GetIter and PyIter_Next (slow?)

  2. Implement some form of internal iteration (see the traverse sketch quoted above).

Interesting solutions 👀. I will try to implement the first one and measure the performance hit afterwards.

codspeed-hq bot commented Aug 28, 2024

CodSpeed Performance Report

Merging #4439 will not alter performance

Comparing bschoenmaeckers:PyDict_next_lock (f79dc05) with main (4cbf6e0)

Summary

✅ 83 untouched benchmarks

@ngoldbaum (Contributor)

Ouch, that does seem to be a big perf hit.

@bschoenmaeckers (Contributor, Author) commented Aug 28, 2024

Yeah, this is really bad, but it's kind of expected, as dict.items() creates a copy of the items and saves them into a PyList.

https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3381-L3432

@bschoenmaeckers (Contributor, Author)

I've also looked into iterating over a raw dict, but that only yields the keys, so it does not protect against modification of the values before they are fetched on the next call.

@ngoldbaum (Contributor)

I wonder if the critical section API is actually problematic in practice. You could try iterating over the same dict in many threads on the free-threaded build as a stress test. I'm not sure if there are other usage patterns that @mejrs might be concerned about.

It would be nice if we could still keep the fast path for dicts and then only degrade to the slow path if we're not handed an instance of PyDict_Type.

@davidhewitt (Member)

Yea this is really bad, but kind of expected as dict.items() creates a copy of the iterable and saves it into a PyList.

https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3381-L3432

dict.items() here follows the Python 2 semantics, where .items() in Python did create a new list. Is perf any better if you try dict.call_method0("items") to get an iterable items view?

Comment on lines 385 to 387
let tuple = pair.downcast::<PyTuple>().unwrap();
let key = tuple.get_item(0).unwrap();
let value = tuple.get_item(1).unwrap();
Contributor (Author)

Is it wise to use the unchecked variants here instead of unwrap?

@bschoenmaeckers (Contributor, Author) Aug 28, 2024

Now that I think about it, this is probably not safe, because the items() method can return an arbitrary object when overridden in Python code.

@bschoenmaeckers (Contributor, Author) commented Aug 28, 2024

Yea this is really bad, but kind of expected as dict.items() creates a copy of the iterable and saves it into a PyList.
https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3381-L3432

dict.items() is equivalent to the Python 2 semantics where .items() in Python did create a new list. Is perf any better if you try .dict.call_method0("items") to get an iterable items view?

I didn't know that these were different; you learn something new every day! It is indeed somewhat faster: we went down from a ~87% slowdown to ~63%.

@bschoenmaeckers changed the title from "Add PyCriticalSection lock to Dict iterator" to "Make PyDict iterator compatible with free-threaded build" on Aug 29, 2024
@bschoenmaeckers (Contributor, Author)

It would be nice if we could still keep the fast path for dicts and then only degrade to the slow path if we're not handed an instance of PyDict_Type.

I made the previous fast path available on non-free-threaded builds when the dict is exactly a PyDict (not a subclass). This gives us minimal performance regressions on existing code.

@bschoenmaeckers force-pushed the PyDict_next_lock branch 3 times, most recently from 0b8f4c6 to ae0ee72 on August 29, 2024 16:53
Comment on lines +440 to +515

    if unsafe { ffi::PyDict_Next(dict.as_ptr(), ppos, &mut key, &mut value) } != 0 {
        *remaining -= 1;
        let py = dict.py();
        // Safety:
        // - PyDict_Next returns borrowed values
        // - we have already checked that `PyDict_Next` succeeded, so we can assume these to be non-null
        Some((
            unsafe { key.assume_borrowed_unchecked(py) }.to_owned(),
            unsafe { value.assume_borrowed_unchecked(py) }.to_owned(),
        ))
    } else {
        None
    }
}
Member

Is there an alternative implementation where we add a critical section internally, just around the call to PyDict_Next? It means that each iteration has to lock/unlock a mutex, which might also be terrible for performance, but it'd be interesting to try. (If it performs acceptably, we could then also ask free-threaded CPython experts if this is sound. My hunch is that it would be.)
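
A rough sketch of that per-call locking, assuming a with_critical_section-style helper like the one introduced later in this thread (names are illustrative):

use pyo3::prelude::*;
use pyo3::types::PyDict;

// Sketch: hold the critical section only across a single PyDict_Next call and
// take ownership of the borrowed references before it is released.
fn next_pair<'py>(
    dict: &Bound<'py, PyDict>,
    pos: &mut pyo3::ffi::Py_ssize_t,
) -> Option<(Bound<'py, PyAny>, Bound<'py, PyAny>)> {
    with_critical_section(dict.as_any(), || {
        let mut key = std::ptr::null_mut();
        let mut value = std::ptr::null_mut();
        if unsafe { pyo3::ffi::PyDict_Next(dict.as_ptr(), pos, &mut key, &mut value) } != 0 {
            let py = dict.py();
            // Incref (by constructing owned Bound values) while the dict is still locked.
            let key: Bound<'py, PyAny> = unsafe { Bound::from_borrowed_ptr(py, key) };
            let value: Bound<'py, PyAny> = unsafe { Bound::from_borrowed_ptr(py, value) };
            Some((key, value))
        } else {
            None
        }
    })
}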

Contributor (Author)

I also thought of this implementation, but according to the following issue it is not sufficient.

python/cpython#120858

Member

If I'm reading correctly, isn't the point of that issue precisely that it permits us to add locking here around each call to PyDict_Next if we so wanted? The concern about borrowed references is not relevant here because we immediately incref them, and we can do that before releasing the critical section. Cc @colesbury

@colesbury

@davidhewitt is right about the borrowed references issue not being relevant here, because PyO3 would be doing its own locking around PyDict_Next() with the incref inside the lock.

That's still not ideal, but it might be a reasonable starting point. It's much better to lock around the entire loop, both because of the performance issue and because you will see a consistent view of the dict. The locking only around PyDict_Next() allows for concurrent modifications in between each call, so you're going to have more cases that panic due to concurrent modifications, which would have been prevented by the GIL or a loop-scoped lock.

Another alternative is to copy the dict inside the iterator and iterate over the copy. It's probably cheaper than locking around each call.
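
A minimal sketch of that copy-based approach, assuming PyO3's PyDict::copy (illustrative only):

use pyo3::prelude::*;
use pyo3::types::PyDict;

// Sketch: take a shallow copy as a consistent snapshot, then iterate the
// snapshot; other threads cannot modify the snapshot while we iterate it.
fn for_each_snapshot<'py>(
    dict: &Bound<'py, PyDict>,
    mut f: impl FnMut(Bound<'py, PyAny>, Bound<'py, PyAny>),
) -> PyResult<()> {
    let snapshot = dict.copy()?;
    for (key, value) in snapshot.iter() {
        f(key, value);
    }
    Ok(())
}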

Contributor (Author)

Thanks for clearing this up! Copying the dict sounds like the easiest solution for now. Once we finalize a critical section API, we can consider moving the responsibility for locking the dict during the whole iteration on free-threaded builds to the user, and then remove the copy() and panic on concurrent modification.

Member

That said, I think if the loop executes arbitrary Python code, then it is still possible for the dict to be modified during iteration under a critical section, because the section may be suspended by a nested critical section which then modifies the dict.

I feel like users are more likely to know, for their use case, whether copying or locking per iteration is more acceptable. I wonder if we need to split .iter() into multiple methods?

Contributor

I opened #4571 to suggest a way forward on this.

@davidhewitt (Member)

👍 thanks! I will try to read tonight or tomorrow evening!

@bschoenmaeckers force-pushed the PyDict_next_lock branch 2 times, most recently from 9687bc8 to bb63414 on September 29, 2024 18:10

for (key, value) in self {
    if let Err(err) = closure(key, value) {
        unsafe { ffi::PyCriticalSection_End(&mut section) };
Contributor (Author)

It occurred to me that this is probably not safe when the closure panics. I have to create a Guard here and call PyCriticalSection_End when dropping it after the iteration.

Contributor

For what it's worth, I have a half-completed wrapper around the critical section API using an RAII guard pattern. The tricky part is writing a test to make sure that the critical section is closed if a panic happens.

I'll probably have a PR for it tomorrow if you don't mind waiting for that :)

Contributor (Author)

Great! I just came up with the following wrapper, but I'd love to see your version tomorrow and use that one instead.

#[inline]
#[cfg(Py_GIL_DISABLED)]
fn critical_section<R, F>(op: *mut ffi::PyObject, closure: F) -> R
where
    F: FnOnce() -> R,
{
    struct Guard(ffi::PyCriticalSection);

    impl Drop for Guard {
        fn drop(&mut self) {
            unsafe {
                ffi::PyCriticalSection_End(&mut self.0);
            }
        }
    }
    
    let _guard = unsafe {
        let mut section = std::mem::zeroed();
        ffi::PyCriticalSection_Begin(&mut section, op);
        Guard(section)
    };
    closure()
}

Contributor

Mine's called with_critical_section but it's basically the same thing, I'll do a PR with docs and tests tomorrow.

@ngoldbaum (Contributor) Sep 29, 2024

Here's the API @davidhewitt and I worked out on Friday:

/// Executes a closure on an object with an active critical section.
///
/// Locks modifications from other threads while the closure is executing.
/// This is structurally equivalent to the use of the paired
/// Py_BEGIN_CRITICAL_SECTION and Py_END_CRITICAL_SECTION macros exposed
/// by the C API.
///
/// This function is a no-op on Python versions older than 3.13, where
/// the critical section API is not exposed by the C API.
pub fn with_critical_section<F, R>(object: &Bound<'_, PyAny>, f: F) -> R
where
    F: FnOnce() -> R,

I think taking a Bound is nicer for a user-facing API, and we decided that this is a safe abstraction: while it is possible to do unsafe things inside a critical section (e.g. accessing dict or list internals), you need unsafe to do them. Also it doesn't need to be cfg'd out, since it can just be a no-op on the GIL-enabled build.
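
A sketch of how the GIL-enabled no-op variant of that signature could look (not necessarily the final code):

// Sketch: on builds without free-threading there is nothing to lock, so the
// helper just runs the closure.
#[cfg(not(Py_GIL_DISABLED))]
pub fn with_critical_section<F, R>(_object: &Bound<'_, PyAny>, f: F) -> R
where
    F: FnOnce() -> R,
{
    f()
}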

@davidhewitt (Member) left a comment

This is looking really good, thanks! Just a few final points really...

Comment on lines 662 to 684
#[inline]
#[cfg(Py_GIL_DISABLED)]
pub fn with_critical_section<F, R>(object: &Bound<'_, PyAny>, f: F) -> R
where
    F: FnOnce() -> R,
{
    struct Guard(ffi::PyCriticalSection);

    impl Drop for Guard {
        fn drop(&mut self) {
            unsafe {
                ffi::PyCriticalSection_End(&mut self.0);
            }
        }
    }

    let _guard = unsafe {
        let mut section = std::mem::zeroed();
        ffi::PyCriticalSection_Begin(&mut section, object.as_ptr());
        Guard(section)
    };
    f()
}
Member

I think @ngoldbaum and I came up with pretty much the same API; this probably belongs in src/sync.rs?

Comment on lines +699 to +712
if dict.is_exact_instance_of::<PyDict>() {
    return BoundDictIterator::DictIter {
        dict,
        ppos: 0,
        di_used: remaining,
        remaining,
    };
};

let items = dict.call_method0(intern!(dict.py(), "items")).unwrap();
let iter = PyIterator::from_object(&items).unwrap();
BoundDictIterator::ItemIter { iter, remaining }
Member

I think there's a possible design decision to be had about the subclasses and .items() call. Do we definitely want to do this? We essentially already do this for sets, so there is a good argument we should do this here for consistency. (And I guess similar for lists?) Maybe this wants to be split into its own PR / discussion.

See also #4490 which is very related.

Contributor (Author)

If removing the ItemIter from this MR makes it easier to review, I'm happy to open a second one and move those changes out of this MR.

Comment on lines 680 to 681
ffi::PyCriticalSection_Begin(&mut section, object.as_ptr());
Guard(section)
Member

I think moving the section might be incorrect here (it changes its address); we probably need to allocate the critical section on the stack and then store &mut ffi::PyCriticalSection in the guard.
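
One possible shape for that (a sketch only, along the lines of #4587): begin the section only once the guard is at its final address, so the PyCriticalSection is never moved while active.

struct Guard(ffi::PyCriticalSection);

impl Drop for Guard {
    fn drop(&mut self) {
        unsafe { ffi::PyCriticalSection_End(&mut self.0) };
    }
}

// Zero-initialize in place, then begin through a pointer to the final location.
let mut guard = Guard(unsafe { std::mem::zeroed() });
unsafe { ffi::PyCriticalSection_Begin(&mut guard.0, object.as_ptr()) };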

Contributor (Author)

I use @ngoldbaum's implementation now, so we can make sure #4587 is correct before merging this MR.

})
}

#[inline]
Member

Nice touch to add implementations of the methods below on stable 👍

Comment on lines 183 to 190
/// Iterates over the contents of this dictionary while preventing other threads from modifying it.
#[cfg(Py_GIL_DISABLED)]
fn locked_for_each<F>(&self, closure: F) -> PyResult<()>
where
    F: Fn(Bound<'py, PyAny>, Bound<'py, PyAny>) -> PyResult<()>;
Member

I think this API description isn't quite correct; the critical section may be released by the GC and then other threads may take the opportunity to modify the dictionary before this thread can reacquire the critical section.

All the critical section really guarantees is that this thread can safely call PyDict_Next and take ownership of the borrowed references before any other critical section gets nested.

I guess it is potentially nice to expose this API to users, but it's a bit tricky to find a good wording.

@colesbury

It's a bit stronger than that, but I agree that it's tricky to find the right wording. Threads won't pause just anywhere for the GC -- they pause at pretty much the same places where the GIL could have been released in the default build.

So in practice, this means that if closure() doesn't make any API calls that could have released the GIL internally, then the dictionary will remain locked for the entire iteration.

Contributor (Author)

I have updated the API description; let me know if this is what you were looking for.

@bschoenmaeckers force-pushed the PyDict_next_lock branch 2 times, most recently from e38323c to bfbd5bb on October 1, 2024 08:59
ControlFlow::Break(x) => Some(x),
}
})
}
Contributor

I'm going to go through all of these today and add implementations for PyList as well. I'll comment if I have any suggestions or questions but hopefully it should be straightforward with the worked example in front of me!
