Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require Send in addition to Sync on Py::get and PyCell::get #3169

Closed
wants to merge 1 commit into from

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented May 20, 2023

While the unsendable parameter and the thread checker help us detect accesses to unsendable types at runtime, they do not prevent sending such types to other threads in the first place where Py::get and PyCell::get provide direct access without checking thread affinity.

So we could either call the thread checker for both methods or we add the additional Send bound. I opted for the latter to keep the zero overhead approach of the methods intact.

@adamreichold adamreichold added bugfix CI-skip-changelog Skip checking changelog entry labels May 20, 2023
@adamreichold
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request May 20, 2023
@bors
Copy link
Contributor

bors bot commented May 20, 2023

try

Build failed:

@adamreichold
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request May 21, 2023
@adamreichold
Copy link
Member Author

I noticed that our versioning of the compile error tests does not really meaning that the tests do not really pass on 1.63 even though we have "since-1.63" test cases. I wonder if we should simplify this (in a separate PR) to have "testcases limited to nightly" and "testcases limited to not our MSRV (1.48)" as everything else effectively tracks stable anyway due to the brittleness of the stderr files.

@bors
Copy link
Contributor

bors bot commented May 21, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable.

While the unsendable parameter and the thread checker help us detect accesses to
unsendable types at runtime, they do not prevent sending such types to other
threads in the first place where Py::get and PyCell::get provide direct access
without checking thread affinity.

So we could either call the thread checker for both methods or we add the
additional Send bound. I opted for the latter to keep the zero overhead approach
of the methods intact.
@adamreichold adamreichold mentioned this pull request May 21, 2023
7 tasks
@davidhewitt
Copy link
Member

I'm not yet convinced we need the Send bound on T here, I think Sync is enough. I could definitely be wrong here, so we should think this through carefully.

There is a lot of prior discussion when we originally debated what bounds #[pyclass] needed when we decided only Send was relevant in #948

I'm currently relying on the definition that "T is Sync only if &T is Send", as stated in the docs . My understanding is that these ::get() methods are only giving us access to &T, i.e. they're not actually moving ownership of T between threads, just allowing unsynchronised access from multiple threads.

On the other hand, we decided on Send checking (but not Sync) for #[pyclass] because the way that .borrow()/.borrow_mut() worked with the GIL guaranteed that the contained T was really only accessible from a single thread at a time. The analogy we fell back to was that Mutex<T> is Sync if T: Send.

I'm open to being convinced that T: Send is necessary here. Maybe is there an example of something which would be unsound without T: Send, or simply just a logical argument why T: Send is required?

I suppose we should focus on this: what examples of Sync but !Send types can we come up with, and would Py<T>::get() create unsoundness if used with such a type?

@adamreichold
Copy link
Member Author

I think we only recently had a similar discussion with a comment from birkenfeld on URLO resolving this one way or the other in an issue or PR here but I currently cannot find it...

ATM, I am even more confused. Let's say I even have a !Send + !Sync type, I put it inside a #[pyclass(unsendable)] and that inside a Py which is now Send + Sync. Even though I will not be able to safely access it, I send the Py to another thread, acquire the GIL there and drop the Py so that the class and the interior !Send + !Sync type on a different thread that it was created on. Admittedly, that has nothing to do with Py::get, but rather that PyCell's Drop impl does not invoke the thread affinity check, or does it?

@adamreichold
Copy link
Member Author

Indeed, the test case

#[test]
fn drop_unsendable_elsewhere() {
    use std::cell::RefCell;
    use std::rc::Rc;
    use std::thread::{current, spawn, ThreadId};

    #[pyclass(unsendable)]
    struct Unsendable {
        thread_id: Rc<RefCell<ThreadId>>,
    }

    impl Drop for Unsendable {
        fn drop(&mut self) {
            let created = *self.thread_id.borrow();
            let dropped = current().id();
            assert_eq!(
                created, dropped,
                "I was created on {created:?}, but dropped on {dropped:?}."
            );
        }
    }

    let unsendable = Python::with_gil(|py| {
        Py::new(
            py,
            Unsendable {
                thread_id: Rc::new(RefCell::new(current().id())),
            },
        )
        .unwrap()
    });

    spawn(move || {
        Python::with_gil(move |_py| {
            drop(unsendable);
        });
    });
}

prints

pyo3_runtime.PanicException: assertion failed: `(left == right)`
  left: `ThreadId(7)`,
 right: `ThreadId(22)`: I was created on ThreadId(7), but dropped on ThreadId(22).

@adamreichold
Copy link
Member Author

What is worse: I think in the above case, we can only abort the process. While leaking is safe, this would be not-running-drop-but-freeing-the-memory-anyway.

@adamreichold
Copy link
Member Author

My understanding is that these ::get() methods are only giving us access to &T, i.e. they're not actually moving ownership of T between threads, just allowing unsynchronised access from multiple threads.

I think @birkenfeld's comment from URLO still applies: Send is a misnomer. It is about being accessed from multiple threads, even with mutual exclusion, not just transferring ownership. Sync is being accesses from multiple threads at the same time. (The discussion I was referring to was part of our effort to add GILProtected.)

I think this is the same underlying issue why Mutex<T>: Send + Sync requires T: Send. I admittedly still have a hard time coming up with a natural example, but the idea would be some type that does have an affinity to some thread but is Sync nevertheless. With the above explanation of Send and Sync in mind, such types probably do not really exist

(MutexGuard<T> is not Send but Sync, but only if T: Sync and it captures a lifetime. Similarly, structs with only &mut self method receivers could be Sync but not Send but would still not produce an issue. Meaning these are examples which are about transferring ownership, not just having an affinity to some thread.)

For the sake of argument, we could also turn this around: Better safe than sorry if we do not have any example of useful types which would be ruled out by the additional trait bound?

@davidhewitt
Copy link
Member

Hmm, that drop behavior is worrying, I confess I thought we did panic though clearly that is not the case.

So I'm now warming to the idea we add this bound.

I think there's a further question about what to do with #[pyclass(unsendable)]. We do have the option of leaking because we could still move the T into a box and forget it. I think at the very least we should also use PyErr::write_unraisable to report the issue.

I'm now feeling somewhat tempted to deprecate unsendable to simplify the framework and just put T: Send as a general #[pyclass] requirement in a couple releases time, if users don't report a compelling reason to keep it.

@adamreichold
Copy link
Member Author

Hmm, that drop behavior is worrying, I confess I thought we did panic though clearly that is not the case.

Will open a separate PR to discuss and handle the Drop issue as it is not really related to Py::get.

@adamreichold
Copy link
Member Author

So while this appears to clash with the definition of Send reiterated for example in #948 (comment), all practical examples of !Send + Sync types that I could find are not problematic. Either because they are uselessly Sync by having no &self methods or because the &mut self part of their interface is blocked by the class being frozen, e.g. MutexGuard<'static, i32> cannot produce a data race because I cannot use DerefMut to access the i32.

So while MutexGuard<'static, i32> is probably not useful to put into a class either, I am closing this as I just cannot make a convincing argument (even to myself) that these bounds are really necessary.

@adamreichold adamreichold deleted the send-py-pycell-get branch May 22, 2023 07:59
@davidhewitt
Copy link
Member

Ok. Happy to revisit this at any time; I'm reasonably comfortable that this is sound however if we have further thoughts that can convince us either way it would be good to review them.

@davidhewitt
Copy link
Member

Just to round this out, this argument on Stack Overflow further convinced me that we are sound here:

Using the above definitions, it should be apparent why there are few examples of types that are Sync but not Send. If an object can be used safely by two threads at the same time (Sync) then it can be used safely by two threads at different times (Send). Hence, Sync usually implies Send. Any exception probably relates to Send's transfer of ownership between threads, which affects which thread runs the Drop handler and deallocates the value.

So I believe we will properly handle the traits:

  • T: Send we can permit access by multiple threads at different times mediated by the GIL, and during Drop.
  • T: Sync (and frozen) we can permit access to &T with Py::get to multiple threads at the same time.

bors bot added a commit that referenced this pull request May 25, 2023
3176: Prevent dropping unsendable classes on other threads. r=davidhewitt a=adamreichold

Continuing the discussed from #3169 (comment) and #3169 (comment):

We already have checks in place to avoid borrowing these classes on other threads but it was still possible to send them to another thread and drop them there (while holding the GIL).
    
This change avoids running the `Drop` implementation in such a case even though Python will still free the underlying memory. This might leak resources owned by the object, but it avoids undefined behaviour due to access the unsendable type from another thread.
    
This does assume that the object was not unsafely integrated into an intrusive data structures which still point to the now freed memory. In that case, the only recourse would be to abort the process as freeing the memory is unavoidable when the tp_dealloc slot is called. (And moving it elsewhere into a new allocation would still break any existing pointers.)

Co-authored-by: Adam Reichold <adam.reichold@t-online.de>
bors bot added a commit that referenced this pull request May 25, 2023
3176: Prevent dropping unsendable classes on other threads. r=davidhewitt a=adamreichold

Continuing the discussed from #3169 (comment) and #3169 (comment):

We already have checks in place to avoid borrowing these classes on other threads but it was still possible to send them to another thread and drop them there (while holding the GIL).
    
This change avoids running the `Drop` implementation in such a case even though Python will still free the underlying memory. This might leak resources owned by the object, but it avoids undefined behaviour due to access the unsendable type from another thread.
    
This does assume that the object was not unsafely integrated into an intrusive data structures which still point to the now freed memory. In that case, the only recourse would be to abort the process as freeing the memory is unavoidable when the tp_dealloc slot is called. (And moving it elsewhere into a new allocation would still break any existing pointers.)

Co-authored-by: Adam Reichold <adam.reichold@t-online.de>
bors bot added a commit that referenced this pull request May 25, 2023
3168: Do not apply deferred ref count updates and prevent the GIL from being acquired inside of __traverse__ implementations. r=davidhewitt a=adamreichold

Closes #2301
Closes #3165


3176: Prevent dropping unsendable classes on other threads. r=davidhewitt a=adamreichold

Continuing the discussed from #3169 (comment) and #3169 (comment):

We already have checks in place to avoid borrowing these classes on other threads but it was still possible to send them to another thread and drop them there (while holding the GIL).
    
This change avoids running the `Drop` implementation in such a case even though Python will still free the underlying memory. This might leak resources owned by the object, but it avoids undefined behaviour due to access the unsendable type from another thread.
    
This does assume that the object was not unsafely integrated into an intrusive data structures which still point to the now freed memory. In that case, the only recourse would be to abort the process as freeing the memory is unavoidable when the tp_dealloc slot is called. (And moving it elsewhere into a new allocation would still break any existing pointers.)

Co-authored-by: Adam Reichold <adam.reichold@t-online.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix CI-skip-changelog Skip checking changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants