
Why object creation is slow in PyO3 and how to enhance it #679

Closed
kngwyu opened this issue Dec 1, 2019 · 14 comments · Fixed by #887
Comments

@kngwyu
Member

kngwyu commented Dec 1, 2019

This is a support issue following #661, in which I explain why PyO3’s object creation can be slow and how we can speed it up.
Since I only joined this project a year ago, I’m not sure my understanding is correct, especially about the historical aspects of the design.

cc: @ijl @ehiggs

TL; DR

  • We have an overhead around object creation.
  • For PyObject and Py<T>, we cannot remove this overhead.
  • For the current &PyAny, &PyDict, and so on, we can remove this overhead by changing them to PyAny<'py>, PyDict<'py>, and so on.
  • But to make this work, we need a careful design decision about how to handle owned/borrowed pointers.
  • Once this is done, we could replace many usages of PyObject with the low-cost PyAny<'py> types (and then we should rename PyAny to PyObject).

First, let’s revisit our two kinds of object

It is really confusing, but we have two types for representing a Python object, named PyObject and PyAny.
So, what’s the difference between them?
The answer is that PyAny is forced to be used as &'py PyAny, which has the same lifetime as GILGuard, but PyObject isn’t.

    let any: &PyAny = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.as_ref()
    };

Thus, this snippet causes a compile error, because &PyAny has the same lifetime as GILGuard.
CPython has a reference-count based GC, which differs greatly from Rust’s lifetime-based memory management. To integrate these two systems, it is a natural choice to represent a Python object’s lifetime by GILGuard (our RAII guard that corresponds to Python’s Global Interpreter Lock).

In contrast, PyObject can exist even after GILGuard drops.

    let obj: PyObject = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.to_object(gil.python())
    };

This ‘not bounded’ nature of PyObject helps users write complicated extensions. For example, you can send a PyObject to another thread, though you cannot send a PyAny.
However, &PyAny is a sufficient and reasonable choice for many use cases.

For types other than PyAny, such as PyDict, these ‘not bounded’ types are represented by the Py<T> wrapper type.

| Bounded by GIL | Not bounded |
| --- | --- |
| `&PyAny` | `PyObject` |
| `&PyDict`, `&PyList`, … | `Py<PyDict>`, `Py<PyList>`, … |

PyObject retrieval and object storage

Then, how do we ensure that a PyObject can be used after its GILGuard drops?
Let’s think about this situation.

fn object() -> PyObject {
    let gil = Python::acquire_gil();
    // do something
    make_object(gil.python())
}
{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
}

First, we need to ensure that the PyObject is not deallocated after the first GILGuard drops, by incrementing
the reference count of the object when creating it.
Then we have to decrement the reference count when it drops, but here comes the problem.
Recall that in Rust, values are dropped in reverse order of declaration, which means obj drops after gil drops.

{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

This is really problematic, because we cannot touch any Python object while we don’t hold the GIL.

To prevent this behavior, we have an object storage that stores object pointers.
When a PyObject drops, we don’t decrement its reference count; instead, we store its internal pointer in the storage.
Then, after another GILGuard is released, we decrement the reference counts of the stored PyObjects.

Yes, this object storage is the core overhead we discuss in this issue, and ideally it should be removed.
But for PyObject, I have no idea other than using the storage to enable this complicated behavior.

&’py Lifetime and object storage

For &PyAny, the situation is a bit simpler.
All we want for the &Py~ types is to force them to have the same lifetime as the GILGuard.
So when we create a &Py~ value, we store its internal pointer in the object storage and return a reference to the pointer in the storage.
And when the GILGuard drops, we decrement the object’s reference count for owned objects and do nothing for borrowed1 objects.
To enable this conditional behavior, we have two object storages (one for owned, one for borrowed) for &Py~ types.

  1. This owned/borrowed distinction does not refer to Rust’s owned values and references, but to the terms used in the Python C-API docs. There, a ‘borrowed’ reference means we must not decrement the object’s reference count. E.g., the CPython doc says

Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!

How we can remove this overhead

So, for &Py~ types, what we do on drop is just decrementing the reference count or doing nothing.
We can perform this operation without the internal storage via Drop::drop.
We would then have PyAny<’py>, PyDict<’py>, and so on instead of &PyAny, &PyDict, and so on.
Thus this would be a really breaking change.
However, since I don’t think our current API, which uses references only to represent lifetimes, is reasonable, I’m sure it is worth considering.

The problem is how we distinguish borrowed and owned objects without two kinds of object storage.
One possible idea is a wrapper type, say, PyOwned<PyAny>.
This design is clear and zero-cost, but it requires lots of API changes.
Another possible idea is a boolean flag recording whether the object is owned.

struct PyAny<'py> {
    pointer: NonNull<ffi::PyObject>,
    owned: bool,
}

It doesn’t force users to rewrite lots of code, but it has some runtime cost.

We need discussion and help

We appreciate any ideas, discussion, and PRs in this area, especially about a zero-cost object type design.
Thanks.

@kngwyu kngwyu pinned this issue Dec 5, 2019
@nagisa
Contributor

nagisa commented Dec 8, 2019

I feel that perhaps "bounded by GIL" and "not bounded" is the wrong distinction to make and implement, at least at the core-construct level. Instead we should just use "owned reference" (Py<Py*>, adjusts the refcount on creation and drop) and "borrowed reference" (&Py*, no refcount operations on creation or drop).

Neither of those would be tied to any particular GIL lock, but both would require a GIL token to work with. Once we do that, we ought to fix our APIs to properly use these types. For instance, the new methods for native types should return owned references, not &T ones.

We could then look for a way to add versions of owned and borrowed references that are bound to a particular instance of a GIL guard, but IMO those should be an add-on rather than a core construct.

This, I believe, would remove the book-keeping overhead in object creation and deletion in most, but not all, cases. For example, consider this exceedingly common pattern (in pseudocode):

// now
fn myfn(...) {
    let my_thing = Py*::new(py); // calls gil::register_owned(...) = overhead!
    let return_object = my_thing.into_object(); // overhead, increasing refcount on `my_thing`.
    drop(gil); // internal overhead, reducing refcount on stuff registered with `gil::register_owned`!
    return return_object.into_ptr();
}

// adjusted
fn some_glue(...) -> ... {
    let my_thing: Py<Py*> = Py*::new(py); // does not need to call `gil::register_owned`.
    let return_object: Py<PyAny> = my_thing.into_object(); // effectively a pointer cast
    drop(gil); // does not touch my_thing refcount...
    return return_object.into_ptr();
}

The gil::register_owned overhead would then remain necessary only in cases where an owned reference can be dropped after the GIL is unlocked, as in your example above:

{
    let obj: Py<Py*> = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

It should be possible to mitigate this overhead here as well by doing something along the lines of:

#[thread_local]
static mut WE_HOLD_THE_GIL: bool = false;

impl Drop for Py<T> {
    fn drop(&mut self) {
        if WE_HOLD_THE_GIL { // super cheap hot branch
             decref(self)
        } else {
             // will be decrefd once pyo3 takes GIL lock.
             register_to_decref_on_next_lock(self)
        }
    }
}

@nagisa
Contributor

nagisa commented Dec 8, 2019

Somewhat related, and something that could inform our design a little: in order to enable an implementation of Python::finalize (Py_Finalize), we should release objects in our object storage on entry to the GIL rather than on exit, because otherwise we open ourselves up to double-frees.

@pganssle
Member

pganssle commented Dec 8, 2019

For integrating these two systems, it’s a natural choice to represent Python objects’ lifetime by GILGuard(= our RAII guard that corresponds to Python’s thread lock).

I am having trouble totally wrapping my head around how the two object management philosophies mesh, but I thought I would chime in to point out that this is not strictly true: GILGuard does not ensure that you hold the GIL during its lifetime, it ensures that you hold the GIL whenever you are successfully executing Rust code during the lifetime of the GIL guard. Any calls to Python APIs can release and re-acquire the GIL, which is why this hack was necessary. This makes me think that the idea of tying object lifetimes to the lifetime of a given GILGuard object might be the wrong abstraction.

I still don't totally understand the object storage stuff, but it seems like acquiring the gil just around refcount operations (and other operations where we actually perform Python operations that require the GIL) would solve this problem, no?

@nagisa
Contributor

nagisa commented Dec 8, 2019

I still don't totally understand the object storage stuff, but it seems like acquiring the gil just around refcount operations (and other operations where we actually perform Python operations that require the GIL) would solve this problem, no?

Technically it would, yes, but you don’t really want the GIL to be acquired and released for each and every local being dropped – and there could be quite a few of them – the inefficiency of doing so is exactly what makes the object storage worth implementing.

Then, again, if we stop putting every single object we create into this storage, then perhaps just taking GIL for the !WE_HOLD_THE_GIL branch would be good enough.

@kngwyu
Member Author

kngwyu commented Dec 9, 2019

Instead we should just use "owned reference" (Py<Py*>, adjusts refcount on creation and drops) and "borrowed reference" (&Py*, no refcount operations on creation or drop).

Constrained by the current design, I didn’t come up with such a design myself, but representing borrowed pointers by &Py sounds like a really natural and good choice.

It should be possible to mitigate this overhead here as well by doing something along the lines of:
#[thread_local]
static mut WE_HOLD_THE_GIL = false;
...

And I like this thread_local idea too.

I thought I would chime in to point out that this is not strictly true: GILGuard does not ensure that you hold the GIL during its lifetime, it ensures that you hold the GIL whenever you are successfully executing

Ah, thank you for pointing it out. Maybe my explanation should be refined.

Any calls to Python APIs can release and re-acquire the GIL, which is why this hack was necessary. This makes me think that the idea of tying object lifetimes to the lifetime of a given GILGuard object might be the wrong abstraction.

Yeah, just owned or borrowed model can be much clearer.

@davidhewitt
Member

I think that &Py* for borrowed, Py<Py*> for owned is a great idea.

Then, again, if we stop putting every single object we create into this storage, then perhaps just taking GIL for the !WE_HOLD_THE_GIL branch would be good enough.

I would think that this is likely to be true. C++'s pybind11 doesn't actually check the GIL state when decreasing references for owned values. This isn't perfect, though in my experience it only creates problems in very rare circumstances. If we had a check for the GIL, in my experience we'd hold the GIL 99% of the time, and the slow path would rarely be hit.

It's worth remembering that the GIL lock is ref-counted, so even in the example code we've been discussing:

{
    let obj: Py<Py*> = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

The GIL might well still be held after implicit_drop(gil) - all that's doing is decreasing the GIL lock reference count by one 😄

@davidhewitt
Member

Also, for checking whether we have the GIL: there's an API, PyGILState_Check(). But perhaps it's faster to maintain the information in a static mut bool as proposed.

@davidhewitt
Member

davidhewitt commented Feb 8, 2020

A data point for the discussion here: the following code, from a question on Gitter, will cause memory consumption to grow indefinitely inside the loop. I think it comes from pyo3's use of GilPool to hold all references, meaning none of the temporary PyDict objects ever gets freed.

#[pymethods]
impl Test {
    #[new]
    #[args(callback = "None")]
    fn new(py: Python) -> PyResult<Self> {
        Ok(Test { n: 0 })
    }

    fn run(&mut self, callback: PyObject, py: Python) -> PyResult<()> {
        let d = PyDict::new(py);
        loop {
            d.set_item("val", self.n)?;
            callback.call(py, (d,), None)?;
            self.n += 1;
        }
    }
}

EDIT: Just noticed #311 has pretty much the same example.

@kngwyu
Member Author

kngwyu commented Apr 30, 2020

Here I propose a new idea to remove the internal storage for borrowed objects.
It would look like

pub struct PyAny(ffi::PyObject);

impl PyAny {
    fn from_borrowed_ptr(py: Python, ptr: *mut ffi::PyObject) -> &Self {
        unsafe { &*(ptr as *const Self) }
    }
    fn from_owned_ptr(py: Python, ptr: ...) -> &Self {
        py.register_ptr(ptr);
        unsafe { &*(ptr as *const Self) }
    }
}

Pros

  1. Compatibility: it doesn't require much user effort for migration.
  2. Relationship with PyCell: now we have many abstraction layers to treat &PyCell (= pointer) and &PyAny (= pointer to pointer) as the same, but with this formulation of PyAny we don't need them.

Cons

Apparently, it solves nothing for owned pointers: getting many owned objects can still be costly.
So I think we should add a third constructor for this:

impl PyAny {
    // or better name
    fn as_owned(py: Python, ptr: ...) -> Py<Self> {
        ...
    }
}

Py already has a Drop implementation that decrements the reference count, which means it represents an owned object. So we can use Py as the owned type.

@kngwyu
Member Author

kngwyu commented Apr 30, 2020

Or from_owned_ptr should return Py<Self>? 🤔

@davidhewitt
Member

It's an interesting idea; we should definitely explore it. I think that most of the return types would have to become Py<T>, though, which means users will have to write as_ref(py) a lot. Maybe that's an acceptable compromise.

I wonder whether we can skip actually having the concrete layout and just have struct PyAny(Unsendable);

After I realised #883 last night, I think getting owned pointers performant and ergonomic is probably more important than borrowed pointers, because I think most return types will have to become owned pointers.

@kngwyu
Member Author

kngwyu commented May 1, 2020

I wonder whether we can skip actually having the concrete layout and just have struct PyAny(Unsendable)

Yeah, we can skip it... but PyAny(UnsafeCell<ffi::PyObject>) would be easier to implement.

After I realised #883 last night I think getting owned pointers performant and ergonomic is probably more important than borrowed pointers, because I think most return types will have to become owned pointers.

Hmm... 🤔 I don't think either owned or borrowed objects are more important than the other.

For a pyfunction, args/kwargs are obtained via from_borrowed and the return value via from_owned.

#[pyfunction]
fn py_function(arg: Vec<i32>, kwargs: &PyDict) -> PyResult<SomeRustType> {
    ...
}

@davidhewitt
Member

Hmm... 🤔 I don't think either owned or borrowed object is important.

For a pyfunction, args/kwargs is obtained by from_borrowed and the return type is obtained by from_owned.

Agreed. What I was thinking is that general users will be making a lot of owned objects from these borrowed pointers, so those should hopefully be easy to work with.

@kngwyu
Member Author

kngwyu commented May 1, 2020

general users will be making a lot of owned objects from these borrowed pointers,

You mean iterating over a list/dict or the like?
I think we need with_borrowed APIs for that.
