
Why object creation is slow in PyO3 and how to enhance it #679

Closed
kngwyu opened this issue Dec 1, 2019 · 14 comments · Fixed by #887
Comments

@kngwyu
Member

kngwyu commented Dec 1, 2019

This is a support issue following #661, in which I explain why PyO3’s object creation can be slow and how we can speed it up.
Since I only joined this project a year ago, I’m not sure my understanding is correct, especially about the historical aspects of the design.

cc: @ijl @ehiggs

TL; DR

  • We have an overhead around object creation.
  • For PyObject and Py<T>, we cannot remove this overhead.
  • For the current &PyAny, &PyDict, and so on, we can remove this overhead by changing them to PyAny<'py>, PyDict<'py>, and so on.
  • But to make this work, we need a careful design decision about how to handle owned/borrowed pointers.
  • Once this is done, we could replace many usages of PyObject with the low-cost PyAny<'py> types (and then we should rename PyAny to PyObject).

First, let’s revisit our two kinds of object

It is really confusing, but we have two types for representing a Python object, named PyObject and PyAny.
So, what’s the difference between them?
The answer is that PyAny is forced to be used as &'py PyAny, which has the same lifetime as GILGuard, but PyObject isn’t.

    let any: &PyAny = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.as_ref()
    };

Thus, this snippet causes a compile error, because &PyAny has the same lifetime as GILGuard.
CPython has a reference-count based GC, which differs greatly from Rust’s lifetime-based memory management. To integrate these two systems, it is a natural choice to represent a Python object’s lifetime by GILGuard (our RAII guard that corresponds to Python’s Global Interpreter Lock).

In contrast, PyObject can exist even after GILGuard drops.

    let obj: PyObject = {
        let gil = Python::acquire_gil();
        let dict: &PyDict = PyDict::new(gil.python());
        dict.to_object(gil.python())
    };

This ‘not bounded’ nature of PyObject helps users write complicated extensions. For example, you can send a PyObject to another thread, though you cannot send a PyAny.
However, &PyAny is a sufficient and reasonable choice for many use cases.

For types other than PyAny, such as PyDict, these ‘not bounded’ types are represented by the Py<T> wrapper type.

| Bounded by GIL | Not bounded |
| --- | --- |
| `&PyAny` | `PyObject` |
| `&PyDict`, `&PyList`, … | `Py<PyDict>`, `Py<PyList>`, … |

PyObject retrieval and object storage

Then, how do we ensure that a PyObject can be used after its GILGuard drops?
Let’s think about this situation.

fn object() -> PyObject {
    let gil = Python::acquire_gil();
    // do something
    make_object(gil.python())
}
{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
}

First, we need to ensure that the PyObject is not deallocated after the first GILGuard drops, by incrementing
the reference count of the object when creating it.
Then we have to decrement the reference count when it drops, but here comes the problem.
Recall that in Rust, values are dropped in reverse order of declaration, which means obj drops after gil drops.

{
    let obj = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

This is really problematic, because we cannot touch any Python object while we don’t hold the GIL.

To prevent this behavior, we have an object storage that stores object pointers.
When a PyObject drops, we don’t decrement its reference count; instead, we store its internal pointer in the storage.
Then, after another GILGuard is released, we decrement the reference counts of the stored PyObjects.

Yes, this object storage is the core overhead we discuss in this issue, and ideally it should be removed.
But for PyObject, I have no idea other than using the storage to enable this complicated behavior.

&’py Lifetime and object storage

For &PyAny, the situation is a bit simpler.
All we want for the &Py~ types is to force them to have the same lifetime as the GILGuard.
So when we create a &Py~ value, we store its internal pointer in the object storage and return a reference to the pointer in the storage.
And when the GILGuard drops, we decrement the object’s reference count for owned objects and do nothing for borrowed1 objects.
To enable this conditional behavior, we have two object storages (one for owned, one for borrowed) for &Py~ types.

  1. This owned/borrowed distinction does not refer to Rust’s owned values and references, but to the terms used in the Python C-API docs. There, a ‘borrowed’ reference means we must not decrement the object’s reference count. E.g., the CPython doc says

Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!

How we can remove this overhead

So, for &Py~ types, what we do on drop is just decrementing the reference count or doing nothing.
We can perform this operation without the internal storage via Drop::drop.
We would then have PyAny<’py>, PyDict<’py>, and so on instead of &PyAny, &PyDict, and so on.
Thus this would be a really breaking change.
However, since I don’t think our current API, which uses references only to represent lifetimes, is reasonable, I’m sure it is worth considering.

The problem is how we distinguish borrowed and owned objects without two kinds of object storage.
One possible idea is a wrapper type, say, PyOwned<PyAny>.
This design is clear and zero-cost, but it requires lots of API changes.
Another possible idea is a boolean flag recording whether the object is owned.

struct PyAny<'py> {
    pointer: NonNull<ffi::PyObject>,
    owned: bool,
}

It doesn’t force users to rewrite lots of code, but it has some runtime cost.

We need discussion and help

We appreciate any ideas, discussion, and PRs in this area, especially about a zero-cost object type design.
Thanks.

@kngwyu kngwyu pinned this issue Dec 5, 2019
@nagisa
Contributor

nagisa commented Dec 8, 2019

I feel that perhaps "bounded by GIL" and "not bounded" is the wrong distinction to make and implement, at least at the core-construct level. Instead we should just use "owned reference" (Py<Py*>, adjusts the refcount on creation and drop) and "borrowed reference" (&Py*, no refcount operations on creation or drop).

Neither of those would be tied to any particular GIL lock, but both would require a GIL token to work with. Once we do that, we ought to fix our APIs to properly use these types. For instance, the new methods for native types should return owned references, not &T ones.

We could then look for a way to add versions of owned and borrowed references that are bound to a particular instance of a GIL guard, but IMO those should be an add-on rather than a core construct.

This, I believe, would remove the book-keeping overhead in object creation and deletion in most, but not all, cases. For example, consider this exceedingly common pattern (in pseudocode):

// now
fn myfn(...) {
    let my_thing = Py*::new(py); // calls gil::register_owned(...) = overhead!
    let return_object = my_thing.into_object(); // overhead, increasing refcount on `my_thing`.
    drop(gil); // internal overhead, reducing refcount on stuff registered with `gil::register_owned`!
    return return_object.into_ptr();
}

// adjusted
fn some_glue(...) -> ... {
    let my_thing: Py<Py*> = Py*::new(py); // does not need to call `gil::register_owned`.
    let return_object: Py<PyAny> = my_thing.into_object(); // effectively a pointer cast
    drop(gil); // does not touch my_thing refcount...
    return return_object.into_ptr();
}

The gil::register_owned overhead would then remain necessary only in cases where an owned reference can be dropped after the GIL is unlocked, as in your example above:

{
    let obj: Py<Py*> = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

It should be possible to mitigate this overhead here as well by doing something along the lines of:

#[thread_local]
static mut WE_HOLD_THE_GIL: bool = false;

impl Drop for Py<T> {
    fn drop(&mut self) {
        if WE_HOLD_THE_GIL { // super cheap hot branch
             decref(self)
        } else {
             // will be decrefd once pyo3 takes GIL lock.
             register_to_decref_on_next_lock(self)
        }
    }
}

@nagisa
Contributor

nagisa commented Dec 8, 2019

Somewhat related, and something that could inform our design a little: in order to enable an implementation of Python::finalize (Py_Finalize), we should release objects in our object storage on entry to the GIL rather than on exit, because otherwise we open ourselves up to double-frees.

@pganssle
Member

pganssle commented Dec 8, 2019

For integrating these two systems, it’s a natural choice to represent Python objects’ lifetime by GILGuard(= our RAII guard that corresponds to Python’s thread lock).

I am having trouble totally wrapping my head around how the two object management philosophies mesh, but I thought I would chime in to point out that this is not strictly true: GILGuard does not ensure that you hold the GIL during its lifetime, it ensures that you hold the GIL whenever you are successfully executing Rust code during the lifetime of the GIL guard. Any calls to Python APIs can release and re-acquire the GIL, which is why this hack was necessary. This makes me think that the idea of tying object lifetimes to the lifetime of a given GILGuard object might be the wrong abstraction.

I still don't totally understand the object storage stuff, but it seems like acquiring the gil just around refcount operations (and other operations where we actually perform Python operations that require the GIL) would solve this problem, no?

@nagisa
Contributor

nagisa commented Dec 8, 2019

I still don't totally understand the object storage stuff, but it seems like acquiring the gil just around refcount operations (and other operations where we actually perform Python operations that require the GIL) would solve this problem, no?

Technically it would, yes, but you don’t really want the GIL to be acquired and released for each and every local being dropped – and there could be quite a few of them – the inefficiency of doing so is exactly what makes the object storage worth implementing.

Then, again, if we stop putting every single object we create into this storage, then perhaps just taking GIL for the !WE_HOLD_THE_GIL branch would be good enough.

@kngwyu
Member Author

kngwyu commented Dec 9, 2019

Instead we should just use "owned reference" (Py<Py*>, adjusts refcount on creation and drops) and "borrowed reference" (&Py*, no refcount operations on creation or drop).

Constrained by the current design, I didn’t come up with such a design myself, but representing borrowed pointers by &Py sounds like a really natural and good choice.

It should be possible to mitigate this overhead here as well by doing something along the lines of:
#[thread_local]
static mut WE_HOLD_THE_GIL = false;
...

And I like this thread_local idea too.

I thought I would chime in to point out that this is not strictly true: GILGuard does not ensure that you hold the GIL during its lifetime, it ensures that you hold the GIL whenever you are successfully executing

Ah, thank you for pointing it out. Maybe my explanation should be refined.

Any calls to Python APIs can release and re-acquire the GIL, which is why this hack was necessary. This makes me think that the idea of tying object lifetimes to the lifetime of a given GILGuard object might be the wrong abstraction.

Yeah, just owned or borrowed model can be much clearer.

@davidhewitt
Member

I think that &Py* for borrowed, Py<Py*> for owned is a great idea.

Then, again, if we stop putting every single object we create into this storage, then perhaps just taking GIL for the !WE_HOLD_THE_GIL branch would be good enough.

I would think that this is likely to be true. C++'s pybind11 doesn't actually check the GIL state when decreasing references for owned values. This isn't perfect, though in my experience it only creates problems in very rare circumstances. If we had a check for the GIL, in my experience we'd hold the GIL 99% of the time, and the slow path would rarely be hit.

It's worth remembering that the GIL lock is ref-counted, so even in the example code we've been discussing:

{
    let obj: Py<Py*> = object();
    let gil = Python::acquire_gil();
    // do something
    implicit_drop(gil);
    implicit_drop(obj);
}

The GIL might well still be held after implicit_drop(gil) - all that's doing is decreasing the GIL lock reference count by one 😄

@davidhewitt
Member

Also, for checking whether we have the GIL: there's an API, PyGILState_Check(). But perhaps it's faster to maintain the information in a static mut bool as proposed.

@davidhewitt
Member

davidhewitt commented Feb 8, 2020

A data point for the discussion here: the following code, from a question on Gitter, will cause memory consumption to grow indefinitely inside the loop. I think it comes from pyo3's use of GilPool to hold all references, meaning none of the temporary PyDict objects ever gets freed.

#[pymethods]
impl Test {
    #[new]
    #[args(callback = "None")]
    fn new(py: Python) -> PyResult<Self> {
        Ok(Test { n: 0 })
    }

    fn run(&mut self, callback: PyObject, py: Python) -> PyResult<()> {
        let d = PyDict::new(py);
        loop {
            d.set_item("val", self.n)?;
            callback.call(py, (d,), None)?;
            self.n += 1;
        }
    }
}

EDIT: Just noticed #311 has pretty much the same example.

@kngwyu
Member Author

kngwyu commented Apr 30, 2020

Here I propose a new idea to remove the internal storage for borrowed objects.
It would look like

pub struct PyAny(ffi::PyObject);

impl PyAny {
    fn from_borrowed_ptr(py: Python, ptr: *mut ffi::PyObject) -> &Self {
        unsafe { &*(ptr as *const Self) }
    }
    fn from_owned_ptr(py: Python, ptr: ...) -> &Self {
        py.register_ptr(ptr);
        unsafe { &*(ptr as *const Self) }
    }
}

Pros

  1. Compatibility: it doesn't require much user effort for migration.
  2. Relationship with PyCell: now we have many abstraction layers to treat &PyCell (= pointer) and &PyAny (= pointer to pointer) as the same, but with this formulation of PyAny we don't need them.

Cons

Apparently, it solves nothing for owned pointers: getting many owned objects can still be costly.
So I think we should add a third constructor for this:

impl PyAny {
    // or better name
    fn as_owned(py: Python, ptr: ...) -> Py<Self> {
        ...
    }
}

Py already has a Drop implementation that decrements the reference count, which means it represents an owned object. So we can use Py as the owned type.

@kngwyu
Member Author

kngwyu commented Apr 30, 2020

Or from_owned_ptr should return Py<Self>? 🤔

@davidhewitt
Member

It's an interesting idea; we should definitely explore it. I think that most of the return types would have to become Py<T>, though, which means users will have to write as_ref(py) a lot. Maybe that's an acceptable compromise.

I wonder whether we can skip actually having the concrete layout and just have struct PyAny(Unsendable);

After I realised #883 last night, I think getting owned pointers performant and ergonomic is probably more important than borrowed pointers, because I think most return types will have to become owned pointers.

@kngwyu
Member Author

kngwyu commented May 1, 2020

I wonder whether we can skip actually having the concrete layout and just have struct PyAny(Unsendable)

Yeah, we can skip it... but PyAny(UnsafeCell<ffi::PyObject>) would be easier to implement.

After I realised #883 last night I think getting owned pointers performant and ergonomic is probably more important than borrowed pointers, because I think most return types will have to become owned pointers.

Hmm... 🤔 I don't think either owned or borrowed objects are more important than the other.

For a pyfunction, args/kwargs are obtained via from_borrowed and the return value via from_owned.

#[pyfunction]
fn py_function(arg: Vec<i32>, kwargs: &PyDict) -> PyResult<SomeRustType> {
    ...
}

@davidhewitt
Member

Hmm... 🤔 I don't think either owned or borrowed object is important.

For a pyfunction, args/kwargs is obtained by from_borrowed and the return type is obtained by from_owned.

Agreed. What I was thinking is that general users will be making a lot of owned objects from these borrowed pointers, so those should hopefully be easy to work with.

@kngwyu
Member Author

kngwyu commented May 1, 2020

general users will be making a lot of owned objects from these borrowed pointers,

You mean iterating over a list/dict or the like?
I think we need with_borrowed APIs for that.
