Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Native Types and Lighter GILPool #887

Merged
merged 6 commits into from
May 3, 2020
Merged

Conversation

kngwyu
Copy link
Member

@kngwyu kngwyu commented May 1, 2020

TL; DR

This PR changes the internal representation of PyAny from *mut ffi::PyObject to UnsafeCell<ffi::PyObject>.
Thus, &PyAny was 'a pointer to a pointer', but now it is simply a pointer.
Related parts (e.g, GILPool) are also changed.
Partly resolves #679.

What does this PR resolve?

We store almost all objects(except PyObject and Py<T>) to the internal storage, which costs when we get many objects.
This PR resolves 60% of this problem.
Since now &PyAny is a pointer, we don't need to store borrowed pointers to the object storage. So we can simply remove it. 50% of the problem is resolved by this.
However, for owned pointers, we still need the object storage so that we can do obj.refcnt -= 1 when gil drops. But we can make the storage a bit faster than now. Now we use LinkedList for the storage so that the pointer to a pointer(=&PyAny) has a consistent address, but after this PR, we can use faster Vec since we don't use that reference. I believe this optimization resolves 10% of the problem.

What does not this PR resolve?

We still have the object storage after this PR. How should we remove this?
Yeah, we can remove it if all constructors return Py<T> instead of T, say:

impl PyDict {
    pub fn new(py: Python) -> Py<PyDict> {}
}

Py can decrease its own reference count when dropping, so it would be efficient.

Comparison with #885?

#885 also tries to resolve this problem by introducing PyAny<'py>, which is a GIL-bounded 'owned' object type. One thing I'm worried about PyAny<'py> is that it requires users to change too many codes.
The pro of this PR is it does not change almost all API, though it suggests that we should gradually change constructors to return Py<T>.

@kngwyu kngwyu force-pushed the new-nativetypes branch 2 times, most recently from eaee671 to 4ae2fec Compare May 1, 2020 16:52
src/instance.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great change, and I would actually be in favour of merging this now! It doesn't make any real changes to the user-facing API, and it looks like a very nice optimisation.

I would think that the follow-up question to this is what do we need to do about owned pointers? As I keep seeing in #883, I think that most of the functions returning borrowed objects are subtly unsound, so I think we should be converting all functions and iterators which return borrowed objects to return owned pointers. (For those that really want to avoid the reference count increase, we could add unsafe apis back for PyDict::get_item_raw, perhaps.)

So about owned pointers, I think we have three choices:

  • &PyAny, like in this PR. We already know this is pretty nice, and so merging this PR now looks good.
  • Py<T> - I'm not sure this is very nice, because every constructor the user calls they will probably want to do:
        let list = PyList::empty(py);  # Py<PyList>
        let list = list.as_ref(py);    # &PyList
    
  • PyAny<'py> - I think this has benefits for simplicity of reference counting, but comes at a large migration cost because of the added lifetime. It's worth exploring this further in Add lifetime to PyAny etc #885 , but I don't think we're ready to merge that yet.

So in summary, I suggest we merge this PR now 😄 and then we can:

  1. Change all APIs returning borrowed pointers to return owned pointers for soundness
  2. Continue to think about which design we like for owned objects - we can take our time over this to write benchmarks and play with designs more carefully.

👍 👍 👍 👍 👍

Comment on lines -9 to +11
use crate::object::PyObject;
use crate::type_object::{PyDowncastImpl, PyTypeInfo, PyTypeObject};
use crate::type_object::{PyTypeInfo, PyTypeObject};
use crate::types::{PyAny, PyDict, PyModule, PyType};
use crate::{AsPyPointer, FromPyPointer, IntoPyPointer, PyTryFrom};
use crate::{
ffi, AsPyPointer, AsPyRef, FromPyPointer, IntoPyPointer, PyNativeType, PyObject, PyTryFrom,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking a bit about whether importing everything from crate like this is a good idea.

It simplifies the imports, but it makes a very busy single line or two which often have merge conflicts when multiple people edit the same file around the same time.

These merge conflicts are not so hard to resolve and it's not a big problem, but I've kinda been thinking that leaving the imports over multiple lines reduces the amount of merge conflicts.

Perhaps we could make guidelines as to what should be imported from crate, and what should be imported from a more specific path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry about that.

Perhaps we could make guidelines as to what should be imported from crate, and what should be imported from a more specific path?

Hmm ... 🤔
I'll try to find a similar guideline in other projects.

src/gil.rs Outdated
pub unsafe fn register_borrowed(_py: Python, obj: NonNull<ffi::PyObject>) -> &PyAny {
let pool = POOL.get_or_init();
&*(pool.borrowed.push_back(obj) as *const _ as *const PyAny)
pool.owned.push(obj);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: could we make the object storage for owned pointers a thread_local!, rather than using the 'static GILPool? Might be even faster and avoid potential threading issues (might even resolve #756).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to support Python on embedded platforms (not sure), we can't use thread_local, however, unless we currently support embedded platforms, I have no issues requiring thread_local support.

Some other thing we should think about, how does it interact with futures trying to move between threads (just make them non-Send?), also, does it work with scoped threads?

Copy link
Contributor

@programmerjake programmerjake May 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does this PR force types like Py<PyAny> to be non-Sendable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to support Python on embedded platforms (not sure), we can't use thread_local, however, unless we currently support embedded platforms, I have no issues requiring thread_local support.

Oh really? Have you some docs on that? I never saw this issue with thread_local before.

Regarding all the Send interactions - I think even if PyAny is not Send, we can explicitly impl Send for Py<T>, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to support Python on embedded platforms (not sure), we can't use thread_local, however, unless we currently support embedded platforms, I have no issues requiring thread_local support.

Oh really? Have you some docs on that? I never saw this issue with thread_local before.

I didn't find docs that explicitly say that, however rust-lang/rust#62918 alludes to that:

Futures currently have a dependency on TLS for storing a pointer to Context<'_>.
https://github.com/rust-lang/rust/blob/master/src/libstd/future.rs#L116

This makes usage in no_std environments quite difficult.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding embedded and TLS, rustasync/team#42 (referenced from previously mentioned issue) is much more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see; I think what you're saying is that thread_local! isn't available in no_std applications? It would be interesting to think about pyO3 support for no_std, but I think it's a long way off. Also not sure how many users might want this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I think we should just switch to thread_local for now, and, whenever we add no_std support, we could cfg something else in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we make the object storage for owned pointers a thread_local!, rather than using the 'static GILPool?

Possibly we can, but I failed to implement it due to some odd errors... 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kngwyu I have just opened kngwyu#2 which implements thread_local

@kngwyu
Copy link
Member Author

kngwyu commented May 2, 2020

Some benchmark results:
PyO3-bench1
PyO3-bench2

@davidhewitt
Copy link
Member

davidhewitt commented May 2, 2020

Nice - so I think from the benchmarks:

drop_many_objects is testing ReleasePool rather than GILPool, so thread_local performance there is actually because of the cleanups I did there at the same time as thread_local.

iter_dict is testing borrowed objects performance, which again thread_local doesn't change. But as I say in #883 I think we need to change dict iteration to use owned objects (edit: sorry, wrote borrowed orginally). @kngwyu do you think we should make that change and re-run benchmarks?

@kngwyu
Copy link
Member Author

kngwyu commented May 2, 2020

do you think we should make that change and re-run benchmarks?

I think we should do that in another PR.

@davidhewitt
Copy link
Member

I think we should do that in another PR.

Ok. I'm prepared to write a follow up PR later to change to owned objects. 👍

For now I have one further benchmark for you for iter_dict:

  • Using borrowed objects (master): ~3500ms
  • Using borrowed objects (current): ~1500ms
  • Using owned objects + this PR: ~2350ms
  • Using owned objects + PyAny<'py>: ~2150ms

My conclusions:

  • this branch is much faster than master even with owned objects, so I really think we should use owned objects for dict iteration :)
  • PyAny<'py> is not much faster than this branch, even with owned objects where PyAny<'py> is supposed to be at it's best. So I think that for now, we should use this branch and forget PyAny<'py>? Can always look again at PyAny<'py> in the future if we really want to optimise as much as possible...

(aside, if the scale on your chart is really about ~250 ms then your computer is 10x faster than mine - you using a desktop machine? The benchmark is my laptop using battery power, so worst case measurement 😄 )

@@ -268,7 +262,7 @@ impl PyObject {
impl AsPyRef for PyObject {
type Target = PyAny;
fn as_ref<'p>(&'p self, _py: Python<'p>) -> &'p PyAny {
unsafe { &*(self as *const _ as *const PyAny) }
unsafe { &*(self.as_ptr() as *const PyAny) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, maybe can just be

Suggested change
unsafe { &*(self.as_ptr() as *const PyAny) }
unsafe { &*self.as_ptr() }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks we can't do so...? 🤔

@kngwyu
Copy link
Member Author

kngwyu commented May 3, 2020

you using a desktop machine?

I'm using a desktop machine with Core i5-6500 + Arch Linux 5.6.4.

@kngwyu
Copy link
Member Author

kngwyu commented May 3, 2020

PyAny<'py> is not much faster than this branch, even with owned objects where PyAny<'py> is supposed to be at it's best.

Interesting. Thank you for your benchmark.

Since now our owned pool is thread-safe, it seems good for me to keep &PyAny APIs for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why object creation is slow in PyO3 and how to enhance it
3 participants