Remove deferred reference count increments and make the global reference pool optional #4095

adamreichold · 2024-04-18T17:18:03Z

While this started out as an optimization to fully avoid the overhead of the global reference pool, #4105 lead us to the conclusion that deferring reference count increments is fundamentally unsound and so has to be removed from the global reference pool in any configuration. As a consequence, the pool will be limited to handling reference count decrements and the Clone implementation for types Py will panic if called without the GIL being held.

Since the new behaviour of Clone is significant hazard to existing code but unavoidable for a sound API, we decided to hide it behind the new py-clone feature which needs to be explicitly enabled thereby giving downstream projects a chance to inspect their usage of Clone before committing to the new possibilities of panics.

Closes #4105

I don't think leaking references in Drop is nice because it is quite easy for this to be called without the GIL being held, especially in embedding scenarios. But it should be safe nevertheless. It might call for an inverted feature to explicitly opt out of the reference pool by enabling e.g. disable-reference-pool.

codspeed-hq · 2024-04-18T17:30:26Z

CodSpeed Performance Report

Merging #4095 will improve performances by 10.33%

_{Comparing opt-ref-pool (04f0f05) with main (444be3b)}

Summary

⚡ 1 improvements
✅ 68 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`opt-ref-pool`	Change
⚡	`sequence_from_tuple`	296.7 ns	268.9 ns	+10.33%

alex · 2024-04-18T17:33:09Z

Why print vs. panic in drop?

adamreichold · 2024-04-18T17:40:12Z

Why print vs. panic in drop?

It is quite hard to avoid dropping without the GIL, especially in embedding situations. And in contrast to Clone, leaking is sound.

adamreichold · 2024-04-18T17:41:47Z

That said, I would also be fine with panicking in Drop and put the onus of avoiding it on the extension author as this is already a bit of a "I-don't-care-just-make-it-fast" knob.

davidhewitt · 2024-04-18T18:27:48Z

I quite like the idea of this feature. I wonder, maybe instead of leaking, without the pool enabled the drop & clone code can just unconditionally call Python::with_gil?

That way the program is correct whether or not the feature is enabled, but users get to control this knob according to the performance characteristics of their program.

davidhewitt · 2024-04-18T18:29:31Z

(and hopefully with nogil in the long term the reference pool feature might just then become deprecated for better options)

adamreichold · 2024-04-18T18:38:15Z

without the pool enabled the drop & clone code can just unconditionally call Python::with_gil?

Isn't this much too prone to deadlocks?

davidhewitt · 2024-04-18T18:40:45Z

Ungh, yes that's true. With nogil what I suggest would be viable. 😭

Icxolu · 2024-04-18T19:25:37Z

Could we provide the Clone impl for Py only under that feature flag? That would eliminate the panic and make it a compile error. Cloning would still be possible by going through Bound, with the guarantee that the gil is held.

adamreichold · 2024-04-18T19:45:40Z

Could we provide the Clone impl for Py only under that feature flag? That would eliminate the panic and make it a compile error. Cloning would still be possible by going through Bound, with the guarantee that the gil is held.

The problem I see is that you might want to still clone Py values to e.g. fulfil trait requirements of Rust types but ensure by context that you have the GIL.

davidhewitt

Yes, I think it's extremely handy to be able to #[derive(Clone)] in particular, which would require Py<T>: Clone.

Also the direction seems to be that panic-on-drop is not allowed (e.g. rust-lang/rfcs#3288) so +1 for not panicking. But I would be open to aborting instead of printing :)

Pending a decision on what to do with Drop I think the idea here gets a 👍 from me as a lever for users to pull when they know their application better than us (i.e. it never clones or drops Python values without the GIL held).

We should document this in the features and performance sections of the guide before merging, though.

src/gil.rs

adamreichold · 2024-04-19T11:53:30Z

Added a section in the performance chapter of the guide. @inducer @matthiasdiener As this was originally reported by you, what do you think about the feature and its explanation in the guide?

Icxolu · 2024-04-19T17:30:31Z

The problem I see is that you might want to still clone Py values to e.g. fulfil trait requirements of Rust types but ensure by context that you have the GIL.

Yes, I think it's extremely handy to be able to #[derive(Clone)] in particular, which would require Py<T>: Clone.

Hmm, I can definitely see that point, but in that case I feel like having this as opt-out of "safe" behavior, rather than opt-in to "i-know-what-im-doing" is not ideal, since this has "spooky" side effects. Starting with no-default-features, or turning them off later and forgetting to turn this one back on, will lead to very different behavior, which won't be noticed until runtime. Arguably this is probably not a very common scenario currently, but it feels like an easy footgun nevertheless.

adamreichold · 2024-04-19T18:10:05Z

Hmm, I can definitely see that point, but in that case I feel like having this as opt-out of "safe" behavior, rather than opt-in to "i-know-what-im-doing" is not ideal, since this has "spooky" side effects. Starting with no-default-features, or turning them off later and forgetting to turn this one back on, will lead to very different behavior, which won't be noticed until runtime. Arguably this is probably not a very common scenario currently, but it feels like an easy footgun nevertheless.

I share that sentiment which is why I included

It might call for an inverted feature to explicitly opt out of the reference pool by enabling e.g. disable-reference-pool.

in the cover letter.

Do note one downside of this though: Rust's features are additive so there is a different kind of spooky side effect. One crate in dependency graph of an extension can enable affecting everything else. It would be bad style to enable it in a non-leaf crate, but it is possible.

So if we really want to make this hard to enable accidentally as possible, we should probably go for a raw --cfg flag enabled by wrangling compiler flags to which I would also be amenable.

But I think we should first hear from the original reporters whether they actually want this. Just dropping the PR because it does not really help them is also an option.

davidhewitt · 2024-04-21T16:10:36Z

Following up from #4105 (comment)

It seems that we may need to migrate to a strategy where Clone without the GIL will always panic. We could gate the whole Clone implementation behind a feature, given that the panic is inconvenient and users may not want to deal with the possibility of runtime crashes. (Instead they would likely need to write Clone implementations by hand.)

For Drop - I think that deferring reference count decreases is still safe (worst it can do is cause leaks).

Maybe we can move the overhead away from all PyO3 functions, and instead if there is a deferred drop we signal up a worker thread to wake, which can attempt to acquire the GIL and drop all currently pending deferred drops? Moving the overheads off every function call may be a good enough solution until GIL-less Python can give us an alternative option.

davidhewitt · 2024-04-21T16:13:46Z

(I know I already suggested this reference-counting-thread idea in #3827 (comment) and @adamreichold correctly pointed out that deferring the work is dangerous.

With what we are now seeing particularly about deferred Clone being incorrect, my thought is the reference-counting-thread idea may be viable when only used for Drop.)

adamreichold · 2024-04-21T16:31:49Z

Maybe we can move the overhead away from all PyO3 functions, and instead if there is a deferred drop we signal up a worker thread to wake, which can attempt to acquire the GIL and drop all currently pending deferred drops? Moving the overheads off every function call may be a good enough solution until GIL-less Python can give us an alternative option.

But isn't this mainly gaming the benchmarks? Our call overhead will reduce, but the actual work to be done will increase (at least there is an additional acquisition of the GIL, and on another thread to which all these objects may be cache-cold).

This could improve throughput at the cost of increase CPU usage, but only if the other threads actually release the GIL for reasonable amounts of time. And if that worker thread never actually gets the GIL, we could see unbounded memory growth.

While deferred drop is certainly not as dangerous a deferred clone, I fear we are inventing a garbage collector here. And if do decide to do that, I think we should go for established solutions, e.g. epoch-based reclamation or hazard pointers which at least have some support crates available.

Personally, I don't think we should make this work. This is adding completely additional semantics that Python itself does not (yet) support which will always be difficult to get right as long as Python does not cooperate in these memory management tasks.

So while I could live without the Clone impl, I would like to avoid making the Drop do more than leak or panic/abort. If we are unused about this, then I think adding a feature flag to control whether Clone is unavailable and Drop leaks versus Clone and Drop both panic/abort makes more sense to avoid any surprises. Adding a panicking Clone also sounds like it fits the additive feature model.

davidhewitt · 2024-04-21T17:46:01Z

But isn't this mainly gaming the benchmarks? Our call overhead will reduce, but the actual work to be done will increase (at least there is an additional acquisition of the GIL, and on another thread to which all these objects may be cache-cold).

This could improve throughput at the cost of increase CPU usage, but only if the other threads actually release the GIL for reasonable amounts of time. And if that worker thread never actually gets the GIL, we could see unbounded memory growth.

Very true. Ok, I hereby agree to drop (heh) the reference-counting-thread idea 👍

I think there's still some open question on whether we should be trying to keep the drop pool around. I feel like for sake of compatibility we can't migrate users immediately from the existing pool to leaking on drop without GIL, as silently adding memory leaks to their program may break confidence in PyO3. The abort-on-drop is similarly unappealing for migration but is at least a noisy failure mode.

Whether to leak / abort, some prior art is that pybind11 throws exceptions if objects are dropped without the GIL held - https://github.com/pybind/pybind11/blob/19a6b9f4efb569129c878b7f8db09132248fbaa1/include/pybind11/pytypes.h#L269

Given the concern about testing for leaks being hard, I'd be tempted to suggest that we add this as a disable-reference-pool feature which replaces the reference pool with aborts.

I think we are now on a course to release 0.22 ASAP, given that there are various behavioural changes afoot here.

So I propose the following for 0.22:

We get the next step of the gil-refs deprecation complete.
We gate a panicking Py::clone implementation behind a py-clone feature.
We add disable-reference-pool to abort on drop instead of pooling.
(Maybe) we also complete the changes to Python::allow_threads

adamreichold · 2024-04-21T18:52:22Z

We gate a panicking Py::clone implementation behind a py-clone feature.

I added a new commit here which removes delayed refcnt increments and adds said feature.

We add disable-reference-pool to abort on drop instead of pooling.

Added another commit which renames the feature proposed here.

(Maybe) we also complete the changes to Python::allow_threads

I would be glad if we did.

Note that the tests currently pass only with py-clone enabled and disable-reference-pool disabled. I will have to work on this, but I thought I push what I have so far to enable review of the more tricky parts of the code and especially the associated documentation.

davidhewitt · 2024-04-21T20:25:12Z

👍 I will do my best to find time to give this a full review tomorrow. Am pretty loaded with family responsibility at the moment so my opportunities to focus are a bit rare this month.

adamreichold · 2024-05-11T08:19:07Z

Any thoughts on two remaining design questions? Otherwise, this should be good to go after conflict resolution.

davidhewitt · 2024-05-11T09:52:40Z

Sorry for the delay, I got delayed by a pile of family responsibility in April which left me quite burned out.

I think a conditional compile flag is good and adding a second one for the downgrade of aborts to leaks seems ok to me too.

I somewhat wonder if we should split the py-clone feature into a separate PR, as I think that changes user experience noticeably and I worry if it is slightly lost in the discussion here. Having a new PR just with that diff and a clearer opening post with references to these threads may help users with archaeology later.

adamreichold · 2024-05-11T09:57:50Z

I think a conditional compile flag is good and adding a second one for the downgrade of aborts to leaks seems ok to me too.

So explicitly, this means we do not want to tie this to cfg(debug_assertions) and I will rather make this leak by default and add another flag disable_pyo3_reference_pool_and_abort_on_drop which will upgrade the leaks to aborts?

I somewhat wonder if we should split the py-clone feature into a separate PR, as I think that changes user experience noticeably and I worry if it is slightly lost in the discussion here. Having a new PR just with that diff and a clearer opening post with references to these threads may help users with archaeology later.

There are somewhat functionally intertwined, so I would glad if I could instead mitigate this by updating the cover letter here to retrace the steps and decisions.

…its overhead.

…viour becomes opt-in

davidhewitt · 2024-05-11T10:19:28Z

I think a conditional compile flag is good and adding a second one for the downgrade of aborts to leaks seems ok to me too.

So explicitly, this means we do not want to tie this to cfg(debug_assertions) and I will rather make this leak by default and add another flag disable_pyo3_reference_pool_and_abort_on_drop which will upgrade the leaks to aborts?

Hmmm. I think in my mind leaking is the worst case (as it implies GC / memory overhead for the rest of program runtime, which may be hard for users to notice). So I'd prefer abort by default and the second flag changes from aborting to leaking.

If we do move ahead with #4174 then having just one flag to get to aborting seems slightly better too.

I somewhat wonder if we should split the py-clone feature into a separate PR, as I think that changes user experience noticeably and I worry if it is slightly lost in the discussion here. Having a new PR just with that diff and a clearer opening post with references to these threads may help users with archaeology later.

There are somewhat functionally intertwined, so I would glad if I could instead mitigate this by updating the cover letter here to retrace the steps and decisions.

👍 Yep that's reasonable to me!

adamreichold · 2024-05-11T10:29:32Z

Hmmm. I think in my mind leaking is the worst case (as it implies GC / memory overhead for the rest of program runtime, which may be hard for users to notice). So I'd prefer abort by default and the second flag changes from aborting to leaking.

There are now two cfg flags: pyo3_disable_reference_pool and pyo3_leak_on_drop_without_reference_pool.

👍 Yep that's reasonable to me!

I have updated the cover letter.

davidhewitt

This looks good to me, thanks! 🎉

alex · 2024-05-11T13:29:35Z

nox > Session check-feature-powerset aborted: some features missing from full meta feature: {'py-clone'}.

…nce count errors as long as these are available

…ble-reference-pool features

… conditional compilation flag Such a flag is harder to use and thereby also harder to abuse. This seems appropriate as this is purely a performance-oriented change which show only be enabled by leaf crates and brings with it additional highly implicit sources of process aborts.

…s when the global reference pool is disabled and the GIL is not held

adamreichold force-pushed the opt-ref-pool branch from 43d791e to fcc7253 Compare April 18, 2024 17:19

davidhewitt reviewed Apr 19, 2024

View reviewed changes

src/gil.rs Outdated Show resolved Hide resolved

src/gil.rs Show resolved Hide resolved

adamreichold force-pushed the opt-ref-pool branch from fcc7253 to 67d8a8d Compare April 19, 2024 11:51

adamreichold force-pushed the opt-ref-pool branch 2 times, most recently from d7dbf11 to 698b312 Compare April 19, 2024 12:20

adamreichold mentioned this pull request Apr 21, 2024

segmentation fault second run of rust function in python pyo3 0.21.2 #4105

Closed

adamreichold force-pushed the opt-ref-pool branch from 698b312 to f9d8ff3 Compare April 21, 2024 18:49

adamreichold force-pushed the opt-ref-pool branch 2 times, most recently from f288495 to 29f89de Compare April 21, 2024 20:16

adamreichold force-pushed the opt-ref-pool branch 2 times, most recently from 992e295 to 6d46508 Compare April 21, 2024 20:31

davidhewitt mentioned this pull request May 11, 2024

add flag to skip reference pool mutex if the program doesn't use the pool #4174

Closed

adamreichold added 2 commits May 11, 2024 12:00

Add feature controlling the global reference pool to enable avoiding …

d50071b

…its overhead.

Document reference-pool feature in the performance guide.

6a374b0

adamreichold changed the title ~~RFC: Add feature controlling the global reference pool to enable avoiding its overhead.~~ Remove deferred reference count increments and make the global reference pool optional May 11, 2024

Invert semantics of feature to disable reference pool so the new beha…

dc846ff

…viour becomes opt-in

adamreichold force-pushed the opt-ref-pool branch 2 times, most recently from 44276ee to d7901a1 Compare May 11, 2024 10:28

adamreichold force-pushed the opt-ref-pool branch from d7901a1 to fb37270 Compare May 11, 2024 10:30

davidhewitt approved these changes May 11, 2024

View reviewed changes

adamreichold added this pull request to the merge queue May 11, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 11, 2024

adamreichold added 7 commits May 11, 2024 16:01

Remove delayed reference count increments as we cannot prevent refere…

df81e9a

…nce count errors as long as these are available

Adjust tests to be compatible with disable-reference-pool feature

207c3d7

Adjust tests to be compatible with py-clone feature

30daeb7

Adjust the GIL benchmark to the updated reference pool semantics.

8e3aca6

Further extend and clarify the documentation of the py-clone and disa…

69f5d93

…ble-reference-pool features

Add pyo3_leak_on_drop_without_reference_pool to turn aborts into leak…

04f0f05

…s when the global reference pool is disabled and the GIL is not held

adamreichold force-pushed the opt-ref-pool branch from fb37270 to 04f0f05 Compare May 11, 2024 14:01

adamreichold enabled auto-merge May 11, 2024 14:01

adamreichold added this pull request to the merge queue May 11, 2024

Merged via the queue into main with commit c5f9001 May 11, 2024
43 checks passed

adamreichold deleted the opt-ref-pool branch May 11, 2024 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove deferred reference count increments and make the global reference pool optional #4095

Remove deferred reference count increments and make the global reference pool optional #4095

adamreichold commented Apr 18, 2024 •

edited

codspeed-hq bot commented Apr 18, 2024 •

edited

alex commented Apr 18, 2024

adamreichold commented Apr 18, 2024

adamreichold commented Apr 18, 2024

davidhewitt commented Apr 18, 2024

davidhewitt commented Apr 18, 2024

adamreichold commented Apr 18, 2024

davidhewitt commented Apr 18, 2024

Icxolu commented Apr 18, 2024

adamreichold commented Apr 18, 2024

davidhewitt left a comment

adamreichold commented Apr 19, 2024

Icxolu commented Apr 19, 2024

adamreichold commented Apr 19, 2024

davidhewitt commented Apr 21, 2024

davidhewitt commented Apr 21, 2024

adamreichold commented Apr 21, 2024

davidhewitt commented Apr 21, 2024

adamreichold commented Apr 21, 2024 •

edited

davidhewitt commented Apr 21, 2024

adamreichold commented May 11, 2024

davidhewitt commented May 11, 2024

adamreichold commented May 11, 2024

davidhewitt commented May 11, 2024

adamreichold commented May 11, 2024

davidhewitt left a comment

alex commented May 11, 2024

Remove deferred reference count increments and make the global reference pool optional #4095

Remove deferred reference count increments and make the global reference pool optional #4095

Conversation

adamreichold commented Apr 18, 2024 • edited

codspeed-hq bot commented Apr 18, 2024 • edited

CodSpeed Performance Report

Merging #4095 will improve performances by 10.33%

Summary

Benchmarks breakdown

alex commented Apr 18, 2024

adamreichold commented Apr 18, 2024

adamreichold commented Apr 18, 2024

davidhewitt commented Apr 18, 2024

davidhewitt commented Apr 18, 2024

adamreichold commented Apr 18, 2024

davidhewitt commented Apr 18, 2024

Icxolu commented Apr 18, 2024

adamreichold commented Apr 18, 2024

davidhewitt left a comment

Choose a reason for hiding this comment

adamreichold commented Apr 19, 2024

Icxolu commented Apr 19, 2024

adamreichold commented Apr 19, 2024

davidhewitt commented Apr 21, 2024

davidhewitt commented Apr 21, 2024

adamreichold commented Apr 21, 2024

davidhewitt commented Apr 21, 2024

adamreichold commented Apr 21, 2024 • edited

davidhewitt commented Apr 21, 2024

adamreichold commented May 11, 2024

davidhewitt commented May 11, 2024

adamreichold commented May 11, 2024

davidhewitt commented May 11, 2024

adamreichold commented May 11, 2024

davidhewitt left a comment

Choose a reason for hiding this comment

alex commented May 11, 2024

adamreichold commented Apr 18, 2024 •

edited

codspeed-hq bot commented Apr 18, 2024 •

edited

adamreichold commented Apr 21, 2024 •

edited