Remove deferred reference count increments and make the global reference pool optional (#4095)

* Add feature controlling the global reference pool to enable avoiding its overhead.

* Document reference-pool feature in the performance guide.

* Invert semantics of feature to disable reference pool so the new behaviour becomes opt-in

* Remove delayed reference count increments as we cannot prevent reference count errors as long as these are available

* Adjust tests to be compatible with disable-reference-pool feature

* Adjust tests to be compatible with py-clone feature

* Adjust the GIL benchmark to the updated reference pool semantics.

* Further extend and clarify the documentation of the py-clone and disable-reference-pool features

* Replace disable-reference-pool feature by pyo3_disable_reference_pool conditional compilation flag

Such a flag is harder to use and thereby also harder to abuse. This seems
appropriate as this is purely a performance-oriented change which should only be
enabled by leaf crates and brings with it additional highly implicit sources of
process aborts.

* Add pyo3_leak_on_drop_without_reference_pool to turn aborts into leaks when the global reference pool is disabled and the GIL is not held
adamreichold authored May 11, 2024
1 parent 033caa8 commit c5f9001
Showing 32 changed files with 226 additions and 240 deletions.
4 changes: 4 additions & 0 deletions Cargo.toml
@@ -108,6 +108,9 @@ auto-initialize = []
# Allows use of the deprecated "GIL Refs" APIs.
gil-refs = []

# Enables `Clone`ing references to Python objects `Py<T>` which panics if the GIL is not held.
py-clone = []

# Optimizes PyObject to Vec conversion and so on.
nightly = []

@@ -129,6 +132,7 @@ full = [
"num-bigint",
"num-complex",
"num-rational",
"py-clone",
"rust_decimal",
"serde",
"smallvec",
2 changes: 1 addition & 1 deletion examples/Cargo.toml
@@ -10,5 +10,5 @@ pyo3 = { path = "..", features = ["auto-initialize", "extension-module"] }
[[example]]
name = "decorator"
path = "decorator/src/lib.rs"
crate_type = ["cdylib"]
crate-type = ["cdylib"]
doc-scrape-examples = true
4 changes: 3 additions & 1 deletion guide/src/class.md
@@ -249,7 +249,7 @@ fn return_myclass() -> Py<MyClass> {

let obj = return_myclass();

Python::with_gil(|py| {
Python::with_gil(move |py| {
let bound = obj.bind(py); // Py<MyClass>::bind returns &Bound<'py, MyClass>
let obj_ref = bound.borrow(); // Get PyRef<T>
assert_eq!(obj_ref.num, 1);
@@ -280,6 +280,8 @@ let py_counter: Py<FrozenCounter> = Python::with_gil(|py| {
});

py_counter.get().value.fetch_add(1, Ordering::Relaxed);

Python::with_gil(move |_py| drop(py_counter));
```

Frozen classes are likely to become the default thereby guiding the PyO3 ecosystem towards a more deliberate application of interior mutability. Eventually, this should enable further optimizations of PyO3's internals and avoid downstream code paying the cost of interior mutability when it is not actually required.
7 changes: 5 additions & 2 deletions guide/src/faq.md
@@ -127,12 +127,10 @@ If you don't want that cloning to happen, a workaround is to allocate the field
```rust
# use pyo3::prelude::*;
#[pyclass]
#[derive(Clone)]
struct Inner {/* fields omitted */}

#[pyclass]
struct Outer {
#[pyo3(get)]
inner: Py<Inner>,
}

@@ -144,6 +142,11 @@ impl Outer {
inner: Py::new(py, Inner {})?,
})
}

#[getter]
fn inner(&self, py: Python<'_>) -> Py<Inner> {
self.inner.clone_ref(py)
}
}
```
This time `a` and `b` *are* the same object:
10 changes: 10 additions & 0 deletions guide/src/features.md
@@ -75,6 +75,14 @@ This feature is a backwards-compatibility feature to allow continued use of the

This feature and the APIs it enables is expected to be removed in a future PyO3 version.

### `py-clone`

This feature was introduced to ease migration. It was found that delayed reference counts cannot be made sound and hence `Clone`ing an instance of `Py<T>` must panic without the GIL being held. To avoid migrations introducing new panics without warning, the `Clone` implementation itself is now gated behind this feature.
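
The difference between a token-taking `clone_ref` and a plain `Clone` that has to check at runtime can be sketched with the standard library alone. This is a toy model under stated assumptions, not PyO3's implementation: `Token`, `with_lock`, `Object`, `Handle` and `LOCK_HELD` are hypothetical stand-ins for `Python<'py>`, `Python::with_gil`, a Python object, `Py<T>` and the GIL state.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    // Toy stand-in for "this thread holds the GIL" (hypothetical).
    static LOCK_HELD: Cell<bool> = Cell::new(false);
}

/// Hypothetical stand-in for `Python<'py>`: a zero-sized proof that the lock is held.
#[derive(Clone, Copy)]
struct Token<'a>(PhantomData<&'a ()>);

/// Hypothetical stand-in for `Python::with_gil`: marks the lock as held while `f` runs.
fn with_lock<R>(f: impl for<'a> FnOnce(Token<'a>) -> R) -> R {
    let was_held = LOCK_HELD.with(|held| held.replace(true));
    let result = f(Token(PhantomData));
    LOCK_HELD.with(|held| held.set(was_held));
    result
}

/// Toy object with a reference count that may only be touched under the lock.
struct Object {
    refcount: Cell<u32>,
}

/// Toy stand-in for `Py<T>`.
struct Handle<'o> {
    object: &'o Object,
}

impl<'o> Handle<'o> {
    /// The token argument proves at compile time that the lock is held.
    fn clone_ref(&self, _token: Token<'_>) -> Handle<'o> {
        self.object.refcount.set(self.object.refcount.get() + 1);
        Handle { object: self.object }
    }
}

impl Clone for Handle<'_> {
    /// Without a token, the only option is a runtime check that panics.
    fn clone(&self) -> Self {
        assert!(
            LOCK_HELD.with(|held| held.get()),
            "cannot clone a handle without the lock being held"
        );
        self.object.refcount.set(self.object.refcount.get() + 1);
        Handle { object: self.object }
    }
}

fn main() {
    let object = Object { refcount: Cell::new(1) };
    let handle = Handle { object: &object };

    // With a token, the increment is applied immediately.
    let second = with_lock(|token| handle.clone_ref(token));
    assert_eq!(object.refcount.get(), 2);

    // Plain `clone` also works while the lock is held.
    let third = with_lock(|_token| handle.clone());
    assert_eq!(object.refcount.get(), 3);

    drop((second, third));
}
```

In this model, calling `handle.clone()` outside of `with_lock` panics, which mirrors why the real `Clone` implementation is opt-in: code that clones generically has no way to supply the token.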

### `pyo3_disable_reference_pool`

This is a performance-oriented conditional compilation flag, e.g. [set via `$RUSTFLAGS`][set-configuration-options], which disables the global reference pool and the associated overhead of crossing the Python-Rust boundary. However, if enabled, `Drop`ping an instance of `Py<T>` without the GIL being held will abort the process.

### `macros`

This feature enables a dependency on the `pyo3-macros` crate, which provides the procedural macros portion of PyO3's API:
@@ -195,3 +203,5 @@ struct User {
### `smallvec`

Adds a dependency on [smallvec](https://docs.rs/smallvec) and enables conversions into its [`SmallVec`](https://docs.rs/smallvec/latest/smallvec/struct.SmallVec.html) type.

[set-configuration-options]: https://doc.rust-lang.org/reference/conditional-compilation.html#set-configuration-options
9 changes: 6 additions & 3 deletions guide/src/memory.md
@@ -212,7 +212,8 @@ This example wasn't very interesting. We could have just used a GIL-bound
we are *not* holding the GIL?

```rust
# #![allow(unused_imports)]
# #![allow(unused_imports, dead_code)]
# #[cfg(not(pyo3_disable_reference_pool))] {
# use pyo3::prelude::*;
# use pyo3::types::PyString;
# fn main() -> PyResult<()> {
@@ -239,12 +240,14 @@ Python::with_gil(|py|
# }
# Ok(())
# }
# }
```

When `hello` is dropped *nothing* happens to the pointed-to memory on Python's
heap because nothing _can_ happen if we're not holding the GIL. Fortunately,
the memory isn't leaked. PyO3 keeps track of the memory internally and will
release it the next time we acquire the GIL.
the memory isn't leaked. If the `pyo3_disable_reference_pool` conditional compilation flag
is not enabled, PyO3 keeps track of the memory internally and will release it
the next time we acquire the GIL.

We can avoid the delay in releasing memory if we are careful to drop the
`Py<Any>` while the GIL is held.
13 changes: 11 additions & 2 deletions guide/src/migration.md
@@ -35,7 +35,16 @@ fn increment(x: u64, amount: Option<u64>) -> u64 {
x + amount.unwrap_or(1)
}
```
</details>

### `Py::clone` is now gated behind the `py-clone` feature
<details open>
<summary><small>Click to expand</small></summary>
If you rely on `impl<T> Clone for Py<T>` to fulfil trait requirements imposed by existing Rust code written without PyO3-based code in mind, the newly introduced feature `py-clone` must be enabled.

However, take care to note that the behaviour is different from previous versions. If `Clone` was called without the GIL being held, we tried to delay the application of these reference count increments until PyO3-based code would re-acquire it. This turned out to be impossible to implement in a sound manner and hence was removed. Now, if `Clone` is called without the GIL being held, we panic instead, which calling code might not be prepared for.

Related to this, we also added a `pyo3_disable_reference_pool` conditional compilation flag which removes the infrastructure necessary to apply delayed reference count decrements implied by `impl<T> Drop for Py<T>`. They do not appear to be a soundness hazard as they should lead to memory leaks in the worst case. However, the global synchronization adds significant overhead to cross the Python-Rust boundary. Enabling this flag will remove these costs and instead make the `Drop` implementation abort the process if called without the GIL being held.
</details>

## from 0.20.* to 0.21
@@ -676,7 +685,7 @@ drop(second);

The replacement is [`Python::with_gil`](https://docs.rs/pyo3/0.18.3/pyo3/marker/struct.Python.html#method.with_gil) which is more cumbersome but enforces the proper nesting by design, e.g.

```rust
```rust,ignore
# #![allow(dead_code)]
# use pyo3::prelude::*;
@@ -701,7 +710,7 @@ let second = Python::with_gil(|py| Object::new(py));
drop(first);
drop(second);
// Or it ensure releasing the inner lock before the outer one.
// Or it ensures releasing the inner lock before the outer one.
Python::with_gil(|py| {
let first = Object::new(py);
let second = Python::with_gil(|py| Object::new(py));
44 changes: 44 additions & 0 deletions guide/src/performance.md
@@ -96,3 +96,47 @@ impl PartialEq<Foo> for FooBound<'_> {
}
}
```

## Disable the global reference pool

PyO3 uses global mutable state to keep track of deferred reference count updates implied by `impl<T> Drop for Py<T>` being called without the GIL being held. The necessary synchronization to obtain and apply these reference count updates when PyO3-based code next acquires the GIL is somewhat expensive and can become a significant part of the cost of crossing the Python-Rust boundary.
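
The mechanism described above can be sketched with the standard library alone. This is a toy model, not PyO3's actual implementation: `Interpreter`, `POOL`, `drop_without_gil`, `drop_with_gil` and `apply_deferred` are hypothetical names chosen for illustration, and objects are represented by plain indices.

```rust
use std::sync::Mutex;

/// Hypothetical toy interpreter: reference counts indexed by object id.
struct Interpreter {
    refcounts: Vec<u32>,
}

/// Global pool of object ids whose decrement had to be deferred.
/// Every GIL acquisition has to lock it, which is the overhead in question.
static POOL: Mutex<Vec<usize>> = Mutex::new(Vec::new());

/// Dropping without the GIL: only record the object id for later.
fn drop_without_gil(id: usize) {
    POOL.lock().unwrap().push(id);
}

/// Dropping with the GIL: the count can be decremented immediately.
fn drop_with_gil(interp: &mut Interpreter, id: usize) {
    interp.refcounts[id] -= 1;
}

/// Run on every GIL acquisition: drain the pool and apply the deferred
/// decrements. Returns how many decrements were applied.
fn apply_deferred(interp: &mut Interpreter) -> usize {
    let pending = std::mem::take(&mut *POOL.lock().unwrap());
    for &id in &pending {
        interp.refcounts[id] -= 1;
    }
    pending.len()
}

fn main() {
    let mut interp = Interpreter { refcounts: vec![2, 1] };

    // Deferred: the count is unchanged until the pool is applied.
    drop_without_gil(0);
    assert_eq!(interp.refcounts[0], 2);
    assert_eq!(apply_deferred(&mut interp), 1);
    assert_eq!(interp.refcounts[0], 1);

    // Immediate: no pool involved.
    drop_with_gil(&mut interp, 1);
    assert_eq!(interp.refcounts[1], 0);
}
```

Even when nothing was deferred, `apply_deferred` still has to take the lock. That fixed cost on every boundary crossing is what the model is meant to illustrate.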

This functionality can be avoided by setting the `pyo3_disable_reference_pool` conditional compilation flag. This removes the global reference pool and the associated costs completely. However, it does _not_ remove the `Drop` implementation for `Py<T>` which is necessary to interoperate with existing Rust code written without PyO3-based code in mind. To stay compatible with the wider Rust ecosystem in these cases, we keep the implementation but abort when `Drop` is called without the GIL being held. If `pyo3_leak_on_drop_without_reference_pool` is additionally enabled, objects dropped without the GIL being held will be leaked instead, which is always sound but might have detrimental effects like resource exhaustion in the long term.

This limitation is important to keep in mind when this setting is used, especially when embedding Python code into a Rust application as it is quite easy to accidentally drop a `Py<T>` (or types containing it like `PyErr`, `PyBackedStr` or `PyBackedBytes`) returned from `Python::with_gil` without making sure to re-acquire the GIL beforehand. For example, the following code

```rust,ignore
# use pyo3::prelude::*;
# use pyo3::types::PyList;
let numbers: Py<PyList> = Python::with_gil(|py| PyList::empty_bound(py).unbind());
Python::with_gil(|py| {
numbers.bind(py).append(23).unwrap();
});
Python::with_gil(|py| {
numbers.bind(py).append(42).unwrap();
});
```

will abort if the list is not explicitly disposed of via

```rust
# use pyo3::prelude::*;
# use pyo3::types::PyList;
let numbers: Py<PyList> = Python::with_gil(|py| PyList::empty_bound(py).unbind());

Python::with_gil(|py| {
numbers.bind(py).append(23).unwrap();
});

Python::with_gil(|py| {
numbers.bind(py).append(42).unwrap();
});

Python::with_gil(move |py| {
drop(numbers);
});
```

[conditional-compilation]: https://doc.rust-lang.org/reference/conditional-compilation.html
1 change: 1 addition & 0 deletions newsfragments/4095.added.md
@@ -0,0 +1 @@
Add `pyo3_disable_reference_pool` conditional compilation flag to avoid the overhead of the global reference pool at the cost of known limitations as explained in the performance section of the guide.
1 change: 1 addition & 0 deletions newsfragments/4095.changed.md
@@ -0,0 +1 @@
`Clone`ing pointers into the Python heap has been moved behind the `py-clone` feature, as it must panic without the GIL being held as a soundness fix.
12 changes: 3 additions & 9 deletions pyo3-benches/benches/bench_gil.rs
@@ -1,4 +1,4 @@
use codspeed_criterion_compat::{criterion_group, criterion_main, BatchSize, Bencher, Criterion};
use codspeed_criterion_compat::{criterion_group, criterion_main, Bencher, Criterion};

use pyo3::prelude::*;

@@ -9,14 +9,8 @@ fn bench_clean_acquire_gil(b: &mut Bencher<'_>) {

fn bench_dirty_acquire_gil(b: &mut Bencher<'_>) {
let obj = Python::with_gil(|py| py.None());
b.iter_batched(
|| {
// Clone and drop an object so that the GILPool has work to do.
let _ = obj.clone();
},
|_| Python::with_gil(|_| {}),
BatchSize::NumBatches(1),
);
// Drop the returned clone of the object so that the reference pool has work to do.
b.iter(|| Python::with_gil(|py| obj.clone_ref(py)));
}

fn criterion_benchmark(c: &mut Criterion) {
2 changes: 2 additions & 0 deletions pyo3-build-config/src/lib.rs
@@ -165,6 +165,8 @@ pub fn print_expected_cfgs() {
println!("cargo:rustc-check-cfg=cfg(GraalPy)");
println!("cargo:rustc-check-cfg=cfg(py_sys_config, values(\"Py_DEBUG\", \"Py_REF_DEBUG\", \"Py_TRACE_REFS\", \"COUNT_ALLOCS\"))");
println!("cargo:rustc-check-cfg=cfg(invalid_from_utf8_lint)");
println!("cargo:rustc-check-cfg=cfg(pyo3_disable_reference_pool)");
println!("cargo:rustc-check-cfg=cfg(pyo3_leak_on_drop_without_reference_pool)");

// allow `Py_3_*` cfgs from the minimum supported version up to the
// maximum minor version (+1 for development for the next)
2 changes: 1 addition & 1 deletion src/conversions/std/option.rs
@@ -61,7 +61,7 @@ mod tests {
assert_eq!(option.as_ptr(), std::ptr::null_mut());

let none = py.None();
option = Some(none.clone());
option = Some(none.clone_ref(py));

let ref_cnt = none.get_refcnt(py);
assert_eq!(option.as_ptr(), none.as_ptr());
14 changes: 13 additions & 1 deletion src/err/err_state.rs
@@ -5,7 +5,6 @@ use crate::{
Bound, IntoPy, Py, PyAny, PyObject, PyTypeInfo, Python,
};

#[derive(Clone)]
pub(crate) struct PyErrStateNormalized {
#[cfg(not(Py_3_12))]
ptype: Py<PyType>,
@@ -63,6 +62,19 @@ impl PyErrStateNormalized {
ptraceback: Py::from_owned_ptr_or_opt(py, ptraceback),
}
}

pub fn clone_ref(&self, py: Python<'_>) -> Self {
Self {
#[cfg(not(Py_3_12))]
ptype: self.ptype.clone_ref(py),
pvalue: self.pvalue.clone_ref(py),
#[cfg(not(Py_3_12))]
ptraceback: self
.ptraceback
.as_ref()
.map(|ptraceback| ptraceback.clone_ref(py)),
}
}
}

pub(crate) struct PyErrStateLazyFnOutput {
2 changes: 1 addition & 1 deletion src/err/mod.rs
@@ -837,7 +837,7 @@ impl PyErr {
/// ```
#[inline]
pub fn clone_ref(&self, py: Python<'_>) -> PyErr {
PyErr::from_state(PyErrState::Normalized(self.normalized(py).clone()))
PyErr::from_state(PyErrState::Normalized(self.normalized(py).clone_ref(py)))
}

/// Return the cause (either an exception instance, or None, set by `raise ... from ...`)