-
Notifications
You must be signed in to change notification settings - Fork 128
Freeze PyO3 wrappers & introduce interior mutability to avoid PyO3 borrow errors #1253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kosiew
wants to merge
22
commits into
apache:main
Choose a base branch
from
kosiew:frozen-1250
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+774
−204
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ock and Mutex for interior mutability
…, dataframe, and conditional expression modules
Cherry pick from #1252
Updated PR description |
#[pyclass]
)- Added CaseBuilderHandle guard that keeps the underlying CaseBuilder alive while holding the mutex and restores it on drop - Updated when, otherwise, and end methods to operate through the guard and consume the builder explicitly - This prevents transient None states during concurrent access and improves thread safety
…ency - Released Config read guard before converting values to Python objects in get and get_all - Ensures locks are held only while collecting scalar entries, not during expensive Python object conversion - Added regression test that runs Config.get_all and Config.set concurrently to guard against read/write contention regressions - Improves overall performance by reducing lock contention in multi-threaded scenarios
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
pyclass
es frozen if possible. #1250Rationale for this change
Several Python-facing PyO3 wrappers in this crate exposed APIs that required mutable borrows (e.g.
&mut self
) even though the underlying Rust objects are shared viaArc
and intended for concurrent / re-entrant use from Python. When Python code holds onto those objects or uses them from multiple threads, PyO3 must hand out aPyRefMut
, which triggers runtime "Already borrowed" panics when re-entrancy or cross-thread access occurs.To make these thin wrappers behave like
SessionContext
(which is already#[pyclass(frozen)]
and safe to share concurrently), this PR marks many#[pyclass]
types asfrozen
and replaces exposed mutable borrows with methods that use interior mutability (Arc<parking_lot::RwLock<_>>
/Arc<parking_lot::Mutex<_>>
) for the small mutable state those types actually need.This addresses the root cause of runtime borrow errors and allows Python code to keep references to wrappers and call methods concurrently across threads.
What changes are included in this PR?
High-level summary
parking_lot
(lightweight locking primitives) toCargo.toml
andCargo.lock
and useparking_lot::{RwLock, Mutex}
to implement interior mutability for Python-exposed wrappers.#[pyclass(...)]
definitions are updated to#[pyclass(..., frozen)]
so PyO3 will not requirePyRefMut
for access.&mut self
method signatures with&self
where the method no longer needs exclusive borrow, and use internal locks to mutate internal state.#[pyo3(get, set)]
fields that were replaced with interior-mutable fields (examples:SqlSchema
now exposes getters/setters that lock and clone/replace contents).CaseBuilder
wrapper to useArc<Mutex<Option<CaseBuilder>>>
and adopt a take/restore pattern so multiple calls (including calls that return errors) preserve the builder's internal state while making the wrapper safe for concurrent use.PyConfig
to holdArc<RwLock<ConfigOptions>>
and modifyget
,set
,get_all
, and__repr__
to use read/write locks appropriately.PyDataFrame
caching to useArc<Mutex<Option<(Vec<RecordBatch>, bool)>>>
and update repr/html methods to use interior locks. Methods that produced&mut self
are now&self
.PyRecordBatchStream::next
/__next__
take&self
instead of&mut self
.frozen
to ensure the wrappers can be shared safely from Python.Notable changed files (non-exhaustive)
Cargo.toml
,Cargo.lock
— addparking_lot = "0.12"
.python/tests/test_concurrency.py
— new concurrency tests exercisingSqlSchema
,Config
, andDataFrame
from multiple threads.python/tests/test_expr.py
— updates and additions to testCaseBuilder
behavior and avoid boolean literal linter issue.src/common/schema.rs
—SqlSchema
now usesArc<RwLock<...>>
forname
,tables
,views
, andfunctions
and provides getters/setters.src/config.rs
—PyConfig
holdsArc<RwLock<ConfigOptions>>
, methods updated.src/dataframe.rs
— internal caching refactored toArc<Mutex<...>>
, repr methods made&self
.src/expr/conditional_expr.rs
—PyCaseBuilder
changed toArc<Mutex<Option<CaseBuilder>>>
and methods adapted to take/store the builder safely;case
/when
constructors updated.src/functions.rs
—case
andwhen
functions return the newPyCaseBuilder
wrapper viainto()
.src/*.rs
files —#[pyclass(..., frozen)]
added to many thin wrappers and enums so they no longer require&mut self
usage in Python.Behavioral notes
CaseBuilder
API:when
,otherwise
, andend
now returnPyDataFusionResult
and preserve the builder state on both success and failure (the builder is stored back into the wrapper after an attempted operation). This preserves the semantics that Python code can reuse the builder and that errors don't irreversibly consume the builder.Are these changes tested?
Yes: a new Python test (
python/tests/test_concurrency.py
) exercises several wrappers concurrently across threads to reproduce the race/borrow conditions and assert expected behavior. Additional tests for case builder correctness and state preservation were added/updated inpython/tests/test_expr.py
.Please run the full test suite (Rust + Python tests) in CI. I recommend running
maturin
/pytest
for the python bindings andcargo test
for Rust tests locally or in CI.Are there any user-facing changes?
API surface: The public Python API remains compatible at the call-site level for typical consumers — method names and signatures are preserved for end users. Some methods that previously required a mutable borrow at the PyO3 layer now accept
&self
from Python code; this is a transparent improvement for callers.Potential incompatibilities: Marking classes
#[pyclass(frozen)]
changes how PyO3 exposes attribute mutation at the Python attribute level. Any user code that relied on obtaining a mutable reference (PyRefMut
) and mutating the wrapper directly (rather than using the provided setters/methods) may no longer work. The intended mutation points are now exposed via explicit setters/methods (for exampleSqlSchema.set_name
,SqlSchema.set_tables
, etc.). Please review the PR for any specific wrappers your code depends on and adjust to call the explicit setters or APIs provided.Risk & compatibility
The changes are focused and low-level: they replace external mutation with interior mutability and mark wrappers as
frozen
.Risk areas that need careful review:
CaseBuilder
remain intuitive; errors should not permanently consume builder state.Arc<...>
wrappers.Notes for reviewers
CaseBuilder
take/restore logic correctly preserves state on both success and failure paths.#[pyclass(frozen)]
change does not break a previously intended API (particularly for types previously annotated with#[pyo3(get, set)]
). If a previously writable attribute was necessary, confirm the PR provides an explicit setter or alternate API.get
/get_all
/set
onPyConfig
behave as before and that conversion to/from Python objects remains correct.parking_lot
(no poisoning semantics, faster locking). Confirm this dependency is acceptable for the project and CI builds. For context, Datafusion already usesparking_lot
.std::sync::{Arc, Mutex}
) we can adjust, butparking_lot
offers simpler ergonomics and avoids poisoning semantics.