
@kosiew commented Sep 26, 2025

Which issue does this PR close?


Rationale for this change

Several Python-facing PyO3 wrappers in this crate exposed APIs that required mutable borrows (e.g. &mut self), even though the underlying Rust objects are shared via Arc and are intended for concurrent or re-entrant use from Python. Such methods force PyO3 to hand out a PyRefMut, so when Python code holds on to those objects or uses them from multiple threads, re-entrant or cross-thread calls fail at runtime with "Already borrowed" errors.

To make these thin wrappers behave like SessionContext (which is already #[pyclass(frozen)] and safe to share concurrently), this PR marks many #[pyclass] types as frozen and replaces exposed mutable borrows with methods that use interior mutability (Arc<parking_lot::RwLock<_>> / Arc<parking_lot::Mutex<_>>) for the small mutable state those types actually need.

This addresses the root cause of runtime borrow errors and allows Python code to keep references to wrappers and call methods concurrently across threads.
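To illustrate why &self methods plus interior mutability avoid the borrow conflict, here is a minimal std-only sketch. SharedWrapper and run_concurrently are hypothetical names; the PyO3 attributes are left out so the example compiles without PyO3, and std::sync::RwLock stands in for parking_lot::RwLock (whose lock methods return guards directly rather than a Result).

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a frozen #[pyclass] wrapper: every method takes
/// &self, and the small piece of mutable state lives behind an RwLock.
#[derive(Default)]
struct SharedWrapper {
    entries: Arc<RwLock<Vec<String>>>,
}

impl SharedWrapper {
    /// Mutates through a write lock, so no &mut self (and no PyRefMut) is needed.
    fn push(&self, value: &str) {
        self.entries.write().unwrap().push(value.to_string());
    }

    /// Reads through a shared lock; several readers may hold it at once.
    fn len(&self) -> usize {
        self.entries.read().unwrap().len()
    }
}

/// Spawns `threads` workers that each push `per_thread` values into one wrapper.
fn run_concurrently(threads: usize, per_thread: usize) -> usize {
    let wrapper = SharedWrapper::default();
    thread::scope(|s| {
        for t in 0..threads {
            let w = &wrapper; // all threads share the same immutable borrow
            s.spawn(move || {
                for i in 0..per_thread {
                    w.push(&format!("{t}-{i}"));
                }
            });
        }
    });
    wrapper.len()
}

fn main() {
    println!("{}", run_concurrently(4, 100));
}
```

Every thread only ever holds a shared reference to the wrapper; exclusivity is handled inside push by the lock rather than by the borrow checker, which is the property a frozen pyclass needs.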


What changes are included in this PR?

High-level summary

  • Add parking_lot (lightweight locking primitives) to Cargo.toml and Cargo.lock and use parking_lot::{RwLock, Mutex} to implement interior mutability for Python-exposed wrappers.
  • Most #[pyclass(...)] definitions are updated to #[pyclass(..., frozen)] so PyO3 will not require PyRefMut for access.
  • Replace &mut self method signatures with &self where the method no longer needs exclusive borrow, and use internal locks to mutate internal state.
  • Provide getters/setters for previously public #[pyo3(get, set)] fields that were replaced with interior-mutable fields (examples: SqlSchema now exposes getters/setters that lock and clone/replace contents).
  • Rework the CaseBuilder wrapper to use Arc<Mutex<Option<CaseBuilder>>> and adopt a take/restore pattern so multiple calls (including calls that return errors) preserve the builder's internal state while making the wrapper safe for concurrent use.
  • Change PyConfig to hold Arc<RwLock<ConfigOptions>> and modify get, set, get_all, and __repr__ to use read/write locks appropriately.
  • Change PyDataFrame caching to use Arc<Mutex<Option<(Vec<RecordBatch>, bool)>>> and update repr/html methods to use interior locks. Methods that previously took &mut self now take &self.
  • Make PyRecordBatchStream::next / __next__ take &self instead of &mut self.
  • Adjust a large number of thin wrappers and enums to be frozen to ensure the wrappers can be shared safely from Python.
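The DataFrame caching change (an Arc<Mutex<Option<(Vec<RecordBatch>, bool)>>> behind &self repr methods) can be sketched roughly as follows. CachedPreview is hypothetical, i32 stands in for RecordBatch, and the bool is an illustrative flag whose real meaning in the PR may differ. This version deliberately does not hold the lock while computing, in line with the review guidance about expensive operations; the trade-off is that two racing callers may each compute once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical cache modeled on Arc<Mutex<Option<(Vec<_>, bool)>>>.
/// `i32` stands in for RecordBatch and the bool is an illustrative flag only.
struct CachedPreview {
    cache: Arc<Mutex<Option<(Vec<i32>, bool)>>>,
    compute_calls: AtomicUsize, // counts how often the expensive path ran
}

impl CachedPreview {
    fn new() -> Self {
        Self {
            cache: Arc::new(Mutex::new(None)),
            compute_calls: AtomicUsize::new(0),
        }
    }

    /// Stand-in for the expensive collection the repr/html methods need.
    fn compute(&self) -> (Vec<i32>, bool) {
        self.compute_calls.fetch_add(1, Ordering::Relaxed);
        (vec![1, 2, 3], false)
    }

    /// Takes &self: returns the cached value, filling the cache on first use.
    fn preview(&self) -> (Vec<i32>, bool) {
        // Fast path: clone the cached value while holding the lock only briefly.
        if let Some(cached) = self.cache.lock().unwrap().as_ref() {
            return cached.clone();
        }
        // Slow path: compute without holding the lock, then store the result.
        // If another thread filled the cache meanwhile, keep its value.
        let fresh = self.compute();
        let mut guard = self.cache.lock().unwrap();
        guard.get_or_insert(fresh).clone()
    }
}

fn main() {
    let df = CachedPreview::new();
    let first = df.preview();
    let second = df.preview();
    let calls = df.compute_calls.load(Ordering::Relaxed);
    println!("{:?} {:?} computed {} time(s)", first, second, calls);
}
```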

Notable changed files (non-exhaustive)

  • Cargo.toml, Cargo.lock: add parking_lot = "0.12".
  • python/tests/test_concurrency.py: new concurrency tests exercising SqlSchema, Config, and DataFrame from multiple threads.
  • python/tests/test_expr.py: updated and new tests for CaseBuilder behavior; also avoids a boolean-literal linter issue.
  • src/common/schema.rs: SqlSchema now uses Arc<RwLock<...>> for name, tables, views, and functions and provides getters/setters.
  • src/config.rs: PyConfig holds Arc<RwLock<ConfigOptions>>; its methods are updated accordingly.
  • src/dataframe.rs: internal caching refactored to Arc<Mutex<...>>; repr methods now take &self.
  • src/expr/conditional_expr.rs: PyCaseBuilder changed to Arc<Mutex<Option<CaseBuilder>>>, with methods adapted to take/store the builder safely; case/when constructors updated.
  • src/functions.rs: the case and when functions return the new PyCaseBuilder wrapper via into().
  • Many other src/*.rs files: #[pyclass(..., frozen)] added to many thin wrappers and enums so they no longer require mutable borrows when used from Python.
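For the SqlSchema getters/setters, a minimal sketch of the lock-and-clone / lock-and-replace pattern might look like this. SchemaSketch is hypothetical, only name and tables are modeled, and String / Vec<String> are placeholder types; std::sync::RwLock is used instead of parking_lot so the example needs no dependencies.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified SqlSchema-like wrapper. Field names follow the PR
/// description; the element types are placeholders.
#[derive(Default)]
struct SchemaSketch {
    name: Arc<RwLock<String>>,
    tables: Arc<RwLock<Vec<String>>>,
}

impl SchemaSketch {
    /// Getter: clone under a short read lock so no guard escapes to the caller.
    fn name(&self) -> String {
        self.name.read().unwrap().clone()
    }

    /// Setter: replace the whole value under a write lock.
    fn set_name(&self, value: String) {
        *self.name.write().unwrap() = value;
    }

    fn tables(&self) -> Vec<String> {
        self.tables.read().unwrap().clone()
    }

    fn set_tables(&self, value: Vec<String>) {
        *self.tables.write().unwrap() = value;
    }
}

fn main() {
    let schema = SchemaSketch::default();
    schema.set_name("public".to_string());
    schema.set_tables(vec!["orders".to_string()]);
    let mut snapshot = schema.tables();
    snapshot.push("scratch".to_string()); // mutating the clone leaves the schema untouched
    println!("{} {:?} {:?}", schema.name(), schema.tables(), snapshot);
}
```

Because the getters return clones, callers can modify what they receive without holding a lock and without affecting the shared state; changes only take effect through the setters.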

Behavioral notes

  • CaseBuilder API: when, otherwise, and end now return PyDataFusionResult and preserve the builder state on both success and failure (the builder is stored back into the wrapper after an attempted operation). This preserves the semantics that Python code can reuse the builder and that errors don't irreversibly consume the builder.
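The take/restore pattern described for CaseBuilder can be sketched as below. BuilderSketch and CaseWrapper are hypothetical stand-ins (the real code wraps DataFusion's CaseBuilder and returns PyDataFusionResult), and the empty-condition check is an invented validation rule; the point is only that the builder is put back into the slot whether the fallible step succeeds or fails.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical builder standing in for DataFusion's CaseBuilder.
#[derive(Debug, Default)]
struct BuilderSketch {
    branches: Vec<(String, String)>,
}

impl BuilderSketch {
    /// Fallible step: rejects empty conditions (an invented rule for the sketch).
    fn when(&mut self, cond: &str, then: &str) -> Result<(), String> {
        if cond.is_empty() {
            return Err("empty condition".to_string());
        }
        self.branches.push((cond.to_string(), then.to_string()));
        Ok(())
    }
}

/// Wrapper using take/restore over Arc<Mutex<Option<_>>>.
struct CaseWrapper {
    inner: Arc<Mutex<Option<BuilderSketch>>>,
}

impl CaseWrapper {
    fn new() -> Self {
        Self { inner: Arc::new(Mutex::new(Some(BuilderSketch::default()))) }
    }

    fn when(&self, cond: &str, then: &str) -> Result<(), String> {
        let mut slot = self.inner.lock().unwrap();
        // Take the builder out of the slot...
        let mut builder = slot.take().ok_or("builder missing")?;
        let result = builder.when(cond, then);
        // ...and always store it back, whether the step succeeded or failed.
        *slot = Some(builder);
        result
    }

    fn branch_count(&self) -> usize {
        self.inner.lock().unwrap().as_ref().map_or(0, |b| b.branches.len())
    }
}

fn main() {
    let case = CaseWrapper::new();
    println!("{:?}", case.when("a > 1", "'big'"));
    println!("{:?}", case.when("", "'none'"));
    println!("branches kept: {}", case.branch_count());
}
```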

Are these changes tested?

  • Yes: a new Python test (python/tests/test_concurrency.py) exercises several wrappers concurrently across threads to reproduce the race/borrow conditions and assert expected behavior. Additional tests for case builder correctness and state preservation were added/updated in python/tests/test_expr.py.

  • Please run the full test suite (Rust and Python tests) in CI. I recommend running maturin/pytest for the Python bindings and cargo test for the Rust tests, locally or in CI.


Are there any user-facing changes?

  • API surface: The public Python API remains compatible at the call-site level for typical consumers: method names and signatures are preserved for end users. Some methods that previously required a mutable borrow at the PyO3 layer now take &self on the Rust side, which is transparent to Python callers.

  • Potential incompatibilities: Marking a class #[pyclass(frozen)] means PyO3 will no longer hand out mutable borrows (PyRefMut) of it, so #[pyo3(set)] attributes and &mut self methods cannot be used on that class. Any code that relied on obtaining a PyRefMut and mutating the wrapper directly (rather than using the provided setters/methods) may no longer work. The intended mutation points are now exposed via explicit setters/methods (for example SqlSchema.set_name, SqlSchema.set_tables, etc.). Please review the PR for any specific wrappers your code depends on and switch to the explicit setters or APIs provided.


Risk & compatibility

  • The changes are focused and low-level: they replace external mutation with interior mutability and mark wrappers as frozen.

  • Risk areas that need careful review:

    • Ensure no long-lived locks are held across Python callbacks to avoid deadlocks or blocking Python threads.
    • Ensure getters return clones (not references) to avoid holding locks while Python code touches returned objects.
    • Confirm error semantics for CaseBuilder remain intuitive; errors should not permanently consume builder state.
    • Ensure no accidental lifetime/ownership regressions introduced by switching to Arc<...> wrappers.
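One way to honor the first two risk items is to copy what you need out of the lock and drop the guard before invoking anything that might call back in. In this hypothetical sketch the closure plays the role of a Python callback that re-enters the wrapper; Settings and for_each are illustrative names, not APIs from the PR.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical settings wrapper whose callback may call back into it,
/// the way Python code invoked from Rust might.
struct Settings {
    values: RwLock<BTreeMap<String, String>>,
}

impl Settings {
    fn new() -> Self {
        Self { values: RwLock::new(BTreeMap::new()) }
    }

    fn set(&self, key: &str, value: &str) {
        self.values.write().unwrap().insert(key.to_string(), value.to_string());
    }

    fn len(&self) -> usize {
        self.values.read().unwrap().len()
    }

    /// Snapshot under the read lock, release it, then run the callback.
    /// Calling `f` while still holding the read guard could deadlock (or panic)
    /// as soon as `f` calls `set`, because `set` needs the write lock.
    fn for_each<F: FnMut(&str, &str)>(&self, mut f: F) {
        let snapshot = self.values.read().unwrap().clone(); // guard dropped here
        for (k, v) in &snapshot {
            f(k.as_str(), v.as_str());
        }
    }
}

fn main() {
    let settings = Settings::new();
    settings.set("a", "1");
    settings.set("b", "2");
    // The callback re-enters the wrapper; this is safe because no lock is held.
    settings.for_each(|k, v| settings.set(&format!("{k}_copy"), v));
    println!("{}", settings.len());
}
```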

Notes for reviewers

  • Focus review on the concurrent patterns (places that lock RwLock/Mutex) and ensure we don't hold locks while calling back into Python or performing expensive operations.
  • Verify CaseBuilder take/restore logic correctly preserves state on both success and failure paths.
  • Verify every #[pyclass(frozen)] change does not break a previously intended API (particularly for types previously annotated with #[pyo3(get, set)]). If a previously writable attribute was necessary, confirm the PR provides an explicit setter or alternate API.
  • Check that get/get_all/set on PyConfig behave as before and that conversion to/from Python objects remains correct.
  • The PR uses parking_lot (no poisoning semantics, faster locking). Confirm this dependency is acceptable for the project and CI builds. For context, DataFusion already uses parking_lot.
  • The PR intentionally keeps public Python APIs stable while preventing PyO3 borrow errors. If maintainers prefer a different interior-mutability primitive (e.g., std::sync::{Mutex, RwLock}), we can adjust, but parking_lot offers simpler ergonomics and avoids poisoning semantics.
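For context on the poisoning point, this std-only sketch shows what std::sync::Mutex does when a thread panics while holding the lock: the mutex is marked poisoned, and every later caller has to deal with a PoisonError. parking_lot::Mutex has no poisoning, and its lock() returns the guard directly. poison_demo is a hypothetical helper.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Panics inside a thread while holding a std Mutex, then reports whether the
/// mutex ended up poisoned and what value it holds after recovery.
fn poison_demo() -> (bool, i32) {
    let shared = Arc::new(Mutex::new(0));
    let worker = Arc::clone(&shared);
    let joined = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard += 1;
        panic!("simulated failure while holding the lock");
    })
    .join();
    assert!(joined.is_err()); // the worker thread panicked

    let poisoned = shared.is_poisoned();
    // With std, callers must explicitly recover the data from the PoisonError.
    let value = *shared.lock().unwrap_or_else(|e| e.into_inner());
    (poisoned, value)
}

fn main() {
    let (poisoned, value) = poison_demo();
    println!("poisoned: {poisoned}, value: {value}");
}
```

Running it prints the simulated panic message to stderr; the program itself continues and reports the poisoned state.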

@kosiew marked this pull request as ready for review September 26, 2025 15:28
kosiew commented Sep 28, 2025

Cherry-picked from #1252

  • Added frozen to PyO3 classes exposed via #[pyclass], including:
    • Core structs (e.g. RawCatalog, RawSchema, RawTable, DFSchema, etc.)
    • Logical and physical plan nodes (LogicalPlan, ExecutionPlan, and all logical expressions)
    • Enum types (e.g. RexType, SqlType, PythonType, TableType, etc.)
    • Configuration types (SessionConfig, SQLOptions, RuntimeEnvBuilder, etc.)
    • UDFs, UDAFs, UDWFs, and UDTFs
    • Substrait components and storage contexts
    • Miscellaneous classes like Signature, Dialect, and Unparser

Updated PR description

@kosiew changed the title "Make PyO3 wrappers thread-safe (parking_lot + frozen #[pyclass])" to "Freeze PyO3 wrappers & introduce interior mutability to avoid PyO3 borrow errors" Sep 29, 2025
- Added CaseBuilderHandle guard that keeps the underlying CaseBuilder alive while holding the mutex and restores it on drop
- Updated when, otherwise, and end methods to operate through the guard and consume the builder explicitly
- This prevents transient None states during concurrent access and improves thread safety
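The CaseBuilderHandle commit describes an RAII guard. A rough, generic sketch of that idea follows, assuming a hypothetical BuilderHandle type and using Vec<String> as a stand-in builder (add_branch is likewise hypothetical). The handle keeps the MutexGuard alive and writes the builder back in Drop, so other threads never see the slot empty. It does not model the explicit consumption the commit mentions for end.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical guard: holds the lock plus the taken-out builder and puts the
/// builder back when dropped.
struct BuilderHandle<'a, T> {
    slot: MutexGuard<'a, Option<T>>,
    builder: Option<T>,
}

impl<'a, T> BuilderHandle<'a, T> {
    fn acquire(mutex: &'a Mutex<Option<T>>) -> Option<Self> {
        let mut slot = mutex.lock().unwrap();
        let builder = slot.take()?;
        Some(Self { slot, builder: Some(builder) })
    }
}

impl<T> Deref for BuilderHandle<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.builder.as_ref().expect("builder present until drop")
    }
}

impl<T> DerefMut for BuilderHandle<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        self.builder.as_mut().expect("builder present until drop")
    }
}

impl<T> Drop for BuilderHandle<'_, T> {
    fn drop(&mut self) {
        // Drop::drop runs before the fields are dropped, so the builder is
        // restored while the MutexGuard field still holds the lock.
        *self.slot = self.builder.take();
    }
}

/// Hypothetical operation: early returns still restore the builder.
fn add_branch(slot: &Mutex<Option<Vec<String>>>, cond: &str) -> Result<usize, String> {
    let mut handle = BuilderHandle::acquire(slot).ok_or("builder missing")?;
    if cond.is_empty() {
        return Err("empty condition".into()); // handle dropped here: builder restored
    }
    handle.push(cond.to_string());
    Ok(handle.len())
}

fn main() {
    let slot = Mutex::new(Some(Vec::new()));
    println!("{:?}", add_branch(&slot, "a > 1"));
    println!("{:?}", add_branch(&slot, ""));
    println!("{:?}", slot.lock().unwrap());
}
```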
…ency

- Released Config read guard before converting values to Python objects in get and get_all
- Ensures locks are held only while collecting scalar entries, not during expensive Python object conversion
- Added regression test that runs Config.get_all and Config.set concurrently to guard against read/write contention regressions
- Improves overall performance by reducing lock contention in multi-threaded scenarios
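The Config change (collect entries under the read guard, release it, then convert) might look roughly like this. ConfigSketch is hypothetical, String values stand in for ConfigOptions entries, and to_display is a placeholder for the Python object conversion step; std::sync::RwLock replaces parking_lot to keep the example dependency-free.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical config wrapper standing in for PyConfig.
struct ConfigSketch {
    options: Arc<RwLock<BTreeMap<String, String>>>,
}

impl ConfigSketch {
    fn new() -> Self {
        Self { options: Arc::new(RwLock::new(BTreeMap::new())) }
    }

    fn set(&self, key: &str, value: &str) {
        self.options.write().unwrap().insert(key.to_string(), value.to_string());
    }

    /// Collect entries while holding the read guard, then release it before the
    /// (potentially expensive) conversion step so writers are not blocked.
    fn get_all(&self) -> Vec<String> {
        let entries: Vec<(String, String)> = self
            .options
            .read()
            .unwrap()
            .iter()
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect(); // the temporary read guard is dropped at the end of this statement
        entries.into_iter().map(|(k, v)| to_display(&k, &v)).collect()
    }
}

/// Placeholder for converting one entry into a Python object.
fn to_display(key: &str, value: &str) -> String {
    format!("{key}={value}")
}

fn main() {
    let cfg = ConfigSketch::new();
    cfg.set("a", "1");
    cfg.set("b", "2");
    println!("{:?}", cfg.get_all());
}
```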
Successfully merging this pull request may close these issues.

Mark pyclasses frozen if possible.