feat(python/sedonadb): restrict DataFrame __getitem__ to single-column lookup by jiayuasu · Pull Request #852 · apache/sedona-db

jiayuasu · 2026-05-17T07:08:17Z

Post-merge follow-up to #846 addressing @paleolimbot's review feedback.

What changes

DataFrame.__getitem__ is now strictly single-column lookup:

df["x"]    # → Expr referencing column x
df[0]      # → Expr referencing the first column
df[-1]     # → Expr referencing the last column

Previously the same method also handled df[["x","y"]] (list projection) and df[bool_expr] (filter). Those are dropped. Users go through the explicit df.select(...) and df.filter(...) entry points instead.

The motivation, mirroring Ibis's same deprecation: a single return type (Expr) lets IDEs and type checkers resolve df["x"].<method> cleanly. With the old polymorphic shape, the return type was DataFrame | Expr, which broke autocomplete on the common case.

Error paths

Key	Behavior
Unknown column name	`KeyError`, message lists available columns
Out-of-range int	`IndexError`
`bool`	`TypeError` (guarded explicitly; bool is a subclass of int in Python)
`list`, `slice`, `Expr`, anything else	`TypeError` with a message pointing at `select` / `filter`

While here — import-discipline cleanup

__getitem__, select, and filter previously had lazy in-function imports of Expr, Literal, col, and _to_expr. Per the policy in this module ("lazy imports are reserved for optional dependencies like pyarrow"), those move to module level (combined) and the method annotations switch from string forward references to the runtime-imported names.

Test plan

11 tests in tests/expr/test_dataframe_getitem.py cover name / positive / negative / first / out-of-range / unknown / bool / list / slice / Expr keys plus the operator-composition path on df["x"].
All exact repr() == ... for the positive cases, per the test-policy convention locked in earlier PRs.
Existing test_dataframe_select.py and test_dataframe_filter.py still pass (35 expr-dataframe tests total).
pytest --doctest-modules dataframe.py passes (17 doctests including the updated __getitem__ example).
ruff check and ruff format clean.

@paleolimbot

Address post-merge feedback on apache#846. - Restrict `DataFrame.__getitem__` to `(key: Union[str, int]) -> Expr`. Drops the polymorphic list-projection and bool-Expr-filter forms that apache#846 added. Reasons mirror Ibis's deprecation of the same forms: a single return type keeps IDE/type-checker inference on `df["x"].<method>` clean, and the multi-column / filter cases have obvious named entry points (`select`, `filter`). - Integer indexing now resolves a column by 0-based position, with negative indexing supported. Out-of-range integers raise `IndexError`; unknown string names raise `KeyError` listing the available columns. `bool` is rejected explicitly so a stray `df[True]` doesn't silently mean `df[1]` (bool is a subclass of int in Python). List, slice, and Expr keys raise `TypeError` with messages pointing at `select` / `filter`. - Move the lazy `Expr` / `Literal` / `col` / `_to_expr` imports inside `__getitem__`, `select`, and `filter` to module level (combined). Per project policy in this module, lazy imports are reserved for optional dependencies like pyarrow. The fix while here keeps the import discipline consistent across the three Expr-aware methods rather than churning them separately later. - Update annotations on `select` and `filter` to use the new module-level names directly (no more string forward references). Tests rewritten for the new shape: 11 cases covering name and position lookup, negative indexing, operator composition, and the `KeyError` / `IndexError` / `TypeError` error paths. cc @paleolimbot

Copilot

Pull request overview

This PR narrows Python DataFrame.__getitem__ to a single-column lookup API returning Expr, aligning bracket access with predictable typing while directing projection/filtering to explicit methods.

Changes:

Restricts df[key] to string column names and integer column positions, including negative indices.
Adds explicit error handling for unknown names, out-of-range indices, bools, lists, slices, and Expr keys.
Updates tests to cover the revised lookup behavior and rejected legacy shortcut forms.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`python/sedonadb/python/sedonadb/dataframe.py`	Updates `__getitem__`, import placement, annotations, and related `select`/`filter` type checks.
`python/sedonadb/tests/expr/test_dataframe_getitem.py`	Replaces polymorphic getitem tests with single-column lookup and error-path coverage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions Bot requested a review from paleolimbot May 17, 2026 07:08

jiayuasu force-pushed the feature/df-getitem-restrict branch from 6f6c081 to 1a319e1 Compare May 17, 2026 07:16

jiayuasu requested a review from Copilot May 17, 2026 07:17

Copilot started reviewing on behalf of jiayuasu May 17, 2026 07:17 View session

Copilot AI reviewed May 17, 2026

View reviewed changes

jiayuasu marked this pull request as ready for review May 18, 2026 00:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(python/sedonadb): restrict DataFrame getitem to single-column lookup#852

feat(python/sedonadb): restrict DataFrame getitem to single-column lookup#852
jiayuasu wants to merge 1 commit into
apache:mainfrom
jiayuasu:feature/df-getitem-restrict

jiayuasu commented May 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jiayuasu commented May 17, 2026

What changes

Error paths

While here — import-discipline cleanup

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants