feat(python/sedonadb): add DataFrame.rename by jiayuasu · Pull Request #878 · apache/sedona-db

jiayuasu · 2026-05-26T07:05:48Z

Continues Phase P2 of #791 with DataFrame.rename. Same shape as drop from #871: a small schema op, strings only, with a single-dict API (where drop takes varargs) instead of pandas' columns= keyword.

API

df.rename({"a": "x"})
df.rename({"a": "x", "b": "y"})

Single dict[str, str] arg. No columns= kwarg.
Direction is {old: new}, matching Polars and the inner shape of pandas' DataFrame.rename(columns={...}). Not Ibis's kwarg-flipped style.
Strings only for both keys and values; no Expr.
Multi-rename in one call, applied as a single plan transformation.

Validation

All Python-side, locked with exact-message tests where the message is the feature being verified:

Empty dict → ValueError.
Non-dict arg / non-str key or value → TypeError.
Unknown old-name → KeyError listing available columns. Forced by DataFusion's with_column_renamed being permissive (same trap as drop_columns).
Final-state collisions ({"a": "z", "b": "z"} or {"a": "b"} when b already exists) → ValueError("duplicate column names").
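For readers skimming the thread, the checks listed above can be sketched in plain Python over a list of column names. This is an illustration only, not the code in this PR: the function name `validate_rename` and every error message except "duplicate column names" are assumptions made up for the sketch.

```python
def validate_rename(columns, mapping):
    """Hypothetical stand-in for the Python-side checks; returns the final column list."""
    # Non-dict arg -> TypeError (message text is an assumption).
    if not isinstance(mapping, dict):
        raise TypeError(f"mapping must be a dict, got {type(mapping).__name__}")
    # Empty dict -> ValueError (message text is an assumption).
    if not mapping:
        raise ValueError("mapping must not be empty")
    # Non-str key or value -> TypeError (message text is an assumption).
    for old, new in mapping.items():
        if not isinstance(old, str) or not isinstance(new, str):
            raise TypeError("mapping keys and values must be str")
    # Unknown old-name -> KeyError listing the available columns.
    unknown = [old for old in mapping if old not in columns]
    if unknown:
        raise KeyError(f"unknown column(s) {unknown}; available: {list(columns)}")
    # Final-state collision check: compute the resulting schema in one pass.
    final = [mapping.get(c, c) for c in columns]
    if len(set(final)) != len(final):
        raise ValueError("duplicate column names")
    return final


print(validate_rename(["a", "b", "c"], {"a": "x"}))  # ['x', 'b', 'c']
```

Because the collision check looks only at the final schema, a swap such as `{"a": "b", "b": "a"}` passes it, which is the behavior flagged in the next section.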

Swap-pair behavior (worth flagging)

{"a": "b", "b": "a"} has a unique final schema [b, a] so the Python-side collision check passes. But DataFusion applies renames sequentially, and the intermediate state after a→b collides with the original b. The error surfaces as SedonaError from plan-build with message Projections require unique expression names....

I considered building a rename-graph and detecting cycles Python-side, but the DataFusion error is correct and clear enough, and trying to reorder Python-side would either reimplement DataFusion's check or miss edge cases. Locked in test_rename_swap_pair_raises_at_plan_build. Users who want a swap can route through a temporary name explicitly.
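To make the sequential-application point concrete, here is a small pure-Python model of folding one rename at a time over a column list. It is a sketch under assumptions, not DataFusion's implementation: `rename_one` and `rename_sequential` are hypothetical names, and a plain `RuntimeError` stands in for the `SedonaError` raised at plan-build.

```python
from functools import reduce


def rename_one(columns, pair):
    """Apply a single old -> new rename, checking uniqueness after the step."""
    old, new = pair
    if old not in columns:
        # Permissive no-op on an unknown name, as described for with_column_renamed.
        return list(columns)
    renamed = [new if c == old else c for c in columns]
    if len(set(renamed)) != len(renamed):
        # Stand-in for the plan-build error about unique expression names.
        raise RuntimeError(f"intermediate schema {renamed} has duplicate names")
    return renamed


def rename_sequential(columns, mapping):
    """Fold rename_one over the mapping in insertion order."""
    return reduce(rename_one, mapping.items(), list(columns))


cols = ["a", "b"]

# The swap fails at the first step: a -> b collides with the original b.
try:
    rename_sequential(cols, {"a": "b", "b": "a"})
except RuntimeError as exc:
    print("swap rejected:", exc)

# Routing through a temporary name keeps every intermediate schema unique.
step = rename_sequential(cols, {"a": "__tmp"})
step = rename_sequential(step, {"b": "a"})
step = rename_sequential(step, {"__tmp": "b"})
print(step)  # ['b', 'a']
```

With the API proposed here, the temporary-name route would presumably be three chained `df.rename(...)` calls rather than one mapping, since the unknown old-name check would reject `__tmp` as a key against the original schema.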

Tests

13 in tests/expr/test_dataframe_rename.py:

Positive: single, multi, column-order preservation.
Lazy return: isinstance(out, DataFrame).
Errors: empty dict; non-dict; non-str key; non-str value; columns= kwarg; unknown old-name (exact KeyError message); rename-onto-existing collision; new-name-to-new-name collision; sequential-application swap.

Local: 13 unit + 20 doctests + ruff format + ruff check all clean.

Pandas-style column rename on the lazy DataFrame, following the single-dict / `{old: new}` direction confirmed in the design discussion.

API:

df.rename({"a": "x"})
df.rename({"a": "x", "b": "y"})

- Single `dict[str, str]` arg, no `columns=` kwarg (Python's standard unexpected-keyword TypeError covers misuse).
- Direction is `{old: new}`, matching Polars and the inner shape of pandas' `rename(columns={...})`. Not the Ibis kwarg-flipped style.
- Strings only for both keys and values.

Validation (all Python-side, locked with exact-message tests where the message is the feature):

- Empty dict → `ValueError`.
- Non-dict arg → `TypeError`.
- Non-str key or value → `TypeError`.
- Unknown old-name → `KeyError` listing available columns. Forced by DataFusion's `with_column_renamed` being permissive: it silently no-ops on an unknown name, hiding typos. Same Python-side guard pattern as `drop` from apache#871.
- Final-state collisions (e.g. `{"a": "z", "b": "z"}` or renaming onto an already-present column) → `ValueError("duplicate column names")`.
- Two-cycle swaps (e.g. `{"a": "b", "b": "a"}`) have a unique final schema but DataFusion applies renames sequentially and the intermediate state collides. Surfaces as `SedonaError` from plan-build; locked by a test rather than caught Python-side.

Rust side: `InternalDataFrame::rename` folds DataFusion's per-pair `with_column_renamed` over the mapping. Step-by-step comments explain why we don't try to be cleverer than DataFusion's sequential application: the per-step uniqueness check is exactly what prevents the swap case, and trying to reorder Python-side would either reimplement the check or miss edge cases.

Tests: 13 covering single/multi/order-preservation, lazy return, each error path with pinned messages, the kwarg rejection, and the sequential-application contract for swaps.

paleolimbot

My preference would be to skip this one for now...it can be replicated with a one-liner as a workaround (df.select(*[col(k).alias(v) for k, v in mapping.items()])) and there are more important APIs to surface like grouping, aggregation, join, and UDFs.

I also very much dislike the Pandas rename syntax (rename(a="b") is easier to type for those of us still typing this stuff)

jiayuasu · 2026-05-26T22:30:31Z

Fine by me. Moving on to the next operator then.

github-actions Bot requested a review from zhangfengcdt May 26, 2026 07:17

paleolimbot reviewed May 26, 2026

jiayuasu closed this May 26, 2026

jiayuasu wants to merge 1 commit into apache:main from jiayuasu:feature/df-rename
