Skip to content

feat(transaction): support transform-based sort orders#2765

Open
viirya wants to merge 1 commit into
apache:mainfrom
viirya:feat/sort-order-transform
Open

feat(transaction): support transform-based sort orders#2765
viirya wants to merge 1 commit into
apache:mainfrom
viirya:feat/sort-order-transform

Conversation

@viirya

@viirya viirya commented Jul 3, 2026

Copy link
Copy Markdown
Member

What changes are included in this PR?

ReplaceSortOrderAction only supported sorting by a column's raw value (an implicit identity transform) — asc/desc hardcoded Transform::Identity and only accepted a column name. There was no way to declare a sort order using a transform (bucket[N], year, truncate[W], etc.), which Java's SortOrderBuilder.asc/desc(Term, NullOrder) supports.

Adds asc_with_transform / desc_with_transform:

tx.replace_sort_order()
    .asc_with_transform("event_time", Transform::Year, NullOrder::First)
    .desc_with_transform("id", Transform::Bucket(16), NullOrder::Last);

asc/desc are unchanged (same signature, same behavior) — they now delegate to the new methods with Transform::Identity, so no existing caller is affected.

Whether a transform is valid for the source column's type is checked at commit time (via the existing SortOrder::builder().build(schema)check_compatibility), matching the timing of Java's SortOrder.Builder.build() — an incompatible transform is only rejected once the table schema is available, not at the asc_with_transform call site.

Scope

This only extends the metadata-declaration API — which sort order (including which transform) can be recorded on a table. Whether the write path actually sorts a RecordBatch to match a table's declared sort order is a separate, pre-existing gap: no writer in crates/iceberg/src/writer/ reads SortOrder or sorts data today, for any sort order, including the pre-existing identity case. This PR does not touch that; it only makes a wider set of sort orders declarable.

Are these changes tested?

  • test_replace_sort_order_with_transform — the builder records the requested transform per field.
  • test_replace_sort_order_with_transform_commits — end-to-end commit; the resulting TableUpdate::AddSortOrder's SortField carries the requested transform.
  • test_replace_sort_order_rejects_incompatible_transformTransform::Year on a long column is rejected at commit time with ErrorKind::Unexpected, exercising the existing check_compatibility path.
  • Existing test_replace_sort_order (plain asc/desc) passes unchanged.

Full iceberg lib suite passes; clippy and rustfmt clean.

ReplaceSortOrderAction only supported sorting by a column's raw value:
`asc`/`desc` hardcoded Transform::Identity and only accepted a column name.
Java's SortOrderBuilder.asc/desc accept a Term, which can be a transform
expression (bucket[N], year, truncate[W], ...).

Add asc_with_transform / desc_with_transform. asc/desc are unchanged (same
signature and behavior) and now delegate to the new methods with
Transform::Identity, so no existing caller is affected.

Transform-compatibility with the source column's type is checked at commit
time via the existing SortOrder::builder().build(schema) -> check_compatibility
path, matching the timing of Java's SortOrder.Builder.build() -- the spec-level
type was already general (SortField.transform, check_compatibility already
validate arbitrary transforms); only the transaction-layer API was narrow.

Scope: this only extends the metadata-declaration API. Whether the write path
sorts data to match a table's declared sort order -- for any transform,
including the pre-existing identity case -- is a separate, pre-existing gap
untouched here.

Closes apache#2764
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant