feat(transaction): support transform-based sort orders #2765
Open
viirya wants to merge 1 commit into
ReplaceSortOrderAction only supported sorting by a column's raw value: `asc`/`desc` hardcoded Transform::Identity and only accepted a column name. Java's SortOrderBuilder.asc/desc accept a Term, which can be a transform expression (bucket[N], year, truncate[W], ...).

Add asc_with_transform / desc_with_transform. asc/desc are unchanged (same signature and behavior) and now delegate to the new methods with Transform::Identity, so no existing caller is affected.

Transform-compatibility with the source column's type is checked at commit time via the existing SortOrder::builder().build(schema) -> check_compatibility path, matching the timing of Java's SortOrder.Builder.build() -- the spec-level type was already general (SortField.transform, check_compatibility already validate arbitrary transforms); only the transaction-layer API was narrow.

Scope: this only extends the metadata-declaration API. Whether the write path sorts data to match a table's declared sort order -- for any transform, including the pre-existing identity case -- is a separate, pre-existing gap untouched here.

Closes apache#2764
What changes are included in this PR?

`ReplaceSortOrderAction` only supported sorting by a column's raw value (an implicit identity transform): `asc`/`desc` hardcoded `Transform::Identity` and only accepted a column name. There was no way to declare a sort order using a transform (`bucket[N]`, `year`, `truncate[W]`, etc.), which Java's `SortOrderBuilder.asc/desc(Term, NullOrder)` supports.

Adds `asc_with_transform` / `desc_with_transform`. `asc`/`desc` are unchanged (same signature, same behavior); they now delegate to the new methods with `Transform::Identity`, so no existing caller is affected.

Whether a transform is valid for the source column's type is checked at commit time (via the existing `SortOrder::builder().build(schema)` → `check_compatibility`), matching the timing of Java's `SortOrder.Builder.build()`: an incompatible transform is only rejected once the table schema is available, not at the `asc_with_transform` call site.

Scope

This only extends the metadata-declaration API, that is, which sort order (including which transform) can be recorded on a table. Whether the write path actually sorts a `RecordBatch` to match a table's declared sort order is a separate, pre-existing gap: no writer in `crates/iceberg/src/writer/` reads `SortOrder` or sorts data today, for any sort order, including the pre-existing identity case. This PR does not touch that; it only makes a wider set of sort orders declarable.

Are these changes tested?

- `test_replace_sort_order_with_transform`: the builder records the requested transform per field.
- `test_replace_sort_order_with_transform_commits`: end-to-end commit; the resulting `TableUpdate::AddSortOrder`'s `SortField` carries the requested transform.
- `test_replace_sort_order_rejects_incompatible_transform`: `Transform::Year` on a `long` column is rejected at commit time with `ErrorKind::Unexpected`, exercising the existing `check_compatibility` path.
- `test_replace_sort_order` (plain `asc`/`desc`) passes unchanged.

Full `iceberg` lib suite passes; clippy and rustfmt clean.
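
For readers who want the shape of the change without opening the diff, here is a minimal, self-contained sketch of the two ideas described above: `asc`/`desc` delegating to the `*_with_transform` variants, and the compatibility check being deferred until commit. It does not use the `iceberg` crate. Every type here (`ReplaceSortOrderSketch`, `SketchError`, the reduced `Transform` and `PrimitiveType` enums), the argument order of `asc_with_transform`, and the toy `is_compatible` rules are assumptions made for illustration, not the PR's actual code.

```rust
// Hypothetical, reduced stand-ins for the real iceberg types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
    Truncate(u32),
    Year,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimitiveType {
    Long,
    Date,
    String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum SortDirection {
    Ascending,
    Descending,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum NullOrder {
    First,
    Last,
}

#[derive(Debug, Clone, PartialEq)]
struct PendingSortField {
    column: String,
    transform: Transform,
    direction: SortDirection,
    null_order: NullOrder,
}

#[derive(Debug, PartialEq)]
enum SketchError {
    UnknownColumn(String),
    IncompatibleTransform { column: String, transform: Transform },
}

/// Hypothetical builder: records fields without looking at any schema.
#[derive(Default)]
struct ReplaceSortOrderSketch {
    fields: Vec<PendingSortField>,
}

impl ReplaceSortOrderSketch {
    /// Unchanged entry point: delegates with the identity transform.
    fn asc(self, column: &str, null_order: NullOrder) -> Self {
        self.asc_with_transform(column, Transform::Identity, null_order)
    }

    fn desc(self, column: &str, null_order: NullOrder) -> Self {
        self.desc_with_transform(column, Transform::Identity, null_order)
    }

    fn asc_with_transform(self, column: &str, transform: Transform, null_order: NullOrder) -> Self {
        self.push(column, transform, SortDirection::Ascending, null_order)
    }

    fn desc_with_transform(self, column: &str, transform: Transform, null_order: NullOrder) -> Self {
        self.push(column, transform, SortDirection::Descending, null_order)
    }

    fn push(mut self, column: &str, transform: Transform, direction: SortDirection, null_order: NullOrder) -> Self {
        self.fields.push(PendingSortField { column: column.to_string(), transform, direction, null_order });
        self
    }

    /// The schema is only available here, so this is where validation happens.
    fn commit(self, schema: &[(&str, PrimitiveType)]) -> Result<Vec<PendingSortField>, SketchError> {
        for field in &self.fields {
            let ty = schema
                .iter()
                .find(|entry| entry.0 == field.column)
                .map(|entry| entry.1)
                .ok_or_else(|| SketchError::UnknownColumn(field.column.clone()))?;
            if !is_compatible(field.transform, ty) {
                return Err(SketchError::IncompatibleTransform {
                    column: field.column.clone(),
                    transform: field.transform,
                });
            }
        }
        Ok(self.fields)
    }
}

/// Toy compatibility rules covering only the three types modeled above.
fn is_compatible(transform: Transform, ty: PrimitiveType) -> bool {
    match transform {
        Transform::Identity | Transform::Bucket(_) => true,
        Transform::Truncate(_) => matches!(ty, PrimitiveType::Long | PrimitiveType::String),
        Transform::Year => matches!(ty, PrimitiveType::Date),
    }
}
```

In this sketch, `ReplaceSortOrderSketch::default().asc_with_transform("id", Transform::Year, NullOrder::First)` succeeds at the call site and only fails when `commit` is given a schema in which `id` is a `Long`, which mirrors the timing that `test_replace_sort_order_rejects_incompatible_transform` exercises.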