[SPARK-56241][SQL] Derive `outputOrdering` from `KeyedPartitioning` key expressions by peter-toth · Pull Request #55036 · apache/spark

peter-toth · 2026-03-26T16:41:36Z

What changes were proposed in this pull request?

Within a KeyedPartitioning partition, all rows share the same key value, so the key expressions are trivially sorted within each partition.

This PR makes two plan nodes expose that structural guarantee via outputOrdering:

DataSourceV2ScanExecBase: when outputPartitioning is a KeyedPartitioning and the source reports no ordering via SupportsReportOrdering, derive one ascending SortOrder per key expression. When the source does report ordering, it is returned as-is.
GroupPartitionsExec:
- Non-coalescing (every group has ≤ 1 input partition): pass through child.outputOrdering unchanged.
- Coalescing without reducers: re-derive ordering from the output KeyedPartitioning key expressions; a join may embed multiple KeyedPartitionings with different expressions — expose equivalences via sameOrderExpressions.
- Coalescing with reducers: fall back to super.outputOrdering (empty), because merged partitions share only the reduced key.
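
The branching described above can be sketched roughly as follows. This is a simplified model for illustration only: `SortOrder` here is a hypothetical stand-in case class, expressions are plain strings, and the `coalescing` / `hasReducers` flags are assumed inputs rather than the fields the real plan nodes use.

```scala
// Hypothetical, simplified model; these are NOT Spark's classes.
object OutputOrderingSketch {
  // Stand-in for an ascending sort order on one expression.
  final case class SortOrder(expr: String)

  // Scan: a non-empty reported ordering is returned as-is; otherwise derive
  // one ascending order per partition key expression (if the scan is keyed).
  def scanOrdering(
      reported: Option[Seq[SortOrder]],
      partitionKeys: Option[Seq[String]]): Seq[SortOrder] =
    reported.filter(_.nonEmpty).getOrElse {
      partitionKeys.getOrElse(Seq.empty[String]).map(k => SortOrder(k))
    }

  // Group partitions: the three branches listed in the description.
  def groupPartitionsOrdering(
      coalescing: Boolean,
      hasReducers: Boolean,
      childOrdering: Seq[SortOrder],
      outputKeys: Seq[String]): Seq[SortOrder] =
    if (!coalescing) childOrdering                            // pass through
    else if (!hasReducers) outputKeys.map(k => SortOrder(k))  // re-derive from keys
    else Seq.empty                                            // only the reduced key is shared
}
```

The `sameOrderExpressions` handling for joins is left out of this sketch to keep it short.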

Why are the changes needed?

Before this change, outputOrdering on both nodes returned an empty sequence (unless SupportsReportOrdering was implemented), even though the within-partition ordering was structurally guaranteed by the partitioning itself.
As a result, EnsureRequirements would insert a redundant SortExec before SortMergeJoin inputs that are already in key order.
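
To illustrate why an empty `outputOrdering` leads to an extra sort, here is a minimal sketch of a prefix-based check. It is an assumption-laden simplification: orderings are modeled as sequences of expression names, and `satisfies` / `needsSort` are hypothetical helpers, not the actual logic in `EnsureRequirements` (which also has to consider sort direction, null ordering, and equivalent expressions).

```scala
// Hypothetical, simplified model; not Spark's actual check.
object SortRequirementSketch {
  // The required ordering is satisfied when it is a prefix of the output ordering.
  def satisfies(outputOrdering: Seq[String], requiredOrdering: Seq[String]): Boolean =
    outputOrdering.startsWith(requiredOrdering)

  // A sort would be inserted whenever the child's ordering does not satisfy the requirement.
  def needsSort(outputOrdering: Seq[String], requiredOrdering: Seq[String]): Boolean =
    !satisfies(outputOrdering, requiredOrdering)
}
```

Under this model, a child reporting an empty ordering always needs a sort for a join on `k`, while a child reporting `Seq("k")` does not.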

Does this PR introduce any user-facing change?

Yes. Queries involving storage-partitioned joins (v2 bucketing) no longer add a redundant SortExec before SortMergeJoin when the join keys match the partition keys, reducing CPU and memory overhead.

How was this patch tested?

New unit test class GroupPartitionsExecSuite covering all four outputOrdering branches (non-coalescing, coalescing without reducers with single and multi-key, join sameOrderExpressions, coalescing with reducers).
New SQL integration tests in KeyGroupedPartitioningSuite:
- Scan with KeyedPartitioning reports key-derived outputOrdering.
- Non-coalescing GroupPartitionsExec (non-identical key sets) passes through child ordering — no pre-join SortExec.
- Coalescing GroupPartitionsExec derives ordering from key expressions — no pre-join SortExec.
Updated expected output in DataSourceV2Suite for the case where a source is partitioned by a key with no reported ordering — groupBy on the partition key no longer requires a sort.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

dongjoon-hyun

It's a nice improvement. I expected many generated query plan changes in the test case, but there is no change from the existing generated plan. Is there any reason, @peter-toth ?

peter-toth · 2026-03-26T17:13:20Z

> It's a nice improvement. I expected many generated query plan changes in the test case, but there is no change from the existing generated plan. Is there any reason, @peter-toth ?

We don't have any production-ready DSv2 file sources in Spark, so the generated test plans / expected outputs don't cover this feature either.

dongjoon-hyun · 2026-03-26T17:15:56Z

Got it~

dongjoon-hyun

+1, LGTM. Thank you, @peter-toth .

dongjoon-hyun · 2026-03-26T17:17:57Z

cc @cloud-fan , @szehon-ho , @aokolnychyi , @gengliangwang , too

peter-toth · 2026-03-26T17:20:04Z

Iceberg can benefit from the change.
I will add a follow-up improvement in the scope of SPARK-55715 to keep ordering even when we coalesce partitions, and once @anuragmantri's apache/iceberg#14948 is also merged it will be a major improvement.

peter-toth · 2026-03-26T20:23:10Z

Marked as draft for now. Let me double-check a few edge cases, as changing the reported ordering without the concept of a constant order, which would be safe to prepend to any ordering, can be problematic.
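
One possible reading of this concern, sketched with a toy example (the `Row` type, the column names `k` and `c`, and the `isSortedBy` helper are all hypothetical): when the key `k` is constant within a partition and the rows are sorted by another column `c`, the data is simultaneously sorted by `k`, by `c`, by `(k, c)`, and by `(c, k)`. A single reported ordering list cannot express all of these at once.

```scala
// Hypothetical toy model of one partition with a constant key.
object ConstantKeySketch {
  final case class Row(k: Int, c: Int)

  // True when the rows are already in ascending order of the extracted key.
  def isSortedBy[K: Ordering](rows: Seq[Row])(key: Row => K): Boolean = {
    val keys = rows.map(key)
    keys == keys.sorted
  }

  // All rows share k = 7; rows are sorted by c.
  val partition: Seq[Row] = Seq(Row(7, 1), Row(7, 4), Row(7, 9))
}
```

Reporting only `k` would hide the ordering on `c`, and reporting only `c` would not satisfy a prefix-based requirement on `(k, c)`, even though the data meets both.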

Stale review.


peter-toth · 2026-03-27T15:07:51Z

This PR now follows a safer approach and doesn't alter the reported ordering.

The failures don't seem related; they look like the same ones we hit here as well: #55048 (comment)

szehon-ho · 2026-03-27T19:51:11Z

.../src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala

   */
-  override def outputOrdering: Seq[SortOrder] = ordering.getOrElse(super.outputOrdering)
+  override def outputOrdering: Seq[SortOrder] = {
+    val reportedOrdering = ordering.getOrElse(Seq.empty)


minor suggestion:

override def outputOrdering: Seq[SortOrder] = {
  ordering match {
    case Some(reported) if reported.nonEmpty => reported
    case _ =>
      outputPartitioning match {
        case k: KeyedPartitioning => k.expressions.map(SortOrder(_, Ascending))
        case _ => Seq.empty
      }
  }
}

szehon-ho · 2026-03-27T19:51:30Z

.../src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala

+    if (reportedOrdering.nonEmpty) {
+      reportedOrdering
+    } else {
+      outputPartitioning match {


Is there any way to avoid calculating this every time? (Is it an issue?)
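
For reference, one common Scala way to avoid recomputation is to memoize the value with a `lazy val`. The sketch below uses a hypothetical `Node` class with string-based orderings and an illustrative counter; whether the recomputation actually matters for the real plan node is the open question in this comment.

```scala
// Hypothetical sketch of def vs. lazy val; not the actual plan node.
object CachingSketch {
  class Node(keys: Seq[String]) {
    // Counts how often the ordering is computed (for illustration only).
    var computations = 0

    // Recomputed on every call.
    def orderingEachTime: Seq[String] = { computations += 1; keys.map(_ + " ASC") }

    // Computed once on first access, then cached.
    lazy val orderingCached: Seq[String] = { computations += 1; keys.map(_ + " ASC") }
  }
}
```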

szehon-ho · 2026-03-27T22:12:33Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/GroupPartitionsExec.scala

-      super.outputOrdering
+      // Coalescing: multiple input partitions are merged into one output partition. The child's
+      // within-partition ordering is lost due to concatenation, so we rederive ordering purely from
+      // the key expressions. A join may embed multiple `KeyedPartitioning`s (one per join side)


nit: re-derive?

also, maybe a short example would help
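
A short illustration of the situation the code comment describes, using made-up data (the `Row` type and the values are hypothetical): two input partitions share the key `k = 7` and are each sorted by `c`. Concatenating them loses the ordering on `c`, while `k` stays constant and is therefore still trivially sorted.

```scala
// Hypothetical toy data; illustrates coalescing without reducers.
object CoalesceSketch {
  final case class Row(k: Int, c: Int)

  // Two input partitions with the same key, each sorted by c.
  val input1: Seq[Row] = Seq(Row(7, 5), Row(7, 8))
  val input2: Seq[Row] = Seq(Row(7, 1), Row(7, 3))

  // Coalescing concatenates the inputs into one output partition.
  val coalesced: Seq[Row] = input1 ++ input2

  def isSorted(values: Seq[Int]): Boolean = values == values.sorted
}
```

Here `coalesced.map(_.c)` is `5, 8, 1, 3`, which is no longer sorted, whereas `coalesced.map(_.k)` is `7, 7, 7, 7`.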

szehon-ho · 2026-03-27T22:14:16Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/GroupPartitionsExec.scala

+          // Without reducers all merged partitions share the same original key value, so the key
+          // expressions remain constant within the output partition.
+          val keyedPartitionings = p.collect { case k: KeyedPartitioning => k }
+          keyedPartitionings.map(_.expressions).transpose.map { exprs =>


Just trying to wrap my head around it: is it because it's the same partition value? Is that why it's ordered?
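
The `transpose` step in the snippet above can be illustrated in isolation. In this sketch, expressions are plain strings and `SortOrder` is a hypothetical two-field stand-in, so it only shows how per-side key expressions are lined up position by position, not the real constructor.

```scala
// Hypothetical, simplified model of the transpose step.
object SameOrderSketch {
  final case class SortOrder(child: String, sameOrderExpressions: Seq[String])

  // Each inner Seq holds the key expressions of one KeyedPartitioning (e.g. one per join side).
  // transpose groups the i-th expression of every side together; the first becomes the
  // primary expression and the rest are exposed as equivalent expressions.
  def derive(keyExprsPerSide: Seq[Seq[String]]): Seq[SortOrder] =
    keyExprsPerSide.transpose.map(exprs => SortOrder(exprs.head, exprs.tail))
}
```

For example, `derive(Seq(Seq("l.a", "l.b"), Seq("r.a", "r.b")))` pairs `l.a` with `r.a` and `l.b` with `r.b`. Note that `transpose` requires all inner sequences to have the same length.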

szehon-ho

Thanks for the patch, I think it mostly looks good. So it's not just the SPJ case but also 'order by', is that right? And now we don't need SupportsReportOrdering?

szehon-ho · 2026-03-27T22:21:33Z

I also feel we should gate this behind a flag, as it's a new feature with certain risk.

dongjoon-hyun reviewed Mar 26, 2026

dongjoon-hyun previously approved these changes Mar 26, 2026

peter-toth force-pushed the SPARK-56241-outputordering-from-keyedpartitioning branch from 7946dce to 4260f53 Compare March 26, 2026 19:39

peter-toth marked this pull request as draft March 26, 2026 20:21

peter-toth force-pushed the SPARK-56241-outputordering-from-keyedpartitioning branch from 4260f53 to f28c056 Compare March 27, 2026 10:01

peter-toth marked this pull request as ready for review March 27, 2026 15:05

peter-toth mentioned this pull request Mar 27, 2026

[SPARK-56250][SQL] Remove confusing defensive code in SortExec.rowSorter and add warning comment #55048

Open

szehon-ho reviewed Mar 27, 2026
