
[SPARK-55535][SQL][FOLLOW-UP] Fix OrderedDistribution handling and minor improvements to EnsureRequirements#54727

Closed
peter-toth wants to merge 5 commits into apache:master from peter-toth:SPARK-55535-refactor-kgp-and-spj-follow-up

Conversation


@peter-toth peter-toth commented Mar 10, 2026

What changes were proposed in this pull request?

This is a follow-up PR to #54330 to fix a correctness bug in the OrderedDistribution handling of EnsureRequirements. The PR also contains minor improvements to EnsureRequirements and configuration doc updates.

Why are the changes needed?

To fix a correctness bug introduced with the refactor.

Does this PR introduce any user-facing change?

Yes, but the refactor (#54330) hasn't been released.

How was this patch tested?

Added new UT.

Was this patch authored or co-authored using generative AI tooling?

No.

…other improvements to `EnsureRequirements`
.map(_.asInstanceOf[Attribute])
val keyRowOrdering = RowOrdering.create(o.ordering, attrs)
val keyOrdering = keyRowOrdering.on((t: InternalRowComparableWrapper) => t.row)
val sorted = satisfyingKeyedPartitioning.partitionKeys.sorted(keyOrdering)
@peter-toth (Contributor Author) commented:

The bug is that sorted should be distinct as well (as it was before the refactor), but after the refactor we can do better:

  • We can avoid adding a grouping operator entirely when the non-grouped satisfyingKeyedPartitioning.partitionKeys satisfies the required sort order.
  • Or if it doesn't, then we need to add a GroupPartitionsExec operator, but we can avoid coalescing partitions in the operator by setting applyPartialClustering.
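The sorted-vs-distinct issue can be illustrated without Spark. A minimal plain-Scala sketch (the integer keys stand in for InternalRowComparableWrapper keys; this is not Spark's actual code):

```scala
object SortedDistinctSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical partition keys reported by a key-grouped source; key 2
    // appears twice because two input splits share that key.
    val partitionKeys = Seq(2, 1, 3, 2)

    // Sorting alone keeps the duplicate key, so planning against these keys
    // would scan key 2's splits once per occurrence and duplicate rows.
    assert(partitionKeys.sorted == Seq(1, 2, 2, 3))

    // Pre-refactor behavior: sort AND de-duplicate.
    assert(partitionKeys.distinct.sorted == Seq(1, 2, 3))
    println("ok")
  }
}
```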

val dfWithDuplicate = sql(s"SELECT id FROM testcat.ns.$items i ORDER BY id")

val expectedWithDuplicate = Seq(1, 2, 2, 3).map(Row(_))
checkAnswer(dfWithDuplicate, expectedWithDuplicate)
@peter-toth (Contributor Author) commented Mar 10, 2026:

This is the regression test: it returned Seq(1, 2, 2, 2, 2, 3) before the fix.

df -> Seq.empty,
reverseDf -> Seq(3),
dfWithDuplicate -> Seq.empty,
reverseDfWithDuplicate -> Seq(4)
@peter-toth (Contributor Author) commented Mar 10, 2026:

This is a minor improvement compared to pre-refactor. Although we need to add GroupPartitions to reorder the 4 partitions by their key in descending order, we don't need to coalesce them into 3.
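The reorder-without-coalesce behavior can be sketched in plain Scala (the partition model below is illustrative, not Spark's actual API):

```scala
object ReorderWithoutCoalesceSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical input: 4 partitions with keys 1, 2, 2, 3 (key 2 has two splits).
    val partitions = Seq((1, "splitA"), (2, "splitB"), (2, "splitC"), (3, "splitD"))

    // Pre-refactor behavior: grouping by key coalesces the two key-2 splits,
    // leaving only 3 output partitions.
    val coalesced = partitions
      .groupBy { case (k, _) => k }
      .toSeq
      .sortBy { case (k, _) => -k }
      .map { case (_, group) => group.map { case (_, split) => split } }
    assert(coalesced == Seq(Seq("splitD"), Seq("splitB", "splitC"), Seq("splitA")))

    // With the improvement: reorder by key descending but keep all 4 partitions,
    // so no coalescing is needed (sortBy is stable, preserving split order).
    val reordered = partitions.sortBy { case (k, _) => -k }
    assert(reordered.map { case (k, _) => k } == Seq(3, 2, 2, 1))
    assert(reordered.size == 4)
    println("ok")
  }
}
```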

@peter-toth (Contributor Author) commented:

@cloud-fan, @dongjoon-hyun, @viirya, @szehon-ho, @chirag-s-db this is a follow-up PR to #54330.

@peter-toth peter-toth changed the title [SPARK-55535][SQL][FOLLOW-UP] Fix OrderedDistribution handling and other improvements to EnsureRequirements [SPARK-55535][SQL][FOLLOW-UP] Fix OrderedDistribution handling and minor improvements to EnsureRequirements Mar 10, 2026
// shuffles or group partitions
Seq(Row(null, 3), Row(10.0, 2), Row(15.5, null),
Row(15.5, 3), Row(40.0, 1), Row(41.0, 1)))
Row(15.5, 3), Row(40.0, 1), Row(41.0, 1)), 0)
@peter-toth (Contributor Author) commented Mar 10, 2026:

This is a minor improvement compared to pre-refactor.

@dongjoon-hyun (Member) left a comment:

+1, LGTM (Pending CIs). Thank you, @peter-toth .

@viirya (Member) left a comment:

One Minor Concern

applyPartialClustering=true is semantically overloaded. This flag was designed for the partial clustering join optimization, but here it's being used purely as a way to say "distribute one input split per output partition." The actual partial clustering join logic (deciding which side to replicate based on stats, join type checks, etc.) is completely irrelevant here — the flag just happens to switch alignToExpectedKeys into the "one split per task" branch instead of "all splits into one task."

This is correct but confusing. Someone reading GroupPartitionsExec(..., applyPartialClustering=true) in an ORDER BY context would reasonably wonder what partial clustering has to do with sorting. A cleaner fix might be a dedicated boolean like distributeInputPartitions, but that's a bigger change and the current approach works correctly.

At a minimum, a comment at the call site explaining why applyPartialClustering=true is used here would help, even if renaming the parameter is too big a change.

@viirya (Member) left a comment:

Small, well-targeted fix. The correctness bug was real and the fix is correct. The main review note is the semantic overloading of applyPartialClustering.

@peter-toth (Contributor Author) replied:

> applyPartialClustering=true is semantically overloaded. […] A comment at the call site explaining why applyPartialClustering=true is used here would at minimum help, even if renaming the parameter is too big a change.

Yeah, during the refactor I too was wondering whether keeping the 2 flags (applyPartialClustering and replicatePartitions) in GroupPartitionsExec makes sense, because one flag would be enough to select between the 2 modes of GroupPartitionsExec to align partitions to expectedPartitionKeys.
And now introducing a 3rd one seemed like overkill.
How about changing GroupPartitionsExec to keep only 1 flag, groupPartitions (or maybe distributePartitions), which is generic enough to be used in different contexts?

@viirya (Member) commented Mar 10, 2026:

> […] How about changing GroupPartitionsExec and keeping only 1 flag as groupPartitions (or maybe distributePartitions), which is generic enough to be used in different contexts.

The suggestion of a single distributePartitions flag is cleaner and more honest about what's actually happening:

  • distributePartitions=false → put all splits for a key into every expected output task (group when numSplits=1, replicate when numSplits>1)
  • distributePartitions=true → spread splits one per output task

This also resolves the naming confusion — distributePartitions describes the mechanical behavior of alignToExpectedKeys without implying anything about joins or skew handling. It would read naturally in both the partial clustering context and the OrderedDistribution sorting context.
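The two modes can be sketched in plain Scala, independently of Spark (the align helper, Split type, and splitsByKey map are illustrative only, not Spark's actual API):

```scala
object DistributeModesSketch {
  type Split = String

  // Hypothetical model of alignToExpectedKeys: map each expected output key
  // to the input splits that should feed the corresponding output task(s).
  def align(
      expectedKeys: Seq[Int],
      splitsByKey: Map[Int, Seq[Split]],
      distributePartitions: Boolean): Seq[Seq[Split]] = {
    if (distributePartitions) {
      // Spread splits one per output task.
      expectedKeys.flatMap(k => splitsByKey.getOrElse(k, Seq.empty).map(s => Seq(s)))
    } else {
      // Put all splits for a key into a single output task (replicated if the
      // key occurs multiple times in expectedKeys).
      expectedKeys.map(k => splitsByKey.getOrElse(k, Seq.empty))
    }
  }

  def main(args: Array[String]): Unit = {
    val splits = Map(1 -> Seq("a"), 2 -> Seq("b", "c"))
    assert(align(Seq(1, 2), splits, distributePartitions = false) ==
      Seq(Seq("a"), Seq("b", "c")))
    assert(align(Seq(1, 2), splits, distributePartitions = true) ==
      Seq(Seq("a"), Seq("b"), Seq("c")))
    println("ok")
  }
}
```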

@peter-toth (Contributor Author) replied:

> The suggestion of a single distributePartitions flag is cleaner and more honest about what's actually happening […]

fad74ff does the rename.

I added 2 more commits:

  • afe32cc to remove unnecessary KeyedPartitioning.equals() and .hashCode() because partitionKeys is Seq[InternalRowComparableWrapper] after the refactor and
  • 286574c to rename KeyGroupedShuffleSpec to KeyedShuffleSpec to be in sync with KeyedPartitioning.

@peter-toth (Contributor Author) commented:

Thank you @dongjoon-hyun and @viirya for the review.

Merged to master (4.2.0).

