Conversation

@chirag-s-db
Contributor

What changes were proposed in this pull request?

When a KeyGroupedShuffleSpec is used to shuffle another child of a JOIN, we must be able to push the JOIN key positions or partition values down into the scan to ensure that both children end up with matching partitioning. If one child reports a KeyGroupedPartitioning but these values cannot be pushed down to its scan (for example, because the child is a key-grouped scan that was checkpointed), we should avoid using that shuffle spec to shuffle the other children.
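At a high level, the fix adds one more filter when EnsureRequirements selects a shuffle spec for the join. Below is a condensed sketch assembled from the diff hunks quoted later in this thread; the surrounding structure and the trailing catch-all case are assumptions rather than the exact patch.

// Choose all the specs that can be used to shuffle other children.
val candidateSpecs = specs
  .filter(_._2.canCreatePartitioning)
  .filter {
    // To choose a KeyGroupedShuffleSpec, we must be able to push down SPJ parameters
    // into the scan (for join key positions). If these parameters can't be pushed
    // down, this spec can't be used to shuffle other children.
    case (idx, _: KeyGroupedShuffleSpec) => canPushDownSPJParamsToScan(children(idx))
    // Assumed fall-through: other shuffle specs stay candidates as before.
    case _ => true
  }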

Why are the changes needed?

Prevents a potential correctness issue when key-grouped partitioning is used on a checkpointed RDD.
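For context, this plan shape only shows up when storage-partitioned join (SPJ) and its one-sided shuffle are enabled. The config names below are from recent Spark releases and should be verified against the version in use:

// Enable SPJ, partition-value pushdown, and shuffling one side to match a
// key-grouped child (config names assumed from recent releases).
spark.conf.set("spark.sql.sources.v2.bucketing.enabled", "true")
spark.conf.set("spark.sql.sources.v2.bucketing.pushPartValues.enabled", "true")
spark.conf.set("spark.sql.sources.v2.bucketing.shuffle.enabled", "true")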

Does this PR introduce any user-facing change?

No.

How was this patch tested?

See test changes.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Nov 17, 2025
@chirag-s-db
Contributor Author

@cloud-fan @szehon-ho Could you take a look at this PR when you get the chance?

Member

@szehon-ho left a comment


Nice catch! @sunchao FYI

// To choose a KeyGroupedShuffleSpec, we must be able to push down SPJ parameters into
// the scan (for join key positions). If these parameters can't be pushed down, this
// spec can't be used to shuffle other children.
case (idx, _: KeyGroupedShuffleSpec) => canPushDownSPJParamsToScan(children(idx))
Member


nit: extra space after '=>'

Contributor Author


fixed

// Choose all the specs that can be used to shuffle other children
val candidateSpecs = specs
  .filter(_._2.canCreatePartitioning)
  .filter {
Member


Just for reference, were both checks needed? i.e. this one and the other check in checkKeyGroupCompatible?

Contributor Author


Yeah, both checks are needed. We need the check in checkKeyGroupCompatible for the case where both children are key-grouped partitionings, while this check handles the case where only one child is a key-grouped partitioning and is shuffling a non-KGP plan.

Member


Trying to understand this too. In checkKeyGroupCompatible we already make sure that both children use KeyGroupedPartitioning. Does this new check additionally ensure that the leaf nodes of both children are all KeyGroupedPartitionedScans?

Contributor Author


checkKeyGroupCompatible applies to the case where we have 2 KeyGroupedPartitioned scans that are being joined against each other. For example, something like:

SortMergeJoinExec ...
  +- BatchScanExec tbl1 ... -> reporting KeyGroupedPartitioning
  +- BatchScanExec tbl2 ... -> reporting KeyGroupedPartitioning

If one child is not key-grouped partitioned, we can (in general) still avoid the shuffle for the key-grouped child:

SortMergeJoinExec ...
  +- BatchScanExec tbl1 ... -> reporting KeyGroupedPartitioning
  +- ShuffleExchangeExec KeyGroupedPartitioning
    +- BatchScanExec tbl2 ... -> reporting UnknownPartitioning

However, if the child reporting the KeyGroupedPartitioning is not a BatchScanExec, we can't push down the JOIN keys, so doing this is unsafe. This can arise if we call .checkpoint() on a BatchScanExec:

SortMergeJoinExec ...
  +- RDDScanExec ... -> reporting KeyGroupedPartitioning (coming from ckpt of tbl1 scan)
  +- ShuffleExchangeExec KeyGroupedPartitioning
    +- BatchScanExec tbl2 ... -> reporting UnknownPartitioning

This extra check is for this second case, where we want to make sure that we're not using a KeyGroupedPartitioning to shuffle another child of a JOIN without being able to push down JOIN keys. The test "SPARK-53322: checkpointed scans can't shuffle other children on SPJ" is for this case, and will fail without this change.
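A rough end-to-end sketch of this second scenario (not the actual suite code; tbl1, tbl2, and the checkpoint path are hypothetical, with both sources reporting KeyGroupedPartitioning on id):

import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec

spark.sparkContext.setCheckpointDir("/tmp/spj-ckpt")  // hypothetical path
val left = spark.table("tbl1").checkpoint()           // key-grouped scan becomes an RDD scan
val right = spark.table("tbl2")
val joined = left.join(right, "id")

// Count the exchanges in the physical plan. Before the fix, only tbl2's side is
// shuffled, using partition values that can't be pushed into the checkpointed scan
// (the unsafe plan above); with the fix, both sides are expected to be shuffled.
val numShuffles = joined.queryExecution.executedPlan.collect {
  case s: ShuffleExchangeExec => s
}.size
joined.explain()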

Member


I see, thanks for the explanation!


private val EnsureRequirements = new EnsureRequirements()

/** Helper to add dummy BatchScanExec child to a dummy plan (to ensure SPJ can kick in). */
Member


Actually, why not have another case class altogether (that inherits from DummySparkPlan) and use that in the tests?

Contributor Author


done

Contributor Author


Can't be a case class (no case-to-case inheritance), but we can just do a normal class
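For context on the Scala rule being referenced, here is a tiny standalone illustration with made-up names (unrelated to DummySparkPlan):

case class Base(x: Int)

// case class Derived(x: Int, y: Int) extends Base(x)
//   ^ does not compile: case-to-case inheritance is prohibited

// A plain (non-case) subclass compiles, hence the normal class in the test.
class Derived(x: Int, val y: Int) extends Base(x)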

test("SPARK-53322: checkpointed scans avoid shuffles for aggregates") {
withTempDir { dir =>
spark.sparkContext.setCheckpointDir(dir.getPath)
val items_partitions = Array(identity("id"))
Member


nit: use camelCase?

Contributor Author


done

@chirag-s-db requested a review from sunchao November 18, 2025 15:52
Member

@sunchao left a comment


LGTM pending CI

@cloud-fan
Contributor

@chirag-s-db can you re-trigger the CI jobs? Seems flaky

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan closed this in dff0620 Nov 19, 2025
@cloud-fan
Contributor

@chirag-s-db can you open a new PR against branch-4.1 as this is for correctness?

chirag-s-db added a commit to chirag-s-db/spark that referenced this pull request Nov 19, 2025
…positions can be fully pushed down


Closes apache#53098 from chirag-s-db/checkpoint-pushdown.

Lead-authored-by: Chirag Singh <chirag.singh@databricks.com>
Co-authored-by: Chirag Singh <137233133+chirag-s-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@chirag-s-db
Contributor Author

@cloud-fan Opened #53132

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…positions can be fully pushed down
