[SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning #28676

Closed
wants to merge 19 commits into from

Conversation

imback82
Contributor

@imback82 imback82 commented May 30, 2020

What changes were proposed in this pull request?

Currently, BroadcastHashJoinExec's outputPartitioning only uses the streamed side's outputPartitioning. However, if the join type is an inner-like join, the build side's info (the join keys) can also be added to BroadcastHashJoinExec's outputPartitioning.

For example,

import spark.implicits._  // needed for toDF when not running in spark-shell

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "500")
val t1 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i1", "j1")
val t2 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i2", "j2")
val t3 = (0 until 20).map(i => (i % 7, i % 11)).toDF("i3", "j3")
val t4 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i4", "j4")

// join1 is a sort merge join.
val join1 = t1.join(t2, t1("i1") === t2("i2"))

// join2 is a broadcast join where t3 is broadcasted.
val join2 = join1.join(t3, join1("i1") === t3("i3"))

// Join on the column from the broadcasted side (i3).
val join3 = join2.join(t4, join2("i3") === t4("i4"))

join3.explain

You can see that Exchange hashpartitioning(i3#29, 200) is introduced because there is no output partitioning info from the build side.

== Physical Plan ==
*(6) SortMergeJoin [i3#29], [i4#40], Inner
:- *(4) Sort [i3#29 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(i3#29, 200), true, [id=#55]
:     +- *(3) BroadcastHashJoin [i1#7], [i3#29], Inner, BuildRight
:        :- *(3) SortMergeJoin [i1#7], [i2#18], Inner
:        :  :- *(1) Sort [i1#7 ASC NULLS FIRST], false, 0
:        :  :  +- Exchange hashpartitioning(i1#7, 200), true, [id=#28]
:        :  :     +- LocalTableScan [i1#7, j1#8]
:        :  +- *(2) Sort [i2#18 ASC NULLS FIRST], false, 0
:        :     +- Exchange hashpartitioning(i2#18, 200), true, [id=#29]
:        :        +- LocalTableScan [i2#18, j2#19]
:        +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))), [id=#34]
:           +- LocalTableScan [i3#29, j3#30]
+- *(5) Sort [i4#40 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(i4#40, 200), true, [id=#39]
      +- LocalTableScan [i4#40, j4#41]

This PR proposes to introduce output partitioning for the build side of BroadcastHashJoinExec if the streamed side has a HashPartitioning or a collection of HashPartitionings.

There is a new internal config, spark.sql.execution.broadcastHashJoin.outputPartitioningExpandLimit, which limits the number of partitionings a HashPartitioning can expand to. It can be set to "0" to disable this feature.
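For instance, in the plan above the streamed side of the BroadcastHashJoin is hash-partitioned on i1, and the inner join condition i1 = i3 means the output is also co-partitioned by i3, so the join can report partitionings over both keys and the downstream join on i3 no longer needs a shuffle. A minimal sketch of setting the config (it is internal; "0" disables the expansion):

// "0" disables the build-side output partitioning expansion entirely.
spark.conf.set("spark.sql.execution.broadcastHashJoin.outputPartitioningExpandLimit", "0")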

Why are the changes needed?

To remove unnecessary shuffles.

Does this PR introduce any user-facing change?

Yes, now the shuffle in the above example can be eliminated:

== Physical Plan ==
*(5) SortMergeJoin [i3#108], [i4#119], Inner
:- *(3) Sort [i3#108 ASC NULLS FIRST], false, 0
:  +- *(3) BroadcastHashJoin [i1#86], [i3#108], Inner, BuildRight
:     :- *(3) SortMergeJoin [i1#86], [i2#97], Inner
:     :  :- *(1) Sort [i1#86 ASC NULLS FIRST], false, 0
:     :  :  +- Exchange hashpartitioning(i1#86, 200), true, [id=#120]
:     :  :     +- LocalTableScan [i1#86, j1#87]
:     :  +- *(2) Sort [i2#97 ASC NULLS FIRST], false, 0
:     :     +- Exchange hashpartitioning(i2#97, 200), true, [id=#121]
:     :        +- LocalTableScan [i2#97, j2#98]
:     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))), [id=#126]
:        +- LocalTableScan [i3#108, j3#109]
+- *(4) Sort [i4#119 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(i4#119, 200), true, [id=#130]
      +- LocalTableScan [i4#119, j4#120]

How was this patch tested?

Added new tests.
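For reference, a hypothetical sketch of the kind of check such a test could make (not the PR's actual test code), using join3 from the example above: after the change, only the three shuffles feeding the sort merge joins should remain in the physical plan.

import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec

// Collect the shuffle exchanges; the extra shuffle on i3 should be gone,
// leaving only the exchanges on i1, i2 and i4 (see the optimized plan above).
val shuffles = join3.queryExecution.executedPlan.collect {
  case e: ShuffleExchangeExec => e
}
assert(shuffles.size == 3)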

@SparkQA

SparkQA commented May 30, 2020

Test build #123310 has finished for PR 28676 at commit 985834b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented May 30, 2020

retest this please

@SparkQA

SparkQA commented May 30, 2020

Test build #123313 has finished for PR 28676 at commit 985834b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82 imback82 changed the title from "[SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning" to "[WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning" on May 31, 2020
@SparkQA

SparkQA commented May 31, 2020

Test build #123326 has finished for PR 28676 at commit 683a705.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

retest this please

@SparkQA

SparkQA commented May 31, 2020

Test build #123334 has finished for PR 28676 at commit 683a705.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 26, 2020

Test build #124528 has finished for PR 28676 at commit 488e051.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jun 27, 2020

retest this please

@SparkQA

SparkQA commented Jun 27, 2020

Test build #124561 has finished for PR 28676 at commit 488e051.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jun 27, 2020

retest this please

@SparkQA

SparkQA commented Jun 27, 2020

Test build #124565 has finished for PR 28676 at commit 488e051.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2020

Test build #124672 has started for PR 28676 at commit febc402.

@imback82 imback82 changed the title from "[WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning" to "[SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning" on Jul 1, 2020
generateExprCombinations(current.tail, accumulated :+ current.head) ++
  buildKeys.map { bKeys =>
    bKeys.flatMap { bKey =>
      if (currentNumCombinations < maxNumCombinations) {
Contributor

Do we need this if? I think generateExprCombinations will return Nil when it hits the upper bound.

Contributor Author

I wanted to avoid unnecessary recursion (and creating new Seqs, etc.), but I removed the check for simplicity.
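For context, a simplified, hypothetical sketch of the pattern under discussion (not the PR's exact code): the recursion itself returns Nil once the combination limit has been reached, so an extra if guard before the recursive calls is not strictly required.

// Hypothetical stand-ins: streamed keys map to their matching build-side keys.
var currentNumCombinations = 0
val maxNumCombinations = 8  // plays the role of the expand-limit config

def generateExprCombinations(
    current: Seq[String],
    accumulated: Seq[String],
    streamedToBuildKeys: Map[String, Seq[String]]): Seq[Seq[String]] = {
  if (currentNumCombinations >= maxNumCombinations) {
    Nil  // the bound check inside the recursion makes an outer `if` unnecessary
  } else if (current.isEmpty) {
    currentNumCombinations += 1
    Seq(accumulated)
  } else {
    // Try the streamed key itself plus every build-side key it maps to.
    val alternatives = current.head +: streamedToBuildKeys.getOrElse(current.head, Nil)
    alternatives.flatMap(a => generateExprCombinations(current.tail, accumulated :+ a, streamedToBuildKeys))
  }
}

// e.g. a streamed HashPartitioning on i1 expands to combinations over i1 and i3:
// generateExprCombinations(Seq("i1"), Nil, Map("i1" -> Seq("i3"))) == Seq(Seq("i1"), Seq("i3"))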

@SparkQA

SparkQA commented Jul 16, 2020

Test build #125947 has finished for PR 28676 at commit afa5aca.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 17, 2020

Test build #126004 has finished for PR 28676 at commit 51187dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@viirya viirya left a comment


Can you also mention the config in the description?

@SparkQA

SparkQA commented Jul 17, 2020

Test build #126067 has finished for PR 28676 at commit 80df4dc.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 18, 2020

Test build #126068 has finished for PR 28676 at commit ba19acb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 18, 2020

Test build #126075 has finished for PR 28676 at commit 9caeecd.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 18, 2020

Test build #126087 has finished for PR 28676 at commit 9caeecd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 18, 2020

Test build #126091 has finished for PR 28676 at commit 9caeecd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 18, 2020

Test build #126097 has finished for PR 28676 at commit 9caeecd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!
