[SPARK-13136][SQL] Create a dedicated Broadcast exchange operator #11083

hvanhovell · 2016-02-04T21:47:28Z

Quite a few Spark SQL join operators broadcast one side of the join to all nodes. There are a few problems with this:

- This conflates broadcasting (a data exchange) with joining. Data exchanges should be managed by a different operator.
- All these nodes implement their own (duplicate) broadcasting logic.
- Re-use of indices is quite hard.

This PR defines both a BroadcastDistribution and a BroadcastPartitioning, each of which contains a BroadcastMode. The BroadcastMode defines how we transform the Array of InternalRows into an index. We currently support the following BroadcastModes:

- IdentityBroadcastMode: This broadcasts the rows in their original form.
- HashSetBroadcastMode: This applies a projection to the input rows, deduplicates these rows and broadcasts the resulting Set.
- HashedRelationBroadcastMode: This transforms the input rows into a HashedRelation, and broadcasts this index.
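
The three modes can be pictured with a small, Spark-free sketch. Everything in it is an assumption made for illustration: `Row` stands in for InternalRow, a plain `Map` stands in for a HashedRelation, the column-index parameters are hypothetical, and the `transform` signature is not taken from the patch.

```scala
// Simplified sketch, not the classes from this PR.
object BroadcastModeSketch {
  // Stand-in for InternalRow (assumption).
  type Row = Seq[Any]

  // A mode declares the shape in which broadcast rows reach the consumer.
  sealed trait BroadcastMode {
    def transform(rows: Array[Row]): Any
  }

  // Rows are broadcast unchanged.
  case object IdentityBroadcastMode extends BroadcastMode {
    def transform(rows: Array[Row]): Array[Row] = rows
  }

  // Rows are projected onto the given columns, deduplicated, and broadcast as a Set.
  final case class HashSetBroadcastMode(columns: Seq[Int]) extends BroadcastMode {
    def transform(rows: Array[Row]): Set[Row] =
      rows.iterator.map(r => columns.map(i => r(i))).toSet
  }

  // Rows are grouped by key columns; the Map stands in for a HashedRelation.
  final case class HashedRelationBroadcastMode(keyColumns: Seq[Int]) extends BroadcastMode {
    def transform(rows: Array[Row]): Map[Row, Seq[Row]] =
      rows.toSeq.groupBy(r => keyColumns.map(i => r(i)))
  }

  def main(args: Array[String]): Unit = {
    val rows: Array[Row] = Array(Seq(1, "a"), Seq(1, "b"), Seq(2, "c"))
    println(IdentityBroadcastMode.transform(rows).length)
    println(HashSetBroadcastMode(Seq(0)).transform(rows))
    println(HashedRelationBroadcastMode(Seq(0)).transform(rows)(Seq(1)))
  }
}
```

Because each mode in this sketch is described by plain data (column indices) rather than by a function, two requests for the same index compare equal, which is one way re-use of an already built index could be detected.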

To match this distribution, we implement a BroadcastExchange operator that performs the broadcast for us, and have EnsureRequirements plan this operator. The old Exchange operator has been renamed to ShuffleExchange to clearly separate shuffle exchanges from broadcast exchanges. Finally, the classes in Exchange.scala have been moved to a dedicated package.
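
As a rough illustration of the planning step, the toy rule below inserts a broadcast exchange wherever a child does not already deliver the distribution its parent requires. The types are hypothetical stand-ins that only borrow names from this description; the real EnsureRequirements rule handles many more cases (shuffles, sort orders, and so on).

```scala
// Toy planner sketch, not Spark's implementation.
object EnsureRequirementsSketch {
  sealed trait Mode
  case object IdentityMode extends Mode
  final case class HashedMode(keyColumns: Seq[Int]) extends Mode

  sealed trait Distribution
  case object Unspecified extends Distribution
  final case class BroadcastDistribution(mode: Mode) extends Distribution

  sealed trait Plan {
    def children: Seq[Plan]
    // What each child must deliver; defaults to "anything".
    def requiredChildDistribution: Seq[Distribution] = children.map(_ => Unspecified)
    // What this node delivers to its parent.
    def output: Distribution = Unspecified
  }

  final case class Scan(table: String) extends Plan {
    val children: Seq[Plan] = Nil
  }

  // The dedicated exchange: it alone knows how to broadcast.
  final case class BroadcastExchange(mode: Mode, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
    override def output: Distribution = BroadcastDistribution(mode)
  }

  // The join only declares what it needs from its broadcast side.
  final case class BroadcastHashJoin(streamed: Plan, broadcast: Plan, keys: Seq[Int])
      extends Plan {
    val children: Seq[Plan] = Seq(streamed, broadcast)
    override def requiredChildDistribution: Seq[Distribution] =
      Seq(Unspecified, BroadcastDistribution(HashedMode(keys)))
  }

  // Bottom-up: wrap a child in an exchange only when its output does not match.
  def ensureRequirements(plan: Plan): Plan = {
    val fixedChildren = plan.children
      .map(ensureRequirements)
      .zip(plan.requiredChildDistribution)
      .map {
        case (child, required @ BroadcastDistribution(mode)) if child.output != required =>
          BroadcastExchange(mode, child)
        case (child, _) => child
      }
    withChildren(plan, fixedChildren)
  }

  private def withChildren(plan: Plan, newChildren: Seq[Plan]): Plan = plan match {
    case j: BroadcastHashJoin => j.copy(streamed = newChildren(0), broadcast = newChildren(1))
    case e: BroadcastExchange => e.copy(child = newChildren(0))
    case s: Scan              => s
  }

  def main(args: Array[String]): Unit = {
    val join = BroadcastHashJoin(Scan("facts"), Scan("dims"), keys = Seq(0))
    println(ensureRequirements(join))
  }
}
```

Running the rule a second time on its own output leaves the plan unchanged in this sketch, because the exchange already reports the required distribution.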

cc @rxin @davies

SparkQA · 2016-02-04T23:03:22Z

Test build #50775 has finished for PR 11083 at commit c2b7533.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Broadcast(

hvanhovell · 2016-02-06T21:37:16Z

Retest this please

SparkQA · 2016-02-06T22:42:36Z

Test build #50879 has finished for PR 11083 at commit 02a61b8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-06T22:59:44Z

Test build #50881 has finished for PR 11083 at commit 02a61b8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-02-06T23:35:33Z

Retest this please

SparkQA · 2016-02-07T00:59:46Z

Test build #50883 has finished for PR 11083 at commit 02a61b8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-02-07T09:02:19Z

This one is ready for review.

SparkQA · 2016-02-07T11:13:51Z

Test build #50897 has finished for PR 11083 at commit c12c8e6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-02-07T11:18:09Z

retest this please

SparkQA · 2016-02-07T12:42:03Z

Test build #50898 has finished for PR 11083 at commit c12c8e6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-02-07T13:43:21Z

retest this please

SparkQA · 2016-02-07T15:10:59Z

Test build #50900 has finished for PR 11083 at commit c12c8e6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-02-08T17:45:22Z

retest this please

davies · 2016-02-08T18:42:52Z

sql/core/src/main/scala/org/apache/spark/sql/execution/Broadcast.scala

+case class Broadcast(
+    f: Iterable[InternalRow] => Any,
+    child: SparkPlan)
+  extends UnaryNode with CodegenSupport {


Since we do include this in generated code of BroadcastHashJoin, I think it's better to not implement CodegenSupport, then we don't need the special case in CollapseCodegenStages

SparkQA · 2016-02-08T19:22:24Z

Test build #50928 has finished for PR 11083 at commit c12c8e6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-08T23:24:58Z

Test build #50942 has finished for PR 11083 at commit c7dd7ae.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Broadcast(f: Iterable[InternalRow] => Any, child: SparkPlan) extends UnaryNode

SparkQA · 2016-02-10T19:37:58Z

Test build #51039 has finished for PR 11083 at commit e847383.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class AssertNotNull(child: Expression, walkedTypePath: Seq[String])
- case class ReturnAnswer(child: LogicalPlan) extends UnaryNode
- public class UnsafeRowParquetRecordReader extends SpecificParquetRecordReaderBase<InternalRow>
- case class CollectLimit(limit: Int, child: SparkPlan) extends UnaryNode
- trait BaseLimit extends UnaryNode
- case class LocalLimit(limit: Int, child: SparkPlan) extends BaseLimit
- case class GlobalLimit(limit: Int, child: SparkPlan) extends BaseLimit
- case class TakeOrderedAndProject(
- class FileStreamSource(
- trait HadoopFsRelationProvider extends StreamSourceProvider

rxin · 2016-02-11T06:09:09Z

@yhuai if you have some time this wk, can you review this?

rxin · 2016-02-11T06:10:04Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala

+  * Represents data where tuples are broadcasted to every node. It is quite common that the
+  * entire set of tuples is transformed into different data structure.
+  */
+case class BroadcastDistribution(f: Iterable[InternalRow] => Any = identity) extends Distribution


i'm thinking maybe it's better to just declare that we want a hashed broadcast distribution, and then don't take a closure. The reason it is bad to take a closure is that this won't work if we want to whole-stage codegen the building of the hash table, or if we want to change the internal engine to a push-based model.
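
A small sketch of the trade-off being discussed, under stated assumptions: the types below are hypothetical and Spark-free, and the equality comparison is an extra illustration that goes beyond the codegen and push-based arguments made in the comment. The point it tries to show is that a closure is opaque to the planner, while a declared mode can be inspected and matched on.

```scala
// Illustration only; none of these types come from the patch.
object ClosureVsModeSketch {
  type Row = Seq[Any]

  // Closure-based distribution, in the spirit of the signature under review.
  final case class ClosureDistribution(f: Iterable[Row] => Any)

  // Declarative alternative: the planner sees what is wanted, not how to build it.
  sealed trait Mode
  case object Identity extends Mode
  final case class Hashed(keyColumns: Seq[Int]) extends Mode
  final case class ModeDistribution(mode: Mode)

  // A planner can inspect a declared mode and pick a build strategy,
  // for example emitting generated code instead of calling a function.
  def buildStrategy(d: ModeDistribution): String = d.mode match {
    case Identity     => "ship rows as-is"
    case Hashed(keys) => s"build hash table on columns ${keys.mkString(",")}"
  }

  def main(args: Array[String]): Unit = {
    val hashOnFirst: Iterable[Row] => Any = rows => rows.groupBy(_.head)
    val sameLogic: Iterable[Row] => Any = rows => rows.groupBy(_.head)

    // Two closures with identical bodies are still distinct, opaque objects.
    println(ClosureDistribution(hashOnFirst) == ClosureDistribution(sameLogic))
    // Declared modes compare structurally.
    println(ModeDistribution(Hashed(Seq(0))) == ModeDistribution(Hashed(Seq(0))))
    println(buildStrategy(ModeDistribution(Hashed(Seq(0)))))
  }
}
```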

hvanhovell · 2016-02-17T07:29:35Z

Retest this please

SparkQA · 2016-02-17T09:39:12Z

Test build #51418 has finished for PR 11083 at commit c7429bb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala # sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala

SparkQA · 2016-02-20T15:44:00Z

Test build #51596 has finished for PR 11083 at commit b12bbc2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-02-20T21:20:29Z

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinBNL.scala

@@ -29,22 +29,20 @@ import org.apache.spark.sql.execution.metric.SQLMetrics
 * for hash join.
 */
 case class LeftSemiJoinBNL(
-    streamed: SparkPlan, broadcast: SparkPlan, condition: Option[Expression])
+    left: SparkPlan, right: SparkPlan, condition: Option[Expression])


why did you do this change (streamed -> left, broadcast -> right)? this makes the variable name more confusing.

hvanhovell · 2016-02-20

Yeah, I'll revert that.

rxin · 2016-02-20T21:23:18Z

I'm going to review this more carefully tonight.

rxin · 2016-02-20T21:23:48Z

@hvanhovell when you get a chance, please update the description if it merits any change.

SparkQA · 2016-02-21T01:12:11Z

Test build #51605 has finished for PR 11083 at commit 54b558d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-02-21T08:41:47Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala

+ * Marker trait to identify the shape in which tuples are broadcasted. Typical examples of this are
+ * identity (tuples remain unchanged) or hashed (tuples are converted into some hash index).
+ */
+trait BroadcastMode {


I'd move this and IdentityBroadcastMode into a new file.

rxin · 2016-02-21T09:26:08Z

This looks pretty good actually.

hvanhovell · 2016-02-21T14:50:18Z

@rxin I agree that this is stretching the definitions of both Distribution and Partitioning. We should be able to define the form/shape in which a child node delivers its data to the current node. This would also allow us to pass ColumnarBatches, or could even be used to specify the row type passed. I have created https://issues.apache.org/jira/browse/SPARK-13421 to track this.

SparkQA · 2016-02-21T16:51:58Z

Test build #51635 has finished for PR 11083 at commit 4b5978b.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait BroadcastMode

SparkQA · 2016-02-21T16:59:03Z

Test build #51637 has finished for PR 11083 at commit c8c175e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-02-21T20:32:12Z

Thanks. I'm going to merge this.

hvanhovell added 2 commits February 4, 2016 20:13

Initial Broadcast design

aa7120e

Fix Exchange and initial code gen attempt.

c2b7533

hvanhovell added 4 commits February 6, 2016 16:09

Move broadcast retreval to SparkPlan

6a5568a

Merge remote-tracking branch 'spark/master' into SPARK-13136

9adecdd

# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala

Fix Codegen & Add other broadcast joins.

d0194fb

Minor touchup

02a61b8

hvanhovell changed the title ~~[SPARK-13136][SQL] Create a dedicated Broadcast exchange operator [WIP]~~ [SPARK-13136][SQL] Create a dedicated Broadcast exchange operator Feb 7, 2016

Move broadcast relation retrieval.

c12c8e6

davies reviewed Feb 8, 2016

Remove codegen from broadcast.

c7dd7ae

Merge remote-tracking branch 'spark/master' into SPARK-13136

e847383

# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala

rxin reviewed Feb 11, 2016

hvanhovell mentioned this pull request Feb 20, 2016

[SPARK-13306] [SQL] uncorrelated scalar subquery #11190

Closed

rxin reviewed Feb 20, 2016

hvanhovell added 2 commits February 21, 2016 00:07

Revert renaming of variabels in LeftSemiJoinBNL.

9d52650

Revert renaming of variabels in LeftSemiJoinBNL.

54b558d

rxin reviewed Feb 21, 2016

hvanhovell added 3 commits February 21, 2016 14:27

Move all exchange related operators into the exchange package.

f33d2cb

CR

28363c8

Merge remote-tracking branch 'apache-github/master' into SPARK-13136

f812a31

hvanhovell added 2 commits February 21, 2016 15:54

put broadcast mode in a separate file.

4b5978b

Fix style in sqlcontext.

c8c175e

asfgit closed this in b6a873d Feb 21, 2016
