[SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest #7162

kai-zeng · 2015-07-01T17:30:00Z

This pull request
(1) extracts common functions used by hash outer joins and put it in interface HashOuterJoin
(2) adds ShuffledHashOuterJoin and BroadcastHashOuterJoin
(3) adds test cases for shuffled and broadcast hash outer join
(3) makes SparkPlanTest to support binary or more complex operators, and fixes bugs in plan composition in SparkPlanTest

SparkQA · 2015-07-01T17:36:28Z

Test build #36283 has started for PR 7162 at commit b5a4efa.

JoshRosen · 2015-07-01T17:38:13Z

sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanTest.scala

+   * @param expectedAnswer the expected result in a [[Seq]] of [[Product]]s.
+   */
+  protected def checkAnswer[A <: Product : TypeTag](
+    left: DataFrame,


Style: these parameters need to be indented more.

tarekbecker · 2015-07-01T17:41:00Z

I assume there is a Jira ticket for this, can you add it to the title?

SparkQA · 2015-07-01T17:49:39Z

Test build #36285 has started for PR 7162 at commit dc5127e.

kai-zeng · 2015-07-01T22:12:44Z

cc @marmbrus

rxin · 2015-07-02T02:23:44Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

-        joins.HashOuterJoin(
-          leftKeys, rightKeys, joinType, condition, planLater(left), planLater(right)) :: Nil
+        joinType match {
+          case LeftOuter if sqlContext.conf.autoBroadcastJoinThreshold > 0 &&


can you use CanBroadcast here

SparkQA · 2015-07-02T07:14:36Z

Test build #36360 has started for PR 7162 at commit 14e4bf8.

marmbrus · 2015-07-03T23:09:45Z

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala

+    val input: Array[InternalRow] = buildPlan.execute().map(_.copy()).collect()
+    // buildHashTable uses code-generated rows as keys, which are not serializable
+    val hashed = new GeneralHashedRelation(
+      buildHashTable(input.iterator, newProjection(buildKeys, buildPlan.output)))


I think you need to use new InterpretedProjection here, otherwise you try to broadcast code-generated SpecificRows, which fails when in non-local mode. See: #7213.

@davies / @rxin , I'm now officially in favor of removing GenerateProjection.

@marmbrus Yeah, that's why I used GeneralHashedRelation to wrap the hash table. Which way do you think is better?

This still fails for me when I run it on a real cluster. I'd just change this to buildHashTable(input.iterator, new InterpretedProjection(buildKeys, buildPlan.output))) or we might even just change newProjection to always use InterpretedProjection.

… relations

SparkQA · 2015-07-04T00:06:40Z

Test build #36517 has started for PR 7162 at commit 3742359.

marmbrus · 2015-07-06T19:52:03Z

Here are some performance results:

Pretty significant speedups when one of the tables is much larger.

marmbrus · 2015-07-06T19:53:20Z

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala

+object BroadcastHashOuterJoin {
+
+  private val broadcastHashOuterJoinExecutionContext = ExecutionContext.fromExecutorService(
+    ThreadUtils.newDaemonCachedThreadPool("broadcast-hash-outer-join", 128))


It would probably be reasonable to have a single threadpool that we share for all broadcasting.

marmbrus · 2015-07-06T21:33:09Z

I'm going to go ahead and merge this to unblock other work on the plan tests. We can take care of threadpool consolidation in a followup. Thanks!

JoshRosen · 2015-07-06T23:40:07Z

sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanTest.scala

-      case plan: SparkPlan =>
-        val inputMap = plan.children.flatMap(_.output).zipWithIndex.map {
-          case (a, i) =>
-            (a.name, BoundReference(i, a.dataType, a.nullable))


@kai-zeng, why did you remove this code which creates BoundReferences? In the other half of the diff below, it looks like the new code is only mapping from the attribute's name back to the attribute itself rather than binding the reference. This change caused a test case in my sorting patch to fail.

After changing this new code to continue to generate BoundReferences, the test in this patch fails:

[info] == Exception == [info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1888.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1888.0 (TID 6596, localhost): java.lang.ArrayIndexOutOfBoundsException: 2 [info] at org.apache.spark.sql.catalyst.expressions.ArrayBackedRow$class.apply(rows.scala:88) [info] at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.apply(rows.scala:144) [info] at org.apache.spark.sql.Row$class.isNullAt(Row.scala:182) [info] at org.apache.spark.sql.catalyst.InternalRow.isNullAt(InternalRow.scala:28) [info] at SC$SpecificProjection.apply(Unknown Source) [info] at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$6$$anonfun$apply$3.apply(Exchange.scala:166) [info] at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$6$$anonfun$apply$3.apply(Exchange.scala:166) [info] at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) [info] at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119) [info] at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73) [info] at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) [info] at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) [info] at org.apache.spark.scheduler.Task.run(Task.scala:70) [info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [info] at java.lang.Thread.run(Thread.java:745)

/cc @marmbrus

@JoshRosen Hi Josh. I still prefer my way of resolving attributes, for two reasons:
(1) References are bound in each operator, that's certainly something we should test. So in my opinion, we shouldn't bind the references manually in the test suite.
(2) Manually binding the references isn't good for operators with two or more inputs. For these operators, there actually could be different ways to binding references depending on which implementation is used. The old implementation SparkPlanTest cannot handle operators with two or more inputs. We can certainly fix the old implementation by fix binding for binary operators, but that's gonna be tedious later, say if we are gona change some implementation

@kai-zeng, I ended up discussing offline with Michael and reached the same conclusion. Thanks for your help!

… my test." This reverts commit 7c3c864.

This patch adds a cache-friendly external sorter which operates on serialized bytes and uses this sorter to implement a new sort operator for Spark SQL and DataFrames. ### Overview of the new sorter The new sorter design is inspired by [Alphasort](http://research.microsoft.com/pubs/68249/alphasort.doc) and implements a key-prefix optimization in order to improve the cache friendliness of the sort. In naive sort implementations, the sorting algorithm operates on an array of record pointers. To compare two records for ordering, the sorter must dereference these pointers, which likely involves random memory access, then compare the objects themselves. ![image](https://cloud.githubusercontent.com/assets/50748/8611390/3b1402ae-2675-11e5-8308-1a10bf347e6e.png) In a key-prefix sort, the sort operates on an array which stores the record pointer alongside a prefix of the record's key. When comparing two records for ordering, the sorter first compares the the stored key prefixes. If the ordering can be determined from the key prefixes (i.e. the prefixes are unequal), then the sort can avoid directly comparing the records, avoiding random memory accesses and full record comparisons. For example, if we're sorting a list of strings then we can store the first 8 bytes of the UTF-8 encoded string as the key-prefix and can perform unsigned byte-at-a-time comparisons to determine the ordering of strings based on their prefixes, only resorting to full comparisons for strings that share a common prefix. In cases where the sort key can fit entirely in the space allotted for the key prefix (e.g. the sorting key is an integer), we completely avoid direct record comparison. In this patch's implementation of key-prefix sorting, our sorter's internal array stores a 64-bit long and 64-bit pointer for each record being sorted. The key prefixes are generated by the user when inserting records into the sorter, which uses a user-defined comparison function for comparing them. The `PrefixComparators` object implements a set of comparators for many common types, including primitive numeric types and UTF-8 strings. The actual sorting is implemented by `UnsafeInMemorySorter`. Most consumers will not use this directly, but instead will use `UnsafeExternalSorter`, a class which implements a sort that can spill to disk in response to memory pressure. Internally, `UnsafeExternalSorter` creates `UnsafeInMemorySorters` to perform sorting and uses `UnsafeSortSpillReader/Writer` to spill and read back runs of sorted records and `UnsafeSortSpillMerger` to merge multiple sorted spills into a single sorted iterator. This external sorter integrates with Spark's existing ShuffleMemoryManager for controlling spilling. Many parts of this sorter's design are based on / copied from the more specialized external sort implementation that I designed for the new UnsafeShuffleManager write path; see #5868 for more details on that patch. ### Sorting rows in Spark SQL For now, `UnsafeExternalSorter` is only used by Spark SQL, which uses it to implement a new sort operator, `UnsafeExternalSort`. This sort operator uses a SQL-specific class called `UnsafeExternalRowSorter` that configures an `UnsafeExternalSorter` to use prefix generators and comparators that operate on rows encoded in the UnsafeRow format that was designed for Project Tungsten. I used some interesting unit-testing techniques to test this patch's SQL-specific components. `UnsafeExternalSortSuite` uses the SQL random data generators introduced in #7176 to test the UnsafeSort operator with all atomic types both with and without nullability and in both ascending and descending sort orders. `PrefixComparatorsSuite` contains a cool use of ScalaCheck + ScalaTest's `GeneratorDrivenPropertyChecks` in order to test UTF8String prefix comparison. ### Misc. additional improvements made in this patch This patch made several miscellaneous improvements to related code in Spark SQL: - The logic for selecting physical sort operator implementations, which was partially duplicated in both `Exchange` and `SparkStrategies, has now been consolidated into a `getSortOperator()` helper function in `SparkStrategies`. - The `SparkPlanTest` unit testing helper trait has been extended with new methods for comparing the output produced by two different physical plans. This makes it easy to write tests which assert that two physical operator implementations should produce the same output. I also added a method for disabling the implicit sorting of outputs prior to comparing them, a change which is necessary in order to be able to write proper SparkPlan tests for sort operators. ### Tasks deferred to followup patches While most of this patch's features are reasonably well-tested and complete, there are a number of tasks that are intentionally being deferred to followup patches: - Add tests which mock the ShuffleMemoryManager to check that memory pressure properly triggers spilling (there are examples of this type of test in #5868). - Add tests to ensure that spill files are properly cleaned up after errors. I'd like to do this in the context of a patch which introduces more general metrics for ensuring proper cleanup of tasks' temporary files; see https://issues.apache.org/jira/browse/SPARK-8966 for more details. - Metrics integration: there are some open questions regarding how to track / report spill metrics for non-shuffle operations, so I've deferred most of the IO / shuffle metrics integration for now. - Performance profiling.  [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/6444)  Author: Josh Rosen <joshrosen@databricks.com> Closes #6444 from JoshRosen/sql-external-sort and squashes the following commits: 6beb467 [Josh Rosen] Remove a bunch of overloaded methods to avoid default args. issue 2bbac9c [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort 35dad9f [Josh Rosen] Make sortAnswers = false the default in SparkPlanTest 5135200 [Josh Rosen] Fix spill reading for large rows; add test 2f48777 [Josh Rosen] Add test and fix bug for sorting empty arrays d1e28bc [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort cd05866 [Josh Rosen] Fix scalastyle 3947fc1 [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort d13ac55 [Josh Rosen] Hacky approach to copying of UnsafeRows for sort followed by limit. 845bea3 [Josh Rosen] Remove unnecessary zeroing of row conversion buffer c56ec18 [Josh Rosen] Clean up final row copying code. d31f180 [Josh Rosen] Re-enable NullType sorting test now that SPARK-8868 is fixed 844f4ca [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort 293f109 [Josh Rosen] Add missing license header. f99a612 [Josh Rosen] Fix bugs in string prefix comparison. 9d00afc [Josh Rosen] Clean up prefix comparators for integral types 88aff18 [Josh Rosen] NULL_PREFIX has to be negative infinity for floating point types 613e16f [Josh Rosen] Test with larger data. 1d7ffaa [Josh Rosen] Somewhat hacky fix for descending sorts 08701e7 [Josh Rosen] Fix prefix comparison of null primitives. b86e684 [Josh Rosen] Set global = true in UnsafeExternalSortSuite. 1c7bad8 [Josh Rosen] Make sorting of answers explicit in SparkPlanTest.checkAnswer(). b81a920 [Josh Rosen] Temporarily enable only the passing sort tests 5d6109d [Josh Rosen] Fix inconsistent handling / encoding of record lengths. 87b6ed9 [Josh Rosen] Fix critical issues in test which led to false negatives. 8d7fbe7 [Josh Rosen] Fixes to multiple spilling-related bugs. 82e21c1 [Josh Rosen] Force spilling in UnsafeExternalSortSuite. 88b72db [Josh Rosen] Test ascending and descending sort orders. f27be09 [Josh Rosen] Fix tests by binding attributes. 0a79d39 [Josh Rosen] Revert "Undo part of a SparkPlanTest change in #7162 that broke my test." 7c3c864 [Josh Rosen] Undo part of a SparkPlanTest change in #7162 that broke my test. 9969c14 [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort 5822e6f [Josh Rosen] Fix test compilation issue 939f824 [Josh Rosen] Remove code gen experiment. 0dfe919 [Josh Rosen] Implement prefix sort for strings (albeit inefficiently). 66a813e [Josh Rosen] Prefix comparators for float and double b310c88 [Josh Rosen] Integrate prefix comparators for Int and Long (others coming soon) 95058d9 [Josh Rosen] Add missing SortPrefixUtils file 4c37ba6 [Josh Rosen] Add tests for sorting on all primitive types. 6890863 [Josh Rosen] Fix memory leak on empty inputs. d246e29 [Josh Rosen] Fix consideration of column types when choosing sort implementation. 6b156fb [Josh Rosen] Some WIP work on prefix comparison. 7f875f9 [Josh Rosen] Commit failing test demonstrating bug in handling objects in spills 41b8881 [Josh Rosen] Get UnsafeInMemorySorterSuite to pass (WIP) 90c2b6a [Josh Rosen] Update test name 6d6a1e6 [Josh Rosen] Centralize logic for picking sort operator implementations 9869ec2 [Josh Rosen] Clean up Exchange code a bit 82bb0ec [Josh Rosen] Fix IntelliJ complaint due to negated if condition 1db845a [Josh Rosen] Many more changes to harmonize with shuffle sorter ebf9eea [Josh Rosen] Harmonization with shuffle's unsafe sorter 206bfa2 [Josh Rosen] Add some missing newlines at the ends of files 26c8931 [Josh Rosen] Back out some Hive changes that aren't needed anymore 62f0bb8 [Josh Rosen] Update to reflect SparkPlanTest changes 21d7d93 [Josh Rosen] Back out of BlockObjectWriter change 7eafecf [Josh Rosen] Port test to SparkPlanTest d468a88 [Josh Rosen] Update for InternalRow refactoring 269cf86 [Josh Rosen] Back out SMJ operator change; isolate changes to selection of sort op. 1b841ca [Josh Rosen] WIP towards copying b420a71 [Josh Rosen] Move most of the existing SMJ code into Java. dfdb93f [Josh Rosen] SparkFunSuite change 73cc761 [Josh Rosen] Fix whitespace 9cc98f5 [Josh Rosen] Move more code to Java; fix bugs in UnsafeRowConverter length type. c8792de [Josh Rosen] Remove some debug logging dda6752 [Josh Rosen] Commit some missing code from an old git stash. 58f36d0 [Josh Rosen] Merge in a sketch of a unit test for the new sorter (now failing). 2bd8c9a [Josh Rosen] Import my original tests and get them to pass. d5d3106 [Josh Rosen] WIP towards external sorter for Spark SQL.

JoshRosen · 2015-08-04T20:03:53Z

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashOuterJoin.scala

+import org.apache.spark.sql.catalyst.plans.{FullOuter, JoinType, LeftOuter, RightOuter}
+import org.apache.spark.sql.execution.{BinaryNode, SparkPlan}
+
+import scala.collection.JavaConversions._


@kai-zeng I'm in the process of refactoring some parts of this file as part of another PR of mine and wanted to briefly call out two minor performance considerations that I noticed while looking at this code.

The first is the use of JavaConversions here, which allows for implicit conversion between Java and Scala classes. This leads to the creation of an unnecessary per-row MapWrapper in FullOuter.

Yep. I agree with your first concern. It is really necessary to refactor the outer join as now we have both shuffled and broadcast outer join.

(1) Add broadcast hash outer join, (2) Fix SparkPlanTest

b5a4efa

kai-zeng changed the title ~~(1) Add broadcast hash outer join, (2) Fix SparkPlanTest~~ [SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest Jul 1, 2015

JoshRosen reviewed Jul 1, 2015
View reviewed changes

code style fixes

dc5127e

kai-zeng changed the title ~~[SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest~~ [SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest Jul 1, 2015

rxin reviewed Jul 2, 2015
View reviewed changes

Use CanBroadcast in broadcast outer join planning

14e4bf8

marmbrus reviewed Jul 3, 2015
View reviewed changes

Fix not-serializable exception for code-generated keys in broadcasted…

3742359

… relations

marmbrus reviewed Jul 6, 2015
View reviewed changes

asfgit closed this in 2471c0b Jul 6, 2015

JoshRosen reviewed Jul 6, 2015
View reviewed changes

JoshRosen added a commit to JoshRosen/spark that referenced this pull request Jul 6, 2015

Undo part of a SparkPlanTest change in apache#7162 that broke my test.

7c3c864

JoshRosen added a commit to JoshRosen/spark that referenced this pull request Jul 7, 2015

Revert "Undo part of a SparkPlanTest change in apache#7162 that broke…

0a79d39

… my test." This reverts commit 7c3c864.

JoshRosen reviewed Aug 4, 2015
View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest #7162

[SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest #7162

kai-zeng commented Jul 1, 2015

SparkQA commented Jul 1, 2015

JoshRosen Jul 1, 2015

tarekbecker commented Jul 1, 2015

SparkQA commented Jul 1, 2015

kai-zeng commented Jul 1, 2015

rxin Jul 2, 2015

SparkQA commented Jul 2, 2015

marmbrus Jul 3, 2015

kai-zeng Jul 3, 2015

marmbrus Jul 3, 2015

kai-zeng Jul 3, 2015

SparkQA commented Jul 4, 2015

marmbrus commented Jul 6, 2015

marmbrus Jul 6, 2015

marmbrus commented Jul 6, 2015

JoshRosen Jul 6, 2015

JoshRosen Jul 6, 2015

kai-zeng Jul 7, 2015

JoshRosen Jul 7, 2015

JoshRosen Aug 4, 2015

kai-zeng Aug 4, 2015

[SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest #7162

[SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest #7162

Conversation

kai-zeng commented Jul 1, 2015

SparkQA commented Jul 1, 2015

Choose a reason for hiding this comment

tarekbecker commented Jul 1, 2015

SparkQA commented Jul 1, 2015

kai-zeng commented Jul 1, 2015

Choose a reason for hiding this comment

SparkQA commented Jul 2, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SparkQA commented Jul 4, 2015

marmbrus commented Jul 6, 2015

Choose a reason for hiding this comment

marmbrus commented Jul 6, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment