Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest #7162

Closed
wants to merge 4 commits into from

Conversation

kai-zeng
Copy link

@kai-zeng kai-zeng commented Jul 1, 2015

This pull request
(1) extracts common functions used by hash outer joins and put it in interface HashOuterJoin
(2) adds ShuffledHashOuterJoin and BroadcastHashOuterJoin
(3) adds test cases for shuffled and broadcast hash outer join
(3) makes SparkPlanTest to support binary or more complex operators, and fixes bugs in plan composition in SparkPlanTest

@kai-zeng kai-zeng changed the title (1) Add broadcast hash outer join, (2) Fix SparkPlanTest [SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest Jul 1, 2015
@SparkQA
Copy link

SparkQA commented Jul 1, 2015

Test build #36283 has started for PR 7162 at commit b5a4efa.

* @param expectedAnswer the expected result in a [[Seq]] of [[Product]]s.
*/
protected def checkAnswer[A <: Product : TypeTag](
left: DataFrame,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: these parameters need to be indented more.

@tarekbecker
Copy link
Contributor

I assume there is a Jira ticket for this, can you add it to the title?

@kai-zeng kai-zeng changed the title [SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest [SPARK-4485][SQL] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest Jul 1, 2015
@SparkQA
Copy link

SparkQA commented Jul 1, 2015

Test build #36285 has started for PR 7162 at commit dc5127e.

@kai-zeng
Copy link
Author

kai-zeng commented Jul 1, 2015

cc @marmbrus

joins.HashOuterJoin(
leftKeys, rightKeys, joinType, condition, planLater(left), planLater(right)) :: Nil
joinType match {
case LeftOuter if sqlContext.conf.autoBroadcastJoinThreshold > 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use CanBroadcast here

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36360 has started for PR 7162 at commit 14e4bf8.

val input: Array[InternalRow] = buildPlan.execute().map(_.copy()).collect()
// buildHashTable uses code-generated rows as keys, which are not serializable
val hashed = new GeneralHashedRelation(
buildHashTable(input.iterator, newProjection(buildKeys, buildPlan.output)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to use new InterpretedProjection here, otherwise you try to broadcast code-generated SpecificRows, which fails when in non-local mode. See: #7213.

@davies / @rxin , I'm now officially in favor of removing GenerateProjection.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marmbrus Yeah, that's why I used GeneralHashedRelation to wrap the hash table. Which way do you think is better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still fails for me when I run it on a real cluster. I'd just change this to buildHashTable(input.iterator, new InterpretedProjection(buildKeys, buildPlan.output))) or we might even just change newProjection to always use InterpretedProjection.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@SparkQA
Copy link

SparkQA commented Jul 4, 2015

Test build #36517 has started for PR 7162 at commit 3742359.

@marmbrus
Copy link
Contributor

marmbrus commented Jul 6, 2015

Here are some performance results:

screen shot 2015-07-06 at 12 46 32 pm

Pretty significant speedups when one of the tables is much larger.

object BroadcastHashOuterJoin {

private val broadcastHashOuterJoinExecutionContext = ExecutionContext.fromExecutorService(
ThreadUtils.newDaemonCachedThreadPool("broadcast-hash-outer-join", 128))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be reasonable to have a single threadpool that we share for all broadcasting.

@marmbrus
Copy link
Contributor

marmbrus commented Jul 6, 2015

I'm going to go ahead and merge this to unblock other work on the plan tests. We can take care of threadpool consolidation in a followup. Thanks!

@asfgit asfgit closed this in 2471c0b Jul 6, 2015
case plan: SparkPlan =>
val inputMap = plan.children.flatMap(_.output).zipWithIndex.map {
case (a, i) =>
(a.name, BoundReference(i, a.dataType, a.nullable))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kai-zeng, why did you remove this code which creates BoundReferences? In the other half of the diff below, it looks like the new code is only mapping from the attribute's name back to the attribute itself rather than binding the reference. This change caused a test case in my sorting patch to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing this new code to continue to generate BoundReferences, the test in this patch fails:

[info]    == Exception ==
[info]    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1888.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1888.0 (TID 6596, localhost): java.lang.ArrayIndexOutOfBoundsException: 2
[info]      at org.apache.spark.sql.catalyst.expressions.ArrayBackedRow$class.apply(rows.scala:88)
[info]      at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.apply(rows.scala:144)
[info]      at org.apache.spark.sql.Row$class.isNullAt(Row.scala:182)
[info]      at org.apache.spark.sql.catalyst.InternalRow.isNullAt(InternalRow.scala:28)
[info]      at SC$SpecificProjection.apply(Unknown Source)
[info]      at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$6$$anonfun$apply$3.apply(Exchange.scala:166)
[info]      at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$6$$anonfun$apply$3.apply(Exchange.scala:166)
[info]      at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
[info]      at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
[info]      at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
[info]      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
[info]      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
[info]      at org.apache.spark.scheduler.Task.run(Task.scala:70)
[info]      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
[info]      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info]      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info]      at java.lang.Thread.run(Thread.java:745)

/cc @marmbrus

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshRosen Hi Josh. I still prefer my way of resolving attributes, for two reasons:
(1) References are bound in each operator, that's certainly something we should test. So in my opinion, we shouldn't bind the references manually in the test suite.
(2) Manually binding the references isn't good for operators with two or more inputs. For these operators, there actually could be different ways to binding references depending on which implementation is used. The old implementation SparkPlanTest cannot handle operators with two or more inputs. We can certainly fix the old implementation by fix binding for binary operators, but that's gonna be tedious later, say if we are gona change some implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kai-zeng, I ended up discussing offline with Michael and reached the same conclusion. Thanks for your help!

JoshRosen added a commit to JoshRosen/spark that referenced this pull request Jul 6, 2015
JoshRosen added a commit to JoshRosen/spark that referenced this pull request Jul 7, 2015
JoshRosen added a commit that referenced this pull request Jul 11, 2015
This patch adds a cache-friendly external sorter which operates on serialized bytes and uses this sorter to implement a new sort operator for Spark SQL and DataFrames.

### Overview of the new sorter

The new sorter design is inspired by [Alphasort](http://research.microsoft.com/pubs/68249/alphasort.doc) and implements a key-prefix optimization in order to improve the cache friendliness of the sort.  In naive sort implementations, the sorting algorithm operates on an array of record pointers.  To compare two records for ordering, the sorter must dereference these pointers, which likely involves random memory access, then compare the objects themselves.

![image](https://cloud.githubusercontent.com/assets/50748/8611390/3b1402ae-2675-11e5-8308-1a10bf347e6e.png)

In a key-prefix sort, the sort operates on an array which stores the record pointer alongside a prefix of the record's key. When comparing two records for ordering, the sorter first compares the the stored key prefixes. If the ordering can be determined from the key prefixes (i.e. the prefixes are unequal), then the sort can avoid directly comparing the records, avoiding random memory accesses and full record comparisons. For example, if we're sorting a list of strings then we can store the first 8 bytes of the UTF-8 encoded string as the key-prefix and can perform unsigned byte-at-a-time comparisons to determine the ordering of strings based on their prefixes, only resorting to full comparisons for strings that share a common prefix.  In cases where the sort key can fit entirely in the space allotted for the key prefix (e.g. the sorting key is an integer), we completely avoid direct record comparison.

In this patch's implementation of key-prefix sorting, our sorter's internal array stores a 64-bit long and 64-bit pointer for each record being sorted. The key prefixes are generated by the user when inserting records into the sorter, which uses a user-defined comparison function for comparing them.  The `PrefixComparators` object implements a set of comparators for many common types, including primitive numeric types and UTF-8 strings.

The actual sorting is implemented by `UnsafeInMemorySorter`.  Most consumers will not use this directly, but instead will use `UnsafeExternalSorter`, a class which implements a sort that can spill to disk in response to memory pressure.  Internally, `UnsafeExternalSorter` creates `UnsafeInMemorySorters` to perform sorting and uses `UnsafeSortSpillReader/Writer` to spill and read back runs of sorted records and `UnsafeSortSpillMerger` to merge multiple sorted spills into a single sorted iterator.  This external sorter integrates with Spark's existing ShuffleMemoryManager for controlling spilling.

Many parts of this sorter's design are based on / copied from the more specialized external sort implementation that I designed for the new UnsafeShuffleManager write path; see #5868 for more details on that patch.

### Sorting rows in Spark SQL

For now, `UnsafeExternalSorter` is only used by Spark SQL, which uses it to implement a new sort operator, `UnsafeExternalSort`.  This sort operator uses a SQL-specific class called `UnsafeExternalRowSorter` that configures an `UnsafeExternalSorter` to use prefix generators and comparators that operate on rows encoded in the UnsafeRow format that was designed for Project Tungsten.

I used some interesting unit-testing techniques to test this patch's SQL-specific components.  `UnsafeExternalSortSuite` uses the SQL random data generators introduced in #7176 to test the UnsafeSort operator with all atomic types both with and without nullability and in both ascending and descending sort orders.  `PrefixComparatorsSuite` contains a cool use of ScalaCheck + ScalaTest's `GeneratorDrivenPropertyChecks` in order to test UTF8String prefix comparison.

### Misc. additional improvements made in this patch

This patch made several miscellaneous improvements to related code in Spark SQL:

- The logic for selecting physical sort operator implementations, which was partially duplicated in both `Exchange` and `SparkStrategies, has now been consolidated into a `getSortOperator()` helper function in `SparkStrategies`.
- The `SparkPlanTest` unit testing helper trait has been extended with new methods for comparing the output produced by two different physical plans. This makes it easy to write tests which assert that two physical operator implementations should produce the same output.  I also added a method for disabling the implicit sorting of outputs prior to comparing them, a change which is necessary in order to be able to write proper SparkPlan tests for sort operators.

### Tasks deferred to followup patches

While most of this patch's features are reasonably well-tested and complete, there are a number of tasks that are intentionally being deferred to followup patches:

- Add tests which mock the ShuffleMemoryManager to check that memory pressure properly triggers spilling (there are examples of this type of test in #5868).
- Add tests to ensure that spill files are properly cleaned up after errors.  I'd like to do this in the context of a patch which introduces more general metrics for ensuring proper cleanup of tasks' temporary files; see https://issues.apache.org/jira/browse/SPARK-8966 for more details.
- Metrics integration: there are some open questions regarding how to track / report spill metrics for non-shuffle operations, so I've deferred most of the IO / shuffle metrics integration for now.
- Performance profiling.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/6444)
<!-- Reviewable:end -->

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6444 from JoshRosen/sql-external-sort and squashes the following commits:

6beb467 [Josh Rosen] Remove a bunch of overloaded methods to avoid default args. issue
2bbac9c [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort
35dad9f [Josh Rosen] Make sortAnswers = false the default in SparkPlanTest
5135200 [Josh Rosen] Fix spill reading for large rows; add test
2f48777 [Josh Rosen] Add test and fix bug for sorting empty arrays
d1e28bc [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort
cd05866 [Josh Rosen] Fix scalastyle
3947fc1 [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort
d13ac55 [Josh Rosen] Hacky approach to copying of UnsafeRows for sort followed by limit.
845bea3 [Josh Rosen] Remove unnecessary zeroing of row conversion buffer
c56ec18 [Josh Rosen] Clean up final row copying code.
d31f180 [Josh Rosen] Re-enable NullType sorting test now that SPARK-8868 is fixed
844f4ca [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort
293f109 [Josh Rosen] Add missing license header.
f99a612 [Josh Rosen] Fix bugs in string prefix comparison.
9d00afc [Josh Rosen] Clean up prefix comparators for integral types
88aff18 [Josh Rosen] NULL_PREFIX has to be negative infinity for floating point types
613e16f [Josh Rosen] Test with larger data.
1d7ffaa [Josh Rosen] Somewhat hacky fix for descending sorts
08701e7 [Josh Rosen] Fix prefix comparison of null primitives.
b86e684 [Josh Rosen] Set global = true in UnsafeExternalSortSuite.
1c7bad8 [Josh Rosen] Make sorting of answers explicit in SparkPlanTest.checkAnswer().
b81a920 [Josh Rosen] Temporarily enable only the passing sort tests
5d6109d [Josh Rosen] Fix inconsistent handling / encoding of record lengths.
87b6ed9 [Josh Rosen] Fix critical issues in test which led to false negatives.
8d7fbe7 [Josh Rosen] Fixes to multiple spilling-related bugs.
82e21c1 [Josh Rosen] Force spilling in UnsafeExternalSortSuite.
88b72db [Josh Rosen] Test ascending and descending sort orders.
f27be09 [Josh Rosen] Fix tests by binding attributes.
0a79d39 [Josh Rosen] Revert "Undo part of a SparkPlanTest change in #7162 that broke my test."
7c3c864 [Josh Rosen] Undo part of a SparkPlanTest change in #7162 that broke my test.
9969c14 [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort
5822e6f [Josh Rosen] Fix test compilation issue
939f824 [Josh Rosen] Remove code gen experiment.
0dfe919 [Josh Rosen] Implement prefix sort for strings (albeit inefficiently).
66a813e [Josh Rosen] Prefix comparators for float and double
b310c88 [Josh Rosen] Integrate prefix comparators for Int and Long (others coming soon)
95058d9 [Josh Rosen] Add missing SortPrefixUtils file
4c37ba6 [Josh Rosen] Add tests for sorting on all primitive types.
6890863 [Josh Rosen] Fix memory leak on empty inputs.
d246e29 [Josh Rosen] Fix consideration of column types when choosing sort implementation.
6b156fb [Josh Rosen] Some WIP work on prefix comparison.
7f875f9 [Josh Rosen] Commit failing test demonstrating bug in handling objects in spills
41b8881 [Josh Rosen] Get UnsafeInMemorySorterSuite to pass (WIP)
90c2b6a [Josh Rosen] Update test name
6d6a1e6 [Josh Rosen] Centralize logic for picking sort operator implementations
9869ec2 [Josh Rosen] Clean up Exchange code a bit
82bb0ec [Josh Rosen] Fix IntelliJ complaint due to negated if condition
1db845a [Josh Rosen] Many more changes to harmonize with shuffle sorter
ebf9eea [Josh Rosen] Harmonization with shuffle's unsafe sorter
206bfa2 [Josh Rosen] Add some missing newlines at the ends of files
26c8931 [Josh Rosen] Back out some Hive changes that aren't needed anymore
62f0bb8 [Josh Rosen] Update to reflect SparkPlanTest changes
21d7d93 [Josh Rosen] Back out of BlockObjectWriter change
7eafecf [Josh Rosen] Port test to SparkPlanTest
d468a88 [Josh Rosen] Update for InternalRow refactoring
269cf86 [Josh Rosen] Back out SMJ operator change; isolate changes to selection of sort op.
1b841ca [Josh Rosen] WIP towards copying
b420a71 [Josh Rosen] Move most of the existing SMJ code into Java.
dfdb93f [Josh Rosen] SparkFunSuite change
73cc761 [Josh Rosen] Fix whitespace
9cc98f5 [Josh Rosen] Move more code to Java; fix bugs in UnsafeRowConverter length type.
c8792de [Josh Rosen] Remove some debug logging
dda6752 [Josh Rosen] Commit some missing code from an old git stash.
58f36d0 [Josh Rosen] Merge in a sketch of a unit test for the new sorter (now failing).
2bd8c9a [Josh Rosen] Import my original tests and get them to pass.
d5d3106 [Josh Rosen] WIP towards external sorter for Spark SQL.
import org.apache.spark.sql.catalyst.plans.{FullOuter, JoinType, LeftOuter, RightOuter}
import org.apache.spark.sql.execution.{BinaryNode, SparkPlan}

import scala.collection.JavaConversions._
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kai-zeng I'm in the process of refactoring some parts of this file as part of another PR of mine and wanted to briefly call out two minor performance considerations that I noticed while looking at this code.

The first is the use of JavaConversions here, which allows for implicit conversion between Java and Scala classes. This leads to the creation of an unnecessary per-row MapWrapper in FullOuter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I agree with your first concern. It is really necessary to refactor the outer join as now we have both shuffled and broadcast outer join.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants