[SQL][SPARK-2212]Hash Outer Join #1147

chenghao-intel · 2014-06-20T05:54:15Z

This patch adds support for hash-based outer joins. Currently, outer joins on big relations fall back to `BroadcastNestedLoopJoin`, which is super slow. For each partition, this PR builds hash tables for both relations, which greatly reduces the number of table scans.
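To make the idea concrete, here is a minimal sketch of a hash-based full outer join over a single partition, written with plain Scala collections rather than Spark's APIs. The `Row` alias, the `HashOuterJoinSketch` object and its methods are hypothetical; a missing side is modeled as `None` instead of a null row, and the optional join condition that the real operator also evaluates is omitted.

```scala
// Hypothetical sketch with plain Scala collections, not Spark's HashOuterJoin.
object HashOuterJoinSketch {
  // Stand-in for a row: (join key, value).
  type Row = (String, String)

  // Hash one side of a co-partitioned input by its join key.
  def buildHashTable(rows: Seq[Row]): Map[String, Seq[Row]] =
    rows.groupBy(_._1)

  // Full outer join of ONE partition: both sides are hashed, then every key
  // seen on either side is visited exactly once (no repeated scans).
  def fullOuterJoin(left: Seq[Row], right: Seq[Row]): Seq[(Option[Row], Option[Row])] = {
    val leftTable  = buildHashTable(left)
    val rightTable = buildHashTable(right)
    // Sorted only to make the output order deterministic.
    val allKeys = (leftTable.keySet ++ rightTable.keySet).toSeq.sorted
    allKeys.flatMap { key =>
      (leftTable.get(key), rightTable.get(key)) match {
        case (Some(ls), Some(rs)) => for (l <- ls; r <- rs) yield (Option(l), Option(r))
        case (Some(ls), None)     => ls.map(l => (Option(l), Option.empty[Row]))
        case (None, Some(rs))     => rs.map(r => (Option.empty[Row], Option(r)))
        case (None, None)         => Seq.empty[(Option[Row], Option[Row])]
      }
    }
  }
}
```

Keys present on only one side are emitted once with the other side padded, which is what distinguishes the full outer join from an inner hash join.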

Here is the testing code that I used:

```scala
package org.apache.spark.sql.hive

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql._

case class Record(key: String, value: String)

object JoinTablePrepare extends App {
  import TestHive2._

  val rdd = sparkContext.parallelize((1 to 3000000).map(i => Record(s"${i % 828193}", s"val_$i")))

  runSqlHive("SHOW TABLES")
  runSqlHive("DROP TABLE if exists a")
  runSqlHive("DROP TABLE if exists b")
  runSqlHive("DROP TABLE if exists result")
  rdd.registerAsTable("records")

  runSqlHive("""CREATE TABLE a (key STRING, value STRING)
                 | ROW FORMAT SERDE
                 | 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
                 | STORED AS RCFILE
               """.stripMargin)
  runSqlHive("""CREATE TABLE b (key STRING, value STRING)
                 | ROW FORMAT SERDE
                 | 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
                 | STORED AS RCFILE
               """.stripMargin)
  runSqlHive("""CREATE TABLE result (key STRING, value STRING)
                 | ROW FORMAT SERDE
                 | 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
                 | STORED AS RCFILE
               """.stripMargin)

  hql(s"""from records
             | insert into table a
             | select key, value
           """.stripMargin)
  hql(s"""from records
             | insert into table b select key + 100000, value
           """.stripMargin)
}

object JoinTablePerformanceTest extends App {
  import TestHive2._

  hql("SHOW TABLES")
  hql("set spark.sql.shuffle.partitions=20")

  val leftOuterJoin = "insert overwrite table result select a.key, b.value from a left outer join b on a.key=b.key"
  val rightOuterJoin = "insert overwrite table result select a.key, b.value from a right outer join b on a.key=b.key"
  val fullOuterJoin = "insert overwrite table result select a.key, b.value from a full outer join b on a.key=b.key"

  val results = ("LeftOuterJoin", benchmark(leftOuterJoin)) :: ("LeftOuterJoin", benchmark(leftOuterJoin)) ::
                ("RightOuterJoin", benchmark(rightOuterJoin)) :: ("RightOuterJoin", benchmark(rightOuterJoin)) ::
                ("FullOuterJoin", benchmark(fullOuterJoin)) :: ("FullOuterJoin", benchmark(fullOuterJoin)) :: Nil
  val explains = hql(s"explain $leftOuterJoin").collect ++ hql(s"explain $rightOuterJoin").collect ++ hql(s"explain $fullOuterJoin").collect
  println(explains.mkString(",\n"))
  results.foreach { case (prompt, result) =>
    println(s"$prompt: took ${result._1} ms (${result._2} records)")
  }

  def benchmark(cmd: String) = {
    val begin = System.currentTimeMillis()
    val result = hql(cmd)
    val end = System.currentTimeMillis()
    val count = hql("select count(1) from result").collect.mkString("")
    ((end - begin), count)
  }
}
```

The results are shown below:

```
[Physical execution plan:],
[InsertIntoHiveTable (MetastoreRelation default, result, None), Map(), true],
[ Project [key#95,value#98]],
[  HashOuterJoin [key#95], [key#97], LeftOuter, None],
[   Exchange (HashPartitioning [key#95], 20)],
[    HiveTableScan [key#95], (MetastoreRelation default, a, None), None],
[   Exchange (HashPartitioning [key#97], 20)],
[    HiveTableScan [key#97,value#98], (MetastoreRelation default, b, None), None],
[Physical execution plan:],
[InsertIntoHiveTable (MetastoreRelation default, result, None), Map(), true],
[ Project [key#102,value#105]],
[  HashOuterJoin [key#102], [key#104], RightOuter, None],
[   Exchange (HashPartitioning [key#102], 20)],
[    HiveTableScan [key#102], (MetastoreRelation default, a, None), None],
[   Exchange (HashPartitioning [key#104], 20)],
[    HiveTableScan [key#104,value#105], (MetastoreRelation default, b, None), None],
[Physical execution plan:],
[InsertIntoHiveTable (MetastoreRelation default, result, None), Map(), true],
[ Project [key#109,value#112]],
[  HashOuterJoin [key#109], [key#111], FullOuter, None],
[   Exchange (HashPartitioning [key#109], 20)],
[    HiveTableScan [key#109], (MetastoreRelation default, a, None), None],
[   Exchange (HashPartitioning [key#111], 20)],
[    HiveTableScan [key#111,value#112], (MetastoreRelation default, b, None), None]
LeftOuterJoin: took 16072 ms ([3000000] records)
LeftOuterJoin: took 14394 ms ([3000000] records)
RightOuterJoin: took 14802 ms ([3000000] records)
RightOuterJoin: took 14747 ms ([3000000] records)
FullOuterJoin: took 17715 ms ([6000000] records)
FullOuterJoin: took 17629 ms ([6000000] records)
```
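In the plans above, each side is shuffled by `Exchange (HashPartitioning [key], 20)`, so equal keys from `a` and `b` land in the same partition; that is what allows the hash tables to be built per partition. The sketch below illustrates this co-location with a non-negative `hashCode` modulo, the same idea behind Spark's `HashPartitioner`. `CoPartitionSketch` is hypothetical, and Spark SQL hashes the projected key row, so its actual partition numbers differ.

```scala
// Hypothetical sketch of hash partitioning; Spark SQL hashes the projected
// key row, so actual partition numbers will differ from these.
object CoPartitionSketch {
  // Mirrors `spark.sql.shuffle.partitions=20` from the benchmark.
  val NumPartitions = 20

  // Non-negative modulo: String.hashCode can be negative.
  def partitionOf(key: String, numPartitions: Int = NumPartitions): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // "Shuffle" one side: group its keys by target partition.
  def shuffle(keys: Seq[String], numPartitions: Int = NumPartitions): Map[Int, Seq[String]] =
    keys.groupBy(k => partitionOf(k, numPartitions))
}
```

Because both sides use the same function, any key present on both sides is found in partition `partitionOf(key)` of each, and no cross-partition lookups are needed.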

Without this PR, the benchmark never seems to finish.
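For contrast, a nested-loop outer join compares every left row against every right row. The hypothetical `NestedLoopSketch` below (plain Scala, not Spark's `BroadcastNestedLoopJoin`) counts those comparisons. With 3,000,000 rows on each side, as in the benchmark, the same loop would make 3,000,000 × 3,000,000 = 9 × 10^12 comparisons before any parallelism, versus roughly one hash-table insert or probe per row for the hash-based join.

```scala
// Hypothetical cost illustration in plain Scala, not Spark's BroadcastNestedLoopJoin.
object NestedLoopSketch {
  type Row = (String, String) // (join key, value)

  // Naive left outer join: each left row scans ALL right rows.
  // Returns the joined rows and the number of key comparisons made.
  def leftOuterJoin(left: Seq[Row], right: Seq[Row]): (Seq[(Row, Option[Row])], Long) = {
    var comparisons = 0L
    val joined = left.flatMap { l =>
      val matches = right.filter { r => comparisons += 1; r._1 == l._1 }
      if (matches.isEmpty) Seq((l, Option.empty[Row]))
      else matches.map(r => (l, Option(r)))
    }
    (joined, comparisons)
  }
}
```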

AmplabJenkins · 2014-06-20T05:54:57Z

Merged build triggered.

AmplabJenkins · 2014-06-20T05:55:05Z

Merged build started.

AmplabJenkins · 2014-06-20T05:56:32Z

Merged build finished.

AmplabJenkins · 2014-06-20T05:56:32Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15942/

chenghao-intel · 2014-06-20T06:28:29Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala

@@ -114,48 +94,27 @@ object HashFilteredJoin extends Logging with PredicateHelper {
    (JoinType, Seq[Expression], Seq[Expression], Option[Expression], LogicalPlan, LogicalPlan)

  def unapply(plan: LogicalPlan): Option[ReturnType] = plan match {


Removed the join filter push-down logic, since the PushPredicateThroughJoin rule in the Catalyst Optimizer already pushes these filters down.
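The idea behind pushing filters through a join can be sketched as splitting the conjuncts of the WHERE/ON clause by which side's columns they reference. The `PushDownSketch` model below is hypothetical and far simpler than Catalyst's `PushPredicateThroughJoin`; for outer joins only some of these conjuncts can safely be evaluated below the join, and this sketch does not make that decision.

```scala
// Hypothetical mini model, NOT Catalyst's Expression/PushPredicateThroughJoin.
object PushDownSketch {
  // One conjunct of a WHERE/ON clause and the columns it references.
  final case class Predicate(sql: String, references: Set[String])

  // Split conjuncts into (left-only, right-only, needs-both-sides).
  // Left-only/right-only conjuncts are candidates to be evaluated below the
  // join; the rest must stay as the join condition.
  def split(
      conjuncts: Seq[Predicate],
      leftColumns: Set[String],
      rightColumns: Set[String]): (Seq[Predicate], Seq[Predicate], Seq[Predicate]) = {
    val (leftOnly, rest)  = conjuncts.partition(_.references.subsetOf(leftColumns))
    val (rightOnly, both) = rest.partition(_.references.subsetOf(rightColumns))
    (leftOnly, rightOnly, both)
  }
}
```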

marmbrus · 2014-06-21T19:37:43Z

I think it would be much better if this PR just added support for LeftOuter (and maybe RightOuter too?) to HashJoin instead of rewriting all of our join planning code.

chenghao-intel · 2014-06-23T02:12:25Z

Thank you all for the comments; I will change some of the code accordingly.
This PR actually contains two related parts:

Code refactoring for Join
- Removed FilteredOperation from patterns.scala, because the filters (WHERE condition and JOIN condition) are already pushed down by PushPredicateThroughJoin in logical.Optimizer.scala. Dropping the combination of filters (WHERE and JOIN conditions) makes the join pattern matching cleaner and simpler.
- Pattern matching order is critical for join operator selection in SparkStrategies.scala, so I merged the 3 join strategies into 1.
- The trait BinaryJoinNode can be used by HashJoin / SortMergeJoin (to be implemented soon) / CartesianProduct (inner join) / map-side join (Left/Inner/LeftSemi, assuming the right table is the build table) for all join types; if we want to add code generation for the join condition, BinaryJoinNode is the only thing we need to modify.
Add Outer Join Support for HashJoin
- With BinaryJoinNode, adding hash-based outer join support is easy.

Sorry, this PR changes a lot of code and makes the overall logic hard to follow.
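A hypothetical sketch of the kind of shared trait described in the list above: the join condition and the padding rules for each join type live in one place, and a concrete operator only supplies the rows that share a key. `BinaryJoinNodeSketch` and the `JoinType` objects are illustrative stand-ins, not the trait from this PR or Catalyst's classes. Unmatched rows are tracked by index so that duplicate rows are handled correctly.

```scala
// Hypothetical sketch; not the BinaryJoinNode trait from this PR, nor Catalyst's JoinType.
object JoinSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  trait BinaryJoinNodeSketch[L, R] {
    def joinType: JoinType
    // Residual join condition, evaluated in exactly one place.
    def condition: (L, R) => Boolean

    // Join all left and right rows that share one join key.
    def joinKeyGroup(ls: IndexedSeq[L], rs: IndexedSeq[R]): Seq[(Option[L], Option[R])] = {
      // Matching (leftIndex, rightIndex) pairs; indices keep duplicate rows distinct.
      val pairs = for {
        i <- ls.indices
        j <- rs.indices
        if condition(ls(i), rs(j))
      } yield (i, j)
      val matchedLeft  = pairs.map(_._1).toSet
      val matchedRight = pairs.map(_._2).toSet

      val inner    = pairs.map { case (i, j) => (Option(ls(i)), Option(rs(j))) }
      val leftPad  = ls.indices.filterNot(matchedLeft).map(i => (Option(ls(i)), Option.empty[R]))
      val rightPad = rs.indices.filterNot(matchedRight).map(j => (Option.empty[L], Option(rs(j))))

      joinType match {
        case Inner      => inner
        case LeftOuter  => inner ++ leftPad
        case RightOuter => inner ++ rightPad
        case FullOuter  => inner ++ leftPad ++ rightPad
      }
    }
  }
}
```

With this layout, adding an outer join type only touches the final `match`, which is the kind of locality the comment above argues for.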

chenghao-intel · 2014-06-25T00:33:54Z

I am planning to split this PR into 3 smaller ones; 2 of them (#1187 and #1190) are done, and I will update this PR as the 3rd one once the previous ones have been merged.

chenghao-intel · 2014-06-25T00:44:54Z

BTW, it would be great if #1163 could be merged before I start working on this update.

AmplabJenkins · 2014-06-27T02:50:25Z

Merged build triggered.

AmplabJenkins · 2014-06-27T02:50:35Z

Merged build started.

AmplabJenkins · 2014-06-27T04:37:41Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-27T04:37:41Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16189/

chenghao-intel · 2014-06-27T05:09:36Z

Thanks @marmbrus for merging some of the dependent PRs. I've also updated both the code and the description of this PR accordingly; some open issues are listed in the PR description, and we can probably discuss them during the code review.

SparkQA · 2014-07-31T04:38:56Z

QA tests have started for PR 1147. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17551/consoleFull

chenghao-intel · 2014-07-31T04:47:21Z

I've updated the code by rebasing onto the latest master. This PR greatly improves outer join performance for big tables; see the local benchmark in the description.

marmbrus · 2014-08-01T01:49:04Z

sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala

+  var left: UnresolvedRelation = _
+  var right: UnresolvedRelation = _
+
+  override def beforeEach() {


I think we can remove left, right, beforeEach, afterEach.

marmbrus · 2014-08-01T02:00:19Z

Thanks a lot for working on this! I agree that it would be great to merge this in before 1.1. I'm a little worried about how much memory this is going to require. However, the current implementation is so bad, and I think what you have done here is strictly better, so we can probably just merge this in and then revisit.

marmbrus · 2014-08-01T02:01:10Z

I also really like how isolated this change is and the inclusion of the benchmark :)

chenghao-intel · 2014-08-01T07:37:41Z

Thank you @marmbrus I've updated the code as suggested.

SparkQA · 2014-08-01T07:39:30Z

QA tests have started for PR 1147. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17663/consoleFull

SparkQA · 2014-08-01T08:56:05Z

QA results for PR 1147:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class HashOuterJoin(

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17663/consoleFull

marmbrus · 2014-08-01T18:18:28Z

sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala

+ */
+object HashOuterJoin {
+  val DUMMY_LIST = Seq[Row](null)
+  val EMPTY_LIST = Seq[Row]()


Please use Seq.empty[Row] inline instead of this variable.
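The suggestion amounts to writing the empty case inline at the lookup site. Below is a minimal sketch of the probe side of a left outer join, assuming a hypothetical `Row` alias and a prebuilt right-side hash table; the `null` padding plays the role of `DUMMY_LIST = Seq[Row](null)` above.

```scala
// Hypothetical sketch; `Row` stands in for Spark's Row and the table is prebuilt.
object InlineEmptySketch {
  type Row = (String, String) // (join key, value)

  // Probe side of a left outer join: the empty case is written inline as
  // Seq.empty[Row] instead of referring to a shared EMPTY_LIST constant.
  def probeLeftOuter(left: Row, rightTable: Map[String, Seq[Row]]): Seq[(Row, Row)] = {
    val matches = rightTable.getOrElse(left._1, Seq.empty[Row])
    if (matches.isEmpty) Seq((left, null: Row)) // no match: pad the right side with null
    else matches.map(right => (left, right))
  }
}
```

`Seq.empty[Row]` does not allocate a new collection on each call in the standard library, so the inline form costs nothing extra while keeping the intent visible where it is used.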

marmbrus · 2014-08-01T18:26:53Z

I'm going to go ahead and merge this so we can have it in 1.1. It would be great if you could address the final readability concerns in a follow up PR.

Thanks again for implementing this :) We have had a few questions about it on the mailing list!

chenghao-intel · 2014-08-02T00:36:47Z

Thank you very much @marmbrus, I will create a follow-up PR for the improvements. :)

…for HashOuterJoin

This is a follow-up for #1147; this PR improves performance by about 10% - 15% in my local tests.

```
Before:
LeftOuterJoin: took 16750 ms ([3000000] records)
LeftOuterJoin: took 15179 ms ([3000000] records)
RightOuterJoin: took 15515 ms ([3000000] records)
RightOuterJoin: took 15276 ms ([3000000] records)
FullOuterJoin: took 19150 ms ([6000000] records)
FullOuterJoin: took 18935 ms ([6000000] records)
After:
LeftOuterJoin: took 15218 ms ([3000000] records)
LeftOuterJoin: took 13503 ms ([3000000] records)
RightOuterJoin: took 13663 ms ([3000000] records)
RightOuterJoin: took 14025 ms ([3000000] records)
FullOuterJoin: took 16624 ms ([6000000] records)
FullOuterJoin: took 16578 ms ([6000000] records)
```

Besides the performance improvement, I also did some cleanup as suggested in #1147.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #1765 from chenghao-intel/hash_outer_join_fixing and squashes the following commits:

ab1f9e0 [Cheng Hao] Reduce the memory copy while building the hashmap

(cherry picked from commit 5d54d71)
Signed-off-by: Michael Armbrust <michael@databricks.com>
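One way to read "Reduce the memory copy while building the hashmap" is sketched below; this is an assumption about the general technique, not the code from #1765. Appending each row to an immutable, `List`-backed `Seq` copies the values already stored for that key, whereas keeping one mutable `ArrayBuffer` per key appends in place. `BuildHashTableSketch` and its methods are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical sketch; not the HashOuterJoin code from #1765.
object BuildHashTableSketch {
  type Row = (String, String) // (join key, value)

  // Copy-heavy: `:+` on the default List-backed Seq copies the existing
  // values for the key on every append (quadratic for hot keys).
  def buildByCopying(rows: Seq[Row]): Map[String, Seq[Row]] =
    rows.foldLeft(Map.empty[String, Seq[Row]]) { (table, row) =>
      table.updated(row._1, table.getOrElse(row._1, Seq.empty[Row]) :+ row)
    }

  // Lower-copy: one mutable buffer per key, appended in place in a single pass.
  def buildInPlace(rows: Iterator[Row]): mutable.HashMap[String, mutable.ArrayBuffer[Row]] = {
    val table = mutable.HashMap.empty[String, mutable.ArrayBuffer[Row]]
    rows.foreach { row =>
      table.getOrElseUpdate(row._1, mutable.ArrayBuffer.empty[Row]) += row
    }
    table
  }
}
```

Both versions produce the same grouping; the second also consumes an `Iterator`, which matches how a partition's rows are streamed.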


Author: Cheng Hao <hao.cheng@intel.com>

Closes apache#1147 from chenghao-intel/hash_based_outer_join and squashes the following commits:

65c599e [Cheng Hao] Fix issues with the community comments
72b1394 [Cheng Hao] Fix bug of stale value in joinedRow
55baef7 [Cheng Hao] Add HashOuterJoin


chenghao-intel reviewed Jun 20, 2014

chenghao-intel changed the title ~~[SQL][SPARK-2212]HashJoin(Shuffled)~~ [SQL][SPARK-2212]Hash Outer Join Jun 27, 2014

Add HashOuterJoin

55baef7

marmbrus reviewed Aug 1, 2014

Fix issues with the community comments

65c599e

marmbrus reviewed Aug 1, 2014

asfgit closed this in 4415722 Aug 1, 2014

chenghao-intel deleted the hash_based_outer_join branch August 4, 2014 03:11

chenghao-intel mentioned this pull request Aug 4, 2014

[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap for HashOuterJoin #1765

Closed

wangyum pushed a commit that referenced this pull request May 26, 2023

[CARMEL-6375] Support Potential Skewed Operator Tagging (#1147)

c25fdd6
