[SPARK-24556][SQL] Always rewrite output partitioning in ReusedExchangeExec and InMemoryTableScanExec #21564
Conversation
…also when child's partitioning is RangePartitioning
@cloud-fan @viirya @gatorsmile, could you help review this?
```diff
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
```
Not sure why `RangePartitioning` wasn't included at first.
```diff
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
```
Why not just `updateAttribute(r)`?

Moreover, in order to avoid the same issue in the future with other cases, have you considered doing something like `updateAttribute(relation.cachedPlan.outputPartitioning)`?
Not all `Partitioning` are `Expression`. Only `HashPartitioning` and `RangePartitioning` are.
Yes, you're right @viirya, thanks. Then I'd propose something like:

```scala
relation.cachedPlan.outputPartitioning match {
  case e: Expression => updateAttribute(e)
  case other => other
}
```

What do you think?
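To make the proposed `case e: Expression` pattern concrete, here is a minimal, self-contained sketch. The types below are hypothetical mocks (plain strings stand in for Catalyst attributes, and these are not the real Spark classes): any partitioning that is also an expression gets its attributes rewritten, everything else passes through unchanged.

```scala
// Hypothetical, simplified stand-ins for the Catalyst types involved.
sealed trait Partitioning
trait Expression { def rewriteAttrs(f: String => String): Expression }

case class HashPartitioning(cols: Seq[String]) extends Partitioning with Expression {
  def rewriteAttrs(f: String => String): Expression = HashPartitioning(cols.map(f))
}
case class RangePartitioning(cols: Seq[String]) extends Partitioning with Expression {
  def rewriteAttrs(f: String => String): Expression = RangePartitioning(cols.map(f))
}
// SinglePartition is a real example of a Partitioning that is not an Expression.
case object SinglePartition extends Partitioning

// The proposed rewrite: match on Expression, update attributes, pass others through.
def updateAttribute(p: Partitioning, attrMap: Map[String, String]): Partitioning =
  p match {
    case e: Expression =>
      e.rewriteAttrs(a => attrMap.getOrElse(a, a)).asInstanceOf[Partitioning]
    case other => other
  }
```

With these mock types, `updateAttribute(HashPartitioning(Seq("i#5")), Map("i#5" -> "i#54"))` yields `HashPartitioning(Seq("i#54"))`, while `SinglePartition` is returned untouched.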
I think `PartitioningCollection` is for an operator that has multiple children. `BroadcastPartitioning` is not `Expression`.
Hmm, `HashPartitioning` and `RangePartitioning` can affect later sorting and shuffling, but for `BroadcastPartitioning` there seems to be not much benefit.
`PartitioningCollection` should be considered. Like the case below:

```scala
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
spark.conf.set("spark.sql.codegen.wholeStage", false)
val df1 = Seq(1 -> "a", 3 -> "c", 2 -> "b").toDF("i", "j").as("t1")
val df2 = Seq(1 -> "a", 3 -> "c", 2 -> "b").toDF("m", "n").as("t2")
val d = df1.join(df2, $"t1.i" === $"t2.m")
d.cache
val d1 = d.as("t3")
val d2 = d.as("t4")
d1.join(d2, $"t3.i" === $"t4.i").explain
```

```
SortMergeJoin [i#5], [i#54], Inner
:- InMemoryTableScan [i#5, j#6, m#15, n#16]
:     +- InMemoryRelation [i#5, j#6, m#15, n#16], CachedRDDBuilder
:           +- SortMergeJoin [i#5], [m#15], Inner
:              :- Sort [i#5 ASC NULLS FIRST], false, 0
:              :  +- Exchange hashpartitioning(i#5, 10)
:              :     +- LocalTableScan [i#5, j#6]
:              +- Sort [m#15 ASC NULLS FIRST], false, 0
:                 +- Exchange hashpartitioning(m#15, 10)
:                    +- LocalTableScan [m#15, n#16]
+- Sort [i#54 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(i#54, 10)
      +- InMemoryTableScan [i#54, j#55, m#58, n#59]
            +- InMemoryRelation [i#54, j#55, m#58, n#59], CachedRDDBuilder
                  +- SortMergeJoin [i#5], [m#15], Inner
                     :- Sort [i#5 ASC NULLS FIRST], false, 0
                     :  +- Exchange hashpartitioning(i#5, 10)
                     :     +- LocalTableScan [i#5, j#6]
                     +- Sort [m#15 ASC NULLS FIRST], false, 0
                        +- Exchange hashpartitioning(m#15, 10)
                           +- LocalTableScan [m#15, n#16]
```

`Exchange hashpartitioning(i#54, 10)` is an extra shuffle.

What do you think?
For `PartitioningCollection`, I think it is harder to treat it like `HashPartitioning` and `RangePartitioning` when replacing attributes.

In the above example, the `PartitioningCollection` contains `HashPartitioning(i#5)` and `HashPartitioning(m#15)`, and the output of `InMemoryRelation` is `[i#54, j#55, m#58, n#59]`. Can we still replace attributes based on the location of the attribute in the output?
@viirya From `updateAttribute`, `relation.cachedPlan.output` and `relation.output` map one to one:

```scala
private def updateAttribute(expr: Expression): Expression = {
  ...
  val attrMap = AttributeMap(relation.cachedPlan.output.zip(relation.output))
  ...
}
```

It means `[i#54, j#55, m#58, n#59]` corresponds to `[i#5, j#6, m#15, n#16]`, so we can always replace `HashPartitioning(i#5)` with `HashPartitioning(i#54)`.

Any idea?
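The positional mapping described above can be sketched in a few lines of self-contained Scala. This is a hypothetical illustration using plain strings in place of Catalyst attributes; the real code builds an `AttributeMap` from `relation.cachedPlan.output.zip(relation.output)`.

```scala
// The cached (child) plan's output and the InMemoryRelation's output,
// as attribute names from the example plan above.
val cachedPlanOutput = Seq("i#5", "j#6", "m#15", "n#16")
val relationOutput   = Seq("i#54", "j#55", "m#58", "n#59")

// A positional zip gives the substitution map, the analogue of
// AttributeMap(relation.cachedPlan.output.zip(relation.output)).
val attrMap: Map[String, String] = cachedPlanOutput.zip(relationOutput).toMap

// Rewriting a partitioning's attributes: anything found in the map is
// substituted, unknown attributes are left untouched.
def rewrite(attrs: Seq[String]): Seq[String] = attrs.map(a => attrMap.getOrElse(a, a))
```

For example, `rewrite(Seq("i#5"))` gives `Seq("i#54")`, which is why `HashPartitioning(i#5)` can always be rewritten to `HashPartitioning(i#54)` regardless of which `Partitioning` subclass carries the attribute.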
Looks correct.
Test build #91829 has finished for PR 21564 at commit
@viirya I think
@mgaido91 I updated the code as per your suggestion, thanks!
@yucai thanks, can you please also add more UTs in order to cover all the possible cases? Thanks.
Test build #91855 has finished for PR 21564 at commit
Test build #91856 has finished for PR 21564 at commit
```diff
@@ -2270,4 +2270,15 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     val mapWithBinaryKey = map(lit(Array[Byte](1.toByte)), lit(1))
     checkAnswer(spark.range(1).select(mapWithBinaryKey.getItem(Array[Byte](1.toByte))), Row(1))
   }
+
+  test("SPARK-24556: ReusedExchange should rewrite output partitioning for RangePartitioning") {
```
this is not an end-to-end test, let's put it in `PlannerSuite` and also test cached table.
please also mention cached table in PR title
LGTM except the test
```diff
@@ -70,7 +70,7 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchan
   }

   override def outputPartitioning: Partitioning = child.outputPartitioning match {
-    case h: HashPartitioning => h.copy(expressions = h.expressions.map(updateAttr))
+    case e: Expression => updateAttr(e).asInstanceOf[Partitioning]
     case other => other
```
LGTM
```scala
// ReusedExchange is RangePartitioning
val df9 = Seq(1 -> "a").toDF("i", "j").orderBy($"i")
val df10 = Seq(1 -> "a").toDF("i", "j").orderBy($"i")
```
Seems this test can be simplified. For example, the only difference between df3, df4 and df9, df10 is `persist`. You can just define the dataframes once and reuse them.
```scala
checkInMemoryTableScanOutputPartitioningRewrite(df3.union(df4), classOf[RangePartitioning])

// InMemoryTableScan is PartitioningCollection
withSQLConf("spark.sql.autoBroadcastJoinThreshold" -> "0") {
```
nit: please use `SQLConf` instead of the plain string (and the value here I think should be `-1`)
```scala
checkInMemoryTableScanOutputPartitioningRewrite(df1.union(df2), classOf[HashPartitioning])

// InMemoryTableScan is RangePartitioning
val df3 = Seq(1 -> "a").toDF("i", "j").orderBy($"i").persist()
```
probably a `spark.range` is enough instead of creating a df and ordering it
I want `RangePartitioning` here, so I'm using `orderBy`.
I see, but if you use `spark.range` you get `RangePartitioning` as well, without the need for a sort operation.
I just have an update of tests, feel free to let me know if you are OK with the new version.
I am OK apart from this comment, which is still unresolved in the new version. Instead of doing an unneeded sort, we can simply have a `Range` operation, which has `RangePartitioning` as its output partitioning.
ok, updated.
why didn't you just set:

```scala
val df3 = spark.range ...
val df4 = spark.range ...
```

but instead left them as before and then changed the other place where they were used?
They are different: in the `ReusedExchange` case we need a shuffle, so we need `orderBy`, while for `InMemoryTableScan` we can use `spark.range` directly, right?
```diff
+// ReusedExchange is RangePartitioning
+val df3 = Seq(1 -> "a").toDF("i", "j").orderBy($"i")
+val df4 = Seq(1 -> "a").toDF("i", "j").orderBy($"i")
+checkReusedExchangeOutputPartitioningRewrite(df3.union(df4), classOf[RangePartitioning])
+
+// InMemoryTableScan is RangePartitioning
+val df7 = spark.range(1, 100, 1, 10).toDF().persist()
+val df8 = spark.range(1, 100, 1, 10).toDF().persist()
+checkInMemoryTableScanOutputPartitioningRewrite(df7.union(df8), classOf[RangePartitioning])
```
oh, sure, sorry, thanks.
Test build #91968 has finished for PR 21564 at commit
LGTM
Thanks for fixing this! LGTM
Test build #91970 has finished for PR 21564 at commit
Test build #91972 has finished for PR 21564 at commit
```scala
// InMemoryTableScan is HashPartitioning
val df5 = df1.persist()
val df6 = df2.persist()
checkInMemoryTableScanOutputPartitioningRewrite(df5.union(df6), classOf[HashPartitioning])
```
why do we need to test table cache with union?
I want to make sure both caches have the right output partitioning. So should we test only the second cached table?
union is used to trigger exchange reuse, but it's unnecessary to test cache.
@cloud-fan thanks for reviewing, tests have been updated.
```scala
test("SPARK-24556: always rewrite output partitioning in ReusedExchangeExec " +
    "and InMemoryTableScanExec") {
  def checkOutputPartitioningRewrite(
      plans: Seq[SparkPlan],
```
now we can take a single spark plan
What do you think about merging the `check*OutputPartitioningRewrite` helpers together?

```scala
def checkPlanAndOutputPartitioningRewrite(
    df: DataFrame,
    expectedPlanClass: Class[_],
    expectedPartitioningClass: Class[_]): Unit = {
  val plans = df.queryExecution.executedPlan.collect {
    case r: ReusedExchangeExec => r
    case m: InMemoryTableScanExec => m
  }
  assert(plans.size == 1)
  val plan = plans.head
  assert(plan.getClass == expectedPlanClass)
  val partitioning = plan.outputPartitioning
  assert(partitioning.getClass == expectedPartitioningClass)
  val partitionedAttrs = partitioning.asInstanceOf[Expression].references
  assert(partitionedAttrs.subsetOf(plan.outputSet))
}
```
@cloud-fan I still use `Seq`, so I can make `checkReusedExchangeOutputPartitioningRewrite` and `checkInMemoryTableScanOutputPartitioningRewrite` simpler. Kindly let me know if you have a better idea.
```scala
checkReusedExchangeOutputPartitioningRewrite(df3.union(df4), classOf[RangePartitioning])

// InMemoryTableScan is HashPartitioning
df1.persist()
```
I feel it's better to not reuse the dataframe that were used to test ReuseExchange
Agree, I also like a new one :).
Test build #92068 has finished for PR 21564 at commit
retest this please.
Test build #92074 has finished for PR 21564 at commit
retest this please.
Test build #92080 has finished for PR 21564 at commit
Test build #92082 has finished for PR 21564 at commit
thanks, merging to master!
What changes were proposed in this pull request?
Currently, ReusedExchange and InMemoryTableScanExec only rewrite output partitioning if the child's partitioning is HashPartitioning, and do nothing for other partitionings, e.g., RangePartitioning. We should always rewrite it; otherwise, an unnecessary shuffle could be introduced, as in https://issues.apache.org/jira/browse/SPARK-24556.
How was this patch tested?
Add new tests.