[SPARK-22238] Fix plan resolution bug caused by EnsureStatefulOpPartitioning #19467

brkyvz · 2017-10-10T22:07:19Z

What changes were proposed in this pull request?

In EnsureStatefulOpPartitioning, we check that the inputRDD to a SparkPlan has the expected partitioning for Streaming Stateful Operators. The problem is that we are not allowed to access this information during planning.
We added that check because CoalesceExec could actually create RDDs with 0 partitions. We should fix it so that when CoalesceExec reports SinglePartition, the inputRDD in fact has 1 partition instead of 0.
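To illustrate the mismatch described above, here is a minimal sketch in plain Scala with no Spark dependency. It only models partition counts: `naiveCoalesce` and `guardedCoalesce` are hypothetical names, and the assumption is that a no-shuffle coalesce can never produce more partitions than its child has.

```scala
object CoalesceSketch {
  // Simplified model (assumption): a no-shuffle coalesce only merges existing
  // partitions, so the result is capped by the child's partition count.
  def naiveCoalesce(childPartitions: Int, requested: Int): Int =
    math.min(childPartitions, requested)

  // Guarded variant in the spirit of the patch: when a single partition is
  // requested and the child has none, report one (empty) partition so the
  // physical result agrees with the planned SinglePartition.
  def guardedCoalesce(childPartitions: Int, requested: Int): Int =
    if (requested == 1 && childPartitions < 1) 1
    else naiveCoalesce(childPartitions, requested)

  def main(args: Array[String]): Unit = {
    println(naiveCoalesce(childPartitions = 0, requested = 1))   // 0: disagrees with SinglePartition
    println(guardedCoalesce(childPartitions = 0, requested = 1)) // 1: agrees with SinglePartition
  }
}
```

The guard only touches the empty-child case; any other input falls through to the unguarded behaviour.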

How was this patch tested?

Regression test in StreamingQuerySuite

SparkQA · 2017-10-11T00:42:58Z

Test build #82607 has finished for PR 19467 at commit 961ade1.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class EmptyRDDWithPartitions(
case class SimplePartition(index: Int) extends Partition
case class EnsureStatefulOpPartitioning(conf: SQLConf) extends Rule[SparkPlan]

SparkQA · 2017-10-11T01:43:05Z

Test build #82609 has finished for PR 19467 at commit 549b882.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

brkyvz · 2017-10-11T17:36:19Z

cc @tdas

tdas · 2017-10-11T22:25:29Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala

+    }
+  }
+
+  case class SimplePartition(index: Int) extends Partition


nit: EmptyPartition? Isn't that more descriptive than "simple"?

tdas

Added a few minor points.
Major point (offline discussion): the right way to do this is to codify the requirement for a fixed number of partitions as a required child distribution, and let EnsureRequirements take care of it.

tdas · 2017-10-11T22:29:32Z

sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala

  // Needs to be transformUp to avoid extra shuffles
  override def apply(plan: SparkPlan): SparkPlan = plan transformUp {
    case so: StatefulOperator =>
-      val numPartitions = plan.sqlContext.sessionState.conf.numShufflePartitions
+      val numPartitions = conf.numShufflePartitions


Why this change? Doesn't the plan have the same context and conf?

tdas · 2017-10-11T22:37:52Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala

@@ -590,10 +590,33 @@ case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecN
  }

  protected override def doExecute(): RDD[InternalRow] = {
-    child.execute().coalesce(numPartitions, shuffle = false)
+    if (numPartitions == 1 && child.execute().getNumPartitions < 1) {


Add a test in DatasetSuite that covers this empty RDD case, maybe in the same test as the existing coalesce test.

the existing tests for the original problem should catch it

SparkQA · 2017-10-12T17:56:21Z

Test build #82689 has finished for PR 19467 at commit 70211ca.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ClusteredDistribution(

SparkQA · 2017-10-12T17:56:30Z

Test build #82690 has finished for PR 19467 at commit 48d1f25.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class EmptyPartition(index: Int) extends Partition

SparkQA · 2017-10-12T20:12:00Z

Test build #82699 has finished for PR 19467 at commit 3f51c5c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-12T23:14:19Z

Test build #82704 has finished for PR 19467 at commit 407d76c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-12T23:24:08Z

Test build #82705 has finished for PR 19467 at commit 5122117.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
trait StatefulOperatorTest

tdas

Almost LGTM assuming tests pass. Just a few nits.

tdas · 2017-10-12T20:32:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala

-      expressions.forall(x => requiredClustering.exists(_.semanticEquals(x)))
+    case ClusteredDistribution(requiredClustering, desiredPartitions) =>
+      expressions.forall(x => requiredClustering.exists(_.semanticEquals(x))) &&
+        desiredPartitions.forall(_ == numPartitions) // if desiredPartition = true, returns true


// if desiredPartitions is None, return true
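The corrected comment rests on how `Option.forall` behaves: it is vacuously true for `None`. A minimal sketch of the check, using simplified stand-ins for the Catalyst classes (expressions are plain strings here, and the field name `desiredPartitions` is borrowed from the pattern variable in the diff, so both are assumptions for illustration):

```scala
// Simplified stand-in for the distribution in the diff (assumption: string keys).
final case class ClusteredDistribution(
    clustering: Seq[String],
    desiredPartitions: Option[Int] = None)

// Simplified stand-in for a hash partitioning over some keys.
final case class HashPartitioning(expressions: Seq[String], numPartitions: Int) {
  def satisfies(required: ClusteredDistribution): Boolean =
    expressions.forall(e => required.clustering.contains(e)) &&
      // Option.forall returns true for None, so an unspecified partition
      // count never rejects an otherwise matching partitioning.
      required.desiredPartitions.forall(_ == numPartitions)
}

object SatisfiesSketch {
  def main(args: Array[String]): Unit = {
    val part = HashPartitioning(Seq("key"), numPartitions = 5)
    println(part.satisfies(ClusteredDistribution(Seq("key"))))           // true
    println(part.satisfies(ClusteredDistribution(Seq("key"), Some(5))))  // true
    println(part.satisfies(ClusteredDistribution(Seq("key"), Some(10)))) // false
  }
}
```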

tdas · 2017-10-12T20:40:41Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

@@ -50,7 +50,8 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
      numPartitions: Int): Partitioning = {
    requiredDistribution match {
      case AllTuples => SinglePartition
-      case ClusteredDistribution(clustering) => HashPartitioning(clustering, numPartitions)
+      case ClusteredDistribution(clustering, desiredPartitions) =>


Update the Scaladoc to say that the numPartitions param is used only if the distribution does not specify a partition count.
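A sketch of the fallback this comment asks to document, again with simplified stand-ins rather than the real Catalyst types. The diff above is truncated after the new `case` line, so the `getOrElse` body and the parameter name `defaultNumPartitions` are assumptions, not the patch itself:

```scala
sealed trait Distribution
case object AllTuples extends Distribution
final case class ClusteredDistribution(
    clustering: Seq[String],
    desiredPartitions: Option[Int] = None) extends Distribution

sealed trait Partitioning
case object SinglePartition extends Partitioning
final case class HashPartitioning(expressions: Seq[String], numPartitions: Int)
  extends Partitioning

object CreatePartitioningSketch {
  /**
   * Builds a partitioning that satisfies `required`.
   *
   * @param defaultNumPartitions used only when the distribution itself does
   *                             not specify a partition count
   */
  def createPartitioning(required: Distribution, defaultNumPartitions: Int): Partitioning =
    required match {
      case AllTuples => SinglePartition
      case ClusteredDistribution(clustering, desiredPartitions) =>
        // A count carried by the distribution wins over the default.
        HashPartitioning(clustering, desiredPartitions.getOrElse(defaultNumPartitions))
    }
}
```

With this shape, a stateful operator can pin its partition count through its required distribution, and the default shuffle partition count applies only to operators that leave it unspecified.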

tdas · 2017-10-12T20:50:02Z

sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala

@@ -43,10 +43,11 @@ case class StatefulOperatorStateInfo(
    checkpointLocation: String,
    queryRunId: UUID,
    operatorId: Long,
-    storeVersion: Long) {
+    storeVersion: Long,
+    numPartitions: Int) {


tdas · 2017-10-12T21:00:18Z

sql/core/src/test/scala/org/apache/spark/sql/streaming/EnsureStatefulOpPartitioningSuite.scala

@@ -53,7 +53,7 @@ class EnsureStatefulOpPartitioningSuite extends SparkPlanTest with SharedSQLCont
  test("ClusteredDistribution with coalesce(1) generates Exchange with HashPartitioning") {
    testEnsureStatefulOpPartitioning(
      baseDf.coalesce(1).queryExecution.sparkPlan,
-      requiredDistribution = keys => ClusteredDistribution(keys),


This test suite does not make sense as this rule does not exist anymore. So if we add tests in the related PlannerSuite to test the new addition in EnsureRequirements and Partitioning, then we will only need to test whether each stateful operator specifies the numPartitions in its required distribution.

tdas · 2017-10-12T23:19:59Z

sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala

@@ -214,7 +214,7 @@ class StateStoreRDDSuite extends SparkFunSuite with BeforeAndAfter with BeforeAn
      path: String,
      queryRunId: UUID = UUID.randomUUID,
      version: Int = 0): StatefulOperatorStateInfo = {
-    StatefulOperatorStateInfo(path, queryRunId, operatorId = 0, version)


super nit: numPartitions = 5

tdas · 2017-10-12T23:24:37Z

sql/core/src/test/scala/org/apache/spark/sql/streaming/StatefulOperatorTest.scala

+  protected def checkChildOutputPartitioning[T <: StatefulOperator](
+      sq: StreamingQuery,
+      colNames: Seq[String],
+      numPartitions: Option[Int] = None): Boolean = {


numPartitions is never used.

tdas · 2017-10-12T23:29:17Z

Whoops, I missed one comment. Not LGTM. Need tests in PlannerSuite that check whether EnsureRequirements respects numPartitions in ClusteredDistribution.

tdas · 2017-10-13T21:39:42Z

LGTM, assuming tests pass.

SparkQA · 2017-10-13T22:56:00Z

Test build #82747 has finished for PR 19467 at commit 84ac2d8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-14T02:30:00Z

Test build #82749 has finished for PR 19467 at commit 971f579.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tdas · 2017-10-15T00:24:52Z

Merging to master, thanks for fixing this.

brkyvz added 3 commits October 10, 2017 15:02

Fix plan resolution bug caused by EnsureStatefulOpPartitioning

961ade1

minor additions

a8db9ad

add jira ticket to test

549b882

tdas reviewed Oct 11, 2017

tdas suggested changes Oct 12, 2017

brkyvz added 2 commits October 12, 2017 09:01

address comments

70211ca

add coalesce test

48d1f25

Fix test

3f51c5c

brkyvz added 2 commits October 12, 2017 13:27

savE

407d76c

refactor tests

5122117

tdas approved these changes Oct 12, 2017

address

84ac2d8

fix tests

971f579

asfgit closed this in e8547ff Oct 15, 2017

brkyvz deleted the stateful-op branch February 3, 2019 20:53
