[SPARK-28560][SQL][followup] change the local shuffle reader from leaf node to unary node #26250

JkSelf · 2019-10-25T04:03:51Z

What changes were proposed in this pull request?

Why are the changes needed?

When LocalShuffleReaderExec is a leaf node, there is a potential issue: the leaf node hides the running query stage, so an unfinished query stage is treated as a finished one when its parent query stage is created.
This PR changes LocalShuffleReaderExec from a leaf node to a unary node.
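
To illustrate the issue described above, here is a minimal toy sketch. It does not use Spark's classes: `Node`, `Stage`, `LeafReader`, `UnaryReader`, `reachableStages`, and `allStagesFinished` are hypothetical names invented for this example, and the "all reachable stages are finished" check is only an assumed simplification of how a parent stage decides it can be created.

```scala
object LeafVsUnarySketch {
  // Toy stand-in for a physical plan node; NOT Spark's SparkPlan.
  sealed trait Node { def children: Seq[Node] }

  // A query stage that may still be running.
  final case class Stage(id: Int, finished: Boolean) extends Node {
    val children: Seq[Node] = Nil
  }

  // Leaf-style reader: keeps the stage as a field but exposes no children.
  final case class LeafReader(stage: Stage) extends Node {
    val children: Seq[Node] = Nil
  }

  // Unary-style reader: the stage is its single child.
  final case class UnaryReader(child: Stage) extends Node {
    val children: Seq[Node] = Seq(child)
  }

  // Collect every stage reachable by walking `children`.
  def reachableStages(plan: Node): Seq[Stage] = plan match {
    case s: Stage => Seq(s)
    case other    => other.children.flatMap(reachableStages)
  }

  // Assumed rule: a parent may be built only if all reachable stages are finished.
  def allStagesFinished(plan: Node): Boolean =
    reachableStages(plan).forall(_.finished)
}
```

In this sketch, a traversal through `LeafReader` never sees the wrapped stage, so a running stage looks finished, while the same traversal through `UnaryReader` finds it.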

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

JkSelf · 2019-10-25T04:09:38Z

@cloud-fan @maryannxue Please help me review. Thanks.

SparkQA · 2019-10-25T06:52:00Z

Test build #112645 has finished for PR 26250 at commit 06847fe.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class LocalShuffleReaderExec(child: QueryStageExec) extends UnaryExecNode

JkSelf · 2019-10-25T07:04:25Z

The failed test may not be related. Please help re-run the tests. Thanks.

cloud-fan · 2019-10-25T13:33:06Z

retest it please

cloud-fan · 2019-10-25T13:34:17Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala

@@ -129,7 +129,8 @@ abstract class QueryStageExec extends LeafExecNode {
 */
 case class ShuffleQueryStageExec(
    override val id: Int,
-    override val plan: ShuffleExchangeExec) extends QueryStageExec {
+    override val plan: ShuffleExchangeExec,
+    var isLocalShuffle: Boolean = false) extends QueryStageExec {


Let's avoid using mutable state when it is not necessary.

cloud-fan · 2019-10-25T13:41:42Z

...core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala

@@ -70,7 +70,8 @@ case class ReduceNumShufflePartitions(conf: SQLConf) extends Rule[SparkPlan] {
    }
    // ShuffleExchanges introduced by repartition do not support changing the number of partitions.
    // We change the number of partitions in the stage only if all the ShuffleExchanges support it.
-    if (!shuffleStages.forall(_.plan.canChangeNumPartitions)) {


We can change the logic of collecting shuffle stages:

def collectShuffleStages(plan: SparkPlan): Seq[ShuffleQueryStageExec] = plan match {
  case _: LocalShuffleReaderExec => Nil
  case stage: ShuffleQueryStageExec => Seq(stage)
  case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) => Seq(stage)
  case _ => plan.children.flatMap(collectShuffleStages)
}
val shuffleStages = collectShuffleStages(plan)
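
The traversal suggested in this comment can be tried in isolation with toy node types. The sketch below is only an approximation under stated assumptions: `Plan`, `ShuffleStage`, `ReusedStage`, `LocalReader`, and `Join` are hypothetical stand-ins, not the real Spark classes, and `ReusedStage` takes a single argument here purely for brevity.

```scala
object CollectStagesSketch {
  // Hypothetical, simplified node types; the real classes live in Spark.
  sealed trait Plan { def children: Seq[Plan] }

  final case class ShuffleStage(id: Int) extends Plan {
    val children: Seq[Plan] = Nil
  }
  final case class ReusedStage(stage: Plan) extends Plan {
    val children: Seq[Plan] = Nil
  }
  final case class LocalReader(child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }
  final case class Join(left: Plan, right: Plan) extends Plan {
    val children: Seq[Plan] = Seq(left, right)
  }

  // Same shape as the suggested function: stop at a local reader,
  // pick up plain and reused shuffle stages, recurse everywhere else.
  def collectShuffleStages(plan: Plan): Seq[ShuffleStage] = plan match {
    case _: LocalReader                   => Nil
    case stage: ShuffleStage              => Seq(stage)
    case ReusedStage(stage: ShuffleStage) => Seq(stage)
    case _                                => plan.children.flatMap(collectShuffleStages)
  }
}
```

Because the `LocalReader` case comes first and returns `Nil`, any shuffle stage beneath a local reader is skipped, which is the point of the suggestion: those stages are excluded without needing a mutable flag on the stage itself.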

maryannxue · 2019-10-25T17:17:09Z

We should also remove the changes to AdaptiveSparkPlanHelper: https://github.com/apache/spark/pull/25295/files#diff-ece2c98b3d95a3adcb9e08f8b9d45f11R128

JkSelf · 2019-10-28T02:25:02Z

@cloud-fan @maryannxue I have addressed the comments. Please help review again. Thanks.

SparkQA · 2019-10-28T05:59:11Z

Test build #112748 has finished for PR 26250 at commit b5eb977.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-10-28T06:24:05Z

thanks, merging to master!

change the leaf node to unary node about local shuffle reader exec

06847fe

JkSelf mentioned this pull request Oct 25, 2019

[SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution #25295

Closed

cloud-fan reviewed Oct 25, 2019

dongjoon-hyun added the SQL label Oct 25, 2019

resolve the comments

b5eb977

cloud-fan closed this in 50cf484 Oct 28, 2019
