[SPARK-23707][SQL] Don't need shuffle exchange with single partition for 'spark.range' #20844

ConeyLiu · 2018-03-16T08:12:54Z

What changes were proposed in this pull request?

Just like #20726: there is no need for an 'Exchange' when spark.range produces only one partition.

How was this patch tested?

New UT.

ConeyLiu · 2018-03-16T08:13:55Z

@cloud-fan pls take a look, this is a small change. Thanks a lot.

cloud-fan · 2018-03-16T22:37:17Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala

+    val initRangeFuncName = ctx.addNewFunction(initRange,
      s"""
-        | private void initRange(int idx) {
+        | private void ${initRange}(int idx) {


Can you add a test? With this bug, it seems we can't join 2 ranges.

@cloud-fan thanks for reviewing. Neither BroadcastExchange nor ShuffleExchange supports CodegenSupport, so there should be two WholeStageCodegen stages.

Whole-stage codegen is pull-model. I don't think we will have 2 leaf nodes in one stage. Maybe we can just add some comments to explain it and don't change the code.

OK, I can just add some comments and keep the code unchanged. I only changed it here for better code robustness.
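For readers unfamiliar with the naming concern in this thread, here is a minimal sketch of how a code generator can hand out fresh method names. `FreshNames` is a hypothetical helper written for illustration only; it is not Spark's `CodegenContext`, and its suffix scheme is an assumption. It only shows why a hard-coded `initRange` could collide if two range operators ever ended up in the same generated class.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of Spark: hands out a unique name per request.
final class FreshNames {
  // How many times each prefix has been requested so far.
  private val counters = mutable.Map.empty[String, Int]

  // First request for a prefix returns the prefix itself;
  // later requests get a numeric suffix so the names never clash.
  def fresh(prefix: String): String = {
    val n = counters.getOrElse(prefix, 0)
    counters(prefix) = n + 1
    if (n == 0) prefix else s"$prefix$n"
  }
}

object FreshNamesDemo {
  def main(args: Array[String]): Unit = {
    val names = new FreshNames
    // Two operators asking for the same logical method name get distinct identifiers.
    val first = names.fresh("initRange")
    val second = names.fresh("initRange")
    println(s"private void $first(int idx) { /* ... */ }")
    println(s"private void $second(int idx) { /* ... */ }")
  }
}
```

As noted in the review, whether such a collision can actually happen depends on whether two leaf nodes can share one whole-stage codegen stage.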

Hi @cloud-fan, before adding the comments, I have a question: why do we still need an exchange when we join two spark.range(1, 10, 1, 1)? Since both ranges have only one partition, is the exchange really needed?

I think you can apply #20726 to the RangeExec operator and fix this.

Thanks for your suggestion, let me take a try.

ConeyLiu · 2018-03-21T12:38:42Z

@cloud-fan, pls take a look, thanks a lot.

cloud-fan · 2018-03-21T16:54:48Z

ok to test

cloud-fan · 2018-03-21T17:54:29Z

Actually, range is mostly for testing. I don't think we should do too much logical optimization for it, such as avoiding shuffle/sort. cc @hvanhovell

SparkQA · 2018-03-21T19:20:57Z

Test build #88474 has finished for PR 20844 at commit b52703c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

ConeyLiu · 2018-03-22T03:54:43Z

This change is very simple; it just makes range consistent with the other LeafNode operators.

cloud-fan · 2018-03-22T04:47:06Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala

    "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))

+  /** Specifies how data is partitioned across different nodes in the cluster. */
+  override def outputPartitioning: Partitioning = if (numSlices == 1 && numElements != 0) {


why numElements != 0?

This is related to the UT error. spark.range(-10, -9, -20, 1).count() failed when codegen was enabled and RangeExec.outputPartitioning was set to SinglePartition. I tried to find the root cause, but could not.
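One thing that can be checked by hand is that the failing range is empty: it starts at -10, steps by -20, and can never reach -9. The sketch below is a simplified, dependency-free model of the guard in the diff above, not Spark's actual classes. The `Partitioning` types are stand-ins, the element-count formula is my own ceiling-division formulation, and the fallback to `UnknownPartitioning` is an assumption about what the else branch does. It does not explain why the empty case failed under codegen.

```scala
object RangePartitioningSketch {
  // Stand-in types for illustration; Spark's real Partitioning hierarchy is richer.
  sealed trait Partitioning
  case object SinglePartition extends Partitioning
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Number of values in [start, end) walked with `step` (end is exclusive).
  // BigInt avoids Long overflow when computing end - start.
  def numElements(start: Long, end: Long, step: Long): BigInt = {
    require(step != 0, "step must be non-zero")
    val span = BigInt(end) - BigInt(start)
    val s = BigInt(step)
    // Empty when start == end or when the step points away from `end`.
    if (span.signum != s.signum) BigInt(0)
    // Ceiling division; valid because span and step share a sign here.
    else (span + s - s.signum) / s
  }

  // Mirrors the condition in the diff: claim a single partition only for a
  // non-empty range with one slice (the else branch is an assumption).
  def outputPartitioning(numSlices: Int, elements: BigInt): Partitioning =
    if (numSlices == 1 && elements > 0) SinglePartition
    else UnknownPartitioning(numSlices)

  def main(args: Array[String]): Unit = {
    val failing = numElements(-10L, -9L, -20L) // the range from the failing test
    println(s"numElements = $failing, partitioning = ${outputPartitioning(1, failing)}")
    val normal = numElements(1L, 10L, 1L)
    println(s"numElements = $normal, partitioning = ${outputPartitioning(1, normal)}")
  }
}
```

Under this model the failing test's range never reports `SinglePartition`, which matches the behaviour the `numElements != 0` condition was added to get.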

SparkQA · 2018-03-22T07:05:02Z

Test build #88500 has finished for PR 20844 at commit 15165d7.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

gczsjdy · 2018-03-27T05:29:25Z

sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala

      val n = 10000
      // Trigger a sort
-      val data = spark.range(0, n, 1, 1).sort('id)
+      val data = spark.range(0, n, 1, 2).sort('id)


Why change this?

: ) Know this now

AmplabJenkins · 2018-06-09T00:12:48Z

Can one of the admins verify this patch?

cloud-fan · 2018-06-09T01:32:35Z

I think it has been fixed?

HyukjinKwon · 2018-08-26T14:58:09Z

Yup, the regression tests pass at the very least.

ConeyLiu · 2018-08-27T00:59:02Z

Thanks, all. Closing this.

Fresh 'initRange' name to avoid method name conflicts

d69f32c

cloud-fan reviewed Mar 16, 2018


ConeyLiu changed the title ~~[SPARK-23707][SQL] Fresh 'initRange' name to avoid method name conflicts~~ [SPARK-23707][SQL][WIP] Fresh 'initRange' name to avoid method name conflicts Mar 21, 2018

ConeyLiu added 2 commits March 21, 2018 20:09

Merge remote-tracking branch 'spark/master' into range

a40b53f

no shuffle exchange with single partition

b52703c

ConeyLiu changed the title ~~[SPARK-23707][SQL][WIP] Fresh 'initRange' name to avoid method name conflicts~~ [SPARK-23707][SQL] No shuffle exchange with single partition for 'spark.range' Mar 21, 2018

ConeyLiu changed the title ~~[SPARK-23707][SQL] No shuffle exchange with single partition for 'spark.range'~~ [SPARK-23707][SQL] Don't need shuffle exchange with single partition for 'spark.range' Mar 21, 2018

fix UT errors

15165d7

cloud-fan reviewed Mar 22, 2018


gczsjdy reviewed Mar 27, 2018


ConeyLiu closed this Aug 27, 2018

ConeyLiu deleted the range branch September 4, 2018 01:12
