@ConeyLiu
Contributor

@ConeyLiu ConeyLiu commented Mar 16, 2018

What changes were proposed in this pull request?

Just like #20726: there is no need for an 'Exchange' when spark.range produces only one partition.

How was this patch tested?

New UT.

@ConeyLiu
Contributor Author

@cloud-fan pls take a look, this is a small change. Thanks a lot.

val initRangeFuncName = ctx.addNewFunction(initRange,
s"""
- | private void initRange(int idx) {
+ | private void ${initRange}(int idx) {
Contributor

Can you add a test? With this bug, it seems we can't join two ranges.
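To illustrate the name-conflict concern behind this diff, here is a minimal sketch of fresh-name generation. `NameAllocator` and `FreshNameDemo` are hypothetical names made up for this example; this is not Spark's CodegenContext, only a model of why a hard-coded method name can collide when two generators emit into the same class while a generated name cannot.

```scala
// Hypothetical model of fresh-name generation; not Spark's implementation.
final class NameAllocator {
  // A single counter shared by all base names keeps every suffix unique.
  private var counter = 0

  /** Returns `base` with a unique numeric suffix appended. */
  def freshName(base: String): String = {
    val name = s"${base}_$counter"
    counter += 1
    name
  }
}

object FreshNameDemo {
  /** Allocates one method name per requested base name. */
  def methodNames(bases: Seq[String]): Seq[String] = {
    val alloc = new NameAllocator
    bases.map(alloc.freshName)
  }
}
```

With this sketch, two generators that both ask for "initRange" receive different method names, so their definitions could coexist in one generated class.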

Contributor Author

@cloud-fan thanks for reviewing. Neither BroadCastExchange nor ShuffleExchange supports CodegenSupport, so there should be two WholeStageCodegen stages.

Contributor

Whole-stage codegen is a pull model. I don't think we will have 2 leaf nodes in one stage. Maybe we can just add some comments to explain it and leave the code unchanged.

Contributor Author

OK, I can just add some comments and keep the code unchanged. I changed it here only to make the code more robust.

Contributor Author

Hi @cloud-fan, before adding the comments, I have a question: why do we still need an exchange when we join two spark.range(1, 10, 1, 1)? Since both ranges have only one partition, is the exchange really needed?

Contributor

I think you can apply #20726 to the RangeExec operator and fix this.
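The idea behind this suggestion can be sketched with a small self-contained model. The type names below echo Spark's, but every rule here is a simplified assumption for illustration and not the planner's actual logic: an exchange is added only when the child's reported partitioning does not satisfy what the parent requires, and a single partition satisfies any requirement.

```scala
object ExchangeSketch {
  // Simplified stand-ins; the names echo Spark's, the rules are assumptions.
  sealed trait Partitioning
  case object SinglePartition extends Partitioning
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  sealed trait Distribution
  case object Unspecified extends Distribution
  final case class Clustered(keys: Seq[String]) extends Distribution

  /** What a range-like leaf would report under the change discussed here. */
  def rangePartitioning(numSlices: Int): Partitioning =
    if (numSlices == 1) SinglePartition else UnknownPartitioning(numSlices)

  /** One partition trivially co-locates all rows, so it satisfies anything. */
  def satisfies(p: Partitioning, d: Distribution): Boolean = (p, d) match {
    case (_, Unspecified)     => true
    case (SinglePartition, _) => true
    case _                    => false
  }

  /** The planner in this model inserts an exchange only when required. */
  def needsExchange(child: Partitioning, required: Distribution): Boolean =
    !satisfies(child, required)
}
```

Under these assumptions, a one-slice range joined on "id" needs no exchange, while a two-slice range still does.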

Contributor Author

Thanks for your suggestion, let me give it a try.

@ConeyLiu ConeyLiu changed the title [SPARK-23707][SQL] Fresh 'initRange' name to avoid method name conflicts [SPARK-23707][SQL][WIP] Fresh 'initRange' name to avoid method name conflicts Mar 21, 2018
@ConeyLiu ConeyLiu changed the title [SPARK-23707][SQL][WIP] Fresh 'initRange' name to avoid method name conflicts [SPARK-23707][SQL] No shuffle exchange with single partition for 'spark.range' Mar 21, 2018
@ConeyLiu
Contributor Author

@cloud-fan, pls take a look, thanks a lot.

@ConeyLiu ConeyLiu changed the title [SPARK-23707][SQL] No shuffle exchange with single partition for 'spark.range' [SPARK-23707][SQL] Don't need shuffle exchange with single partition for 'spark.range' Mar 21, 2018
@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

Actually, range is mostly for testing. I don't think we should do too much logical optimization for it, like avoiding shuffle/sort. cc @hvanhovell

@SparkQA

SparkQA commented Mar 21, 2018

Test build #88474 has finished for PR 20844 at commit b52703c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ConeyLiu
Contributor Author

This change is very simple; it just makes range consistent with other LeafNode operators.

"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))

/** Specifies how data is partitioned across different nodes in the cluster. */
override def outputPartitioning: Partitioning = if (numSlices == 1 && numElements != 0) {
Contributor

why numElements != 0?

Contributor Author

@ConeyLiu ConeyLiu Mar 22, 2018

This is related to the UT error: spark.range(-10, -9, -20, 1).count() failed when codegen was set to true and RangeExec.outputPartitioning was set to SinglePartition. I tried to find the root cause but could not.
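As a side note on why this case trips the `numElements != 0` guard, the sketch below counts the elements of a half-open range [start, end) walked with a given step. `RangeCount.numElements` is a hypothetical helper written for this example, assuming the usual half-open semantics; it is not Spark's code and it does not explain the root cause of the failure.

```scala
object RangeCount {
  /** Number of elements in [start, end) when stepping by `step`. */
  def numElements(start: Long, end: Long, step: Long): BigInt = {
    require(step != 0, "step must be non-zero")
    // BigInt avoids overflow when end - start does not fit in a Long.
    val diff = BigInt(end) - BigInt(start)
    val s = BigInt(step)
    // Empty when start == end or when the step points away from end.
    if (diff.signum != s.signum) BigInt(0)
    // Ceiling division; both operands share a sign on this branch.
    else (diff + s - BigInt(s.signum)) / s
  }
}
```

For start = -10, end = -9 and step = -20, the step is negative while end lies above start, so the range is empty and the count is 0.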

@SparkQA

SparkQA commented Mar 22, 2018

Test build #88500 has finished for PR 20844 at commit 15165d7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

val n = 10000
// Trigger a sort
- val data = spark.range(0, n, 1, 1).sort('id)
+ val data = spark.range(0, n, 1, 2).sort('id)

Why change this?

: ) Know this now

@AmplabJenkins

Can one of the admins verify this patch?

@cloud-fan
Contributor

I think it has been fixed?

@HyukjinKwon
Member

Yup, the regression tests pass at the very least.

@ConeyLiu
Contributor Author

Thanks, all. Closing this.

@ConeyLiu ConeyLiu closed this Aug 27, 2018
@ConeyLiu ConeyLiu deleted the range branch September 4, 2018 01:12