
[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Parser #16988

Closed
gatorsmile wants to merge 9 commits into apache:master from gatorsmile:resolveRepartition

Conversation

gatorsmile (Member) commented Feb 19, 2017

What changes were proposed in this pull request?

Currently, if NumPartitions is not set in RepartitionByExpression, it is filled in from spark.sql.shuffle.partitions during planning. However, this does not follow the general resolution process. This PR sets it in the parser instead, so that the optimizer can use the value for plan optimization.
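For illustration, a minimal Scala sketch of the idea (a sketch under assumptions, not the PR's exact diff: the helper name `buildRepartitionByExpression` and the import paths are mine, and it presumes the post-PR shape of `RepartitionByExpression`, which carries an `Int` partition count):

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, RepartitionByExpression}
import org.apache.spark.sql.internal.SQLConf

// Hypothetical helper mirroring what the parser does after this change: the
// partition count is read from SQLConf while the logical plan is being built,
// so RepartitionByExpression already carries a concrete numPartitions that the
// optimizer can use.
def buildRepartitionByExpression(
    conf: SQLConf,
    expressions: Seq[Expression],
    query: LogicalPlan): LogicalPlan = {
  // Before this PR, numPartitions stayed unset here and was filled in from
  // spark.sql.shuffle.partitions during planning.
  RepartitionByExpression(expressions, query, conf.numShufflePartitions)
}
```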

How was this patch tested?

Added a test case.


SparkQA commented Feb 19, 2017

Test build #73125 has finished for PR 16988 at commit c276917.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 19, 2017

Test build #73130 has finished for PR 16988 at commit ec12258.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gatorsmile (Member, Author) commented Feb 20, 2017

cc @cloud-fan @brkyvz

cloud-fan (Contributor) commented:
How about we set the numPartitions when we build RepartitionByExpression? The parser can also access the SQLConf.
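A small hedged sketch of that point, assuming the Spark-internal SQLConf API (the direct constructor and `setConfString` are assumptions, not shown in this thread): the shuffle-partition default is readable well before planning runs.

```scala
import org.apache.spark.sql.internal.SQLConf

// SQLConf exposes the spark.sql.shuffle.partitions default, so code that runs
// at parse time (the AST builder is constructed with a SQLConf) can resolve
// the partition count up front instead of deferring it to the planner.
val conf = new SQLConf
conf.setConfString("spark.sql.shuffle.partitions", "5")
assert(conf.numShufflePartitions == 5)
```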

(" sort by a, b desc", basePlan.sortBy('a.asc, 'b.desc)),
(" distribute by a, b", basePlan.distribute('a, 'b)()),
(" distribute by a sort by b", basePlan.distribute('a)().sortBy('b.asc)),
(" cluster by a, b", basePlan.distribute('a, 'b)().sortBy('a.asc, 'b.asc))
gatorsmile (Member, Author) commented on this diff:

These three test cases are moved to SparkSqlParserSuite.scala
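For context, a hedged sketch of what such a parser-level check might look like (not the PR's actual test: `SparkSqlParser`, `parsePlan`, `collectFirst`, and the sample query are my assumptions about Spark-internal APIs of that era):

```scala
import org.apache.spark.sql.catalyst.plans.logical.RepartitionByExpression
import org.apache.spark.sql.execution.SparkSqlParser
import org.apache.spark.sql.internal.SQLConf

// Parse a query with a DISTRIBUTE BY clause and confirm the resulting plan
// already carries the configured shuffle-partition count, with no planner
// involvement.
val conf = new SQLConf
val parser = new SparkSqlParser(conf)
val plan = parser.parsePlan("SELECT * FROM t DISTRIBUTE BY a")
val repartition = plan.collectFirst { case r: RepartitionByExpression => r }.get
assert(repartition.numPartitions == conf.numShufflePartitions)
```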

gatorsmile changed the title from "[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Analyzer" to "[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Parser" on Feb 22, 2017
gatorsmile (Member, Author) commented:
@cloud-fan Great idea!


SparkQA commented Feb 22, 2017

Test build #73286 has finished for PR 16988 at commit b561a1c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 22, 2017

Test build #73288 has finished for PR 16988 at commit f3adf10.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

ctx: QueryOrganizationContext,
expressions: Seq[Expression],
query: LogicalPlan): LogicalPlan = {
RepartitionByExpression(expressions, query, conf.numShufflePartitions)
cloud-fan (Contributor) commented on this diff:

nit: indent is wrong here


SparkQA commented Feb 22, 2017

Test build #73289 has finished for PR 16988 at commit dd2b717.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait Logging
  • case class UnresolvedRelation(tableIdentifier: TableIdentifier) extends LeafNode

cloud-fan (Contributor) commented:
LGTM


SparkQA commented Feb 22, 2017

Test build #73293 has finished for PR 16988 at commit b106abf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented:
thanks, merging to master!

asfgit closed this in dc005ed on Feb 23, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Parser


Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#16988 from gatorsmile/resolveRepartition.