
[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Parser #16988

Closed
gatorsmile wants to merge 9 commits into apache:master from gatorsmile:resolveRepartition

Conversation

gatorsmile (Member) commented Feb 19, 2017

What changes were proposed in this pull request?

Currently, if NumPartitions is not set in RepartitionByExpression, it is filled in from spark.sql.shuffle.partitions during planning. However, this does not follow the general resolution process. This PR sets it in the parser instead, so that the optimizer can use the value for plan optimization.
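For illustration, a minimal Scala sketch of the idea (a sketch under assumptions, not the PR's exact diff: the helper name `buildRepartitionByExpression` and the import paths are mine, and it presumes the post-PR shape of `RepartitionByExpression`, which carries an `Int` partition count):

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, RepartitionByExpression}
import org.apache.spark.sql.internal.SQLConf

// Hypothetical helper mirroring what the parser does after this change: the
// partition count is read from SQLConf while the logical plan is being built,
// so RepartitionByExpression already carries a concrete numPartitions that the
// optimizer can use.
def buildRepartitionByExpression(
    conf: SQLConf,
    expressions: Seq[Expression],
    query: LogicalPlan): LogicalPlan = {
  // Before this PR, numPartitions stayed unset here and was filled in from
  // spark.sql.shuffle.partitions during planning.
  RepartitionByExpression(expressions, query, conf.numShufflePartitions)
}
```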

How was this patch tested?

Added a test case.


SparkQA commented Feb 19, 2017

Test build #73125 has finished for PR 16988 at commit c276917.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 19, 2017

Test build #73130 has finished for PR 16988 at commit ec12258.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gatorsmile (Member, Author) commented Feb 20, 2017

cc @cloud-fan @brkyvz

cloud-fan (Contributor) commented:
How about we set the numPartitions when we build RepartitionByExpression? The parser can also access the SQLConf.
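A small hedged sketch of that point, assuming the Spark-internal SQLConf API (the direct constructor and `setConfString` are assumptions, not shown in this thread): the shuffle-partition default is readable well before planning runs.

```scala
import org.apache.spark.sql.internal.SQLConf

// SQLConf exposes the spark.sql.shuffle.partitions default, so code that runs
// at parse time (the AST builder is constructed with a SQLConf) can resolve
// the partition count up front instead of deferring it to the planner.
val conf = new SQLConf
conf.setConfString("spark.sql.shuffle.partitions", "5")
assert(conf.numShufflePartitions == 5)
```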

(" sort by a, b desc", basePlan.sortBy('a.asc, 'b.desc)),
(" distribute by a, b", basePlan.distribute('a, 'b)()),
(" distribute by a sort by b", basePlan.distribute('a)().sortBy('b.asc)),
(" cluster by a, b", basePlan.distribute('a, 'b)().sortBy('a.asc, 'b.asc))
gatorsmile (Member, Author) commented on this diff:

These three test cases are moved to SparkSqlParserSuite.scala
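For context, a hedged sketch of what such a parser-level check might look like (not the PR's actual test: `SparkSqlParser`, `parsePlan`, `collectFirst`, and the sample query are my assumptions about Spark-internal APIs of that era):

```scala
import org.apache.spark.sql.catalyst.plans.logical.RepartitionByExpression
import org.apache.spark.sql.execution.SparkSqlParser
import org.apache.spark.sql.internal.SQLConf

// Parse a query with a DISTRIBUTE BY clause and confirm the resulting plan
// already carries the configured shuffle-partition count, with no planner
// involvement.
val conf = new SQLConf
val parser = new SparkSqlParser(conf)
val plan = parser.parsePlan("SELECT * FROM t DISTRIBUTE BY a")
val repartition = plan.collectFirst { case r: RepartitionByExpression => r }.get
assert(repartition.numPartitions == conf.numShufflePartitions)
```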

gatorsmile changed the title from "[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Analyzer" to "[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Parser" on Feb 22, 2017
gatorsmile (Member, Author) commented:
@cloud-fan Great idea!


SparkQA commented Feb 22, 2017

Test build #73286 has finished for PR 16988 at commit b561a1c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 22, 2017

Test build #73288 has finished for PR 16988 at commit f3adf10.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

ctx: QueryOrganizationContext,
expressions: Seq[Expression],
query: LogicalPlan): LogicalPlan = {
RepartitionByExpression(expressions, query, conf.numShufflePartitions)
cloud-fan (Contributor) commented on this diff:

nit: indent is wrong here


SparkQA commented Feb 22, 2017

Test build #73289 has finished for PR 16988 at commit dd2b717.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait Logging
  • case class UnresolvedRelation(tableIdentifier: TableIdentifier) extends LeafNode

cloud-fan (Contributor) commented:
LGTM


SparkQA commented Feb 22, 2017

Test build #73293 has finished for PR 16988 at commit b106abf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented:
thanks, merging to master!

asfgit closed this in dc005ed on Feb 23, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
[SPARK-19658] [SQL] Set NumPartitions of RepartitionByExpression In Parser


Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#16988 from gatorsmile/resolveRepartition.