
[SPARK-20311][SQL] Support aliases for table value functions #17666

Closed
maropu wants to merge 4 commits into master from maropu/SPARK-20311

Conversation

maropu
Member

@maropu maropu commented Apr 18, 2017

What changes were proposed in this pull request?

This PR adds parsing rules to support aliases for table value functions.
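
For example, the new syntax lets a table value function call take a table alias and a column alias list. Below is a minimal, illustrative sketch of exercising it; the local session setup and the column name a are assumptions for the example, not part of this patch:

import org.apache.spark.sql.SparkSession

object TvfAliasExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tvf-alias").getOrCreate()
    // With this patch, a table value function accepts an optional table alias
    // and column alias list, e.g. range(3) AS t(a).
    spark.sql("SELECT t.a FROM range(3) AS t(a)").show()
    spark.stop()
  }
}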

How was this patch tested?

Added tests in PlanParserSuite.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75876 has finished for PR 17666 at commit a611a13.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedTableValuedFunction(

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75888 has started for PR 17666 at commit 539a9e8.

@maropu
Member Author

maropu commented Apr 18, 2017

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75889 has finished for PR 17666 at commit 539a9e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedTableValuedFunction(

| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery
| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
| inlineTable #inlineTableDefault2
| identifier '(' (expression (',' expression)*)? ')' (AS? identifier identifierList?)? #tableValuedFunction
Contributor

@hvanhovell hvanhovell Apr 18, 2017

Perhaps we should put the multi-alias in a separate rule, since it is also used by the inline table?

Contributor

Why did you add the multi-alias anyway?

Member Author

I missed the point. okay, I'll reconsider this. Thanks!

Contributor

It makes sense to add this. Let's keep the multi-alias for now.

Member Author

@maropu maropu Apr 18, 2017

Oh, it seems I misunderstood what you pointed out. You meant: should we support a query like SELECT * FROM [[tvf]] AS t(a, b, ...) in this PR? Yeah, I know we currently support only range as a table value function, but I also think it'd be better to put a more general rule in this file. So, +1 for keeping this.

Member Author

Then I'll update this PR to separate this rule and share it with the inline table rule.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75903 has finished for PR 17666 at commit 8d3a037.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Apr 18, 2017

@hvanhovell Could you check again?

@maropu
Member Author

maropu commented Apr 21, 2017

ping

@maropu
Member Author

maropu commented Apr 24, 2017

@hvanhovell ping

@maropu
Member Author

maropu commented May 8, 2017

@hvanhovell @gatorsmile ping

@@ -69,7 +69,10 @@ case class UnresolvedInlineTable(
* select * from range(10);
Member

Please update this example.

| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery
| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
| inlineTable #inlineTableDefault2
| identifier '(' (expression (',' expression)*)? ')' tableAlias #tableValuedFunction
Member

Please follow inlineTable to add a separate rule.

Member Author

ok

})
)

private def validateInputDimension(tvf: UnresolvedTableValuedFunction, expectedNumCols: Int)
Member

Conceptually, expectedNumCols needs to be part of the TVF type/class. This is the number of the TVF's output arguments.

Member

BTW, we need to add comments to this function.
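
For context, here is a hypothetical sketch of the check being discussed; it is not necessarily the PR's actual implementation, and the error message and exact condition are illustrative:

import org.apache.spark.sql.catalyst.analysis._

// Sketch: fail analysis when the number of column aliases given for a table value
// function does not match the number of output columns it produces.
def validateInputDimension(tvf: UnresolvedTableValuedFunction, expectedNumCols: Int): Unit = {
  if (tvf.outputNames.nonEmpty && tvf.outputNames.size != expectedNumCols) {
    tvf.failAnalysis(s"Expected $expectedNumCols column aliases for table value function " +
      s"${tvf.functionName}, but got ${tvf.outputNames.size}")
  }
}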

Range(0, end, 1, None)
tvf("end" -> LongType) { case (tvf, args @ Seq(end: Long)) =>
validateInputDimension(tvf, 1)
Range(0, end, 1, None, tvf.outputNames.headOption)
Member

Instead of changing the output of Range, I think we can simply add a Project above Range for the column alias, like what we do in the DataFrame API: spark.range(100).toDF("i")
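
For reference, a minimal sketch of that idea (the helper name is an assumption, not this PR's code): keep Range unchanged and project Alias expressions on top of whatever plan the function resolves to.

import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

// Sketch: rename a plan's output columns by adding a Project of Alias expressions,
// instead of threading the name into the Range operator itself.
// (Assumes names.size matches plan.output.size when names is non-empty.)
def aliasOutput(plan: LogicalPlan, names: Seq[String]): LogicalPlan =
  if (names.isEmpty) plan
  else Project(plan.output.zip(names).map { case (attr, name) => Alias(attr, name)() }, plan)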

Member Author

def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = {
val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes

def apply(start: Long, end: Long, step: Long, numSlices: Option[Int], outputName: Option[String])
Contributor

Why make this an option? It is not optional, right? The next function should either call Range(start, end, step, Some(numSlices), "id") or this function should have a default parameter.

Member Author

yea, ok. I'll update. Thanks!


test("SPARK-20311 range(N) as alias") {
def rangeWithAliases(outputNames: Seq[String]): LogicalPlan = {
SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(7) :: Nil, outputNames))
Contributor

Should we also test different cases?

Member Author

ok, I'll add more tests.
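
For illustration, the additional cases might look like the sketch below. It assumes it sits inside PlanParserSuite (for assertEqual and the Catalyst DSL's star()); the helper name and the literal cases are illustrative:

// Sketch: cover a bare table alias as well as an explicit column alias list.
def rangeAlias(names: Seq[String]): LogicalPlan =
  SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(7) :: Nil, names))

assertEqual("SELECT * FROM range(7) AS t", rangeAlias(Nil).select(star()))
assertEqual("SELECT * FROM range(7) AS t(a)", rangeAlias("a" :: Nil).select(star()))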

Member Author

I'm a bit worried, though: does this fix satisfy your intention? 625dbda

private def tvf(args: (String, DataType)*)(pf: PartialFunction[Seq[Any], LogicalPlan])
: (ArgumentList, Seq[Any] => LogicalPlan) = {
private def tvf(args: (String, DataType)*)(
pf: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), LogicalPlan])
Contributor

Hmmm... this signature is kind of complex. Can we try to use some kind of class/case class that encapsulates this?

Contributor

@hvanhovell hvanhovell May 8, 2017

I also think that we should separate the aliasing from constructing the table valued function. See @gatorsmile's earlier comment.

Member Author

ok, I'll reconsider the signature.

@SparkQA

SparkQA commented May 9, 2017

Test build #76604 has finished for PR 17666 at commit 399d823.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

Please rebase to pick up the fix for the R tests.

Though again, I'm not sure why it is running R tests for this PR - is the change detection logic broken somehow?

@maropu
Member Author

maropu commented May 9, 2017

ok, thanks! No, I don't think I touched that.

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented May 9, 2017

Test build #76602 has finished for PR 17666 at commit f494e41.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 9, 2017

Test build #76608 has finished for PR 17666 at commit 625dbda.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM.

Thank you! @maropu

@SparkQA

SparkQA commented May 9, 2017

Test build #76615 has finished for PR 17666 at commit 81bef3b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented May 9, 2017

Test build #76650 has finished for PR 17666 at commit 81bef3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 9, 2017
## What changes were proposed in this pull request?
This pr added parsing rules to support aliases in table value functions.

## How was this patch tested?
Added tests in `PlanParserSuite`.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #17666 from maropu/SPARK-20311.

(cherry picked from commit 714811d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Contributor

thanks, merging to master/2.2!

@asfgit asfgit closed this in 714811d May 9, 2017
@yhuai
Contributor

yhuai commented May 9, 2017

@maropu Sorry. I think this PR introduces a regression.

scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").explain
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Range (1, 10, step=1, splits=None)
and
Range (1, 10, step=1, splits=None)
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;

I think we are taking the cross as the alias.

I reverted your change locally and the query worked. I am attaching the expected analyzed plan below.

scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").queryExecution.analyzed
res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project [id#8L, id#9L]
+- Join Cross
   :- Range (1, 10, step=1, splits=None)
   +- Range (1, 10, step=1, splits=None)

@yhuai
Contributor

yhuai commented May 9, 2017

I am going to revert this PR from master and branch-2.2. I need to revert it because it is in branch-2.2 and 2.2 is in the RC staging.

@yhuai
Contributor

yhuai commented May 9, 2017

I have reverted this change from both master and branch-2.2. I have reopened the jira.

@maropu
Member Author

maropu commented May 9, 2017

@yhuai okay, thanks for letting me know! I'll make a new PR to fix it.

ghost pushed a commit to dbtsai/spark that referenced this pull request May 11, 2017
## What changes were proposed in this pull request?
This pr added parsing rules to support aliases in table value functions.
The previous pr (apache#17666) has been reverted because of the regression. This new pr fixed the regression and add tests in `SQLQueryTestSuite`.

## How was this patch tested?
Added tests in `PlanParserSuite` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#17928 from maropu/SPARK-20311-3.