-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20311][SQL] Support aliases for table value functions #17666
Conversation
Test build #75876 has finished for PR 17666 at commit
|
Test build #75888 has started for PR 17666 at commit |
Jenkins, retest this please. |
Test build #75889 has finished for PR 17666 at commit
|
| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery | ||
| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation | ||
| inlineTable #inlineTableDefault2 | ||
| identifier '(' (expression (',' expression)*)? ')' (AS? identifier identifierList?)? #tableValuedFunction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should put the multi-alias in a separate rule? Since it is also used by inline table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add the multi-alias anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed the point. okay, I'll reconsider this. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to add this. Lets keep the multi-alias for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it seems I misunderstood what you pointed out. You meant should we need to support a query like SELECT * FROM [[tvf]] AS t(a, b, ...)
in this pr? Yea, I know we currently support range
only as a table value function though, I also think it'd be better to put a more general rule in this file. So, +1 for keeping this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, I'll update this pr to separate this rule and share it with the inline table rule.
Test build #75903 has finished for PR 17666 at commit
|
@hvanhovell Could you check again? |
ping |
@hvanhovell ping |
@hvanhovell @gatorsmile ping |
@@ -69,7 +69,10 @@ case class UnresolvedInlineTable( | |||
* select * from range(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this example.
| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery | ||
| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation | ||
| inlineTable #inlineTableDefault2 | ||
| identifier '(' (expression (',' expression)*)? ')' tableAlias #tableValuedFunction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow inlineTable
to add a separate rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
}) | ||
) | ||
|
||
private def validateInputDimension(tvf: UnresolvedTableValuedFunction, expectedNumCols: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually, expectedNumCols
needs to be part of type/class TVF
. This is the num of TVF's output arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, we need to add comments to this function.
Range(0, end, 1, None) | ||
tvf("end" -> LongType) { case (tvf, args @ Seq(end: Long)) => | ||
validateInputDimension(tvf, 1) | ||
Range(0, end, 1, None, tvf.outputNames.headOption) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of changing the output of Range
, I think we can simply add a Project
above Range for column alias, like what we do in the Dataframe API: spark.range(100).toDF("i")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great suggestion! I could make it simpler (https://github.com/apache/spark/pull/17666/files#diff-e2023817ad13201adeb45b74e99d29f8R129). Thought?
def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { | ||
val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes | ||
|
||
def apply(start: Long, end: Long, step: Long, numSlices: Option[Int], outputName: Option[String]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this an option? It is not optional right? The next function should either call Range(start, end, step, Some(numSlices), "id") or this function should have a default parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, ok. I'll update. Thanks!
|
||
test("SPARK-20311 range(N) as alias") { | ||
def rangeWithAliases(outputNames: Seq[String]): LogicalPlan = { | ||
SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(7) :: Nil, outputNames)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also test different cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll add more tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I a bit worry though, this fix satisfies your intention? 625dbda
private def tvf(args: (String, DataType)*)(pf: PartialFunction[Seq[Any], LogicalPlan]) | ||
: (ArgumentList, Seq[Any] => LogicalPlan) = { | ||
private def tvf(args: (String, DataType)*)( | ||
pf: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), LogicalPlan]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... this signature is kind of complex. Can we try to use some kind of class/case class that encapsulates this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that we should separate the aliasing from constructing the table valued function. See @gatorsmile's earlier comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll re-consider the signature.
Test build #76604 has finished for PR 17666 at commit
|
Please rebase to pick up fix for the R tests. Though again, I'm not sure why it is running R tests for this PR - is the change detection logic broken somehow? |
ok, thanks! no, I think I do not touch on that. |
LGTM |
Test build #76602 has finished for PR 17666 at commit
|
Test build #76608 has finished for PR 17666 at commit
|
LGTM. Thank you! @maropu |
Test build #76615 has finished for PR 17666 at commit
|
retest this please |
Test build #76650 has finished for PR 17666 at commit
|
## What changes were proposed in this pull request? This pr added parsing rules to support aliases in table value functions. ## How was this patch tested? Added tests in `PlanParserSuite`. Author: Takeshi Yamamuro <yamamuro@apache.org> Closes #17666 from maropu/SPARK-20311. (cherry picked from commit 714811d) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/2.2! |
@maropu Sorry. I think this PR introduces a regression.
I think we are taking the cross as the alias. I reverted your change locally and the query worked. I am attaching the expected analyzed plan below.
|
I am going to revert this PR from master and branch-2.2. I need to revert it because it is in branch-2.2 and 2.2 is in the RC staging. |
I have reverted this change from both master and branch-2.2. I have reopened the jira. |
@yhuai okay, thanks for letting me know! I'll make a new pr to fix. |
## What changes were proposed in this pull request? This pr added parsing rules to support aliases in table value functions. The previous pr (apache#17666) has been reverted because of the regression. This new pr fixed the regression and add tests in `SQLQueryTestSuite`. ## How was this patch tested? Added tests in `PlanParserSuite` and `SQLQueryTestSuite`. Author: Takeshi Yamamuro <yamamuro@apache.org> Closes apache#17928 from maropu/SPARK-20311-3.
## What changes were proposed in this pull request? This pr added parsing rules to support aliases in table value functions. ## How was this patch tested? Added tests in `PlanParserSuite`. Author: Takeshi Yamamuro <yamamuro@apache.org> Closes apache#17666 from maropu/SPARK-20311.
## What changes were proposed in this pull request? This pr added parsing rules to support aliases in table value functions. The previous pr (apache#17666) has been reverted because of the regression. This new pr fixed the regression and add tests in `SQLQueryTestSuite`. ## How was this patch tested? Added tests in `PlanParserSuite` and `SQLQueryTestSuite`. Author: Takeshi Yamamuro <yamamuro@apache.org> Closes apache#17928 from maropu/SPARK-20311-3.
What changes were proposed in this pull request?
This pr added parsing rules to support aliases in table value functions.
How was this patch tested?
Added tests in
PlanParserSuite
.