[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax #25082
ok to test |
Thank you for your contribution, @lipzhu.
|
Oops. I got it. Thank you for the update. If PostgreSQL doesn't support this, this cannot be part of SPARK-27764. |
Test build #107384 has finished for PR 25082 at commit
|
docs/sql-keywords.md
Outdated
@@ -159,6 +160,7 @@ Below is a list of all the keywords in Spark SQL. | |||
<tr><td>LIMIT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr> | |||
<tr><td>LINES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr> | |||
<tr><td>LIST</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr> | |||
<tr><td>LIST_VALUE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr> |
This is wrong because you want `LAST_VALUE`. :)
I think the two words should be reserved in Spark (ansi=true). Also, you need to update `TableIdentifierParserSuite`.
@@ -737,6 +737,15 @@ class ExpressionParserSuite extends AnalysisTest { | |||
assertEqual("last(a)", Last('a, Literal(false)).toAggregateExpression()) | |||
} | |||
|
|||
test("SPARK-28310 Support respect nulls keywords for first_value and last_value") { |
For a test case covering a new feature or improvement, we don't use the `SPARK-28310` prefix.
- test("SPARK-28310 Support respect nulls keywords for first_value and last_value") {
+ test("Support respect nulls keywords for first_value and last_value") {
cc @maropu |
I left a minor comment and LGTM except for it.
Test build #107428 has finished for PR 25082 at commit
|
retest this please |
Test build #107426 has finished for PR 25082 at commit
|
Test build #107435 has finished for PR 25082 at commit
|
Test build #107446 has finished for PR 25082 at commit
|
I have checked PostgreSQL, Vertica, Oracle, Redshift, Presto, and Teradata. In all of them, FIRST_VALUE|LAST_VALUE is always used as a window function, not as an aggregate function. |
Yes. We need to revert this commit and then add proper support later. @maropu @dongjoon-hyun WDYT? |
I've checked the Oracle documentation. FIRST_VALUE is not simply an alias of FIRST.
FIRST can be used as an aggregate function and can omit the OVER clause: https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST.html#GUID-85AB9246-0E0A-44A1-A7E6-4E57502E9238
FIRST_VALUE can only be a window function and can't omit the OVER clause: https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC
I think we should revert it and rethink it. |
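To make the aggregate-versus-window distinction concrete, here is a minimal sketch in plain Scala collections. It is only a model of the semantics discussed above, not Spark's or Oracle's implementation; the `Row` type and both function names are hypothetical and exist only for illustration.

```scala
// Illustrative model only: plain Scala collections, no Spark or Oracle APIs.
object AggregateVsWindowSketch {
  // Hypothetical row type: a partition/grouping key and a value.
  final case class Row(key: String, value: Int)

  // Aggregate-style FIRST: each group collapses to a single value.
  // groupBy on a Seq keeps the input order inside each group.
  def firstPerGroup(rows: Seq[Row]): Map[String, Int] =
    rows.groupBy(_.key).map { case (k, group) => k -> group.head.value }

  // Window-style FIRST_VALUE ... OVER (PARTITION BY key): every input row
  // is kept, and each row is paired with the first value of its partition.
  def firstValueOver(rows: Seq[Row]): Seq[(Row, Int)] = {
    val firsts = firstPerGroup(rows)
    rows.map(r => (r, firsts(r.key)))
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row("a", 1), Row("a", 2), Row("b", 3))
    println(firstPerGroup(rows))  // one entry per group
    println(firstValueOver(rows)) // one entry per input row
  }
}
```

The aggregate form returns one result per group, while the window form returns one result per input row, which is why the latter cannot omit the OVER clause.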
Ur, I missed the behaviour. Yea, +1 for the revert. |
Oops. Got it. Let me revert this. Thank you. |
I created a reverting PR for this. If a test passes, I'll merge that into |
I have created two tickets about them. |
@beliefer. Why not reuse SPARK-28310? After reverting, SPARK-28310 will be reopened. |
…NORE|RESPECT) NULLS]?) syntax"

### What changes were proposed in this pull request?
This reverts commit b89c3de.

### Why are the changes needed?
`FIRST_VALUE` is used only for window expression. Please see the discussion on #25082.

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Pass the Jenkins.

Closes #27458 from dongjoon-hyun/SPARK-28310.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…NORE|RESPECT) NULLS]?) syntax"

### What changes were proposed in this pull request?
This reverts commit b89c3de.

### Why are the changes needed?
`FIRST_VALUE` is used only for window expression. Please see the discussion on #25082.

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Pass the Jenkins.

Closes #27458 from dongjoon-hyun/SPARK-28310.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 8987169)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun You are right. We should reuse SPARK-28310. I will remove the two tickets I created. |
Yes. Reusing the same JIRA ID is good for traceability. |
What changes were proposed in this pull request?
According to the ANSI SQL 2011
Teradata, Oracle, and Redshift already support this grammar:
Teradata - https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/SUwCpTupqmlBJvi2mipOaA
Oracle - https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC
Redshift - https://docs.aws.amazon.com/redshift/latest/dg/r_WF_first_value.html
PostgreSQL does not implement this grammar:
https://www.postgresql.org/docs/devel/functions-window.html
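As a rough illustration of what the two null-treatment options mean, the following sketch models a window frame as a `Seq[Option[A]]`, with `None` standing in for SQL NULL. This is an assumption-laden model in plain Scala, not the parser or expression code touched by this PR; `NullTreatmentSketch`, `firstValue`, and `lastValue` are hypothetical names.

```scala
// Illustrative model only: None stands in for SQL NULL, and a Seq stands in
// for the rows of one window frame. Not Spark's implementation.
object NullTreatmentSketch {
  // ignoreNulls = true models IGNORE NULLS; false models RESPECT NULLS.
  def firstValue[A](frame: Seq[Option[A]], ignoreNulls: Boolean): Option[A] =
    if (ignoreNulls) frame.collectFirst { case Some(v) => v } // skip NULL rows
    else frame.headOption.flatten                             // take the first row as is

  def lastValue[A](frame: Seq[Option[A]], ignoreNulls: Boolean): Option[A] =
    if (ignoreNulls) frame.reverse.collectFirst { case Some(v) => v }
    else frame.lastOption.flatten

  def main(args: Array[String]): Unit = {
    val frame: Seq[Option[Int]] = Seq(None, Some(10), Some(20), None)
    println(firstValue(frame, ignoreNulls = false)) // None
    println(firstValue(frame, ignoreNulls = true))  // Some(10)
    println(lastValue(frame, ignoreNulls = false))  // None
    println(lastValue(frame, ignoreNulls = true))   // Some(20)
  }
}
```

The `ignoreNulls = false` branch corresponds to the `Literal(false)` argument that appears in the `last(a)` parser test quoted earlier in this conversation.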
How was this patch tested?
UT.