[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax #25082

lipzhu · 2019-07-09T02:43:48Z

What changes were proposed in this pull request?

According to ANSI SQL 2011, FIRST_VALUE and LAST_VALUE accept an optional (IGNORE|RESPECT) NULLS clause. This PR adds that syntax.

Teradata, Oracle, and Redshift already support this grammar:

Teradata - https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/SUwCpTupqmlBJvi2mipOaA
Oracle - https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC
Redshift - https://docs.aws.amazon.com/redshift/latest/dg/r_WF_first_value.html
PostgreSQL does not implement this grammar:
https://www.postgresql.org/docs/devel/functions-window.html

The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS.
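To make the difference between the two options concrete, here is a minimal pure-Python sketch of the semantics described above. It is not Spark's implementation: the helper names `first_value` and `last_value`, the `ignore_nulls` flag, and the list-as-frame representation are assumptions made for illustration only, with `None` standing in for SQL NULL.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_value(frame: Sequence[Optional[T]], ignore_nulls: bool = False) -> Optional[T]:
    """Return the first value of an already ordered window frame.

    ignore_nulls=False models RESPECT NULLS (the standard's default):
    the first row's value is returned even when it is None.
    ignore_nulls=True models IGNORE NULLS: leading None values are skipped.
    """
    for value in frame:
        if value is not None or not ignore_nulls:
            return value
    return None  # empty frame, or every value was NULL


def last_value(frame: Sequence[Optional[T]], ignore_nulls: bool = False) -> Optional[T]:
    """Same rule as first_value, applied from the end of the frame."""
    return first_value(list(reversed(frame)), ignore_nulls)


frame = [None, 10, 20, None]
print(first_value(frame))                     # None
print(first_value(frame, ignore_nulls=True))  # 10
print(last_value(frame))                      # None
print(last_value(frame, ignore_nulls=True))   # 20
```

Under RESPECT NULLS the result depends only on the position of the row, while IGNORE NULLS also depends on the data, which is why the two options can return different answers for the same frame.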

How was this patch tested?

Unit tests.

dongjoon-hyun · 2019-07-09T03:09:09Z

ok to test

dongjoon-hyun · 2019-07-09T03:11:23Z

Thank you for your contribution, @lipzhu.

Please add the PostgreSQL status to the PR description.
Make SPARK-28310 a subtask of SPARK-27764.

dongjoon-hyun · 2019-07-09T04:06:24Z

Oops. I got it. Thank you for the update. If PostgreSQL doesn't support this, this cannot be part of SPARK-27764.

SparkQA · 2019-07-09T07:04:42Z

Test build #107384 has finished for PR 25082 at commit 678ccb7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-07-09T17:16:25Z

docs/sql-keywords.md

@@ -159,6 +160,7 @@ Below is a list of all the keywords in Spark SQL.
  <tr><td>LIMIT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
  <tr><td>LINES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
  <tr><td>LIST</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
+  <tr><td>LIST_VALUE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>


This is wrong because you want LAST_VALUE. :)

I think the two words should be reserved in Spark (ansi=true). Also, you need to update TableIdentifierParserSuite.

dongjoon-hyun · 2019-07-09T18:34:38Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala

@@ -737,6 +737,15 @@ class ExpressionParserSuite extends AnalysisTest {
    assertEqual("last(a)", Last('a, Literal(false)).toAggregateExpression())
  }

+  test("SPARK-28310 Support respect nulls keywords for first_value and last_value") {


For test cases covering a new feature or improvement, we don't put the JIRA ID (SPARK-28310) in the test name.

- test("SPARK-28310 Support respect nulls keywords for first_value and last_value") {
+ test("Support respect nulls keywords for first_value and last_value") {

dongjoon-hyun · 2019-07-09T18:36:07Z

cc @maropu

maropu

I left a minor comment; otherwise LGTM.

maropu · 2019-07-09T23:24:42Z

docs/sql-keywords.md

@@ -159,6 +160,7 @@ Below is a list of all the keywords in Spark SQL.
  <tr><td>LIMIT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
  <tr><td>LINES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
  <tr><td>LIST</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
+  <tr><td>LIST_VALUE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>


I think the two words should be reserved in Spark (ansi=true). Also, you need to update TableIdentifierParserSuite.

SparkQA · 2019-07-10T04:40:48Z

Test build #107428 has finished for PR 25082 at commit b3787f5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2019-07-10T04:51:36Z

retest this please

SparkQA · 2019-07-10T06:02:21Z

Test build #107426 has finished for PR 25082 at commit b052522.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-07-10T06:27:33Z

Test build #107435 has finished for PR 25082 at commit b3787f5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-07-10T10:56:50Z

Test build #107446 has finished for PR 25082 at commit 353becd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-07-10T14:38:00Z

Thank you, @lipzhu and @maropu . Merged to master.

beliefer · 2020-02-04T08:30:23Z

I have checked PostgreSQL, Vertica, Oracle, Redshift, Presto, and Teradata. In all of them, FIRST_VALUE|LAST_VALUE is always used as a window function, not as an aggregate function.
cc @gatorsmile @cloud-fan @dongjoon-hyun @wangyum

gatorsmile · 2020-02-04T08:33:09Z

Yes. We need to revert this commit and then submit proper support later.

@maropu @dongjoon-hyun WDYT?

cloud-fan · 2020-02-04T12:15:28Z

I've checked the Oracle documentation. FIRST_VALUE is not simply an alias of FIRST.

FIRST can be used as an aggregate function and can omit the OVER clause: https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST.html#GUID-85AB9246-0E0A-44A1-A7E6-4E57502E9238

FIRST_VALUE can only be a window function and can't omit the OVER clause: https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC

I think we should revert it and rethink it.
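The aggregate-versus-window distinction raised in this comment can be sketched in plain Python. This is only an illustration under stated assumptions, not how Oracle or Spark evaluate these functions: the names `first_as_aggregate` and `first_value_over_partition` are hypothetical, rows are `(partition key, value)` tuples assumed to be already sorted, and the window frame is assumed to start at the first row of each partition.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (partition key, value); rows are assumed pre-sorted


def first_as_aggregate(rows: List[Row]) -> Dict[str, int]:
    """Aggregate style (no OVER clause): one output value per group."""
    result: Dict[str, int] = {}
    for key, value in rows:
        # Keep only the first value seen for each key.
        result.setdefault(key, value)
    return result


def first_value_over_partition(rows: List[Row]) -> List[Tuple[str, int, int]]:
    """Window style (with OVER): every input row is kept and annotated
    with the first value of its partition."""
    firsts = first_as_aggregate(rows)
    return [(key, value, firsts[key]) for key, value in rows]


rows = [("a", 1), ("a", 2), ("b", 7), ("b", 8)]
print(first_as_aggregate(rows))          # {'a': 1, 'b': 7}
print(first_value_over_partition(rows))  # [('a', 1, 1), ('a', 2, 1), ('b', 7, 7), ('b', 8, 7)]
```

The aggregate form collapses four rows into two, while the window form returns all four rows. Treating FIRST_VALUE as an alias of the aggregate FIRST therefore accepts queries that the other databases listed in this thread would reject.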

maropu · 2020-02-04T14:28:23Z

Ur, I missed the behaviour. Yea, +1 for the revert.

dongjoon-hyun · 2020-02-04T21:12:57Z

Oops. Got it. Let me revert this. Thank you.

dongjoon-hyun · 2020-02-04T21:21:04Z

I created a reverting PR for this. If the tests pass, I'll merge it into master/3.0.

Revert "[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax" #27458

beliefer · 2020-02-05T00:41:50Z

I have created two tickets for them.
https://issues.apache.org/jira/browse/SPARK-30726
https://issues.apache.org/jira/browse/SPARK-30727

dongjoon-hyun · 2020-02-05T01:14:47Z

@beliefer, why not reuse SPARK-28310? After the revert, SPARK-28310 will be reopened.

…NORE|RESPECT) NULLS]?) syntax"

### What changes were proposed in this pull request?

This reverts commit b89c3de.

### Why are the changes needed?

`FIRST_VALUE` is used only for window expression. Please see the discussion on #25082 .

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Pass the Jenkins.

Closes #27458 from dongjoon-hyun/SPARK-28310.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

…NORE|RESPECT) NULLS]?) syntax"

### What changes were proposed in this pull request?

This reverts commit b89c3de.

### Why are the changes needed?

`FIRST_VALUE` is used only for window expression. Please see the discussion on #25082 .

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Pass the Jenkins.

Closes #27458 from dongjoon-hyun/SPARK-28310.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 8987169)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

beliefer · 2020-02-05T03:12:19Z

@dongjoon-hyun You are right. We should reuse SPARK-28310. I will remove the two tickets I created.

dongjoon-hyun · 2020-02-05T04:22:50Z

Yes. Reusing the same JIRA ID is good for traceability.

supprt ansi sql grammar:first_value/last_value(expression, [ignore/re…

678ccb7

…spect nulls])

lipzhu changed the title ~~[SPARK-28310][SQL]Support ansi sql grammar:first_value/last_value(expression, [ignore/respect nulls])~~ [SPARK-28310][SQL]Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls]) Jul 9, 2019

dongjoon-hyun added the SQL label Jul 9, 2019

dongjoon-hyun changed the title ~~[SPARK-28310][SQL]Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls])~~ [SPARK-28310][SQL] Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls]) Jul 9, 2019

dongjoon-hyun reviewed Jul 9, 2019

maropu requested changes Jul 9, 2019

Zhu, Lipeng added 2 commits July 10, 2019 10:40

typo and keywords list update

b052522

remove from ansi mode

b3787f5

first_value, last_value ansi is true

353becd

dongjoon-hyun changed the title ~~[SPARK-28310][SQL] Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls])~~ [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression, [(IGNORE|RESPECT) nulls]) syntax Jul 10, 2019

dongjoon-hyun changed the title ~~[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression, [(IGNORE|RESPECT) nulls]) syntax~~ [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression[, (IGNORE|RESPECT) nulls]?) syntax Jul 10, 2019

dongjoon-hyun changed the title ~~[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression[, (IGNORE|RESPECT) nulls]?) syntax~~ [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[, (IGNORE|RESPECT) nulls]?) syntax Jul 10, 2019

dongjoon-hyun changed the title ~~[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[, (IGNORE|RESPECT) nulls]?) syntax~~ [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) nulls]?) syntax Jul 10, 2019

dongjoon-hyun changed the title ~~[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) nulls]?) syntax~~ [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax Jul 10, 2019

dongjoon-hyun closed this in b89c3de Jul 10, 2019

dongjoon-hyun mentioned this pull request Feb 4, 2020

Revert "[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax" #27458

Closed

This was referenced Feb 5, 2020

[SPARK-30708][SQL] Fix parse exception for first_value and last_value #27442

Closed

[SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function #27440

Closed

beliefer mentioned this pull request Jun 28, 2020

[WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function #28685

Closed
