Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax #25082

Closed
wants to merge 4 commits into from

Conversation

lipzhu
Copy link
Contributor

@lipzhu lipzhu commented Jul 9, 2019

What changes were proposed in this pull request?

According to the ANSI SQL 2011
image

Below are Teradata, Oracle, Redshift which already support this grammar.

How was this patch tested?

UT.

@lipzhu lipzhu changed the title [SPARK-28310][SQL]Support ansi sql grammar:first_value/last_value(expression, [ignore/respect nulls]) [SPARK-28310][SQL]Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls]) Jul 9, 2019
@dongjoon-hyun
Copy link
Member

ok to test

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28310][SQL]Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls]) [SPARK-28310][SQL] Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls]) Jul 9, 2019
@dongjoon-hyun
Copy link
Member

Thank you for contribution, @lipzhu .

@dongjoon-hyun
Copy link
Member

Oops. I got it. Thank you for update. If PostgreSQL doesn't support this, this cannot be part of SPARK-27764.

@SparkQA
Copy link

SparkQA commented Jul 9, 2019

Test build #107384 has finished for PR 25082 at commit 678ccb7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -159,6 +160,7 @@ Below is a list of all the keywords in Spark SQL.
<tr><td>LIMIT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>LINES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>LIST</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>LIST_VALUE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong because you want LAST_VALUE. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the two words should be reserved in spark (ansi=true). Also, you need to update TableIdentifierParserSuite.

@@ -737,6 +737,15 @@ class ExpressionParserSuite extends AnalysisTest {
assertEqual("last(a)", Last('a, Literal(false)).toAggregateExpression())
}

test("SPARK-28310 Support respect nulls keywords for first_value and last_value") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a new feature and improvement test case, we don't use SPARK-28310.

-  test("SPARK-28310 Support respect nulls keywords for first_value and last_value") {
+  test("Support respect nulls keywords for first_value and last_value") {

@dongjoon-hyun
Copy link
Member

cc @maropu

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor comment and LGTM except for it.

@@ -159,6 +160,7 @@ Below is a list of all the keywords in Spark SQL.
<tr><td>LIMIT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>LINES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>LIST</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>LIST_VALUE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the two words should be reserved in spark (ansi=true). Also, you need to update TableIdentifierParserSuite.

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107428 has finished for PR 25082 at commit b3787f5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member

wangyum commented Jul 10, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107426 has finished for PR 25082 at commit b052522.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107435 has finished for PR 25082 at commit b3787f5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107446 has finished for PR 25082 at commit 353becd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28310][SQL] Support ANSI SQL grammar:first_value/last_value(expression, [ignore/respect nulls]) [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression, [(IGNORE|RESPECT) nulls]) syntax Jul 10, 2019
@dongjoon-hyun
Copy link
Member

Thank you, @lipzhu and @maropu . Merged to master.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression, [(IGNORE|RESPECT) nulls]) syntax [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression[, (IGNORE|RESPECT) nulls]?) syntax Jul 10, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expression[, (IGNORE|RESPECT) nulls]?) syntax [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[, (IGNORE|RESPECT) nulls]?) syntax Jul 10, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[, (IGNORE|RESPECT) nulls]?) syntax [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) nulls]?) syntax Jul 10, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) nulls]?) syntax [SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax Jul 10, 2019
@beliefer
Copy link
Contributor

beliefer commented Feb 4, 2020

I have checked PostgreSQL, Vertica, Oracle, Redshift, Presto, Teradata, FIRST_VALUE|LAST_VALUE is always used as a window function, not as an aggregate function.
cc @gatorsmile @cloud-fan @dongjoon-hyun @wangyum

@gatorsmile
Copy link
Member

gatorsmile commented Feb 4, 2020

Yes. We need to revert this commit and then submit a proper support later.

@maropu @dongjoon-hyun WDYT?

@cloud-fan
Copy link
Contributor

I've checked the oracle document. FIRST_VALUE is not simply an alias of FIRST.

FIRST can be used as aggregate functions and can omit the OVER clause: https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST.html#GUID-85AB9246-0E0A-44A1-A7E6-4E57502E9238

FIRST_VALUE can only be a window function and can't omit the OVER clause: https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC

I think we should revert it and rethink it.

@maropu
Copy link
Member

maropu commented Feb 4, 2020

Ur, I missed the behaviour. Yea, +1 for the revert.

@dongjoon-hyun
Copy link
Member

Oops. Got it. Let me revert this. Thank you.

@dongjoon-hyun
Copy link
Member

I created a reverting PR for this. If a test passes, I'll merge that into master/3.0.

@beliefer
Copy link
Contributor

beliefer commented Feb 5, 2020

@dongjoon-hyun
Copy link
Member

@beliefer . Why not reusing SPARK-28310? After reverting, SPARK-28310 will be reopen.

dongjoon-hyun added a commit that referenced this pull request Feb 5, 2020
…NORE|RESPECT) NULLS]?) syntax"

### What changes were proposed in this pull request?

This reverts commit b89c3de.

### Why are the changes needed?

`FIRST_VALUE` is used only for window expression. Please see the discussion on #25082 .

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Pass the Jenkins.

Closes #27458 from dongjoon-hyun/SPARK-28310.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Feb 5, 2020
…NORE|RESPECT) NULLS]?) syntax"

### What changes were proposed in this pull request?

This reverts commit b89c3de.

### Why are the changes needed?

`FIRST_VALUE` is used only for window expression. Please see the discussion on #25082 .

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Pass the Jenkins.

Closes #27458 from dongjoon-hyun/SPARK-28310.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 8987169)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@beliefer
Copy link
Contributor

beliefer commented Feb 5, 2020

@dongjoon-hyun You said right. We should reuse SPARK-28310. I will remove the two ticket I created.

@dongjoon-hyun
Copy link
Member

Yes. Reusing the same JIRA ID is good for traceability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
9 participants