[SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax #24472

Closed
wants to merge 6 commits into from

Contributor

@lipzhu lipzhu commented Apr 26, 2019

What changes were proposed in this pull request?

Currently, Spark SQL supports interval literals in this format:

SELECT INTERVAL '0 23:59:59.155' DAY TO SECOND

Like Presto/Teradata, this PR aims to support the grammar below.

SELECT INTERVAL '23:59:59.155' HOUR TO SECOND

Although we could add a new function for this pattern, it is better to extend the existing code to handle the missing-day case. So the following are also supported.

SELECT INTERVAL '23:59:59.155' DAY TO SECOND
SELECT INTERVAL '1 23:59:59.155' HOUR TO SECOND

Currently, Vertica/Teradata/PostgreSQL/SQL Server fully support the interval syntaxes below.

  • interval ... year to month
  • interval ... day to hour
  • interval ... day to minute
  • interval ... day to second
  • interval ... hour to minute
  • interval ... hour to second
  • interval ... minute to second

https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Literals/interval-qualifier.htm
https://github.com/postgres/postgres/blob/df1a699e5ba3232f373790b2c9485ddf720c4a70/src/test/regress/sql/interval.sql#L180-L203
https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/KdCtT3pYFo~_enc8~kGKVw
https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-literals?view=sql-server-2017

How was this patch tested?

Passes Jenkins with the updated test cases.

@wangyum
Member

wangyum commented May 21, 2019

cc @MaxGekk

Member

@srowen srowen left a comment

Seems reasonable but I don't know this part well

@@ -56,6 +56,9 @@ private static String unitRegex(String unit) {
private static Pattern dayTimePattern =
Pattern.compile("^(?:['|\"])?([+|-])?(\\d+) (\\d+):(\\d+):(\\d+)(\\.(\\d+))?(?:['|\"])?$");

private static Pattern hourTimePattern =
Member

BTW all of these Patterns could be final.

*
*/
public static CalendarInterval fromHourTimeString(String s) throws IllegalArgumentException {
CalendarInterval result;
Member

result is really superfluous here

@lipzhu
Contributor Author

lipzhu commented May 24, 2019

@srowen Thanks for your suggestion.

@dongjoon-hyun
Member

ok to test

Member

@dongjoon-hyun dongjoon-hyun left a comment

Hi, @lipzhu .
Thank you for the contribution. I left a few comments. Could you update the PR? We can proceed to the next round of review after the basic issues are fixed. Thanks!

@dongjoon-hyun changed the title from [SPARK-27578][SQL]Add support for "interval '23:59:59' hour to second" to [SPARK-27578][SQL] Add support for "interval '23:59:59' hour to second" on May 26, 2019
@SparkQA

SparkQA commented May 26, 2019

Test build #105784 has finished for PR 24472 at commit 688c53b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lipzhu
Contributor Author

lipzhu commented May 27, 2019

@dongjoon-hyun Thanks for your review and suggestions. I just made some changes based on your suggestions.

@SparkQA

SparkQA commented May 27, 2019

Test build #105833 has finished for PR 24472 at commit 424fa85.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for updating, @lipzhu. I did another round of review and realized that we don't need to add a new 30-line function. What we need is a pattern change. I opened a PR against your branch. Please review and merge that.

@lipzhu
Contributor Author

lipzhu commented May 28, 2019

@dongjoon-hyun Thanks for your update that removes the duplicated code.

@dongjoon-hyun changed the title from [SPARK-27578][SQL] Add support for "interval '23:59:59' hour to second" to [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax on May 28, 2019
@dongjoon-hyun
Member

dongjoon-hyun commented May 28, 2019

Hi, @gatorsmile and @cloud-fan .

Could you give us some directional advice, please?

  • First, this PR wants to support INTERVAL ... HOUR TO SECOND in addition to INTERVAL ... DAY TO SECOND, like Presto/Teradata. It looks reasonable to me, too.
  • Second, this PR originally added a new pattern and a new function (similar to the existing one). To avoid maintaining two similar functions, I recommended extending the existing pattern and handling DAY and HOUR with the same function. To sum up, we will additionally support 2 through 4 below.
  1. SELECT INTERVAL '0 23:59:59.155' DAY TO SECOND (Current Spark)
  2. SELECT INTERVAL '23:59:59.155' HOUR TO SECOND
  3. SELECT INTERVAL '23:59:59.155' DAY TO SECOND
  4. SELECT INTERVAL '1 23:59:59.155' HOUR TO SECOND
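
For illustration only, the idea of handling all four literals with a single pattern can be sketched as below. This is a minimal sketch, not the code in this PR: the class name `DayTimeIntervalSketch`, the method `toMicros`, and the exact regex are assumptions made up for the example. It makes the leading day field an optional group and treats a missing day as zero. It does not validate field ranges (for example hours above 23), and it does not look at the DAY/HOUR qualifier, which the parser would handle separately.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the code merged in this PR. */
final class DayTimeIntervalSketch {
    // Hypothetical pattern: the "<days> " prefix is an optional group, so one
    // pattern accepts both 'D HH:MM:SS[.f]' and 'HH:MM:SS[.f]'.
    private static final Pattern DAY_TIME = Pattern.compile(
        "^([+-])?(?:(\\d+) )?(\\d+):(\\d+):(\\d+)(?:\\.(\\d{1,9}))?$");

    private static final long MICROS_PER_SECOND = 1_000_000L;

    /** Parses the literal body and returns the interval length in microseconds. */
    public static long toMicros(String s) {
        Matcher m = DAY_TIME.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a day-time interval string: " + s);
        }
        long sign = "-".equals(m.group(1)) ? -1L : 1L;
        // A missing day group is treated as zero days.
        long days = m.group(2) == null ? 0L : Long.parseLong(m.group(2));
        long hours = Long.parseLong(m.group(3));
        long minutes = Long.parseLong(m.group(4));
        long seconds = Long.parseLong(m.group(5));
        long fractionMicros = 0L;
        if (m.group(6) != null) {
            // Right-pad the fraction to 9 digits (nanoseconds), then truncate to microseconds.
            String nanos = (m.group(6) + "000000000").substring(0, 9);
            fractionMicros = Long.parseLong(nanos) / 1000L;
        }
        long wholeSeconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
        return sign * (wholeSeconds * MICROS_PER_SECOND + fractionMicros);
    }

    public static void main(String[] args) {
        String[] literals = {"0 23:59:59.155", "23:59:59.155", "1 23:59:59.155"};
        for (String literal : literals) {
            System.out.println(literal + " -> " + toMicros(literal) + " us");
        }
    }
}
```

With this shape, cases 1 through 3 go through the same code path, and case 4 differs only in the day value, which is why a second near-identical function is not needed.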

If you think these are okay, I want to merge this PR. What do you think?

@SparkQA

SparkQA commented May 28, 2019

Test build #105852 has finished for PR 24472 at commit 29fcc08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented May 30, 2019

Test build #105948 has finished for PR 24472 at commit 0ee3bc9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented May 30, 2019

retest this please

@SparkQA

SparkQA commented May 30, 2019

Test build #105958 has finished for PR 24472 at commit 0ee3bc9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented May 31, 2019

retest this please

@SparkQA

SparkQA commented May 31, 2019

Test build #105993 has finished for PR 24472 at commit 0ee3bc9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented May 31, 2019

retest this please

@SparkQA

SparkQA commented May 31, 2019

Test build #106001 has finished for PR 24472 at commit 0ee3bc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lipzhu
Contributor Author

lipzhu commented May 31, 2019

Here are more patterns that Teradata and PostgreSQL support but Spark SQL does not support yet.

  • interval ... day to hour
  • interval ... day to minute
  • interval ... day to second
  • interval ... hour to minute
  • interval ... minute to second
    ...

@dongjoon-hyun
Member

Yep. I know, @lipzhu :) There is always a trade-off. Although Spark SQL was designed to be compatible with Hive, we officially have unsupported features, too.

Since each SQL engine works differently, Apache Spark adds features based on that trade-off. Apache Spark cannot match the behavior of every SQL engine for the sake of compatibility; that is technically impossible.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Hi, @cloud-fan and @gatorsmile . As I asked before, I support this feature.

We can accept it this way or the original way (by reverting the last two commits in order to keep two patterns).

For now, I prefer the current one with minimal changes, but I am open to both ways. Please let us know the PMC's opinion (positive or negative). If there are no further comments, I'll proceed to merge this into the master branch next week.

@gatorsmile
Member

Over the next month, Wenchen might not be able to respond to your pings quickly. Normally, when we add support like this, we need to know whether the other SQL engines have similar support.

@lipzhu Could you help investigate? For example, Oracle, DB2, SQL Server, MySQL, PostgreSQL, Hive?

@@ -1770,7 +1770,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
* Create a [[CalendarInterval]] for a unit value pair. Two unit configuration types are
* supported:
* - Single unit.
-  * - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' are supported).
+  * - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' and 'HOUR to SECOND' are supported).
Member

It sounds like we only added a very specific case. https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/RADkJCor1nDoyeD2T1VX5A

Any reason?

Contributor Author

This is because INTERVAL ... HOUR TO SECOND is the form most used in our existing scripts. If needed, other cases like INTERVAL ... X TO Y can be contributed later.

@lipzhu
Contributor Author

lipzhu commented Jun 3, 2019

@gatorsmile
As I listed before, PostgreSQL/Teradata/SQL Server fully support the interval syntaxes below.
PostgreSQL: https://github.com/postgres/postgres/blob/df1a699e5ba3232f373790b2c9485ddf720c4a70/src/test/regress/sql/interval.sql#L180-L203
Teradata: https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/KdCtT3pYFo~_enc8~kGKVw
SQL Server: https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-literals?view=sql-server-2017

  • interval ... year to month
  • interval ... day to hour
  • interval ... day to minute
  • interval ... day to second
  • interval ... hour to minute
  • interval ... minute to second
    ...

The other DB engines

  • Hive/Oracle partially support the interval syntaxes listed above.
  • DB2/MySQL do not support them at all.

@wangyum
Member

wangyum commented Jun 4, 2019

@lipzhu
Contributor Author

lipzhu commented Jun 5, 2019

Vertica also supports HOUR TO SECOND:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Literals/interval-qualifier.htm

@lipzhu Could you add #24472 (comment) and this to PR description?

@wangyum I just updated the PR description.

@dongjoon-hyun
Member

Retest this please.

@dongjoon-hyun
Member

So, could you give us some further direction, @gatorsmile ?

@SparkQA

SparkQA commented Jun 9, 2019

Test build #106319 has finished for PR 24472 at commit 0ee3bc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Hi, @gatorsmile. If you want full support like PostgreSQL as part of SPARK-27764, please make the decision so that @lipzhu can go for it. We don't have much time before Spark 3.0.0, and 3.1 will be next year.

@@ -1770,7 +1770,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
* Create a [[CalendarInterval]] for a unit value pair. Two unit configuration types are
* supported:
* - Single unit.
-  * - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' are supported).
+  * - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' and 'HOUR to SECOND' are supported).
Member

nit: 'YEAR TO MONTH' and 'DAY TO SECOND' -> 'YEAR TO MONTH', 'DAY TO SECOND'

@HyukjinKwon
Member

Will we also add interval ... minute to second support to match PostgreSQL?

Member

@HyukjinKwon HyukjinKwon left a comment

change looks good

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for the review and approval, @HyukjinKwon.
I'll merge this to master to move forward. We can support more syntax gradually.

Thank you, @lipzhu , @gatorsmile , @wangyum , too.

emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
Closes apache#24472 from lipzhu/SPARK-27578.

Lead-authored-by: Zhu, Lipeng <lipzhu@ebay.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Lipeng Zhu <lipzhu@icloud.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@lipzhu
Contributor Author

lipzhu commented Jun 19, 2019

@dongjoon-hyun , Thanks for your patience on this.

I'll merge this to the master to move forward. We can support more syntax gradually.

Following the ANSI SQL:2003 rules, I just created a JIRA to track the other interval type conversions.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 19, 2019

Yep. Go for it please, @lipzhu !
