-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax #24472
Conversation
cc @MaxGekk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable but I don't know this part well
@@ -56,6 +56,9 @@ private static String unitRegex(String unit) { | |||
private static Pattern dayTimePattern = | |||
Pattern.compile("^(?:['|\"])?([+|-])?(\\d+) (\\d+):(\\d+):(\\d+)(\\.(\\d+))?(?:['|\"])?$"); | |||
|
|||
private static Pattern hourTimePattern = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW all of these Patterns could be final.
* | ||
*/ | ||
public static CalendarInterval fromHourTimeString(String s) throws IllegalArgumentException { | ||
CalendarInterval result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result is really superfluous here
@srowen Thanks for your suggestion. |
ok to test |
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Outdated
Show resolved
Hide resolved
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Outdated
Show resolved
Hide resolved
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Outdated
Show resolved
Hide resolved
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Outdated
Show resolved
Hide resolved
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @lipzhu .
Thank you for the contribution. I left a few comments. Could you update the PR. We can proceed the next round review after fixing the basic stuffs. Thanks!
Test build #105784 has finished for PR 24472 at commit
|
@dongjoon-hyun Thanks for your review and suggestions. I just did some changes according to your suggestions. |
Test build #105833 has finished for PR 24472 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updating, @lipzhu . I did another round of reviews and realized that we don't need to add a new function of 30 lines. What we need is a pattern change. I made a PR to you. Please review and merge that.
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Show resolved
Hide resolved
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Outdated
Show resolved
Hide resolved
@dongjoon-hyun Thanks for your update to remove the duplicate codes. |
Hi, @gatorsmile and @cloud-fan . Could you give us some directional advice, please?
If you think these are okay, I want to merge this PR. How do you think about this? |
Test build #105852 has finished for PR 24472 at commit
|
Retest this please. |
Test build #105948 has finished for PR 24472 at commit
|
retest this please |
Test build #105958 has finished for PR 24472 at commit
|
retest this please |
Test build #105993 has finished for PR 24472 at commit
|
retest this please |
Test build #106001 has finished for PR 24472 at commit
|
List more patterns which are supported in Teradata and PostgreSQL but not supported yet in SparkSQL.
|
Yep. I know. @lipzhu :) There is a trade-off always. Although Spark SQL was designed to be compatible with Hive, but we officially have unsupported features, too. Since each SQL engine works differently, Apache Spark adds features based on the trade-off. Apache Spark cannot embrace all SQL engine behavior in terms of compatibility, and it's technically impossible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @cloud-fan and @gatorsmile . As I asked before, I support this feature.
We can accept in this way or the original way (by reverting the last two commit in order to keep two patterns).
For now, I prefer the current one with the minimal changes, but also am open to both ways. Please let us know PMC's opinions (positive or negative). If there is no further comments, I'll proceed to merge this into master
branch in the next week.
In the next one month, Wenchen might not be able to respond your pings very soon. Normally, when we add a support like this, we need to know whether all the other SQL engines have similar supports? @lipzhu Could you help do an investigation? For example, Oracle, DB2, SQL Server, MySQL, PostgreSQL, Hive? |
@@ -1770,7 +1770,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging | |||
* Create a [[CalendarInterval]] for a unit value pair. Two unit configuration types are | |||
* supported: | |||
* - Single unit. | |||
* - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' are supported). | |||
* - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' and 'HOUR to SECOND' are supported). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like we only added a very specific case. https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/RADkJCor1nDoyeD2T1VX5A
Any reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because INTERVAL ... HOUR to SECOND
are most used in our existing scripts. If needed, maybe other cases like INTERVAL .. X to X
can be contributed.
@gatorsmile
The other DB engines
|
Vertica also support @lipzhu Could you add #24472 (comment) and this to PR description? |
@wangyum Just update the PR description. |
Retest this please. |
So, could you give us some further direction, @gatorsmile ? |
Test build #106319 has finished for PR 24472 at commit
|
Hi, @gatorsmile . If you want a full support like |
@@ -1770,7 +1770,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging | |||
* Create a [[CalendarInterval]] for a unit value pair. Two unit configuration types are | |||
* supported: | |||
* - Single unit. | |||
* - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' are supported). | |||
* - From-To unit (only 'YEAR TO MONTH' and 'DAY TO SECOND' and 'HOUR to SECOND' are supported). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 'YEAR TO MONTH' and 'DAY TO SECOND'
-> 'YEAR TO MONTH', 'DAY TO SECOND'
Will we add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for review and approval, @HyukjinKwon .
I'll merge this to the master to move forward. We can support more syntax gradually.
Thank you, @lipzhu , @gatorsmile , @wangyum , too.
## What changes were proposed in this pull request? Currently, SparkSQL can support interval format like this. ```sql SELECT INTERVAL '0 23:59:59.155' DAY TO SECOND ``` Like Presto/Teradata, this PR aims to support grammar like below. ```sql SELECT INTERVAL '23:59:59.155' HOUR TO SECOND ``` Although we can add a new function for this pattern, we had better extend the existing code to handle a missing day case. So, the following is also supported. ```sql SELECT INTERVAL '23:59:59.155' DAY TO SECOND SELECT INTERVAL '1 23:59:59.155' HOUR TO SECOND ``` Currently Vertica/Teradata/Postgresql/SQL Server have fully support of below interval functions. - interval ... year to month - interval ... day to hour - interval ... day to minute - interval ... day to second - interval ... hour to minute - interval ... hour to second - interval ... minute to second https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Literals/interval-qualifier.htm https://github.com/postgres/postgres/blob/df1a699e5ba3232f373790b2c9485ddf720c4a70/src/test/regress/sql/interval.sql#L180-L203 https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/KdCtT3pYFo~_enc8~kGKVw https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-literals?view=sql-server-2017 ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#24472 from lipzhu/SPARK-27578. Lead-authored-by: Zhu, Lipeng <lipzhu@ebay.com> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Lipeng Zhu <lipzhu@icloud.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun , Thanks for your patience on this.
According to the 03 ANSI SQL rule |
Yep. Go for it please, @lipzhu ! |
What changes were proposed in this pull request?
Currently, SparkSQL can support interval format like this.
Like Presto/Teradata, this PR aims to support grammar like below.
Although we can add a new function for this pattern, we had better extend the existing code to handle a missing day case. So, the following is also supported.
Currently Vertica/Teradata/Postgresql/SQL Server have fully support of below interval functions.
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Literals/interval-qualifier.htm
https://github.com/postgres/postgres/blob/df1a699e5ba3232f373790b2c9485ddf720c4a70/src/test/regress/sql/interval.sql#L180-L203
https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/KdCtT3pYFo~_enc8~kGKVw
https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-literals?view=sql-server-2017
How was this patch tested?
Pass the Jenkins with the updated test cases.