[SPARK-29393][SQL] Add make_interval function #26446

Conversation
Test build #113499 has finished for PR 26446 at commit
Thank you for adding this, @MaxGekk! Only minor comments.
Test build #113530 has finished for PR 26446 at commit
Could you resolve the conflicts, @MaxGekk?
# Conflicts:
#	sql/core/src/test/resources/sql-tests/inputs/interval.sql
#	sql/core/src/test/resources/sql-tests/results/interval.sql.out
Test build #113546 has finished for PR 26446 at commit
+1, LGTM. Merged to master.
Thank you always for keeping progress on Apache Spark, @MaxGekk !
```scala
// Accept `secs` as DecimalType to avoid losing precision of microseconds while converting
// them to the fractional part of `secs`.
override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType, IntegerType, IntegerType,
  IntegerType, IntegerType, IntegerType, DecimalType(8, 6))
```
@MaxGekk `DecimalType(8, 6)` for seconds makes it overflow to null if the input expression has values of 100 or more.
Simba reported it when testing using this function to implement the TIMESTAMPADD ODBC translation: `{fn TIMESTAMPADD(SECONDS, integer_exp, timestamp)}` -> `timestamp + make_interval(0, 0, 0, 0, 0, 0, integer_exp)`
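The overflow follows from the type's shape: precision 8 with scale 6 leaves only two digits before the decimal point, so the largest seconds value the type can hold is 99.999999. A small Python sketch of that bound (`fits_decimal` is a hypothetical helper illustrating the rule, not Spark code):

```python
from decimal import Decimal

def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether `value` fits a decimal with the given precision/scale,
    mirroring how a DecimalType(precision, scale) bounds its values."""
    # Shift the fractional digits into the integer part, then count
    # total digits against the precision.
    scaled = int((value * (10 ** scale)).to_integral_value())
    return len(str(abs(scaled))) <= precision

# DecimalType(8, 6): 8 total digits, 6 fractional -> only 2 integer digits.
print(fits_decimal(Decimal("99.999999"), 8, 6))  # largest value that fits
print(fits_decimal(Decimal("100"), 8, 6))        # does not fit -> overflow to null
```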
Thank you for filing a JIRA, @juliuszsompolski.
Here is the PR #28873 in which I bumped the precision of seconds + fractions.
### What changes were proposed in this pull request?

This pull request exposes the `make_interval` function, [as suggested here](#31000 (review)), and as agreed to [here](#31000 (comment)) and [here](#31000 (comment)). This powerful little function allows for idiomatic datetime arithmetic via the Scala API:

```scala
// add two hours
df.withColumn("plus_2_hours", col("first_datetime") + make_interval(hours = lit(2)))

// subtract one week and 30 seconds
col("d") - make_interval(weeks = lit(1), secs = lit(30))
```

The `make_interval` [SQL function](#26446) already exists. Here is [the JIRA ticket](https://issues.apache.org/jira/browse/SPARK-33995) for this PR.

### Why are the changes needed?

The Spark API makes it easy to perform datetime addition / subtraction with months (`add_months`) and days (`date_add`). Users need to write code like this to perform datetime addition with years, weeks, hours, minutes, or seconds:

```scala
df.withColumn("plus_2_hours", expr("first_datetime + INTERVAL 2 hours"))
```

We don't want to force users to manipulate SQL strings when they're using the Scala API.

### Does this PR introduce _any_ user-facing change?

Yes, this PR adds `make_interval` to the `org.apache.spark.sql.functions` API. This single function will benefit a lot of users. It's a small increase in the surface of the API for a big gain.

### How was this patch tested?

This was tested via unit tests.

cc: MaxGekk

Closes #31073 from MrPowers/SPARK-33995.

Authored-by: MrPowers <matthewkevinpowers@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

In the PR, I propose a new expression `MakeInterval` and register it as the function `make_interval`. The function accepts the following parameters:

- `years` - the number of years in the interval, positive or negative. The parameter is multiplied by 12, and added to the interval's `months`.
- `months` - the number of months in the interval, positive or negative.
- `weeks` - the number of weeks in the interval, positive or negative. The parameter is multiplied by 7, and added to the interval's `days`.
- `hours`, `mins` - the number of hours and minutes. The parameters can be negative or positive. They are converted to microseconds and added to the interval's `microseconds`.
- `secs` - the number of seconds with the fractional part in microseconds precision. It is converted to microseconds and added to the interval's total `microseconds`, as `hours` and `mins` are.
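The parameter conversions described above can be sketched in plain Python (an illustration of the arithmetic only; `make_interval_parts` is a hypothetical helper, not the Spark API):

```python
MICROS_PER_SECOND = 1_000_000

def make_interval_parts(years=0, months=0, weeks=0, days=0,
                        hours=0, mins=0, secs=0.0):
    """Fold the seven parameters into the three fields of a calendar
    interval: (months, days, microseconds)."""
    total_months = years * 12 + months          # years fold into months
    total_days = weeks * 7 + days               # weeks fold into days
    micros = (hours * 3600 + mins * 60) * MICROS_PER_SECOND
    micros += round(secs * MICROS_PER_SECOND)   # secs keep microsecond precision
    return total_months, total_days, micros

# 1 year, 2 months, 1 week, 1 hour, 30 minutes, 1.5 seconds
print(make_interval_parts(years=1, months=2, weeks=1, hours=1, mins=30, secs=1.5))
# -> (14, 7, 5401500000)
```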
Why are the changes needed?

The function allows users to make `INTERVAL` columns from other columns containing `years`, `months` ... `seconds`. Currently, users can make an `INTERVAL` column from other columns only by constructing a `STRING` column and casting it to `INTERVAL`. Have a look at the `IntervalBenchmark` as an example.

Does this PR introduce any user-facing change?

No
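For contrast, the string-cast workaround mentioned above amounts to assembling an interval literal by hand before Spark can parse it. A Python sketch of that assembly step (`interval_literal` is a hypothetical helper for illustration, not Spark code):

```python
def interval_literal(years=0, months=0, days=0, hours=0, mins=0, secs=0.0):
    """Build the INTERVAL string a user would otherwise cast to INTERVAL,
    e.g. 'INTERVAL 1 years 2 months' -- the workaround make_interval replaces."""
    parts = [(years, "years"), (months, "months"), (days, "days"),
             (hours, "hours"), (mins, "minutes"), (secs, "seconds")]
    body = " ".join(f"{v} {unit}" for v, unit in parts if v)
    return f"INTERVAL {body}" if body else "INTERVAL 0 seconds"

print(interval_literal(years=1, months=2))  # -> INTERVAL 1 years 2 months
```

Going through strings like this costs a parse on every row, which is what the `IntervalBenchmark` comparison measures.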
How was this patch tested?

- By adding tests for the `MakeInterval` expression to `IntervalExpressionsSuite`
- By adding SQL tests to `interval.sql`