
[SPARK-29393][SQL] Add make_interval function #26446

Closed
wants to merge 7 commits

Conversation

Member

@MaxGekk MaxGekk commented Nov 9, 2019

What changes were proposed in this pull request?

In the PR, I propose new expression MakeInterval and register it as the function make_interval. The function accepts the following parameters:

  • years - the number of years in the interval, positive or negative. The parameter is multiplied by 12 and added to the interval's months.
  • months - the number of months in the interval, positive or negative.
  • weeks - the number of weeks in the interval, positive or negative. The parameter is multiplied by 7 and added to the interval's days.
  • hours, mins - the number of hours and minutes in the interval, positive or negative. They are converted to microseconds and added to the interval's microseconds.
  • seconds - the number of seconds, with a fractional part of up to microsecond precision. It is converted to microseconds and added to the interval's total microseconds, as hours and mins are.

For example:

spark-sql> select make_interval(2019, 11, 1, 1, 12, 30, 01.001001);
2019 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds
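
Negative parameters subtract from the interval, and weeks fold into days, per the conversion rules above. A hypothetical illustration (the exact output formatting may differ):

spark-sql> select make_interval(0, 1, -1, 0, 0, 0, 0.0);
1 months -7 days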

Why are the changes needed?

  • To improve the user experience with Spark SQL by allowing users to build INTERVAL columns from other columns containing years, months, ..., seconds. Currently, users can build an INTERVAL column from other columns only by constructing a STRING column and casting it to INTERVAL. Have a look at the IntervalBenchmark as an example; both approaches are sketched after this list.
  • To maintain feature parity with PostgreSQL, which provides such a function:
# SELECT make_interval(2019, 11);
   make_interval
--------------------
 2019 years 11 mons
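
To make the contrast concrete, here is a hedged sketch of both approaches; the table `t` and its integer columns `y` and `m` are hypothetical:

spark-sql> -- current workaround: build a STRING and cast it
spark-sql> select cast(concat(y, ' years ', m, ' months') as interval) from t;
spark-sql> -- with the proposed function
spark-sql> select make_interval(y, m, 0, 0, 0, 0, 0) from t;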

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • By new tests for the MakeInterval expression added to IntervalExpressionsSuite
  • By tests in interval.sql


SparkQA commented Nov 9, 2019

Test build #113499 has finished for PR 26446 at commit 312bd45.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Thank you for adding this, @MaxGekk ! Only minor comments.


SparkQA commented Nov 10, 2019

Test build #113530 has finished for PR 26446 at commit 95bbfbc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Could you resolve the conflicts, @MaxGekk ?

# Conflicts:
#	sql/core/src/test/resources/sql-tests/inputs/interval.sql
#	sql/core/src/test/resources/sql-tests/results/interval.sql.out

SparkQA commented Nov 10, 2019

Test build #113546 has finished for PR 26446 at commit 0f9a3bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AlterViewAsStatement(

@dongjoon-hyun dongjoon-hyun (Member) left a comment

+1, LGTM. Merged to master.
Thank you always for keeping progress on Apache Spark, @MaxGekk !

// Accept `secs` as DecimalType to avoid losing precision of microseconds while converting
// them to the fractional part of `secs`.
override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType, IntegerType, IntegerType,
  IntegerType, IntegerType, IntegerType, DecimalType(8, 6))
Contributor

@MaxGekk DecimalType(8, 6) for seconds makes it overflow to null if the input expression has values > 100.
Simba reported it when testing the use of this function to implement the TIMESTAMPADD ODBC translation: {fn TIMESTAMPADD(SECONDS, integer_exp, timestamp)} -> timestamp + make_interval(0, 0, 0, 0, 0, 0, integer_exp)

I raised https://issues.apache.org/jira/browse/SPARK-32021
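
For reference, DecimalType(8, 6) keeps 8 digits in total with 6 after the decimal point, so the largest representable value is 99.999999 seconds; a sketch of the reported behavior before the SPARK-32021 fix:

spark-sql> select make_interval(0, 0, 0, 0, 0, 0, 100.0);
NULL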

Member

Thank you for filing a JIRA, @juliuszsompolski .

Member Author

Here is PR #28873, in which I bumped the precision of seconds plus their fractional part.

cloud-fan pushed a commit that referenced this pull request Feb 10, 2021
### What changes were proposed in this pull request?

This pull request exposes the `make_interval` function, as suggested and agreed to in the review discussion on #31000.

This powerful little function allows for idiomatic datetime arithmetic via the Scala API:

```scala
// add two hours
df.withColumn("plus_2_hours", col("first_datetime") + make_interval(hours = lit(2)))

// subtract one week and 30 seconds
col("d") - make_interval(weeks = lit(1), secs = lit(30))
```

The `make_interval` [SQL function](#26446) already exists.

Here is [the JIRA ticket](https://issues.apache.org/jira/browse/SPARK-33995) for this PR.

### Why are the changes needed?

The Spark API makes it easy to perform datetime addition / subtraction with months (`add_months`) and days (`date_add`).  Users need to write code like this to perform datetime addition with years, weeks, hours, minutes, or seconds:

```scala
df.withColumn("plus_2_hours", expr("first_datetime + INTERVAL 2 hours"))
```

We don't want to force users to manipulate SQL strings when they're using the Scala API.

### Does this PR introduce _any_ user-facing change?

Yes, this PR adds `make_interval` to the `org.apache.spark.sql.functions` API.

This single function will benefit a lot of users.  It's a small increase in the surface of the API for a big gain.

### How was this patch tested?

This was tested via unit tests.

cc: MaxGekk

Closes #31073 from MrPowers/SPARK-33995.

Authored-by: MrPowers <matthewkevinpowers@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>