Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34761][SQL] Support add/subtract of a day-time interval to/from a timestamp #31855

Closed
wants to merge 11 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Mar 16, 2021

What changes were proposed in this pull request?

Support timestamp +/- day-time interval. In the PR, I propose to extend the TimeAdd expression and support DayTimeIntervalType as the interval parameter. The expression invokes the new method DateTimeUtils.timestampAddDayTime() which splits the input day-time interval to days and microsecond adjustment of a day, and adds days (and the microseconds) to a local timestamp derived from the given timestamp at the given time zone. The resulted local timestamp is converted back to the offset in microseconds since the epoch.

Also I updated the rules that handle CalendarIntervalType and produce TimeAdd to take into account new type DateTimeIntervalType for the interval parameter of TimeAdd.

Why are the changes needed?

To conform the ANSI SQL standard which requires to support such operation over timestamps and intervals:
Screenshot 2021-03-12 at 11 36 14

Does this PR introduce any user-facing change?

Should not since new intervals have not been released yet.

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *DateTimeUtilsSuite"
$ build/sbt "test:testOnly *DateExpressionsSuite"
$ build/sbt "test:testOnly *ColumnExpressionSuite"

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136124/

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40706/

@github-actions github-actions bot added the SQL label Mar 16, 2021
@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40711/

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136129/

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 17, 2021

@cloud-fan @yaooqinn Could you review this PR, please.

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 17, 2021

@cloud-fan @srielau I doubt about semantic of the operation. Should we add the interval as "physical" duration like in this PR by just shifting timestamp offset in micros, or convert the timestamp to a local timestamp and add days and times?

@cloud-fan
Copy link
Contributor

I think we should add days and times separately, to be consistent with the legacy interval type which has days as a field.

@yaooqinn
Copy link
Member

I think we should add days and times separately, to be consistent with the legacy interval type which has days as a field.

+1 for this suggestion

@@ -342,11 +342,12 @@ class Analyzer(override val catalogManager: CatalogManager)
case (YearMonthIntervalType, DateType) => DateAddYMInterval(r, l)
case (TimestampType, YearMonthIntervalType) => TimestampAddYMInterval(l, r)
case (YearMonthIntervalType, TimestampType) => TimestampAddYMInterval(r, l)
case (CalendarIntervalType, CalendarIntervalType) => a
case (CalendarIntervalType, CalendarIntervalType) |
(DayTimeIntervalType, DayTimeIntervalType) => a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we allow adding year-month interval as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The op is allowed already, see the test https://github.com/apache/spark/pull/31789/files#diff-bff10c8a3182aa943d8927135e0c14b02a338d9bcca94ddcd77670ee01fea0f3R2380 . It should still pass after the changes.

I have to add this cases because I reuse the TimeAdd expression, and associated rules here.

The year-month interval +/- year-month interval expr is handled by the default case case _ => a/case _ => s.

@@ -356,10 +357,11 @@ class Analyzer(override val catalogManager: CatalogManager)
DatetimeSub(l, r, DateAddYMInterval(l, UnaryMinus(r, f)))
case (TimestampType, YearMonthIntervalType) =>
DatetimeSub(l, r, TimestampAddYMInterval(l, UnaryMinus(r, f)))
case (CalendarIntervalType, CalendarIntervalType) => s
case (CalendarIntervalType, CalendarIntervalType) |
(DayTimeIntervalType, DayTimeIntervalType) => s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The YearMonthIntervalType is handled by the default case case _ => s. We need to handle CalendarIntervalType and DayTimeIntervalType especially otherwise they will be casted to some unexpected type in the base arithmetic ops +/-

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136216/

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40798/

nanoAdjustment <- Gen.choose(-999999000, 999999000)
} yield {
Literal.create(Duration.ofSeconds(seconds, nanoAdjustment), DayTimeIntervalType)
calendarIntervalLiterGen.map { calendarIntervalLiteral =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generator is used to test expressions like random timestamp + random day-time interval. I have to adjust the generator to have reasonable intervals, and prevent overflows.

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136219/

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40801/

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in a48b208 Mar 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants