Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35727][SQL] Return INTERVAL DAY from dates subtraction #32999

Closed
wants to merge 3 commits into from

Conversation

Peng-Lei
Copy link
Contributor

@Peng-Lei Peng-Lei commented Jun 21, 2021

What changes were proposed in this pull request?

Change the return value type from DayTimeIntervalType(DAY, SECOND) to DayTimeIntervalType(DAY, DAY) of SubtractDates.

Why are the changes needed?

SPARK-35727

Does this PR introduce any user-facing change?

No

How was this patch tested?

existed ut test

@github-actions github-actions bot added the SQL label Jun 21, 2021
@Peng-Lei
Copy link
Contributor Author

cc @MaxGekk

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

if (legacyInterval) {
CalendarIntervalType
} else {
DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.DAY)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added apply() with single arg recently #32997, you can use it here as:

DayTimeIntervalType(DAY)

checkAnswer(df.select($"end" - $"start"), Row(daysBetween))
val r = df.select($"end" - $"start").toDF("diff")
checkAnswer(r, Row(daysBetween))
assert(r.schema === new StructType().add("diff", DayTimeIntervalType(0, 0)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, replace DayTimeIntervalType(0, 0) by DayTimeIntervalType(DAY)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Peng-Lei This change seems to include the change not only for Date - Date but also Date + Interval and Interval + Date. If you'd like to include them, could you add tests for them too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's exclude the Date +/- Interval changes from this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add tests for them too

@sarutak ,Sorry I am wrong. First I should exclude the Date +/- Interval from this PR. Second Even if I wanted to change Date +/- Interval, I should convert the interval to the number of days ExtractANSIIntervalDays) and then use DateAdd expression as cloud-fan said.

@MaxGekk
Copy link
Member

MaxGekk commented Jun 21, 2021

Also cc @sarutak @AngersZhuuuu and @cloud-fan

@sarutak
Copy link
Member

sarutak commented Jun 21, 2021

@Peng-Lei
Aside from the problem raised in SPARK-35727, this PR seems to include other fixes which are related to Add/Sub between Date and Interval. Could you file these issues in JIRA?

@@ -349,8 +349,18 @@ class Analyzer(override val catalogManager: CatalogManager)
case p: LogicalPlan => p.transformExpressionsUpWithPruning(
_.containsPattern(BINARY_ARITHMETIC), ruleId) {
case a @ Add(l, r, f) if a.childrenResolved => (l.dataType, r.dataType) match {
case (DateType, _: DayTimeIntervalType) => TimeAdd(Cast(l, TimestampType), r)
case (_: DayTimeIntervalType, DateType) => TimeAdd(Cast(r, TimestampType), l)
case (DateType, dit: DayTimeIntervalType) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in Analyzer.scala doesn't seem related to the issue raised in SPARK-35727.
If we define the arithmetic operations between DateType and DayTimeIntervalType, we should define such operators for YearMonthIntervalType right?

Copy link
Contributor

@cloud-fan cloud-fan Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with change but I think it's a separate thing and can be done in a new PR.

The impl detail can also be improved: we can convert the interval to the number of days (ExtractANSIIntervalDays) and then use DateAdd expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with change but I think it's a separate thing and can be done in a new PR.

The impl detail can also be improved: we can convert the interval to the number of days (ExtractANSIIntervalDays) and then use DateAdd expression.

@cloud-fan Could I propose a ticket for improved for DateType +/- DayTimeIntervalType(DAY) and try to resolve it ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please

@Peng-Lei Peng-Lei force-pushed the SPARK-35727 branch 3 times, most recently from 734b6e2 to 525eb9e Compare June 22, 2021 03:14
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if all checks pass (scalastyle check will fail I think)

@Peng-Lei
Copy link
Contributor Author

@cloud-fan Thank you very much. Your review comments are always inspiring.

@MaxGekk
Copy link
Member

MaxGekk commented Jun 22, 2021

@Peng-Lei Could you fix markdown in PR's description. Please, keep the original formatting for sections.

Comment on lines 2705 to 2709
if (legacyInterval) {
CalendarIntervalType
} else {
DayTimeIntervalType(DAY)
}
Copy link
Member

@MaxGekk MaxGekk Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: minimal changes would be:

    if (legacyInterval) CalendarIntervalType else DayTimeIntervalType(DAY)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk Thank you very much, I fix it. Thank you.

@MaxGekk
Copy link
Member

MaxGekk commented Jun 22, 2021

@Peng-Lei The failed tests:

[info] - typeCoercion/native/promoteStrings.sql *** FAILED *** (3 seconds, 735 milliseconds)
[info]   typeCoercion/native/promoteStrings.sql
[info]   Expected "... DATE)):interval day[ to second]>", but got "... DATE)):interval day[]>" Schema did not match for query #24
[info]   SELECT '1' - cast('2017-12-11 09:30:00' as date)        FROM t: -- !query
[info]   SELECT '1' - cast('2017-12-11 09:30:00' as date)        FROM t
[info]   -- !query schema
[info]   struct<(1 - CAST(2017-12-11 09:30:00 AS DATE)):interval day>
[info]   -- !query output

are related to the changes. Could you regenerate the golden files, please.

@Peng-Lei
Copy link
Contributor Author

Could you regenerate the golden files

@MaxGekk ok. I'm working on how to do this,

@cloud-fan
Copy link
Contributor

@Peng-Lei you can check the classdoc of SQLQueryTestSuite

@Peng-Lei Peng-Lei requested a review from cloud-fan June 22, 2021 13:26
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in bc61b62 Jun 22, 2021
@MaxGekk
Copy link
Member

MaxGekk commented Jun 22, 2021

@Peng-Lei Could you change formatting of PR's description (see #32999 (comment)), please. It has been already late for the commit but in any case.

@HyukjinKwon
Copy link
Member

Yeah, let's keep the PR description properly.

@Peng-Lei
Copy link
Contributor Author

@Peng-Lei Could you change formatting of PR's description (see #32999 (comment)), please. It has been already late for the commit but in any case.

@MaxGekk Sorry, I missed that comment #32999 (comment), I will pay more attention to PR description and deal with each review comments of PR in the future. Thank you.

@cloud-fan
Copy link
Contributor

@Peng-Lei although it's merged, you can still update the PR description, in case other people come to this PR in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants