Skip to content

[SPARK-35737][SQL] Parse day-time interval literals to tightest types#32892

Closed
sarutak wants to merge 5 commits into
apache:masterfrom
sarutak:tight-daytime-interval
Closed

[SPARK-35737][SQL] Parse day-time interval literals to tightest types#32892
sarutak wants to merge 5 commits into
apache:masterfrom
sarutak:tight-daytime-interval

Conversation

@sarutak
Copy link
Copy Markdown
Member

@sarutak sarutak commented Jun 12, 2021

What changes were proposed in this pull request?

This PR add a feature which parse day-time interval literals to tightest type.

Why are the changes needed?

To comply with the ANSI behavior.
For example, INTERVAL '10 20:30' DAY TO MINUTE should be parsed as DayTimeIntervalType(DAY, MINUTE) but not as DayTimeIntervalType(DAY, SECOND).

Does this PR introduce any user-facing change?

No because DayTimeIntervalType will be introduced in 3.2.0.

How was this patch tested?

New tests.

@github-actions github-actions Bot added the SQL label Jun 12, 2021
@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44267/

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 12, 2021

Test build #139742 has finished for PR 32892 at commit c6906d8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 12, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44267/

@HyukjinKwon
Copy link
Copy Markdown
Member

cc @MaxGekk @yaooqinn FYI

@sarutak sarutak changed the title [SPARK-35737][SQL] Parse day-time interval literals to tightest types [WIP][SPARK-35737][SQL] Parse day-time interval literals to tightest types Jun 13, 2021
@sarutak sarutak force-pushed the tight-daytime-interval branch from b4fdd39 to 32cce06 Compare June 13, 2021 14:35
@sarutak sarutak changed the title [WIP][SPARK-35737][SQL] Parse day-time interval literals to tightest types [SPARK-35737][SQL] Parse day-time interval literals to tightest types Jun 13, 2021
@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44279/

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44279/

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44280/

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44280/

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Test build #139753 has finished for PR 32892 at commit 32cce06.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak sarutak force-pushed the tight-daytime-interval branch from a41fd2b to eef88fc Compare June 13, 2021 18:53
@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44282/

@MaxGekk
Copy link
Copy Markdown
Member

MaxGekk commented Jun 13, 2021

@sarutak We should support single field as well there:

IntervalUtils.fromDayTimeString(value, IntervalUnit.DAY, IntervalUnit.HOUR)
case ("day", "minute") =>
IntervalUtils.fromDayTimeString(value, IntervalUnit.DAY, IntervalUnit.MINUTE)
case ("day", "second") =>
IntervalUtils.fromDayTimeString(value, IntervalUnit.DAY, IntervalUnit.SECOND)
case ("hour", "minute") =>
IntervalUtils.fromDayTimeString(value, IntervalUnit.HOUR, IntervalUnit.MINUTE)
case ("hour", "second") =>
IntervalUtils.fromDayTimeString(value, IntervalUnit.HOUR, IntervalUnit.SECOND)
case ("minute", "second") =>
IntervalUtils.fromDayTimeString(value, IntervalUnit.MINUTE, IntervalUnit.SECOND)

Please, open a sub-task in JIRA to don't forget about it if it is hard to do in this PR.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44282/

Literal(calendarInterval.months, YearMonthIntervalType)
} else {
assert(calendarInterval.months == 0)
val strToFieldIndex = DayTimeIntervalType.dayTimeFields.map(i =>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen this already in other PR. Does it make sense to put it to a common place case object DayTimeIntervalType?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems a common logic. After this and #32893 are merged, I'll open a followup PR.

Copy link
Copy Markdown
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Waiting for GA.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 13, 2021

Test build #139754 has finished for PR 32892 at commit 6f0c88c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Copy link
Copy Markdown
Member Author

sarutak commented Jun 13, 2021

Please, open a sub-task in JIRA to don't forget about it if it is hard to do in this PR.

Yes, I've noticed that we should support single field but single field is currently parsed as CalendarIntervalType so we need change the behavior. So I separated that from this issue.
Sure, I'll open a sub task.

BTW, what do you think of multiple fields like 1 day 2 hours? They should be parsed as DayTimeIntervalType rather than CalendarType?

@MaxGekk
Copy link
Copy Markdown
Member

MaxGekk commented Jun 14, 2021

BTW, what do you think of multiple fields like 1 day 2 hours? They should be parsed as DayTimeIntervalType rather than CalendarType?

I think we should take into account the SQL config spark.sql.legacy.interval.enabled. If it is false (default), we should parse 1 day 2 hours as DayTimeIntervalType. There is a problem with mixed intervals like 1 month 2 hours. In that case, we should throw an exception, and ask users either to change the config or interval (1 month 2 hours -> 30 days 2 hours).

@MaxGekk
Copy link
Copy Markdown
Member

MaxGekk commented Jun 14, 2021

+1, LGTM. Merging to master.
Thank you, @sarutak .

MaxGekk pushed a commit that referenced this pull request Jun 14, 2021
…imeIntervalType

### What changes were proposed in this pull request?

This is a followup PR for SPARK-35736(#32893) and SPARK-35737(#32892).
This PR moves a common logic to `object DayTimeIntervalType`.
That logic is like `val strToFieldIndex = DayTimeIntervalType.dayTimeFields.map(i => DayTimeIntervalType.fieldToString(i) -> (i).toMap`, a `Map` which maps each time unit to the corresponding day-time field index.

### Why are the changes needed?

That logic appeared in the change in SPARK-35736 and SPARK-35737 so it can be a common logic and it's better to avoid the similar logic scattered.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #32905 from sarutak/followup-SPARK-35736-35737.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants