
[SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals #32351

Closed
wants to merge 6 commits into apache:master from yaooqinn:SPARK-35091

Conversation

@yaooqinn (Member) commented Apr 26, 2021

What changes were proposed in this pull request?

In this PR, we add extract/date_part support for ANSI intervals.

extract is an ANSI SQL expression; date_part is not part of the ANSI standard but is provided as an equivalent of extract.

expression

<extract expression> ::=
  EXTRACT <left paren> <extract field> FROM <extract source> <right paren>

for interval source


<primary datetime field> ::=
    <non-second primary datetime field>
  | SECOND

<non-second primary datetime field> ::=
    YEAR
  | MONTH
  | DAY
  | HOUR
  | MINUTE

dataType

If <extract field> is a <primary datetime field> that does not specify SECOND or <extract field> is not a <primary datetime field>, then the declared type of the result is an implementation-defined exact numeric type with scale 0 (zero)

Otherwise, the declared type of the result is an implementation-defined exact numeric type with scale not less than the specified or implied <time fractional seconds precision> or <interval fractional seconds precision>, as appropriate, of the SECOND <primary datetime field> of the <extract source>.
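
To make the rule concrete, here is a small sketch (the helper name and the exact numeric types are illustrative assumptions, not code from this PR): every field other than SECOND yields an exact numeric with scale 0, while SECOND yields a decimal whose scale covers the interval's microsecond precision.

import org.apache.spark.sql.types._

// Hypothetical helper, for illustration only: the declared result type of
// EXTRACT(<field> FROM <ANSI interval>) following the rule quoted above.
def extractResultType(field: String): DataType = field.toUpperCase match {
  case "YEAR" | "MONTH" | "DAY" | "HOUR" | "MINUTE" => IntegerType   // exact numeric, scale 0
  case "SECOND" => DecimalType(8, 6)                                 // scale covers microseconds
  case other => throw new IllegalArgumentException(s"Unsupported extract field: $other")
}

This is consistent with the examples quoted later in the review: extracting days yields a whole number, while extracting seconds keeps the fractional microseconds.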

Why are the changes needed?

Subtask of ANSI Intervals Support

Does this PR introduce any user-facing change?

Yes

  1. extract/date_part now support ANSI intervals
  2. For non-ANSI intervals, the return type when extracting hours is changed from long to byte

How was this patch tested?

Newly added tests.

@github-actions bot added the SQL label on Apr 26, 2021
@yaooqinn (Member, Author) commented Apr 26, 2021

cc @MaxGekk @cloud-fan @maropu thanks very much

@MaxGekk (Member) left a comment:

Could you add examples with ANSI intervals at:

Examples:
> SELECT _FUNC_('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
2019
> SELECT _FUNC_('week', timestamp'2019-08-12 01:00:00.123456');
33
> SELECT _FUNC_('doy', DATE'2019-08-12');
224
> SELECT _FUNC_('SECONDS', timestamp'2019-10-01 00:00:01.000001');
1.000001
> SELECT _FUNC_('days', interval 1 year 10 months 5 days);
5
> SELECT _FUNC_('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
30.001001

@yaooqinn (Member, Author):

Thanks for the reminder, @MaxGekk.

@SparkQA commented Apr 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42484/

@SparkQA commented Apr 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42484/

@SparkQA commented Apr 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42485/

  override protected def withNewChildInternal(newChild: Expression): ExtractIntervalSeconds =
    copy(child = newChild)
}

case class YearsOfYMInterval(child: Expression)
Contributor:

The name is a bit confusing. How about ExtractANSIIntervalYears?

Member Author:

Makes sense, updated.

@@ -98,6 +130,19 @@ object ExtractIntervalPart {
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => ExtractIntervalSeconds(source)
case _ => errorHandleFunc
}

def parseExtractFieldANSI(
Contributor:

Can we merge this into parseExtractField?

case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" =>
  if (source.dataType == YearMonthIntervalType) {
    ExtractANSIIntervalYears(source)
  } else {
    ExtractIntervalYears(source)
  }

Member Author:

OK

Member Author:

We need another branch for the DayTimeIntervalType case.
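
A minimal sketch of the extra branch being referred to (a simplified, hypothetical helper rather than the PR's diff; type names are used as they existed at the time, when the ANSI interval types were still singleton objects): a field such as DAY may come from an ANSI day-time interval or from the legacy CalendarIntervalType, and anything else falls back to the error handler.

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.{CalendarIntervalType, DayTimeIntervalType}

// Hypothetical helper, illustrative only: the DAY field needs one branch per source type.
def parseDayField(source: Expression, errorHandleFunc: => Nothing): Expression =
  source.dataType match {
    case DayTimeIntervalType  => ExtractANSIIntervalDays(source)  // new ANSI interval branch
    case CalendarIntervalType => ExtractIntervalDays(source)      // existing legacy branch
    case _                    => errorHandleFunc                  // unsupported source type
  }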

@SparkQA commented Apr 26, 2021

Test build #137963 has finished for PR 32351 at commit e0ac000.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class ExtractIntervalPart[T](
  • case class YearsOfYMInterval(child: Expression)
  • case class MonthsOfYMInterval(child: Expression)
  • case class DaysOfDTInterval(child: Expression)
  • case class HoursOfDTInterval(child: Expression)
  • case class MinutesOfDTInterval(child: Expression)
  • case class SecondsOfDTInterval(child: Expression)

@SparkQA commented Apr 26, 2021

Test build #137964 has finished for PR 32351 at commit c4d957b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 27, 2021

Test build #137975 has finished for PR 32351 at commit db74496.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExtractANSIIntervalYears(child: Expression)
  • case class ExtractANSIIntervalMonths(child: Expression)
  • case class ExtractANSIIntervalDays(child: Expression)
  • case class ExtractANSIIntervalHours(child: Expression)
  • case class ExtractANSIIntervalMinutes(child: Expression)
  • case class ExtractANSIIntervalSeconds(child: Expression)

@SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42495/

@SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42495/

object ExtractIntervalPart {

  def parseExtractField(
      extractField: String,
      source: Expression,
      errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match {
    case "YEAR" if source.dataType == YearMonthIntervalType => ExtractANSIIntervalYears(source)
Contributor:

Why don't we support all the shortcuts "YEAR" | "Y" | "YEARS" | "YR" | "YRS"? Can we merge the case?

case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => if (source.dataType == YearMonthIntervalType) ... else ...

Member Author:

For ANSI compliance, I didn't add the abbreviations. For internal consistency, I am OK with adding them.

Contributor:

Let's add them. We will use the new interval types by default, and this is a breaking change.

Member Author:

updated

@SparkQA commented Apr 27, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42499/

case _ => errorHandleFunc
errorHandleFunc: => Nothing): Expression = {
(extractField.toUpperCase(Locale.ROOT), source.dataType) match {
case ("YEAR" | "Y" | "YEARS" | "YR" | "YRS", YearMonthIntervalType) =>
Member Author:

Match on both the field and the type, so that we don't need to do type checking in all the ExtractXXX implementations, and to reduce the code diff when CalendarIntervalType is removed later.
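
A rough sketch of the shape being described (hypothetical, trimmed to two fields with shortened abbreviation lists; not the exact diff): matching the field name together with the source type means unsupported combinations fall straight through to errorHandleFunc, the individual ExtractXXX expressions never re-check their input type, and removing CalendarIntervalType later only deletes whole case arms.

import java.util.Locale

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.{CalendarIntervalType, DayTimeIntervalType, YearMonthIntervalType}

// Illustrative sketch, not the PR's final code: the field and the source type are matched together.
def parseExtractFieldSketch(
    extractField: String,
    source: Expression,
    errorHandleFunc: => Nothing): Expression =
  (extractField.toUpperCase(Locale.ROOT), source.dataType) match {
    case ("YEAR" | "YEARS", YearMonthIntervalType) => ExtractANSIIntervalYears(source)
    case ("YEAR" | "YEARS", CalendarIntervalType)  => ExtractIntervalYears(source)
    case ("HOUR" | "HOURS", DayTimeIntervalType)   => ExtractANSIIntervalHours(source)
    case ("HOUR" | "HOURS", CalendarIntervalType)  => ExtractIntervalHours(source)
    // ... the remaining fields follow the same field-plus-type pattern ...
    case _ => errorHandleFunc
  }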

@SparkQA commented Apr 27, 2021

Test build #137979 has finished for PR 32351 at commit e4bc0f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42505/

@SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42505/

@SparkQA commented Apr 27, 2021

Test build #137985 has finished for PR 32351 at commit 687a384.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):
thanks, merging to master!

@cloud-fan closed this in 16d223e on Apr 27, 2021
@yaooqinn deleted the SPARK-35091 branch on April 27, 2021 at 13:08