
[SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals #32351

Closed
wants to merge 6 commits into apache:master from yaooqinn:SPARK-35091

Conversation

@yaooqinn (Member) commented Apr 26, 2021

What changes were proposed in this pull request?

In this PR, we add extract/date_part support for ANSI intervals.

extract is an ANSI SQL expression; date_part is not part of the ANSI standard but is provided as an equivalent of extract.

expression

<extract expression> ::=
  EXTRACT <left paren> <extract field> FROM <extract source> <right paren>

for interval source


<primary datetime field> ::=
    <non-second primary datetime field>
  | SECOND

<non-second primary datetime field> ::=
    YEAR
  | MONTH
  | DAY
  | HOUR
  | MINUTE

dataType

If <extract field> is a <primary datetime field> that does not specify SECOND or <extract field> is not a <primary datetime field>, then the declared type of the result is an implementation-defined exact numeric type with scale 0 (zero)

Otherwise, the declared type of the result is an implementation-defined exact numeric type with scale not less than the specified or implied <time fractional seconds precision> or <interval fractional seconds precision>, as appropriate, of the SECOND <primary datetime field> of the <extract source>.
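
To make the rule concrete, here is a small sketch (the helper name and the exact numeric types are illustrative assumptions, not code from this PR): every field other than SECOND yields an exact numeric with scale 0, while SECOND yields a decimal whose scale covers the interval's microsecond precision.

import org.apache.spark.sql.types._

// Hypothetical helper, for illustration only: the declared result type of
// EXTRACT(<field> FROM <ANSI interval>) following the rule quoted above.
def extractResultType(field: String): DataType = field.toUpperCase match {
  case "YEAR" | "MONTH" | "DAY" | "HOUR" | "MINUTE" => IntegerType   // exact numeric, scale 0
  case "SECOND" => DecimalType(8, 6)                                 // scale covers microseconds
  case other => throw new IllegalArgumentException(s"Unsupported extract field: $other")
}

This is consistent with the examples quoted later in the review: extracting days yields a whole number, while extracting seconds keeps the fractional microseconds.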

Why are the changes needed?

Subtask of ANSI Intervals Support

Does this PR introduce any user-facing change?

Yes

  1. extract/date_part now support ANSI intervals
  2. For non-ANSI intervals, the return type when extracting hours is changed from long to byte

How was this patch tested?

Newly added tests.

@github-actions bot added the SQL label on Apr 26, 2021
@yaooqinn (Member, Author) commented Apr 26, 2021

cc @MaxGekk @cloud-fan @maropu thanks very much

@MaxGekk (Member) left a comment:

Could you add examples with ANSI intervals at:

Examples:
> SELECT _FUNC_('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
2019
> SELECT _FUNC_('week', timestamp'2019-08-12 01:00:00.123456');
33
> SELECT _FUNC_('doy', DATE'2019-08-12');
224
> SELECT _FUNC_('SECONDS', timestamp'2019-10-01 00:00:01.000001');
1.000001
> SELECT _FUNC_('days', interval 1 year 10 months 5 days);
5
> SELECT _FUNC_('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
30.001001

@yaooqinn (Member, Author):

Thanks for the reminder, @MaxGekk.

@SparkQA commented Apr 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42484/

@SparkQA commented Apr 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42484/

@SparkQA commented Apr 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42485/

  override protected def withNewChildInternal(newChild: Expression): ExtractIntervalSeconds =
    copy(child = newChild)
}

case class YearsOfYMInterval(child: Expression)
Contributor:

The name is a bit confusing. How about ExtractANSIIntervalYears?

Member Author:

Makes sense, updated.

@@ -98,6 +130,19 @@ object ExtractIntervalPart {
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => ExtractIntervalSeconds(source)
case _ => errorHandleFunc
}

def parseExtractFieldANSI(
Contributor:

Can we merge this into parseExtractField?

case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" =>
  if (source.dataType == YearMonthIntervalType) {
    ExtractANSIIntervalYears(source)
  } else {
    ExtractIntervalYears(source)
  }

Member Author:

OK

Member Author:

We need another branch for the DayTimeIntervalType case.
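
A minimal sketch of the extra branch being referred to (a simplified, hypothetical helper rather than the PR's diff; type names are used as they existed at the time, when the ANSI interval types were still singleton objects): a field such as DAY may come from an ANSI day-time interval or from the legacy CalendarIntervalType, and anything else falls back to the error handler.

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.{CalendarIntervalType, DayTimeIntervalType}

// Hypothetical helper, illustrative only: the DAY field needs one branch per source type.
def parseDayField(source: Expression, errorHandleFunc: => Nothing): Expression =
  source.dataType match {
    case DayTimeIntervalType  => ExtractANSIIntervalDays(source)  // new ANSI interval branch
    case CalendarIntervalType => ExtractIntervalDays(source)      // existing legacy branch
    case _                    => errorHandleFunc                  // unsupported source type
  }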

@SparkQA commented Apr 26, 2021

Test build #137963 has finished for PR 32351 at commit e0ac000.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class ExtractIntervalPart[T](
  • case class YearsOfYMInterval(child: Expression)
  • case class MonthsOfYMInterval(child: Expression)
  • case class DaysOfDTInterval(child: Expression)
  • case class HoursOfDTInterval(child: Expression)
  • case class MinutesOfDTInterval(child: Expression)
  • case class SecondsOfDTInterval(child: Expression)

@SparkQA commented Apr 26, 2021

Test build #137964 has finished for PR 32351 at commit c4d957b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 27, 2021

Test build #137975 has finished for PR 32351 at commit db74496.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExtractANSIIntervalYears(child: Expression)
  • case class ExtractANSIIntervalMonths(child: Expression)
  • case class ExtractANSIIntervalDays(child: Expression)
  • case class ExtractANSIIntervalHours(child: Expression)
  • case class ExtractANSIIntervalMinutes(child: Expression)
  • case class ExtractANSIIntervalSeconds(child: Expression)

@SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42495/

@SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42495/

object ExtractIntervalPart {

  def parseExtractField(
      extractField: String,
      source: Expression,
      errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match {
    case "YEAR" if source.dataType == YearMonthIntervalType => ExtractANSIIntervalYears(source)
Contributor:

Why don't we support all the shortcuts "YEAR" | "Y" | "YEARS" | "YR" | "YRS"? Can we merge the case?

case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => if (source.dataType == YearMonthIntervalType) ... else ...

Member Author:

For ANSI compliance, I didn't add the abbreviations. For internal consistency, I am OK with adding them.

Contributor:

Let's add them. We will use the new interval types by default, and this is a breaking change.

Member Author:

updated

@SparkQA commented Apr 27, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42499/

case _ => errorHandleFunc
errorHandleFunc: => Nothing): Expression = {
(extractField.toUpperCase(Locale.ROOT), source.dataType) match {
case ("YEAR" | "Y" | "YEARS" | "YR" | "YRS", YearMonthIntervalType) =>
Member Author:

Match on both the field and the type, so that we don't need to do type checking in all the ExtractXXX implementations, and to reduce the code diff when CalendarIntervalType is removed later.
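
A rough sketch of the shape being described (hypothetical, trimmed to two fields with shortened abbreviation lists; not the exact diff): matching the field name together with the source type means unsupported combinations fall straight through to errorHandleFunc, the individual ExtractXXX expressions never re-check their input type, and removing CalendarIntervalType later only deletes whole case arms.

import java.util.Locale

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.{CalendarIntervalType, DayTimeIntervalType, YearMonthIntervalType}

// Illustrative sketch, not the PR's final code: the field and the source type are matched together.
def parseExtractFieldSketch(
    extractField: String,
    source: Expression,
    errorHandleFunc: => Nothing): Expression =
  (extractField.toUpperCase(Locale.ROOT), source.dataType) match {
    case ("YEAR" | "YEARS", YearMonthIntervalType) => ExtractANSIIntervalYears(source)
    case ("YEAR" | "YEARS", CalendarIntervalType)  => ExtractIntervalYears(source)
    case ("HOUR" | "HOURS", DayTimeIntervalType)   => ExtractANSIIntervalHours(source)
    case ("HOUR" | "HOURS", CalendarIntervalType)  => ExtractIntervalHours(source)
    // ... the remaining fields follow the same field-plus-type pattern ...
    case _ => errorHandleFunc
  }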

@SparkQA commented Apr 27, 2021

Test build #137979 has finished for PR 32351 at commit e4bc0f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42505/

@SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42505/

@SparkQA commented Apr 27, 2021

Test build #137985 has finished for PR 32351 at commit 687a384.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):
thanks, merging to master!

@cloud-fan closed this in 16d223e on Apr 27, 2021
@yaooqinn deleted the SPARK-35091 branch on April 27, 2021 at 13:08