New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals #32351
Conversation
cc @MaxGekk @cloud-fan @maropu thanks very much |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add examples with ANSI intervals at
Lines 2404 to 2416 in e8d6992
Examples: | |
> SELECT _FUNC_('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456'); | |
2019 | |
> SELECT _FUNC_('week', timestamp'2019-08-12 01:00:00.123456'); | |
33 | |
> SELECT _FUNC_('doy', DATE'2019-08-12'); | |
224 | |
> SELECT _FUNC_('SECONDS', timestamp'2019-10-01 00:00:01.000001'); | |
1.000001 | |
> SELECT _FUNC_('days', interval 1 year 10 months 5 days); | |
5 | |
> SELECT _FUNC_('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds); | |
30.001001 |
thanks for reminding @MaxGekk |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test unable to build dist. exiting with code: 1 |
override protected def withNewChildInternal(newChild: Expression): ExtractIntervalSeconds = | ||
copy(child = newChild) | ||
} | ||
|
||
case class YearsOfYMInterval(child: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is a bit confusing. How about ExtractANSIIntervalYears
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense, updated
@@ -98,6 +130,19 @@ object ExtractIntervalPart { | |||
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => ExtractIntervalSeconds(source) | |||
case _ => errorHandleFunc | |||
} | |||
|
|||
def parseExtractFieldANSI( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge this into parseExtractField
?
case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" =>
if (source.dataType == YearMonthIntervalType) {
ExtractANSIIntervalYears(source)
} else {
ExtractIntervalYears(source)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need another branch for DayTimeIntervalType case
Test build #137963 has finished for PR 32351 at commit
|
Test build #137964 has finished for PR 32351 at commit
|
Test build #137975 has finished for PR 32351 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
object ExtractIntervalPart { | ||
|
||
def parseExtractField( | ||
extractField: String, | ||
source: Expression, | ||
errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { | ||
case "YEAR" if source.dataType == YearMonthIntervalType => ExtractANSIIntervalYears(source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we support all the shortcuts "YEAR" | "Y" | "YEARS" | "YR" | "YRS"
? Can we merge the case?
case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => if (source.dataType == YearMonthIntervalType) ... else ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ANSI compliance, I didn't add the abbreviations. For inner consistency, I am OK to add them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add them. We will use the new interval types by default and this is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Kubernetes integration test unable to build dist. exiting with code: 1 |
case _ => errorHandleFunc | ||
errorHandleFunc: => Nothing): Expression = { | ||
(extractField.toUpperCase(Locale.ROOT), source.dataType) match { | ||
case ("YEAR" | "Y" | "YEARS" | "YR" | "YRS", YearMonthIntervalType) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match the field and type both, so that we don't need to do type checking in all the ExtractXXX
implementations, and reduce the code diff when removing CalendarIntervalType
Test build #137979 has finished for PR 32351 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #137985 has finished for PR 32351 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
In this PR, we add extract/date_part support for ANSI Intervals
The
extract
is an ANSI expression anddate_part
is NON-ANSI but exists as an equivalence forextract
expression
for interval source
dataType
Why are the changes needed?
Subtask of ANSI Intervals Support
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
new added tests