[SPARK-36445][SQL] ANSI type coercion rule for date time operations #33666
gengliangwang wants to merge 8 commits into apache:master
Test build #142155 has finished for PR 33666 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala
select date_sub('2011-11-11', str) from v;
-- non-literal string column add/sub with integer
create or replace temp view v2 as select '2011-11-11' str;
nit: let's remove `or replace`. Replacing a temp view within a test is error-prone, as it may change the results of the rest of the test cases.
select '2011-11-11 11:11:11' - interval '2' second;
select '1' - interval '2' second;
select 1 - interval '2' second;
create or replace temp view v as select '2011-11-11' str;
select '2011-11-11 11:11:11' - date'2011-11-11';
how about string - date/timestamp?
I have added a new test case. However, it seems that string - date is resolved as cast(string as date) - date. Should we move it to date.sql or just leave it here?
let's put it in date.sql, as we don't need to test this query 4 times with different timestamp settings.
Kubernetes integration test starting
Test build #142160 has finished for PR 33666 at commit
Kubernetes integration test status failure
Kubernetes integration test starting
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test status failure
Test build #142168 has finished for PR 33666 at commit
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve '('2011-11-11 11:11:11' - TIMESTAMP '2011-11-11 11:11:10')' due to data type mismatch: argument 1 requires (timestamp or timestamp without time zone) type, however, ''2011-11-11 11:11:11'' is of string type.; line 1 pos 7
why does this fail under ansi mode?
We don't have a rule for this one
shall we fix it under ansi mode? this is inconsistent: string_literal - date works but string_literal - timestamp does not.
struct<>
-- !query output
org.apache.spark.sql.catalyst.analysis.TempTableAlreadyExistsException
Temporary view 'v' already exists
seems we need to change the temp view name
Test build #142230 has started for PR 33666 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142261 has finished for PR 33666 at commit
jenkins, retest this please
select null - date '2019-10-06';
select date '2001-10-01' - date '2001-09-28';
select '2011-11-11 11:11:11' - date'2011-11-11';
nit: we can put select str - date'2011-11-11' from v2; here.
-- !query
select date_add(timestamp_ntz'2011-11-11 12:12:12', 1)
if timestamp_ntz becomes the default in the future, it's better to support this query as well to not break anything.
The LTZ one will fail in ANSI mode. We tend to make the NTZ behavior consistent in default/ANSI mode, right?
It's a different topic. A breaking change would make it very hard to enable NTZ by default in the future.
I think we tend to make LTZ/NTZ behavior consistent within default mode, or within ANSI mode, but not between default and ANSI mode.
I will have a follow-up for NTZ behavior.
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142262 has finished for PR 33666 at commit
Test build #142270 has finished for PR 33666 at commit
Merging to master/3.2
### What changes were proposed in this pull request?
Implement a new rule for the date-time operations in the ANSI type coercion system:
1. Date will be converted to Timestamp when it appears in a subtraction with Timestamp.
2. Promote string literals in date_add/date_sub/time_add.

### Why are the changes needed?
Currently the type coercion rule `DateTimeOperations` doesn't match the design of the ANSI type coercion system:
1. For date_add/date_sub, if the input is timestamp type, Spark should not convert it into date type, since date type is narrower than timestamp type.
2. For date_add/date_sub/time_add, a string value can be implicitly cast to date/timestamp only when it is a literal.

Thus, we need a new rule for the date-time operations in the ANSI type coercion system.

### Does this PR introduce _any_ user-facing change?
No, the ANSI type coercion rules are not released.

### How was this patch tested?
New UT

Closes #33666 from gengliangwang/datetimeOp.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 3029e62)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
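To make the two rules concrete, here is a minimal sketch over a toy expression tree. It is not the code in `AnsiTypeCoercion.scala`. Every type and name below is hypothetical and is not Spark's Catalyst API: `AnsiDateTimeCoercionSketch`, `Expr`, `Literal`, `Column`, `Cast`, `Subtract`, `DateAdd`, `TimeAdd`, and `coerce`. The sketch assumes the date-to-timestamp widening applies in either operand order. date_sub is omitted because it would follow the same pattern as `DateAdd`.

```scala
// Toy model of the ANSI date-time coercion rules described in the PR.
// All classes here are hypothetical stand-ins, NOT Spark's Catalyst expressions.
object AnsiDateTimeCoercionSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  case object IntervalType extends DataType
  case object DateType extends DataType
  case object TimestampType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Literal(value: String, dataType: DataType) extends Expr
  final case class Column(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr
  // Simplification: every subtraction in this toy model yields an interval.
  final case class Subtract(left: Expr, right: Expr) extends Expr {
    def dataType: DataType = IntervalType
  }
  final case class DateAdd(start: Expr, days: Expr) extends Expr {
    def dataType: DataType = DateType
  }
  final case class TimeAdd(start: Expr, interval: Expr) extends Expr {
    def dataType: DataType = TimestampType
  }

  def coerce(e: Expr): Expr = e match {
    // Rule 1: in a date/timestamp subtraction, widen the date side to timestamp.
    case Subtract(l, r) if l.dataType == DateType && r.dataType == TimestampType =>
      Subtract(Cast(l, TimestampType), r)
    case Subtract(l, r) if l.dataType == TimestampType && r.dataType == DateType =>
      Subtract(l, Cast(r, TimestampType))
    // Rule 2: only a string *literal* is promoted to date or timestamp.
    case DateAdd(s @ Literal(_, StringType), days) =>
      DateAdd(Cast(s, DateType), days)
    case TimeAdd(s @ Literal(_, StringType), interval) =>
      TimeAdd(Cast(s, TimestampType), interval)
    // Anything else is left unchanged. This covers a non-literal string column
    // and a timestamp passed to date_add, which is NOT narrowed to date.
    // A later type check can then reject it.
    case other => other
  }
}
```

Matching on the `Literal` node, rather than on `StringType` alone, is how this sketch encodes the restriction "only when it is a literal". A string column such as `str` in the temp view `v2` falls through to the default case and is left for the type checker.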