Skip to content

[SPARK-36445][SQL] ANSI type coercion rule for date time operations#33666

Closed
gengliangwang wants to merge 8 commits intoapache:masterfrom
gengliangwang:datetimeOp
Closed

[SPARK-36445][SQL] ANSI type coercion rule for date time operations#33666
gengliangwang wants to merge 8 commits intoapache:masterfrom
gengliangwang:datetimeOp

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

Implement a new rule for the date-time operations in the ANSI type coercion system:

  1. Date will be converted to Timestamp when it is in the subtraction with Timestmap.
  2. Promote string literals in date_add/date_sub/time_add

Why are the changes needed?

Currently the type coercion rule DateTimeOperations doesn't match the design of the ANSI type coercion system:

  1. For date_add/date_sub, if the input is timestamp type, Spark should not convert it into date type since date type is narrower than the timestamp type.
  2. For date_add/date_sub/time_add, string value can be implicit cast to date/timestamp only when it is literal.

Thus, we need to have a new rule for the date-time operations in the ANSI type coercion system.

Does this PR introduce any user-facing change?

No, the ANSI type coercion rules are not releaesd.

How was this patch tested?

New UT

@gengliangwang gengliangwang requested a review from cloud-fan August 6, 2021 09:15
@github-actions github-actions bot added the SQL label Aug 6, 2021
@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Test build #142155 has finished for PR 33666 at commit 21391c4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

select date_sub('2011-11-11', str) from v;

-- non-literal string column add/sub with integer
create or replace temp view v2 as select '2011-11-11' str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's remove or replace. It's always error-prone if we replace a temp view in the test, as we may change the rest of the test cases.

select '2011-11-11 11:11:11' - interval '2' second;
select '1' - interval '2' second;
select 1 - interval '2' second;
create or replace temp view v as select '2011-11-11' str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

select '2011-11-11 11:11:11' - interval '2' second;
select '1' - interval '2' second;
select 1 - interval '2' second;
select '2011-11-11 11:11:11' - date'2011-11-11';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about string - date/timestamp?

I have added a new test case. But it seems that string - date will be cast(string as date) - date. Should we move it to date.sql or just leave it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's put it in date.sql, as we don't need to test this query 4 times with different timestamp settings.

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46674/

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Test build #142160 has finished for PR 33666 at commit 653d0c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46674/

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46681/

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46668/

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46681/

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Test build #142168 has finished for PR 33666 at commit 17470db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve '('2011-11-11 11:11:11' - TIMESTAMP '2011-11-11 11:11:10')' due to data type mismatch: argument 1 requires (timestamp or timestamp without time zone) type, however, ''2011-11-11 11:11:11'' is of string type.; line 1 pos 7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this fail under ansi mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a rule for this one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we fix it under ansi mode? this is inconsistent: string_literal - date works but string_literal - timestamp does not.

struct<>
-- !query output
org.apache.spark.sql.catalyst.analysis.TempTableAlreadyExistsException
Temporary view 'v' already exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we need to change the temp view name

@SparkQA
Copy link

SparkQA commented Aug 9, 2021

Test build #142230 has started for PR 33666 at commit d8e468f.

@SparkQA
Copy link

SparkQA commented Aug 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46741/

@SparkQA
Copy link

SparkQA commented Aug 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46741/

@SparkQA
Copy link

SparkQA commented Aug 10, 2021

Test build #142261 has finished for PR 33666 at commit eb0b080.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

jenkins, retest this please

select null - date '2019-10-06';
select date '2001-10-01' - date '2001-09-28';
select '2011-11-11 11:11:11' - date'2011-11-11';

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can put select str - date'2011-11-11' from v2; here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done



-- !query
select date_add(timestamp_ntz'2011-11-11 12:12:12', 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if timestamp_ntz becomes the default in the future, it's better to support this query as well to not break anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LTZ one will fail in ANSI mode. We tend to make the NTZ behavior consistent in default/ANSI mode, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a different topic. Breaking change means it's very hard to enable NTZ by default in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we tend to make LTZ/NTZ behavior consistent within default mode, or within ANSI mode, but not between default and ANSI mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a follow-up for NTZ behavior.

@SparkQA
Copy link

SparkQA commented Aug 10, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46768/

@SparkQA
Copy link

SparkQA commented Aug 10, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46770/

@SparkQA
Copy link

SparkQA commented Aug 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46778/

@SparkQA
Copy link

SparkQA commented Aug 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46778/

@SparkQA
Copy link

SparkQA commented Aug 10, 2021

Test build #142262 has finished for PR 33666 at commit eb0b080.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 10, 2021

Test build #142270 has finished for PR 33666 at commit 0e7ce1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

Merging to master/3.2

gengliangwang added a commit that referenced this pull request Aug 11, 2021
### What changes were proposed in this pull request?

Implement a new rule for the date-time operations in the ANSI type coercion system:
1. Date will be converted to Timestamp when it is in the subtraction with Timestmap.
2. Promote string literals in date_add/date_sub/time_add

### Why are the changes needed?

Currently the type coercion rule `DateTimeOperations` doesn't match the design of the ANSI type coercion system:
1. For date_add/date_sub, if the input is timestamp type, Spark should not convert it into date type since date type is narrower than the timestamp type.
2. For date_add/date_sub/time_add, string value can be implicit cast to date/timestamp only when it is literal.

Thus, we need to have a new rule for the date-time operations in the ANSI type coercion system.

### Does this PR introduce _any_ user-facing change?

No, the ANSI type coercion rules are not releaesd.

### How was this patch tested?

New UT

Closes #33666 from gengliangwang/datetimeOp.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 3029e62)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants