Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-36286][SQL] Block some invalid datetime string #33490

Conversation

linhongliu-db
Copy link
Contributor

What changes were proposed in this pull request?

In PR #32959, we found some weird datetime strings that can be parsed. (details)
This PR blocks the invalid datetime string.

Why are the changes needed?

bug fix

Does this PR introduce any user-facing change?

Yes, below strings will have different results when cast to datetime.

select cast('12::' as timestamp); -- Before: 2021-07-07 12:00:00, After: NULL
select cast('T' as timestamp); -- Before: 2021-07-07 00:00:00, After: NULL

How was this patch tested?

some new test cases

@github-actions github-actions bot added the SQL label Jul 23, 2021
@@ -289,6 +290,11 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
checkStringToTimestamp("", None)
checkStringToTimestamp(" ", None)
checkStringToTimestamp("+", None)
checkStringToTimestamp("T", None)
checkStringToTimestamp("2015-03-18T", None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do some special handling to allow this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this valid? doesn't quite look like it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen, thanks for reviewing. Personally I think it's invalid and this usage is blocked in my PR. But since previously it's allowed in the test case, so maybe someone will argue we should still allow it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it previously allowed? this test is new

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Jul 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46058/

@SparkQA
Copy link

SparkQA commented Jul 23, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46058/

@SparkQA
Copy link

SparkQA commented Jul 23, 2021

Test build #141540 has finished for PR 33490 at commit 4a18456.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending fixing tests, this seems reasonable

@@ -289,6 +290,11 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
checkStringToTimestamp("", None)
checkStringToTimestamp(" ", None)
checkStringToTimestamp("+", None)
checkStringToTimestamp("T", None)
checkStringToTimestamp("2015-03-18T", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this valid? doesn't quite look like it

@linhongliu-db linhongliu-db changed the title [SPARK-35780][SQL][FOLLOW-UP] Block some invalid datetime string [SPARK-36286][SQL] Block some invalid datetime string Jul 26, 2021
@linhongliu-db linhongliu-db force-pushed the SPARK-35780-block-invalid-format branch from 4a18456 to 389a610 Compare July 26, 2021 07:09
@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46133/

@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46133/

@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46146/

@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Test build #141616 has finished for PR 33490 at commit 389a610.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@linhongliu-db
Copy link
Contributor Author

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Test build #141630 has finished for PR 33490 at commit 48e7357.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

This is clearly a bug fix. These invalid strings are not documented as supported and it doesn't make sense to support them.

Thanks, merging to master/3.2!

@cloud-fan cloud-fan closed this in ed0e351 Jul 29, 2021
cloud-fan pushed a commit that referenced this pull request Jul 29, 2021
### What changes were proposed in this pull request?
In PR #32959, we found some weird datetime strings that can be parsed. ([details](#32959 (comment)))
This PR blocks the invalid datetime string.

### Why are the changes needed?
bug fix

### Does this PR introduce _any_ user-facing change?
Yes, below strings will have different results when cast to datetime.
```sql
select cast('12::' as timestamp); -- Before: 2021-07-07 12:00:00, After: NULL
select cast('T' as timestamp); -- Before: 2021-07-07 00:00:00, After: NULL
```

### How was this patch tested?
some new test cases

Closes #33490 from linhongliu-db/SPARK-35780-block-invalid-format.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ed0e351)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants