Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32840][SQL] Invalid interval value can happen to be just adhesive with the unit #29708

Closed
wants to merge 4 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Sep 10, 2020

What changes were proposed in this pull request?

In this PR, we add a checker for STRING form interval value ahead for parsing multiple units intervals and fail directly if the interval value contains alphabets to prevent correctness issues like interval '1 day 2' day=3 days.

Why are the changes needed?

fix correctness issue

Does this PR introduce any user-facing change?

yes, in spark 3.0.0 interval '1 day 2' day=3 days but now we fail with ParseException

How was this patch tested?

add a test.

@yaooqinn
Copy link
Member Author

cc @cloud-fan @maropu @HyukjinKwon and thanks very much for the review.

@@ -2120,6 +2120,8 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
}
}

private final val alphabet = "[a-zA-Z]".r.pattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of checking the invalid ones, how about we make sure the value only has digits and dots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK updated.

@@ -2132,7 +2134,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
val kvs = units.indices.map { i =>
val u = units(i).getText
val v = if (values(i).STRING() != null) {
string(values(i).STRING())
val value = string(values(i).STRING()).trim
if (!intervalValStrPattern.matcher(value).matches()) {
Copy link
Contributor

@cloud-fan cloud-fan Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about !value.forall { c => c == '.' || c == "+" || c == "-" || Character.isDigit(c)}? We don't have to use regex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok maybe the first version is better, as we basically just want to avoid the value string to contain unit. value.exists(Character.isLetter) should be good enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should add comments to explain why we need the check.

@cloud-fan
Copy link
Contributor

github action passed, I'm merging it to master, thanks!

@cloud-fan cloud-fan closed this in 5669b21 Sep 10, 2020
@cloud-fan
Copy link
Contributor

@yaooqinn can you open a new PR for 3.0?

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128501 has finished for PR 29708 at commit 7e98555.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128497 has finished for PR 29708 at commit 480fe6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128499 has finished for PR 29708 at commit aa9e9bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128500 has finished for PR 29708 at commit a9294a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu pushed a commit that referenced this pull request Sep 11, 2020
…adhesive with the unit

THIS PR backports #29708 to 3.0

### What changes were proposed in this pull request?
In this PR, we add a checker for STRING form interval value ahead for parsing multiple units intervals and fail directly if the interval value contains alphabets to prevent correctness issues like `interval '1 day 2' day`=`3 days`.

### Why are the changes needed?

fix correctness issue

### Does this PR introduce _any_ user-facing change?

yes, in spark 3.0.0 `interval '1 day 2' day`=`3 days` but now we fail with ParseException
### How was this patch tested?

add a test.

Closes #29716 from yaooqinn/SPARK-32840-30.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…adhesive with the unit

THIS PR backports apache#29708 to 3.0

### What changes were proposed in this pull request?
In this PR, we add a checker for STRING form interval value ahead for parsing multiple units intervals and fail directly if the interval value contains alphabets to prevent correctness issues like `interval '1 day 2' day`=`3 days`.

### Why are the changes needed?

fix correctness issue

### Does this PR introduce _any_ user-facing change?

yes, in spark 3.0.0 `interval '1 day 2' day`=`3 days` but now we fail with ParseException
### How was this patch tested?

add a test.

Closes apache#29716 from yaooqinn/SPARK-32840-30.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants