[SPARK-32840][SQL] Invalid interval value can happen to be just adhesive with the unit #29708
cc @cloud-fan @maropu @HyukjinKwon and thanks very much for the review.
@@ -2120,6 +2120,8 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     }
   }

+  private final val alphabet = "[a-zA-Z]".r.pattern
instead of checking the invalid ones, how about we make sure the value only has digits and dots?
OK updated.
@@ -2132,7 +2134,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     val kvs = units.indices.map { i =>
       val u = units(i).getText
       val v = if (values(i).STRING() != null) {
-        string(values(i).STRING())
+        val value = string(values(i).STRING()).trim
+        if (!intervalValStrPattern.matcher(value).matches()) {
How about `!value.forall { c => c == '.' || c == '+' || c == '-' || Character.isDigit(c) }`? We don't have to use a regex.
OK, maybe the first version is better, as we basically just want to prevent the value string from containing a unit. `value.exists(Character.isLetter)` should be good enough.
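The two checks discussed in this thread can be compared side by side. The following is a minimal sketch, not the code that was merged: the object and method names (`IntervalValueChecks`, `onlyNumericChars`, `containsLetter`) are hypothetical and exist only for illustration.

```scala
object IntervalValueChecks {
  // Allow-list (hypothetical helper): every character must be a digit, a dot, or a sign.
  def onlyNumericChars(value: String): Boolean =
    value.forall(c => c == '.' || c == '+' || c == '-' || Character.isDigit(c))

  // Deny-list (hypothetical helper): true as soon as any letter appears,
  // for example a unit embedded in the value string.
  def containsLetter(value: String): Boolean =
    value.exists(c => Character.isLetter(c))

  def main(args: Array[String]): Unit = {
    val samples = Seq("1", "-1.5", "1 day 2", "1 2")
    samples.foreach { v =>
      println(s"'$v': onlyNumericChars=${onlyNumericChars(v)}, containsLetter=${containsLetter(v)}")
    }
  }
}
```

The two checks differ on inputs such as `"1 2"`: the allow-list rejects it because of the inner space, while the letter check lets it through. The letter check is the narrower one, aimed only at keeping units out of the value string.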
And we should add comments to explain why we need the check.
GitHub Actions passed, I'm merging it to master, thanks!
@yaooqinn can you open a new PR for 3.0?
…adhesive with the unit

This PR backports #29708 to 3.0.

### What changes were proposed in this pull request?
In this PR, we add a checker for STRING form interval value ahead for parsing multiple units intervals and fail directly if the interval value contains alphabets to prevent correctness issues like `interval '1 day 2' day`=`3 days`.

### Why are the changes needed?
fix correctness issue

### Does this PR introduce _any_ user-facing change?
yes, in spark 3.0.0 `interval '1 day 2' day`=`3 days` but now we fail with ParseException

### How was this patch tested?
add a test.

Closes #29716 from yaooqinn/SPARK-32840-30.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?

In this PR, we add a check on STRING-form interval values before parsing multi-unit intervals, and fail directly if the interval value contains letters, to prevent correctness issues like `interval '1 day 2' day` = `3 days`.

Why are the changes needed?

Fix a correctness issue.

Does this PR introduce any user-facing change?

Yes. In Spark 3.0.0, `interval '1 day 2' day` = `3 days`, but now we fail with ParseException.

How was this patch tested?

Add a test.
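To see why `interval '1 day 2' day` could come out as `3 days`, here is a toy model of the behavior described above. It assumes, as the `kvs` mapping in the diff suggests, that each quoted value is joined with its unit and the combined text is then parsed as a multi-unit interval string. Everything in it (`IntervalToyModel`, `sumDays`, `parseUnchecked`, `parseChecked`) is hypothetical, it is not Spark's parser, and it throws `IllegalArgumentException` where Spark raises a ParseException.

```scala
object IntervalToyModel {
  // Hypothetical stand-in for a multi-unit interval parser: reads
  // "<number> <unit>" pairs and sums the day counts. Only "day" is supported.
  def sumDays(text: String): Int =
    text.trim.split("\\s+").grouped(2).map {
      case Array(n, "day") => n.toInt
      case other =>
        throw new IllegalArgumentException(s"cannot parse: ${other.mkString(" ")}")
    }.sum

  // Modelled behavior before the fix: the value and unit are joined with a
  // space, so a unit hidden inside the value is parsed as a separate pair.
  def parseUnchecked(value: String, unit: String): Int =
    sumDays(s"$value $unit")

  // Modelled behavior after the fix: reject values that contain letters
  // before the value and unit are joined.
  def parseChecked(value: String, unit: String): Int = {
    val v = value.trim
    if (v.exists(c => Character.isLetter(c))) {
      throw new IllegalArgumentException(s"interval value must not contain a unit: '$v'")
    }
    sumDays(s"$v $unit")
  }
}
```

In this model, `parseUnchecked("1 day 2", "day")` parses the text `1 day 2 day` and silently returns 3, whereas `parseChecked("1 day 2", "day")` fails up front and `parseChecked("2", "day")` still returns 2.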