New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41092][SQL] Do not use identifier to match interval units #38583
Conversation
@@ -407,6 +407,7 @@ Below is a list of all the keywords in Spark SQL. | |||
|DATEADD|non-reserved|non-reserved|non-reserved| | |||
|DATEDIFF|non-reserved|non-reserved|non-reserved| | |||
|DAY|non-reserved|non-reserved|non-reserved| | |||
|DAYS|non-reserved|non-reserved|non-reserved| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember we don't support such plural units in other functions of interval parsing. How about to not mix this new feature (new units) with making the parser stricter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a new feature. InternalUtils.stringToInterval
allows ending s
in unit names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good. But seems it doesn't add test for select b + interval '1 month' from values (1, 1)
?
This is an existing test and it fails in #38404 . After the changes here, it can pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay to me. I checked the parser code, but didn't see we disallow multi-units for some cases.
"messageParameters" : { | ||
"msg" : "Error parsing ' 1 fake_unit' to interval, invalid unit 'fake_unit'" | ||
}, | ||
"queryContext" : [ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not good that we lost the context here but this is another issue.
+1, LGTM. Merging to master. |
### What changes were proposed in this pull request? The current antlr-based SQL parser is pretty fragile due to the fact that we make the antlr parser rule pretty flexible and push more parsing logic to the Scala side (`AstBuilder`). A tiny change to the antlr parser rule may break the parser unexpectedly. As an example, in apache#38404 , we added a new parser rule to extend the INSERT syntax, and it breaks interval literal. `select b + interval '1 month' from values (1, 1)` can't be parsed after apache#38404 This PR makes the interval literal parser rule stricter. Now it lists all the allowed interval units instead of matching an identifier. This fixes the issue we hit in apache#38404 . In the future, we can revisit other parser rules and try to rely on antlr more to do the parsing work. ### Why are the changes needed? fix parser issues we hit in apache#38404 ### Does this PR introduce _any_ user-facing change? The error message is changed a little bit for `SELECT INTERVAL 1 wrong_unit`. ### How was this patch tested? existing tests Closes apache#38583 from cloud-fan/parser. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
The current antlr-based SQL parser is pretty fragile due to the fact that we make the antlr parser rule pretty flexible and push more parsing logic to the Scala side (
AstBuilder
). A tiny change to the antlr parser rule may break the parser unexpectedly. As an example, in #38404 , we added a new parser rule to extend the INSERT syntax, and it breaks interval literal.select b + interval '1 month' from values (1, 1)
can't be parsed after #38404This PR makes the interval literal parser rule stricter. Now it lists all the allowed interval units instead of matching an identifier. This fixes the issue we hit in #38404 .
In the future, we can revisit other parser rules and try to rely on antlr more to do the parsing work.
Why are the changes needed?
fix parser issues we hit in #38404
Does this PR introduce any user-facing change?
The error message is changed a little bit for
SELECT INTERVAL 1 wrong_unit
.How was this patch tested?
existing tests