Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41092][SQL] Do not use identifier to match interval units #38583

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Nov 9, 2022

What changes were proposed in this pull request?

The current antlr-based SQL parser is pretty fragile due to the fact that we make the antlr parser rule pretty flexible and push more parsing logic to the Scala side (AstBuilder). A tiny change to the antlr parser rule may break the parser unexpectedly. As an example, in #38404 , we added a new parser rule to extend the INSERT syntax, and it breaks interval literal. select b + interval '1 month' from values (1, 1) can't be parsed after #38404

This PR makes the interval literal parser rule stricter. Now it lists all the allowed interval units instead of matching an identifier. This fixes the issue we hit in #38404 .

In the future, we can revisit other parser rules and try to rely on antlr more to do the parsing work.

Why are the changes needed?

fix parser issues we hit in #38404

Does this PR introduce any user-facing change?

The error message is changed a little bit for SELECT INTERVAL 1 wrong_unit.

How was this patch tested?

existing tests

@github-actions github-actions bot added the SQL label Nov 9, 2022
@github-actions github-actions bot added the DOCS label Nov 10, 2022
@cloud-fan cloud-fan changed the title [WIP] Do not use identifier to match interval units [SPARK-41092][SQL] Do not use identifier to match interval units Nov 10, 2022
@cloud-fan
Copy link
Contributor Author

cc @gengliangwang @MaxGekk @viirya

@@ -407,6 +407,7 @@ Below is a list of all the keywords in Spark SQL.
|DATEADD|non-reserved|non-reserved|non-reserved|
|DATEDIFF|non-reserved|non-reserved|non-reserved|
|DAY|non-reserved|non-reserved|non-reserved|
|DAYS|non-reserved|non-reserved|non-reserved|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember we don't support such plural units in other functions of interval parsing. How about to not mix this new feature (new units) with making the parser stricter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a new feature. InternalUtils.stringToInterval allows ending s in unit names.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good. But seems it doesn't add test for select b + interval '1 month' from values (1, 1)?

@cloud-fan
Copy link
Contributor Author

But seems it doesn't add test for select b + interval '1 month' from values (1, 1)?

This is an existing test and it fails in #38404 . After the changes here, it can pass.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay to me. I checked the parser code, but didn't see we disallow multi-units for some cases.

"messageParameters" : {
"msg" : "Error parsing ' 1 fake_unit' to interval, invalid unit 'fake_unit'"
},
"queryContext" : [ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not good that we lost the context here but this is another issue.

@MaxGekk
Copy link
Member

MaxGekk commented Nov 10, 2022

+1, LGTM. Merging to master.
Thank you, @cloud-fan and @viirya for review.

@MaxGekk MaxGekk closed this in 012d99d Nov 10, 2022
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

The current antlr-based SQL parser is pretty fragile due to the fact that we make the antlr parser rule pretty flexible and push more parsing logic to the Scala side (`AstBuilder`). A tiny change to the antlr parser rule may break the parser unexpectedly. As an example, in apache#38404 , we added a new parser rule to extend the INSERT syntax, and it breaks interval literal. `select b + interval '1 month' from values (1, 1)` can't be parsed after apache#38404

This PR makes the interval literal parser rule stricter. Now it lists all the allowed interval units instead of matching an identifier. This fixes the issue we hit in apache#38404 .

In the future, we can revisit other parser rules and try to rely on antlr more to do the parsing work.

### Why are the changes needed?

fix parser issues we hit in apache#38404

### Does this PR introduce _any_ user-facing change?

The error message is changed a little bit for `SELECT INTERVAL 1 wrong_unit`.

### How was this patch tested?

existing tests

Closes apache#38583 from cloud-fan/parser.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants