[SPARK-52709][SQL] Fix parsing of STRUCT<> #51480
[SPARK-52709][SQL] Fix parsing of STRUCT<> #51480ManosGEM wants to merge 3 commits intoapache:masterfrom
Conversation
mihailomilosevic2001
left a comment
There was a problem hiding this comment.
@ManosGEM Thanks for working on this issue. I have left a few comments on the PR. Some of them are just to make code more durable to future errors of this type. Please feel free to ping me for another round of review when you go through them.
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
In general, if we are adding tests to specific files, we try to reuse as much boiler plate code that is there. Could you please rewrite this test to have similar structure as the tests above?
There was a problem hiding this comment.
By this, I mean to reuse comparePlans. Also, I would say we can move this test to SparkSqlParserSuite as the point of this test is to enable parsing of different queries.
There was a problem hiding this comment.
I will try my best to rewrite it in the manner you propose.
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Outdated
Show resolved
Hide resolved
|
nit: We usually like to name the PR the same way the ticket is named on Jira. Could you please align the PR name with Jira? |
remove the word "null" then? Because the only other difference is the [SQL] tag that I added according to the guidelines. |
|
Yeah, I meant the main name of the PR, the tags are all good. |
This commit addresses a parsing issue where the `STRUCT<>` data type was incorrectly handled, especially when appearing before operators like the bitwise shift (`>>`). The problem stemmed from the parser misinterpreting the angle brackets of `STRUCT<>` as the 'not equal' operator (`<>`), leading to syntax errors. The fix ensures `STRUCT<>` is correctly recognized as a data type. A new test case in `PlanParserSuite.scala` confirms that queries with `CAST(null AS STRUCT<>)`, nested structs, and `>>` now parse correctly.
88f22d9 to
81f7573
Compare
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
Outdated
Show resolved
Hide resolved
…ta types. This commit follows SPARK-52709 by: - Removing from parsing rules. - Relocating relevant parser tests from to . - Refactoring the test setup for complex data types into a reusable helper function. - Adding comprehensive tests for valid , nested , , and their combinations to . - Adding negative tests for invalid empty and types to confirm correct behavior.
81f7573 to
76c0544
Compare
mihailomilosevic2001
left a comment
There was a problem hiding this comment.
LGTM, @MaxGekk @cloud-fan Could you please have a last review/merge.
|
good catch! merging to master! |
|
@ManosGEM Can you open a branch 4.0 backport PR? thanks! |
|
@cloud-fan Everything should be the same in the PR but open it on the branch-4.0 ? Sorry but this is actually my first PR on Spark. |
|
It has merge conflicts, you will need to git cherry-pick this commit against branch-4.0 locally first, resolve merge conflicts, and open a new PR against branch-4.0 |
|
This PR introduced a regression. That being said, Let me revert this PR first. |
What changes were proposed in this pull request?
This PR fixes an issue in Spark SQL's parser where empty or nested STRUCT<> types cause incorrect parenthesis tracking and parsing failures. Previously, the parser increased the parenthesis depth counter upon encountering the keyword STRUCT. Due to the operator precedence in the lexer (e.g., NEQ is matched before LT), a construct like STRUCT<> could incorrectly be tokenized as STRUCT and NEQ. This caused the parser to increase the nesting counter without ever decreasing it, eventually resulting in a syntax error.
For example, the following valid query fails under the current logic:
SELECT cast(null as STRUCT<>), 2 >> 1;To fix this, we adjusted the definition of the NEQ token in the SQL lexer so that it no longer matches <> when used in a complex data type. This ensures that the parser correctly interprets the angle brackets as part of a type specification rather than as a comparison operator.
Why are the changes needed?
bug fix.
By modifying the NEQ token rule to avoid incorrectly matching <> in this context, we ensure that:
This change improves Spark SQL’s compatibility with standard SQL syntax and user expectations.
Does this PR introduce any user-facing change?
No
How was this patch tested?
A new test case has been added to PlanParserSuite.scala to specifically verify that queries containing CAST(null AS STRUCT<>), nested STRUCT<> (if applicable), and the >> operator now parse successfully into the expected logical plan.
Was this patch authored or co-authored using generative AI tooling?
No
Closes SPARK-52709
Reported by : @mihailom-db