[SPARK-52709][SQL] Fix parsing of STRUCT<> by ManosGEM · Pull Request #51480 · apache/spark

ManosGEM · 2025-07-14T19:10:09Z

What changes were proposed in this pull request?

This PR fixes an issue in Spark SQL's parser where empty or nested STRUCT<> types cause incorrect parenthesis tracking and parsing failures. Previously, the parser increased the parenthesis depth counter upon encountering the keyword STRUCT. Due to the operator precedence in the lexer (e.g., NEQ is matched before LT), a construct like STRUCT<> could incorrectly be tokenized as STRUCT and NEQ. This caused the parser to increase the nesting counter without ever decreasing it, eventually resulting in a syntax error.
For example, the following valid query fails under the current logic:
SELECT cast(null as STRUCT<>), 2 >> 1;
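
The tokenization problem described above can be sketched with a toy lexer. This is only an illustrative model, not Spark's ANTLR grammar: the rule names, the `tokenize` helper, and the `depth` counter are hypothetical. It assumes that the counter is incremented on `STRUCT`, decremented on a single `>`, and that `>>` is only lexed as a shift operator while the counter is zero.

```python
import re

# Toy rules, tried in order. NEQ is listed before LT, so "<>" is taken as one token.
# All names here are hypothetical; this is not Spark's SqlBaseLexer.g4.
TOKEN_RULES = [(name, re.compile(pattern, re.IGNORECASE)) for name, pattern in [
    ("STRUCT", r"STRUCT\b"),
    ("NEQ", r"<>"),
    ("SHIFT_RIGHT", r">>"),
    ("LT", r"<"),
    ("GT", r">"),
    ("WORD", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("PUNCT", r"[(),;]"),
]]


def tokenize(sql):
    """Return (rule_name, text) pairs for a small subset of SQL."""
    depth = 0  # assumed nesting counter for complex types
    pos = 0
    tokens = []
    while pos < len(sql):
        if sql[pos].isspace():
            pos += 1
            continue
        for name, regex in TOKEN_RULES:
            # Assumption: ">>" is not a shift operator while inside a complex type.
            if name == "SHIFT_RIGHT" and depth > 0:
                continue
            match = regex.match(sql, pos)
            if match:
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}: {sql[pos]!r}")
        if name == "STRUCT":
            depth += 1  # opened by the keyword alone
        elif name == "GT" and depth > 0:
            depth -= 1  # only a standalone ">" closes a level
        tokens.append((name, match.group()))
        pos = match.end()
    return tokens


for name, text in tokenize("SELECT cast(null as STRUCT<>), 2 >> 1;"):
    print(name, text)
```

Under these assumptions, `STRUCT<>` comes out as `STRUCT` followed by `NEQ`, so the counter is still positive when the lexer reaches `>>`, and the shift operator is split into two `GT` tokens that a parser would then reject.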

To fix this, we adjusted the definition of the NEQ token in the SQL lexer so that it no longer matches <> when used in a complex data type. This ensures that the parser correctly interprets the angle brackets as part of a type specification rather than as a comparison operator.

Why are the changes needed?

Bug fix.
By modifying the NEQ token rule to avoid incorrectly matching <> in this context, we ensure that:

Empty STRUCT types like STRUCT<> are parsed correctly.

Nested and complex STRUCT types are supported without breaking parsing logic.

Queries with complex data types and bitwise operations (e.g., >>) are no longer broken due to incorrect token handling.

This change improves Spark SQL’s compatibility with standard SQL syntax and user expectations.

Does this PR introduce any user-facing change?

No

How was this patch tested?

A new test case has been added to PlanParserSuite.scala to specifically verify that queries containing CAST(null AS STRUCT<>), nested STRUCT<> (if applicable), and the >> operator now parse successfully into the expected logical plan.

Was this patch authored or co-authored using generative AI tooling?

No

Closes SPARK-52709
Reported by: @mihailom-db

mihailomilosevic2001

@ManosGEM Thanks for working on this issue. I have left a few comments on the PR. Some of them are just meant to make the code more robust against future errors of this type. Please feel free to ping me for another round of review once you have gone through them.

sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala

mihailomilosevic2001 · 2025-07-14T19:42:48Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala

In general, when we add tests to existing files, we try to reuse as much of the boilerplate code that is already there as possible. Could you please rewrite this test so that it has a similar structure to the tests above?

By this, I mean reusing comparePlans. Also, I would say we can move this test to SparkSqlParserSuite, as the point of this test is to enable parsing of different queries.

I will try my best to rewrite it in the manner you propose.

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala

mihailomilosevic2001 · 2025-07-15T05:52:27Z

nit: We usually like to name the PR the same way the ticket is named on Jira. Could you please align the PR name with Jira?

ManosGEM · 2025-07-15T07:33:19Z

Should I remove the word "null" then? The only other difference is the [SQL] tag, which I added according to the guidelines.

mihailomilosevic2001 · 2025-07-15T10:00:17Z

Yeah, I meant the main name of the PR, the tags are all good.

This commit addresses a parsing issue where the `STRUCT<>` data type was incorrectly handled, especially when appearing before operators like the bitwise shift (`>>`). The problem stemmed from the parser misinterpreting the angle brackets of `STRUCT<>` as the 'not equal' operator (`<>`), leading to syntax errors. The fix ensures `STRUCT<>` is correctly recognized as a data type. A new test case in `PlanParserSuite.scala` confirms that queries with `CAST(null AS STRUCT<>)`, nested structs, and `>>` now parse correctly.

sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala

…ta types. This commit follows SPARK-52709 by:
- Removing from parsing rules.
- Relocating relevant parser tests from to .
- Refactoring the test setup for complex data types into a reusable helper function.
- Adding comprehensive tests for valid , nested , , and their combinations to .
- Adding negative tests for invalid empty and types to confirm correct behavior.

mihailomilosevic2001

LGTM. @MaxGekk @cloud-fan, could you please do a final review and merge?

cloud-fan · 2025-07-24T04:22:14Z

good catch! merging to master!

cloud-fan · 2025-07-24T04:24:19Z

@ManosGEM Can you open a backport PR against branch-4.0? Thanks!

ManosGEM · 2025-07-24T07:49:33Z

@cloud-fan So everything should be the same as in this PR, but opened against branch-4.0? Sorry, this is actually my first PR on Spark.

cloud-fan · 2025-07-25T09:39:27Z

It has merge conflicts. You will need to git cherry-pick this commit onto branch-4.0 locally first, resolve the merge conflicts, and then open a new PR against branch-4.0.

cloud-fan · 2025-08-18T11:37:50Z

This PR introduced a regression. complex_type_level_counter is increased when the lexer sees STRUCT/ARRAY/MAP. However, STRUCT/ARRAY/MAP are also function names, and ARRAY(col1 <> col2) should be allowed.

That being said, complex_type_level_counter itself already has bugs: ARRAY(col >> 1) should be allowed, but it is not today. I think we can't disambiguate this on the lexer side; we should handle it in the parser. For example, STRUCT<> can be an empty struct type, but it can also be part of struct <> another_col, where <> means "not equal to".

Let me revert this PR first.
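
The ambiguity behind this regression can be illustrated with a small sketch. It is a toy model under stated assumptions, not Spark's lexer: the `lex` helper and the `gate_neq` flag are hypothetical, and the model assumes that the counter is bumped by the keywords STRUCT, ARRAY, and MAP, that `>>` is always split while the counter is positive, and that the reverted change corresponds to also splitting `<>` in that state.

```python
import re

# Hypothetical toy model; not Spark's SqlBaseLexer.g4.
COMPLEX_TYPE_KEYWORDS = {"STRUCT", "ARRAY", "MAP"}
LEXEME = re.compile(r"<>|>>|[<>(),;]|\w+")


def lex(sql, gate_neq=True):
    """Split sql into lexemes, tracking an assumed complex-type nesting counter.

    gate_neq=True models a lexer that refuses to emit "<>" as one token
    while the counter is positive; gate_neq=False models the older behavior.
    """
    depth = 0
    tokens = []
    for lexeme in LEXEME.findall(sql):
        if lexeme.upper() in COMPLEX_TYPE_KEYWORDS:
            # The keyword alone bumps the counter, even when it is a function name.
            depth += 1
            tokens.append(lexeme)
        elif depth > 0 and (lexeme == ">>" or (lexeme == "<>" and gate_neq)):
            # Inside a presumed complex type: emit the two characters separately.
            for ch in lexeme:
                tokens.append(ch)
                if ch == ">" and depth > 0:
                    depth -= 1
        else:
            if lexeme == ">" and depth > 0:
                depth -= 1
            tokens.append(lexeme)
    return tokens


print(lex("CAST(NULL AS STRUCT<>)"))              # type usage: "<" and ">" stay separate
print(lex("ARRAY(col1 <> col2)"))                 # function call: "<>" is split as well
print(lex("ARRAY(col1 <> col2)", gate_neq=False)) # "<>" kept as one token
print(lex("ARRAY(col >> 1)"))                     # ">>" is split in both modes
```

In this model the lexer only sees the keyword, so it cannot tell the type `STRUCT<>` from a function call such as `ARRAY(col1 <> col2)`; whichever way `<>` is gated, one of the two inputs is tokenized wrongly, which is consistent with the suggestion to resolve the ambiguity in the parser instead.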

github-actions bot added the SQL label Jul 14, 2025

mihailomilosevic2001 reviewed Jul 14, 2025

ManosGEM changed the title ~~[SPARK-52709][SQL] Fix parsing of null STRUCT<>~~ [SPARK-52709][SQL] Fix parsing of STRUCT<> Jul 15, 2025

ManosGEM force-pushed the SPARK-52709-fix-empty-struct-parsing branch from 88f22d9 to 81f7573 on July 18, 2025 08:23

mihailomilosevic2001 reviewed Jul 18, 2025

ManosGEM force-pushed the SPARK-52709-fix-empty-struct-parsing branch from 81f7573 to 76c0544 on July 18, 2025 11:46

Merge branch 'apache:master' into SPARK-52709-fix-empty-struct-parsing

569e2b2

mihailomilosevic2001 approved these changes Jul 23, 2025

cloud-fan approved these changes Jul 24, 2025

cloud-fan closed this in 64cada1 Jul 24, 2025

