[SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time by cloud-fan · Pull Request #54888 · apache/spark

cloud-fan · 2026-03-18T18:51:15Z

What changes were proposed in this pull request?

This PR optimizes withIdentClause in AstBuilder to eagerly resolve IDENTIFIER() calls with string literal arguments at parse time instead of deferring to the analysis phase via PlanWithUnresolvedIdentifier.

Why are the changes needed?

When IDENTIFIER() is called with a string literal (e.g., IDENTIFIER('db.table')), the value is already known at parse time. The current implementation always defers resolution to analysis by wrapping it in PlanWithUnresolvedIdentifier, which is unnecessary overhead for this case.

The optimized withIdentClause resolves string literals eagerly and falls back to PlanWithUnresolvedIdentifier (deferred resolution) for non-literal expressions.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

Yes, Claude Opus 4.6.

… parse time When IDENTIFIER() is called with a string literal argument (e.g., IDENTIFIER('db.table')), the identifier can be resolved immediately at parse time rather than deferring to analysis via PlanWithUnresolvedIdentifier. This optimizes withIdentClause to check if the expression is a string literal and resolve it eagerly, falling back to deferred resolution for non-literal expressions.

…withIdentClause The IdentifierReferenceContext only covers the identifier part of the SQL statement, not the full statement. Wrapping the builder call with withOrigin(ctx) caused ParseException to capture the narrow identifier position instead of the full statement origin set by the calling visit methods. Co-authored-by: Isaac

dongjoon-hyun

Do you think we can have a test coverage for this, @cloud-fan ?

BTW, FYI, when I ask Cluade Caude to write a PR description based on the pull request template (here, I used the file reference feature of Claude Claude), Claude Code works well by respecting the ASF and Apache Spark Pull Request template like the the following. In these days, it's rare to write the PR description from the scratches.

spark/.github/PULL_REQUEST_TEMPLATE

Lines 55 to 56 in 11a99d0

    
           If generative AI tooling has been used in the process of authoring this patch, please include the 
        
           phrase: 'Generated-by: ' followed by the name of the tool and its version.

https://www.apache.org/legal/generative-tooling.html

### Was this patch authored or co-authored using generative AI tooling?
- Yes, Claude Opus 4.6.
+ Generated-by: Claude Code (Claude Opus 4.6)

dongjoon-hyun · 2026-03-19T16:16:54Z

cc @peter-toth for this SQL parser optimization.

dongjoon-hyun · 2026-03-19T23:07:55Z

Merged to master for Apache Spark 4.2.0 because the test coverage or PR description is nit.

cloud-fan · 2026-03-20T10:53:28Z

it's a perf improvement so I didn't add a new test. Actually the PR description was generated by claude code, hopefully llm can follow the PR template more strictly after we improving AGENTS.md (I didn't explicitly ask llm to follow PR template as I thought it's a common rule).

…at parse time ### What changes were proposed in this pull request? This PR optimizes `withIdentClause` in `AstBuilder` to eagerly resolve `IDENTIFIER()` calls with string literal arguments at parse time instead of deferring to the analysis phase via `PlanWithUnresolvedIdentifier`. ### Why are the changes needed? When `IDENTIFIER()` is called with a string literal (e.g., `IDENTIFIER('db.table')`), the value is already known at parse time. The current implementation always defers resolution to analysis by wrapping it in `PlanWithUnresolvedIdentifier`, which is unnecessary overhead for this case. The optimized `withIdentClause` resolves string literals eagerly and falls back to `PlanWithUnresolvedIdentifier` (deferred resolution) for non-literal expressions. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? Yes, Claude Opus 4.6. Closes apache#54888 from cloud-fan/SPARK-XXX-resolve-identifier-at-parse-time. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…hOrigin pitfall Followup to apache#54888. Adding a comment to withIdentClause explaining why withOrigin(ctx) should not be added to this method, and where to add it instead.

…hOrigin pitfall ### What changes were proposed in this pull request? Followup to #54888. This PR adds a documentation comment to `withIdentClause` in `AstBuilder` explaining why `withOrigin(ctx)` should not be added to the method body, and where to place it instead (inside specific call site builder lambdas). ### Why are the changes needed? 1. `withIdentClause` is called from many places with builders that create different types of plans — some create full command plans (e.g., `ReplaceTable`, `CreateFunction`), not just identifier-related plans. 2. Adding `withOrigin(ctx)` to the method body would set `CurrentOrigin` to the identifier reference for all code in the builder, including error-throwing validation code. Since `ParseException.getQueryContext()` reads from `CurrentOrigin`, this would cause error contexts to incorrectly point to the identifier instead of the full statement. 3. This is a non-obvious pitfall that is worth documenting to prevent future contributors from introducing this bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Comment-only change, no tests needed. ### Was this patch authored or co-authored using generative AI tooling? Yes Closes #55001 from cloud-fan/SPARK-56050-followup. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

cloud-fan changed the title ~~[SPARK-XXX][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time~~ [SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time Mar 18, 2026

cloud-fan force-pushed the SPARK-XXX-resolve-identifier-at-parse-time branch from 5beb0d9 to 05867ee Compare March 18, 2026 19:11

cloud-fan changed the title ~~[SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time~~ [SPARK-XXX][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time Mar 18, 2026

cloud-fan changed the title ~~[SPARK-XXX][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time~~ [SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time Mar 18, 2026

dongjoon-hyun approved these changes Mar 19, 2026

View reviewed changes

peter-toth approved these changes Mar 19, 2026

View reviewed changes

dongjoon-hyun closed this in cbcf2be Mar 19, 2026

cloud-fan mentioned this pull request Mar 25, 2026

[SPARK-56050][SQL][FOLLOWUP] Add comment to withIdentClause about withOrigin pitfall #55001

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time#54888

[SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time#54888
cloud-fan wants to merge 2 commits intoapache:masterfrom
cloud-fan:SPARK-XXX-resolve-identifier-at-parse-time

cloud-fan commented Mar 18, 2026 •

edited

Loading

Uh oh!

dongjoon-hyun left a comment

Uh oh!

dongjoon-hyun commented Mar 19, 2026

Uh oh!

dongjoon-hyun commented Mar 19, 2026

Uh oh!

cloud-fan commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	If generative AI tooling has been used in the process of authoring this patch, please include the
	phrase: 'Generated-by: ' followed by the name of the tool and its version.

Conversation

cloud-fan commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

dongjoon-hyun left a comment

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun commented Mar 19, 2026

Uh oh!

dongjoon-hyun commented Mar 19, 2026

Uh oh!

cloud-fan commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cloud-fan commented Mar 18, 2026 •

edited

Loading