Skip to content

[SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time#54888

Closed
cloud-fan wants to merge 2 commits intoapache:masterfrom
cloud-fan:SPARK-XXX-resolve-identifier-at-parse-time
Closed

[SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time#54888
cloud-fan wants to merge 2 commits intoapache:masterfrom
cloud-fan:SPARK-XXX-resolve-identifier-at-parse-time

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Mar 18, 2026

What changes were proposed in this pull request?

This PR optimizes withIdentClause in AstBuilder to eagerly resolve IDENTIFIER() calls with string literal arguments at parse time instead of deferring to the analysis phase via PlanWithUnresolvedIdentifier.

Why are the changes needed?

When IDENTIFIER() is called with a string literal (e.g., IDENTIFIER('db.table')), the value is already known at parse time. The current implementation always defers resolution to analysis by wrapping it in PlanWithUnresolvedIdentifier, which is unnecessary overhead for this case.

The optimized withIdentClause resolves string literals eagerly and falls back to PlanWithUnresolvedIdentifier (deferred resolution) for non-literal expressions.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

Yes, Claude Opus 4.6.

@cloud-fan cloud-fan changed the title [SPARK-XXX][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time [SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time Mar 18, 2026
… parse time

When IDENTIFIER() is called with a string literal argument (e.g., IDENTIFIER('db.table')), the identifier can be resolved immediately at parse time rather than deferring to analysis via PlanWithUnresolvedIdentifier.

This optimizes withIdentClause to check if the expression is a string literal and resolve it eagerly, falling back to deferred resolution for non-literal expressions.
@cloud-fan cloud-fan force-pushed the SPARK-XXX-resolve-identifier-at-parse-time branch from 5beb0d9 to 05867ee Compare March 18, 2026 19:11
@cloud-fan cloud-fan changed the title [SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time [SPARK-XXX][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time Mar 18, 2026
@cloud-fan cloud-fan changed the title [SPARK-XXX][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time [SPARK-56050][SQL] Eagerly resolve IDENTIFIER() with string literals at parse time Mar 18, 2026
…withIdentClause

The IdentifierReferenceContext only covers the identifier part of the SQL
statement, not the full statement. Wrapping the builder call with
withOrigin(ctx) caused ParseException to capture the narrow identifier
position instead of the full statement origin set by the calling visit
methods.

Co-authored-by: Isaac
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can have a test coverage for this, @cloud-fan ?

BTW, FYI, when I ask Cluade Caude to write a PR description based on the pull request template (here, I used the file reference feature of Claude Claude), Claude Code works well by respecting the ASF and Apache Spark Pull Request template like the the following. In these days, it's rare to write the PR description from the scratches.

If generative AI tooling has been used in the process of authoring this patch, please include the
phrase: 'Generated-by: ' followed by the name of the tool and its version.

https://www.apache.org/legal/generative-tooling.html

### Was this patch authored or co-authored using generative AI tooling?
- Yes, Claude Opus 4.6.
+ Generated-by: Claude Code (Claude Opus 4.6)

@dongjoon-hyun
Copy link
Member

cc @peter-toth for this SQL parser optimization.

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 4.2.0 because the test coverage or PR description is nit.

@cloud-fan
Copy link
Contributor Author

it's a perf improvement so I didn't add a new test. Actually the PR description was generated by claude code, hopefully llm can follow the PR template more strictly after we improving AGENTS.md (I didn't explicitly ask llm to follow PR template as I thought it's a common rule).

terana pushed a commit to terana/spark that referenced this pull request Mar 23, 2026
…at parse time

### What changes were proposed in this pull request?

This PR optimizes `withIdentClause` in `AstBuilder` to eagerly resolve `IDENTIFIER()` calls with string literal arguments at parse time instead of deferring to the analysis phase via `PlanWithUnresolvedIdentifier`.

### Why are the changes needed?

When `IDENTIFIER()` is called with a string literal (e.g., `IDENTIFIER('db.table')`), the value is already known at parse time. The current implementation always defers resolution to analysis by wrapping it in `PlanWithUnresolvedIdentifier`, which is unnecessary overhead for this case.

The optimized `withIdentClause` resolves string literals eagerly and falls back to `PlanWithUnresolvedIdentifier` (deferred resolution) for non-literal expressions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

Yes, Claude Opus 4.6.

Closes apache#54888 from cloud-fan/SPARK-XXX-resolve-identifier-at-parse-time.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
cloud-fan added a commit to cloud-fan/spark that referenced this pull request Mar 25, 2026
…hOrigin pitfall

Followup to apache#54888.

Adding a comment to withIdentClause explaining why withOrigin(ctx)
should not be added to this method, and where to add it instead.
HyukjinKwon pushed a commit that referenced this pull request Mar 25, 2026
…hOrigin pitfall

### What changes were proposed in this pull request?

Followup to #54888.

This PR adds a documentation comment to `withIdentClause` in `AstBuilder` explaining why `withOrigin(ctx)` should not be added to the method body, and where to place it instead (inside specific call site builder lambdas).

### Why are the changes needed?

1. `withIdentClause` is called from many places with builders that create different types of plans — some create full command plans (e.g., `ReplaceTable`, `CreateFunction`), not just identifier-related plans.
2. Adding `withOrigin(ctx)` to the method body would set `CurrentOrigin` to the identifier reference for all code in the builder, including error-throwing validation code. Since `ParseException.getQueryContext()` reads from `CurrentOrigin`, this would cause error contexts to incorrectly point to the identifier instead of the full statement.
3. This is a non-obvious pitfall that is worth documenting to prevent future contributors from introducing this bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Comment-only change, no tests needed.

### Was this patch authored or co-authored using generative AI tooling?

Yes

Closes #55001 from cloud-fan/SPARK-56050-followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants