[SPARK-55983][SQL] New single-pass analyzer functionality and bugfixes #54729

Closed
mihailotim-db wants to merge 2 commits into apache:master from mihailotim-db:resolver-pr1-core
@mihailotim-db (Contributor) commented Mar 10, 2026

What changes were proposed in this pull request?

This PR implements core single-pass resolver infrastructure and bugfixes. The single-pass resolver is an alternative to the traditional fixed-point (iterative) analyzer that resolves SQL plans in a single bottom-up traversal.
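To make the contrast concrete, here is a minimal toy sketch of the two strategies. It is not Catalyst code: `Node`, `make_plan`, `resolve_single_pass`, and `resolve_fixed_point` are hypothetical names, and "resolution" is reduced to flipping a flag once all children are resolved. The fixed-point sweep deliberately checks each node before its children so that several sweeps are needed, which is only an approximation of how an iterative rule executor behaves.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy plan node; all names here are illustrative, not Spark classes."""
    name: str
    children: List["Node"] = field(default_factory=list)
    resolved: bool = False


def make_plan() -> Node:
    # Project -> Filter -> Relation, a three-operator chain.
    return Node("Project", [Node("Filter", [Node("Relation")])])


def resolve_single_pass(node: Node, visits: List[str]) -> None:
    """Resolve children first, then the node itself: one visit per node."""
    for child in node.children:
        resolve_single_pass(child, visits)
    visits.append(node.name)
    node.resolved = True


def resolve_fixed_point(root: Node, visits: List[str]) -> int:
    """Sweep the whole tree repeatedly until a sweep changes nothing.

    Returns the number of sweeps, including the final no-op sweep.
    """
    def sweep(node: Node) -> bool:
        visits.append(node.name)
        changed = False
        # A node becomes resolvable only once all of its children are resolved.
        if not node.resolved and all(c.resolved for c in node.children):
            node.resolved = True
            changed = True
        for child in node.children:
            # Call sweep() first so every child is visited in every sweep.
            changed = sweep(child) or changed
        return changed

    passes = 0
    while True:
        passes += 1
        if not sweep(root):
            return passes
```

On the three-node chain, the single-pass function visits each node once (Relation, Filter, Project), while the toy fixed-point loop needs four sweeps of three visits each before it detects that nothing changed.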

Key changes:

New infrastructure:

  • OperatorResolutionContext and OperatorResolutionContextStack for tracking operator-level state
    during resolution (HAVING tracking, subquery aggregate push-down, grouping analytics)
  • NameResolutionParameters for bundling name resolution flags (LCA access, hidden output, view
    resolution, extract value keys, etc.)
  • ResolverGuardResult structured result type replacing boolean return from ResolverGuard
  • NonDeterministicExpressionCheck as a single-pass-only resolution check
  • RetainsOriginalJoinOutput trait for preserving join output when metadata columns change child
    projections
  • AliasKind enum for distinguishing alias types during resolution
  • TryExtractOrdinal utility for ordinal extraction from expressions
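
As a rough illustration of the operator-level context stack idea, the sketch below keeps one context object per operator being resolved and pops it when resolution of that operator ends. It is a toy model only: `OperatorContext`, `OperatorContextStack`, the `entered` method, and the `resolving_having` flag are hypothetical and do not mirror the fields of the real OperatorResolutionContext.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class OperatorContext:
    operator: str
    # Hypothetical flag standing in for "HAVING tracking".
    resolving_having: bool = False


class OperatorContextStack:
    """Toy stack: one context per operator currently being resolved."""

    def __init__(self) -> None:
        self._stack: List[OperatorContext] = []

    @contextmanager
    def entered(self, operator: str, **flags: bool) -> Iterator[OperatorContext]:
        ctx = OperatorContext(operator, **flags)
        self._stack.append(ctx)
        try:
            yield ctx
        finally:
            # Always pop, even if resolving the operator raised.
            self._stack.pop()

    @property
    def current(self) -> OperatorContext:
        return self._stack[-1]

    @property
    def depth(self) -> int:
        return len(self._stack)
```

Each nested operator gets a fresh context, so state set for an outer operator (here, the HAVING flag) does not leak into the resolution of its children unless it is passed explicitly.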

Core improvements:

  • Extended ExpressionResolutionContext with window expression tracking (nestedness level, window
    function/spec flags, parent context)
  • Enriched NameScope with variable resolution, extract value extraction keys, aggregate expression
    alias lookup, and hidden output improvements
  • Improved HavingResolver to handle aggregate expressions extracted from subqueries and window +
    HAVING interaction patterns
  • Extended ExpressionIdAssigner with subquery expression ID remapping and outer reference mapping
  • Expanded ResolverGuard to support lambda functions, star-with-target, regex columns, COALESCE with
    star, and multi-part TVF names
  • Moved plan rewrite rules (CleanupAliases, PullOutNondeterministic, PruneMetadataColumns) from
    ResolverRunner to Resolver.lookupMetadataAndResolve for correct per-view config handling
  • Improved HybridAnalyzer tentative mode with comprehensive fallback handling
  • Various improvements to aggregate resolution, sort resolution, join resolution, set operation
    resolution, and subquery expression resolution

Why are the changes needed?

To bring the Apache Spark single-pass analyzer closer to feature parity with the fixed-point analyzer implementation, enabling:

  1. Correct resolution of window expressions, HAVING clauses with windows, and complex aggregate
    patterns
  2. Proper operator-level context tracking during resolution
  3. Better name resolution with variable support, extract value keys, and hidden output handling
  4. Structured guard results for cleaner tentative mode fallback logic
  5. Foundation for future features like nested correlated subquery support
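
Point 4 can be illustrated with a small sketch of why a structured result is easier to work with than a boolean: the caller can record why a plan was rejected and fall back accordingly. Everything below is hypothetical (`GuardResult`, `guard`, `analyze_tentatively`, `UnsupportedFeature`, and the feature names); it assumes, without confirmation from this PR, that tentative mode falls back to the fixed-point analyzer both when the guard rejects a plan and when the single-pass run reports an unsupported feature.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple


class UnsupportedFeature(Exception):
    """Hypothetical stand-in for an 'explicitly unsupported' resolver error."""


@dataclass(frozen=True)
class GuardResult:
    supported: bool
    reason: Optional[str] = None  # hypothetical field: why the plan was rejected


# Illustrative feature names only; not a list of what Spark rejects.
UNSUPPORTED: FrozenSet[str] = frozenset({"feature_a", "feature_b"})


def guard(features: FrozenSet[str]) -> GuardResult:
    blocked = sorted(features & UNSUPPORTED)
    if blocked:
        return GuardResult(False, f"unsupported: {', '.join(blocked)}")
    return GuardResult(True)


def analyze_tentatively(
    features: FrozenSet[str],
    single_pass: Callable[[], str],
    fixed_point: Callable[[], str],
) -> Tuple[str, Optional[str]]:
    """Return (plan, fallback_reason); fallback_reason is None on the fast path."""
    verdict = guard(features)
    if not verdict.supported:
        return fixed_point(), verdict.reason
    try:
        return single_pass(), None
    except UnsupportedFeature as err:
        return fixed_point(), f"fell back after error: {err}"
```

With a plain boolean, the second element of the returned tuple could not be produced without re-inspecting the plan.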

Does this PR introduce any user-facing change?

No. The single-pass analyzer is behind feature flags (spark.sql.analyzer.singlePassResolver.enabled
and spark.sql.analyzer.singlePassResolver.enabled.tentatively) and is not the default code path.

How was this patch tested?

  • Extended HybridAnalyzerSuite with testDualRun framework covering dual-run, tentative, and
    single-pass modes
  • New DataFrameAnalyzerTestGapsSuite for alias resolution edge cases (implicit aliases, nested alias
    collapsing, autogenerated alias names)
  • New SinglePassAnalyzerTestUtils test utility trait
  • Updated ResolverGuardSuite with structured result assertions and new feature tests
    (star-with-target, regex columns, COALESCE with star, multi-part TVF names)
  • Updated ExplicitlyUnsupportedResolverFeatureSuite with generator edge case tests
  • Updated ExpressionIdAssignerSuite for new attribute reference mapping semantics and subquery
    remapping
  • Updated NameScopeSuite with variable resolution and extract value key tests
  • Updated AggregateResolverSuite, AliasResolverSuite, MetadataResolverSuite, ViewResolverSuite
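
For readers unfamiliar with dual-run testing, the idea can be sketched as follows: run both analyzers on the same input and fail if their outputs differ. This is a toy model, not the testDualRun framework itself; `dual_run` and `PlanMismatch` are hypothetical names, and plans are represented by any values that support equality comparison.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class PlanMismatch(AssertionError):
    """Raised when the two analyzers disagree (hypothetical error type)."""


def dual_run(fixed_point: Callable[[], T], single_pass: Callable[[], T]) -> T:
    """Run both analyzers and return the fixed-point plan if they agree."""
    expected = fixed_point()
    actual = single_pass()
    if actual != expected:
        raise PlanMismatch(
            f"single-pass produced {actual!r}, fixed-point produced {expected!r}"
        )
    return expected
```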

Was this patch authored or co-authored using generative AI tooling?

No.

@mihailotim-db force-pushed the resolver-pr1-core branch 5 times, most recently from ce384ce to 391e340, on Mar 11, 2026 15:23
@mihailotim-db changed the title from "[SPARK-XXXXX][SQL] single-pass resolver - core infrastruc…" to "[SPARK-XXXXX][SQL] New single-pass analyzer functionality and bugfixes" on Mar 13, 2026
@mihailotim-db changed the title from "[SPARK-XXXXX][SQL] New single-pass analyzer functionality and bugfixes" to "[SPARK-55983][SQL] New single-pass analyzer functionality and bugfixes" on Mar 13, 2026
@mihailotim-db force-pushed the resolver-pr1-core branch 5 times, most recently from 55db6d7 to 29e2af0, on Mar 16, 2026 15:16

@dtenedor (Contributor) left a comment

I reviewed all the changes. Everything generally looks correct and well-tested, just a few small comments.

@mihailotim-db force-pushed the resolver-pr1-core branch 2 times, most recently from 35a04fe to 98a58a3, on Mar 17, 2026 11:07
Co-authored-by: Isaac

@dtenedor (Contributor) commented

LGTM, merging to master!

@dtenedor closed this in d06d086 on Mar 17, 2026
2 participants