[SPARK-55983][SQL] New single-pass analyzer functionality and bugfixes #54729
Closed
mihailotim-db wants to merge 2 commits into apache:master from
Force-pushed from ce384ce to 391e340
Force-pushed from 55db6d7 to 29e2af0
dtenedor (Contributor) reviewed on Mar 16, 2026 and left a comment:
I reviewed all the changes. Everything generally looks correct and well-tested, just a few small comments.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AliasKind.scala (outdated, resolved)
...st/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/ExpressionIdAssigner.scala (outdated, resolved)
...rc/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameResolutionParameters.scala (outdated, resolved)
.../scala/org/apache/spark/sql/catalyst/analysis/resolver/NonDeterministicExpressionCheck.scala (outdated, resolved)
Force-pushed from 35a04fe to 98a58a3
Force-pushed from 98a58a3 to 595fa9d
dtenedor (Contributor) approved these changes on Mar 17, 2026:
LGTM, merging to master!
What changes were proposed in this pull request?
This PR implements core single-pass resolver infrastructure and bugfixes. The single-pass resolver is an alternative to the traditional fixed-point (iterative) analyzer that resolves SQL plans in a single bottom-up traversal.
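The following toy model makes the contrast concrete. It is a minimal sketch only. The types `Expr`, `Unresolved`, `Resolved`, `Add` and the hard-coded `catalog` are hypothetical and are not Spark's `LogicalPlan` or `Expression` classes.

The fixed-point style re-applies a rule until the tree stops changing. As a result, even a plan that resolves completely on the first pass needs a second pass to detect the fixed point. The single-pass style resolves each node exactly once, right after its children, and fails immediately on a name it cannot resolve.

```scala
// Toy model only: hypothetical types, not Spark's LogicalPlan/Expression classes.
sealed trait Expr
final case class Unresolved(name: String) extends Expr
final case class Resolved(name: String, id: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object ToyAnalyzers {
  // Hypothetical catalog mapping column names to expression IDs.
  val catalog: Map[String, Int] = Map("a" -> 1, "b" -> 2)

  // One bottom-up application of a resolution rule. Names it cannot resolve
  // are left unresolved for a later iteration (or a later analysis error).
  private def resolveOnce(e: Expr): Expr = e match {
    case Unresolved(n) => catalog.get(n).map(id => Resolved(n, id)).getOrElse(e)
    case Add(l, r)     => Add(resolveOnce(l), resolveOnce(r))
    case other         => other
  }

  // Fixed-point style: re-run the rule until the tree stops changing or the
  // iteration cap is reached. Returns the result and the number of passes.
  def fixedPoint(e: Expr, maxIterations: Int = 100): (Expr, Int) = {
    var current = e
    var passes = 0
    var changed = true
    while (changed && passes < maxIterations) {
      val next = resolveOnce(current)
      passes += 1
      changed = next != current
      current = next
    }
    (current, passes)
  }

  // Single-pass style: one bottom-up traversal. Each node is fully resolved
  // right after its children, and an unknown name fails immediately.
  def singlePass(e: Expr): Expr = e match {
    case Unresolved(n) =>
      Resolved(n, catalog.getOrElse(n, throw new IllegalArgumentException(s"cannot resolve $n")))
    case Add(l, r)   => Add(singlePass(l), singlePass(r))
    case r: Resolved => r
  }
}
```

For `Add(Unresolved("a"), Unresolved("b"))`, `fixedPoint` needs two passes (one that changes the tree and one that confirms nothing changed). `singlePass` produces the same resolved tree in one traversal.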
Key changes:
New infrastructure:
OperatorResolutionContext and OperatorResolutionContextStack for tracking operator-level state during resolution (HAVING tracking, subquery aggregate push-down, grouping analytics)
NameResolutionParameters for bundling name resolution flags (LCA access, hidden output, view resolution, extract value keys, etc.)
ResolverGuardResult, a structured result type replacing the boolean return value of ResolverGuard
NonDeterministicExpressionCheck as a single-pass-only resolution check
RetainsOriginalJoinOutput trait for preserving join output when metadata columns change child projections
AliasKind enum for distinguishing alias types during resolution
TryExtractOrdinal utility for ordinal extraction from expressions
Core improvements:
ExpressionResolutionContext with window expression tracking (nestedness level, window function/spec flags, parent context)
NameScope with variable resolution, extract value extraction keys, aggregate expression alias lookup, and hidden output improvements
HavingResolver to handle aggregate expressions extracted from subqueries and window + HAVING interaction patterns
ExpressionIdAssigner with subquery expression ID remapping and outer reference mapping
ResolverGuard to support lambda functions, star-with-target, regex columns, COALESCE with star, and multi-part TVF names
CleanupAliases, PullOutNondeterministic, PruneMetadataColumns) from ResolverRunner to Resolver.lookupMetadataAndResolve for correct per-view config handling
HybridAnalyzer tentative mode with comprehensive fallback handling
resolution, and subquery expression resolution
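The following sketch illustrates the general idea behind the OperatorResolutionContextStack item in the list above. It is not Spark's implementation. `ToyOperatorContext`, `ToyOperatorContextStack` and their fields (`isHaving`, `extractedAggregates`) are hypothetical names.

The stack keeps one frame per operator currently being resolved. A frame is pushed when the resolver enters an operator and popped when it leaves. Nested code can ask whether it is inside a HAVING clause, or record an aggregate for the nearest HAVING to pick up.

```scala
import scala.collection.mutable

// Hypothetical per-operator state; field names are illustrative only and are
// not the fields of Spark's OperatorResolutionContext.
final case class ToyOperatorContext(
    operatorName: String,
    isHaving: Boolean = false,
    extractedAggregates: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String])

// Hypothetical stack: one frame per operator being resolved, innermost first.
final class ToyOperatorContextStack {
  private var frames: List[ToyOperatorContext] = Nil

  def depth: Int = frames.size

  def current: ToyOperatorContext =
    frames.headOption.getOrElse(throw new IllegalStateException("no operator in scope"))

  // Push `ctx` while `body` runs and pop it afterwards, even if `body` throws.
  def withContext[T](ctx: ToyOperatorContext)(body: => T): T = {
    frames = ctx :: frames
    try body
    finally frames = frames.tail
  }

  // True if any enclosing operator is a HAVING filter.
  def insideHaving: Boolean = frames.exists(_.isHaving)

  // Record an aggregate expression (e.g. one found inside a subquery) on the
  // nearest enclosing HAVING frame. Returns false if no HAVING is in scope.
  def recordAggregateForHaving(expr: String): Boolean =
    frames.find(_.isHaving) match {
      case Some(frame) =>
        frame.extractedAggregates += expr
        true
      case None => false
    }
}
```

The try/finally in `withContext` keeps the stack balanced even when resolution of an inner operator throws. This matters if a caller catches the error and continues, for example by falling back to another analyzer.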
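The change from a boolean ResolverGuard to a structured ResolverGuardResult can be pictured with the sketch below. It assumes the structured result carries a human-readable reason when a plan is rejected. `GuardResult`, `ToyNode`, `ToyResolverGuard` and the operator allow-list are hypothetical and do not reflect the actual ResolverGuardResult shape.

```scala
// Hypothetical structured guard result; Spark's ResolverGuardResult may differ.
sealed trait GuardResult {
  def isSupported: Boolean = this == GuardResult.Supported
}
object GuardResult {
  case object Supported extends GuardResult
  final case class Unsupported(reason: String) extends GuardResult
}

// Hypothetical toy plan node: an operator name plus its children.
final case class ToyNode(op: String, children: Seq[ToyNode] = Nil)

object ToyResolverGuard {
  // Hypothetical allow-list of operators the single-pass path can handle.
  private val supportedOps = Set("Scan", "Project", "Filter", "Aggregate")

  // Walk the tree and return the first unsupported feature found, with a
  // reason, instead of a bare `false`.
  def apply(node: ToyNode): GuardResult =
    if (!supportedOps.contains(node.op)) GuardResult.Unsupported(s"operator ${node.op}")
    else
      node.children.iterator
        .map(child => apply(child))
        .find(!_.isSupported)
        .getOrElse(GuardResult.Supported)
}
```

Carrying the reason makes it possible to log or test *why* a plan was routed away from the single-pass path. This is presumably what the "structured result assertions" in ResolverGuardSuite check, although the real assertions are not shown here.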
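The following sketch shows one way to picture the ExpressionIdAssigner item "subquery expression ID remapping and outer reference mapping". It assumes that attributes defined inside a subquery receive fresh IDs consistently (the same old ID always maps to the same new ID). Outer references are translated through the mapping of the enclosing query, if it has one. `ToyAttr`, `ToyIdRemapper`, `outerMapping` and `startId` are hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical attribute: a name, an expression ID, and an outer-reference flag.
final case class ToyAttr(name: String, id: Long, isOuter: Boolean = false)

// Hypothetical remapper for one subquery. `outerMapping` holds the ID changes
// already applied to the enclosing query; `startId` seeds fresh IDs.
final class ToyIdRemapper(outerMapping: Map[Long, Long], startId: Long) {
  private var nextId: Long = startId
  private val mapping = mutable.Map.empty[Long, Long]

  private def freshId(): Long = {
    val id = nextId
    nextId += 1
    id
  }

  def remap(attr: ToyAttr): ToyAttr =
    if (attr.isOuter)
      // Outer references follow the enclosing query's mapping (or stay as-is).
      attr.copy(id = outerMapping.getOrElse(attr.id, attr.id))
    else
      // Local attributes get a fresh ID the first time each old ID is seen.
      // `getOrElseUpdate` evaluates `freshId()` only when the key is absent.
      attr.copy(id = mapping.getOrElseUpdate(attr.id, freshId()))
}
```

Memoizing per old ID keeps all references to the same attribute inside the subquery consistent after remapping. Consulting `outerMapping` keeps correlated references pointing at the enclosing query's attributes.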
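The HybridAnalyzer tentative mode described in the list above can be sketched as follows, under stated assumptions. The sketch assumes that, in tentative mode, a plan accepted by the guard is first analyzed with the single-pass resolver. Any non-fatal failure then falls back to the fixed-point analyzer, and rejected plans go straight to the fixed-point analyzer. `ToyHybridAnalyzer`, `AnalyzerPath` and the function parameters are hypothetical. The real "comprehensive fallback handling" is likely more detailed than a single catch.

```scala
import scala.util.control.NonFatal

// Which analyzer produced the final plan.
sealed trait AnalyzerPath
object AnalyzerPath {
  case object SinglePass extends AnalyzerPath
  case object FixedPoint extends AnalyzerPath
}

// Hypothetical tentative hybrid strategy, generic over the plan type `P`.
final class ToyHybridAnalyzer[P](
    guardAccepts: P => Boolean,
    singlePass: P => P,
    fixedPoint: P => P) {

  def analyze(plan: P): (P, AnalyzerPath) =
    if (!guardAccepts(plan)) {
      // The guard rejected the plan: use the fixed-point analyzer directly.
      (fixedPoint(plan), AnalyzerPath.FixedPoint)
    } else {
      try (singlePass(plan), AnalyzerPath.SinglePass)
      catch {
        // Tentative mode: a single-pass failure is not surfaced to the user;
        // the fixed-point analyzer is used instead. Fatal errors still propagate.
        case NonFatal(_) => (fixedPoint(plan), AnalyzerPath.FixedPoint)
      }
    }
}
```

A usage example with strings standing in for plans: `new ToyHybridAnalyzer[String](p => !p.contains("generator"), p => s"sp($p)", p => s"fp($p)").analyze("select")` returns `("sp(select)", AnalyzerPath.SinglePass)`.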
Why are the changes needed?
To bring the Apache Spark single-pass analyzer closer to feature parity with the fixed-point analyzer implementation, enabling:
patterns
Does this PR introduce any user-facing change?
No. The single-pass analyzer is behind feature flags (spark.sql.analyzer.singlePassResolver.enabled and spark.sql.analyzer.singlePassResolver.enabled.tentatively) and is not the default code path.
How was this patch tested?
HybridAnalyzerSuite with a testDualRun framework covering dual-run, tentative, and single-pass modes
DataFrameAnalyzerTestGapsSuite for alias resolution edge cases (implicit aliases, nested alias collapsing, autogenerated alias names)
SinglePassAnalyzerTestUtils test utility trait
ResolverGuardSuite with structured result assertions and new feature tests (star-with-target, regex columns, COALESCE with star, multi-part TVF names)
ExplicitlyUnsupportedResolverFeatureSuite with generator edge case tests
ExpressionIdAssignerSuite for new attribute reference mapping semantics and subquery remapping
NameScopeSuite with variable resolution and extract value key tests
AggregateResolverSuite, AliasResolverSuite, MetadataResolverSuite, ViewResolverSuite
Was this patch authored or co-authored using generative AI tooling?
No.