[SPARK-56596][SQL] Enable dual runs for single-pass analyzer #55447

Open
mihailotim-db wants to merge 1 commit into apache:master from mihailotim-db:enable_single_pass_analyzer


mihailotim-db commented Apr 21, 2026

What changes were proposed in this pull request?

This PR enables single-pass analyzer dual runs in testing mode and adds the infrastructure to control dual-run behavior.

Background: How dual runs work

The HybridAnalyzer sits between the SQL parser and the rest of the query planning pipeline. When dual-run mode is enabled, every query goes through
the following flow:

  1. The ResolverGuard inspects the unresolved logical plan to check whether all operators and expressions are supported by the single-pass resolver.
    If any unsupported feature is detected, the query skips dual-run entirely and uses the legacy fixed-point analyzer only.
  2. If the guard passes, the fixed-point analyzer runs first and produces a resolved plan. During this run, relation metadata (catalog lookups, temp
    view resolutions) is captured in the AnalyzerBridgeState.
  3. The single-pass resolver then runs on the same unresolved plan. It uses the BridgedRelationMetadataProvider to reuse the relation metadata from
    step 2, avoiding duplicate catalog RPCs.
  4. The results are compared:
    • If both succeed, their output schemas and normalized logical plans are compared. A mismatch indicates a bug in the single-pass resolver.
    • If the single-pass resolver fails but the fixed-point analyzer succeeds, the behavior depends on
      ANALYZER_LOG_ERRORS_INSTEAD_OF_THROWING_IN_DUAL_RUNS: in tests it throws (to catch regressions); in production it logs the error and returns
      the fixed-point result.
    • If the single-pass throws ExplicitlyUnsupportedResolverFeature, the fixed-point result is returned silently (this is expected for features not
      yet ported to the resolver).
    • Whenever no error is thrown, the fixed-point result is returned as the final plan unless ANALYZER_DUAL_RUN_RETURN_SINGLE_PASS_RESULT is
      enabled (it defaults to false).
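
The four steps above can be illustrated with a small, self-contained toy model. This is only a sketch of the control flow as described in this PR, not the Spark implementation: plans are plain strings, `hybrid_analyze`, `BridgeState`, `DualRunConf`, `DualRunMismatch`, and the `toy_*` functions are hypothetical names, and treating a plan mismatch the same way as a single-pass failure (throw in tests, log in production) is an assumption the description does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ExplicitlyUnsupportedResolverFeature(Exception):
    """Toy stand-in: the single-pass resolver hit a feature it has not ported yet."""


class DualRunMismatch(Exception):
    """Hypothetical error for a single-pass result that differs from fixed-point."""


@dataclass
class BridgeState:
    # Relation metadata captured during the fixed-point run (step 2).
    relations: Dict[str, str] = field(default_factory=dict)


@dataclass
class DualRunConf:
    # Toy stand-ins for the two configs named in the description.
    log_errors_instead_of_throwing: bool = False  # False models test mode
    return_single_pass_result: bool = False


Analyzer = Callable[[str, BridgeState], str]


def hybrid_analyze(plan: str, guard: Callable[[str], bool], fixed_point: Analyzer,
                   single_pass: Analyzer, conf: DualRunConf, log: List[str]) -> str:
    # Step 1: if the guard rejects the plan, skip the dual run entirely.
    if not guard(plan):
        return fixed_point(plan, BridgeState())

    # Step 2: the fixed-point run goes first and fills the bridge state.
    state = BridgeState()
    fixed_result = fixed_point(plan, state)

    # Steps 3 and 4: run single-pass on the same plan, then compare.
    try:
        single_result = single_pass(plan, state)
        if single_result != fixed_result:
            raise DualRunMismatch(f"{single_result!r} != {fixed_result!r}")
    except ExplicitlyUnsupportedResolverFeature:
        # Expected for features not yet ported: fall back silently.
        return fixed_result
    except Exception as error:
        if conf.log_errors_instead_of_throwing:
            log.append(f"single-pass dual run failed: {error}")
            return fixed_result
        raise  # test mode: surface the regression

    return single_result if conf.return_single_pass_result else fixed_result


# Toy analyzers used only to exercise the flow above.
def toy_guard(plan: str) -> bool:
    return "UNSUPPORTED" not in plan


def toy_fixed_point(plan: str, state: BridgeState) -> str:
    state.relations["t"] = "t(a INT)"  # pretend catalog lookup
    return f"Resolved[{plan}]"


def toy_single_pass(plan: str, state: BridgeState) -> str:
    if "t" not in state.relations:
        raise RuntimeError("relation metadata missing from bridge state")
    if "NOT_PORTED" in plan:
        raise ExplicitlyUnsupportedResolverFeature(plan)
    if "BUGGY" in plan:
        return f"Wrong[{plan}]"  # simulated single-pass bug
    return f"Resolved[{plan}]"
```

In this model, `DualRunConf()` plays the role of a test run, where a simulated single-pass bug raises, while `DualRunConf(log_errors_instead_of_throwing=True)` plays the role of production, where the same bug is appended to `log` and the fixed-point result is returned. If the fixed-point analyzer itself fails, the sketch simply lets that exception propagate.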

Why are the changes needed?

Dual runs are the primary mechanism for validating single-pass analyzer correctness against the legacy fixed-point analyzer. Enabling them by default
in tests ensures that any regressions in single-pass resolution are caught immediately during development, rather than discovered later in production.

Does this PR introduce any user-facing change?

No. All changes are internal and only affect test-time behavior. The ANALYZER_LOG_ERRORS_INSTEAD_OF_THROWING_IN_DUAL_RUNS config defaults to true
outside of tests, so production behavior is unchanged.

How was this patch tested?

Existing tests (SQLQueryTestSuite and others). The dual-run mode validates single-pass results against fixed-point results on every query executed
during tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

dongjoon-hyun marked this pull request as draft April 21, 2026 18:48

dongjoon-hyun left a comment

Please create a JIRA issue and use it in the PR title before converting this back to a normal PR.

mihailotim-db force-pushed the enable_single_pass_analyzer branch from e5f82c7 to 978050b (April 23, 2026 11:37)
mihailotim-db changed the title from "[SQL] Enable dual runs for single-pass analyzer" to "[SPARK-56596][SQL] Enable dual runs for single-pass analyzer" (Apr 23, 2026)
mihailotim-db force-pushed the enable_single_pass_analyzer branch from 978050b to 9c815d6 (April 23, 2026 12:19)
mihailotim-db marked this pull request as ready for review (April 23, 2026 12:32)
mihailotim-db force-pushed the enable_single_pass_analyzer branch 2 times, most recently from 0c779a7 to af467f2 (April 23, 2026 13:21)
mihailotim-db force-pushed the enable_single_pass_analyzer branch from af467f2 to 39cc5a9 (April 29, 2026 06:47)