[SPARK-51385][SQL] Normalize out projection added in DeduplicateRelations for union child output deduplication #50148

Closed
vladimirg-db wants to merge 1 commit into apache:master from vladimirg-db:vladimir-golubev_data/normalize-artificial-project-in-union

@vladimirg-db vladimirg-db commented Mar 4, 2025

What changes were proposed in this pull request?

Strip away the extra projection added by DeduplicateRelations when comparing logical plans.

DeduplicateRelations puts one extra Project on the right branch of a Union when the outputs of its children conflict. This is a hack for streaming relations. Unfortunately, this logic is generalized, and the extra projection is also added for simple cases like views:

CREATE VIEW IF NOT EXISTS v1 AS SELECT * FROM VALUES (1, 2);

SELECT * FROM (
  SELECT col1, col2 FROM v1
  UNION ALL
  SELECT col1, col2 FROM v1
);

The single-pass Analyzer should not produce this projection, because it assigns expression IDs in a single pass, so we strip it in NormalizePlan to compare the plans correctly.
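To illustrate the idea outside of Spark, here is a minimal toy sketch in Python. It is not the NormalizePlan code from this PR. The classes `Attr`, `Alias`, `Relation`, `Project` and `Union`, the helpers `is_artificial` and `normalize`, and the two example plan shapes are all hypothetical simplifications (the real plans contain more nodes). The sketch only shows why dropping a re-aliasing Project under a Union, together with expression ID normalization, lets the two plans compare equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int


@dataclass(frozen=True)
class Alias:
    child: Attr
    name: str
    expr_id: int


@dataclass(frozen=True)
class Relation:
    output: tuple


@dataclass(frozen=True)
class Project:
    project_list: tuple
    child: object


@dataclass(frozen=True)
class Union:
    children: tuple


def is_artificial(node) -> bool:
    """True for a Project that only re-aliases its child's output, name for name."""
    if not (isinstance(node, Project) and isinstance(node.child, Relation)):
        return False
    out = node.child.output
    return len(node.project_list) == len(out) and all(
        isinstance(e, Alias) and e.child == a and e.name == a.name
        for e, a in zip(node.project_list, out)
    )


def normalize(node):
    """Strip artificial Projects directly under a Union and zero out expression IDs."""
    if isinstance(node, Union):
        kids = (c.child if is_artificial(c) else c for c in node.children)
        return Union(tuple(normalize(c) for c in kids))
    if isinstance(node, Project):
        return Project(tuple(normalize(e) for e in node.project_list), normalize(node.child))
    if isinstance(node, Relation):
        return Relation(tuple(normalize(a) for a in node.output))
    if isinstance(node, Alias):
        return Alias(normalize(node.child), node.name, 0)
    if isinstance(node, Attr):
        return Attr(node.name, 0)
    return node


# Assumed shape of the fixed-point plan: the right branch reuses v1's IDs,
# so an extra Project re-aliases them to fresh IDs.
v1 = Relation((Attr("col1", 1), Attr("col2", 2)))
fixed_point = Union((
    v1,
    Project((Alias(Attr("col1", 1), "col1", 3), Alias(Attr("col2", 2), "col2", 4)), v1),
))

# Assumed shape of the single-pass plan: fresh IDs are assigned directly, no Project.
single_pass = Union((v1, Relation((Attr("col1", 3), Attr("col2", 4)))))

plans_match = normalize(fixed_point) == normalize(single_pass)
```

In this toy model the raw plans differ, while their normalized forms are equal, which is the property the plan comparison relies on.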

Why are the changes needed?

This ensures that the single-pass and fixed-point analyzed plans are the same.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 4, 2025
@cloud-fan

thanks, merging to master!

@cloud-fan cloud-fan closed this in c005a37 Mar 5, 2025