[SPARK-56703][SQL] Avoid redundant propagatedFilter aliases in PlanMerger #55654

Draft
peter-toth wants to merge 1 commit into apache:master from peter-toth:SPARK-56703-avoid-unnecessary-propagated-filters
@peter-toth
Contributor

### What changes were proposed in this pull request?

When `PlanMerger` merges N non-grouping subplans where the first has no filter and the 2nd and 3rd share the same filter condition, the merged child `Project` already contains an alias for that condition after the 1st+2nd merge round. The 3rd merge should reuse that alias instead of creating a redundant one. Two fixes are applied.

**Fix 1 — symmetric reuse check in `(np: Filter, cp)`.** The `(np: Filter, cp: Filter)` case already had a reuse check: when the cp filter carries `MERGED_FILTER_TAG`, it looks for an existing alias in the child `Project` and reuses it instead of creating a new one. The `(np: Filter, cp)` case now gets the same check, making the two cases symmetric.
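The reuse check can be pictured with a toy model. The sketch below is not the `PlanMerger` code: `Alias`, `Project`, and `reuseOrAdd` are hypothetical stand-ins, conditions are plain strings, and string equality stands in for whatever expression comparison the real implementation performs.

```scala
object AliasReuseSketch {
  // Hypothetical, simplified stand-ins for an alias and a Project node.
  final case class Alias(name: String, condition: String)
  final case class Project(aliases: List[Alias])

  // Return the alias that exposes `condition`, adding one only if none exists yet.
  def reuseOrAdd(project: Project, condition: String): (Project, Alias) =
    project.aliases.find(_.condition == condition) match {
      case Some(existing) =>
        // Reuse path: the Project is returned unchanged.
        (project, existing)
      case None =>
        // Illustrative naming only; not necessarily how Spark names these aliases.
        val fresh = Alias(s"propagatedFilter_${project.aliases.size}", condition)
        (Project(project.aliases :+ fresh), fresh)
    }
}
```

Calling `reuseOrAdd` twice with the same condition leaves the project with a single alias, which mirrors what the third merge round is expected to do once both `Filter` cases perform the check.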

**Fix 2 — reorder match cases so Filter cases precede Project-peeling cases.** For the reuse check in fix 1 to work, the merged child must still be a `Project` at the point the check runs. When the cached plan's child is itself a `Project` (as it is after the first merge round), the generic `(np, cp: Project)` case was firing first and peeling that Project layer, causing the recursion to see a `LocalRelation` with no aliased conditions. The fix reorders the match so that all Filter cases precede the generic Project-peeling cases. The `(np: Filter, cp: Filter)` case is kept before `(np: Filter, cp)` to prevent `(Filter, Filter)` pairs from being handled by the asymmetric propagation path. The `(np: Project, cp: Project)` case is also moved into the Project group for clarity.
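Why the case order matters follows from Scala's first-match-wins semantics. The following is a minimal sketch with a hypothetical three-node plan type; `projectFirst` and `filterFirst` are illustrative names, and the returned strings only label which branch fired. It does not reproduce the actual match in `PlanMerger`.

```scala
object MatchOrderSketch {
  // Hypothetical, minimal plan model; not Catalyst's LogicalPlan.
  sealed trait Plan
  case object Relation extends Plan
  final case class Project(child: Plan) extends Plan
  final case class Filter(child: Plan) extends Plan

  // Generic Project-peeling case listed first: it shadows the Filter case
  // whenever the cached plan (cp) is a Project.
  def projectFirst(np: Plan, cp: Plan): String = (np, cp) match {
    case (_, _: Project) => "peel cp Project"
    case (_: Filter, _)  => "np Filter vs cp"
    case _               => "other"
  }

  // Filter cases listed first, most specific first: the Filter branch
  // now sees the Project before any peeling happens.
  def filterFirst(np: Plan, cp: Plan): String = (np, cp) match {
    case (_: Filter, _: Filter) => "Filter vs Filter"
    case (_: Filter, _)         => "np Filter vs cp"
    case (_, _: Project)        => "peel cp Project"
    case _                      => "other"
  }
}
```

For a new plan `Filter(Relation)` and a cached plan `Project(Relation)`, `projectFirst` takes the peeling branch, while `filterFirst` reaches the `(np: Filter, cp)` branch with the `Project` still in place, which is the situation the reuse check needs.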

### Why are the changes needed?

Without this fix, merging three non-grouping subplans where the 2nd and 3rd carry the same filter condition produces two `propagatedFilter` aliases with identical expressions, one of which is redundant, resulting in an unnecessarily large merged plan.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added `SPARK-56703: Merge three non-grouping subqueries where the third has the same filter condition as the second` to `MergeSubplansSuite`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6
```scala
mergedChild.output.toList ++ Seq(newNPFilterAlias) ++ cpFilter.toSeq,
mergedChild)
TryMergeResult(project, npMapping, Some((newNPFilter, true)), cpFilter)
// If newNPCondition is already aliased in the child Project (e.g. a subsequent
```
Contributor Author

This is the new logic, symmetrical to the `(np: Filter, cp: Filter)` case.

```scala
TryMergeResult(project, npMapping, npFilter, Some(newCPFilter))
}

case (np: Project, cp: Project) =>
```
Contributor Author

This is just a code move, as the `Filter` cases should come before the `Project` cases.

@peter-toth marked this pull request as draft on May 2, 2026 13:21
@peter-toth
Contributor Author

I will rebase this after #55659.
