[SPARK-56703][SQL] Avoid redundant propagatedFilter aliases in PlanMerger#55654
Draft
peter-toth wants to merge 1 commit intoapache:masterfrom
Draft
[SPARK-56703][SQL] Avoid redundant propagatedFilter aliases in PlanMerger#55654peter-toth wants to merge 1 commit intoapache:masterfrom
propagatedFilter aliases in PlanMerger#55654peter-toth wants to merge 1 commit intoapache:masterfrom
Conversation
…rger ### What changes were proposed in this pull request? When `PlanMerger` merges N non-grouping subplans where the first has no filter and the 2nd and 3rd share the same filter condition, the merged child `Project` already contains an alias for that condition after the 1st+2nd merge round. The 3rd merge should reuse that alias instead of creating a redundant one. Two fixes are applied. **Fix 1 — symmetric reuse check in `(np: Filter, cp)`.** The `(np: Filter, cp: Filter)` case already had a reuse check: when the cp filter carries `MERGED_FILTER_TAG`, it looks for an existing alias in the child `Project` and reuses it instead of creating a new one. The `(np: Filter, cp)` case now gets the same check, making the two cases symmetric. **Fix 2 — reorder match cases so Filter cases precede Project-peeling cases.** For the reuse check in fix 1 to work, the merged child must still be a `Project` at the point the check runs. When the cached plan's child is itself a `Project` (as it is after the first merge round), the generic `(np, cp: Project)` case was firing first and peeling that Project layer, causing the recursion to see a `LocalRelation` with no aliased conditions. The fix reorders the match so that all Filter cases precede the generic Project-peeling cases. The `(np: Filter, cp: Filter)` case is kept before `(np: Filter, cp)` to prevent `(Filter, Filter)` pairs from being handled by the asymmetric propagation path. The `(np: Project, cp: Project)` case is also moved into the Project group for clarity. ### Why are the changes needed? Without this fix, merging three non-grouping subplans where the 2nd and 3rd carry the same filter condition produces two redundant `propagatedFilter` aliases with identical expressions, resulting in an unnecessarily larger merged plan. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added `SPARK-56703: Merge three non-grouping subqueries where the third has the same filter condition as the second` to `MergeSubplansSuite`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Sonnet 4.6
peter-toth
commented
May 2, 2026
| mergedChild.output.toList ++ Seq(newNPFilterAlias) ++ cpFilter.toSeq, | ||
| mergedChild) | ||
| TryMergeResult(project, npMapping, Some((newNPFilter, true)), cpFilter) | ||
| // If newNPCondition is already aliased in the child Project (e.g. a subsequent |
Contributor
Author
There was a problem hiding this comment.
This is the new logic symmetrical to (np: Filter, cp: Filter) case.
peter-toth
commented
May 2, 2026
| TryMergeResult(project, npMapping, npFilter, Some(newCPFilter)) | ||
| } | ||
|
|
||
| case (np: Project, cp: Project) => |
Contributor
Author
There was a problem hiding this comment.
This is just a code move as Filter cases should come before Projects.
Contributor
Author
|
I will rebase this after #55659. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
When
PlanMergermerges N non-grouping subplans where the first has no filter and the 2nd and 3rd share the same filter condition, the merged childProjectalready contains an alias for that condition after the 1st+2nd merge round. The 3rd merge should reuse that alias instead of creating a redundant one. Two fixes are applied.Fix 1 — symmetric reuse check in
(np: Filter, cp). The(np: Filter, cp: Filter)case already had a reuse check: when the cp filter carriesMERGED_FILTER_TAG, it looks for an existing alias in the childProjectand reuses it instead of creating a new one. The(np: Filter, cp)case now gets the same check, making the two cases symmetric.Fix 2 — reorder match cases so Filter cases precede Project-peeling cases. For the reuse check in fix 1 to work, the merged child must still be a
Projectat the point the check runs. When the cached plan's child is itself aProject(as it is after the first merge round), the generic(np, cp: Project)case was firing first and peeling that Project layer, causing the recursion to see aLocalRelationwith no aliased conditions. The fix reorders the match so that all Filter cases precede the generic Project-peeling cases. The(np: Filter, cp: Filter)case is kept before(np: Filter, cp)to prevent(Filter, Filter)pairs from being handled by the asymmetric propagation path. The(np: Project, cp: Project)case is also moved into the Project group for clarity.Why are the changes needed?
Without this fix, merging three non-grouping subplans where the 2nd and 3rd carry the same filter condition produces two redundant
propagatedFilteraliases with identical expressions, resulting in an unnecessarily larger merged plan.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added
SPARK-56703: Merge three non-grouping subqueries where the third has the same filter condition as the secondtoMergeSubplansSuite.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6