[SPARK-56570][SQL] `PlanMerger` correctness fix and code cleanup by peter-toth · Pull Request #55482 · apache/spark

peter-toth · 2026-04-22T18:00:33Z

What changes were proposed in this pull request?

Remove cachedPlanMapping / cpMapping from PlanMerger by preserving ExprIds when wrapping cached-plan expressions in mergeNamedExpressions.

Previously, wrapping a cached expression in Alias(If(filter, expr, null)) generated a fresh ExprId, making parent nodes that referenced the original attribute by ExprId stale. A cachedPlanMapping field was threaded through TryMergeResult and mergeNamedExpressions to remap those references at every level of tryMergePlans.

This PR eliminates the need for that mapping by preserving the original ExprId. Since the mapping is now always identity, TryMergeResult.cachedPlanMapping and the cachedPlanMapping parameter of mergeNamedExpressions are removed entirely, along with all cpMapping threading throughout tryMergePlans.
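The effect of preserving the ExprId can be illustrated with a small toy model. This is only a sketch: `ExprId`, `NamedExpr`, `Parent`, `wrapFresh` and `wrapPreserving` below are hypothetical stand-ins written for this example, not Spark's Catalyst classes, and the wrapped expression is modelled as a plain string.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
object ExprIdSketch {
  final case class ExprId(id: Long)

  private var nextId = 0L
  def freshExprId(): ExprId = { nextId += 1; ExprId(nextId) }

  // A named output column of the cached plan (expression kept as a string).
  final case class NamedExpr(name: String, expr: String, exprId: ExprId)

  // A parent node that refers to its child's columns by ExprId only.
  final case class Parent(references: Seq[ExprId]) {
    // References that no longer match any column in the child's output.
    def unresolved(childOutput: Seq[NamedExpr]): Seq[ExprId] =
      references.filterNot(r => childOutput.exists(_.exprId == r))
  }

  // Wrapping that allocates a fresh ExprId: parents would need a remapping.
  def wrapFresh(filter: String, ce: NamedExpr): NamedExpr =
    NamedExpr(ce.name, s"if($filter, ${ce.expr}, null)", freshExprId())

  // Wrapping that keeps the original ExprId: parents stay valid as they are.
  def wrapPreserving(filter: String, ce: NamedExpr): NamedExpr =
    ce.copy(expr = s"if($filter, ${ce.expr}, null)")

  val cached: NamedExpr = NamedExpr("s", "sum(a)", freshExprId())
  val parent: Parent = Parent(Seq(cached.exprId))

  val staleRefs: Seq[ExprId] = parent.unresolved(Seq(wrapFresh("f", cached)))
  val validRefs: Seq[ExprId] = parent.unresolved(Seq(wrapPreserving("f", cached)))
}
```

In this model `staleRefs` contains the parent's only reference, while `validRefs` is empty, which is why no cached-side mapping has to be threaded upwards once the id is preserved.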

Additionally, this PR fixes a correctness issue: the cached-expression wrapping loop is now limited to the first cachedPlanExpressions.size entries, so that it no longer accidentally wraps new-plan expressions appended earlier in the same call.

Why are the changes needed?

cachedPlanMapping was purely mechanical bookkeeping to compensate for Alias(...)() generating fresh ExprIds. Preserving the original ExprId makes the mapping a no-op everywhere, so it can be removed. This simplifies TryMergeResult from 5 fields to 4, changes mergeNamedExpressions to return a pair instead of a triple, and removes cpMapping threading from all branches of tryMergePlans.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing MergeSubplansSuite unit tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

### What changes were proposed in this pull request?

Remove `cachedPlanMapping` / `cpMapping` from `PlanMerger` by preserving ExprIds when wrapping cached-plan expressions in `mergeNamedExpressions`.

Previously, wrapping a cached expression in `Alias(If(filter, expr, null))` generated a fresh `ExprId`, making parent nodes that referenced the original attribute by ExprId stale. A `cachedPlanMapping` field was threaded through `TryMergeResult` and `mergeNamedExpressions` to remap those references at every level of `tryMergePlans`.

This PR eliminates the need for that mapping by passing `exprId = ce.toAttribute.exprId` to the wrapping `Alias`, preserving the original `ExprId`. Since the mapping is now always identity, `TryMergeResult.cachedPlanMapping` and the `cachedPlanMapping` parameter of `mergeNamedExpressions` are removed entirely, along with all `cpMapping` threading throughout `tryMergePlans`.

Additional cleanups:
- `mappedCPCondition` locals replaced with direct `cp.condition` references (no remapping needed after cpMapping removal).
- `mappedCPGroupingExpression` assigned directly from `cp.groupingExpressions`.
- Cached-expression wrapping loop limited to `cachedPlanExpressions.size` entries to avoid accidentally wrapping newly-appended new-plan expressions.
- Cp-expression wrapping simplified to a single `case Alias(child, _) if !child.isInstanceOf[Attribute]` match.

### Why are the changes needed?

`cachedPlanMapping` was purely mechanical bookkeeping to compensate for `Alias(...)()` generating fresh ExprIds. Preserving the original ExprId makes the mapping a no-op everywhere, so it can be removed. This simplifies `TryMergeResult` from 5 fields to 4, changes `mergeNamedExpressions` to return a pair instead of a triple, and removes cpMapping threading from all branches of `tryMergePlans`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing `MergeSubplansSuite` unit tests.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

peter-toth · 2026-04-23T08:46:07Z

cc @dongjoon-hyun , this is a minor code simplification after #55298.

…inal cached range

Fix `PlanMerger.mergeNamedExpressions` to wrap only the original cached expressions with the cached plan's filter. The loop previously iterated over all of `mergedExpressions`, including new-plan entries that were appended earlier in the same call and already wrapped with the new plan's filter; re-wrapping them with the cached plan's filter produced double-wrapped `If(cpFilter, If(npFilter, expr, null), null)` expressions, stale `newNPMapping` targets, and analysis failures (missing attribute).

Also tighten the `(np: Filter, cp)` and `(np, cp: Filter)` cases in `tryMergePlans` to match only the structurally reachable child results (`cpFilter`/`npFilter` always `None` because the recursion keeps the non-Filter side unchanged), and drop the associated dead-code appends.

Co-authored-by: Isaac
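The double-wrapping described in this commit message can be reproduced in a toy model. This is a sketch under stated assumptions: `Expr`, `Col`, `IfNotNull` and `wrapCached` are hypothetical names invented for this example and only mimic the shape of `If(filter, expr, null)`; they are not the code in `PlanMerger`.

```scala
// Toy model only: hypothetical types that mimic If(filter, expr, null).
object WrapRangeSketch {
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class IfNotNull(filter: String, child: Expr) extends Expr

  // Wrap the entries at indices [0, until) with the cached plan's filter
  // and leave the remaining entries untouched.
  def wrapCached(merged: Seq[Expr], cpFilter: String, until: Int): Seq[Expr] =
    merged.zipWithIndex.map {
      case (e, i) if i < until => IfNotNull(cpFilter, e)
      case (e, _) => e
    }

  val cachedExprs: Seq[Expr] = Seq(Col("c1"), Col("c2"))

  // A new-plan entry appended earlier in the same call and already wrapped
  // with the new plan's filter.
  val merged: Seq[Expr] = cachedExprs :+ IfNotNull("npFilter", Col("n1"))

  // Iterating over everything re-wraps the appended new-plan entry.
  val buggy: Seq[Expr] = wrapCached(merged, "cpFilter", merged.size)

  // Bounding the loop by the original cached range leaves it alone.
  val fixed: Seq[Expr] = wrapCached(merged, "cpFilter", cachedExprs.size)
}
```

Here `buggy.last` is the nested `IfNotNull("cpFilter", IfNotNull("npFilter", Col("n1")))`, whereas `fixed.last` keeps the single new-plan wrapper and only the first two entries receive the cached plan's filter.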

peter-toth · 2026-04-23T12:41:12Z

@cloud-fan, I've cherry picked the test from #55500 in 31621fe.

dongjoon-hyun

It would be great to fix this before Apache Spark 4.2.0. BTW, the PR title looks a little misleading because this is a bug fix as shown by the test case.

+1, LGTM, @peter-toth .

cloud-fan

LGTM overall — the ExprId-preservation idea is clean, and dropping cpMapping falls out of it consistently. Two small readability nits inline.

(Not re-raising the "title says cleanup but this is also a bug fix" point — @dongjoon-hyun already covered that in his approval.)

cloud-fan · 2026-04-23T16:26:39Z

+              val mappedCPGroupingExpression = cp.groupingExpressions
              // Order of grouping expression does matter as merging different grouping orders can
              // introduce "extra" shuffles/sorts that might not present in all of the original
              // subqueries.
              if (mappedNPGroupingExpression.map(_.canonicalized) ==
                  mappedCPGroupingExpression.map(_.canonicalized)) {
-                val (mergedAggregateExpressions, newNPMapping, newCPMapping) =
-                  mergeNamedExpressions(np.aggregateExpressions, cp.aggregateExpressions, npMapping,
-                    cpMapping)
+                val (mergedAggregateExpressions, newNPMapping) =
+                  mergeNamedExpressions(np.aggregateExpressions, cp.aggregateExpressions, npMapping)
                val mergedPlan =
                  Aggregate(mappedCPGroupingExpression, mergedAggregateExpressions, mergedChild)


After dropping cpMapping, the mapped prefix no longer reflects what this local holds — it's just cp.groupingExpressions. Same for the reference used to build the merged Aggregate.

Suggested change

-              val mappedCPGroupingExpression = cp.groupingExpressions
-              // Order of grouping expression does matter as merging different grouping orders can
-              // introduce "extra" shuffles/sorts that might not present in all of the original
-              // subqueries.
-              if (mappedNPGroupingExpression.map(_.canonicalized) ==
-                  mappedCPGroupingExpression.map(_.canonicalized)) {
-                val (mergedAggregateExpressions, newNPMapping, newCPMapping) =
-                  mergeNamedExpressions(np.aggregateExpressions, cp.aggregateExpressions, npMapping,
-                    cpMapping)
-                val (mergedAggregateExpressions, newNPMapping) =
-                  mergeNamedExpressions(np.aggregateExpressions, cp.aggregateExpressions, npMapping)
-                val mergedPlan =
-                  Aggregate(mappedCPGroupingExpression, mergedAggregateExpressions, mergedChild)
+              val cpGroupingExpressions = cp.groupingExpressions
+              // Order of grouping expression does matter as merging different grouping orders can
+              // introduce "extra" shuffles/sorts that might not present in all of the original
+              // subqueries.
+              if (mappedNPGroupingExpression.map(_.canonicalized) ==
+                  cpGroupingExpressions.map(_.canonicalized)) {
+                val (mergedAggregateExpressions, newNPMapping) =
+                  mergeNamedExpressions(np.aggregateExpressions, cp.aggregateExpressions, npMapping)
+                val mergedPlan =
+                  Aggregate(cpGroupingExpressions, mergedAggregateExpressions, mergedChild)

Thanks, fixed in 7918455.

cloud-fan · 2026-04-23T16:26:39Z

+            mergedExpressions(i) =
+              Alias(If(f, child, Literal(null, child.dataType)), ce.name)(
+                exprId = ce.toAttribute.exprId)


The exprId = ce.toAttribute.exprId argument is the linchpin of this whole refactor — it's what makes cachedPlanMapping redundant — but there's nothing in the surrounding code or comments that flags it. A one-liner like // Preserve the original ExprId so parent references to this cached attribute stay valid without a cp-side remapping. (The new-plan wrapping above uses a fresh ExprId because those aliases are appended rather than replacing an existing entry.) would make the invariant — and the cp/np asymmetry — discoverable for future readers.

Added in 7918455.

peter-toth · 2026-04-23T16:57:10Z

Thank you @dongjoon-hyun and @cloud-fan. Initially I thought this was just a cosmetic issue (double If wrapping), but I have adjusted the PR title and description to reflect the latest findings.

peter-toth · 2026-04-24T07:55:24Z

Thanks for the review again!

Merged to master (4.2.0).

…eanup followup

### What changes were proposed in this pull request?

This is a follow-up to #55482 and contains four bug fixes and two small cleanups in `PlanMerger`:

Bug fixes in `PlanMerger`:

1. Tagged `(Filter, Filter)` reuse preserves `mergedChild`'s appended columns: When the reuse check finds an existing `propagatedFilter` alias, the branch now rebuilds the Filter over `mergedChild` (via `cp.withNewChildren(Seq(mergedChild))`) instead of returning `cp` unchanged. If the recursion extended `cp.child`'s output with new columns (e.g. a computed `d = a + b` from a user Project below the Filter), returning `cp` would drop those columns while `npMapping` still pointed into them, leaving the enclosing `Aggregate` with unresolved references.
2. `(np: Filter, cp)` does not duplicate a `cpFilter` already present in `mergedChild`: `cpFilter`, when set, was produced by a deeper `(np, cp: Filter)` (or `(Join, Join)` pass-through) and is already part of `mergedChild`'s output. Appending it a second time via `++ cpFilter.toSeq` duplicated the attribute in the outer Project's projectList.
3. `(np, cp: Filter)` does not duplicate an `npFilter` already present in `mergedChild`: Symmetric to 2. on the `np` side.
4. `(np, cp: Filter)` with a `MERGED_FILTER_TAG`-tagged `cp` drops the tagged Filter: `cp`'s condition is `OR(pf_0, pf_1, ...)` and `cp`'s aggregate expressions already carry individual `FILTER (WHERE pf_i)` clauses. Synthesising a new `propagatedFilter_X = OR(pf_0, pf_1, ...)` would just add `FILTER AND(OR(...), pf_i)` wrapping upstream (simplifying to `FILTER pf_i`) plus a redundant alias in the Project. The branch now drops `cp`'s Filter and returns `cpFilter = None` so `cp`'s aggregates are left untouched.

Cleanups in `PlanMerger.merge()`:

- Unify the local variable name to `newMergedPlan` across all three branches (was `newMergedPlan` in one and `newMergePlan` in the other two) -- matches the `MergedPlan` case class name.
- Replace `cache(i).merged` with `mp.merged`; `mp` and `cache(i)` are the same object inside the `collectFirst` pattern.

### Why are the changes needed?

Fix 1. is a correctness bug. Fixes 2-4. are plan-shape bugs that produce duplicated attributes or redundant `OR`-of-propagated-filter aliases in the merged plan. The cleanups are minor readability improvements.

### Does this PR introduce _any_ user-facing change?

No. All changes are internal to the optimizer; they produce cleaner merged plans for queries that `MergeSubplans` already handled.

### How was this patch tested?

Four new tests in `MergeSubplansSuite`, one per fix:

- `(np: Filter, cp)` does not duplicate a `cpFilter` already present in mergedChild -- exercises 2. via a `Join` with a `Filter` on the right child, routing a `cpFilter` up through `(Join, Join)` so that `mergedChild.output` already contains the attribute the branch used to re-append.
- `(np, cp: Filter)` does not duplicate an `npFilter` already present in mergedChild -- exercises 3., mirror shape on the `np` side.
- tagged `(Filter, Filter)` reuse must keep mergedChild's appended columns -- exercises 1. with three subqueries (sq1/sq2 create the tagged structure; sq3's Filter sits above a user Project introducing `d = a + b`, so the `(Filter, Filter)` tagged recursion extends `mergedChild` with `d`).
- `(np, cp: Filter)` drops a tagged `cp` Filter without synthesising a redundant alias -- exercises 4. with three subqueries (sq1/sq2 create the tagged structure; sq3 has no filter).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

Closes #55659 from peter-toth/SPARK-56570-planmerger-code-cleanup-followup.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Peter Toth <peter.toth@gmail.com>
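The claim in fix 4. that `FILTER AND(OR(pf_0, pf_1, ...), pf_i)` simplifies to `FILTER pf_i` is the absorption law. A minimal sketch that checks it exhaustively for two propagated filters follows; it assumes SQL-style three-valued logic modelled with `Option[Boolean]` (`None` standing in for NULL), and the helper names `and`, `or` and `absorptionHolds` are hypothetical, not Spark APIs.

```scala
// Toy check only: three-valued logic modelled with Option[Boolean].
object AbsorptionSketch {
  type TV = Option[Boolean] // None models SQL NULL

  def and(a: TV, b: TV): TV = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true)) => Some(true)
    case _ => None
  }

  def or(a: TV, b: TV): TV = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false)) => Some(false)
    case _ => None
  }

  val values: Seq[TV] = Seq(Some(true), Some(false), None)

  // AND(OR(pf0, pf1), pf_i) equals pf_i for every combination of inputs.
  val absorptionHolds: Boolean = values.forall { pf0 =>
    values.forall { pf1 =>
      and(or(pf0, pf1), pf0) == pf0 && and(or(pf0, pf1), pf1) == pf1
    }
  }
}
```

Since the wrapped condition is equivalent to `pf_i` in every case, synthesising the extra `OR` alias would add nothing to the aggregates' existing `FILTER` clauses, which is consistent with the branch simply dropping the tagged Filter.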

…eanup followup

(cherry picked from commit 3ae7da7)
Signed-off-by: Peter Toth <peter.toth@gmail.com>
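The duplication described in fixes 2. and 3. can be sketched with a toy projection list. Everything here is an assumption made for illustration: `Attr`, `projectListAlwaysAppend` and `projectListGuarded` are hypothetical names, and the membership guard is just one way to avoid the duplicate, not necessarily how the follow-up implements it.

```scala
// Toy model only: hypothetical names, not the PlanMerger implementation.
object DedupAppendSketch {
  final case class Attr(name: String, exprId: Long)

  // Shape that always appends the optional filter attribute.
  def projectListAlwaysAppend(childOutput: Seq[Attr], filterAttr: Option[Attr]): Seq[Attr] =
    childOutput ++ filterAttr.toSeq

  // Guarded shape (an assumption): skip the append when the attribute is
  // already part of the child's output.
  def projectListGuarded(childOutput: Seq[Attr], filterAttr: Option[Attr]): Seq[Attr] =
    childOutput ++ filterAttr.filterNot(a => childOutput.exists(_.exprId == a.exprId)).toSeq

  val pf: Attr = Attr("propagatedFilter_0", 10L)

  // The merged child already exposes the filter attribute produced deeper down.
  val mergedChildOutput: Seq[Attr] = Seq(Attr("a", 1L), pf)

  val duplicated: Seq[Attr] = projectListAlwaysAppend(mergedChildOutput, Some(pf))
  val deduplicated: Seq[Attr] = projectListGuarded(mergedChildOutput, Some(pf))
}
```

In this model `duplicated` lists `propagatedFilter_0` twice, while `deduplicated` is identical to `mergedChildOutput`.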

peter-toth force-pushed the SPARK-56570-planMerger-code-cleanup branch from e0d7814 to 11d7f3a Compare April 22, 2026 18:44

peter-toth marked this pull request as ready for review April 23, 2026 08:44

peter-toth mentioned this pull request Apr 23, 2026

[SPARK-40193][SQL][FOLLOWUP] Restrict cached-side If wrapping to original cached range #55500

Closed

dongjoon-hyun approved these changes Apr 23, 2026


cloud-fan approved these changes Apr 23, 2026


address review comments

7918455

peter-toth changed the title ~~[SPARK-56570][SQL] PlanMerger code cleanup~~ [SPARK-56570][SQL] PlanMerger fix and code cleanup Apr 23, 2026

peter-toth changed the title ~~[SPARK-56570][SQL] PlanMerger fix and code cleanup~~ [SPARK-56570][SQL] PlanMerger correctness fix and code cleanup Apr 23, 2026

peter-toth closed this in 174fc60 Apr 24, 2026

peter-toth mentioned this pull request May 3, 2026

[SPARK-56570][SQL][FOLLOWUP] PlanMerger correctness fix and code cleanup followup #55659

Closed

peter-toth wants to merge 3 commits into apache:master from peter-toth:SPARK-56570-planMerger-code-cleanup
