fix: InterleaveExec fallback to UnionExec when children partitioning diverges#21827
fix: InterleaveExec fallback to UnionExec when children partitioning diverges#21827zhuqi-lucas wants to merge 1 commit intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR prevents InterleaveExec::with_new_children from failing when optimizer rewrites change children’s output partitioning, by falling back to UnionExec instead of triggering an internal assertion failure. This keeps query execution correct in cases where plans evolve across multiple physical optimizer passes, at the cost of losing the interleave optimization.
Changes:
- Replace the
assert_or_internal_err!(can_interleave(..))inInterleaveExec::with_new_childrenwith a conditional fallback toUnionExec::try_new, and emit a warning when falling back. - Add unit tests covering both the fallback case (partitioning mismatch) and the preservation case (partitioning remains compatible).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…diverges When optimizer rewrites (e.g. join_selection changing join modes, or additional EnforceDistribution passes) modify InterleaveExec children's plan structure, their output partitioning may no longer be consistent. Previously this caused an assertion panic. Now it gracefully falls back to UnionExec — correctness is preserved, only the interleave optimization is lost. Closes apache#21826
f57d747 to
16d1605
Compare
| self: Arc<Self>, | ||
| children: Vec<Arc<dyn ExecutionPlan>>, | ||
| ) -> Result<Arc<dyn ExecutionPlan>> { | ||
| // New children are no longer interleavable, which might be a bug of optimization rewrite. |
There was a problem hiding this comment.
As this comment says, I think another solution is to address it in the optimizer.
This might be the root cause. Once identified, we can decide the best option: either avoiding certain rewrites on InterleaveExec or falling back to UnionExec.
There was a problem hiding this comment.
As this comment says, I think another solution is to address it in the optimizer.
This might be the root cause. Once identified, we can decide the best option: either avoiding certain rewrites on InterleaveExec or falling back to UnionExec.
Good point @jonahgao . In our case the root cause is our custom optimizer chain that runs EnforceDistribution twice — the second pass modifies children that were already organized into an InterleaveExec by the first pass. This is specific to our setup, not a bug in the default DF optimizer chain.
However, the assertion in with_new_children is still fragile — any transform_up based optimizer that changes a child's output partitioning will trigger a panic. The fallback to UnionExec is a defensive fix that preserves correctness regardless of which optimizer causes the divergence. The existing comment in the code already acknowledges this possibility: "New children are no longer interleavable, which might be a bug of optimization rewrite."
Which issue does this PR close?
Closes #21826
Rationale for this change
InterleaveExec::with_new_childrenpanics with an assertion error when children's output partitioning diverges after optimizer rewrites. This happens in practice when multiple physical optimizer passes modify the plan between whenInterleaveExecis created and whenwith_new_childrenis called.We hit this in production with a complex UNION ALL query (5 subqueries with JOINs across partitioned tables, single-file tables, and materialized views). Our custom optimizer chain runs
EnforceDistributiontwice — the first pass createsInterleaveExec, later optimizers modify children's partitioning, then the second pass triggers the assertion panic.What changes are included in this PR?
InterleaveExec::with_new_childrennow gracefully falls back toUnionExecwhen children are no longer interleavable, instead of panicking. A warning is logged when the fallback occurs.Correctness is preserved —
EnforceDistributionwill addRepartitionExecas needed. Only the interleave optimization is lost.Are these changes tested?
Yes, two unit tests added:
test_interleave_fallback_to_union_on_partitioning_mismatch— verifies fallback to UnionExec when one child's partitioning changes from Hash to RoundRobintest_interleave_preserved_when_partitioning_matches— verifies InterleaveExec is preserved when children's Hash partitioning remains consistent