Skip to content

fix(physical-optimizer): make OutputRequirements idempotent#22522

Open
wirybeaver wants to merge 1 commit into
apache:mainfrom
wirybeaver:fix-output-requirements-idempotence
Open

fix(physical-optimizer): make OutputRequirements idempotent#22522
wirybeaver wants to merge 1 commit into
apache:mainfrom
wirybeaver:fix-output-requirements-idempotence

Conversation

@wirybeaver
Copy link
Copy Markdown

Which issue does this PR close?

Related to apache/datafusion-ballista#1359

Rationale

Ballista's Adaptive Query Execution (AQE) planner re-invokes DataFusion's full PhysicalOptimizer chain after every completed stage (AdaptivePlanner::replan_stages). Rules that are not idempotent (rule(rule(x)) != rule(x)) stack execution-plan nodes on each pass.

OutputRequirements::new_add_mode() wraps the plan root with OutputRequirementExec to preserve global ordering/distribution requirements. On a second pass the wrapper's maintains_input_order() == [true] and required_input_ordering() == [None] cause require_top_ordering_helper to recurse through it and produce a second wrapper, yielding OutputRequirementExec(OutputRequirementExec(...)). Each AQE replan adds another layer.

What changes are included in this PR?

  • Guard in require_top_ordering(): if the plan root is already an OutputRequirementExec, return it unchanged. This makes the rule idempotent with zero overhead for single-pass use.
  • Doc-comment update on new_add_mode() and require_top_ordering() documenting the idempotence guarantee.
  • Two tests in tests/physical_optimizer/output_requirements.rs:
    • add_mode_is_idempotent_on_bare_scan — bare ParquetExec (exercises is_changed = false path).
    • add_mode_is_idempotent_on_sorted_planSortExec → ParquetExec (exercises is_changed = true path).

Are these changes tested?

Yes. Two new tests run the rule twice on distinct fixtures and assert structural equality via get_plan_string. Both fail without the fix (double-wrapped OutputRequirementExec) and pass with it.

Are there any user-facing changes?

No. OutputRequirementExec is an internal ancillary node stripped before execution; the idempotence guard only affects re-optimization scenarios (AQE).

🤖 Generated with Claude Code

`OutputRequirements::new_add_mode()` was stacking a new
`OutputRequirementExec` wrapper at the top of the plan on every
invocation. `require_top_ordering_helper` descends through any operator
that maintains input order and has no hard ordering requirement, and
`OutputRequirementExec` itself qualifies — so on a second pass the
helper walked past the existing wrapper, found no SortExec/SPM beneath,
and `require_top_ordering` added a fresh wrapper above the old one.

Fix: at the start of `require_top_ordering`, return the plan unchanged
when it is already topped by an `OutputRequirementExec`. The existing
wrapper was either added by this rule's prior run (in which case it is
already correct) or inserted intentionally by the caller (in which case
we should not disturb it).

Motivation: adaptive execution in datafusion-ballista AQE
(apache/datafusion-ballista#1359) re-runs the entire `PhysicalOptimizer`
chain after every completed stage. Although the chain-level effect is
masked when `OutputRequirements::new_remove_mode()` later strips the
wrapper, the rule itself violates the idempotence property that
`PhysicalOptimizerRule`s are expected to satisfy. Surfaced by the
discovery harness added in a sibling PR.

Adds `datafusion/core/tests/physical_optimizer/output_requirements.rs`
with a focused test that invokes `new_add_mode` twice on a bare parquet
scan and asserts structural equality. Fails before this fix (two stacked
wrappers); passes after.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added optimizer Optimizer rules core Core DataFusion crate labels May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant