Skip to content

Conversation

@geoffreyclaude
Copy link
Contributor

Which issue does this PR close?

This PR fixes an oversight in #16469 that broke wrapper nodes in recursive queries.

When with_new_state() was added as a generic state injection mechanism, reset_plan_states() kept a concrete type check for WorkTableExec. This means wrapper nodes that delegate as_any() to their inner node won't be recognized, causing them to keep stale state across iterations. This breaks recursive queries for external crates that wrap execution plans (like tracing or monitoring tools).

Rationale for this change

The reset_plan_states() function uses plan.as_any().is::<WorkTableExec>() to decide which nodes to skip resetting. This works fine for bare WorkTableExec but fails when it's wrapped by custom nodes as the type check can't see through the wrapper.

This defeats the whole point of with_new_state(), which was designed to let wrapper and third-party nodes participate in recursive queries without concrete type checks.

The special case was added to save one Arc::clone() per iteration, but it's not worth it since WorkTableExec::with_new_children() already just returns Arc::clone(&self) anyway. The shared WorkTable state is preserved through the Arc<WorkTable> reference, so the optimization doesn't buy us much while breaking wrapper nodes.

What changes are included in this PR?

  • Remove the WorkTableExec type check in reset_plan_states() - just reset all nodes uniformly via reset_state().
  • Simplify the function from 9 lines to 4 lines.
  • The shared WorkTable state stays correct because WorkTableExec::with_new_children() returns Arc::clone(&self).

Are these changes tested?

Yes, covered by existing tests:

  • All recursive query tests pass, so backward compatibility is maintained.
  • The behavior for bare WorkTableExec is identical.
  • This fixes wrapper nodes that were broken before.

Since this restores functionality for wrapper nodes rather than changing core DataFusion behavior, it fixes test failures in external crates (like datafusion-tracing) without needing new tests here.

Are there any user-facing changes?

No Breaking Changes:

  • Everything that worked before still works.
  • This is purely a bug fix.

Benefits:

  • External crates can now wrap WorkTableExec and implement with_new_state() without breaking recursive queries.
  • Tracing, monitoring, and instrumentation wrappers now work correctly with recursive queries.
  • Restores the extensibility that fix: make with_new_state a trait method for ExecutionPlan #16469 was designed to provide.

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Nov 18, 2025
Remove the concrete type check in `reset_plan_states` that prevented
`WorkTableExec` from being reset, enabling wrapper and custom nodes to
participate correctly in recursive query execution.

Motivation:
Commit 921f4a0 introduced `with_new_state()` as a generic mechanism for
injecting run-time state into execution plans, enabling any custom node
to participate in recursive query execution. However, `reset_plan_states()`
retained a concrete type check `plan.as_any().is::<WorkTableExec>()` that
only recognized unwrapped `WorkTableExec` instances. This prevented
wrapper or third-party nodes from being properly reset across iterations,
undermining the extensibility goals of commit 921f4a0.

Changes:
- Remove the special case for `WorkTableExec` in `reset_plan_states()`,
  resetting all nodes uniformly via `reset_state()`.
- Simplify the function from 9 lines to 4 lines, eliminating the type check
  and conditional logic.
- Preserve `WorkTable` state correctly through the `Arc<WorkTable>`
  reference established by `with_new_state()` during initialization.
- Rely on `WorkTableExec::with_new_children()` returning `Arc::clone(&self)`
  to avoid unnecessary recreation of the underlying plan.

This change restores full support for wrapper nodes in recursive query
execution, allowing external crates and custom execution plans to wrap
`WorkTableExec` without breaking state management. The removed special
case was a micro-optimization that saved one `Arc::clone()` per iteration
but compromised correctness for wrapper nodes.
@geoffreyclaude geoffreyclaude marked this pull request as ready for review November 18, 2025 13:23
Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 If tests pass LGTM

@gabotechs gabotechs added this pull request to the merge queue Nov 19, 2025
Merged via the queue into apache:main with commit 373b0ad Nov 19, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants