BufferExec and AnalyzeExec should report eager evaluation #22708

@geoffreyclaude

Describe the bug

EvaluationType::Eager is documented as describing an operator whose stream eagerly generates RecordBatch values in one or more spawned Tokio tasks. BufferExec and AnalyzeExec both appear to match that behavior, but their PlanProperties do not report eager evaluation.

In 5c92390:

Similarly:

  • AnalyzeExec::compute_properties(...) calls PlanProperties::new(...), leaving evaluation_type at the default Lazy.
  • AnalyzeExec::execute(...) uses RecordBatchReceiverStream::builder(...) and calls builder.run_input(...) for each input partition.
  • The comments in AnalyzeExec::execute(...) describe those futures as running input partitions in parallel on separate Tokio tasks.

This makes EvaluationType less reliable as a signal for physical-plan analysis. DataFusion already exposes need_data_exchange(plan), implemented as:

plan.properties().evaluation_type == EvaluationType::Eager

so stale or incomplete EvaluationType metadata can cause eager child-polling operators to be missed.
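
To make the consequence concrete, here is a minimal, self-contained sketch of how a predicate of that shape skips operators whose metadata is left at the default. Everything in it is a stand-in written for illustration: the `Operator` struct, the `polls_input_from_spawned_task` field, `missed_operators`, `sample_plan`, and the `HypotheticalEagerOp` / `HypotheticalLazyOp` names are assumptions and are not DataFusion types or APIs. Only the comparison inside `need_data_exchange` mirrors the expression quoted above.

```rust
// Stand-in types for illustration only; these are NOT DataFusion's definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum EvaluationType {
    #[default]
    Lazy,
    Eager,
}

/// Hypothetical operator record: the metadata it reports, plus what its
/// execute path actually does according to this issue.
#[derive(Debug)]
pub struct Operator {
    pub name: &'static str,
    pub evaluation_type: EvaluationType,
    pub polls_input_from_spawned_task: bool,
}

/// Same shape as the predicate quoted above: it only looks at the metadata.
pub fn need_data_exchange(op: &Operator) -> bool {
    op.evaluation_type == EvaluationType::Eager
}

/// Names of operators that poll their input from spawned tasks but are not
/// flagged by the metadata-only predicate.
pub fn missed_operators(plan: &[Operator]) -> Vec<&'static str> {
    plan.iter()
        .filter(|op| op.polls_input_from_spawned_task && !need_data_exchange(op))
        .map(|op| op.name)
        .collect()
}

/// A toy plan: BufferExec and AnalyzeExec carry the default (Lazy) metadata,
/// as described in this issue.
pub fn sample_plan() -> Vec<Operator> {
    vec![
        Operator {
            name: "HypotheticalEagerOp",
            evaluation_type: EvaluationType::Eager,
            polls_input_from_spawned_task: true,
        },
        Operator {
            name: "BufferExec",
            evaluation_type: EvaluationType::default(),
            polls_input_from_spawned_task: true,
        },
        Operator {
            name: "AnalyzeExec",
            evaluation_type: EvaluationType::default(),
            polls_input_from_spawned_task: true,
        },
        Operator {
            name: "HypotheticalLazyOp",
            evaluation_type: EvaluationType::Lazy,
            polls_input_from_spawned_task: false,
        },
    ]
}
```

Calling `missed_operators(&sample_plan())` on this toy plan returns the two operators named in this issue, which is the kind of gap an integration relying on the metadata would not see.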

To Reproduce

Inspect the physical properties of these operators:

  1. Construct a BufferExec over a child whose evaluation_type is Lazy.
  2. Check buffer_exec.properties().evaluation_type.
  3. Construct an AnalyzeExec with a normal input plan.
  4. Check analyze_exec.properties().evaluation_type.

Both report EvaluationType::Lazy in these cases, even though their execute(...) paths drive input polling from spawned tasks.

Expected behavior

If the documented contract for EvaluationType::Eager is intended to identify operators that drive child stream polling from spawned Tokio tasks, then BufferExec and AnalyzeExec should set PlanProperties::with_evaluation_type(EvaluationType::Eager).

BufferExec should probably always report eager evaluation because its buffering stream creates a background task to poll the input stream.

AnalyzeExec should probably report eager evaluation because it runs input partitions through RecordBatchReceiverStream::builder(...).run_input(...).

If this is not the intended meaning of EvaluationType::Eager, then the docs for EvaluationType and/or need_data_exchange(...) should be clarified so callers know which eager child-polling operators are intentionally excluded.
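
If the first reading is the intended one, the change in each operator's compute_properties(...) would be small. The sketch below shows the builder pattern with stand-in types only. The `PlanProperties` struct here is a simplified mock (the real constructor takes arguments that are omitted), and `compute_properties_current` / `compute_properties_proposed` are hypothetical names; the only identifiers taken from this issue are `EvaluationType`, `PlanProperties::new`, and `with_evaluation_type`.

```rust
// Stand-in types for illustration only; these are NOT DataFusion's definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum EvaluationType {
    #[default]
    Lazy,
    Eager,
}

/// Simplified mock: only the field relevant to this issue is modeled.
#[derive(Debug, Default)]
pub struct PlanProperties {
    pub evaluation_type: EvaluationType,
}

impl PlanProperties {
    /// Mock constructor: leaves evaluation_type at its default (Lazy).
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter, mirroring the method named in this issue.
    pub fn with_evaluation_type(mut self, evaluation_type: EvaluationType) -> Self {
        self.evaluation_type = evaluation_type;
        self
    }
}

/// Hypothetical "before": properties built without overriding the default.
pub fn compute_properties_current() -> PlanProperties {
    PlanProperties::new()
}

/// Hypothetical "after": the operator declares that it drives input polling
/// from spawned tasks.
pub fn compute_properties_proposed() -> PlanProperties {
    PlanProperties::new().with_evaluation_type(EvaluationType::Eager)
}
```

With this shape, a metadata-only check such as the need_data_exchange(...) comparison would start flagging the operator without any change to its execute(...) path.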

Additional context

This does not appear to be a query-result correctness issue. It is a physical-plan metadata consistency issue for optimizers and integrations that use DataFusion metadata to reason about execution topology.
