Describe the bug
EvaluationType::Eager is documented as an operator stream that eagerly generates RecordBatch values in one or more spawned Tokio tasks. BufferExec and AnalyzeExec both appear to match that behavior, but their PlanProperties do not report eager evaluation.
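In 5c92390:
- BufferExec::new(...) clones the input properties and only changes SchedulingType to Cooperative, so it keeps the input evaluation type.
- BufferExec::execute(...) wraps the input stream in MemoryBufferedStream::new(...).
- MemoryBufferedStream::new(...) immediately creates a SpawnedTask that polls the input stream into an internal queue.
Similarly: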
- AnalyzeExec::compute_properties(...) calls PlanProperties::new(...), leaving evaluation_type at the default Lazy.
- AnalyzeExec::execute(...) uses RecordBatchReceiverStream::builder(...) and calls builder.run_input(...) for each input partition.
- The comments in AnalyzeExec::execute(...) describe those futures as running input partitions in parallel on separate Tokio tasks.
This makes EvaluationType less reliable for physical-plan analysis. DataFusion already exposes need_data_exchange(plan), implemented as:
plan.properties().evaluation_type == EvaluationType::Eager
so stale or incomplete EvaluationType metadata can cause eager child-polling operators to be missed.
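To make the failure mode concrete, here is a minimal, self-contained sketch of how a metadata-only check can miss such operators. The `Node` type, the `spawns_task` flag, `missed_operators`, and `sample_plan` are hypothetical stand-ins written for this illustration, not DataFusion types or APIs. Only the comparison inside `need_data_exchange` mirrors the expression quoted above, and the "EagerOp" node is a made-up operator that reports its behavior correctly.

```rust
// Simplified stand-ins for illustration only; these are NOT DataFusion's
// real definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvaluationType {
    Lazy,
    Eager,
}

/// Hypothetical stripped-down plan node: the evaluation type it reports,
/// whether its execute path actually spawns a task to poll its input,
/// and its children.
struct Node {
    name: &'static str,
    reported: EvaluationType,
    spawns_task: bool,
    children: Vec<Node>,
}

/// Mirrors the comparison quoted above, applied to the stand-in node.
fn need_data_exchange(node: &Node) -> bool {
    node.reported == EvaluationType::Eager
}

/// Walks the plan top-down and collects operators that spawn tasks but are
/// not flagged by the metadata check.
fn missed_operators(node: &Node, out: &mut Vec<&'static str>) {
    if node.spawns_task && !need_data_exchange(node) {
        out.push(node.name);
    }
    for child in &node.children {
        missed_operators(child, out);
    }
}

/// Assumed plan shape: AnalyzeExec -> BufferExec -> EagerOp -> Scan.
/// The first two report Lazy, as described in this issue.
fn sample_plan() -> Node {
    let scan = Node {
        name: "Scan",
        reported: EvaluationType::Lazy,
        spawns_task: false,
        children: vec![],
    };
    let eager_op = Node {
        name: "EagerOp",
        reported: EvaluationType::Eager,
        spawns_task: true,
        children: vec![scan],
    };
    let buffer = Node {
        name: "BufferExec",
        reported: EvaluationType::Lazy,
        spawns_task: true,
        children: vec![eager_op],
    };
    Node {
        name: "AnalyzeExec",
        reported: EvaluationType::Lazy,
        spawns_task: true,
        children: vec![buffer],
    }
}

fn main() {
    let plan = sample_plan();
    let mut missed = Vec::new();
    missed_operators(&plan, &mut missed);
    println!("operators missed by the metadata check: {:?}", missed);
}
```

In this model the check flags only the operator that reports Eager, so the two operators that report Lazy while spawning tasks go unnoticed.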
To Reproduce
Inspect the physical properties of these operators:
- Construct a BufferExec over a child whose evaluation_type is Lazy.
- Check buffer_exec.properties().evaluation_type.
- Construct an AnalyzeExec with a normal input plan.
- Check analyze_exec.properties().evaluation_type.
Both report EvaluationType::Lazy in these cases, even though their execute(...) paths drive input polling from spawned tasks.
Expected behavior
If the documented contract for EvaluationType::Eager is intended to identify operators that drive child stream polling from spawned Tokio tasks, then BufferExec and AnalyzeExec should set PlanProperties::with_evaluation_type(EvaluationType::Eager).
BufferExec should probably always report eager evaluation because its buffering stream creates a background task to poll the input stream.
AnalyzeExec should probably report eager evaluation because it runs input partitions through RecordBatchReceiverStream::builder(...).run_input(...).
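As an analogy for why a background-task buffer counts as eager, the sketch below uses a plain std thread and a bounded channel in place of a Tokio task and DataFusion's internal queue. The function `eager_buffer_demo` and its counter are hypothetical names for this illustration. The point is only that the worker drains the whole input before the consumer has read a single item.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;

/// Returns (items pulled from the input before the consumer read anything,
/// items the consumer received afterwards).
fn eager_buffer_demo(n: usize) -> (usize, usize) {
    let pulled = Arc::new(AtomicUsize::new(0));
    let pulled_in_worker = Arc::clone(&pulled);

    // Bounded queue with room for every item, so the worker never blocks.
    let (tx, rx) = sync_channel::<usize>(n);

    // Background worker: drives the "input" (the range 0..n) on its own,
    // without waiting for the consumer to ask for data.
    let worker = thread::spawn(move || {
        for item in 0..n {
            pulled_in_worker.fetch_add(1, Ordering::SeqCst);
            tx.send(item).expect("receiver is still alive");
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Wait for the worker to finish before the consumer reads anything.
    worker.join().expect("worker thread panicked");
    let pulled_before_read = pulled.load(Ordering::SeqCst);

    // Only now does the consumer start draining the queue.
    let received = rx.iter().count();
    (pulled_before_read, received)
}

fn main() {
    let (pulled_before_read, received) = eager_buffer_demo(5);
    println!(
        "pulled before first read: {}, received: {}",
        pulled_before_read, received
    );
}
```

A lazy operator in the same setup would pull nothing until the consumer asked for an item. That difference in who drives the polling is what this issue expects the metadata to reflect.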
If this is not the intended meaning of EvaluationType::Eager, then the docs for EvaluationType and/or need_data_exchange(...) should be clarified so callers know which eager child-polling operators are intentionally excluded.
Additional context
This does not appear to be a query-result correctness issue. It is a physical-plan metadata consistency issue for optimizers and integrations that use DataFusion metadata to reason about execution topology.