feat: task_manager delegats physical plan creation to execution graph by milenkovicm · Pull Request #1726 · apache/datafusion-ballista

milenkovicm · 2026-05-18T18:17:12Z

Which issue does this PR close?

Closes #.

Rationale for this change

Currently, the Ballista scheduler creates and optimizes the physical plan before delegating it to the execution graph for execution. This makes changing the plan more complicated than it needs to be.

Propagating the logical plan to the execution graph lets the graph transform the logical plan (or the non-optimized physical plan) however it needs to, which simplifies planning rules.
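The following is a minimal, std-only Rust sketch of the new division of responsibility. Every type in it (`LogicalPlan`, `PhysicalPlan`, `ExecutionGraphPlanner`, `DefaultGraph`, `TaskManager`) is a hypothetical stand-in, not the Ballista or DataFusion API. It only shows the task manager forwarding the logical plan unchanged while the graph implementation owns physical planning.

```rust
/// Hypothetical stand-in for a DataFusion logical plan.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    Scan { table: String },
    Filter { predicate: String, input: Box<LogicalPlan> },
}

/// Hypothetical stand-in for a physical plan.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalPlan {
    ScanExec { table: String, partitions: usize },
    FilterExec { predicate: String, input: Box<PhysicalPlan> },
}

/// Hypothetical trait: the execution graph, not the task manager,
/// turns the logical plan into a physical plan.
pub trait ExecutionGraphPlanner {
    fn create_physical_plan(&self, plan: &LogicalPlan) -> Result<PhysicalPlan, String>;
}

/// Hypothetical graph implementation with a fixed partition count.
pub struct DefaultGraph {
    pub partitions: usize,
}

impl ExecutionGraphPlanner for DefaultGraph {
    fn create_physical_plan(&self, plan: &LogicalPlan) -> Result<PhysicalPlan, String> {
        match plan {
            LogicalPlan::Scan { table } => Ok(PhysicalPlan::ScanExec {
                table: table.clone(),
                partitions: self.partitions,
            }),
            // Plan the child first, then wrap it; a real graph could
            // rewrite the logical plan here before planning it.
            LogicalPlan::Filter { predicate, input } => Ok(PhysicalPlan::FilterExec {
                predicate: predicate.clone(),
                input: Box::new(self.create_physical_plan(input)?),
            }),
        }
    }
}

/// Hypothetical task manager: it no longer plans or optimizes,
/// it just hands the logical plan to the graph.
pub struct TaskManager<G: ExecutionGraphPlanner> {
    graph: G,
}

impl<G: ExecutionGraphPlanner> TaskManager<G> {
    pub fn new(graph: G) -> Self {
        Self { graph }
    }

    pub fn submit_job(&self, plan: &LogicalPlan) -> Result<PhysicalPlan, String> {
        self.graph.create_physical_plan(plan)
    }
}
```

With this shape, swapping in a different graph implementation changes planning behaviour without touching the task manager.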

This PR also includes some refactoring of existing code, moving part of it (EXPLAIN handling) into its own method.

What changes are included in this PR?

Changed method parameters across a few methods around the task manager and execution graph creation
Moved EXPLAIN-related code into a separate function
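Here is a similarly hedged sketch of pulling EXPLAIN handling out into its own function. `Plan`, `Submission`, `handle_explain` and `submit` are hypothetical names, and the actual code in this PR may be structured differently. The helper answers EXPLAIN directly and returns `None` for every other plan, so the submission path reduces to a single match.

```rust
/// Hypothetical simplified plan type: only what the EXPLAIN path needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Explain { verbose: bool, input: Box<Plan> },
    Query { sql: String },
}

/// Hypothetical outcome of job submission in this sketch.
#[derive(Debug, PartialEq)]
pub enum Submission {
    /// EXPLAIN is answered directly; no execution graph is built.
    Explained(String),
    /// Any other plan is handed on to the execution graph.
    Scheduled(Plan),
}

/// Hypothetical helper holding the EXPLAIN branch that used to sit inline.
fn handle_explain(plan: &Plan) -> Option<String> {
    match plan {
        Plan::Explain { verbose, input } => {
            let detail = if *verbose { "verbose" } else { "brief" };
            Some(format!("{detail} plan: {input:?}"))
        }
        _ => None,
    }
}

/// Submission path: try the EXPLAIN helper first, otherwise schedule.
pub fn submit(plan: Plan) -> Submission {
    match handle_explain(&plan) {
        Some(text) => Submission::Explained(text),
        None => Submission::Scheduled(plan),
    }
}
```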

Are there any user-facing changes?

Yes, if users implement the execution graph interface.

martin-g · 2026-05-19T09:05:23Z

-                        .filter(|url| url.as_str().starts_with("file:///"))
-                        .collect();
-                    if !local_paths.is_empty() {
-                        // These are local files rather than remote object stores, so we


Is it intentional that this check for local files is no longer performed?

Yes, I don't think this rule gets triggered at all.

martin-g · 2026-05-19T09:12:24Z

-            // optimizing the plan here is redundant because the physical planner will do this again
-            // but it is helpful to see what the optimized plan will be
-            let optimized_plan = session_ctx.state().optimize(plan)?;
-            debug!("Optimized plan: {}", optimized_plan.display_indent());


It would be good to still log the optimized plan for better diagnostics.

The plan was optimized but never used afterwards, so the log does not have much value.

martin-g · 2026-05-19T09:20:44Z

-            if node.output_partitioning().partition_count() == 0 {
-                let empty: Arc<dyn ExecutionPlan> =
-                    Arc::new(EmptyExec::new(node.schema()));


Should this logic be preserved somewhere in the new implementation ?

There should be no nodes with an output partition count of 0.

I tried to reproduce a case for this but couldn't. I also think this is safe.

metegenez

Should this change also affect StaticExecutionGraph? @milenkovicm

milenkovicm · 2026-05-19T14:27:25Z

It could, but I don't want to change it in this PR.

metegenez

That makes sense; overall this makes the ExecutionGraph trait more responsible for producing optimal physical plans. I couldn't find anything that doesn't make sense here. LGTM.

milenkovicm · 2026-05-19T14:42:16Z

Thanks @metegenez & @martin-g.
I plan to keep this PR open until after the Ballista 53 release, which is planned for later.

milenkovicm requested a review from metegenez May 18, 2026 18:17

milenkovicm changed the title ~~feat: task_manager delegating physical plan creation to execution graph~~ feat: task_manager delegats physical plan creation to execution graph May 18, 2026

martin-g reviewed May 19, 2026

address review comments

2d0494d

metegenez reviewed May 19, 2026

metegenez approved these changes May 19, 2026

milenkovicm wants to merge 2 commits into apache:main from milenkovicm:feat_refactor_submit_job