Improve 54.0.0 upgrade guide for ExecutionPlan::apply_expressions#22415

Open
alamb wants to merge 3 commits into apache:main from alamb:alamb/improve_upgrade_guide

Conversation

@alamb
Contributor

@alamb alamb commented May 21, 2026

Which issue does this PR close?

Rationale for this change

As @milenkovicm noted after upgrading Ballista, the upgrade guide for the new required ExecutionPlan::apply_expressions method explains what the method does but not why downstream implementors need to implement it. Without that motivation, it's not obvious whether a no-op Ok(TreeNodeRecursion::Continue) is safe, or what the consequences of skipping expressions are.

What changes are included in this PR?

This PR tightens the section and adds a brief explanation of the purpose of the method (visiting every PhysicalExpr owned by the node so optimizer passes/analyzers — e.g. dynamic filter discovery — can see them).

Are these changes tested?

By CI

Are there any user-facing changes?

Yes — clarifies the upgrade guide for users implementing custom ExecutionPlan, FileSource, or DataSource traits. No API change.

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 21, 2026
### `ExecutionPlan::apply_expressions` is now a required method

`apply_expressions` has been added as a **required** method on the `ExecutionPlan` trait (no default implementation). The same applies to the `FileSource` and `DataSource` traits. Any custom implementation of these traits must now implement `apply_expressions`.
`apply_expressions` is now **required** on the `ExecutionPlan`, `FileSource`,
Contributor Author

I think this context will help readers understand what is needed.

Contributor

I believe it's up to the implementor to decide whether to expose expressions through this interface. To put it differently, an ExecutionPlan implementation can have expressions but not handle this call. Am I correct?

@alamb alamb marked this pull request as ready for review May 21, 2026 11:43
@alamb
Contributor Author

alamb commented May 21, 2026

@LiaCastaneda could you double check the content of this PR?

Contributor

@milenkovicm milenkovicm left a comment

Thanks @alamb, just one comment for clarification.

`apply_expressions` has been added as a **required** method on the `ExecutionPlan` trait (no default implementation). The same applies to the `FileSource` and `DataSource` traits. Any custom implementation of these traits must now implement `apply_expressions`.
`apply_expressions` is now **required** on the `ExecutionPlan`, `FileSource`,
and `DataSource` traits. It visits every `PhysicalExpr` owned by the node so
callers can analyze or rewrite them (e.g. to discover dynamic filters).
Contributor

This is accurate. I think in the end we didn't use it to serialize dynamic filters yet (because we needed map_expressions as well? I haven't read the PR in detail yet). It will also be used to make BufferExec and dynamic filtering work together, by discovering dynamic filters below BufferExec (#21350).

Contributor Author

I think it would be very helpful to explain to end users when they should include their node's expressions in apply_expressions (or, alternatively, when it is safe NOT to include them).

I double-checked the docs on apply_expressions, and they imply to me (though they do not explicitly say) that all of a node's expressions should be included.

/// Apply a closure `f` to each expression (non-recursively) in the current
/// physical plan node. This does not include expressions in any children.
///
/// The closure `f` is applied to expressions in the order they appear in the plan.
/// The closure can return `TreeNodeRecursion::Continue` to continue visiting,
/// `TreeNodeRecursion::Stop` to stop visiting immediately, or `TreeNodeRecursion::Jump`
/// to skip any remaining expressions (though typically all expressions are visited).
///
/// The expressions visited do not necessarily represent or even contribute
/// to the output schema of this node. For example, `FilterExec` visits the
/// filter predicate even though the output of a Filter has the same columns
/// as the input.
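The visiting rules in the doc comment above can be illustrated with a small, self-contained Rust sketch. It does not use DataFusion: `TreeNodeRecursion` below is a hypothetical local copy of the enum's three variants, expressions are plain strings, and `visit_each` is a hypothetical helper. `Jump` is handled the way the quoted doc comment describes it (skip the node's remaining expressions); DataFusion's own iterator helpers may treat `Jump` differently.

```rust
/// Hypothetical local copy of the three `TreeNodeRecursion` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TreeNodeRecursion {
    Continue,
    Jump,
    Stop,
}

/// Hypothetical helper: applies `f` to each of one node's own expressions in
/// order. It does not recurse into child plans or into sub-expressions.
pub fn visit_each<E>(
    exprs: &[E],
    f: &mut dyn FnMut(&E) -> Result<TreeNodeRecursion, String>,
) -> Result<TreeNodeRecursion, String> {
    for expr in exprs {
        match f(expr)? {
            // Move on to the next expression of this node.
            TreeNodeRecursion::Continue => {}
            // Skip this node's remaining expressions; the caller may keep walking.
            TreeNodeRecursion::Jump => return Ok(TreeNodeRecursion::Continue),
            // Stop the whole traversal immediately.
            TreeNodeRecursion::Stop => return Ok(TreeNodeRecursion::Stop),
        }
    }
    Ok(TreeNodeRecursion::Continue)
}
```

Returning `Stop` from the closure on the second expression means the third is never visited, and the `Stop` is propagated to the caller.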

Contributor

Should we add a small note that a no-op implementation can cause silent bugs? Something like: "Returning Continue without calling f is only safe when the node owns no PhysicalExprs -- otherwise dynamic filter discovery (and any expression-walking pass) will silently skip your node."

Contributor

So if I understand correctly: since we do not know which function will be passed as the parameter, and this method could be used for more than one purpose, the safest approach for implementors is to pick one of the three provided patterns based on how many expressions the node has. Is that right?
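A minimal sketch of the three shapes raised in this question (no expressions, one expression, several expressions). Everything in it is hypothetical: `TreeNodeRecursion`, `Expr`, and the `ApplyExpressions` trait are local stand-ins so the block compiles without DataFusion, and the real `ExecutionPlan::apply_expressions` signature may differ. The common rule is the one suggested above: returning `Continue` without calling `f` is only correct for a node that owns no expressions.

```rust
/// Hypothetical local copy of the `TreeNodeRecursion` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TreeNodeRecursion {
    Continue,
    Jump,
    Stop,
}

/// Hypothetical, simplified stand-in for a physical expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
}

pub type VisitResult = Result<TreeNodeRecursion, String>;

/// Hypothetical trait with the same shape as the method under discussion;
/// it is not DataFusion's actual `ExecutionPlan` signature.
pub trait ApplyExpressions {
    fn apply_expressions(&self, f: &mut dyn FnMut(&Expr) -> VisitResult) -> VisitResult;
}

/// Shape 1: the node owns no expressions, so `Continue` without calling `f` is correct.
pub struct EmptyScan;

impl ApplyExpressions for EmptyScan {
    fn apply_expressions(&self, _f: &mut dyn FnMut(&Expr) -> VisitResult) -> VisitResult {
        Ok(TreeNodeRecursion::Continue)
    }
}

/// Shape 2: exactly one expression (e.g. a filter predicate); pass it to `f`
/// and return whatever `f` returns.
pub struct Filter {
    pub predicate: Expr,
}

impl ApplyExpressions for Filter {
    fn apply_expressions(&self, f: &mut dyn FnMut(&Expr) -> VisitResult) -> VisitResult {
        f(&self.predicate)
    }
}

/// Shape 3: several expressions; visit each in order and stop early on `Stop`
/// (`Jump` is treated like `Continue` in this sketch).
pub struct Projection {
    pub exprs: Vec<Expr>,
}

impl ApplyExpressions for Projection {
    fn apply_expressions(&self, f: &mut dyn FnMut(&Expr) -> VisitResult) -> VisitResult {
        for expr in &self.exprs {
            if f(expr)? == TreeNodeRecursion::Stop {
                return Ok(TreeNodeRecursion::Stop);
            }
        }
        Ok(TreeNodeRecursion::Continue)
    }
}

/// Example caller: counts how many expressions a node exposes.
pub fn count_exprs(node: &dyn ApplyExpressions) -> usize {
    let mut n = 0;
    node.apply_expressions(&mut |_e: &Expr| {
        n += 1;
        Ok(TreeNodeRecursion::Continue)
    })
    .expect("this closure never returns an error");
    n
}
```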

Contributor Author

Yes -- that is a good idea. I will add that

Contributor

I wonder whether a function that mutates internal expressions could break operator encapsulation, but that is a totally different discussion.

Thanks @alamb for adding this

Contributor

@LiaCastaneda LiaCastaneda left a comment

thanks for looking into this @alamb

@alamb
Contributor Author

alamb commented May 21, 2026

> I believe it's up to the implementor to decide whether to expose expressions through this interface. To put it differently, an ExecutionPlan implementation can have expressions but not handle this call. Am I correct?

I am not sure @milenkovicm -- maybe @LiaCastaneda or @adriangb (who approved the original PR) can help us add some more context about what is required to be returned and what the implications of not returning expressions are

@milenkovicm
Contributor

> I believe it's up to the implementor to decide whether to expose expressions through this interface. To put it differently, an ExecutionPlan implementation can have expressions but not handle this call. Am I correct?

> I am not sure @milenkovicm -- maybe @LiaCastaneda or @adriangb (who approved the original PR) can help us add some more context about what is required to be returned and what the implications of not returning expressions are

This would be great to get clarified in the upgrade manual.

I've updated Ballista to 54, left the method implementation as todo!(), and no tests panicked.

@LiaCastaneda
Contributor

LiaCastaneda commented May 21, 2026

Right now nothing in DF actually uses apply_expressions so it does not fail. The original intention was to use it for serializing dynamic filters, but #22011 ended up going with typed getters instead. I suspect it'll get used for making Dynamic Filters and BufferExec work in #21350, and there's also interest in a read/write pair of expressions on the plan (#20009).
So as of today, returning Ok(Continue) and skipping all expressions is actually safe. But I don't think implementing it partially (visiting some expressions but not others) is safe. The API is intended to be generic (not only for dynamic filters), so even if your custom node doesn't have dynamic filters, you should still visit every PhysicalExpr it owns, since any future caller can assume you've visited everything you own.

@milenkovicm
Contributor

> Right now nothing in DF actually uses apply_expressions so it does not fail. The original intention was to use it for serializing dynamic filters, but #22011 ended up going with typed getters instead. I suspect it'll get used for making Dynamic Filters and BufferExec work in #21350, and there's also interest in a read/write pair of expressions on the plan (#20009).

If so, wouldn't it be safer to add a new trait that only the operators which actually allow introspection of their expressions must implement? Currently we have a method whose possible uses we can't really explain, and we're forcing everybody to implement it.

If nothing is using it, do we want to push everybody to implement it right now?

> So as of today, returning Ok(Continue) and skipping all expressions is actually safe. But I don't think implementing it partially (visiting some expressions but not others) is safe since the API is intended to be generic (the intention is not only for dynamic filters) and any future caller can assume you've visited everything you own.

Generic is never great in a public API; I'm speaking from experience.

@alamb would you reconsider adding a default implementation just for now, until we're sure it's going to be useful? We can always force users to implement it in the next release cycle once the design settles down (or remove it, with minor impact).

@LiaCastaneda
Contributor

Also, I'm not against reverting and adding it back when one of these PRs gets merged. When we landed #20337 we thought it was the right shape for serializing dynamic filters, but we used a different approach.

@adriangb
Contributor

We intend to use the method, but the work has been blocked on other things. I understand why releasing a new public method that forces all implementers to make a code change with no immediate benefit is frustrating. @LiaCastaneda, if we are not using this method yet, what do you think of reverting / removing the API and re-adding it when we have the concrete use cases ready to merge?

FWIW this API mirrors LogicalPlan exactly:

impl LogicalPlan {
/// Calls `f` on all expressions in the current `LogicalPlan` node.
///
/// # Notes
/// * Similar to [`TreeNode::apply`] but for this node's expressions.
/// * Does not include expressions in input `LogicalPlan` nodes
/// * Visits only the top level expressions (Does not recurse into each expression)
pub fn apply_expressions<F: FnMut(&Expr) -> Result<TreeNodeRecursion>>(

It's a reasonable generic method to have, there are several use cases (that like I said above got blocked, e.g. #20009 and #21350) that wanted it. It's generic to the extent that we have generic concepts for expressions and plan nodes, we did not add extra genericness for some abstract notion.

@adriangb
Contributor

@LiaCastaneda I don't think serializing dynamic filters in protobuf was ever the intention. The intention w.r.t. serializing dynamic filters was being able to find all of the ones in a plan / plan subtree, hook into their update() method, and broadcast updates across the network. That is something that is not possible without this API.
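The use case described here (find every dynamic filter in a plan or plan subtree) can be sketched as an ordinary tree walk that calls the per-node, non-recursive visitor on every node. This is a hypothetical illustration: `PlanNode`, `Expr::DynamicFilter`, and `find_dynamic_filters` are local stand-ins, not DataFusion types, and hooking `update()` or broadcasting over the network is out of scope. A node whose visitor skips an expression it owns is invisible to such a walk.

```rust
/// Hypothetical local copy of the `TreeNodeRecursion` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TreeNodeRecursion {
    Continue,
    Jump,
    Stop,
}

/// Hypothetical expression type; `DynamicFilter` stands in for a dynamic filter expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    DynamicFilter(u32),
}

/// Hypothetical plan node: its own expressions plus its child plans.
pub struct PlanNode {
    pub exprs: Vec<Expr>,
    pub children: Vec<PlanNode>,
}

impl PlanNode {
    /// Non-recursive: visits only this node's own expressions, never its children.
    pub fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&Expr) -> Result<TreeNodeRecursion, String>,
    ) -> Result<TreeNodeRecursion, String> {
        for expr in &self.exprs {
            if f(expr)? == TreeNodeRecursion::Stop {
                return Ok(TreeNodeRecursion::Stop);
            }
        }
        Ok(TreeNodeRecursion::Continue)
    }
}

/// Pre-order walk over the subtree that asks every node for its own
/// expressions and collects the ids of all dynamic filters it finds.
pub fn find_dynamic_filters(root: &PlanNode) -> Result<Vec<u32>, String> {
    let mut found = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        node.apply_expressions(&mut |e: &Expr| {
            if let Expr::DynamicFilter(id) = e {
                found.push(*id);
            }
            Ok(TreeNodeRecursion::Continue)
        })?;
        // Children are handled by this outer walk, not by `apply_expressions`.
        // Pushing them in reverse keeps a left-to-right pre-order.
        stack.extend(node.children.iter().rev());
    }
    Ok(found)
}
```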

@LiaCastaneda
Contributor

I think #20009 also needs the write version (map_expressions); unfortunately I haven't had time to get back to it yet. We can revert and treat the apply_expressions PR as a dependency of whichever PR (#21350, #20899, etc.) actually needs it, so both ship in the same release.

@milenkovicm
Contributor

@adriangb, would it make sense to add a default implementation for now? We can always make it mandatory to implement later, once other things land. WDYT?

@adriangb
Contributor

adriangb commented May 21, 2026

If we made the default implementation an error then yes I think that's a viable alternative to reverting. The nice thing about reverting is that we avoid adding an unused public API to DF 54. So I think reverting would be better, but I appreciate the thought on how we can minimize the code churn 😄

@milenkovicm
Contributor

Would Ok(TreeNodeRecursion::Continue) be OK as a no-op default implementation, rather than returning an error?

@adriangb
Contributor

I share the same concern as @alamb that it would not be safe
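The two default-implementation options raised in this exchange (an erroring default versus a no-op `Ok(TreeNodeRecursion::Continue)` default) behave differently for a node that owns expressions but never overrides the method. The sketch below is hypothetical: `ErrorDefault`, `NoopDefault`, `MyFilter`, and the two collector functions are local stand-ins, not DataFusion APIs, and expressions are plain strings.

```rust
/// Hypothetical local copy of the `TreeNodeRecursion` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TreeNodeRecursion {
    Continue,
    Jump,
    Stop,
}

pub type VisitResult = Result<TreeNodeRecursion, String>;

/// Option A (hypothetical): the default returns an error, so a caller
/// finds out that a node never implemented the method.
pub trait ErrorDefault {
    fn apply_expressions(&self, _f: &mut dyn FnMut(&str) -> VisitResult) -> VisitResult {
        Err("apply_expressions is not implemented for this node".to_string())
    }
}

/// Option B (hypothetical): the default is a no-op, which claims
/// "this node owns no expressions".
pub trait NoopDefault {
    fn apply_expressions(&self, _f: &mut dyn FnMut(&str) -> VisitResult) -> VisitResult {
        Ok(TreeNodeRecursion::Continue)
    }
}

/// A custom node that owns a predicate (a string here for brevity)
/// but relies on both defaults instead of exposing it.
pub struct MyFilter {
    pub predicate: String,
}

impl ErrorDefault for MyFilter {}
impl NoopDefault for MyFilter {}

/// Example caller under option A: collects the node's expressions.
pub fn collect_with_error_default(node: &dyn ErrorDefault) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    node.apply_expressions(&mut |e: &str| {
        out.push(e.to_string());
        Ok(TreeNodeRecursion::Continue)
    })?;
    Ok(out)
}

/// Example caller under option B: same logic, different default.
pub fn collect_with_noop_default(node: &dyn NoopDefault) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    node.apply_expressions(&mut |e: &str| {
        out.push(e.to_string());
        Ok(TreeNodeRecursion::Continue)
    })?;
    Ok(out)
}
```

Under option B the caller gets `Ok` with no expressions even though `MyFilter` has a predicate, which is the silent failure the thread is worried about; option A at least surfaces an error.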

@LiaCastaneda
Contributor

> @LiaCastaneda I don't think serializing dynamic filters in protobuf was ever the intention. The intention w.r.t. serializing dynamic filters was being able to find all of the ones in a plan / plan subtree, hook into their update() method, and broadcast updates across the network. That is something that is not possible without this API.

Ah, I thought it was mainly for dynamic filter discovery in the serialization process (sorry, I'm a bit rusty on this). The runtime discovery + update & broadcast use case makes sense. I don't have a strong opinion about keeping it or not -- datafusion-distributed (or datafusion) will need it, but it's a different project and we haven't started the networking work yet (epic: datafusion-contrib/datafusion-distributed#180).

@milenkovicm
Contributor

#22415 (comment)

> Right now nothing in DF actually uses apply_expressions so it does not fail.

In version 54 it is not used at all, as @LiaCastaneda pointed out.

> So as of today, returning Ok(Continue) and skipping all expressions is actually safe.

So implementors who return Ok(Continue) are OK?

It's fine with me; I just want to understand the consequences of implementing Ok(Continue).

@adriangb
Contributor

adriangb commented May 21, 2026

In theory we could add an optimizer rule or other use in DataFusion at any moment which would make Ok(Continue) go from okay to some sort of not okay, and it’s hard to predict if that would be an error or some sort of silent corruption / failure.

@milenkovicm
Contributor

That's the main point I'm trying to get across: it's a bit undefined what's expected from this method. Please don't get me wrong, I'm not trying to argue that this is not useful.
I'm a bit confused about how to handle the DF 54 upgrade, and when it is safe to ignore this method.

If Ok(Continue) is nondeterministic, is it the right return type?

> I share the same concern as @alamb that it would not be safe

Can we mention in the upgrade guide in which cases this method is safe or unsafe to ignore?

@alamb
Contributor Author

alamb commented May 21, 2026

> I'm a bit confused how to handle df.54 upgrade, when it is safe to ignore it.

I am also pretty confused.

If we tell people "it is OK to leave Ok()" in their implementation (or add a default implementation), aren't we leaving a time bomb for them if we do start using apply_expressions in the future?

If we aren't using apply_expressions, I think we should remove it (and we can add it back when we are actually going to use it). I can make a PR to do so.

@adriangb
Contributor

It's not that Ok(Continue) is undefined. It just means "this node contains no expressions". If the node contains expressions, it is the wrong implementation. But of course, if nothing is calling this implementation, there will be no failures. If we did implement something that called this method, what fails and how would depend on that implementation: if it calls apply_expressions and then downcasts to a specific expression type, but the node had none of those expressions, that won't cause an issue. But if it is, e.g., looking for all Column references, and the expression has columns but does not declare them via this API, it's a ticking time bomb.
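The two kinds of caller described here can be made concrete with a hypothetical sketch (local stand-in types, not DataFusion's API; pattern matching stands in for downcasting to a concrete `PhysicalExpr` type). `MyFilter` owns the predicate `a > 5`, and its `declare` flag switches between visiting it and a no-op `Ok(Continue)`. A caller looking for dynamic filters gets the same answer either way because the predicate contains none, while a caller collecting column references silently gets an empty list from the no-op version.

```rust
/// Hypothetical local copy of the `TreeNodeRecursion` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TreeNodeRecursion {
    Continue,
    Jump,
    Stop,
}

/// Hypothetical expression tree (stand-in for `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    DynamicFilter(u32),
}

pub type VisitResult = Result<TreeNodeRecursion, String>;

/// A custom node owning one predicate; `declare == false` models a no-op
/// `apply_expressions` that hides the predicate from callers.
pub struct MyFilter {
    pub predicate: Expr,
    pub declare: bool,
}

impl MyFilter {
    pub fn apply_expressions(&self, f: &mut dyn FnMut(&Expr) -> VisitResult) -> VisitResult {
        if self.declare {
            f(&self.predicate)
        } else {
            Ok(TreeNodeRecursion::Continue)
        }
    }
}

/// Recursively gathers column names inside one expression.
fn columns_in(e: &Expr, out: &mut Vec<String>) {
    match e {
        Expr::Column(name) => out.push(name.clone()),
        Expr::Gt(l, r) => {
            columns_in(l, out);
            columns_in(r, out);
        }
        Expr::Literal(_) | Expr::DynamicFilter(_) => {}
    }
}

/// Caller 1: collects every column the node's expressions reference.
pub fn referenced_columns(node: &MyFilter) -> Vec<String> {
    let mut cols = Vec::new();
    node.apply_expressions(&mut |e: &Expr| {
        columns_in(e, &mut cols);
        Ok(TreeNodeRecursion::Continue)
    })
    .expect("this closure never returns an error");
    cols
}

/// Caller 2: looks only for top-level dynamic filters and stops at the first one.
pub fn has_dynamic_filter(node: &MyFilter) -> bool {
    let mut found = false;
    node.apply_expressions(&mut |e: &Expr| {
        if matches!(e, Expr::DynamicFilter(_)) {
            found = true;
            return Ok(TreeNodeRecursion::Stop);
        }
        Ok(TreeNodeRecursion::Continue)
    })
    .expect("this closure never returns an error");
    found
}
```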

@adriangb
Contributor

adriangb commented May 21, 2026

> if we tell people "It is ok to leave Ok()" in your implementation (or add a default implementation) then aren't we leaving a time bomb for them if we do start using apply_expressions in the future?

Yes, agreed; that is why the plan was never to tell people "it is OK to leave Ok()", and why the implementation forces users to implement it. @milenkovicm suggested leaving Ok() (contradicting the current implementation and documentation), and I've been trying to help clarify why that would not cause issues today but would cause issues in the future as soon as this method starts being used. I did not mean to suggest that all implementers should do that.

@adriangb
Contributor

> If we aren't using apply_expressions, I think we should remove it (and we can add it back in when we are actually going to use it). I can make a PR to do so

It sounds like this is the conclusion for now. If you open the PR, ping me and I can review it.

@alamb
Contributor Author

alamb commented May 21, 2026

✍️

@milenkovicm
Contributor

> yes agreed hence why the plan was never to tell people "it is ok to leave Ok()" and the implementation forces users to implement it.
> @milenkovicm suggested leaving Ok() (contradicting the current implementation and documentation) and I've been trying to help clarify why that would not cause issues today but would cause issues in the future as soon as this method starts being used, I did not mean to suggest all implementers should do that.

The point I was trying to make: if it is not used at the moment, a default implementation in the trait that returns some value, even a wrong one, is harmless. That way the method remains available for experiments, and users who need it can implement/override it themselves, without forcing everyone else to implement it just yet. Once it starts to be useful (DF 55 or later), the default implementation can be removed to force users to implement it, or the method can be quietly removed without much impact if it proves not to be that useful.

Anyway, it's your call.

@adriangb
Contributor

I think the point of "add a default implementation, even if unsafe so we can experiment and remove it once we plan to start using the method" is a really neat idea, I had not thought about it along those lines. It is a bit dangerous (someone could start using the method ignoring any comments or warnings) but pragmatically very useful. Let's keep it in mind when we add this back.

@alamb
Contributor Author

alamb commented May 21, 2026

> I think the point of "add a default implementation, even if unsafe so we can experiment and remove it once we plan to start using the method" is a really neat idea, I had not thought about it along those lines. It is a bit dangerous (someone could start using the method ignoring any comments or warnings) but pragmatically very useful. Let's keep it in mind when we add this back.

I agree, thank you for the idea @milenkovicm.

I still prefer that we remove the unused code, as I think that will avoid potential confusion (even if we added a default impl).

The revert PR is here

Labels

documentation Improvements or additions to documentation

4 participants