-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Improve 54.0.0 upgrade guide for ExecutionPlan::apply_expressions #22415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -171,7 +171,9 @@ where string types are preferred (`UNION`, `CASE THEN/ELSE`, `NVL2`). | |||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| ### `ExecutionPlan::apply_expressions` is now a required method | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| `apply_expressions` has been added as a **required** method on the `ExecutionPlan` trait (no default implementation). The same applies to the `FileSource` and `DataSource` traits. Any custom implementation of these traits must now implement `apply_expressions`. | ||||||||||||||||||||||||||||
| `apply_expressions` is now **required** on the `ExecutionPlan`, `FileSource`, | ||||||||||||||||||||||||||||
| and `DataSource` traits. It visits every `PhysicalExpr` owned by the node so | ||||||||||||||||||||||||||||
| callers can analyze or rewrite them (e.g. to discover dynamic filters). | ||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is accurate, I think at the end we didn't use it to serialize dynamic filters yet (bc we needed map_expressions as well? I haven't read the PR in detail yet) and it will also be used to make BufferExec and Dyn Filtering work together (by discovering dynamic filters below
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be very helpful if we can explain to an end user when they should include their nodes expressions in I double checked the docs on apply_expressions and it implies to me (though does not explicitly say) that all the expressions in a node should be included. datafusion/datafusion/physical-plan/src/execution_plan.rs Lines 205 to 217 in dd8760d
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we add a small note that a no-op implementation can cause silent bugs? Something like: "Returning
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So if I get this correctly, as we do not know which function is going to be used as a parameter, the safest approach, for implementors, is to pick one of three provided patterns, based on the number of expressions the plan has, as this function could be used to do more than one thing. Do i get this correctly ?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes -- that is a good idea. I will add that
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if a function mutating internal expression could break operator encapsulation, but this is totally different discussion. Thanks @alamb for adding this |
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| **Who is affected:** | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
@@ -180,7 +182,7 @@ where string types are preferred (`UNION`, `CASE THEN/ELSE`, `NVL2`). | |||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| **Migration guide:** | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| Add `apply_expressions` to your implementation. Call `f` on each top-level `PhysicalExpr` your node owns, using `visit_sibling` to correctly propagate `TreeNodeRecursion`: | ||||||||||||||||||||||||||||
| Call `f` on each top-level `PhysicalExpr` your node owns, using `visit_sibling` to propagate `TreeNodeRecursion` when iterating: | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| **Node with no expressions:** | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
@@ -219,7 +221,7 @@ fn apply_expressions( | |||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| **Node whose only expressions are in `output_ordering()` (e.g. a synthetic test node with no owned expression fields):** | ||||||||||||||||||||||||||||
| **Node whose expressions live in `output_ordering()`:** | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| ```rust,ignore | ||||||||||||||||||||||||||||
| fn apply_expressions( | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this context will help understand what is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe its up to implementor to decide if it exposes expressions through this interfaces, or to put it differently,
ExecutionPlanimplementation can have expressions but not handle this call, I'm i correct ?