Closed
8 changes: 5 additions & 3 deletions docs/source/library-user-guide/upgrading/54.0.0.md
@@ -171,7 +171,9 @@ where string types are preferred (`UNION`, `CASE THEN/ELSE`, `NVL2`).

### `ExecutionPlan::apply_expressions` is now a required method

`apply_expressions` has been added as a **required** method on the `ExecutionPlan` trait (no default implementation). The same applies to the `FileSource` and `DataSource` traits. Any custom implementation of these traits must now implement `apply_expressions`.
`apply_expressions` is now **required** on the `ExecutionPlan`, `FileSource`,
Contributor Author

I think this context will help readers understand what is needed.

Contributor

I believe it's up to the implementor to decide whether it exposes expressions through this interface. Put differently, an `ExecutionPlan` implementation can have expressions but not handle this call. Am I correct?

and `DataSource` traits. It visits every `PhysicalExpr` owned by the node so
callers can analyze or rewrite them (e.g. to discover dynamic filters).
Contributor

This is accurate. I think in the end we didn't use it to serialize dynamic filters yet (because we needed `map_expressions` as well? I haven't read the PR in detail yet). It will also be used to make `BufferExec` and dynamic filtering work together, by discovering dynamic filters below `BufferExec`: #21350

Contributor Author

I think it would be very helpful if we could explain to an end user when they should include their node's expressions in `apply_expressions` (or, alternatively, when it is safe to NOT include expressions).

I double-checked the docs on `apply_expressions`, and they imply (though do not explicitly say) that all the expressions in a node should be included:

/// Apply a closure `f` to each expression (non-recursively) in the current
/// physical plan node. This does not include expressions in any children.
///
/// The closure `f` is applied to expressions in the order they appear in the plan.
/// The closure can return `TreeNodeRecursion::Continue` to continue visiting,
/// `TreeNodeRecursion::Stop` to stop visiting immediately, or `TreeNodeRecursion::Jump`
/// to skip any remaining expressions (though typically all expressions are visited).
///
/// The expressions visited do not necessarily represent or even contribute
/// to the output schema of this node. For example, `FilterExec` visits the
/// filter predicate even though the output of a Filter has the same columns
/// as the input.

Contributor

Should we add a small note that a no-op implementation can cause silent bugs? Something like: "Returning `Continue` without calling `f` is only safe when the node owns no `PhysicalExpr`s -- otherwise dynamic filter discovery (and any expression-walking pass) will silently skip your node."
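
To make the failure mode concrete, here is a minimal sketch that uses stand-in types rather than the real DataFusion traits. `Recursion`, `Expr`, `Node`, `ReportingFilter`, `SilentFilter`, and `count_dynamic_filters` are all hypothetical names invented for illustration, and `Recursion` models only `Continue` and `Stop`. The point is that the no-op implementation compiles and runs without error, yet the pass never sees the predicate.

```rust
/// Simplified, hypothetical stand-in for `TreeNodeRecursion` (`Jump` omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Recursion {
    Continue,
    #[allow(dead_code)]
    Stop,
}

/// Hypothetical stand-in for a physical expression.
struct Expr {
    is_dynamic_filter: bool,
}

/// Hypothetical stand-in for the expression-visiting part of a plan node.
trait Node {
    fn apply_expressions(&self, f: &mut dyn FnMut(&Expr) -> Recursion) -> Recursion;
}

/// Owns a predicate and reports it to the visitor.
struct ReportingFilter {
    predicate: Expr,
}

impl Node for ReportingFilter {
    fn apply_expressions(&self, f: &mut dyn FnMut(&Expr) -> Recursion) -> Recursion {
        f(&self.predicate)
    }
}

/// Owns a predicate but returns `Continue` without ever calling `f`.
struct SilentFilter {
    #[allow(dead_code)]
    predicate: Expr,
}

impl Node for SilentFilter {
    fn apply_expressions(&self, _f: &mut dyn FnMut(&Expr) -> Recursion) -> Recursion {
        Recursion::Continue
    }
}

/// Example expression-walking pass: count the dynamic filters the nodes expose.
fn count_dynamic_filters(nodes: &[&dyn Node]) -> usize {
    let mut count = 0;
    for node in nodes {
        node.apply_expressions(&mut |expr: &Expr| {
            if expr.is_dynamic_filter {
                count += 1;
            }
            Recursion::Continue
        });
    }
    count
}
```

In this model, passing a `ReportingFilter` and a `SilentFilter` that each hold a dynamic filter yields a count of one, not two, and nothing signals that an expression was missed.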

Contributor

So, if I understand correctly: since we do not know which function will be passed as the parameter, and this method could be used for more than one purpose, the safest approach for implementors is to pick one of the three provided patterns based on the number of expressions the plan has. Is that right?

Contributor Author

Yes -- that is a good idea. I will add that

Contributor

I wonder if a function mutating internal expressions could break operator encapsulation, but that is a totally different discussion.

Thanks @alamb for adding this


**Who is affected:**

@@ -180,7 +182,7 @@ where string types are preferred (`UNION`, `CASE THEN/ELSE`, `NVL2`).

**Migration guide:**

Add `apply_expressions` to your implementation. Call `f` on each top-level `PhysicalExpr` your node owns, using `visit_sibling` to correctly propagate `TreeNodeRecursion`:
Call `f` on each top-level `PhysicalExpr` your node owns, using `visit_sibling` to propagate `TreeNodeRecursion` when iterating:
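
As a rough illustration of why the iteration is threaded through `visit_sibling`, the sketch below models the pattern with stand-in types only. `Recursion`, `ProjectionLike`, and `visited_until` are hypothetical names, expressions are plain strings, `Jump` is omitted, and the `visit_sibling` here is an assumed simplification of the real method rather than its actual definition. What it shows is that once the closure returns `Stop`, the remaining sibling expressions are not visited.

```rust
/// Simplified, hypothetical stand-in for `TreeNodeRecursion` (`Jump` omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Recursion {
    Continue,
    Stop,
}

impl Recursion {
    /// Assumed model of `visit_sibling`: run `f` only if no earlier
    /// sibling asked to stop; otherwise pass `Stop` through unchanged.
    fn visit_sibling<E>(
        self,
        f: impl FnOnce() -> Result<Recursion, E>,
    ) -> Result<Recursion, E> {
        match self {
            Recursion::Continue => f(),
            Recursion::Stop => Ok(self),
        }
    }
}

/// Hypothetical node that owns several expressions (plain strings here).
struct ProjectionLike {
    exprs: Vec<String>,
}

impl ProjectionLike {
    /// Visit each owned expression in order, propagating `Stop` and errors.
    fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&str) -> Result<Recursion, String>,
    ) -> Result<Recursion, String> {
        let mut tnr = Recursion::Continue;
        for expr in &self.exprs {
            tnr = tnr.visit_sibling(|| f(expr.as_str()))?;
        }
        Ok(tnr)
    }
}

/// Record every expression visited, asking to stop once `stop_at` is seen.
fn visited_until(node: &ProjectionLike, stop_at: &str) -> Vec<String> {
    let mut seen = Vec::new();
    node.apply_expressions(&mut |e: &str| {
        seen.push(e.to_string());
        Ok(if e == stop_at {
            Recursion::Stop
        } else {
            Recursion::Continue
        })
    })
    .expect("this visitor never returns an error");
    seen
}
```

With expressions `a`, `b`, `c`, calling `visited_until(&node, "b")` visits only `a` and `b` in this model; a plain `for` loop that ignored the returned value would have gone on to visit `c`.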

**Node with no expressions:**

@@ -219,7 +221,7 @@ fn apply_expressions(
}
```

**Node whose only expressions are in `output_ordering()` (e.g. a synthetic test node with no owned expression fields):**
**Node whose expressions live in `output_ordering()`:**

```rust,ignore
fn apply_expressions(