-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add constant expression evaluator to physical expression simplifier #19130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| .await?; | ||
| // There will be data: the filter is (null) is not null or a = 24. | ||
| // Statistics pruning doesn't handle `null is not null` so it resolves to `true or a = 24` -> `true` so no row groups are pruned | ||
| // There should be zero batches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're now able to prune this 😄
|
@alamb I wonder if there could be any way to express that an expression is "constant" in terms of volatility? We currently have Looking at the logical expression const evaluator it seems like it does manual matching: datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs Lines 637 to 682 in deaccb7
It looks like we have a datafusion/datafusion/expr-common/src/signature.rs Lines 53 to 85 in c479dee
But it does not have a "constant" option (nor would that really make sense for a function).
(Generally because it has no input but I guess you could have an expression that takes an input but returns a constant / ignores the input?) Which makes me wonder if it wouldn't make sense to an enum along the lines of: pub enum ExprVolatility {
Constant,
Immutable,
Stable,
Volatile,
}And augment impl Expr {
pub fn node_volatility(&self) -> ExprVolatility {
match self { ... }
}
pub fn volatility(&self) -> ExprVolatility {
self.apply(...) // recursive
}
}pub trait PhysicalExpr {
fn node_volatility(&self) -> ExprVolatility {
ExprVolatility::Volatilve
}
fn volatility(&self) -> ExprVolatility {
self.apply(...) // recursive
}
}This seems like it would be required for The main alternative I see to this is to define "input" as "columns/data" (as opposed to e.g. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me -- thank you @adriangb
| .await?; | ||
| // There should be zero batches | ||
| assert_eq!(batches.len(), 0); | ||
| // On the other hand `b is null and a = 2` should prune only the second row group with stats only pruning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pruned because now b is null gets folded to null is null AND a = 2 --> a = 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change `cargo fmt` is failing on main. Example https://github.com/apache/datafusion/actions/runs/20027361509/job/57427769604 I believe it is a logical conflict between: - #19091 - #19130 ## What changes are included in this PR? Ran `cargo fmt` and committed the result ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
This improves handling of constant expressions during pruning by trying to evaluate them in the simplifier and the pruning machinery. This is somewhat redundant with #19129 in the simple case of our Parquet implementation but since there may be edge cases where one is hit and not the other, or where users are using them independently I thought it best to implement both approaches.