-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add support for boolean columns in pruning logic #500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| self.scalar_expr | ||
| } | ||
|
|
||
| fn is_stat_column_missing(&self, statistics_type: StatisticsType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic was just refactored into RequiredStatColumns so I could reuse it
| use crate::logical_plan; | ||
| let field = schema.field_with_name(column_name).ok()?; | ||
|
|
||
| if matches!(field.data_type(), &DataType::Boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the actual logic / rules
| let result = p.prune(&statistics).unwrap_err(); | ||
| assert!( | ||
| result.to_string().contains( | ||
| "Data type Boolean not supported for scalar operation on dyn array" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these aren't great messages, but they are what happens on master today, and I figured I would document them for posterity (and maybe inspire people to help fix them)
| Expr::BinaryExpr { left, op, right } => (left, *op, right), | ||
| Expr::Column(name) => { | ||
| if let Some(expr) = | ||
| build_single_column_expr(&name, schema, required_columns, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of pattern probably can be written a bit shorter with some combinators. Something like:
build_single_column_expr(&name, schema, required_columns).ok().or(Ok(unhandled))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An excellent idea -- I will do so
Codecov Report
@@ Coverage Diff @@
## master #500 +/- ##
==========================================
+ Coverage 75.92% 76.02% +0.09%
==========================================
Files 154 154
Lines 26195 26421 +226
==========================================
+ Hits 19889 20087 +198
- Misses 6306 6334 +28
Continue to review full report at Codecov.
|
Dandandan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice feature + great refactoring
|
I rebased this PR and added a few more tests. The code is unchanged |
jorgecarleitao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. 👍 Very well documented also 💯
* change proto msg * QueryPlanSerde with eval mode * Move eval mode * Add abs in planner * CometAbsFunc wrapper * Add error management * Add tests * Add license * spotless apply * format * Fix clippy * error msg for all spark versions * Fix benches * Use enum to ansi mode * Fix format * Add more tests * Format * Refactor * refactor * fix merge * fix merge
Closes #490
This PR adds support for pruning of boolean predicates such as
flag_col, andnot flag_colso that they can be used to prune row groups from parquet files and other predicatesIt does not add code to handle
flag_col = trueandflag_col != false(which currently error and continue to do so) as those are simplified in the ConstantEvaluation pass.This ended up being a larger change than I wanted because the logic to create
col_minandcol_maxreferences was intertwined inPruningExpressionBuilderRationale for this change
See #490
What changes are included in this PR?
Major changes:
stat_column_reqinto a newRequiredStatColumnsstructStatisticsColumnsAre there any user-facing changes?
Additional predicates can be used to prune