-
Notifications
You must be signed in to change notification settings - Fork 1.8k
#723 limit pruning rule to simple expression #764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| column_expr: &Expr, | ||
| op: Operator, | ||
| scalar_expr: &Expr, | ||
| ) -> Result<(Expr, Operator, Expr)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb I'm still trying to accomplish my plan at this place. But the real-world situation is more complicated than I thought, I'm afraid I cannot make all of my plans working as expected.
for now, the expression I support:
- bool column
- ! bool column
col op literal-col op literal(transform tocol reverse(op) -literal)- the logical and/or expression compound with 2 and 3
before further work, I plan to make some e2e test cases to ensure such simple pruning rules work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @lvheyang -- given the current implementation gives incorrect results, I suggest we take a conservative here. A wise mentor of mine once told me "I can make it go as fast as you want if you don't constrain me to be correct"
Thus, what I recommend is to update this PR so that it works for expr types we know work (e.g. just the ones you list above is more than adequate).
Then we can expand the list of supported Expr types as future PRs.
If you try and special case as many Expr types as possible in this PR I think it is going to be quite challenging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb you're right. This pr would not add more pruning rules. I will make sure there are enough test cases to cover situations we met for now. Thank you ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @lvheyang
6050046 to
dd8a0db
Compare
|
I've committed some e2e test cases in parquet_pruing.rs: for int32 column:
for float64 column
|
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work @lvheyang -- the end to end tests are also very complete. Thank you so much.
For anyone else reading this, I found the "ignore whitespace" diff easier to review: https://github.com/apache/arrow-datafusion/pull/764/files?w=1 easier to review
FYI @yordan-pavlov and @Dandandan
| println!("{}", output.description()); | ||
| // This should prune out groups without error | ||
| assert_eq!(output.predicate_evaluation_errors(), Some(0)); | ||
| assert_eq!(output.row_groups_pruned(), Some(3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I forgot initially that -4..1 doesn't actually include 1 (it is -4, -3, -2, -1, 0 and thus should be pruned) 👍
| .await; | ||
|
|
||
| println!("{}", output.description()); | ||
| // This should prune out groups with error, because there is not col to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, and in the future we can add some more sophistication to the predicate builder to handle these types of monotonic functions (e.g. +1)
| } | ||
| }; | ||
|
|
||
| let (column_expr, correct_operator, scalar_expr) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a nice change and improves readability
|
nice work indeed @lvheyang , thank you so much for making these changes; you've done a much better job to validate which predicate expressions are compatible with parquet predicate push-down compared to the naiive implementation in the PR where I introduced it; hopefully this can soon be extended for even more expressions. |
|
Thanks for your kind help @yordan-pavlov @alamb , I will try to extend more expressions in future works. 😄 |
Which issue does this PR close?
Closes #723.
Rationale for this change
see in issue #723 step 2
What changes are included in this PR?
Are there any user-facing changes?