Conversation

@lvheyang
Contributor

Which issue does this PR close?

Closes #723.

Rationale for this change

See step 2 in issue #723.

What changes are included in this PR?

Are there any user-facing changes?

column_expr: &Expr,
op: Operator,
scalar_expr: &Expr,
) -> Result<(Expr, Operator, Expr)> {
Contributor Author

@alamb I'm still working on my plan here, but the real-world situation is more complicated than I thought, and I'm afraid I cannot make all of it work as expected.

For now, the expressions I support are:

  1. bool column
  2. ! bool column
  3. col op literal
  4. -col op literal (transformed to col reverse(op) -literal)
  5. logical and/or expressions composed of 2 and 3

Before further work, I plan to add some e2e test cases to ensure these simple pruning rules work correctly.
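
Rule 4 in the list can be sketched as follows. This is a minimal, standalone illustration, not the code in this PR: the `Op` enum and the `reverse`, `rewrite_negated`, and `eval` functions are hypothetical names, and an `i64` stands in for the column value and the literal.

```rust
/// Comparison operators for this sketch (hypothetical; not DataFusion's `Operator`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Lt,
    LtEq,
    Gt,
    GtEq,
    Eq,
}

/// Flip the direction of a comparison; equality stays as it is.
fn reverse(op: Op) -> Op {
    match op {
        Op::Lt => Op::Gt,
        Op::LtEq => Op::GtEq,
        Op::Gt => Op::Lt,
        Op::GtEq => Op::LtEq,
        Op::Eq => Op::Eq,
    }
}

/// Rewrite `-col op literal` as `col reverse(op) -literal`.
/// Assumption: the literal is not `i64::MIN`, so negation cannot overflow.
fn rewrite_negated(op: Op, literal: i64) -> (Op, i64) {
    (reverse(op), -literal)
}

/// Evaluate `lhs op rhs`, used only to check that the rewrite keeps the meaning.
fn eval(lhs: i64, op: Op, rhs: i64) -> bool {
    match op {
        Op::Lt => lhs < rhs,
        Op::LtEq => lhs <= rhs,
        Op::Gt => lhs > rhs,
        Op::GtEq => lhs >= rhs,
        Op::Eq => lhs == rhs,
    }
}
```

For example, `rewrite_negated(Op::Gt, -1)` yields `(Op::Lt, 1)`, so `-i > -1` becomes `i < 1`, and `eval(-col, op, lit)` agrees with `eval(col, new_op, new_lit)` for any `col`.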

Contributor

Thank you @lvheyang -- given that the current implementation gives incorrect results, I suggest we take a conservative approach here. A wise mentor of mine once told me "I can make it go as fast as you want if you don't constrain me to be correct"

Thus, what I recommend is to update this PR so that it works for the Expr types we know work (e.g. just the ones you list above are more than adequate).

Then we can expand the list of supported Expr types as future PRs.

If you try to special-case as many Expr types as possible in this PR, I think it is going to be quite challenging.

Contributor Author

@alamb you're right. This PR will not add more pruning rules. I will make sure there are enough test cases to cover the situations we have met so far. Thank you ~

Contributor

Thank you for working on this @lvheyang

@lvheyang lvheyang marked this pull request as draft July 21, 2021 15:34
@lvheyang lvheyang force-pushed the simple_pruning branch 3 times, most recently from 6050046 to dd8a0db Compare July 22, 2021 15:54
@lvheyang lvheyang marked this pull request as ready for review July 22, 2021 16:22
@lvheyang
Contributor Author

lvheyang commented Jul 23, 2021

I've committed some e2e test cases in parquet_pruning.rs:

for int32 column:

  1. SELECT * FROM t where i < 1
  2. SELECT * FROM t where -i > -1
  3. SELECT * FROM t where i = 1
  4. SELECT * FROM t where abs(i) = 1 and i = 1 : use i=1 as prune predicate
  5. SELECT * FROM t where abs(i) = 1 : not supported
  6. SELECT * FROM t where i+1 = 1 : not supported
  7. SELECT * FROM t where 1-i > 1: not supported

for float64 column:

  1. SELECT * FROM t where f < 1
  2. SELECT * FROM t where -f > -1
  3. SELECT * FROM t where abs(f - 1) <= 0.000001 and f >= 0.1 : use f>=0.1 as prune predicate
  4. SELECT * FROM t where abs(f-1) <= 0.000001 : not supported
  5. SELECT * FROM t where f+1 > 1.1 : not supported
  6. SELECT * FROM t where 1-f > 1 : not supported
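
The behaviour these cases describe can be sketched with min/max statistics. This is a simplified illustration under stated assumptions, not the PR's predicate builder: `Stats`, `Pred`, and `may_match` are hypothetical names, and an unsupported sub-expression is assumed to be treated conservatively as "might match", so only the supported side of an `and` can prune.

```rust
/// Min/max statistics of one row group for a single column (hypothetical type).
struct Stats {
    min: i64,
    max: i64,
}

/// A tiny predicate tree over that column (hypothetical, for illustration only).
enum Pred {
    /// `col < v`
    Lt(i64),
    /// `col = v`
    Eq(i64),
    /// Anything the sketch cannot reason about, e.g. `abs(col) = 1`.
    Unsupported,
    /// Logical `and` of two predicates.
    And(Box<Pred>, Box<Pred>),
}

/// Returns `true` if the row group might contain matching rows and must be read,
/// `false` if the statistics prove that no row can match (the group is pruned).
fn may_match(pred: &Pred, stats: &Stats) -> bool {
    match pred {
        // Some value is below `v` only if the minimum is.
        Pred::Lt(v) => stats.min < *v,
        // `v` can only occur if it lies inside [min, max].
        Pred::Eq(v) => stats.min <= *v && *v <= stats.max,
        // Conservative: never prune on something we do not understand.
        Pred::Unsupported => true,
        // Both sides must be satisfiable for a row to match.
        Pred::And(a, b) => may_match(a, stats) && may_match(b, stats),
    }
}
```

With `Stats { min: -4, max: 0 }`, the predicate `And(Unsupported, Eq(1))` returns `false` (pruned on `i = 1` alone), while `Unsupported` by itself returns `true` and the group is kept.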

Contributor

@alamb alamb left a comment

Very nice work @lvheyang -- the end to end tests are also very complete. Thank you so much.

For anyone else reading this, I found the "ignore whitespace" diff easier to review: https://github.com/apache/arrow-datafusion/pull/764/files?w=1

FYI @yordan-pavlov and @Dandandan

println!("{}", output.description());
// This should prune out groups without error
assert_eq!(output.predicate_evaluation_errors(), Some(0));
assert_eq!(output.row_groups_pruned(), Some(3));
Contributor

👍 I forgot initially that -4..1 doesn't actually include 1 (it is -4, -3, -2, -1, 0 and thus should be pruned) 👍
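
The range semantics mentioned here can be checked with a short sketch (the function names are hypothetical and only illustrate Rust's half-open `a..b` versus inclusive `a..=b` ranges):

```rust
/// Values produced by the half-open range `-4..1`; the upper bound is excluded.
fn half_open_values() -> Vec<i32> {
    (-4..1).collect()
}

/// Values produced by the inclusive range `-4..=1`; the upper bound is included.
fn inclusive_values() -> Vec<i32> {
    (-4..=1).collect()
}
```

A row group filled from `half_open_values()` has a maximum of 0, so statistics alone show it cannot contain the value 1.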

.await;

println!("{}", output.description());
// This should prune out groups with error, because there is not col to
Contributor

yeah, and in the future we can add some more sophistication to the predicate builder to handle these types of monotonic functions (e.g. +1)

}
};

let (column_expr, correct_operator, scalar_expr) =
Contributor

I think this is a nice change and improves readability

@alamb alamb merged commit 31d16d2 into apache:master Jul 23, 2021
@yordan-pavlov
Contributor

nice work indeed @lvheyang, thank you so much for making these changes; you've done a much better job of validating which predicate expressions are compatible with parquet predicate push-down compared to the naive implementation in the PR where I introduced it; hopefully this can soon be extended to even more expressions.

@lvheyang
Contributor Author

Thanks for your kind help @yordan-pavlov @alamb, I will try to extend support to more expressions in future work. 😄

@houqp houqp added the bug Something isn't working label Jul 30, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request Mar 23, 2025
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

ABS() function in WHERE clause gives unexpected results

4 participants