
Support for != predicate in pruning predicates #420

Closed
alamb opened this issue May 24, 2021 · 3 comments · Fixed by #544
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented May 24, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While evaluating queries against data stored in containers / multiple files, it is helpful to prune entire files using statistics (see #363 for more details). DataFusion already has this logic for the ==, <, <=, >, and >= operators in the pruning predicate.

However, as @NGA-TRAN noticed, there is no support for != at the moment.

https://github.com/apache/arrow-datafusion/blob/14f1eebef068a9e65f556ed74d2b6d98376c97f4/datafusion/src/physical_plan/parquet.rs#L683
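A minimal sketch of the min/max checks described above, assuming a hypothetical `ColumnStats` struct that holds exact integer bounds for one column in one container and a hypothetical `might_match` helper (neither is DataFusion's API). If `might_match` returns `false`, no row can satisfy the predicate, so the whole container can be skipped.

```rust
/// Hypothetical min/max statistics for one column in one container
/// (e.g. a Parquet row group). Not DataFusion's actual type.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// The comparison operators the issue says are already supported.
#[derive(Debug, Clone, Copy)]
enum Op {
    Eq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// Returns true if some row in the container *might* satisfy `col <op> lit`.
/// Returning false means the whole container can be pruned.
fn might_match(stats: ColumnStats, op: Op, lit: i64) -> bool {
    match op {
        // col = lit can only match if lit lies within [min, max]
        Op::Eq => stats.min <= lit && lit <= stats.max,
        // col < lit / col <= lit depend only on the smallest value
        Op::Lt => stats.min < lit,
        Op::LtEq => stats.min <= lit,
        // col > lit / col >= lit depend only on the largest value
        Op::Gt => stats.max > lit,
        Op::GtEq => stats.max >= lit,
    }
}
```

For example, with `min = 10` and `max = 20`, `col = 25` and `col < 10` can both be pruned, while `col <= 10` cannot. DataFusion itself, as we understand it, rewrites the predicate into an expression over min/max statistics rather than calling a helper like this; the sketch only illustrates the bound checks.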

Describe the solution you'd like
Add support and tests for != in the predicate pruning logic.

@alamb alamb added the enhancement New feature or request label May 24, 2021
jgoday added a commit to jgoday/arrow-datafusion that referenced this issue Jun 10, 2021
@jgoday
Contributor

jgoday commented Jun 10, 2021

@alamb Can I try to solve this issue?

If I understand correctly, for the not-equal predicate the expression should be pruned if the literal value does not fall between the min and max values. Am I right?
(I have already tried to implement it here: jgoday@3bb55a4. I can make a PR if you approve.)

@alamb
Contributor Author

alamb commented Jun 10, 2021

Thanks @jgoday! That would be great.

For != I think we would prune the container (aka return false or NULL from the predicate) if the constant value did fall within the min/max bounds, but I might be misunderstanding what you are saying; I'd love to check out the PR.
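For `!=`, the reasoning flips compared with `=`: `col != lit` rules out a container only when every non-null value in it equals `lit`, which min/max statistics can prove only when `min == max == lit`. A literal that merely falls inside a wider `[min, max]` range is not enough, because other values may still be present. The sketch below reuses a hypothetical `ColumnStats` struct and a hypothetical `might_match_not_eq` helper; it is an illustration, not the implementation from #544.

```rust
/// Hypothetical min/max statistics for one column in one container.
/// Assumed to be exact bounds over the column's non-null values.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// Returns true if some row in the container *might* satisfy `col != lit`.
///
/// Every row fails `col != lit` only if every value equals `lit`, which the
/// statistics can prove only when min == max == lit. NULL rows evaluate the
/// predicate to NULL (never true), so they do not prevent pruning.
fn might_match_not_eq(stats: ColumnStats, lit: i64) -> bool {
    // Equivalent to !(stats.min == lit && stats.max == lit)
    stats.min != lit || stats.max != lit
}
```

With `min = 10` and `max = 20`, `col != 15` cannot be pruned (the container may hold 10), whereas a container whose statistics are `min = 7, max = 7` can be skipped for `col != 7`.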

@jgoday
Contributor

jgoday commented Jun 11, 2021

@alamb I think I expressed myself incorrectly :)
Created PR #544.


2 participants