-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13869][SQL] Remove redundant conditions while combining filters #11670
Conversation
cc @yhuai |
Test build #52973 has finished for PR 11670 at commit
|
e946c1e
to
f9aba06
Compare
Test build #53092 has finished for PR 11670 at commit
|
LGTM How did you decide to do this? Are there other rules that are generating things like this? |
LGTM |
Can you be more specific? When does this generate a redundant one? Let's make sure we don't continuously generate them and collapse them. |
@sameeragarwal I plan to submit a PR for avoiding pushing redundant predicates through the operators, since it is blocking another PR: #11714. Are you doing this? |
I actually found this issue while playing around with some tests. @gatorsmile can you give an example for a case where we generate redundant filters? The code does seem to check the constraints of its children (which are derived from their condition) before generating additional filters. |
I thought you are doing this PR for avoiding to push down the redundant inferred/isnotnull conditions. It sounds like my understanding is wrong. As shown in the #11714, the test case failure is caused by this. That is what I plan to fix. |
oh I see, a union -- that was a tricky one! The issue with union is that its constraints are an "intersection" of the constraints of its individual children (which means that our filter inference might push redundant filters to one of more of its child nodes). To fix this, we should make sure that we check the constraints of the children before pushing filters down. Should I make a fix? |
I think we need a more general fix for all the predicate push down. I will submit a PR for a general issue instead of fixing |
yes, I agree. |
## What changes were proposed in this pull request? **[I'll link it to the JIRA once ASF JIRA is back online]** This PR modifies the existing `CombineFilters` rule to remove redundant conditions while combining individual filter predicates. For instance, queries of the form `table.where('a === 1 && 'b === 1).where('a === 1 && 'c === 1)` will now be optimized to ` table.where('a === 1 && 'b === 1 && 'c === 1)` (instead of ` table.where('a === 1 && 'a === 1 && 'b === 1 && 'c === 1)`) ## How was this patch tested? Unit test in `FilterPushdownSuite` Author: Sameer Agarwal <sameer@databricks.com> Closes apache#11670 from sameeragarwal/combine-filters.
What changes were proposed in this pull request?
[I'll link it to the JIRA once ASF JIRA is back online]
This PR modifies the existing
CombineFilters
rule to remove redundant conditions while combining individual filter predicates. For instance, queries of the formtable.where('a === 1 && 'b === 1).where('a === 1 && 'c === 1)
will now be optimized totable.where('a === 1 && 'b === 1 && 'c === 1)
(instead oftable.where('a === 1 && 'a === 1 && 'b === 1 && 'c === 1)
)How was this patch tested?
Unit test in
FilterPushdownSuite