Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13869][SQL] Remove redundant conditions while combining filters #11670

Closed
wants to merge 2 commits into from

Conversation

sameeragarwal
Copy link
Member

What changes were proposed in this pull request?

[I'll link it to the JIRA once ASF JIRA is back online]

This PR modifies the existing CombineFilters rule to remove redundant conditions while combining individual filter predicates. For instance, queries of the form table.where('a === 1 && 'b === 1).where('a === 1 && 'c === 1) will now be optimized to table.where('a === 1 && 'b === 1 && 'c === 1) (instead of table.where('a === 1 && 'a === 1 && 'b === 1 && 'c === 1))

How was this patch tested?

Unit test in FilterPushdownSuite

@sameeragarwal
Copy link
Member Author

cc @yhuai

@SparkQA
Copy link

SparkQA commented Mar 12, 2016

Test build #52973 has finished for PR 11670 at commit e946c1e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal sameeragarwal changed the title [SPARK-XXXX][SQL] Remove redundant conditions while combining filters [SPARK-13869][SQL] Remove redundant conditions while combining filters Mar 14, 2016
@SparkQA
Copy link

SparkQA commented Mar 14, 2016

Test build #53092 has finished for PR 11670 at commit f9aba06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nongli
Copy link
Contributor

nongli commented Mar 15, 2016

LGTM

How did you decide to do this? Are there other rules that are generating things like this?

@gatorsmile
Copy link
Member

LGTM

@nongli
Copy link
Contributor

nongli commented Mar 15, 2016

Can you be more specific? When does this generate a redundant one? Let's make sure we don't continuously generate them and collapse them.

@gatorsmile
Copy link
Member

@sameeragarwal I plan to submit a PR for avoiding pushing redundant predicates through the operators, since it is blocking another PR: #11714. Are you doing this?

@sameeragarwal
Copy link
Member Author

I actually found this issue while playing around with some tests. @gatorsmile can you give an example for a case where we generate redundant filters? The code does seem to check the constraints of its children (which are derived from their condition) before generating additional filters.

@gatorsmile
Copy link
Member

I thought you are doing this PR for avoiding to push down the redundant inferred/isnotnull conditions. It sounds like my understanding is wrong. As shown in the #11714, the test case failure is caused by this. That is what I plan to fix.

@sameeragarwal
Copy link
Member Author

oh I see, a union -- that was a tricky one! The issue with union is that its constraints are an "intersection" of the constraints of its individual children (which means that our filter inference might push redundant filters to one of more of its child nodes). To fix this, we should make sure that we check the constraints of the children before pushing filters down. Should I make a fix?

@gatorsmile
Copy link
Member

I think we need a more general fix for all the predicate push down. I will submit a PR for a general issue instead of fixing Union only.

@sameeragarwal
Copy link
Member Author

yes, I agree.

@asfgit asfgit closed this in 77ba302 Mar 16, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

**[I'll link it to the JIRA once ASF JIRA is back online]**

This PR modifies the existing `CombineFilters` rule to remove redundant conditions while combining individual filter predicates. For instance, queries of the form `table.where('a === 1 && 'b === 1).where('a === 1 && 'c === 1)` will now be optimized to ` table.where('a === 1 && 'b === 1 && 'c === 1)` (instead of ` table.where('a === 1 && 'a === 1 && 'b === 1 && 'c === 1)`)

## How was this patch tested?

Unit test in `FilterPushdownSuite`

Author: Sameer Agarwal <sameer@databricks.com>

Closes apache#11670 from sameeragarwal/combine-filters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants