Eliminate common subfilters when converting it to a CNF #9608
Merged
himanshug merged 1 commit into apache:master on Apr 6, 2020
I'm tagging this PR as 0.18.0 since it could significantly affect Join performance, which is a new feature of 0.18.0.
jihoonson added a commit to jihoonson/druid that referenced this pull request Apr 6, 2020
fjy pushed a commit that referenced this pull request Apr 7, 2020
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jun 12, 2020
gianm added a commit to gianm/druid that referenced this pull request Jan 14, 2021
If we retain the order, it enables short-circuiting. People can put a more selective filter earlier in the list and lower the chance that later filters will need to be evaluated. Short-circuiting was working before apache#9608, which switched to unordered sets to solve a different problem. This patch tries to solve that problem a different way. This patch moves filter simplification logic from "optimize" to "toFilter", because that allows the code to be shared with Filters.and and Filters.or. The simplification has become more complicated and so it's useful to share it. This patch also removes code from CalciteCnfHelper that is no longer necessary because Filters.and and Filters.or are now doing the work.
suneet-s pushed a commit that referenced this pull request Jan 20, 2021
* Retain order of AND, OR filter children.

  If we retain the order, it enables short-circuiting. People can put a more selective filter earlier in the list and lower the chance that later filters will need to be evaluated.

  Short-circuiting was working before #9608, which switched to unordered sets to solve a different problem. This patch tries to solve that problem a different way.

  This patch moves filter simplification logic from "optimize" to "toFilter", because that allows the code to be shared with Filters.and and Filters.or. The simplification has become more complicated and so it's useful to share it.

  This patch also removes code from CalciteCnfHelper that is no longer necessary because Filters.and and Filters.or are now doing the work.
* Fixes for inspections.
* Fix tests.
* Back to a Set.
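The two goals in this commit message (dropping duplicate subfilters, as in #9608, and keeping child order so that AND evaluation can short-circuit) can be illustrated with a minimal sketch. It assumes an insertion-ordered set such as java.util.LinkedHashSet; whether the patch uses that exact class is not stated here. OrderedAndSketch, NamedFilter, dedupe and matchesAll are hypothetical names for illustration and are not Druid classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical sketch; none of these names are Druid classes.
class OrderedAndSketch {

    // Stand-in for a filter. Equality is by name only, so two filters with
    // the same name count as the same subfilter.
    static final class NamedFilter {
        final String name;
        final IntPredicate predicate;

        NamedFilter(String name, IntPredicate predicate) {
            this.name = name;
            this.predicate = predicate;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof NamedFilter && ((NamedFilter) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Drops duplicate children while keeping first-seen order.
    static Set<NamedFilter> dedupe(List<NamedFilter> children) {
        return new LinkedHashSet<>(children);
    }

    // Evaluates the children as an AND in iteration order and stops at the
    // first child that rejects the row. The name of every child that was
    // actually evaluated is appended to 'evaluated'.
    static boolean matchesAll(Set<NamedFilter> children, int row, List<String> evaluated) {
        for (NamedFilter child : children) {
            evaluated.add(child.name);
            if (!child.predicate.test(row)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NamedFilter selective = new NamedFilter("x == 7", x -> x == 7);
        NamedFilter broad = new NamedFilter("x > 0", x -> x > 0);

        // The selective filter is listed first and repeated once.
        Set<NamedFilter> children = dedupe(Arrays.asList(selective, broad, selective));
        List<String> evaluated = new ArrayList<>();
        boolean matched = matchesAll(children, 3, evaluated);

        System.out.println(children.size() + " children, matched=" + matched
            + ", evaluated=" + evaluated);
    }
}
```

In this sketch the repeated child collapses into one entry, and because the selective child stays first, a row it rejects never reaches the broader child. A plain HashSet would also remove the duplicate but gives no guarantee about iteration order.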
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
Description
Filters.toCNF() is used in 2 places: 1) to find prefilters that can be evaluated with bitmaps when scanning segments, and 2) to find filters which can be pushed down in a Join query. This function was adopted from Hive, but has some issues with optimization. One issue is that it didn't eliminate common subfilters, which sometimes resulted in a huge number of subfilters, which in turn could cause an OOM error. This PR eliminates those common subfilters by changing the type of the subfilters of AndFilter and OrFilter from List to Set. Using a Set detects common subfilters that appear in different orders; otherwise those subfilters would have to be sorted in some order, which seems more complicated than using a Set.

The approach used in this PR can create a filter in a suboptimal CNF. There are at least 2 cases I'm aware of.
* x IN (1,2,3) is equivalent to x = 1 OR x = 2 OR x = 3, but this case is not handled yet. We may need to decide which form is more optimal (probably using the IN filter is better since it will use less memory).
* (A || ~(E)) && (A || ~(F)) && (A || ~(E) || ~(F)) can be reduced further. This case is not handled yet, but I left a comment about it in FiltersTest.testToCNFWithComplexFilterIncludingNotAndOr().

This PR has: