Skip to content

Eliminate common subfilters when converting it to a CNF#9608

Merged
himanshug merged 1 commit intoapache:masterfrom
jihoonson:eliminate-duplicate-filters
Apr 6, 2020
Merged

Eliminate common subfilters when converting it to a CNF#9608
himanshug merged 1 commit intoapache:masterfrom
jihoonson:eliminate-duplicate-filters

Conversation

@jihoonson
Copy link
Contributor

Description

Filters.toCNF() is used in 2 places, 1) to find prefilters when scanning segments that bitmaps are available to evaluate them and 2) to find filters which can be pushed down in a Join query. This function was adopted from Hive, but have some issues with optimization. One of issues is that it didn't eliminate common subfilters, which sometimes resulted in a huge number of subfilters which in turn could cause an OOM error. This PR is to eliminate those common subfilters by changing the type of subfilters of AndFilter and OrFilter from List to Set. This is to detect common subfilters of different orders, otherwise those subfilters should be sorted in some order which seems more complicated than using a Set.

The approach used in this PR can create a filter in a suboptimal CNF. There are at least 2 cases I'm aware of.

  • Even though x IN (1,2,3) is equivalent to x = 1 OR x = 2 OR x = 3, but this case is not handled yet. We may need to decide which form is more optimal (probably using IN filter is better since it will use less memory).
  • There are some missing cases which can be further reduced. For example, (A || ~(E)) && (A || ~(F)) && (A || ~(E) || ~(F)) can be reduced as below. This case is not handled yet, but I left a comment about it in FiltersTest.testToCNFWithComplexFilterIncludingNotAndOr().
(A || ~(E)) && (A || ~(F)) && (A || ~(E) || ~(F))
=> ((A && (~(E) || ~(F))) && (A || ~(E) || ~(F)))
=> (A && (~(E) || ~(F))
=> (A || ~(E)) && (A || ~(F))

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

@jihoonson
Copy link
Contributor Author

I'm tagging this PR as 0.18.0 since it could affect to Join performance significantly which is a new feature of 0.18.0.

@himanshug himanshug merged commit 40e84a1 into apache:master Apr 6, 2020
jihoonson added a commit to jihoonson/druid that referenced this pull request Apr 6, 2020
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jun 12, 2020
gianm added a commit to gianm/druid that referenced this pull request Jan 14, 2021
If we retain the order, it enables short-circuiting. People can put a
more selective filter earlier in the list and lower the chance that
later filters will need to be evaluated.

Short-circuiting was working before apache#9608, which switched to unordered
sets to solve a different problem. This patch tries to solve that
problem a different way.

This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.

This patch also removes code from CalciteCnfHelper that is no longer
necessary because Filters.and and Filters.or are now doing the work.
suneet-s pushed a commit that referenced this pull request Jan 20, 2021
* Retain order of AND, OR filter children.

If we retain the order, it enables short-circuiting. People can put a
more selective filter earlier in the list and lower the chance that
later filters will need to be evaluated.

Short-circuiting was working before #9608, which switched to unordered
sets to solve a different problem. This patch tries to solve that
problem a different way.

This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.

This patch also removes code from CalciteCnfHelper that is no longer
necessary because Filters.and and Filters.or are now doing the work.

* Fixes for inspections.

* Fix tests.

* Back to a Set.
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
* Retain order of AND, OR filter children.

If we retain the order, it enables short-circuiting. People can put a
more selective filter earlier in the list and lower the chance that
later filters will need to be evaluated.

Short-circuiting was working before apache#9608, which switched to unordered
sets to solve a different problem. This patch tries to solve that
problem a different way.

This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.

This patch also removes code from CalciteCnfHelper that is no longer
necessary because Filters.and and Filters.or are now doing the work.

* Fixes for inspections.

* Fix tests.

* Back to a Set.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants