Retain order of AND, OR filter children.#10758
Conversation
If we retain the order, it enables short-circuiting. People can put a more selective filter earlier in the list and lower the chance that later filters will need to be evaluated. Short-circuiting was working before apache#9608, which switched to unordered sets to solve a different problem. This patch tries to solve that problem a different way. This patch moves filter simplification logic from "optimize" to "toFilter", because that allows the code to be shared with Filters.and and Filters.or. The simplification has become more complicated and so it's useful to share it. This patch also removes code from CalciteCnfHelper that is no longer necessary because Filters.and and Filters.or are now doing the work.
Note on this part: in theory, there is a performance risk here, since if Doing the above analysis made me feel like collapsing DimFilter and Filter would be a good idea. |
| ) | ||
| ); | ||
| final Filter expected = FilterTestUtils.and( | ||
| // The below OR filter could be eliminated because this filter also has |
There was a problem hiding this comment.
Note: these result changes were just re-orderings.
| ) | ||
| ); | ||
| final Set<Filter> expected = ImmutableSet.of( | ||
| final List<Filter> expected = ImmutableList.of( |
There was a problem hiding this comment.
Another note: these were also just reorderings.
| new InDimFilter("regionIsoCode", ImmutableSet.of("CA"), null, null).toFilter() | ||
| ) | ||
| ) | ||
| new InDimFilter("countryIsoCode", ImmutableSet.of("US"), null, null).toFilter(), |
There was a problem hiding this comment.
This rewrite got better!
| assert !filters.isEmpty(); | ||
| // Original "filters" list must have been 100% literally-true filters. | ||
| return Optional.of(TrueFilter.instance()); | ||
| } else if (filtersToUse.stream().anyMatch(filter -> filter instanceof FalseFilter)) { |
There was a problem hiding this comment.
nit: You can avoid this anyMatch check by doing the FalseFilter optimization in flattenAndChildren by making it return a single FalseFilter
There was a problem hiding this comment.
I'm not sure exactly what you're suggesting, so I left this as-is.
| assert !filters.isEmpty(); | ||
| // Original "filters" list must have been 100% literally-false filters. | ||
| return Optional.of(FalseFilter.instance()); | ||
| } else if (filtersToUse.stream().anyMatch(filter -> filter instanceof TrueFilter)) { |
There was a problem hiding this comment.
nit: similar comment to line 518
| // Order of children doesn't matter for equivalence. | ||
| final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters()); | ||
| final Set<Filter> theirFilterSet = new HashSet<>(((BooleanFilter) that.filter).getFilters()); | ||
| return ourFilterSet.equals(theirFilterSet); |
There was a problem hiding this comment.
Does this handle nested BooleanFilters correctly?
There was a problem hiding this comment.
It handles them correctly in that it's not wrong. But, it isn't necessarily doing the best possible optimization. I edited it to use EquivalenceCheckedFilter recursively and added some more tests (see FiltersTest for the new ones).
There was a problem hiding this comment.
Nevermind, I decided to take your advice and switch the List to a LinkedHashSet, which makes this EquivalenceCheckedFilter technique no longer necessary.
| ValueMatcher[] EMPTY_VALUE_MATCHER_ARRAY = new ValueMatcher[0]; | ||
|
|
||
| Set<Filter> getFilters(); | ||
| List<Filter> getFilters(); |
There was a problem hiding this comment.
Could you add a javadoc to this function explaining why we a list is used here so that we can preserve the order of the filters to enable short-circuiting.
Writing out this comment, made me think perhaps we could achieve the same outcome by using a LinkedHashSet instead - that would make it clearer in the rest of the code that there are no duplicates in the filters
There was a problem hiding this comment.
I think doing the LinkedHashSet makes sense, so I went with that approach. It simplified the code a bit, and let me delete the EquivalenceCheckedFilter stuff.
|
|
||
| if (filter instanceof BooleanFilter && filter.getClass().equals(that.filter.getClass())) { | ||
| // Order of children doesn't matter for equivalence. | ||
| final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters()); |
There was a problem hiding this comment.
will your changes fix #10316? I am guessing, your changes will make the situations describe in that PR less likely as Set is being created only when two AndFilter/OrFilters are compared. could we also compare the size of child filter and return early before creating a possible expensive hashSet?
There was a problem hiding this comment.
We're back to a Set due to #10758 (comment), so this won't change much. Probably memoizing the hashCode is still the way to go.
* Retain order of AND, OR filter children. If we retain the order, it enables short-circuiting. People can put a more selective filter earlier in the list and lower the chance that later filters will need to be evaluated. Short-circuiting was working before apache#9608, which switched to unordered sets to solve a different problem. This patch tries to solve that problem a different way. This patch moves filter simplification logic from "optimize" to "toFilter", because that allows the code to be shared with Filters.and and Filters.or. The simplification has become more complicated and so it's useful to share it. This patch also removes code from CalciteCnfHelper that is no longer necessary because Filters.and and Filters.or are now doing the work. * Fixes for inspections. * Fix tests. * Back to a Set.
If we retain the order, it enables short-circuiting. People can put a
more selective filter earlier in the list and lower the chance that
later filters will need to be evaluated.
Ordering was working before #9608, which switched to unordered
sets to solve a different problem. This patch tries to solve that
problem a different way.
This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.
This patch also removes code from CalciteCnfHelper that is no longer
necessary because Filters.and and Filters.or are now doing the work.