Retain order of AND, OR filter children. #10758

Merged
suneet-s merged 5 commits into apache:master from gianm:filter-and-or-retain-order
Jan 20, 2021

@gianm gianm commented Jan 14, 2021

If we retain the order, it enables short-circuiting. People can put a
more selective filter earlier in the list and lower the chance that
later filters will need to be evaluated.
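
As a rough illustration of why order matters, the sketch below evaluates an ordered list of predicates as an AND and stops at the first child that rejects a value. `ShortCircuitSketch` and its helpers are hypothetical names made up for this example; they are not Druid classes, and Druid's real matching goes through `ValueMatcher`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical sketch, not Druid code.
final class ShortCircuitSketch {
  /** Evaluates children in list order and stops at the first one that rejects the value. */
  static boolean matchesAll(List<IntPredicate> children, int value) {
    for (IntPredicate child : children) {
      if (!child.test(value)) {
        return false; // short-circuit: later children are never evaluated
      }
    }
    return true;
  }

  /** Wraps a predicate so that every evaluation increments a counter. */
  static IntPredicate counting(IntPredicate delegate, AtomicInteger counter) {
    return v -> {
      counter.incrementAndGet();
      return delegate.test(v);
    };
  }

  /** Counts how often the "expensive" child runs over the values 0..999 for a given child order. */
  static int expensiveCalls(boolean selectiveFirst) {
    AtomicInteger calls = new AtomicInteger();
    IntPredicate selective = v -> v % 100 == 0;            // accepts 10 of the 1000 values
    IntPredicate expensive = counting(v -> v >= 0, calls); // accepts everything
    List<IntPredicate> children = selectiveFirst
        ? Arrays.asList(selective, expensive)
        : Arrays.asList(expensive, selective);
    for (int v = 0; v < 1000; v++) {
      matchesAll(children, v);
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("selective first: " + expensiveCalls(true));  // selective first: 10
    System.out.println("expensive first: " + expensiveCalls(false)); // expensive first: 1000
  }
}
```

With an unordered set, which of the two orders you get is up to the set's iteration order rather than the person writing the query.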

Ordering was working before #9608, which switched to unordered
sets to solve a different problem. This patch tries to solve that
problem a different way.

This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.

This patch also removes code from CalciteCnfHelper that is no longer
necessary because Filters.and and Filters.or are now doing the work.
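
To make the shared simplification more concrete, here is a minimal sketch of the kind of work an AND helper could do: flatten nested ANDs, drop duplicates while keeping first-seen order, and collapse literal true/false children. All types here (`Node`, `Literal`, `Leaf`, `And`) are hypothetical stand-ins, and the rules are an assumption based on the description above, not a copy of what `Filters.and` does.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not Druid code.
final class AndSimplifySketch {
  interface Node {}

  enum Literal implements Node { TRUE, FALSE }

  static final class Leaf implements Node {
    final String name;
    Leaf(String name) { this.name = name; }
    @Override public boolean equals(Object o) { return o instanceof Leaf && ((Leaf) o).name.equals(name); }
    @Override public int hashCode() { return name.hashCode(); }
    @Override public String toString() { return name; }
  }

  static final class And implements Node {
    final Set<Node> children;
    And(Collection<Node> children) { this.children = new LinkedHashSet<>(children); }
    @Override public boolean equals(Object o) { return o instanceof And && ((And) o).children.equals(children); }
    @Override public int hashCode() { return children.hashCode(); }
    @Override public String toString() { return "AND" + children; }
  }

  /** Builds a simplified AND of the given children, keeping their first-seen order. */
  static Node and(Node... filters) {
    if (filters.length == 0) {
      throw new IllegalArgumentException("and() needs at least one child");
    }
    Set<Node> flat = new LinkedHashSet<>();
    flatten(Arrays.asList(filters), flat);
    if (flat.contains(Literal.FALSE)) {
      return Literal.FALSE;              // one false child makes the whole AND false
    }
    flat.remove(Literal.TRUE);           // true children never change the result
    if (flat.isEmpty()) {
      return Literal.TRUE;               // every child was literally true
    }
    return flat.size() == 1 ? flat.iterator().next() : new And(flat);
  }

  /** Copies children into {@code out}, inlining the children of nested ANDs. */
  private static void flatten(Collection<Node> in, Set<Node> out) {
    for (Node node : in) {
      if (node instanceof And) {
        flatten(((And) node).children, out);
      } else {
        out.add(node);
      }
    }
  }

  public static void main(String[] args) {
    Node a = new Leaf("a"), b = new Leaf("b"), c = new Leaf("c");
    Node nested = new And(Arrays.<Node>asList(b, Literal.TRUE));
    System.out.println(and(a, nested, a, c)); // AND[a, b, c]
    System.out.println(and(a, Literal.FALSE)); // FALSE
  }
}
```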


gianm commented Jan 14, 2021

This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.

Note on this part: in theory, there is a performance risk here, since if toFilter() is called more often than optimize(), it means we'd do the simplification work more often. As far as I can tell, this isn't a big risk, since calls to toFilter() are generally either going through AbstractOptimizableDimFilter.toOptimizedFilter (which are cached) or are happening in places that are once per query, not once per segment (like FilteredAggregatorFactory's constructor).
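
A small sketch of the caching argument: if the conversion is reached through a memoizing wrapper, the simplification cost is paid once no matter how many callers ask for the filter. `CachedConversionSketch` is a hypothetical class for illustration only, with a `String` standing in for the converted filter; it is not how `AbstractOptimizableDimFilter` is actually written.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Druid code.
final class CachedConversionSketch {
  /** Counts how many times the (potentially costly) conversion really ran. */
  final AtomicInteger conversions = new AtomicInteger();

  private String cached; // stands in for the converted, simplified filter

  /** Uncached conversion: does the simplification work on every call. */
  String toFilter() {
    conversions.incrementAndGet();
    return "simplified-filter";
  }

  /** Cached conversion: runs toFilter() at most once and reuses the result. */
  synchronized String toOptimizedFilter() {
    if (cached == null) {
      cached = toFilter();
    }
    return cached;
  }

  public static void main(String[] args) {
    CachedConversionSketch filter = new CachedConversionSketch();
    for (int segment = 0; segment < 100; segment++) {
      filter.toOptimizedFilter(); // e.g. once per segment
    }
    System.out.println(filter.conversions.get()); // 1
  }
}
```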

Doing the above analysis made me feel like collapsing DimFilter and Filter would be a good idea.

)
);
final Filter expected = FilterTestUtils.and(
// The below OR filter could be eliminated because this filter also has

Contributor Author

@gianm gianm Jan 15, 2021

Note: these result changes were just re-orderings.

)
);
- final Set<Filter> expected = ImmutableSet.of(
+ final List<Filter> expected = ImmutableList.of(

Contributor Author

@gianm gianm Jan 15, 2021

Another note: these were also just reorderings.

new InDimFilter("regionIsoCode", ImmutableSet.of("CA"), null, null).toFilter()
)
)
new InDimFilter("countryIsoCode", ImmutableSet.of("US"), null, null).toFilter(),

Contributor Author

This rewrite got better!

@gianm gianm requested a review from jihoonson January 15, 2021 10:38
assert !filters.isEmpty();
// Original "filters" list must have been 100% literally-true filters.
return Optional.of(TrueFilter.instance());
} else if (filtersToUse.stream().anyMatch(filter -> filter instanceof FalseFilter)) {

Contributor

nit: You can avoid this anyMatch check by doing the FalseFilter optimization in flattenAndChildren by making it return a single FalseFilter

Contributor Author

I'm not sure exactly what you're suggesting, so I left this as-is.

assert !filters.isEmpty();
// Original "filters" list must have been 100% literally-false filters.
return Optional.of(FalseFilter.instance());
} else if (filtersToUse.stream().anyMatch(filter -> filter instanceof TrueFilter)) {

Contributor

nit: similar comment to line 518

// Order of children doesn't matter for equivalence.
final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters());
final Set<Filter> theirFilterSet = new HashSet<>(((BooleanFilter) that.filter).getFilters());
return ourFilterSet.equals(theirFilterSet);

Contributor

Does this handle nested BooleanFilters correctly?

Contributor Author

It handles them correctly in that it's not wrong. But, it isn't necessarily doing the best possible optimization. I edited it to use EquivalenceCheckedFilter recursively and added some more tests (see FiltersTest for the new ones).

Contributor Author

Never mind, I decided to take your advice and switch the List to a LinkedHashSet, which makes this EquivalenceCheckedFilter technique unnecessary.

ValueMatcher[] EMPTY_VALUE_MATCHER_ARRAY = new ValueMatcher[0];

- Set<Filter> getFilters();
+ List<Filter> getFilters();

Contributor

Could you add a javadoc to this function explaining why a list is used here: to preserve the order of the filters and so enable short-circuiting?

Writing out this comment made me think that perhaps we could achieve the same outcome by using a LinkedHashSet instead. That would make it clearer in the rest of the code that there are no duplicates in the filters.

Contributor Author

@gianm gianm Jan 20, 2021

I think doing the LinkedHashSet makes sense, so I went with that approach. It simplified the code a bit, and let me delete the EquivalenceCheckedFilter stuff.
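
For readers less familiar with `LinkedHashSet`, the short sketch below shows the two properties this approach relies on: iteration follows insertion order with duplicates dropped, while `equals` still follows the `Set` contract and ignores order. Strings stand in for filters, and `LinkedHashSetSketch` is a hypothetical name for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; strings stand in for Filter objects.
final class LinkedHashSetSketch {
  /** Collects children into a set that remembers first-insertion order. */
  static Set<String> children(String... names) {
    return new LinkedHashSet<>(Arrays.asList(names));
  }

  public static void main(String[] args) {
    Set<String> a = children("country = US", "region = CA", "country = US");
    Set<String> b = children("region = CA", "country = US");

    // Iteration keeps the order in which children were first added; the duplicate is dropped.
    System.out.println(new ArrayList<>(a)); // [country = US, region = CA]

    // Set equality and hash codes do not depend on order.
    System.out.println(a.equals(b));                  // true
    System.out.println(a.hashCode() == b.hashCode()); // true
  }
}
```

Because equality is already order-independent, no separate wrapper is needed to compare two AND or OR filters whose children differ only in order.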


if (filter instanceof BooleanFilter && filter.getClass().equals(that.filter.getClass())) {
// Order of children doesn't matter for equivalence.
final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters());

Contributor

Will your changes fix #10316? I am guessing your changes will make the situations described in that PR less likely, as a Set is being created only when two AndFilters/OrFilters are compared. Could we also compare the number of child filters and return early before creating a possibly expensive HashSet?

Contributor Author

We're back to a Set due to #10758 (comment), so this won't change much. Probably memoizing the hashCode is still the way to go.
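
A minimal sketch of what memoizing the hashCode could look like, combined with the cheap size comparison suggested above. `MemoizedHashSketch` is a hypothetical class with strings standing in for child filters; it assumes the children never change after construction and is not thread-safe as written.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not Druid code. Assumes children are immutable after construction.
final class MemoizedHashSketch {
  private final Set<String> children;
  private int hash;
  private boolean hashComputed;
  int hashComputations; // exposed only so the example can show the effect

  MemoizedHashSketch(String... children) {
    this.children = new LinkedHashSet<>(Arrays.asList(children));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MemoizedHashSketch)) {
      return false;
    }
    MemoizedHashSketch that = (MemoizedHashSketch) o;
    // Cheap checks first: differing sizes or differing cached hashes rule out equality.
    return children.size() == that.children.size()
        && hashCode() == that.hashCode()
        && children.equals(that.children);
  }

  @Override
  public int hashCode() {
    if (!hashComputed) {
      hash = children.hashCode(); // walks every child, so do it only once
      hashComputed = true;
      hashComputations++;
    }
    return hash;
  }

  public static void main(String[] args) {
    MemoizedHashSketch filter = new MemoizedHashSketch("a", "b", "c");
    filter.hashCode();
    filter.hashCode();
    System.out.println(filter.hashComputations); // 1
  }
}
```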

Contributor

@suneet-s suneet-s left a comment

LGTM

@suneet-s suneet-s merged commit 8b808c4 into apache:master Jan 20, 2021
@gianm gianm added this to the 0.21.0 milestone Jan 20, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
* Retain order of AND, OR filter children.

* Fixes for inspections.

* Fix tests.

* Back to a Set.
@gianm gianm deleted the filter-and-or-retain-order branch September 23, 2022 19:22