Search: Track most commonly used filters and cache them #8449

jpountz · 2014-11-12T11:13:59Z

Whether or not a filter should be cached is currently up to the filter parsers. For instance, we cache terms filters because they are costly to build and produce a result that is already cacheable, and do not cache geo-distance filters because they are usually expected to be applied to the position of a user of the system that keeps changing all the time.

We could make the default better by tracking usage statistics of the filters and decide to cache based on the cost of the filter and how often the filter is used.

The text was updated successfully, but these errors were encountered:

clintongormley · 2014-11-12T13:19:34Z

+1

mikemccand · 2014-11-12T14:35:09Z

+1

nik9000 · 2014-11-12T14:41:09Z

+1

Probably should be able to opt in/out of the behavior.
What about filters who's cache key is ridiculous without being overridden?
Might be cool to turn _cache support _cache as a number between 0 and 1 that the user can set closed to 1 to signal a priori knowledge about whether the filter is more likely to be reused. true would map to 1 (always cache) and false to 0 (never cache). I dunno. Maybe over complicated.

rjernst · 2014-11-14T21:33:51Z

Might be cool to turn _cache support _cache as a number between 0 and 1 that the user can set closed to 1 to signal a priori knowledge about whether the filter is more likely to be reused. true would map to 1 (always cache) and false to 0 (never cache). I dunno. Maybe over complicated.

That sounds too complicated. I think it would be better to have the default based on stats, and then have a special value, say "always" that would bypass the stats and ensure the filter is cached. Although I'm not sure how much we need the ability to override the stats based approach, since anything that should be cached should quickly rise statistically.

clintongormley · 2014-11-17T11:28:54Z

@rjernst @nik9000 I think the _cache setting should accept three parameters only: auto (default), true, false.

rjernst · 2014-11-19T19:37:38Z

@clintongormley I like that, defaults are the right place for this. And I'm ok with the "auto" terminology.

Up to now, all filters could be cached using the `_cache` flag that could be set to `true` or `false` and the default was set depending on the type of the `filter`. For instance, `script` filters are not cached by default while `terms` are. For some filters, the default is more complicated and eg. date range filters are cached unless they use `now` in a non-rounded fashion. This commit adds a 3rd option called `auto`, which becomes the default for all filters. So for all filters a cache wrapper will be returned, and the decision will be made at caching time, per-segment. Here is the default logic: - if there is already a cache entry for this filter in the current segment, then return the cache entry. - else if the doc id set cannot iterate (eg. script filter) then do not cache. - else if the doc id set is already cacheable and it has been used twice or more in the last 1000 filters then cache it. - else if the filter is costly (eg. multi-term) and has been used twice or more in the last 1000 filters then cache it. - else if the doc id set is not cacheable and it has been used 5 times or more in the last 1000 filters, then load it into a cacheable set and cache it. - else return the uncached set. So for instance geo-distance filters and script filters are going to use this new default and are not going to be cached because of their iterators. Similarly, date range filters are going to use this default all the time, but it is very unlikely that those that use `now` in a not rounded fashion will get reused so in practice they won't be cached. `terms`, `range`, ... filters produce cacheable doc id sets with good iterators so they will be cached as soon as they have been used twice. Filters that don't produce cacheable doc id sets such as the `term` filter will need to be used 5 times before being cached. This ensures that we don't spend CPU iterating over all documents matching such filters unless we have good evidence of reuse. One last interesting point about this change is that it also applies to compound filters. So if you keep on repeating the same `bool` filter with the same underlying clauses, it will be cached on its own while up to now it used to never be cached by default. `_cache: true` has been changed to only cache on large segments, in order to not pollute the cache since small segments should not be the bottleneck anyway. However `_cache: false` still has the same semantics. Close elastic#8449

jpountz added >enhancement v2.0.0-beta1 :Search/Search Search-related issues that do not fall into other categories labels Nov 12, 2014

jpountz mentioned this issue Nov 12, 2014

Core: Do not cache term filters by default. #7583

Closed

jpountz self-assigned this Nov 14, 2014

jpountz mentioned this issue Nov 20, 2014

Filter cache: add a _cache: auto option and make it the default. #8573

Closed

jpountz closed this as completed in ce11e0e Dec 18, 2014

clintongormley added the release highlight label Dec 19, 2014

jpountz added :Search/Search Search-related issues that do not fall into other categories and removed :Search/Search Search-related issues that do not fall into other categories v2.0.0-beta1 release highlight >enhancement labels Dec 29, 2014

markwalkom mentioned this issue Jul 2, 2015

Provide access to filter cache use via stats #11987

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search: Track most commonly used filters and cache them #8449

Search: Track most commonly used filters and cache them #8449

jpountz commented Nov 12, 2014

clintongormley commented Nov 12, 2014

mikemccand commented Nov 12, 2014

nik9000 commented Nov 12, 2014

rjernst commented Nov 14, 2014

clintongormley commented Nov 17, 2014

rjernst commented Nov 19, 2014

Search: Track most commonly used filters and cache them #8449

Search: Track most commonly used filters and cache them #8449

Comments

jpountz commented Nov 12, 2014

clintongormley commented Nov 12, 2014

mikemccand commented Nov 12, 2014

nik9000 commented Nov 12, 2014

rjernst commented Nov 14, 2014

clintongormley commented Nov 17, 2014

rjernst commented Nov 19, 2014