forked from elastic/elasticsearch
Filter cache: add a `_cache: auto` option and make it the default.
Up to now, every filter could be cached using the `_cache` flag, which could be set to `true` or `false`, with a default that depended on the type of the filter. For instance, `script` filters are not cached by default while `terms` filters are. For some filters the default is more complicated: e.g. date range filters are cached unless they use `now` in a non-rounded fashion.

This commit adds a third option called `auto`, which becomes the default for all filters. A cache wrapper is now returned for every filter, and the decision is made at caching time, per segment. Here is the default logic:
- if there is already a cache entry for this filter in the current segment, then return the cache entry.
- else if the doc id set cannot iterate (e.g. script filter), then do not cache.
- else if the doc id set is already cacheable and it has been used twice or more in the last 1000 filters, then cache it.
- else if the filter is costly (e.g. multi-term) and has been used twice or more in the last 1000 filters, then cache it.
- else if the doc id set is not cacheable and it has been used 5 times or more in the last 1000 filters, then load it into a cacheable set and cache it.
- else return the uncached set.

So, for instance, geo-distance filters and script filters use this new default and are not cached because of their iterators. Similarly, date range filters use this default all the time, but those that use `now` in a non-rounded fashion are very unlikely to be reused, so in practice they won't be cached. `terms`, `range`, ... filters produce cacheable doc id sets with good iterators, so they are cached as soon as they have been used twice. Filters that don't produce cacheable doc id sets, such as the `term` filter, need to be used 5 times before being cached. This ensures that we don't spend CPU iterating over all documents matching such filters unless we have good evidence of reuse.
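The default logic above can be read as a simple decision cascade. The following is a minimal, self-contained sketch of that cascade, not the code from this commit: the class name, the `Decision` enum, the boolean inputs and the `History` helper are all hypothetical stand-ins, and the thresholds (2, 2, 5) are the defaults quoted in the commit message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of the `auto` decision cascade described above.
final class AutoCacheDecisionSketch {

    enum Decision { RETURN_CACHED, DO_NOT_CACHE, CACHE, RETURN_UNCACHED }

    // Hypothetical bounded history of the most recently used filter keys.
    static final class History {
        private final int capacity;
        private final Deque<String> recent = new ArrayDeque<>();

        History(int capacity) {
            this.capacity = capacity;
        }

        // Record one use of a filter, evicting the oldest entry when full.
        void record(String filterKey) {
            if (recent.size() == capacity) {
                recent.removeFirst();
            }
            recent.addLast(filterKey);
        }

        // Number of times the filter appears in the retained history.
        int frequency(String filterKey) {
            int count = 0;
            for (String key : recent) {
                if (key.equals(filterKey)) {
                    count++;
                }
            }
            return count;
        }
    }

    // Applies the rules in the same order as the list above.
    static Decision decide(boolean alreadyCached, boolean canIterate,
                           boolean setIsCacheable, boolean filterIsCostly,
                           int frequency) {
        if (alreadyCached) {
            return Decision.RETURN_CACHED;
        }
        if (!canIterate) {
            return Decision.DO_NOT_CACHE; // e.g. script filters
        }
        if (setIsCacheable && frequency >= 2) {
            return Decision.CACHE;
        }
        if (filterIsCostly && frequency >= 2) {
            return Decision.CACHE; // e.g. multi-term filters
        }
        if (!setIsCacheable && frequency >= 5) {
            return Decision.CACHE; // load into a cacheable set first
        }
        return Decision.RETURN_UNCACHED;
    }
}
```

In this sketch the frequency would come from `History.frequency(...)` with a capacity of 1000, which mirrors "the last 1000 filters" in the message; the real policy tracks filters rather than string keys.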
One last interesting point about this change is that it also applies to compound filters. So if you keep on repeating the same `bool` filter with the same underlying clauses, it will be cached on its own, while up to now it was never cached by default.

`_cache: true` has been changed to only cache on large segments, in order to not pollute the cache, since small segments should not be the bottleneck anyway. However, `_cache: false` still has the same semantics.

Close elastic#8449
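As an illustration of what "only cache on large segments" can mean, here is a minimal sketch that assumes a size-ratio test in the spirit of the `min_segment_size_ratio` setting introduced by this commit. The class and method names are hypothetical, and the threshold actually applied for `_cache: true` is not stated in the commit message.

```java
// Hypothetical sketch: decide whether a segment is large enough to be worth caching.
final class LargeSegmentCheckSketch {

    // Returns true when the segment holds at least minSizeRatio of the
    // documents of the whole index (both counts expressed as maxDoc).
    static boolean isLargeEnough(int segmentMaxDoc, int indexMaxDoc, float minSizeRatio) {
        if (indexMaxDoc <= 0) {
            return false; // empty index: nothing worth caching
        }
        final float sizeRatio = (float) segmentMaxDoc / indexMaxDoc;
        return sizeRatio >= minSizeRatio;
    }
}
```

With a ratio of 0.01, a segment of 5 documents in an index of 1000 would be skipped, while a segment of 500 documents would qualify.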
Showing 81 changed files with 607 additions and 1,166 deletions.
103 changes: 103 additions & 0 deletions
src/main/java/org/elasticsearch/index/cache/filter/AutoFilterCachingPolicy.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.cache.filter; | ||
|
||
import org.apache.lucene.index.LeafReaderContext; | ||
import org.apache.lucene.search.DocIdSet; | ||
import org.apache.lucene.search.DocIdSetIterator; | ||
import org.apache.lucene.search.Filter; | ||
import org.apache.lucene.search.FilterCachingPolicy; | ||
import org.apache.lucene.search.UsageTrackingFilterCachingPolicy; | ||
import org.elasticsearch.common.inject.Inject; | ||
import org.elasticsearch.common.lucene.docset.DocIdSets; | ||
import org.elasticsearch.common.settings.ImmutableSettings; | ||
import org.elasticsearch.common.settings.Settings; | ||
import org.elasticsearch.index.AbstractIndexComponent; | ||
import org.elasticsearch.index.Index; | ||
import org.elasticsearch.index.settings.IndexSettings; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* This class is a wrapper around {@link UsageTrackingFilterCachingPolicy} | ||
* which wires parameters through index settings and makes sure to not | ||
* cache {@link DocIdSet}s which have a {@link DocIdSets#isBroken(DocIdSetIterator) broken} | ||
* iterator. | ||
*/ | ||
public class AutoFilterCachingPolicy extends AbstractIndexComponent implements FilterCachingPolicy { | ||
|
||
// These settings don't have the purpose of being documented. They are only here so that | ||
// if anyone ever hits an issue with elasticsearch that is due to the value of one of these | ||
// parameters, then it might be possible to temporarily work around the issue without having | ||
// to wait for a new release | ||
|
||
// number of times a costly filter (e.g. multi-term) should be seen before its doc id sets are cached | ||
public static final String MIN_FREQUENCY_COSTLY = "index.cache.filter.policy.min_frequency.costly"; | ||
// number of times a filter that produces cacheable doc id sets should be seen before the doc id sets are cached | ||
public static final String MIN_FREQUENCY_CACHEABLE = "index.cache.filter.policy.min_frequency.cacheable"; | ||
// same for filters that produce doc id sets that are not directly cacheable | ||
public static final String MIN_FREQUENCY_OTHER = "index.cache.filter.policy.min_frequency.other"; | ||
// minimum size of segments that should be cached, relative to the size of the whole index | ||
public static final String MIN_SEGMENT_SIZE_RATIO = "index.cache.filter.policy.min_segment_size_ratio"; | ||
// size of the history to keep for filters. A filter will be cached if it has been seen more than a given | ||
// number of times (depending on the filter, the segment and the produced DocIdSet) in the most | ||
// ${history_size} recently used filters | ||
public static final String HISTORY_SIZE = "index.cache.filter.policy.history_size"; | ||
|
||
public static final Settings AGGRESSIVE_CACHING_SETTINGS = ImmutableSettings.builder() | ||
.put(MIN_FREQUENCY_CACHEABLE, 1) | ||
.put(MIN_FREQUENCY_COSTLY, 1) | ||
.put(MIN_FREQUENCY_OTHER, 1) | ||
.put(MIN_SEGMENT_SIZE_RATIO, 0.000000001f) | ||
.build(); | ||
|
||
private final FilterCachingPolicy in; | ||
|
||
@Inject | ||
public AutoFilterCachingPolicy(Index index, @IndexSettings Settings indexSettings) { | ||
super(index, indexSettings); | ||
final int historySize = indexSettings.getAsInt(HISTORY_SIZE, 1000); | ||
// aggressively cache filters that produce sets that are already cacheable, | ||
// i.e. once the filter has been used twice or more among the 1000 most | ||
// recently used filters | ||
final int minFrequencyCacheable = indexSettings.getAsInt(MIN_FREQUENCY_CACHEABLE, 2); | ||
// aggressively cache filters whose getDocIdSet method is costly | ||
final int minFrequencyCostly = indexSettings.getAsInt(MIN_FREQUENCY_COSTLY, 2); | ||
// be a bit less aggressive when the produced doc id sets are not cacheable | ||
final int minFrequencyOther = indexSettings.getAsInt(MIN_FREQUENCY_OTHER, 5); | ||
final float minSegmentSizeRatio = indexSettings.getAsFloat(MIN_SEGMENT_SIZE_RATIO, 0.01f); | ||
in = new UsageTrackingFilterCachingPolicy(minSegmentSizeRatio, historySize, minFrequencyCostly, minFrequencyCacheable, minFrequencyOther); | ||
} | ||
|
||
@Override | ||
public void onCache(Filter filter) { | ||
in.onCache(filter); | ||
} | ||
|
||
@Override | ||
public boolean shouldCache(Filter filter, LeafReaderContext context, DocIdSet set) throws IOException { | ||
if (set != null && DocIdSets.isBroken(set.iterator())) { | ||
// O(maxDoc) to cache, no thanks. | ||
return false; | ||
} | ||
|
||
return in.shouldCache(filter, context, set); | ||
} | ||
|
||
} |
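To make the defaults wired in the constructor easier to see at a glance, here is a minimal, self-contained sketch of how the five settings resolve. It uses a plain `Map<String, String>` as a hypothetical stand-in for the Elasticsearch `Settings` object, and the class name and helper methods are assumptions for illustration only; the setting keys and default values are the ones shown in the diff.

```java
import java.util.Map;

// Hypothetical stand-in showing how the policy settings fall back to their defaults.
final class FilterPolicySettingsSketch {

    static final String MIN_FREQUENCY_COSTLY = "index.cache.filter.policy.min_frequency.costly";
    static final String MIN_FREQUENCY_CACHEABLE = "index.cache.filter.policy.min_frequency.cacheable";
    static final String MIN_FREQUENCY_OTHER = "index.cache.filter.policy.min_frequency.other";
    static final String MIN_SEGMENT_SIZE_RATIO = "index.cache.filter.policy.min_segment_size_ratio";
    static final String HISTORY_SIZE = "index.cache.filter.policy.history_size";

    final int historySize;
    final int minFrequencyCacheable;
    final int minFrequencyCostly;
    final int minFrequencyOther;
    final float minSegmentSizeRatio;

    FilterPolicySettingsSketch(Map<String, String> settings) {
        // Defaults mirror the values passed to getAsInt/getAsFloat in the diff.
        historySize = getAsInt(settings, HISTORY_SIZE, 1000);
        minFrequencyCacheable = getAsInt(settings, MIN_FREQUENCY_CACHEABLE, 2);
        minFrequencyCostly = getAsInt(settings, MIN_FREQUENCY_COSTLY, 2);
        minFrequencyOther = getAsInt(settings, MIN_FREQUENCY_OTHER, 5);
        minSegmentSizeRatio = getAsFloat(settings, MIN_SEGMENT_SIZE_RATIO, 0.01f);
    }

    // Returns the parsed value for key, or the default when the key is absent.
    private static int getAsInt(Map<String, String> settings, String key, int defaultValue) {
        final String value = settings.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    private static float getAsFloat(Map<String, String> settings, String key, float defaultValue) {
        final String value = settings.get(key);
        return value == null ? defaultValue : Float.parseFloat(value);
    }
}
```

Setting all three `min_frequency` keys to 1 and the segment size ratio close to zero in such a map corresponds to what `AGGRESSIVE_CACHING_SETTINGS` does in the diff.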