Cut over to the Lucene filter cache #10897

jpountz · 2015-04-30T09:54:55Z

This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:

custom cache keys (_cache_key) are unsupported
decisions are made internally and can't be overridden by users ('_cache`)
not only filters can be cached but also all queries that do not need scores
parent/child queries can now be cached, however cached entries are only
valid for the current top-level reader so in practice it will likely only
be used in practice on read-only indices
the cache deduplicates filters, which plays nicer with large keys (eg. terms)
better stats: we already had ram usage and evictions, but now also hit count,
miss count, lookup count, number of cached doc id sets and current number of
doc id sets in the cache
dynamically changing the filter cache size is not supported anymore

Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).

Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.

jpountz · 2015-04-30T09:58:22Z

The PR is large so feel free to ping me if you would like to discuss it.

rmuir · 2015-04-30T12:36:14Z

src/main/java/org/elasticsearch/common/lucene/IndexCacheableQuery.java

+    private Object readerCacheKey;
+
+    @Override
+    public Query rewrite(IndexReader reader) throws IOException {


Can this be final? If a subclass overrides this but does not call super, its caching will break.

Some parent/child queries need to plug in other rewrite rules. I can try to make it final and add a protected doRewrite method for these queries to do what they need.

I see. Not sure if its worth it, if doRewrite makes things complicated in its own way, given that createWeight is final and it will catch the issue.

rmuir · 2015-04-30T14:17:32Z

look good.

I pushed a few cleanups.

I'm not happy with the complexity surrounding ShardCoreKeyMap etc: it is clearly not worth it.
I am happy for this to be simplified in a follow-up, but I think in general we have to stop doing
stuff like this in the name of back compat / inertia / etc. The codebase is just too large and we should
apply software engineering and keep things simple when its the right thing to do.

Related to that, I hit a test fail while playing:
mvn test -Dtests.class="*ShardCoreKeyMapTests" -Dtests.method=testBasics -Dtests.seed=D176C7AD20084F10

I think the 10 places where we do 'new IndexSearcher' should be reviewed and always have an explicit cache set
or not set in a followup as well (its rather unrelated to the change at hand).

jpountz · 2015-04-30T16:03:22Z

@rmuir I pushed more commits to address your comments.

I tend to agree with the complexity of ShardCoreKeyMap, it needs to be balanced with the benefit of having per-shard stats. The reason why I tried to hard to keep having per-shard stats is that doing otherwise would require an important refactoring of our monitoring APIs.

rmuir · 2015-04-30T16:07:10Z

Changes look great, +1

Yeah, as i said about the map, lets defer it to another issue. This one makes so much progress on its own.

This removes Elasticsearch's filter cache and uses Lucene's instead. It has some implications: - custom cache keys (`_cache_key`) are unsupported - decisions are made internally and can't be overridden by users ('_cache`) - not only filters can be cached but also all queries that do not need scores - parent/child queries can now be cached, however cached entries are only valid for the current top-level reader so in practice it will likely only be used on read-only indices - the cache deduplicates filters, which plays nicer with large keys (eg. `terms`) - better stats: we already had ram usage and evictions, but now also hit count, miss count, lookup count, number of cached doc id sets and current number of doc id sets in the cache - dynamically changing the filter cache size is not supported anymore Internally, an important change is that it removes the NoCacheFilter infrastructure in favour of making Query.rewrite specializing the query for the current reader so that it will only be cached on this reader (look for IndexCacheableQuery). Note that consuming filters with the query API (createWeight/scorer) instead of the filter API (getDocIdSet) is important for parent/child queries because otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per segment while relations might be cross segments.

Core: Cut over to the Lucene filter cache. Close #10897

jpountz added v2.0.0-beta1 review labels Apr 30, 2015

This was referenced Apr 30, 2015

Core: Cut over to the Lucene filter cache. #9176

Closed

Search: reduce dependencies on the Filter API. #10731

Closed

jpountz added the >breaking label Apr 30, 2015

rmuir reviewed Apr 30, 2015
View reviewed changes

rjernst mentioned this pull request Apr 30, 2015

Upgrade lucene snapshot to r1677039 #10908

Merged

jpountz mentioned this pull request May 1, 2015

Does not honor cache sizes changed on the fly #10860

Closed

jpountz force-pushed the fix/nocache branch from 9009c99 to b72f27a Compare May 4, 2015 07:14

jpountz added a commit that referenced this pull request May 4, 2015

Merge pull request #10897 from jpountz/fix/nocache

3409994

Core: Cut over to the Lucene filter cache. Close #10897

jpountz merged commit 3409994 into elastic:master May 4, 2015

kevinkluge removed the review label May 4, 2015

jpountz deleted the fix/nocache branch May 4, 2015 07:14

clintongormley changed the title ~~Core: Cut over to the Lucene filter cache.~~ Cut over to the Lucene filter cache Jun 6, 2015

clintongormley added the :Search/Search Search-related issues that do not fall into other categories label Jun 6, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cut over to the Lucene filter cache #10897

Cut over to the Lucene filter cache #10897

jpountz commented Apr 30, 2015

jpountz commented Apr 30, 2015

rmuir Apr 30, 2015

jpountz Apr 30, 2015

rmuir Apr 30, 2015

rmuir commented Apr 30, 2015

jpountz commented Apr 30, 2015

rmuir commented Apr 30, 2015

Cut over to the Lucene filter cache #10897

Cut over to the Lucene filter cache #10897

Conversation

jpountz commented Apr 30, 2015

jpountz commented Apr 30, 2015

rmuir Apr 30, 2015

Choose a reason for hiding this comment

jpountz Apr 30, 2015

Choose a reason for hiding this comment

rmuir Apr 30, 2015

Choose a reason for hiding this comment

rmuir commented Apr 30, 2015

jpountz commented Apr 30, 2015

rmuir commented Apr 30, 2015