Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cut over to the Lucene filter cache #10897

Merged
merged 1 commit into from May 4, 2015
Merged

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Apr 30, 2015

This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:

  • custom cache keys (_cache_key) are unsupported
  • decisions are made internally and can't be overridden by users ('_cache`)
  • not only filters can be cached but also all queries that do not need scores
  • parent/child queries can now be cached, however cached entries are only
    valid for the current top-level reader so in practice it will likely only
    be used in practice on read-only indices
  • the cache deduplicates filters, which plays nicer with large keys (eg. terms)
  • better stats: we already had ram usage and evictions, but now also hit count,
    miss count, lookup count, number of cached doc id sets and current number of
    doc id sets in the cache
  • dynamically changing the filter cache size is not supported anymore

Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).

Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.

@jpountz
Copy link
Contributor Author

jpountz commented Apr 30, 2015

The PR is large so feel free to ping me if you would like to discuss it.

private Object readerCacheKey;

@Override
public Query rewrite(IndexReader reader) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be final? If a subclass overrides this but does not call super, its caching will break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some parent/child queries need to plug in other rewrite rules. I can try to make it final and add a protected doRewrite method for these queries to do what they need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Not sure if its worth it, if doRewrite makes things complicated in its own way, given that createWeight is final and it will catch the issue.

@rmuir
Copy link
Contributor

rmuir commented Apr 30, 2015

look good.

I pushed a few cleanups.

I'm not happy with the complexity surrounding ShardCoreKeyMap etc: it is clearly not worth it.
I am happy for this to be simplified in a follow-up, but I think in general we have to stop doing
stuff like this in the name of back compat / inertia / etc. The codebase is just too large and we should
apply software engineering and keep things simple when its the right thing to do.

Related to that, I hit a test fail while playing:
mvn test -Dtests.class="*ShardCoreKeyMapTests" -Dtests.method=testBasics -Dtests.seed=D176C7AD20084F10

I think the 10 places where we do 'new IndexSearcher' should be reviewed and always have an explicit cache set
or not set in a followup as well (its rather unrelated to the change at hand).

@jpountz
Copy link
Contributor Author

jpountz commented Apr 30, 2015

@rmuir I pushed more commits to address your comments.

I tend to agree with the complexity of ShardCoreKeyMap, it needs to be balanced with the benefit of having per-shard stats. The reason why I tried to hard to keep having per-shard stats is that doing otherwise would require an important refactoring of our monitoring APIs.

@rmuir
Copy link
Contributor

rmuir commented Apr 30, 2015

Changes look great, +1

Yeah, as i said about the map, lets defer it to another issue. This one makes so much progress on its own.

This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:
 - custom cache keys (`_cache_key`) are unsupported
 - decisions are made internally and can't be overridden by users ('_cache`)
 - not only filters can be cached but also all queries that do not need scores
 - parent/child queries can now be cached, however cached entries are only
   valid for the current top-level reader so in practice it will likely only
   be used on read-only indices
 - the cache deduplicates filters, which plays nicer with large keys (eg. `terms`)
 - better stats: we already had ram usage and evictions, but now also hit count,
   miss count, lookup count, number of cached doc id sets and current number of
   doc id sets in the cache
 - dynamically changing the filter cache size is not supported anymore

Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).

Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.
jpountz added a commit that referenced this pull request May 4, 2015
Core: Cut over to the Lucene filter cache.

Close #10897
@jpountz jpountz merged commit 3409994 into elastic:master May 4, 2015
@kevinkluge kevinkluge removed the review label May 4, 2015
@jpountz jpountz deleted the fix/nocache branch May 4, 2015 07:14
@clintongormley clintongormley changed the title Core: Cut over to the Lucene filter cache. Cut over to the Lucene filter cache Jun 6, 2015
@clintongormley clintongormley added the :Search/Search Search-related issues that do not fall into other categories label Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Search/Search Search-related issues that do not fall into other categories v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants