
Fix memory leak when percolating with nested documents #6578

Merged
merged 1 commit into elastic:master from bugs/nested-percolator2 on Jul 1, 2014

Conversation

martijnvg
Member

The percolator uses a non-segment reader implementation (MemoryIndexReader). This prevents associated cache entries from being cleared automatically when the reader closes, which is why the percolator removes the cache entries (filter cache, field data) manually.

However, when percolating a document with nested objects, a multi reader is used that wraps a MemoryIndexReader for each nested object. Cache entries use the leaves as keys in the filter / field data cache, but the percolator clears using the top-level multi reader. As a result, cache entries are never evicted, eventually resulting in an OOM.
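A tiny self-contained sketch of why the clear never hits (plain Java maps and placeholder key objects, not the real cache classes): entries are stored under the leaf readers' keys, but eviction is attempted with the top-level multi reader's key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeafKeyMismatchSketch {

    // cache entries are keyed by the *leaf* readers (one MemoryIndexReader
    // per nested object), the way Lucene's filter / field data caches key them
    final Map<Object, String> cache = new HashMap<>();

    void put(List<Object> leafKeys) {
        for (Object leafKey : leafKeys) {
            cache.put(leafKey, "cached filter bits");
        }
    }

    // the percolator cleared with the *top-level* multi reader's key,
    // which never matches any leaf key, so nothing is evicted
    void clear(Object topLevelKey) {
        cache.remove(topLevelKey);
    }
}
```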

The memory leak is fixed by the following changes:

  • Not allowing caching for non-segment reader implementations.
  • The percolator never caches anything associated with the in-memory index in which the document being percolated resides.
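A minimal, self-contained sketch of the first change (hypothetical stand-in types, not the real Lucene/Elasticsearch classes): filters are only cached for readers backed by an on-disk segment, because only those fire a core-closed callback that lets the cache evict its entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FilterCacheGuardSketch {

    interface AtomicReader { Object getCoreCacheKey(); }

    // stand-in for a reader over a real on-disk segment
    static final class SegmentReader implements AtomicReader {
        private final Object coreKey = new Object();
        public Object getCoreCacheKey() { return coreKey; }
    }

    // stand-in for the percolator's in-memory reader (MemoryIndexReader)
    static final class MemoryIndexReader implements AtomicReader {
        private final Object coreKey = new Object();
        public Object getCoreCacheKey() { return coreKey; }
    }

    final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Only segment-backed readers are safe to cache against. */
    static boolean shouldCache(AtomicReader reader) {
        return reader instanceof SegmentReader;
    }

    String getDocIdSet(AtomicReader reader, String filterKey) {
        if (!shouldCache(reader)) {
            // an entry keyed on a short-lived in-memory reader would never
            // be evicted when that reader closes, so compute uncached
            return compute(filterKey);
        }
        return cache.computeIfAbsent(reader.getCoreCacheKey() + "/" + filterKey,
                k -> compute(filterKey));
    }

    private static String compute(String filterKey) {
        return "docs:" + filterKey;
    }
}
```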

@martijnvg martijnvg mentioned this pull request Jun 20, 2014
@tiran

tiran commented Jun 21, 2014

Martijn's patch fixes a critical production issue that prevents us from updating from 0.90 to 1.2. We are using nested documents and percolators a lot. With 1.2.1, memory consumption on our integration system explodes to more than 8 GB RSS in a matter of minutes, even though we ran our tests with just a small fraction of our documents.

We have cherry-picked your commits on top of the ES 1.2 branch and deployed a 1.2.2-SNAPSHOT on our integration server. Memory consumption has been holding steady at 2.4 GB for 900k docs and a 700 MB index for more than an hour.

Thanks a lot!

@julianhille

Is there an ETA for this to be fixed in an official release? We have to decide whether to do this on our own.
Thank you for any help / information.

@kimchy
Member

kimchy commented Jun 23, 2014

We still need to review it. Once it's in, releasing 1.2.2 is relatively simple; it will be released based on the urgency of the issues found, so we'll take your input into account!

@areek areek unassigned rmuir Jun 23, 2014
@@ -137,7 +145,7 @@ public Filter cache(Filter filterToCache) {

private final Filter filter;

private final WeightedFilterCache cache;
private final WeightedFilterCache cache ;
Contributor


extra space?

@s1monw
Contributor

s1monw commented Jun 24, 2014

I added some comments! Good change @martijnvg

@s1monw
Contributor

s1monw commented Jun 24, 2014

Oh, can you please label it and put review back once you have changes?

@martijnvg martijnvg added review and removed review labels Jun 25, 2014
@martijnvg
Member Author

@s1monw Thanks for reviewing it, I updated the PR.

@@ -157,12 +165,11 @@ public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws
DocIdSet cacheValue = innerCache.getIfPresent(cacheKey);
if (cacheValue == null) {
if (!cache.seenReaders.containsKey(context.reader().getCoreCacheKey())) {
Boolean previous = cache.seenReaders.putIfAbsent(context.reader().getCoreCacheKey(), Boolean.TRUE);
SegmentReader segmentReader = SegmentReaderUtils.segmentReader(context.reader());
Contributor


What I meant here was that instead of casting to a SegmentReader at all, you should do this:

Boolean previous = cache.seenReaders.putIfAbsent(context.reader().getCoreCacheKey(), Boolean.TRUE);
if (previous == null) {
    // we add a core closed listener only, for non core IndexReaders we rely on clear being called (percolator for example)
    SegmentReaderUtils.registerCoreListener(context.getReader(), cache); 
}

We fixed SegmentReaderUtils.registerCoreListener(AtomicReader reader, SegmentReader.CoreClosedListener listener) in Lucene 4 to work with atomic readers too, so we get that upgrade for free.

Member Author


The reason I didn't use SegmentReaderUtils.registerCoreListener(context.getReader(), cache); is that I was afraid that, if the validation in that method fails, the core cache key would end up in the seenReaders map while no listener is registered for it. Is that a valid concern?
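The concern can be made concrete with a small, self-contained sketch (hypothetical registrar interface, not the real SegmentReaderUtils API): if registration throws after putIfAbsent has succeeded, the key is stranded in seenReaders unless it is rolled back.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SeenReadersSketch {

    /** Hypothetical stand-in for SegmentReaderUtils.registerCoreListener. */
    interface CoreListenerRegistrar {
        void register(Object coreCacheKey);
    }

    final ConcurrentHashMap<Object, Boolean> seenReaders = new ConcurrentHashMap<>();

    void track(Object coreCacheKey, CoreListenerRegistrar registrar) {
        Boolean previous = seenReaders.putIfAbsent(coreCacheKey, Boolean.TRUE);
        if (previous == null) {
            try {
                registrar.register(coreCacheKey);
            } catch (RuntimeException e) {
                // without this rollback the key would stay in seenReaders
                // forever, with no close listener to ever remove it
                seenReaders.remove(coreCacheKey);
                throw e;
            }
        }
    }
}
```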

Member Author


Once #6623 is in this wouldn't be an issue, but in 1.2.x this could in theory leave core cache keys in the seenReaders map.

Contributor


Yeah, but then please fix this in 1.2.x.

Member Author


Makes sense, I will do that in the 1.2 branch only then.


@s1monw
Contributor

s1monw commented Jun 26, 2014

Left one comment; other than that it looks good.

@s1monw s1monw removed the review label Jun 26, 2014
…the field data caches.

Percolator: Never cache filters and field data in percolator for the percolator query parsing part.

Closes elastic#6553
@martijnvg martijnvg merged commit ec74a7e into elastic:master Jul 1, 2014
@martijnvg martijnvg changed the title Fix memory leak when percolating with nested documents Percolator: Fix memory leak when percolating with nested documents Jul 2, 2014
@martijnvg martijnvg deleted the bugs/nested-percolator2 branch May 18, 2015 23:31
@clintongormley clintongormley added the :Search/Percolator Reverse search: find queries that match a document label Jun 7, 2015
@clintongormley clintongormley changed the title Percolator: Fix memory leak when percolating with nested documents Fix memory leak when percolating with nested documents Jun 7, 2015
Labels
>bug :Search/Percolator Reverse search: find queries that match a document v1.2.2 v1.3.0 v2.0.0-beta1

7 participants