
Fix memory leak when percolating with nested documents #6578

Merged
merged 1 commit into elastic:master from bugs/nested-percolator2 on Jul 1, 2014

Conversation

martijnvg
Member

The percolator uses a non-segment reader implementation (MemoryIndexReader). This prevents associated cache entries from being cleared automatically when the reader closes, which is why the percolator removes the cache entries (filter cache, field data) manually.

However, when percolating a document with nested objects, a multi reader is used that wraps a MemoryIndexReader for each nested object. Cache entries use the leaves as keys in the filter / field data cache, but the percolator clears using the top-level multi reader. As a result, cache entries are never evicted, eventually resulting in an OOM.
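A tiny self-contained sketch of why the clear never hits (plain Java maps and placeholder key objects, not the real cache classes): entries are stored under the leaf readers' keys, but eviction is attempted with the top-level multi reader's key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeafKeyMismatchSketch {

    // cache entries are keyed by the *leaf* readers (one MemoryIndexReader
    // per nested object), the way Lucene's filter / field data caches key them
    final Map<Object, String> cache = new HashMap<>();

    void put(List<Object> leafKeys) {
        for (Object leafKey : leafKeys) {
            cache.put(leafKey, "cached filter bits");
        }
    }

    // the percolator cleared with the *top-level* multi reader's key,
    // which never matches any leaf key, so nothing is evicted
    void clear(Object topLevelKey) {
        cache.remove(topLevelKey);
    }
}
```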

The memory leak is fixed by the following changes:

  • Not allowing caching for non-segment reader implementations.
  • The percolator never caches anything associated with the in-memory index in which the document being percolated resides.
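A minimal, self-contained sketch of the first change (hypothetical stand-in types, not the real Lucene/Elasticsearch classes): filters are only cached for readers backed by an on-disk segment, because only those fire a core-closed callback that lets the cache evict its entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FilterCacheGuardSketch {

    interface AtomicReader { Object getCoreCacheKey(); }

    // stand-in for a reader over a real on-disk segment
    static final class SegmentReader implements AtomicReader {
        private final Object coreKey = new Object();
        public Object getCoreCacheKey() { return coreKey; }
    }

    // stand-in for the percolator's in-memory reader (MemoryIndexReader)
    static final class MemoryIndexReader implements AtomicReader {
        private final Object coreKey = new Object();
        public Object getCoreCacheKey() { return coreKey; }
    }

    final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Only segment-backed readers are safe to cache against. */
    static boolean shouldCache(AtomicReader reader) {
        return reader instanceof SegmentReader;
    }

    String getDocIdSet(AtomicReader reader, String filterKey) {
        if (!shouldCache(reader)) {
            // an entry keyed on a short-lived in-memory reader would never
            // be evicted when that reader closes, so compute uncached
            return compute(filterKey);
        }
        return cache.computeIfAbsent(reader.getCoreCacheKey() + "/" + filterKey,
                k -> compute(filterKey));
    }

    private static String compute(String filterKey) {
        return "docs:" + filterKey;
    }
}
```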

@martijnvg martijnvg mentioned this pull request Jun 20, 2014
@tiran

tiran commented Jun 21, 2014

Martijn's patch fixes a critical production issue that prevents us from updating from 0.90 to 1.2. We are using nested documents and percolators a lot. With 1.2.1, memory consumption on our integration system explodes to more than 8 GB RSS in a matter of minutes, even though we ran our tests with just a small fraction of our documents.

We have cherry-picked your commits on top of the ES 1.2 branch and deployed a 1.2.2-SNAPSHOT on our integration server. Memory consumption has been holding steady at 2.4 GB for 900k docs and a 700 MB index for more than an hour.

Thanks a lot!

@julianhille

Is there an ETA for this to be fixed in an official release? We have to decide whether to do this on our own.
Thank you for any help / information.

@kimchy
Member

kimchy commented Jun 23, 2014

We still need to review it. Once it's in, releasing 1.2.2 is relatively simple; it will be released based on the urgency of the issues found, so we'll take your input into account!

@areek areek unassigned rmuir Jun 23, 2014
@@ -137,7 +145,7 @@ public Filter cache(Filter filterToCache) {

private final Filter filter;

private final WeightedFilterCache cache;
private final WeightedFilterCache cache ;
Contributor


extra space?

@s1monw
Contributor

s1monw commented Jun 24, 2014

I added some comments! Good change @martijnvg

@s1monw
Contributor

s1monw commented Jun 24, 2014

Oh, can you please label it and put review back once you have changes?

@martijnvg martijnvg added review and removed review labels Jun 25, 2014
@martijnvg
Member Author

@s1monw Thanks for reviewing it, I updated the PR.

@@ -157,12 +165,11 @@ public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws
DocIdSet cacheValue = innerCache.getIfPresent(cacheKey);
if (cacheValue == null) {
if (!cache.seenReaders.containsKey(context.reader().getCoreCacheKey())) {
Boolean previous = cache.seenReaders.putIfAbsent(context.reader().getCoreCacheKey(), Boolean.TRUE);
SegmentReader segmentReader = SegmentReaderUtils.segmentReader(context.reader());
Contributor


What I meant here was that instead of casting to a SegmentReader at all, you should do this:

Boolean previous = cache.seenReaders.putIfAbsent(context.reader().getCoreCacheKey(), Boolean.TRUE);
if (previous == null) {
    // we add a core closed listener only, for non core IndexReaders we rely on clear being called (percolator for example)
    SegmentReaderUtils.registerCoreListener(context.getReader(), cache); 
}

We fixed SegmentReaderUtils.registerCoreListener(AtomicReader reader, SegmentReader.CoreClosedListener listener) in Lucene 4 to work with atomic readers too, so we get that upgrade for free.

Member Author


The reason I didn't use SegmentReaderUtils.registerCoreListener(context.getReader(), cache); is that I was afraid that, if the validation in that method fails, the core cache key would end up in the seenReaders map while no listener is registered for it. Is that a valid concern?
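The concern can be made concrete with a small, self-contained sketch (hypothetical registrar interface, not the real SegmentReaderUtils API): if registration throws after putIfAbsent has succeeded, the key is stranded in seenReaders unless it is rolled back.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SeenReadersSketch {

    /** Hypothetical stand-in for SegmentReaderUtils.registerCoreListener. */
    interface CoreListenerRegistrar {
        void register(Object coreCacheKey);
    }

    final ConcurrentHashMap<Object, Boolean> seenReaders = new ConcurrentHashMap<>();

    void track(Object coreCacheKey, CoreListenerRegistrar registrar) {
        Boolean previous = seenReaders.putIfAbsent(coreCacheKey, Boolean.TRUE);
        if (previous == null) {
            try {
                registrar.register(coreCacheKey);
            } catch (RuntimeException e) {
                // without this rollback the key would stay in seenReaders
                // forever, with no close listener to ever remove it
                seenReaders.remove(coreCacheKey);
                throw e;
            }
        }
    }
}
```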

Member Author


Once #6623 is in this wouldn't be an issue, but in 1.2.x this could in theory leave core cache keys in the seenReaders map.

Contributor


Yeah, but then please fix this in 1.2.x.

Member Author


Makes sense, I will do that in the 1.2 branch only then.


@s1monw
Contributor

s1monw commented Jun 26, 2014

Left one comment; other than that it looks good.

@s1monw s1monw removed the review label Jun 26, 2014
…the field data caches.

Percolator: Never cache filters and field data in percolator for the percolator query parsing part.

Closes elastic#6553
@martijnvg martijnvg merged commit ec74a7e into elastic:master Jul 1, 2014
@martijnvg martijnvg changed the title Fix memory leak when percolating with nested documents Percolator: Fix memory leak when percolating with nested documents Jul 2, 2014
@martijnvg martijnvg deleted the bugs/nested-percolator2 branch May 18, 2015 23:31
@clintongormley clintongormley added the :Search/Percolator Reverse search: find queries that match a document label Jun 7, 2015
@clintongormley clintongormley changed the title Percolator: Fix memory leak when percolating with nested documents Fix memory leak when percolating with nested documents Jun 7, 2015
Labels
>bug :Search/Percolator Reverse search: find queries that match a document v1.2.2 v1.3.0 v2.0.0-beta1

7 participants