Fix memory leak when percolating with nested documents #6578
Conversation
Martijn's patch fixes a critical production issue that has prevented us from updating from 0.90 to 1.2. We use nested documents and percolators heavily. With 1.2.1, memory consumption on our integration system explodes to more than 8 GB RSS in a matter of minutes, even though we ran our tests with just a small fraction of our documents. We cherry-picked your commits on top of the ES 1.2 branch and deployed 1.2.2-SNAPSHOT on our integration server. Memory consumption has stayed steady at 2.4 GB for 900k docs and a 700 MB index for more than an hour. Thanks a lot!
Is there an ETA for this to be fixed in an official release? We have to decide whether to do this on our own.
We still need to review it. Once it's in, releasing 1.2.2 is relatively simple; it will be released based on the urgency of the issues found, so we'll take your input into account!
@@ -137,7 +145,7 @@ public Filter cache(Filter filterToCache) {

    private final Filter filter;

    private final WeightedFilterCache cache;
    private final WeightedFilterCache cache ;
extra space?
I added some comments! Good change @martijnvg
oh can you please label it and put
@s1monw Thanks for reviewing it, I updated the PR.
@@ -157,12 +165,11 @@ public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws
    DocIdSet cacheValue = innerCache.getIfPresent(cacheKey);
    if (cacheValue == null) {
        if (!cache.seenReaders.containsKey(context.reader().getCoreCacheKey())) {
            Boolean previous = cache.seenReaders.putIfAbsent(context.reader().getCoreCacheKey(), Boolean.TRUE);
            SegmentReader segmentReader = SegmentReaderUtils.segmentReader(context.reader());
What I meant here was that, instead of casting to a SegmentReader at all, you should do this:

Boolean previous = cache.seenReaders.putIfAbsent(context.reader().getCoreCacheKey(), Boolean.TRUE);
if (previous == null) {
    // we add a core closed listener only; for non-core IndexReaders we rely on clear being called (the percolator, for example)
    SegmentReaderUtils.registerCoreListener(context.reader(), cache);
}

We fixed SegmentReaderUtils.registerCoreListener(AtomicReader reader, SegmentReader.CoreClosedListener listener) in Lucene 4 to work with atomic readers too, so we get that upgrade for free.
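For context, here is a minimal sketch of what such a registration helper could look like. This is an illustration only, not the actual SegmentReaderUtils code; it assumes Lucene 4's FilterAtomicReader.unwrap and SegmentReader.addCoreClosedListener:

// Illustrative sketch only: unwrap FilterAtomicReader layers until a
// SegmentReader is reached, then register the core closed listener there.
// Non-segment readers (such as the percolator's MemoryIndex reader) are
// rejected, so their cache entries must be cleared manually instead.
static void registerCoreListener(AtomicReader reader, SegmentReader.CoreClosedListener listener) {
    AtomicReader current = reader;
    while (current instanceof FilterAtomicReader) {
        current = FilterAtomicReader.unwrap(current);
    }
    if (current instanceof SegmentReader) {
        ((SegmentReader) current).addCoreClosedListener(listener);
    } else {
        throw new IllegalArgumentException("can't extract a SegmentReader from " + reader);
    }
}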
The reason I didn't use SegmentReaderUtils.registerCoreListener(context.reader(), cache); is that I was afraid that, if the validation in that method fails, the core cache key would end up in the seenReaders map with no listener registered for it. Is that a valid concern?
Once #6623 is in, this won't be an issue, but in 1.2.x this could in theory leave core cache keys in the seenReaders map.
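For the 1.2.x branch, a defensive variant could roll back the putIfAbsent when registration fails. A minimal sketch, assuming registerCoreListener throws a RuntimeException when it cannot find a SegmentReader:

Object coreKey = context.reader().getCoreCacheKey();
Boolean previous = cache.seenReaders.putIfAbsent(coreKey, Boolean.TRUE);
if (previous == null) {
    try {
        SegmentReaderUtils.registerCoreListener(context.reader(), cache);
    } catch (RuntimeException e) {
        // registration failed: remove the entry again so no core cache key
        // is left in seenReaders without a matching core closed listener
        cache.seenReaders.remove(coreKey);
        throw e;
    }
}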
Yeah, but then please fix this in 1.2.x.
Makes sense, I will do that then in the 1.2 branch only.
Left one comment; other than that it looks good.
…the field data caches. Percolator: Never cache filters and field data in percolator for the percolator query parsing part. Closes elastic#6553
The percolator uses a non-segment reader implementation (MemoryIndexReader), so associated cache entries are not cleared automatically when the reader closes; that is why the percolator removes the cache entries manually (filter cache, field data).
However, when percolating a document with nested objects, a multi reader is used that wraps a MemoryIndexReader for each nested object. Cache entries are keyed on the leaf readers in the filter / field data caches, but the percolator clears them using the top-level multi reader. As a result, cache entries are never evicted, which eventually leads to an OOM.
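To illustrate the mismatch, a simplified sketch of what a correct clear has to do (names such as filterCache and fieldDataCache are placeholders, not the actual patch): entries are stored under each leaf reader's core cache key, so eviction has to walk the leaves of the composite reader instead of using the top-level reader itself.

// Simplified sketch: clear per leaf, because the leaf core cache key is
// what the entries were stored under. filterCache and fieldDataCache are
// hypothetical placeholders for the real caches.
void clearPercolatorCaches(IndexReader topLevelReader) {
    for (AtomicReaderContext leaf : topLevelReader.leaves()) {
        Object coreKey = leaf.reader().getCoreCacheKey();
        filterCache.clear(coreKey);
        fieldDataCache.clear(coreKey);
    }
}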
The memory leak is fixed by the following changes: