Force refresh when versionMap is using too much RAM #6443
Conversation
If the user sets a high refresh interval, the versionMap can use unbounded RAM. I fixed LiveVersionMap to track its RAM used, and trigger refresh if it's > 25% of IW's RAM buffer. (We could add another setting for this but we have so many settings already?). I also fixed deletes to prune every index.gc_deletes/4 msec, and I only save a delete tombstone if index.gc_deletes > 0. I think we could expose the RAM used by versionMap somewhere (Marvel? _cat?), but we can do that separately ... I put a TODO. Closes elastic#6378
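In rough sketch form, the trigger described above amounts to the following (illustrative Java only, with hypothetical names like `indexingBufferBytes` — not the actual engine code):

```java
// Minimal sketch of the refresh trigger described above (hypothetical
// names, not the real InternalEngine): refresh once the version map's
// tracked RAM exceeds 25% of IndexWriter's RAM buffer.
final class VersionMapPressure {
    private long versionMapBytes;           // tracked RAM of the live version map
    private final long indexingBufferBytes; // IndexWriter RAM buffer size

    VersionMapPressure(long indexingBufferBytes) {
        this.indexingBufferBytes = indexingBufferBytes;
    }

    void addBytes(long delta) {
        versionMapBytes += delta;
    }

    /** True when the version map uses more than 25% of IW's RAM buffer. */
    boolean shouldRefresh() {
        return versionMapBytes > indexingBufferBytes / 4;
    }
}
```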
4*RamUsageEstimator.NUM_BYTES_INT +
3*RamUsageEstimator.NUM_BYTES_LONG +
7*RamUsageEstimator.NUM_BYTES_OBJECT_REF +
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER;
Would it make things simpler to have a ramBytesUsed
method on VersionValue/DeleteVersionValue/Translog.Location (even if we only implement Accountable when upgrading to 4.9)?
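A sketch of what that suggestion could look like — a hypothetical, cut-down VersionValue with inlined size constants standing in for Lucene's `RamUsageEstimator` values, not the actual class:

```java
// Hypothetical sketch of computing ramBytesUsed() "at the source",
// in the style of Lucene's Accountable. The constants stand in for
// RamUsageEstimator values on a typical 64-bit JVM.
final class VersionValue {
    static final int NUM_BYTES_OBJECT_HEADER = 16;
    static final int NUM_BYTES_LONG = 8;

    final long version;
    final long time;

    VersionValue(long version, long time) {
        this.version = version;
        this.time = time;
    }

    /** Shallow RAM used by this instance, computed where the fields live. */
    long ramBytesUsed() {
        return NUM_BYTES_OBJECT_HEADER + 2 * NUM_BYTES_LONG;
    }
}
```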
@jpountz good idea! It'd be a few more adds per insert/remove but it's better to have the bytes used computation "at the source". I'll do that. Maybe BytesRef should have it too.
…uning deletes from liveMap; make failing test
OK I folded in all the feedback here (thank you!), and added two new ... I reworked how deletes are handled, so that they are now included in ...
+1 to expose the RAM usage via an API. Can you please open an issue to do that? We might think further here and see how much RAM IW is using per shard as well; DWFlushControl exposes this to the FlushPolicy already, so we might want to expose that via the IW API?
// we need to refresh in order to clear older version values
refresh(new Refresh("version_table").force(true));
private void pruneDeletedTombstones() {
    if (enableGcDeletes == false) {
can we maybe turn this around and remove the extra return statement... like
if (enableGcDeletes) {
// do what we did in this method?
}
Sure, will do.
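The agreed change is a trivial inversion of the guard; as a standalone sketch (the counter stands in for the real pruning work, which is hypothetical here):

```java
// Trivial sketch of the inversion discussed above: guard the body with
// the positive condition instead of an early return. pruneCalls stands
// in for the real tombstone-pruning work.
final class TombstonePruner {
    volatile boolean enableGcDeletes = true;
    int pruneCalls; // counts how often pruning actually ran

    void pruneDeletedTombstones() {
        if (enableGcDeletes) {
            pruneCalls++; // do what we did in this method
        }
    }
}
```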
I opened #6483 to expose the RAM usage via ShardStats and indices cat API...
6*RamUsageEstimator.NUM_BYTES_OBJECT_REF +
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER;
final AtomicLong ramBytesUsed = new AtomicLong();
can we use LongAdder here? The call to "get" the sum doesn't have to be fast, but adding each element would be nice to not have to CAS on a single AtomicLong
Ahh good, will do.
Ahh, I see it's checked on each index/create operation; strike that, it won't make sense to use LongAdder.
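The tradeoff being weighed, as an illustrative side-by-side (not the engine code): `LongAdder` spreads writes across cells so concurrent adds avoid CAS contention, but `sum()` walks the cells; `AtomicLong` CASes every add but reads in O(1). Since the total is read on every index/create operation here, `AtomicLong` wins.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Illustration of the tradeoff discussed above: LongAdder has cheap
// uncontended concurrent adds but an O(cells) sum(); AtomicLong CASes
// on a single counter but has an O(1) get(). Reading the total on
// every index/create op favors AtomicLong.
final class RamCounters {
    final LongAdder adder = new LongAdder();    // cheap add, costlier sum()
    final AtomicLong atomic = new AtomicLong(); // CAS add, cheap get()

    void add(long bytes) {
        adder.add(bytes);
        atomic.addAndGet(bytes);
    }

    long adderTotal()  { return adder.sum(); }
    long atomicTotal() { return atomic.get(); }
}
```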
…'refresh because versionMap is full' runs at once
@@ -531,6 +547,7 @@ private void innerIndex(Index index, IndexWriter writer) throws IOException {
}
Translog.Location translogLocation = translog.add(new Translog.Index(index));
// TODO: expose versionMap's RAM usage in ShardStats?
I think we can remove this now that we have the ticket (#6483)?
OK I'll remove.
// reader. So we can safely clear old here:
addsOld = ConcurrentCollections.newConcurrentMapWithAggressiveConcurrency();
// We can now drop old because these operations are now visible via the newly opened searcher. Even if didRefresh is false, it's
// possible old has some entries in it, which is fien: it means they were actually already included in the previously opened reader,
typo in 'fien'. Question - in what scenario do we have didRefresh=false? does it mean there is potentially another refresh started but not yet finished? if so, is it safe to clean the old map?
I'll fix the typo.
Confusingly, it is safe to clear the old map. didRefresh will be false if Lucene didn't see any changes (and so didn't open a new reader) ... if the old map is non-empty, it means those ids were already reflected in the last reopen. This is because we assign a new "current" slightly before Lucene actually flushes any segments for the reopen, so concurrent indexing requests can sneak a few additions into that current map that are in fact reflected in the previous reader.
Also, only one refresh can run at once in Lucene's ReferenceManager, so we'll always see beforeRefresh then afterRefresh (never intermixed).
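The current/old swap described above can be sketched as follows (simplified, hypothetical names — the real LiveVersionMap tracks version values, not bare longs):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the two-map pattern described above: "current" collects new
// entries, beforeRefresh swaps it into "old", and afterRefresh drops
// "old" once the newly opened reader makes those operations visible.
final class LiveMapSketch {
    volatile Map<String, Long> current = new ConcurrentHashMap<>();
    volatile Map<String, Long> old = new ConcurrentHashMap<>();

    void beforeRefresh() {
        // The new "current" is installed slightly before Lucene flushes
        // segments, so a few concurrent adds may land in it even though
        // they are already reflected in the reader being opened. That is
        // why clearing "old" later is safe even when didRefresh == false.
        old = current;
        current = new ConcurrentHashMap<>();
    }

    void afterRefresh(boolean didRefresh) {
        // Safe regardless of didRefresh: Lucene's ReferenceManager runs
        // one refresh at a time, so beforeRefresh/afterRefresh pairs
        // never interleave.
        old = new ConcurrentHashMap<>();
    }
}
```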
OK. I see. Thx for explaining.
LGTM
I think along with this, we can go back to Integer.MAX_VALUE default for index.translog.flush_threshold_ops... I'll commit that.
@mikemccand can we make the move to ...
I think this is ready, Mike; if you want another review, put the review label back pls
Thanks Simon, I think it's ready too. I put xlog flushing back to 5000 ops ... I'll commit this soon.
+1
When refresh_interval is long or disabled, and the indexing rate is high, the live version map can use non-trivial amounts of RAM. With this change we now trigger a refresh in such cases to clear the version map so we don't use unbounded RAM. Closes #6443
We check if the version map needs to be refreshed after we have released the readlock, which can allow the engine to be closed before we read the value from the volatile `indexWriter` field, causing an NPE on the indexing thread. This commit also fixes a potential uncaught exception if the refresh failed because the engine was already closed. Relates to elastic#6443 Closes elastic#6786
The operation looks at indexWriter.getConfig(), which can throw an `org.apache.lucene.store.AlreadyClosedException` if the engine is already closed. Relates to elastic#6443, elastic#6786