Force refresh when versionMap is using too much RAM #6443
Conversation
If the user sets a high refresh interval, the versionMap can use unbounded RAM. I fixed LiveVersionMap to track its RAM used, and trigger refresh if it's > 25% of IW's RAM buffer. (We could add another setting for this but we have so many settings already?). I also fixed deletes to prune every index.gc_deletes/4 msec, and I only save a delete tombstone if index.gc_deletes > 0. I think we could expose the RAM used by versionMap somewhere (Marvel? _cat?), but we can do that separately ... I put a TODO. Closes elastic#6378
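In rough sketch form, the trigger described above amounts to the following (illustrative Java only, with hypothetical names like `indexingBufferBytes` — not the actual engine code):

```java
// Minimal sketch of the refresh trigger described above (hypothetical
// names, not the real InternalEngine): refresh once the version map's
// tracked RAM exceeds 25% of IndexWriter's RAM buffer.
final class VersionMapPressure {
    private long versionMapBytes;           // tracked RAM of the live version map
    private final long indexingBufferBytes; // IndexWriter RAM buffer size

    VersionMapPressure(long indexingBufferBytes) {
        this.indexingBufferBytes = indexingBufferBytes;
    }

    void addBytes(long delta) {
        versionMapBytes += delta;
    }

    /** True when the version map uses more than 25% of IW's RAM buffer. */
    boolean shouldRefresh() {
        return versionMapBytes > indexingBufferBytes / 4;
    }
}
```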
4*RamUsageEstimator.NUM_BYTES_INT +
3*RamUsageEstimator.NUM_BYTES_LONG +
7*RamUsageEstimator.NUM_BYTES_OBJECT_REF +
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER;
Would it make things simpler to have a ramBytesUsed
method on VersionValue/DeleteVersionValue/Translog.Location (even if we only implement Accountable when upgrading to 4.9)?
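A sketch of what that suggestion could look like — a hypothetical, cut-down VersionValue with inlined size constants standing in for Lucene's `RamUsageEstimator` values, not the actual class:

```java
// Hypothetical sketch of computing ramBytesUsed() "at the source",
// in the style of Lucene's Accountable. The constants stand in for
// RamUsageEstimator values on a typical 64-bit JVM.
final class VersionValue {
    static final int NUM_BYTES_OBJECT_HEADER = 16;
    static final int NUM_BYTES_LONG = 8;

    final long version;
    final long time;

    VersionValue(long version, long time) {
        this.version = version;
        this.time = time;
    }

    /** Shallow RAM used by this instance, computed where the fields live. */
    long ramBytesUsed() {
        return NUM_BYTES_OBJECT_HEADER + 2 * NUM_BYTES_LONG;
    }
}
```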
@jpountz good idea! It'd be a few more adds per insert/remove but it's better to have the bytes used computation "at the source". I'll do that. Maybe BytesRef should have it too.
…uning deletes from liveMap; make failing test
OK I folded in all the feedback here (thank you!), and added two new ... I reworked how deletes are handled, so that they are now included in ...
+1 to expose the RAM usage via an API. Can you please open an issue to do that? We might think further here and see how much RAM IW is using per shard as well; DWFlushControl exposes this to the FlushPolicy already, so we might want to expose that via the IW API?
// we need to refresh in order to clear older version values
refresh(new Refresh("version_table").force(true));
private void pruneDeletedTombstones() {
    if (enableGcDeletes == false) {
can we maybe turn this around and remove the extra return statement... like
if (enableGcDeletes) {
// do what we did in this method?
}
Sure, will do.
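The agreed change is a trivial inversion of the guard; as a standalone sketch (the counter stands in for the real pruning work, which is hypothetical here):

```java
// Trivial sketch of the inversion discussed above: guard the body with
// the positive condition instead of an early return. pruneCalls stands
// in for the real tombstone-pruning work.
final class TombstonePruner {
    volatile boolean enableGcDeletes = true;
    int pruneCalls; // counts how often pruning actually ran

    void pruneDeletedTombstones() {
        if (enableGcDeletes) {
            pruneCalls++; // do what we did in this method
        }
    }
}
```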
I opened #6483 to expose the RAM usage via ShardStats and indices cat API...
6*RamUsageEstimator.NUM_BYTES_OBJECT_REF +
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER;
final AtomicLong ramBytesUsed = new AtomicLong();
can we use LongAdder here? The call to "get" the sum doesn't have to be fast, but adding each element would be nice to not have to CAS on a single AtomicLong
Ahh good, will do.
Ahh, I see it's checked on each index/create operation; strike that, it won't make sense to use LongAdder.
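The tradeoff being weighed, as an illustrative side-by-side (not the engine code): `LongAdder` spreads writes across cells so concurrent adds avoid CAS contention, but `sum()` walks the cells; `AtomicLong` CASes every add but reads in O(1). Since the total is read on every index/create operation here, `AtomicLong` wins.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Illustration of the tradeoff discussed above: LongAdder has cheap
// uncontended concurrent adds but an O(cells) sum(); AtomicLong CASes
// on a single counter but has an O(1) get(). Reading the total on
// every index/create op favors AtomicLong.
final class RamCounters {
    final LongAdder adder = new LongAdder();    // cheap add, costlier sum()
    final AtomicLong atomic = new AtomicLong(); // CAS add, cheap get()

    void add(long bytes) {
        adder.add(bytes);
        atomic.addAndGet(bytes);
    }

    long adderTotal()  { return adder.sum(); }
    long atomicTotal() { return atomic.get(); }
}
```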
…'refresh because versionMap is full' runs at once
@@ -531,6 +547,7 @@ private void innerIndex(Index index, IndexWriter writer) throws IOException {
}
Translog.Location translogLocation = translog.add(new Translog.Index(index));
// TODO: expose versionMap's RAM usage in ShardStats?
I think we can remove this now that we have the ticket (#6483)?
OK I'll remove.
// reader. So we can safely clear old here:
addsOld = ConcurrentCollections.newConcurrentMapWithAggressiveConcurrency();
// We can now drop old because these operations are now visible via the newly opened searcher. Even if didRefresh is false, it's
// possible old has some entries in it, which is fien: it means they were actually already included in the previously opened reader,
typo in 'fien'. Question - in what scenario do we have didRefresh=false? does it mean there is potentially another refresh started but not yet finished? if so, is it safe to clean the old map?
I'll fix the typo.
Confusingly, it is safe to clear the old map. didRefresh will be false if Lucene didn't see any changes (and so didn't open a new reader) ... if the old map is non-empty, it means those ids were already reflected in the last reopen. This is because we assign a new "current" slightly before Lucene actually flushes any segments for the reopen, so concurrent indexing requests can sneak a few additions into that current map that are in fact reflected in the previous reader.
Also, only one refresh can run at once in Lucene's ReferenceManager, so we'll always see beforeRefresh then afterRefresh (never intermixed).
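The current/old swap described above can be sketched as follows (simplified, hypothetical names — the real LiveVersionMap tracks version values, not bare longs):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the two-map pattern described above: "current" collects new
// entries, beforeRefresh swaps it into "old", and afterRefresh drops
// "old" once the newly opened reader makes those operations visible.
final class LiveMapSketch {
    volatile Map<String, Long> current = new ConcurrentHashMap<>();
    volatile Map<String, Long> old = new ConcurrentHashMap<>();

    void beforeRefresh() {
        // The new "current" is installed slightly before Lucene flushes
        // segments, so a few concurrent adds may land in it even though
        // they are already reflected in the reader being opened. That is
        // why clearing "old" later is safe even when didRefresh == false.
        old = current;
        current = new ConcurrentHashMap<>();
    }

    void afterRefresh(boolean didRefresh) {
        // Safe regardless of didRefresh: Lucene's ReferenceManager runs
        // one refresh at a time, so beforeRefresh/afterRefresh pairs
        // never interleave.
        old = new ConcurrentHashMap<>();
    }
}
```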
OK. I see. Thx for explaining.
LGTM
I think along with this, we can go back to Integer.MAX_VALUE default for index.translog.flush_threshold_ops... I'll commit that.
@mikemccand can we make the move to ...
I think this is ready, Mike; if you want another review, put the review label back pls
Thanks Simon, I think it's ready too. I put xlog flushing back to 5000 ops ... I'll commit this soon.
+1
When refresh_interval is long or disabled, and the indexing rate is high, the live version map can use non-trivial amounts of RAM. With this change we now trigger a refresh in such cases to clear the version map so we don't use unbounded RAM. Closes #6443
We check if the version map needs to be refreshed after we have released the readlock, which can allow the engine to be closed before we read the value from the volatile `indexWriter` field, causing an NPE on the indexing thread. This commit also fixes a potential uncaught exception if the refresh failed because the engine was already closed. Relates to elastic#6443 Closes elastic#6786
The operation looks at indexWriter.getConfig(), which can throw an `org.apache.lucene.store.AlreadyClosedException` if the engine is already closed. Relates to elastic#6443, elastic#6786