Store IO throttling throttles far more than asked #6018

mikemccand · 2014-05-02T10:47:34Z

I've been digging into the "merges can fall behind" at high indexing
rates, and I discovered some serious issues with the IO throttling,
which we recently (#5902) up'd from 20 MB/sec to 50 MB/sec by default.

Net/net I think when we ask for 50 MB/sec today we are really
throttling at something like 8 MB/sec!

Details:

I indexed a bunch of small log-file type docs into 1 shard, 0
replicas, using 1 sync _bulk client, to the point where it did it's
first big-ish merge (611 MB, 440K docs); the merge does not use CFS so
it's really writing 611 MB. I'm using a fast SSD.

With no throttling (index.store.throttle.type=none), the merge takes
20.8 seconds.

With the default 50 MB/sec merge throttling, it takes 72.1 sec, which
far too long (611 MB / 50 = 12.2 sec). The rate limiter enforces the
instantaneous rate, so at worse the merge time should have been 20.8 +
12.2 = 33 sec but likely much less than that because merging takes
CPU time.

So I dug in and discovered one problem, I think caused by the
super.flush and then delegate.flush in BufferedChecksumIndexOutput,
where the RateLimiter is always alternately called first on 8192 bytes
then on 0 bytes. If I fix RateLimiter to just ignore those 0 bytes,
the merge time with 50 MB/sec throttle drops to 49.9 sec: better, but
still too long. (I think once we cutover to Lucene's checksums this 0
byte issue will be fixed?)

System.nanoTime is actually quite costly, so I suspect the overhead of
just checking whether to pause, and of calling Thread.sleep, is way
too much when the pause time is small. So I change SimpleRateLimiter to
just accumulate the incoming bytes and then once it crosses 1 msec
worth at the specified rate, invoke the pause logic.

This really improved it: now the merge takes 25.7 sec at 50 MB/sec
throttle, and 64.9 sec at 10 MB/sec throttle. These times seem correct.

I'll also open a Lucene issue to fix this, and make an XRateLimiter
for ES in the meantime.

rmuir · 2014-05-02T12:16:23Z

So I dug in and discovered one problem, I think caused by the
super.flush and then delegate.flush in BufferedChecksumIndexOutput,
where the RateLimiter is always alternately called first on 8192 bytes
then on 0 bytes. If I fix RateLimiter to just ignore those 0 bytes,
the merge time with 50 MB/sec throttle drops to 49.9 sec: better, but
still too long. (I think once we cutover to Lucene's checksums this 0
byte issue will be fixed?)

It will solve it in that place: the BufferedChecksumIndexOutput. Note that RateLimitedIndexOutput itself has a similar construction, we should ensure its not doing anything stupid too.

Another similar expensive call is any digester's updateBytes(). We want to avoid calling this native method frequently (e.g. for zero bytes) for the same reasons.

mikemccand · 2014-05-05T10:17:47Z

I opened https://issues.apache.org/jira/browse/LUCENE-5641 to fix this in Lucene; I'll carry the fix over to ES once that's in ...

This commit upgrades to the latest Lucene 4.8.1 release including the following bugfixes: * An IndexThrottle now kicks in when merges start falling behind limiting index threads to 1 until merges caught up. Closes #6066 * RateLimiter now kicks in at the configured rate where previously the limiter was limiting at ~8MB/sec almost all the time. Closes #6018

Lucene's RateLimiter can do too much sleeping on small values (see also elastic#6018). The issue here is that calls to "pause" are not properly guarded in "restoreFile". Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile". Closes elastic#13828

Lucene's RateLimiter can do too much sleeping on small values (see also #6018). The issue here is that calls to "pause" are not properly guarded in "restoreFile". Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile". Closes #13828

mikemccand added v2.0.0 labels May 2, 2014

mikemccand self-assigned this May 2, 2014

mikemccand mentioned this issue May 7, 2014

Should we lower the default merge IO throttle rate? #6081

Closed

mikemccand added a commit that referenced this issue May 7, 2014

#6018: buffer up bytes before calling RateLimiter

34d4a03

mikemccand added a commit that referenced this issue May 12, 2014

#6081: set IO throttling back to 20 MB/sec now that #6018 is fixed

00fcf4d

mikemccand added a commit that referenced this issue May 12, 2014

#6081: set IO throttling back to 20 MB/sec now that #6018 is fixed

7cf8234

s1monw added the bug label May 18, 2014

s1monw closed this as completed in 85a0b76 May 19, 2014

mikemccand mentioned this issue Mar 20, 2015

Increase default rate limiting for snapshot, restore and recovery to 40 MB/sec #10185

Closed

ywelsch mentioned this issue Sep 28, 2015

Snapshot restore operations throttle more than specified #13828

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store IO throttling throttles far more than asked #6018

Store IO throttling throttles far more than asked #6018

mikemccand commented May 2, 2014

rmuir commented May 2, 2014

mikemccand commented May 5, 2014

Store IO throttling throttles far more than asked #6018

Store IO throttling throttles far more than asked #6018

Comments

mikemccand commented May 2, 2014

rmuir commented May 2, 2014

mikemccand commented May 5, 2014