
storage: lower the raft-log queue timer from 50ms to 0ms #23869

Merged

Conversation

petermattis
Collaborator

Lower the raft-log queue timer from 50ms to 0ms. This timer was forcing
an artificial delay between raft-log truncation operations, which in turn
allowed the raft log to grow undesirably long in cluster overload
situations. When the raft log for a range grows too large, the eventual
truncation operation can take a prohibitively long time, leading to a
downward spiral of performance, often resulting in Raft snapshots (which
are significantly more expensive) and even further degradation. There is
no downside to a zero duration between raft-log truncations, as other
mechanisms are in place to avoid performing truncations unless they are
necessary (e.g. tracking of the raft-log size and the number of entries).

Release note (performance improvement): Improve cluster performance
during overload scenarios.
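
A minimal Go sketch of the idea, using illustrative names and thresholds rather than the actual identifiers in pkg/storage: the queue timer only paces how soon an eligible range is processed, while a size/entry-count check is what decides whether a truncation is proposed at all, so a zero delay does not create unnecessary work.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative constants; the real values and names in the raft-log queue
// may differ.
const (
	// Previously the queue waited 50ms between truncation operations; this
	// change drops that artificial delay to zero.
	raftLogQueueTimerDuration = 0 * time.Millisecond

	// Truncation is still gated by the tracked raft-log size and entry count.
	raftLogMaxSizeBytes    = 4 << 20 // hypothetical 4 MiB threshold
	raftLogStaleEntryCount = 100     // hypothetical entry-count threshold
)

// shouldTruncate sketches the gating logic: the queue skips ranges whose
// raft log is still small, regardless of how often the queue runs.
func shouldTruncate(entryCount int, logSizeBytes int64) bool {
	return entryCount > raftLogStaleEntryCount || logSizeBytes > raftLogMaxSizeBytes
}

func main() {
	fmt.Println(shouldTruncate(10, 1<<20))   // false: small log, no truncation proposed
	fmt.Println(shouldTruncate(5000, 8<<20)) // true: oversized log gets truncated
	fmt.Println("queue timer between truncations:", raftLogQueueTimerDuration)
}
```

In other words, pacing and gating are separate concerns: removing the 50ms pause changes only how quickly an eligible range is processed, not whether a truncation happens.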

@petermattis petermattis requested a review from a team March 14, 2018 21:24
@cockroach-teamcity
Member

This change is Reviewable

@bdarnell
Member

:lgtm:


Review status: 0 of 1 files reviewed at latest revision, all discussions resolved.


@petermattis
Collaborator Author

@jordanlewis I recall you saw this during tpcc testing. So did I, which is what motivated this change. Did you file an issue about it? (I couldn't find one if you did.)

@petermattis
Collaborator Author

FYI, I plan to cherry-pick this for 2.0.

Contributor

@a-robinson a-robinson left a comment

👍👍👍

@jordanlewis
Member

I don't think I filed an issue. It was in the performance weekly notes:

  • [nathan] seeing a lot of raft log truncation events
    • We were seeing the raft log truncation queue operating a lot. This isn’t necessarily indicative of anything going wrong, but we are seeing nonzero snapshots, so that’s an indication that this is maybe not working as intended. Investigating.
    • [peter] We don’t have much tracing around the quota pool, but we should make sure we add a tracing event there, as it’s the last stop before the raft train leaves the tracing station.
    • The quota pool is the thing that proactively rate limits the raft log (see the sketch after these notes).
    • [ben] lots of magic constants baked in above - we should look at tweaking them, perhaps that will fix/mitigate the problems.
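
For context on the quota pool mentioned in these notes, here is a toy sketch of the concept only: proposals acquire quota before being handed to Raft and the quota is returned once the corresponding entries have been applied, so slow followers back-pressure new writes instead of letting the raft log grow without bound. The type and method names are made up for illustration and do not reflect CockroachDB's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// quotaPool is a toy stand-in for a per-range proposal quota pool.
type quotaPool struct {
	mu        sync.Mutex
	cond      *sync.Cond
	available int64
}

func newQuotaPool(capacity int64) *quotaPool {
	qp := &quotaPool{available: capacity}
	qp.cond = sync.NewCond(&qp.mu)
	return qp
}

// acquire blocks until `size` bytes of quota are available, rate limiting
// new proposals when the pool is exhausted.
func (qp *quotaPool) acquire(size int64) {
	qp.mu.Lock()
	defer qp.mu.Unlock()
	for qp.available < size {
		qp.cond.Wait()
	}
	qp.available -= size
}

// release returns quota once the corresponding log entries have been applied.
func (qp *quotaPool) release(size int64) {
	qp.mu.Lock()
	qp.available += size
	qp.mu.Unlock()
	qp.cond.Broadcast()
}

func main() {
	qp := newQuotaPool(1 << 20) // 1 MiB of proposal quota
	qp.acquire(512 << 10)       // a 512 KiB proposal consumes quota
	fmt.Println("quota left:", qp.available)
	qp.release(512 << 10) // entries applied; quota returned
	fmt.Println("quota left:", qp.available)
}
```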

@petermattis petermattis merged commit 67a1d96 into cockroachdb:master Mar 14, 2018
@petermattis petermattis deleted the pmattis/raft-log-queue-timer branch March 14, 2018 23:55