storage: lower the raft-log queue timer from 50ms to 0ms #23869
Lower the raft-log queue timer from 50ms to 0ms. This timer forced an artificial delay between raft-log truncation operations, which in turn allowed the raft-log to grow undesirably long when the cluster was overloaded. When the raft-log for a range grows too large, the eventual truncation can take a prohibitively long time, which leads to a downward spiral of performance, often resulting in Raft snapshots (which are significantly more expensive) and further performance degradation.

There are no downsides to a zero duration between raft-log truncations, as other mechanisms already avoid performing truncations unless they are necessary (e.g. tracking of the raft-log size and the number of entries).

Release note (performance improvement): Improve cluster performance during overload scenarios.
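For readers who want the reasoning in code form, here is a minimal sketch, not the code from this PR. It assumes the timer acts as a pause between successive items processed by the queue, and that a truncation is gated by thresholds on entry count and log size. The names `logStats`, `shouldTruncate`, `processAll` and both threshold values are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// logStats is a hypothetical summary of one range's raft log.
type logStats struct {
	entries int   // number of truncatable entries
	bytes   int64 // approximate log size in bytes
}

// Illustrative thresholds only; not the values used by the project.
const (
	entryThreshold = 100
	byteThreshold  = 64 << 10
)

// shouldTruncate gates truncation on log size and entry count, so that
// visiting a range more often does not by itself cause more truncations.
func shouldTruncate(s logStats) bool {
	return s.entries >= entryThreshold || s.bytes >= byteThreshold
}

// processAll simulates one pass of the queue over the given ranges. The
// pause between items is accumulated rather than slept, so the sketch
// runs instantly. It returns the number of truncations performed and the
// total time spent pausing.
func processAll(ranges []logStats, timer time.Duration) (truncated int, paused time.Duration) {
	for i, r := range ranges {
		if i > 0 {
			paused += timer
		}
		if shouldTruncate(r) {
			truncated++
		}
	}
	return truncated, paused
}

func main() {
	ranges := []logStats{
		{entries: 5, bytes: 1 << 10},    // small log: skipped
		{entries: 500, bytes: 4 << 10},  // many entries: truncated
		{entries: 10, bytes: 128 << 10}, // large log: truncated
	}
	for _, timer := range []time.Duration{50 * time.Millisecond, 0} {
		n, paused := processAll(ranges, timer)
		fmt.Printf("timer=%v truncations=%d pause=%v\n", timer, n, paused)
	}
}
```

In this sketch both timer settings perform the same number of truncations, because the threshold check decides what gets truncated. The only difference is the accumulated pause, during which the logs of the remaining ranges could keep growing.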
Review status: 0 of 1 files reviewed at latest revision, all discussions resolved.
@jordanlewis I recall you saw this during tpcc testing. So did I, which is what motivated this change. Did you file an issue about it? (I couldn't find one if you did.)
FYI, I plan to cherry-pick this for 2.0.
👍👍👍
I don't think I filed an issue. It was in the performance weekly notes: