kvserver: simplify and track entire set of gossiped IOThresholds #85739
Conversation
@erikgrinaker this is now ready for review. I've run the latest round of tests with this PR, so I'm confident it "works", and I think it leaves things in a much cleaner place than before. I propose that a follow-up PR enables the cluster setting with a value of
Regardless of these future possible uses, this is a nice conceptual clean-up and a good last PR to land for pausing in the 22.2 cycle (unless we find time to add quiescence in presence of paused followers, in which case that would be worthy follow-up).
This change seems fine, but we're planning on doing something about the following before 22.2, right?
I think we will have to address #84465 by reducing the frequency at which the paused stores are revisited, but adding an eager pass whenever the sequence is bumped.
We would, given enough bandwidth, but if I am to join the fun on MVCC tombstones, this may be something that we won't get to. Let's discuss separately.
bors r=erikgrinaker
Build succeeded:
This commit makes the following changes:

- Track *all* IOThresholds in the store's map, not just the ones for
  overloaded stores.
- Improve the container for these IOThresholds to be easier to work
  with.
- Rather than "hard-coding" a value of "1.0" to mean overloaded,
  use (and plumb) the value of the cluster setting. "1.0" is the
  value at which I/O admission control chooses to engage; but
  the cluster setting is usually smaller and determines when to
  consider followers on a remote store pausable. The API now
  reflects that and avoids this kind of confusion.
- Rename all uses of the container away from "overload" towards
  "IOThreshold".
- Add a Sequence() method that is bumped whenever the set of stores
  whose IOThreshold score indicates pausability changes.
I originally started to work on this to address #84465, but realized
that we couldn't "just" leave the set of paused followers untouched
absent sequence changes. This is because the set of paused followers
has additional inputs, most importantly the set of live followers.
This set is per-Replica and subject to change, so we can't be too
sure the outcome would be the same, and we do want to be reactive
to followers becoming nonresponsive by, if necessary, unpausing
followers.
I think we will have to address #84465 by reducing the frequency
at which the paused stores are revisited, but adding an eager
pass whenever the sequence is bumped.
Additionally, for #84252, we are likely also going to be able to rely on
the sequence number to trigger unquiescing of ranges that were
previously quiesced in the presence of a paused follower.
Regardless of these future possible uses, this is a nice conceptual
clean-up and a good last PR to land for pausing in the 22.2 cycle
(unless we find time to add quiescence in presence of paused followers,
in which case that would be worthy follow-up).
I verified that with this commit, the [roachtest] still works and
effectively avoids I/O admission control activation a large percentage
of the time at a setting of 0.8. This gives good confidence - at least
for this exact test - that with 0.5 we'd probably never see admission
control throttle foreground writes. However, the test is fairly
specific since it severely constrains n3's disk throughput, so
0.8 might still be perfectly appropriate in practice. We'll need
some more experience to tell.

[roachtest]: #81516
Touches #84465.
Touches #84252.
Release note: None
Release justification: low-risk improvement to new functionality