kvserver: rebalance ranges to minimize QPS delta among stores #72296
Merged: craig merged 1 commit into cockroachdb:master from aayushshah15:preventRebalanceThrashing on Jan 29, 2022
Conversation
thanks for the reviews!! bors r+
Build failed (retrying...)
Build failed (retrying...)
Build failed (retrying...)
Build failed
bors r+
Build failed
bors r+
Build succeeded
andreimatei added a commit to andreimatei/cockroach that referenced this pull request on Jan 31, 2022:

This setting had rotted, leading to a span use-after-Finish. This was caught by the recently-unskipped follower reads roachtests (roachtests panic on use-after-finish), accounting for all the issues below. The last entry in these issues refers to the nodes crashing because of the bug fixed here. Some of these issues have been open for a long time for more relevant failures; those have been fixed in cockroachdb#72296.

Fixes cockroachdb#75716
Fixes cockroachdb#75715
Fixes cockroachdb#75714
Fixes cockroachdb#71244
Fixes cockroachdb#71126
Fixes cockroachdb#70818
Fixes cockroachdb#70191
Fixes cockroachdb#70011
Fixes cockroachdb#70010
Fixes cockroachdb#69952
Fixes cockroachdb#69951

Release note: None
craig bot pushed a commit that referenced this pull request on Feb 1, 2022:

75739: sql: fix the sql.trace.stmt.enable_threshold cluster setting r=andreimatei a=andreimatei

This setting had rotted, leading to a span use-after-Finish. This was caught by the recently-unskipped follower reads roachtests (roachtests panic on use-after-finish), accounting for all the issues below. The last entry in these issues refers to the nodes crashing because of the bug fixed here. Some of these issues have been open for a long time for more relevant failures; those have been fixed in #72296.

Fixes #75716
Fixes #75715
Fixes #75714
Fixes #71244
Fixes #71126
Fixes #70818
Fixes #70191
Fixes #70011
Fixes #70010
Fixes #69952
Fixes #69951

Release note: None

75764: roachtest: add a variant of tpch_concurrency with no admission control r=yuzefovich a=yuzefovich

Informs: #74836.
Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
kvserver: rebalance ranges to minimize QPS delta among stores
This commit fixes the regression(s) introduced by
#65379 where we observed replica
thrashing in various workloads (#70396 and #71244).
The following is a description of the differences between the QPS based
rebalancing scheme used in the previous implementation of the store rebalancer
(release-21.2 and before) and the "new" implementation (22.1 and beyond).
lease rebalancing
release 21.2 and before
QPS-based lease rebalancing in CRDB 21.2 considers the overall cluster-level
average QPS and computes underfull and overfull thresholds based on this
average. For each range that the local store holds a lease for, the store
rebalancer goroutine checks whether transferring said range's lease away will
bring the local store's QPS below the underfull threshold. If so, it ignores
the range and moves on to the next one. Otherwise, it iterates through the
stores of all the non-leaseholder voting replicas (in ascending order of their
QPS) and checks whether it would be reasonable to transfer the lease away to
such a store. It ensures that the receiving store would not become overfull
after the lease transfer. It checks that the receiving store doesn't have a
replica that's lagging behind the current leaseholder. It checks that the
receiving store is not in violation of lease preferences. Finally, it ensures
that the lease is not on the local store because of access locality
considerations (i.e. because of follow-the-workload).
All of this was bespoke logic that lived in the store rebalancer (using none of
the Allocator's machinery).
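The threshold logic described above can be sketched as follows. This is an illustrative simplification, not the actual CockroachDB code: the function and variable names are made up, the 10% threshold fraction is invented for the example, and the additional checks (raft log lag, lease preferences, follow-the-workload) are omitted.

```go
package main

import (
	"fmt"
	"math"
)

// thresholdFraction is illustrative; the real underfull/overfull
// fractions in CockroachDB differ.
const thresholdFraction = 0.10

// meanQPS returns the cluster-level average QPS across stores.
func meanQPS(qps map[int]float64) float64 {
	var sum float64
	for _, q := range qps {
		sum += q
	}
	return sum / float64(len(qps))
}

// pickLeaseTarget sketches the pre-22.1 scheme: skip the range if shedding
// its lease would drop the source store below the underfull threshold;
// otherwise pick the coldest candidate store that would not become
// overfull after receiving the lease's QPS.
func pickLeaseTarget(qps map[int]float64, source int, leaseQPS float64, candidates []int) (int, bool) {
	mean := meanQPS(qps)
	underfull := mean * (1 - thresholdFraction)
	overfull := mean * (1 + thresholdFraction)

	if qps[source]-leaseQPS < underfull {
		return 0, false // transferring would make the source underfull
	}
	best, bestQPS := 0, math.Inf(1)
	for _, c := range candidates {
		if qps[c]+leaseQPS <= overfull && qps[c] < bestQPS {
			best, bestQPS = c, qps[c]
		}
	}
	return best, best != 0
}

func main() {
	// Mean is 1000, so underfull=900 and overfull=1100. Store 2 would
	// become overfull (1200), so the lease goes to store 3.
	qps := map[int]float64{1: 1500, 2: 900, 3: 600}
	target, ok := pickLeaseTarget(qps, 1, 300, []int{2, 3})
	fmt.Println(target, ok)
}
```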
master and this commit
In #65379, we moved this decision making into the Allocator by adding a new
mode in `Allocator.TransferLeaseTarget` that tries to determine whether
transferring the lease to another voting replica would reduce the QPS delta
between the hottest and the coldest stores in the replica set. This commit adds
some padding to this logic by ensuring that the QPS difference between the
store relinquishing the lease and the store receiving the lease is at least
200 QPS. Furthermore, it ensures that the store receiving the lease won't become
significantly hotter than the current leaseholder.
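A minimal sketch of that padded check, assuming a simple "don't overshoot the source" rule for the second condition (the real `Allocator.TransferLeaseTarget` logic is more involved; names here are hypothetical):

```go
package main

import "fmt"

// minQPSDifference mirrors the 200 QPS guardrail from the commit message.
const minQPSDifference = 200

// wouldTransferLease sketches the delta-minimizing check: the current
// leaseholder must be at least minQPSDifference hotter than the candidate,
// and the candidate must not end up significantly hotter than the source
// after absorbing the lease's QPS.
func wouldTransferLease(sourceQPS, targetQPS, leaseQPS float64) bool {
	if sourceQPS-targetQPS < minQPSDifference {
		return false
	}
	// Post-transfer, the receiver must stay within the guardrail of the
	// relinquishing store; otherwise the transfer just inverts the imbalance.
	return targetQPS+leaseQPS < sourceQPS-leaseQPS+minQPSDifference
}

func main() {
	fmt.Println(wouldTransferLease(1000, 600, 150)) // true: shrinks the delta
	fmt.Println(wouldTransferLease(1000, 900, 150)) // false: stores already within 200 QPS
}
```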
replica rebalancing
release 21.2 and before
QPS replica rebalancing in CRDB <=21.2 works similarly to the lease rebalancing
logic. We first compute a cluster level QPS average, overfull and underfull
thresholds. Based on these thresholds we try to move replicas away from
overfull stores and onto stores that are underfull, all while ensuring that the
receiving stores would not become overfull after the rebalance. A critical
assumption that the store rebalancer made (and still does, in the approach
implemented by this commit) is that follower replicas serve the same traffic as
the leaseholder.
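A toy version of that overfull-to-underfull replica movement, with invented names and an illustrative 10% threshold fraction (the real allocator also applies constraint, diversity, and full-disk checks that are omitted here):

```go
package main

import "fmt"

// replicaMoves sketches the <=21.2 replica-rebalancing loop: shed replicas
// from overfull stores onto underfull ones, never pushing a receiver past
// the overfull threshold. It returns a map from source to destination store.
func replicaMoves(qps map[int]float64, replicaQPS float64) map[int]int {
	var mean float64
	for _, q := range qps {
		mean += q
	}
	mean /= float64(len(qps))
	overfull, underfull := mean*1.10, mean*0.90

	moves := map[int]int{}
	for from, fq := range qps {
		if fq <= overfull {
			continue // only overfull stores shed replicas
		}
		for to, tq := range qps {
			// Receiver must be underfull and must stay at or below the
			// overfull threshold after absorbing the replica's QPS.
			if tq < underfull && tq+replicaQPS <= overfull {
				moves[from] = to
				break
			}
		}
	}
	return moves
}

func main() {
	// Mean is 1000 (overfull=1100, underfull=900): store 1 sheds to store 3.
	fmt.Println(replicaMoves(map[int]float64{1: 1400, 2: 1000, 3: 600}, 200))
}
```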
master and this commit
The approach implemented by #65379 and refined by this commit tries to leverage
machinery in the Allocator that makes rebalancing decisions which converge
load-based statistics across the stores in an equivalence class. Previously,
this machinery was only used for range-count-based replica rebalancing
(performed by the `replicateQueue`) but not for QPS-based rebalancing. This
commit implements a similar approach to what we now do for lease rebalancing,
which is to determine whether a rebalance action would reduce the QPS delta
between the hottest and the coldest store in the equivalence class. This
commit adds some safeguards around this logic by ensuring that the store
relinquishing the replica and the store receiving it differ by at least
200 QPS. Furthermore, it ensures that the replica rebalance would not
significantly switch the relative dispositions of the two stores.
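The convergence criterion can be expressed as a before/after comparison of the hottest-coldest gap. This is a simplified sketch under the assumptions stated in the comments; function names are hypothetical:

```go
package main

import "fmt"

// qpsDelta returns the gap between the hottest and coldest store.
func qpsDelta(qps map[int]float64) float64 {
	first := true
	var min, max float64
	for _, q := range qps {
		if first {
			min, max, first = q, q, false
			continue
		}
		if q < min {
			min = q
		}
		if q > max {
			max = q
		}
	}
	return max - min
}

// rebalanceShrinksDelta sketches the check described above: the source must
// be at least 200 QPS hotter than the target (the commit's guardrail), and
// simulating the move must shrink the hottest-coldest gap in the class.
func rebalanceShrinksDelta(qps map[int]float64, from, to int, replicaQPS float64) bool {
	const minQPSDifference = 200
	if qps[from]-qps[to] < minQPSDifference {
		return false
	}
	after := make(map[int]float64, len(qps))
	for s, q := range qps {
		after[s] = q
	}
	after[from] -= replicaQPS
	after[to] += replicaQPS
	return qpsDelta(after) < qpsDelta(qps)
}

func main() {
	// Moving a 200 QPS replica: 1200/800/700 becomes 1000/800/900,
	// shrinking the delta from 500 to 200.
	class := map[int]float64{1: 1200, 2: 800, 3: 700}
	fmt.Println(rebalanceShrinksDelta(class, 1, 3, 200))
}
```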
An important thing to note with the 21.2 implementation of the store rebalancer
is that it was making all of its decisions based on cluster-level QPS averages.
This behaves poorly in heterogeneously sized / loaded clusters where some
localities are designed to receive more traffic than others. In such clusters,
heavily loaded localities can always be considered "overfull". This usually
means that all stores in such localities would be above the "overfull"
threshold in the cluster. The logic described above would effectively not do
anything since there are no underfull stores to move replicas to.
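A toy numeric example of this failure mode (the store counts, QPS values, and the 10% threshold fraction here are made up for illustration):

```go
package main

import "fmt"

// overfullThreshold computes the cluster-wide mean QPS and an illustrative
// 10% overfull threshold above it.
func overfullThreshold(qps []float64) float64 {
	var sum float64
	for _, q := range qps {
		sum += q
	}
	return sum / float64(len(qps)) * 1.10
}

func main() {
	// A locality designed for ~3x the traffic of its sibling locality.
	hot := []float64{3000, 3100, 2900}
	cold := []float64{1000, 950, 1050}
	all := append(append([]float64{}, hot...), cold...)

	// Mean is 2000, so the threshold is 2200: every hot-locality store is
	// "overfull", and if zone configs pin these replicas to the hot
	// locality, there is no underfull store eligible to receive them, so
	// the old logic makes no progress.
	th := overfullThreshold(all)
	for _, q := range hot {
		fmt.Println(q > th)
	}
}
```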
Manual testing
![tpcc-with-low-ramp](https://user-images.githubusercontent.com/10788754/149742518-981825f4-6812-41c1-8320-519caafda9c1.png)
This patch has been stress tested with the follower reads roachtests (~250
iterations of `follower-reads/survival=region/locality=global/reads=strong`
and 100 iterations of
`follower-reads/survival=zone/locality=regional/reads=exact-staleness`). It
has also been stress tested with the `rebalance/by-load` roachtests (100
iterations for both the `..leases` and `..replicas` tests). I also manually
ran a TPCC 10K run with a small ramp (something we know triggers #31135) a
few times and saw average QPS converge among stores fairly quickly.
Release note (performance improvement): A set of bugs that rendered QPS-based
lease and replica rebalancing in CRDB 21.2 and prior ineffective under
heterogeneously loaded cluster localities has been fixed. Additionally, a
limitation that prevented CRDB from effectively alleviating extreme QPS
hotspots from nodes has also been fixed.