release-2.1: storage: fix various problems with dynamic replication factor #34199
Conversation
`filterUnremovableReplicas` was allowing replicas that were a necessary part of quorum to be removed. This occurred because `filterBehindReplicas` accepts a "brand new replica" that is considered up-to-date even when we don't have evidence of it being up-to-date. `filterBehindReplicas` needs to return an accurate picture of the up-to-date replicas. Rather than pushing this work into `filterBehindReplicas`, `filterUnremovableReplicas` has been changed to filter the "brand new replica" out of the removable candidates. See cockroachdb#34122 Release note: None
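To make the quorum reasoning concrete, here is a minimal, self-contained Go sketch of the fixed behavior. It is an illustration, not the real implementation: `replicaDesc`, the `upToDate` map, and the signature are simplified stand-ins (the actual function works off Raft progress state), but the key idea matches the commit: the brand new replica is treated as behind until proven caught up.

```go
package main

import "fmt"

// replicaDesc is a simplified stand-in for roachpb.ReplicaDescriptor.
type replicaDesc struct {
	ReplicaID int
}

// filterUnremovableReplicas sketches the fixed logic. upToDate records
// which replicas we have evidence are caught up; brandNewReplicaID is the
// most recently added replica, which may look caught up without evidence.
func filterUnremovableReplicas(
	replicas []replicaDesc, upToDate map[int]bool, brandNewReplicaID int,
) []replicaDesc {
	quorum := len(replicas)/2 + 1
	// Count only replicas with real evidence of being up to date. The fix:
	// the brand new replica never counts, even if it reports as caught up.
	caughtUp := 0
	for _, r := range replicas {
		if upToDate[r.ReplicaID] && r.ReplicaID != brandNewReplicaID {
			caughtUp++
		}
	}
	var candidates []replicaDesc
	for _, r := range replicas {
		behind := !upToDate[r.ReplicaID] || r.ReplicaID == brandNewReplicaID
		if behind || caughtUp > quorum {
			// A behind replica is always removable; an up-to-date replica
			// is removable only if quorum survives without it.
			candidates = append(candidates, r)
		}
	}
	return candidates
}

func main() {
	replicas := []replicaDesc{{1}, {2}, {3}, {4}}
	upToDate := map[int]bool{1: true, 2: true, 3: true, 4: true}
	// Replica 4 is brand new: with quorum 3 and only 3 proven replicas, it
	// is the sole removal candidate. Before the fix it counted as up to
	// date, making quorum members appear removable.
	fmt.Println(filterUnremovableReplicas(replicas, upToDate, 4)) // [{4}]
}
```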
Previously, `Replica.mu.lastReplicaAdded` was being set to the maximum replica ID in the descriptor whenever the descriptor changed. This was invalid when a replica was removed from the range. For example, consider a range with 9 replicas, IDs 1 through 9. If replica ID 5 is removed from the range, `lastReplicaAdded` was being set to 9. Coupled with the bug in the previous commit, this was causing replica ID 9 to appear to be up-to-date when it wasn't. The fix here isn't strictly necessary, but is done to bring sanity: `lastReplicaAdded` should accurately reflect the last replica which was added, not the maximum replica ID in the range. See cockroachdb#34122 Release note: None
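A minimal sketch of the corrected bookkeeping, with hypothetical names (`replicaState`, `setDesc`); the real fields live on `Replica.mu`. The point is that `lastReplicaAdded` (and its timestamp) only advance when the descriptor's maximum replica ID grows, i.e. when a replica was actually added:

```go
package main

import (
	"fmt"
	"time"
)

// replicaState stands in for the relevant fields on Replica.mu.
type replicaState struct {
	lastReplicaAdded     int
	lastReplicaAddedTime time.Time
}

// maxReplicaID returns the largest replica ID in a descriptor, modeled
// here as a plain slice of IDs.
func maxReplicaID(ids []int) int {
	m := 0
	for _, id := range ids {
		if id > m {
			m = id
		}
	}
	return m
}

// setDesc sketches the fix: the state advances only when the maximum
// replica ID grows. Removing replica 5 from a range with IDs 1-9 leaves
// the maximum at 9 and must not refresh the "recently added" state.
func (s *replicaState) setDesc(ids []int) {
	if m := maxReplicaID(ids); m > s.lastReplicaAdded {
		s.lastReplicaAdded = m
		s.lastReplicaAddedTime = time.Now()
	}
}

func main() {
	s := &replicaState{}
	s.setDesc([]int{1, 2, 3, 4, 5, 6, 7, 8, 9}) // replica 9 added: advances
	t1 := s.lastReplicaAddedTime
	s.setDesc([]int{1, 2, 3, 4, 6, 7, 8, 9}) // replica 5 removed: no change
	fmt.Println(s.lastReplicaAdded == 9, s.lastReplicaAddedTime.Equal(t1))
}
```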
Reimplement `StorePool.AvailableNodeCount` in terms of `NodeLiveness.GetNodeCount`. The latter returns a count of the user's intended number of nodes in the cluster: nodes that have been added, minus nodes that are decommissioning or decommissioned. This fixes misbehavior of the dynamic replication factor heuristic. The previous `StorePool.AvailableNodeCount` implementation would fluctuate depending on the number of node descriptors that had been received from gossip and the number of dead nodes. So if you had a 5-node cluster and 2 nodes were down for more than 5 min (and were thus marked as dead), the cluster would suddenly start down-replicating ranges. Similarly, if you had a 5-node cluster, took down all 5 nodes, and only restarted 3, the cluster would start down-replicating ranges. The new behavior is to consider a node part of the cluster until it is decommissioned. This better matches user expectations. Fixes cockroachdb#34122 Release note (bug fix): Avoid down-replicating widely replicated ranges when nodes in the cluster are temporarily down.
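The counting rule is simple enough to show directly. Below is a hedged Go sketch of the idea behind `NodeLiveness.GetNodeCount` (the type and signature are illustrative, not the real API): dead nodes still count toward the intended cluster size; only decommissioning removes them.

```go
package main

import "fmt"

// livenessRecord is an illustrative stand-in for a node liveness record;
// the real record also carries epochs, expirations, and a draining flag.
type livenessRecord struct {
	NodeID          int
	Decommissioning bool
}

// getNodeCount counts the user's intended cluster size: every node ever
// added, minus decommissioning (or decommissioned) nodes. Dead nodes that
// were not decommissioned still count, so a temporary outage does not
// shrink the computed size and trigger down-replication.
func getNodeCount(records []livenessRecord) int {
	n := 0
	for _, rec := range records {
		if !rec.Decommissioning {
			n++
		}
	}
	return n
}

func main() {
	// A 5-node cluster with 2 nodes dead but not decommissioned still
	// counts 5 nodes, so a replication factor of 5 is left alone.
	records := []livenessRecord{
		{1, false}, {2, false}, {3, false}, {4, false}, {5, false},
	}
	fmt.Println(getNodeCount(records)) // 5
	// Decommissioning node 5 reduces the intended cluster size to 4.
	records[4].Decommissioning = true
	fmt.Println(getNodeCount(records)) // 4
}
```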
Release note: None
Add the `replicate/wide` roachtest which starts up a 9-node cluster, sets the replication factor for all zones to 9, waits for full replication, and then restarts the cluster, bringing up only nodes 1-6. Previously, this would cause down-replication, and that down-replication could cause unavailable ranges. Further, test decommissioning one of the nodes and verify that the replication factor of the ranges falls to 7. Lastly, decrease the replication factor to 5 and verify that the number of replicas per range again falls. See cockroachdb#34122 Release note: None
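For orientation, here is an illustrative outline of the test's phases. None of the helper names below are the real roachtest API; they are stand-ins for cluster operations, and the replica counts follow the description above (8 intended nodes after one decommission settle at 7 replicas, assuming the heuristic avoids even replication factors).

```go
package main

import "fmt"

// Illustrative stand-ins for the operations the test performs; the real
// test drives a 9-node roachprod cluster through the roachtest harness.
func startNodes(from, to int)       { fmt.Printf("start nodes %d-%d\n", from, to) }
func stopAllNodes()                 { fmt.Println("stop all nodes") }
func setReplicationFactor(n int)    { fmt.Printf("set zone replication factor to %d\n", n) }
func decommissionNode(n int)        { fmt.Printf("decommission node %d\n", n) }
func waitForReplicasPerRange(n int) { fmt.Printf("wait until ranges have %d replicas\n", n) }

// runWideReplication outlines the phases of the replicate/wide test.
func runWideReplication() {
	startNodes(1, 9)
	setReplicationFactor(9)
	waitForReplicasPerRange(9) // full replication across all 9 nodes

	stopAllNodes()
	startNodes(1, 6) // only nodes 1-6 return
	// No down-replication: the 3 down nodes still count toward the
	// intended cluster size because they were not decommissioned.
	waitForReplicasPerRange(9)

	decommissionNode(9) // intended cluster size drops from 9 to 8
	// With 8 intended nodes and even factors avoided, ranges settle at 7.
	waitForReplicasPerRange(7)

	setReplicationFactor(5)
	waitForReplicasPerRange(5)
}

func main() { runWideReplication() }
```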
Change `filterUnremovableReplicas` to return zero candidates if the range is currently below quorum. This likely has little effect in practice, as a below-quorum range would be unable to process a replica removal. Release note: None
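The guard itself is a one-liner; a hedged sketch with illustrative names (not the real signature): if fewer replicas are up to date than quorum requires, no removal is safe, so nothing is offered.

```go
package main

import "fmt"

// removalCandidates sketches the below-quorum guard: when the number of
// up-to-date replicas is below quorum, return no candidates at all, since
// the range could not commit a removal anyway.
func removalCandidates(total, upToDate int, candidates []int) []int {
	quorum := total/2 + 1
	if upToDate < quorum {
		// Below quorum: nothing is removable.
		return nil
	}
	return candidates
}

func main() {
	// 5 replicas, only 2 up to date: below quorum (3), so no candidates.
	fmt.Println(removalCandidates(5, 2, []int{4, 5})) // []
	// 5 replicas, 3 up to date: quorum holds, candidates pass through.
	fmt.Println(removalCandidates(5, 3, []int{4, 5})) // [4 5]
}
```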
Force-pushed from ba55698 to 4e1e0f8
@bdarnell TestRemoveDeadReplicas is failing with this change. It is stuck waiting for up-replication to occur after removing the dead replicas. I see that you're fixing some other problem with TestRemoveDeadReplicas on master. Is there something that needs to be backported there?
TestRemoveDeadReplicas is skipped on master because @andreimatei's change to pre-create all ranges at bootstrap made it flaky. I've fixed it for that change, but I'm still looking at how to fix it for this one. I've gotten past the hang while waiting for replication, but now it's failing for other reasons I don't quite understand.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @tbg)
Ack. Should I wait for that fix, or skip the test on 2.1?
Probably skip the test on 2.1; this test is for a not-fully-supported edge case anyway.
Release note: None
Added a commit to skip the test.
Backport 6/6 commits from #34126.
/cc @cockroachdb/release
Fixes #34122
Release note (bug fix): Avoid down-replicating widely replicated ranges
when nodes in the cluster are temporarily down.