storage: Simplify raft automatic campaigning after PreVote #24920
Reviewed 2 of 2 files at r1, 4 of 4 files at r2, 1 of 1 files at r3, 1 of 1 files at r4.

pkg/storage/replica.go, line 548 at r1 (raw file):
The Raft groups are still created lazily, and in a large enough cluster most ranges are going to be quiesced, so in that scenario there won't be a "pre-election storm" that can cause latency blips in the foreground workload on the cluster. Correct?

pkg/storage/replica.go, line 599 at r4 (raw file):
The method comment suggests that this needs a nil check. Is the Raft group populated in the new callsites as well? Might be worth adjusting the comment.

Comments from Reviewable

Review status: 3 of 4 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

pkg/storage/replica.go, line 548 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Right. The fact that most ranges are quiesced doesn't really matter. The key factors in preventing storms are that the ranges are created lazily and that unquiescing a range will not usually cause a campaign because the previous leaseholder is still alive so it is presumed to still be the leader.

pkg/storage/replica.go, line 599 at r4 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
0cce1b6 to bfa2bfb
Reviewed 2 of 2 files at r5, 1 of 1 files at r6, 1 of 1 files at r7.
PreVote will be the only option in 2.1
Release note: None

With PreVote, it is less disruptive to campaign unnecessarily, so there's no need for this additional check.
Release note: None

Fold the necessary checks into withRaftGroupLocked and remove unnecessary arguments. This has the effect of campaigning somewhat more than before but that's OK since PreVote minimizes the disruption.
Fixes cockroachdb#18365
Release note: None

This removes the time-to-recovery penalty for disabling TickQuiesced. Note that the new maybeCampaignOnWakeLocked method is not a verbatim copy of the code that was moved from withRaftGroupLocked; new conditions were added to avoid unnecessary campaigns since this method can be called more than before.
Release note: None

With the change to automatically campaign when unquiescing, this should no longer be necessary. The option will be removed in a subsequent change.
Release note: None
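
For readers following the campaign-on-unquiesce commit, here is a minimal sketch of the kind of decision described in the review thread: a waking replica skips the campaign when the previous leaseholder is still alive, since it is presumed to still be the leader. This is not the actual maybeCampaignOnWakeLocked code. The `wakeState` type, its fields, and `shouldCampaignOnWake` are hypothetical names invented for illustration, and the real method may check other conditions.

```go
package main

import "fmt"

// wakeState is a hypothetical snapshot of what a replica knows at the
// moment it unquiesces. None of these fields are taken from the real code.
type wakeState struct {
	isLeader          bool // this replica already leads its Raft group
	leaseHolderIsSelf bool // this replica holds the range lease
	leaseHolderLive   bool // the leaseholder's node is considered live
}

// shouldCampaignOnWake reports whether a waking replica should ask Raft to
// campaign right away instead of waiting for the election timeout.
// Assumption: a live leaseholder on another node is presumed to be the leader.
func shouldCampaignOnWake(s wakeState) bool {
	if s.isLeader {
		return false // nothing to gain from campaigning
	}
	if !s.leaseHolderIsSelf && s.leaseHolderLive {
		return false // presumed leader is alive; let it keep leading
	}
	return true
}

func main() {
	// Typical wake-up: the old leaseholder is still alive, so no campaign.
	fmt.Println(shouldCampaignOnWake(wakeState{leaseHolderLive: true}))
	// The leaseholder's node is dead: campaign to shorten recovery time.
	fmt.Println(shouldCampaignOnWake(wakeState{leaseHolderLive: false}))
}
```

With PreVote, a campaign that turns out to be unnecessary costs little, which is why a coarse check like this can be enough.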
bors r+
24920: storage: Simplify raft automatic campaigning after PreVote r=bdarnell a=bdarnell

Before we implemented PreVote, we had various heuristics to decide when we should ask raft to campaign (bypassing the usual timeout). Since PreVote has reduced the cost of raft elections (by ensuring that a node that calls for an election it can't win doesn't disrupt its peers), we can get by with simpler logic.

In addition to simplifying the logic, this PR introduces a new campaign trigger when a range unquiesces. This is a prerequisite for getting rid of the TickQuiesced hack (which is disabled by default in this PR and will be removed in a future one).

Fixes #18365

Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
Build succeeded
24956: storage: Maintain a separate set of unquiesced replicas r=petermattis a=bdarnell

This means that idle replicas no longer have a per-tick CPU cost, which is one of the bottlenecks limiting the amount of data we can handle per store.

Fixes #17609

Release note (performance improvement): Reduced CPU overhead of idle ranges

The first five commits are from #24920; that PR should be merged and tested in isolation first.

25735: sql: fix null normalization r=RaduBerinde a=RaduBerinde

The normalization rules are happy to convert `NULL::TEXT` to `NULL`. While both expressions evaluate to `DNull`, the `ResolvedType()` is different. It seems unsound for normalization to change the type.

This issue is shown by trying to run a query containing `ARRAY_AGG(NULL::TEXT)` through distsql planning: by the time the distsql planner looks at it, the `NULL::TEXT` is just `DNull` (with the `Unknown` type) and the distsql planner cannot find the builtin.

This change fixes the normalization rules by retaining the cast in this case. In general, any expression that statically evaluates to NULL gets a cast to the original expression type. The same is done in the opt execbuilder.

Fixes #25724.

Release note (bug fix): Fixed query errors in some cases involving a NULL constant that is cast to a specific type.

Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
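
To illustrate why a separate set of unquiesced replicas removes the per-tick cost of idle replicas (the idea behind #24956), here is a minimal sketch. It is not the actual store implementation: the `store`, `replica`, and `rangeID` types and all method names are hypothetical, and the real code has to handle locking, which this toy omits.

```go
package main

import "fmt"

type rangeID int64

// replica is a stand-in for a real replica; ticks counts Raft ticks received.
type replica struct {
	id    rangeID
	ticks int
}

// store keeps every replica in one map and the awake ones in a second set,
// so the tick loop never has to visit quiesced replicas.
type store struct {
	replicas   map[rangeID]*replica
	unquiesced map[rangeID]struct{}
}

func newStore() *store {
	return &store{
		replicas:   make(map[rangeID]*replica),
		unquiesced: make(map[rangeID]struct{}),
	}
}

// addReplica registers a replica; in this sketch it starts out quiesced.
func (s *store) addReplica(id rangeID) {
	s.replicas[id] = &replica{id: id}
}

// unquiesce marks a known replica as awake so it receives ticks again.
func (s *store) unquiesce(id rangeID) {
	if _, ok := s.replicas[id]; ok {
		s.unquiesced[id] = struct{}{}
	}
}

// quiesce removes a replica from the tick set.
func (s *store) quiesce(id rangeID) {
	delete(s.unquiesced, id)
}

// tick visits only unquiesced replicas and returns how many were ticked.
// The cost is proportional to the awake set, not to the total replica count.
func (s *store) tick() int {
	for id := range s.unquiesced {
		s.replicas[id].ticks++
	}
	return len(s.unquiesced)
}

func main() {
	s := newStore()
	for i := rangeID(1); i <= 1000; i++ {
		s.addReplica(i)
	}
	s.unquiesce(7)
	s.unquiesce(42)
	fmt.Println("replicas ticked:", s.tick())
}
```

The trade-off in this design is a second structure that must stay in sync with quiesce and unquiesce transitions, in exchange for a tick loop that ignores idle ranges entirely.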
Before we implemented PreVote, we had various heuristics to decide when we should ask raft to campaign (bypassing the usual timeout). Since PreVote has reduced the cost of raft elections (by ensuring that a node that calls for an election it can't win doesn't disrupt its peers), we can get by with simpler logic.
In addition to simplifying the logic, this PR introduces a new campaign trigger when a range unquiesces. This is a prerequisite for getting rid of the TickQuiesced hack (which is disabled by default in this PR and will be removed in a future one).
Fixes #18365
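
As background for the claim that PreVote keeps a node that can't win an election from disrupting its peers, here is a toy model of the pre-vote phase. It is a sketch under simplifying assumptions, not the raft library's API: the `node` type, `grantPreVote`, and `campaign` are hypothetical names, log freshness is reduced to a single index comparison, and "has a leader" is a plain boolean rather than a timeout.

```go
package main

import "fmt"

// node is a toy Raft participant. All fields are simplifications.
type node struct {
	term         uint64
	lastLogIndex uint64
	hasLeader    bool // heard from a live leader recently
}

// grantPreVote reports whether n would vote for a candidate proposing
// proposedTerm. It never modifies n, so a failed pre-vote leaves peers
// untouched.
func (n *node) grantPreVote(proposedTerm, candLastIndex uint64) bool {
	return !n.hasLeader &&
		proposedTerm > n.term &&
		candLastIndex >= n.lastLogIndex
}

// campaign runs a pre-vote round. The candidate bumps its own term only if
// a quorum says it could win; otherwise no term changes anywhere.
func campaign(cand *node, peers []*node) bool {
	proposed := cand.term + 1
	votes := 1 // the candidate counts itself
	for _, p := range peers {
		if p.grantPreVote(proposed, cand.lastLogIndex) {
			votes++
		}
	}
	quorum := (len(peers)+1)/2 + 1
	if votes < quorum {
		return false
	}
	cand.term = proposed
	return true
}

func main() {
	// Peers are healthy and following a leader; the candidate is behind.
	peers := []*node{
		{term: 5, lastLogIndex: 10, hasLeader: true},
		{term: 5, lastLogIndex: 10, hasLeader: true},
	}
	cand := &node{term: 5, lastLogIndex: 8}
	fmt.Println("won pre-vote:", campaign(cand, peers), "term:", cand.term)
}
```

Because the term is only incremented after a successful pre-vote, an unnecessary campaign in this model is a round of messages and nothing more, which is the property that lets the PR use simpler campaign heuristics.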