Skip to content

Conversation

@nvb
Copy link
Contributor

@nvb nvb commented Jan 7, 2025

Informs #133885.
Informs #133216.

This commit adjusts Replica.tick to check RawNode.HasReady when deciding whether a Ready struct should be processed (Replica.handleRaftReady) from the ticked replica. Previously, we would unconditionally pass through Replica.handleRaftReady after a tick, even if doing so was unnecessary and we knew ahead of time that we would hit this early return:

This was mostly harmless in the past with quiescence and a cheap short-circuit in handleRaftReady, but has become more problematic recently for the following three reasons:

  1. Leader leases don't permit quiescence. We have some work planned to allow follower replicas to quiesce under a leader lease (see kv: local, StoreLiveness-based quiescence for Leader leases #133885), but even after that work, we likely won't allow quiescence for the leaseholder. This means that
  2. Replica.handleRaftReady has been getting more expensive, even in the no-op case. This is because of work done in Replica.updateProposalQuotaRaftMuLocked and work done for replica admission control v2 (RACv2).
  3. At some point in the future, we would like to revert 5e6698 and drop the raft tick interval back down to something like 200ms. This will allow us to address raft: separate out ElectionJitterTicks from ElectionTick #133576 and further reduce failover times.

Release note: None

@nvb nvb added the A-leader-leases Related to the introduction of leader leases label Jan 7, 2025
@nvb nvb requested a review from miraradeva January 7, 2025 02:03
@nvb nvb requested a review from a team as a code owner January 7, 2025 02:03
@blathers-crl
Copy link

blathers-crl bot commented Jan 7, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Informs cockroachdb#133885.
Informs cockroachdb#133216.

This commit adjusts `Replica.tick` to check `RawNode.HasReady` when
deciding whether a `Ready` struct should be processed
(`Replica.handleRaftReady`) from the ticked replica. Previously, we
would unconditionally pass through `Replica.handleRaftReady` after a
tick, even if doing so was unnecessary and we knew ahead of time that
we would hit this early return: https://github.com/cockroachdb/cockroach/blob/57aab736c34ce5dc7988bd53e0604fde48cef441/pkg/kv/kvserver/replica_raft.go#L1025.

This was mostly harmless in the past with quiescence and a cheap short-circuit
in `handleRaftReady`, but has become more problematic recently for the following
three reasons:
1. Leader leases don't permit quiescence. We have some work planned to allow
   follower replicas to quiesce under a leader lease (see cockroachdb#133885), but even
   after that work, we likely won't allow quiescence for the leaseholder. This
   means that
2. `Replica.handleRaftReady` has been getting more expensive, even in the no-op
   case. This is because of work done in `Replica.updateProposalQuotaRaftMuLocked`
   and work done for replica admission control v2 (RACv2).
3. At some point in the future, we would like to revert 5e6698 and drop the raft
   tick interval back down to something like 200ms. This will allow us to
   address cockroachdb#133576 and further reduce failover times.

Release note: None
@nvb nvb force-pushed the nvanbenschoten/readyAfterTick branch from 6fdca0e to 072e061 Compare March 6, 2025 20:18
@nvb
Copy link
Contributor Author

nvb commented Mar 6, 2025

@pav-kv I updated this PR with the additional condition that we discussed. I confirmed that it was failing reliably on TestFlowControlCrashedNodeV2 with a 2 second sleep after h.comment(-- (Crashing n2)), and that it passes reliably after the len(unreachableRemotes) > 0 condition.

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvanbenschoten Great, thanks a lot. I'm closing this PR, and will review the other one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)

@pav-kv pav-kv closed this Mar 6, 2025
@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2025

Oups, sorry, wrong PR :)

@pav-kv pav-kv reopened this Mar 6, 2025
defer r.raftMu.Unlock()
defer func() {
if exists && err == nil {
if processReady && err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MaybeSendPingsRaftMuLocked call can trigger raft messages. Wonder if we should check HasReady after it, to capture these potentially new messages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this line relied on the fact that we return true unconditionally (except when we're quiesced). Logically, this line is not checking "has ready", but rather checks that we passed below line 1456.

So it should be something like:

if noOp || err != nil {
	return
}
MaybeSendPings()
if !processReady { // have we accounted for ReportUnreachable already?
	processReady = HasReady()
}

Copy link
Collaborator

@pav-kv pav-kv Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially there is a less ugly way than this defer. At the very least, it can be moved below line 1456 (caution: I think it doesn't want Replica.mu held).

// group has Ready to process.
processReady = r.mu.internalRaftGroup.HasReady() ||
// Also do so if we just marked a replica as unreachable. This can be
// helpful for RACv2, which responds to changes in the Progress.StateType of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Progress.State

for remoteReplica := range unreachableRemotes {
if bypassFn != nil && bypassFn(remoteReplica) {
continue
}
Copy link
Collaborator

@pav-kv pav-kv Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another call to ReportUnreachable in updatePausedFollowersLocked (and there are no more ReportUnreachables). Should we capture that in processReady flag as well, for completeness?

// Also do so if we just marked a replica as unreachable. This can be
// helpful for RACv2, which responds to changes in the Progress.StateType of
// follower replicas, despite that not being a Ready event.
len(unreachableRemotes) > 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A much nicer fix than #140979 (which is now reverted) is #142652. However, the latter now happens inside raft so there would be no unreachableRemotes here in this case. Potentially, this makes something like #140585 necessary again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-leader-leases Related to the introduction of leader leases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants