Skip to content

raft: advance commit index safely#122690

Merged
craig[bot] merged 1 commit into
cockroachdb:masterfrom
lyang24:raft_acc_term
May 23, 2024
Merged

raft: advance commit index safely#122690
craig[bot] merged 1 commit into
cockroachdb:masterfrom
lyang24:raft_acc_term

Conversation

@lyang24

@lyang24 lyang24 commented Apr 19, 2024

Copy link
Copy Markdown
Contributor

This change makes the commit index advancement in handleHeartbeat safe. Previously, a follower would attempt to update the commit index to whichever was sent in the MsgHeartbeat message. Out-of-bound indices would crash the node.

It is always safe to advance a commit index if the follower's log is "in sync" with the leader, i.e. when its log is guaranteed to be a prefix of the leader's log. This is always true if the term of last entry in the log matches the leader's term, otherwise this guarantee is established when the first MsgApp append message from the leader succeeds.

At the moment, the leader will never send a commit index that exceeds the follower's log size. However, this may change in future. This change is a defence-in-depth.

The newly added accTerm field will be used for other safety checks in the future, for example to establish that overriding a suffix of entries in raftLog is safe.

Part of #122100

@blathers-crl

blathers-crl Bot commented Apr 19, 2024

Copy link
Copy Markdown

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I was unable to automatically find a reviewer. You can try CCing one of the following members:

  • A person you worked with closely on this PR.
  • The person who created the ticket, or a CRDB organization member involved with the ticket (author, commenter, etc.).
  • Join our community slack channel and ask on #contributors.
  • Try find someone else from here.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl Bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels Apr 19, 2024
@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@lyang24 lyang24 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/raft/raft.go line 897 at r1 (raw file):

	}
	// the leader's log is consistent with itself.
	r.accTerm = r.Term

let me know if this field fits better in the raftLog struct


pkg/raft/raft_test.go line 1158 at r1 (raw file):

		sm := newTestRaft(1, 5, 1, storage)
		// need to simulate the follower is consistent with leader.
		sm.accTerm = tt.accTerm

need to assign to simulate follower in sync otherwise follower accTerm is 0.
In this case commit index will not moved by heartbeat unless the follow received entry or snapshot

@blathers-crl

blathers-crl Bot commented Apr 22, 2024

Copy link
Copy Markdown

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@lyang24 lyang24 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/raft/rawnode_test.go line 744 at r2 (raw file):

	rawNode, err := NewRawNode(cfg)
	rawNode.raft.accTerm = 1

This passes the tests but i am not sure if this is the right thing to do ideally when a node become leader the accterm will be assigned as the current term of the leader, but it seems like we do not have election here?

@yuzefovich yuzefovich removed the X-blathers-untriaged blathers was unable to find an owner label May 2, 2024
@yuzefovich yuzefovich requested review from a team May 2, 2024 04:22
@kvoli kvoli requested a review from pav-kv May 2, 2024 04:44
@lyang24 lyang24 marked this pull request as ready for review May 8, 2024 17:26
Comment thread pkg/raft/raft.go Outdated
Comment thread pkg/raft/raft.go
Comment thread pkg/raft/raft.go Outdated

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)


pkg/raft/raft.go line 897 at r1 (raw file):

Previously, lyang24 (Lanqing Yang) wrote…

let me know if this field fits better in the raftLog struct

It does, but including it into raftLog requires a bit larger refactoring. How about a TODO in the accTerm field comment?


pkg/raft/raft_test.go line 1158 at r1 (raw file):

Previously, lyang24 (Lanqing Yang) wrote…

need to assign to simulate follower in sync otherwise follower accTerm is 0.
In this case commit index will not moved by heartbeat unless the follow received entry or snapshot

This should not be needed when you make the newRaft func initialize accTerm to the last entry term.


pkg/raft/rawnode_test.go line 744 at r2 (raw file):

Previously, lyang24 (Lanqing Yang) wrote…

This passes the tests but i am not sure if this is the right thing to do ideally when a node become leader the accterm will be assigned as the current term of the leader, but it seems like we do not have election here?

This should not be needed when you make the newRaft func initialize accTerm to the last entry term.

@lyang24 lyang24 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, I have addressed the feedbacks please take another look.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pav-kv)


pkg/raft/raft.go line 897 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

It does, but including it into raftLog requires a bit larger refactoring. How about a TODO in the accTerm field comment?

added todo on the field


pkg/raft/raft.go line 348 at r2 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

A more elaborate comment would be good, e.g.:

// accTerm is the term of the leader whose append was accepted into the log
// last. The log is a prefix of the accTerm's leader log.
//
// Invariant: raftLog.lastTerm() <= accTerm <= Term
//
// NB: the log can be partially or fully compacted. When we say "log" above,
// we logically include all the entries that were the pre-image of a snapshot,
// as well as the entries that are still physically in the log.

Done.


pkg/raft/raft.go line 349 at r2 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

We should initialize accTerm to be last entry term, somewhere in newRaft func. If we don't, then a restarted node won't be able to advance commit index immediately, it will need an append message first, to get appTerm up-to-date.

It would also be good to document the invariant (see above).

Done.


pkg/raft/raft.go line 1715 at r2 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Rename leaderTerm to accTerm in this comment? (here and below)

Done.

@lyang24 lyang24 requested a review from pav-kv May 9, 2024 15:54

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@pav-kv pav-kv requested a review from nvb May 10, 2024 12:10
@pav-kv

pav-kv commented May 10, 2024

Copy link
Copy Markdown
Collaborator

LGTM, but let's have a second review.
@nvanbenschoten Could you PTAL?

Comment thread pkg/raft/raft.go
lastID := r.raftLog.lastEntryID()
// Invariant: accTerm needs to be initialized as the term of last raft entry
// to make sure restarted nodes can advance commit index immediately.
r.accTerm = lastID.term

@pav-kv pav-kv May 12, 2024

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers. There is a case when the commit index won't be able to advance by a heartbeat message right after a restart, specifically when the lastID.term < HardState.Term. This is okay though, because the next accepted MsgApp message from the Term leader (which is guaranteed, because this follower is behind the leader on the log) will bump accTerm to the leader's term, and advance the commit index.

For complete snappiness, we need #122446, i.e. persist accTerm to the HardState - this will make the "log is consistent with the leader" signal survive restarts. We need this accTerm persisted for other reasons too.

Comment thread pkg/raft/raft.go Outdated
r.logger.Infof("%x [commit: %d] restored snapshot [index: %d, term: %d]",
r.id, r.raftLog.committed, sindex, sterm)
// log is now consistent with leader
r.accTerm = sterm

@pav-kv pav-kv May 12, 2024

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be m.Term, same as in the append case. We're saying here that the state of the log/snapshot is now consistent with the leader.

I suspect sterm == m.Term (because the leader always sends a snapshot including at least one of its entries), but semantically we should better use m.Term (in case the snapshot is sent by someone else, or is outdated).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed it should be the term of the raft message instead of term on the snapshot.

in case the snapshot is sent by someone else, or is outdated
just curious from reading the raft paper if its outdated or its not sent from a quorum leader the restore will fail right?

@pav-kv pav-kv May 12, 2024

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the snapshot was sent by the leader with an outdated Term (or on behalf of it), the snapshot will be ignored.

However, we can't generally assume that the snapshot (index, term) has the same Term. Say, the leader's Term is 2, and log entries terms are [1 1 1 2 2 2], and the leader has only applied/committed the first 3 entries. Then the snapshot will be for (index=3, term=1) @ Term=2.

@nvb nvb requested a review from pav-kv May 13, 2024 20:35

@nvb nvb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24 and @pav-kv)


pkg/raft/raft.go line 434 at r3 (raw file):

which is guaranteed, because this follower is behind the leader on the log

Why is the follower guaranteed to be behind the leader on the log?


pkg/raft/raft.go line 349 at r4 (raw file):

	lead uint64
	// accTerm is the term of the leader whose append was accepted into the log
	// last. The log is a prefix of the accTerm's leader log.

We should clarify that a rejected MsgApp does not update accTerm.


pkg/raft/raft.go line 1729 at r4 (raw file):

	// the log unconditionally, but it can only be done if m.Term > accTerm.
	if m.Term == r.accTerm {
		r.raftLog.commitTo(min(m.Commit, r.raftLog.lastIndex()))

@pav-kv This is going to avoid a panic in raftLog.commitTo after a follower's loss of durability which causes its raft log to regress across a restart. That's a good thing. But what is going to happen once the follower doesn't crash? Will the leader ever recognize that the follower's log has lost entries (which from its perspective, it just acknowledged) and go back to re-append those?

@pav-kv pav-kv requested a review from nvb May 14, 2024 11:03

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24 and @nvanbenschoten)


pkg/raft/raft.go line 434 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

which is guaranteed, because this follower is behind the leader on the log

Why is the follower guaranteed to be behind the leader on the log?

This just describes a scenario in which lastID.term < HardState.Term at restart. This means the leader at HardState.Term (or any higher term) hasn't replicated the log to this follower all the way up to the Term entries, so we will soon see an append.


pkg/raft/raft.go line 1729 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

@pav-kv This is going to avoid a panic in raftLog.commitTo after a follower's loss of durability which causes its raft log to regress across a restart. That's a good thing. But what is going to happen once the follower doesn't crash? Will the leader ever recognize that the follower's log has lost entries (which from its perspective, it just acknowledged) and go back to re-append those?

I think this depends on the leader implementation. At some point it will see a rejected MsgApp at Match index, so it will see an urge to regress Match. Regressing Match means potentially uncommitting entries, so it's not safe. I think it should panic or put the range to a brick state. It may also enter a busy loop in which it does not regress Match, and sends appends post-Match which never bring the follower to an up-to-date state. It will only heal when the leader changes, but there is still risk of uncommitting entries somewhere else (explicitly, or, worse, implicitly).

Maybe there are some cases when regressing Match can be survived. Particularly, if a regressed Match does not cause the commit index regression (because there is still a quorum of Match-es at >= commit index), then maybe we're good. But we need to be extremely cautious here.

All-in-all, loss of durability is Byzantine, we can't make Raft fully tolerant to that, or correctly detect it in all cases.

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24 and @nvanbenschoten)


pkg/raft/raft.go line 1729 at r4 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

I think this depends on the leader implementation. At some point it will see a rejected MsgApp at Match index, so it will see an urge to regress Match. Regressing Match means potentially uncommitting entries, so it's not safe. I think it should panic or put the range to a brick state. It may also enter a busy loop in which it does not regress Match, and sends appends post-Match which never bring the follower to an up-to-date state. It will only heal when the leader changes, but there is still risk of uncommitting entries somewhere else (explicitly, or, worse, implicitly).

Maybe there are some cases when regressing Match can be survived. Particularly, if a regressed Match does not cause the commit index regression (because there is still a quorum of Match-es at >= commit index), then maybe we're good. But we need to be extremely cautious here.

All-in-all, loss of durability is Byzantine, we can't make Raft fully tolerant to that, or correctly detect it in all cases.

Maybe one way to put things: if Match regresses (durability is lost), this should be equivalent to a dead node. If making this node equivalent to dead doesn't cause LoQ, we're good. After making it "dead", we should reconfigure the Range, and maybe somehow re-add this replica with another ID. The fact that it's got most of the log already can probably be utilized to quickly catch it up.

craig Bot pushed a commit that referenced this pull request May 20, 2024
124225: raft: add match index safety check on followers r=pav-kv a=pav-kv

This PR adds a safety check which ensures that a follower's log state does not contradict to the leader's `Match` index.

If the leader has a non-zero `Match` for the follower, it believes that the follower's log a) is consistent with this leader, and b) has all entries up to the `Match` index persisted. If the follower's `Term <= leader.Term` and (b) turns out to not be the case, this means the follower has lost part of the durable state (for example, fsync didn't work properly in the system, and the node got restarted). This is a safety violation, so this follower panics to avoid spreading the harm.

In the future, it would be better to "quarantine" this follower, and surface this information to the operator without crashing the node.

Touches #122690
Epic: none
Release note: none

Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyang24 the blocking PR has merged. Please address Nathan's comments, and it should be safe to merge this one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)


pkg/raft/raft.go line 1729 at r4 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Added #124225 to retain the panic behaviour if the follower breaks its durability promises.

The PR has merged. It should be safe to make the change now.

@lyang24

lyang24 commented May 21, 2024

Copy link
Copy Markdown
Contributor Author

@lyang24 the blocking PR has merged. Please address Nathan's comments, and it should be safe to merge this one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)

pkg/raft/raft.go line 1729 at r4 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…
The PR has merged. It should be safe to make the change now.

Hi I have rebased to master will merge when tests passes and will work on the follow up prs.

@lyang24

lyang24 commented May 21, 2024

Copy link
Copy Markdown
Contributor Author

bors r+

@craig

craig Bot commented May 21, 2024

Copy link
Copy Markdown
Contributor

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@pav-kv

pav-kv commented May 21, 2024

Copy link
Copy Markdown
Collaborator

bors cancel

@pav-kv

pav-kv commented May 21, 2024

Copy link
Copy Markdown
Collaborator

@lyang24 there are some outstanding comments from @nvanbenschoten, could you please address them first?

@nvanbenschoten Do you want to take a final pass / LGTM?

@nvb nvb requested a review from pav-kv May 21, 2024 14:53

@nvb nvb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24 and @pav-kv)


pkg/raft/raft.go line 434 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

This just describes a scenario in which lastID.term < HardState.Term at restart. This means the leader at HardState.Term (or any higher term) hasn't replicated the log to this follower all the way up to the Term entries, so we will soon see an append.

How does that relate to "make sure restarted nodes can advance commit index immediately"? Is the point that you're making that in the node restart case, lastID.term will be correct, so in that case this allows it to advance its commit index immediately instead of waiting for ... something?

What is that "something" by the way? Are there cases where the restarted follower is caught up enough to not need to accept any new MsgApps?

If the term changed, the follower cannot be caught up on its log because a term change is followed by an empty entry in the log. So that is enough to require a MsgApp which will eventually set accTerm and will eventually inform the follower of the commit index.

What if the term did not change and the leader has not proposed anything new since the follower restarted. Will accTerm ever be set to the leader's current term? Is this also relying on the empty entry in the log to be from the leader's current term, such that if we're in this state, lastID.term must be the leader's current term?

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)


pkg/raft/raft.go line 434 at r3 (raw file):

What is that "something" by the way?

A necessary condition to advance commit index is:
a) the entry at this index is committed (so the leader who sent the message has learned about it)
b) our log ends at least at this index
c) our log matches the leader's up to this index

(a) and (b) are obvious. A sufficient condition for (c) is leader.Term == accTerm, meaning our log is entirely a prefix of the leader's. Weaker conditions are possible here, if the follower has more info about the leader's log (e.g. the leader includes a sample of index,term pairs in the request), but let's ignore those.

Are there cases where the restarted follower is caught up enough to not need to accept any new MsgApps?

If we persist accTerm across restarts, we don't need need any new MsgApps. Since we don't yet persist it, what is the safe and the best value we can initialize it to? We need accTerm to be such that our log is a prefix of the accTerm leader. The easy way to guarantee this is to initialize accTerm to lastEntry.term. If we're lucky and lastEntry.term == leader.Term, we initialized to exactly the pre-restart accTerm, otherwise we're at a conservatively low value, so our commit index will start moving only at first MsgApp.

If the term changed, the follower cannot be caught up on its log because a term change is followed by an empty entry in the log. So that is enough to require a MsgApp which will eventually set accTerm and will eventually inform the follower of the commit index.

Yes.

What if the term did not change and the leader has not proposed anything new since the follower restarted. Will accTerm ever be set to the leader's current term? Is this also relying on the empty entry in the log to be from the leader's current term, such that if we're in this state, lastID.term must be the leader's current term?

Today, this does rely on the dummy entry (more broadly, on the fact that a leader will append at least one entry). But generally, this relies on the fact that a leader pings follower until it knows that the follower's accTerm == Term (as if there was an invisible entry at the start of the leader's term).

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)


pkg/raft/raft.go line 434 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

What is that "something" by the way?

A necessary condition to advance commit index is:
a) the entry at this index is committed (so the leader who sent the message has learned about it)
b) our log ends at least at this index
c) our log matches the leader's up to this index

(a) and (b) are obvious. A sufficient condition for (c) is leader.Term == accTerm, meaning our log is entirely a prefix of the leader's. Weaker conditions are possible here, if the follower has more info about the leader's log (e.g. the leader includes a sample of index,term pairs in the request), but let's ignore those.

Are there cases where the restarted follower is caught up enough to not need to accept any new MsgApps?

If we persist accTerm across restarts, we don't need need any new MsgApps. Since we don't yet persist it, what is the safe and the best value we can initialize it to? We need accTerm to be such that our log is a prefix of the accTerm leader. The easy way to guarantee this is to initialize accTerm to lastEntry.term. If we're lucky and lastEntry.term == leader.Term, we initialized to exactly the pre-restart accTerm, otherwise we're at a conservatively low value, so our commit index will start moving only at first MsgApp.

If the term changed, the follower cannot be caught up on its log because a term change is followed by an empty entry in the log. So that is enough to require a MsgApp which will eventually set accTerm and will eventually inform the follower of the commit index.

Yes.

What if the term did not change and the leader has not proposed anything new since the follower restarted. Will accTerm ever be set to the leader's current term? Is this also relying on the empty entry in the log to be from the leader's current term, such that if we're in this state, lastID.term must be the leader's current term?

Today, this does rely on the dummy entry (more broadly, on the fact that a leader will append at least one entry). But generally, this relies on the fact that a leader pings follower until it knows that the follower's accTerm == Term (as if there was an invisible entry at the start of the leader's term).

So "make sure restarted nodes can advance commit index immediately" is not entirely accurate here. Without accTerm persistence, it is best effort (but, crucially, still correct).

@nvb nvb requested a review from pav-kv May 21, 2024 17:35

@nvb nvb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24 and @pav-kv)


pkg/raft/raft.go line 434 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

So "make sure restarted nodes can advance commit index immediately" is not entirely accurate here. Without accTerm persistence, it is best effort (but, crucially, still correct).

Thanks for explaining, this makes sense. I think what I'm asking for then is more words in this comment about why we bother setting r.accTerm = lastID.term on restart, what cases that helps with, and what cases it does not.

I'd encourage more commentary around the call to r.raftLog.commitTo as well. Specifically, it would be useful to reason through why dropping a heartbeat's commit index contribution in either of the following two cases still ensures that the follower eventually learns of the latest commit index.

  1. the case where m.Term != r.accTerm
  2. the case where r.raftLog.lastIndex() < m.Commit

@lyang24 lyang24 force-pushed the raft_acc_term branch 2 times, most recently from 794d435 to f3b4906 Compare May 22, 2024 05:14

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/raft/raft.go line 434 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for explaining, this makes sense. I think what I'm asking for then is more words in this comment about why we bother setting r.accTerm = lastID.term on restart, what cases that helps with, and what cases it does not.

I'd encourage more commentary around the call to r.raftLog.commitTo as well. Specifically, it would be useful to reason through why dropping a heartbeat's commit index contribution in either of the following two cases still ensures that the follower eventually learns of the latest commit index.

  1. the case where m.Term != r.accTerm
  2. the case where r.raftLog.lastIndex() < m.Commit

Done


pkg/raft/raft.go line 349 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should clarify that a rejected MsgApp does not update accTerm.

Done

@nvb nvb requested a review from pav-kv May 23, 2024 16:37

@nvb nvb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lyang24 and @pav-kv)


pkg/raft/raft.go line 452 at r6 (raw file):

	// higher accTerm (ideally, matching the current leader Term) gives us more
	// information about the log, and then allows bumping its commit index sooner
	// than the next MsgApp arrives.

"sooner than when the ..."?


pkg/raft/raft.go line 1760 at r6 (raw file):

The m.Term leader indicates to us that

This can be interpreted a few different ways. Consider "The m.Term leader is indicating to us through this heartbeat message that ..."

This change makes the commit index advancement in handleHeartbeat safe.
Previously, a follower would attempt to update the commit index to
whichever was sent in the MsgHeartbeat message. Out-of-bound indices
would crash the node.

It is always safe to advance a commit index if the follower's log is "in
sync" with the leader, i.e. when its log is guaranteed to be a prefix of
the leader's log. This is always true if the term of last entry in the
log matches the leader's term, otherwise this guarantee is established
when the first MsgApp append message from the leader succeeds.

At the moment, the leader will never send a commit index that exceeds
the follower's log size. However, this may change in future. This change
is a defence-in-depth.

The newly added accTerm field will be used for other safety checks in the
future, for example to establish that overriding a suffix of entries in
raftLog is safe.

Epic: None
Release note: None

@lyang24 lyang24 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pav-kv)


pkg/raft/raft.go line 452 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"sooner than when the ..."?

Done


pkg/raft/raft.go line 1760 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The m.Term leader indicates to us that

This can be interpreted a few different ways. Consider "The m.Term leader is indicating to us through this heartbeat message that ..."

done

@lyang24

lyang24 commented May 23, 2024

Copy link
Copy Markdown
Contributor Author

bors r+

@craig

craig Bot commented May 23, 2024

Copy link
Copy Markdown
Contributor

@craig craig Bot merged commit d6da6c6 into cockroachdb:master May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

O-community Originated from the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants