Skip to content

raft: use last accepted term for log writes#127424

Merged
craig[bot] merged 4 commits into
cockroachdb:masterfrom
pav-kv:async-log-writes-acc-term
Jul 30, 2024
Merged

raft: use last accepted term for log writes#127424
craig[bot] merged 4 commits into
cockroachdb:masterfrom
pav-kv:async-log-writes-acc-term

Conversation

@pav-kv

@pav-kv pav-kv commented Jul 18, 2024

Copy link
Copy Markdown
Collaborator

This PR modifies the log write protocol to take advantage of the last accepted term tracked in raftLog/unstable.

Log entry writes are ordered by the last accepted term (unstable.term). Entries in the log can be overwritten only if this term is bumped (when we accept a log write from a higher term leader). If unstable.term is unchanged when we receive an acknowledgement from log storage, then the acknowledged entries don't have any overwrites in the write queue, and it is safe to remove them from unstable.

Part of #124440

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@pav-kv pav-kv force-pushed the async-log-writes-acc-term branch from df876b2 to b26e6a2 Compare July 18, 2024 12:43
@pav-kv pav-kv force-pushed the async-log-writes-acc-term branch 2 times, most recently from 8deb73e to 4b57f5a Compare July 18, 2024 12:59
@pav-kv pav-kv requested review from nvb and sumeerbhola July 18, 2024 13:02
@pav-kv pav-kv marked this pull request as ready for review July 18, 2024 13:10
@pav-kv pav-kv requested a review from a team as a code owner July 18, 2024 13:10
@pav-kv pav-kv force-pushed the async-log-writes-acc-term branch from 4b57f5a to 71946ca Compare July 18, 2024 13:25
Comment thread pkg/raft/rawnode.go
m.Index = last.index
m.LogTerm = last.term
m.Index = rd.Entries[ln-1].Index
m.LogTerm = r.raftLog.accTerm()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvanbenschoten Could use m.Term field instead of m.LogTerm, if we don't want to overload the meaning of m.LogTerm. In MsgApp messages, m.LogTerm is the "entry term", while in this protocol it carries the "last accepted / leader term".

Maybe using m.Term for this would be more semantically consistent - m.Term is usually the leader sending a MsgApp, and MsgStorageAppend is essentially MsgApp forwarded to the storage on behalf of the latest leader with whom our log is consistent.

WDYT?

@pav-kv

pav-kv commented Jul 18, 2024

Copy link
Copy Markdown
Collaborator Author

@sumeerbhola The logic in stableTo method is equivalent to what we will do in RAC to no-op outdated admissions. We can use the MsgStorageAppendResp.LogTerm field for this (may need to put it to MsgStorageAppend too, to avoid the need of searching through the message Responses).

Comment thread pkg/raft/rawnode.go
// the time that the MsgStorageAppend was sent to the time that the response
// is received, indicating that no new leader has overwritten the log.
//
// However, this replaces a correctness problem with a liveness problem. If we

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would still have this liveness problem if accTerm could be bumped without having new entries. But today, accTerm is either bumped on a snapshot (when we clear the unstable log), or on appending a non-empty slice of entries. In the former case, we don't need the last accepted term logic because there are no pending unstable entries. In the latter case, we send messages because len(Entries) > 0.

@nvb nvb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 10 of 10 files at r1, 4 of 4 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)


pkg/raft/raft.go line 1045 at r1 (raw file):

	case m.Term == 0:
		// local message
	case m.Type == pb.MsgStorageAppendResp:

Should we stop setting Term on MsgStorageAppendResp messages?

EDIT: next commit 😄


pkg/raft/rawnode.go line 312 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

@nvanbenschoten Could use m.Term field instead of m.LogTerm, if we don't want to overload the meaning of m.LogTerm. In MsgApp messages, m.LogTerm is the "entry term", while in this protocol it carries the "last accepted / leader term".

Maybe using m.Term for this would be more semantically consistent - m.Term is usually the leader sending a MsgApp, and MsgStorageAppend is essentially MsgApp forwarded to the storage on behalf of the latest leader with whom our log is consistent.

WDYT?

Agreed on switching to m.Term for the reason you mentioned — LogTerm is a reference to the term of a specific log entry, while Term is a reference to the context of the log append.

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/raft/rawnode.go line 312 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Agreed on switching to m.Term for the reason you mentioned — LogTerm is a reference to the term of a specific log entry, while Term is a reference to the context of the log append.

I realized we need accTerm both in MsgStorageAppend and Resp, for RAC. But MsgStorageAppend already uses the Term field to carry HardState.Term.

A couple of workarounds:

  1. Use LogTerm, but it overrides its meaning, as we discussed.
  2. Only use Term in MsgStorageAppendResp, and parse it out from MsgStorageAppend.Responses. This is also a bit inconsistent: Term is used for different things in the message and its reply.

Both approaches have a similar flaw, but maybe (1) is a bit more consistent, at least in scope of the message/reply pair; and easier to use. How about sticking to it for now?

Longer-term, the MsgStorageAppend/Resp could be replaced by a structured Ready-like API. I previously played with this idea here, but there is no urgency to pursue this now.

@sumeerbhola sumeerbhola requested a review from nvb July 22, 2024 14:41

@sumeerbhola sumeerbhola left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid the need of searching through the message Responses).

can you point me to that code? I don't quite understand.

Regarding outdated admissions, we are tracking individual entries for admission, and we can easily change waitingForAdmissionState to wipeout stale entries in waitingForAdmissionState.add. So we have multiple options regarding how to ignore outdated admissions. We could definitely also do something like you outlined here.
I believe we partially discussed in https://cockroachlabs.slack.com/archives/C06UFBJ743F/p1720029427647799?thread_ts=1720023953.470729&cid=C06UFBJ743F

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@pav-kv pav-kv force-pushed the async-log-writes-acc-term branch from b07ee8c to a1b903a Compare July 30, 2024 12:10

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point me to that code?

We don't have this code yet. The 2 messages are: MsgStorageAppend and MsgStorageAppendResp. To implement the no-op for admissions, we need to extract the leader Term from MsgStorageAppend (so that we can attach it to the admission metadata, and, when it comes back admitted, we can compare it with the one tracked in waitingForAdmissionState).

We could extract this term by scanning MsgStorageAppend.Responses field and finding the MsgStorageAppendResp (which contains the term). Or we could attach the term to the MsgStorageAppend directly to avoid this unergonomic scan. Added the last commit to this extent.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@nvb nvb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 7 files at r4, 4 of 4 files at r5, 1 of 1 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)


pkg/raft/rawnode.go line 224 at r7 (raw file):

	if ln := len(rd.Entries); ln != 0 {
		m.LogTerm = r.raftLog.accTerm()
		m.Index = rd.Entries[ln-1].Index

nit: assign these fields in the same order that you do down below.

pav-kv added 4 commits July 30, 2024 16:19
This commit modifies the log write protocol to take advantage of the
last accepted term tracked in raftLog/unstable.

Epic: none
Release note: none
Epic: none
Release note: none
Epic: none
Release note: none
@pav-kv pav-kv force-pushed the async-log-writes-acc-term branch from a1b903a to 4beb96b Compare July 30, 2024 15:19

@pav-kv pav-kv left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)


pkg/raft/rawnode.go line 224 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: assign these fields in the same order that you do down below.

done

@pav-kv

pav-kv commented Jul 30, 2024

Copy link
Copy Markdown
Collaborator Author

bors r=nvanbenschoten

@craig

craig Bot commented Jul 30, 2024

Copy link
Copy Markdown
Contributor

@craig craig Bot merged commit 8cbc870 into cockroachdb:master Jul 30, 2024
@pav-kv pav-kv deleted the async-log-writes-acc-term branch July 30, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants