tla-plus: add spec for raft leader election #123189

nvanbenschoten · 2024-04-29T05:22:06Z

This commit adds a new TLA+ specification (expressed in PlusCal) called RaftLeader. RaftLeader is a specification for the Raft leader election protocol, as described in section 3.4 of the Raft thesis¹. The spec models multiple terms, node restarts, and the Pre-Vote election phase extension.

The central safety property of leader election is exclusivity. In Raft, this is defined as the Election Safety Property: "at most one leader can be elected in a given term". This property is defined as an invariant of the spec.

My goal is to use this spec to verify the correctness of leader-leases. We can start with this simpler spec and then extend it to include notions of store liveness and of leases associated with leaders. From there, we can define the lease disjointness invariant and prove that it holds.

Epic: None
Release note: None

https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf ↩

This commit updates the ParallelCommit spec with a .cfg config file which mirrors its Toolbox .launch file. This allows the spec to be checked with the command-line version of the TLC model checker. As a result, this allows the spec to play well with the VSCode TLA+ extension, which is more popular and easier to use than the original eclipse-based TLA+ Toolbox. Epic: None Relase note: None

cockroach-teamcity · 2024-04-29T05:22:16Z

This change is

nvanbenschoten · 2024-04-29T18:52:30Z

Experience report

Background

We started working with TLA+ and formal model checking at Cockroach Labs back in 2019. @tbg and @petermattis kicked this off with some background conversations and then this email thread. We had Hillel Wayne come and host a 1-week intensive workshop on TLA+ and PlusCal (an imperative language which transpiles to TLA+), much of which was derived from his excellent Practical TLA+: Planning Driven Development book. That was critical and near-necessary for adoption of TLA+ at CRL, because the available documentation on the topic online is severely lacking (unless you enjoy reading 100-page white papers full of mathematical notation).

By the end of the workshop, we had a couple of working TLA+ specs, an understanding of how formal methods might contribute to the development of CockroachDB in the future, and desires for it to play a role.

We haven't done much with TLA+ since 2021, but that desire is still there. Since we're starting to work with the Raft algorithm directly again and have desires to extend it to support leader leases (design in progress and not ready for broader consumption quite yet), it made sense to start thinking about formal verification again.

Experience

To get reacquainted with TLA+ and PlusCal, I started by skimming the Practical TLA+ book again. It's really a great resource, especially compared to everything else written on the topic, so I would suggest that those who want to work with TLA+ familiarize themselves with the book (so that you don't need to read Specifying Systems).

The other resources I found useful were:

https://learntla.com/index.html
https://lamport.azurewebsites.net/pubs/pluscal.pdf
re-reading the parallel commits spec
ChatGPT for simple questions, though it was less helpful than I had hoped in writing large chunks of the specification

In the past, we had worked with the Eclipse-based TLA+ Toolkit. I decided to avoid that this time, as it's fairly clunky and hides configuration behind a GUI. Instead, I opted for the VSCode TLA+ extension, which is slim but works well and is built directly on top of the official TLA+ command line tools. I would recommend this to others who want to work with TLA+/PlusCal.

Results

As detailed above, my goal was to write a spec to verify the correctness (specifically the lease disjointness invariant safety property) of our proposed leader-lease extension to Raft. To do so, I started by writing this simpler spec of the existing Raft leader election algorithm.

The process took around 10 hours. That is to be expected, as there is a large barrier to entry to writing these specifications which authors should be prepared for. Don't get discouraged though. The initial investment may be more than one might expect, but the ease of extension once a spec is written is equally as surprising in the other direction.

The other thing to be aware of is that the process of writing the spec is half the battle. While TLA+ does support exhaustive model checking with TLC and it works just as well as promised, I have found that the process of writing the spec is the most valuable part of the process. Doing so forces one to sit and think about an algorithm, often revealing bugs before the model checker even gets to run. It's a really valuable learning experience which doesn't have many direct equivalents.

With that all said, here is what I learned while writing this spec (again, without starting the extension to add leader-leases):

PreVote bug

I believe that there is a bug in the existing PreVote algorithm in etcd/raft (and our new fork) that can allow a pre-vote candidate to win an pre-vote election that it otherwise shouldn't. Specifically, pre-vote elections are called with the term following the current term, which is carried by MsgPreVote messages. This term is also carried in non-rejected MsgPreVoteResp messages. Pre-vote candidates know to handle these non-rejected MsgPreVoteResp messages with a future term. However, they have no special handling of non-rejected MsgPreVoteResp messages with their current term. These messages correspond to a pre-vote election held at the previous term, but it looks like they can be used to win a pre-election at the current term.

For example, I think the following order of operations is allowed:

1. n3 is the leader at term 10
2. n1 calls a pre-vote election at term 10, sends MsgPreVote{Term: 11} to n2 and n3
3. n2 responds with a successful MsgPreVoteResp{Term: 11, Reject: false}, but the message is delayed
4. n1 never wins the pre-election
5. eventually the term advances to term 11 and n2 takes over as leader
6. n1 calls a pre-vote election at term 11, sends MsgPreVote{Term: 12} to n2 and n3
7. n2 and n3 both reject these, however, before they can respond...
8. the original message from n3 is delivered, n1 thinks it won the pre-vote election
9. n1 calls a disruptive election for term 12

The consequences for this bug are not overly severe, because a pre-vote election is a best-effort attempt to prevent a disruptive leader election. Bugs in pre-vote do not compromise safety (i.e. leader exclusivity). Still, we should fix this.

Leader-lease overlapping quorums

In the leader-lease design, we rely on overlapping quorums between followers that support a leader and followers that are calling an election to ensure that if a leader has support from a quorum of followers, it will not be replaced until after that support is withdrawn. Up until now, I had been thinking that the overlapping quorums to consider were the pre-vote election and the normal election. We would then build safety off of this pre-vote election never succeeding if a leader still had support from the followers that had elected it in the previous term.

Writing this spec revealed a flaw in this approach. Immediately a leader is elected, its followers who elected it do not know that it won the election. As a result, we can't have them refuse to vote for anyone else at a future term or we would lose liveness. Followers should only start "protecting" the leader once they know that it has won the election. This reveals the flaw in the previous assumption that the pre-vote and vote quorums are the ones we care about. Instead, we care about the vote quorum and some post-election "heartbeat" quorum. A leader needs to reach out to each follower and confirm that it knows about the successful election and has not voted in a new election before it can be sure it has that follower's support.

This is interesting for a few reasons:

it more closely aligns with how the etcd/raft's implementation of best-effort leases already work, where they enforce overlapping quorums between non-pre-vote leader election and between heartbeats that inform followers of r.lead.
it means that pre-vote is not necessary for the safety of leader-leases, though it is still a good idea to use for other reasons.
it clarifies the interaction needed for a newly elected leader to gain support for its lease. It also clarifies the relationship between this initial broadcast and the handshake to re-gain support from a follower that it had lost support for. The design had alluded to a MsgFortify extension, where a leader could re-gain support from a follower at new store-liveness epoch. It turns out that this is exactly what we need at the start of a leader term to initially gain support from a quorum as well. This MsgFortify will be broadcast to all followers when a new leader is anointed. This will act like a one-time heartbeat, informing the followers of the successful election and informing the leader of the followers who are aware of the leadership and will now protect it.
a consequence of 3 is that we no longer need to associate the same store-liveness epoch with each follower. Instead, all we need is for each follower to continue to support the store-liveness epoch that it had fortified leadership under. This is a nice bit of flexibility and keeps leases a little less coupled to the initial leader election.

Lead and LeadEpoch require persistence

The initial leader-lease design doc mentioned that a new VoteEpoch needed to be persisted. Because of what we described above, this is no longer correct. A leader does not need to campaign with an associated store-liveness epoch. Instead, it needs to fortify itself with an associated store-liveness epoch after winning the election. Before acknowledging one of these messages, a follower must persist in its HardState new Lead and LeadEpoch fields. This ensures that a loss of volatile state on the follower does not forget that the follower was supporting a leader when it comes back up.

docs/tla-plus/LeaderLeases/RaftLeader.tla

sumeerbhola

mostly questions to make up for my gaps in understanding of TLA+ and Raft.

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, @nvanbenschoten, and @pav-kv)

docs/tla-plus/LeaderLeases/RaftLeader.tla line 15 at r2 (raw file):

(* RaftLeader is a specification for the Raft leader election protocol, as   *)
(* described in section 3.4 of the Raft thesis[^1]. The spec models multiple *)
(* terms, node restarts, and the Pre-Vote election phase extension.          *)

was this needed since https://github.com/ongardie/raft.tla/blob/master/raft.tla omits pre-vote?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 32 at r2 (raw file):

    term      |-> 0,
    vote      |-> 0,
    lead      |-> 0,

basic question: is this lead field in the thesis, and what does it correspond to there? The persistent state is only currentTerm, votedFor (which we keep in HardState) and the log, and I don't see it in the volatile state either. I am guessing this is to define the invariant below.

From a modeling perspective, it isn't clear yet which of these fields are persistent. I'm going to assume this will get clarified later. Edit: I see the restart() macro now.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 38 at r2 (raw file):

  ]];
  \* In-flight messages, by destination.
  network = [i \in Nodes |-> <<>>];

basic modeling clarification: this is a sequence, since it represents the order of delivery to this node, yes?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 56 at r2 (raw file):

  RaftTerms    == 0..Terms
  RaftVotes    == Nodes \union {0}
  RaftLead     == Nodes \union {0}

presumably a 0 is no leader?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 66 at r2 (raw file):

  RaftMsgInvariant ==
    /\ \A i \in Nodes : \A j \in 1..(Len(network[i])-1) :

why the -1? Since sequences are 1 indexed, I was expecting the last index is Len(network[i]). I suppose there is something special about the last message that will get clarified later.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 73 at r2 (raw file):

  TypeInvariants ==
    /\ RaftStateInvariant
    /\ RaftMsgInvariant

PlusCal question: is this needed, given all the invariants need to hold and we've already defined both individually?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 83 at r2 (raw file):

    raft[i].term = raft[j].term /\ raft[i].lead /= 0 /\ raft[j].lead /= 0 =>
      raft[i].lead = raft[j].lead
end define;

so no liveness properties yet?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 91 at r2 (raw file):

macro become_follower(term, lead)
begin
  if raft[self].term /= term then

should this be raft[self].term < term. Or presumably this is an accurate model of what the spec says since the caller has already verified that raft[self].term <= term.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 106 at r2 (raw file):

  raft[self].state     := StatePreCandidate ||
  raft[self].lead      := 0                 ||
  raft[self].voteResps := {self};

so does voteResps model both PreVoteResp and VoteResp? Is there a way/need to distinguish between the two?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 224 at r2 (raw file):

\****************************\

fair process node \in Nodes

PlusCal question: would we need strong fairness if we wanted to add any liveness properties?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 239 at r2 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Do we need to allow out-of-order messages here? E.g. by making network[self] be a set from which we get a random message.

I had a similar question. Wondering whether it was done this way since there is non-determinism across processes and within processes that are adding to the sequence, to make the sequence sufficiently non-deterministic. It seems it would not allow for a stray late message to show up, like in the PreVote bug example (though in that case it is possible n1 never received another message between steps 3 and 8, since n2 did not need n1's vote to become leader, so perhaps it will get tickled despite this ordering).

docs/tla-plus/LeaderLeases/RaftLeader.tla line 242 at r2 (raw file):

      if msg.term > raft[self].term then
        if msg.type \in {MsgVote, MsgPreVote} /\ raft[self].lead /= 0 then

if raft[self].lead = 0 then it seems we would want to do the same become_follower(msg.term, 0), but we don't do this here and instead do it in the later else. The structure looks a bit odd. Is this because it is following the spec listed at the end of the thesis. Section 3.4 is a lot of words, so there could be many translations.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 248 at r2 (raw file):

          \* either \* NON-DETERMINISM! */
          \*   \* Reject vote or pre-vote if recently heard from leader.

It can reject a (non-pre) vote for a higher term if it has recently heard from the leader?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 262 at r2 (raw file):

          if msg.type = MsgApp then
            become_follower(msg.term, msg.from);
          else

Would MsgAppResp fall through to here? Presumably that can't happen with a higher term?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 273 at r2 (raw file):

        end if;
      else
        \* TODO(nvanbenschoten): is not having this check a bug?

couldn't it get a stray old pre-vote response at any time?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 281 at r2 (raw file):

      RecvMessage:
        if msg.type \in {MsgVote, MsgPreVote} then
          \* We can vote if this is a repeat of a vote we've already cast...

Looks like for MsgVote and MsgPreVote we can only be here if msg.term >= raft[self].term.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 306 at r2 (raw file):

            \*   end if;
            \* or
            \*   \* The candidate's log is not up-to-date.

so this will require modeling the log, which is not done yet?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 325 at r2 (raw file):

        elsif msg.type \in {MsgVoteResp, MsgPreVoteResp} then
          if ~msg.reject /\ ((raft[self].state = StateCandidate /\ msg.type = MsgVoteResp)
                          \/ (raft[self].state = StatePreCandidate /\ msg.type = MsgPreVoteResp)) then

so it stays in states StateCandidate or StatePreCandidate on a rejection? Which means CanCampaign will not be true? But if someone else (pre)campaigns we will execute become_follower, and can potentially campaign again.
just curious if this is intentional?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 348 at r2 (raw file):

        elsif msg.type = MsgApp then
          if msg.term < raft[self].term then
            \* TODO: will this violate leases?

can you elaborate on this TODO -- I don't see the relationship with leases?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 357 at r2 (raw file):

          if ~msg.reject /\ raft[self].state = StateLeader then
            raft[self].appResps := raft[self].appResps \union {msg.from} ||
            raft[self].appended := (raft[self].appResps \union {msg.from}) \in RaftQuorum;

does appended really mean committed?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 359 at r2 (raw file):

            raft[self].appended := (raft[self].appResps \union {msg.from}) \in RaftQuorum;
            if raft[self].appended then
              Appended:

what is the purpose of this Appended label?

docs/tla-plus/LeaderLeases/RaftLeader.tla line 372 at r2 (raw file):

end algorithm; *)
\* BEGIN TRANSLATION (chksum(pcal) = "6779ecf7" /\ chksum(tla) = "6dcdeccf")
VARIABLES raft, network, pc

Is the following the TLA+ translation of PlusCal, so can ignore?

sumeerbhola

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, @nvanbenschoten, and @pav-kv)

docs/tla-plus/LeaderLeases/RaftLeader.tla line 248 at r2 (raw file):

Previously, sumeerbhola wrote…

It can reject a (non-pre) vote for a higher term if it has recently heard from the leader?

Never mind, I see the following in raft.go

				// If a server receives a RequestVote request within the minimum election timeout
				// of hearing from a current leader, it does not update its term or grant its vote

This commit adds a new TLA+ specification (expressed in PlusCal) called RaftLeader. RaftLeader is a specification for the Raft leader election protocol, as described in section 3.4 of the Raft thesis[^1]. The spec models multiple terms, node restarts, and the Pre-Vote election phase extension. The central safety property of leader election is exclusivity. In Raft, this is defined as the Election Safety Property: "at most one leader can be elected in a given term". This property is defined as an invariant of the spec. My goal is to use this spec to verify the correctness of leader-leases. We can start with this simpler spec and then extend it to include notions of store liveness and of leases associated with leaders. From there, we can define the lease disjointness invariant and prove that it holds. [^1]: https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf Epic: None Release note: None

nvanbenschoten

Dismissed @sumeerbhola from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, @pav-kv, and @sumeerbhola)

docs/tla-plus/LeaderLeases/RaftLeader.tla line 15 at r2 (raw file):

Previously, sumeerbhola wrote…

was this needed since https://github.com/ongardie/raft.tla/blob/master/raft.tla omits pre-vote?

It was useful (not needed) for all of the following reasons:

raft.tla is all of raft, so it's a larger spec than necessary.
raft.tla does not have any invariants. It's a spec, but there's nothing to check. https://github.com/Vanlightly/raft-tlaplus/blob/main/specifications/standard-raft/Raft.tla improves on this.
raft.tla omits pre-vote
I want to extend this spec to include leader-leases, and would prefer to do so in PlusCal. That's both because PlusCal is easier to write, it's easier to read, and it's easier to make it more closely mirror our specific etcd/raft implementation.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 32 at r2 (raw file):

is this lead field in the thesis, and what does it correspond to there?

You're right, it's not explicitly in the raft thesis/paper. It is implied though, just like a state (Leader, Follower, Candidate) field is implied. And both of these are in our implementation.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 38 at r2 (raw file):

Previously, sumeerbhola wrote…

basic modeling clarification: this is a sequence, since it represents the order of delivery to this node, yes?

Yes, see Pavel's question above. We can do out-of-order delivery, and we do in #123523, but it makes model checking exponentially slower. It's an easy switch once we're ready for it.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 56 at r2 (raw file):

Previously, sumeerbhola wrote…

presumably a 0 is no leader?

Yes, 0 is no leader. This is used to define vote and lead as optional.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 66 at r2 (raw file):

Previously, sumeerbhola wrote…

why the -1? Since sequences are 1 indexed, I was expecting the last index is Len(network[i]). I suppose there is something special about the last message that will get clarified later.

Just a bug. Fixed while adding out-of-order delivery.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 73 at r2 (raw file):

Previously, sumeerbhola wrote…

PlusCal question: is this needed, given all the invariants need to hold and we've already defined both individually?

In PlusCal / TLA+, you need to then tell the spec which invariants to check at each step. So this packages these two invariants under a single operator which we then plug into the .cfg file.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 83 at r2 (raw file):

Previously, sumeerbhola wrote…

so no liveness properties yet?

Not yet. I'm not sure that we will add any. I've found that they're harder to express in TLA+, often requiring a spec to be stripped down further first. Raft also doesn't have a guarantee of liveness (leader elections can continue to hang) and relies on randomness in the election timeouts to get close.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 91 at r2 (raw file):

Previously, sumeerbhola wrote…

should this be raft[self].term < term. Or presumably this is an accurate model of what the spec says since the caller has already verified that raft[self].term <= term.

This is just trying to mirror the etcd/raft code 1:1.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 106 at r2 (raw file):

Previously, sumeerbhola wrote…

so does voteResps model both PreVoteResp and VoteResp? Is there a way/need to distinguish between the two?

That's correct. We don't need to distinguish between the two, we can tell the difference based on state. This is also how etcd/raft works.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 224 at r2 (raw file):

Previously, sumeerbhola wrote…

PlusCal question: would we need strong fairness if we wanted to add any liveness properties?

I think even that would not be enough to guarantee that a leader election eventually succeeds. But I don't understand this part of TLA+ to say for sure.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 242 at r2 (raw file):

Previously, sumeerbhola wrote…

if raft[self].lead = 0 then it seems we would want to do the same become_follower(msg.term, 0), but we don't do this here and instead do it in the later else. The structure looks a bit odd. Is this because it is following the spec listed at the end of the thesis. Section 3.4 is a lot of words, so there could be many translations.

This structure is a bit odd. It was needed to avoid adding another label when the non-determinism below is enabled. It should be equivalent to the etcd/raft logic.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 262 at r2 (raw file):

Previously, sumeerbhola wrote…

Would MsgAppResp fall through to here? Presumably that can't happen with a higher term?

Yes, it would fall through. And it can actually happen, see

cockroach/pkg/raft/raft.go

Line 1055 in b4ad399

// We have received messages from a leader at a lower term. It is possible

.

This is a case we will need to handle with leader leases using a graceful stepdown.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 273 at r2 (raw file):

Previously, sumeerbhola wrote…

couldn't it get a stray old pre-vote response at any time?

This bug has been captured in #123297. The key to this bug is that we send PreVote requests at Term+1, so stray old pre-votes may have m.Term=r.Term (and hit this branch) and not m.Term < r.Term and hit the discard branch above.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 281 at r2 (raw file):

Previously, sumeerbhola wrote…

Looks like for MsgVote and MsgPreVote we can only be here if msg.term >= raft[self].term.

Right, this is here to allow a replay of the same vote for the same term. It's also a direct translation from etcd/raft and probably more than we strictly need for this spec.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 306 at r2 (raw file):

Previously, sumeerbhola wrote…

so this will require modeling the log, which is not done yet?

Either modeling it or treating it as a form of non-determinism. If we aren't adding log-related invariants to this spec (to do so, we'd need a larger spec, which may make it unfeasible to model check), then non-determinism allows us to explore the larger state space more easily.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 325 at r2 (raw file):

Previously, sumeerbhola wrote…

so it stays in states StateCandidate or StatePreCandidate on a rejection? Which means CanCampaign will not be true? But if someone else (pre)campaigns we will execute become_follower, and can potentially campaign again.
just curious if this is intentional?

Correct, it won't re-campaign if it loses the election. We could expand the spec to detect when an election is definitely lost and re-campaign.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 348 at r2 (raw file):

Previously, sumeerbhola wrote…

can you elaborate on this TODO -- I don't see the relationship with leases?

This was related to the discussion above about this code. Currently, if a candidate moves to a later term and then eventually hears from the leader, it will inform the leader about this and encourage it to step down. The rationale is that this will allow the quorum to move to the later term and "pick up" this replica who was otherwise stranded at a large term. This is mostly avoided by pre-vote, but is still possible with it.

We will need to handle this with leases, but we'll need to do so without invalidating any future-time reads that the leaseholder has already served.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 357 at r2 (raw file):

Previously, sumeerbhola wrote…

does appended really mean committed?

Yes, renamed.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 359 at r2 (raw file):

Previously, sumeerbhola wrote…

what is the purpose of this Appended label?

Just for debugging purposes, so I can see it in the state space coverage explorer.

docs/tla-plus/LeaderLeases/RaftLeader.tla line 372 at r2 (raw file):

Previously, sumeerbhola wrote…

Is the following the TLA+ translation of PlusCal, so can ignore?

Yes, it's the translation. Can be ignored.

andrewbaptist

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, @nvanbenschoten, @pav-kv, and @sumeerbhola)

docs/tla-plus/LeaderLeases/RaftLeader.tla line 241 at r4 (raw file):

  toSend  = <<>>;
  msg     = [type |-> 0];
begin TickLoop:

Wouldn't this be better represented as nodes "responding" to messages they receive rather than sitting in a TickLoop? It seems like from a modeling perspective this doesn't add any additional value and it makes it unclear whether the TickLoop is really necessary. There are only two "triggers" here. Either CanCampaign or ReceiveMsg. I'm not very familiar with TLA+ so maybe there is some value in having this done this way. This is related to my comment in the doc as well.

Feel free to ignore this if this is the best way to represent this.

123827: tla-plus: add .cfg file for ParallelCommits spec r=nvanbenschoten a=nvanbenschoten NOTE: this is just extracting the first commit from #123189 out into a separate PR so that we can get it merged. This commit updates the ParallelCommit spec with a .cfg config file which mirrors its Toolbox .launch file. This allows the spec to be checked with the command-line version of the TLC model checker. As a result, this allows the spec to play well with the VSCode TLA+ extension, which is more popular and easier to use than the original eclipse-based TLA+ Toolbox. Epic: None Relase note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

nvanbenschoten requested review from arulajmani and miraradeva April 29, 2024 05:22

nvanbenschoten mentioned this pull request Apr 30, 2024

raft: pre-candidate can spuriously win election from vote in prior term #123297

Open

pav-kv reviewed May 6, 2024

View reviewed changes

docs/tla-plus/LeaderLeases/RaftLeader.tla Outdated Show resolved Hide resolved

docs/tla-plus/LeaderLeases/RaftLeader.tla Outdated Show resolved Hide resolved

sumeerbhola requested a review from pav-kv May 6, 2024 16:13

sumeerbhola requested changes May 6, 2024

View reviewed changes

sumeerbhola reviewed May 6, 2024

View reviewed changes

nvanbenschoten force-pushed the nvanbenschoten/leaderLeaseTLA branch from 2c11ba1 to dd79a9a Compare May 6, 2024 18:46

nvanbenschoten commented May 6, 2024

View reviewed changes

nvanbenschoten mentioned this pull request May 8, 2024

tla-plus: add .cfg file for ParallelCommits spec #123827

Merged

andrewbaptist requested a review from sumeerbhola May 8, 2024 18:53

andrewbaptist reviewed May 8, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tla-plus: add spec for raft leader election #123189

tla-plus: add spec for raft leader election #123189

nvanbenschoten commented Apr 29, 2024

cockroach-teamcity commented Apr 29, 2024

nvanbenschoten commented Apr 29, 2024

sumeerbhola left a comment

sumeerbhola left a comment

nvanbenschoten left a comment

andrewbaptist left a comment

tla-plus: add spec for raft leader election #123189

Are you sure you want to change the base?

tla-plus: add spec for raft leader election #123189

Conversation

nvanbenschoten commented Apr 29, 2024

Footnotes

cockroach-teamcity commented Apr 29, 2024

nvanbenschoten commented Apr 29, 2024

Experience report

Background

Experience

Results

PreVote bug

Leader-lease overlapping quorums

Lead and LeadEpoch require persistence

sumeerbhola left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andrewbaptist left a comment

Choose a reason for hiding this comment