This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

recvAppendEntries: Assertion r->state == RAFT_FOLLOWER || r->state == RAFT_CANDIDATE failed #386

Closed
cole-miller opened this issue Mar 14, 2023 · 9 comments · Fixed by #429 or #467
Assignees
Labels
Bug Confirmed to be a bug

Comments

@cole-miller
Contributor

Found by Jepsen, here's the job:

https://github.com/canonical/jepsen.dqlite/actions/runs/4418832275/jobs/7746473216

Hat tip to @nurturenature, whose recent improvements to our Jepsen harness helped uncover this.

@cole-miller changed the title to "recvAppendEntries: Assertion r->state == RAFT_FOLLOWER || r->state == RAFT_CANDIDATE failed" Mar 14, 2023
@cole-miller
Contributor Author

Here's the failing assertion:

/* If we get here it means that the term in the request matches our current
* term or it was higher and we have possibly stepped down, because we
* discovered the current leader:
*
* From Figure 3.1:
*
* Rules for Servers: Candidates: if AppendEntries RPC is received from
* new leader: convert to follower.
*
* From Section 3.4:
*
* While waiting for votes, a candidate may receive an AppendEntries RPC
* from another server claiming to be leader. If the leader's term
* (included in its RPC) is at least as large as the candidate's current
* term, then the candidate recognizes the leader as legitimate and
* returns to follower state. If the term in the RPC is smaller than the
* candidate's current term, then the candidate rejects the RPC and
* continues in candidate state.
*
* From state diagram in Figure 3.3:
*
* [candidate]: discovers current leader -> [follower]
*
* Note that it should not be possible for us to be in leader state, because
* the leader that is sending us the request should have either a lower term
* (and in that case we reject the request above), or a higher term (and in
* that case we step down). It can't have the same term because at most one
* leader can be elected at any given term.
*/
assert(r->state == RAFT_FOLLOWER || r->state == RAFT_CANDIDATE);

If the invariants described in that comment are being violated, that seems pretty serious.

@cole-miller
Contributor Author

This continues to happen, here's a job where you can also see a consistency violation (stale reads): https://github.com/canonical/jepsen.dqlite/actions/runs/5187761474/jobs/9350528154

@freeekanayaka
Contributor

freeekanayaka commented Jun 6, 2023

This continues to happen, here's a job where you can also see a consistency violation (stale reads): https://github.com/canonical/jepsen.dqlite/actions/runs/5187761474/jobs/9350528154

Looking at the logs, it looks like term 11 had two leaders. The first term-11 leader was n2, elected at:

LIBRAFT   1686049635974788419 src/recv_request_vote_result.c:132 votes quorum reached -> convert to leader

The second term-11 leader was n3, elected about 8 seconds later at:

LIBRAFT   1686049643929871543 src/recv_request_vote_result.c:132 votes quorum reached -> convert to leader

Perhaps n3 was partitioned before n2 was elected and was stuck at term 10; at some point it became a candidate, bumped to term 11, and won the election. I will investigate further to see what happened exactly.

@freeekanayaka
Contributor

It seems that the data.tar.bz2 file in the GHA artifact saved for the Jepsen run is empty, for all nodes. Is that expected? That is, are we deliberately excluding the node's data dir from the artifact zip because of space concerns, or is this a bug in our Jepsen code?

@cole-miller
Contributor Author

@freeekanayaka I think it's a bug.

@MathieuBordere
Contributor

Happened again here
jepsen-data-append-partition-failure.zip

Not on a master run, but I don't think that matters ...

@cole-miller
Contributor Author

cole-miller commented Jul 31, 2023

It seems that what happens in the logs Mathieu posted is that n3 goes offline at some point, comes back, and gets its log up to speed, but somehow keeps operating with a very old configuration (from several ASSIGNs ago). So when the election for term 4 happens, n2 thinks the voters are {n1, n2, n4}, while n3 thinks the voters are {n1, n3, n5}. Each receives one other vote on top of its own and becomes leader, and some time later n2 gets an unexpected request from n3. I will have to dig further to figure out why n3 doesn't switch to a newer configuration as it updates its log.

@cole-miller
Contributor Author

cole-miller commented Jul 31, 2023

I think the root of the problem might be the code in appendFollowerCb (replication.c) that can update the last_stored index for a configuration log entry without applying that configuration locally, if the node has become a candidate since appendFollowerCb was installed. This makes it possible for n3 to appear up to date to other nodes based on the data in its RequestVote messages, while still operating with an old configuration.

That raises the question of what a candidate node should do if it finds itself applying an (uncommitted) configuration change during its candidacy. It seems like it would be best for it to simply end its candidacy in that case; otherwise counting votes turns into a mess. Alternatively, followers shouldn't convert to candidate until all their in-flight append requests have completed successfully, so that configuration changes get a chance to be applied.

@hifron

hifron commented Nov 24, 2023

I think this is a candidate for review by the corporate database vendors that have turned the replication problem, with all its theory, into real products on the market...

There is also Postgres, but I think they are a century behind...
