Processing a large incoming AppendEntriesReply in I/O thread can trigger an election timeout on the Receiver #40
Two things to note:
First off, thanks for reporting these issues! I noticed both of these while doing some preliminary testing on AWS, but decided to punt on them until I finished the work on snapshots. Moreover, until I looked at the log below I didn't have a good mental model of how that IllegalStateException could be tripped. That said, I have created #41 and can prioritize work on it; the fix is simple.

The reason a response is sent in https://github.com/allengeorge/libraft/blob/master/libraft-core/src/main/java/io/libraft/algorithm/RaftAlgorithm.java#L1302 is so that peers who are behind can notice the term change and catch up. If you simply ignore the request, you lose the opportunity to force the cluster to make progress.
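For illustration, here is a minimal sketch of that idea: a receiver rejects a stale-term message but stamps the reply with its own, higher term. All names and structure here are hypothetical stand-ins, not libraft's actual API:

```java
// Sketch only: replying with the receiver's higher current term is what
// tells a lagging peer that the cluster has moved on. Everything here is a
// hypothetical stand-in for the code around RaftAlgorithm.java#L1302.
final class StaleTermReplySketch {

    /** Hypothetical transport abstraction. */
    interface Sender {
        void sendAppendEntriesReply(String server, long term, boolean applied);
    }

    private final Sender sender;
    private long currentTerm;

    StaleTermReplySketch(Sender sender, long currentTerm) {
        this.sender = sender;
        this.currentTerm = currentTerm;
    }

    void onAppendEntries(String leader, long messageTerm) {
        if (messageTerm < currentTerm) {
            // Rejecting with the newer term (instead of staying silent)
            // forces the stale sender to notice the term change and catch up.
            sender.sendAppendEntriesReply(leader, currentTerm, false);
            return;
        }
        // ... normal AppendEntries processing ...
    }
}
```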
As to the election timeout issue: there are a lot of optimizations that could be done to speed up operation. Again, I decided to punt on those until after the snapshot work had been completed. Through pure coincidence I was thinking about this earlier and came up with this solution: instead of creating an AppendEntries message on every heartbeat that contains all the missing entries for a peer, one could simply modify https://github.com/allengeorge/libraft/blob/master/libraft-core/src/main/java/io/libraft/algorithm/RaftAlgorithm.java#L1061 to only send a portion of the missing entries at a time.
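One common shape for that optimization is to cap the number of entries per AppendEntries message. A minimal sketch, where the constant, names, and structure are assumptions rather than libraft code:

```java
import java.util.List;

// Sketch of one way to bound the work per heartbeat: send at most a fixed
// batch of the peer's missing entries per AppendEntries message. The
// constant and names are assumptions, not libraft's actual code.
final class BoundedAppendEntriesSketch {

    interface LogEntry {}

    // Hypothetical tuning knob; a real value would be chosen empirically.
    private static final int MAX_ENTRIES_PER_MESSAGE = 64;

    List<LogEntry> entriesToSend(List<LogEntry> missingEntries) {
        int end = Math.min(missingEntries.size(), MAX_ENTRIES_PER_MESSAGE);
        return missingEntries.subList(0, end);
    }
}
```

The trade-off is more round trips to catch a lagging peer up, in exchange for bounding how long the receiver's I/O thread is occupied by any single message.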
I'm not sure of the best way to put the trace logging into a GitHub issue, so I'll just leave that messiness until the end.
The situation I have encountered is that, with a small enough election timeout (I am using 300ms), when a node tries to re-enter the cluster, the first AppendEntries message it receives can contain enough entries that processing it blocks the I/O thread long enough to trigger an election timeout. The node then responds to all of the AppendEntries messages that backed up behind it (due to heartbeats) with AppendEntriesReplies carrying the new term, since it started an election. I was able to "fix" this by adding a call to scheduleElectionTimeout() at the end of each iteration of the for (LogEntry entry : entries) loop in onAppendEntries, as sketched below. It's not a particularly elegant solution, and changing config params would also avoid the problem, but I thought it was worth reporting.
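A minimal sketch of that workaround; apart from the call to scheduleElectionTimeout(), the method shape and helpers are hypothetical stand-ins for libraft's RaftAlgorithm:

```java
import java.util.List;

// Minimal sketch of the workaround described above; everything except the
// call to scheduleElectionTimeout() is a hypothetical stand-in for libraft.
final class AppendEntriesWorkaroundSketch {

    interface LogEntry {}

    void onAppendEntries(long term, List<LogEntry> entries) {
        // ... term and log-consistency checks elided ...
        for (LogEntry entry : entries) {
            applyToLog(entry);
            // Workaround: re-arm the election timeout after every entry so
            // that a long append on the I/O thread cannot outlast it.
            scheduleElectionTimeout();
        }
        // ... send an AppendEntriesReply ...
    }

    private void applyToLog(LogEntry entry) {
        // hypothetical: persist the entry to the local log
    }

    private void scheduleElectionTimeout() {
        // stand-in for RaftAlgorithm's scheduleElectionTimeout()
    }
}
```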
I also think a follower could simply ignore AppendEntriesReply RPCs instead of failing the precondition that it is the leader, along the lines of the sketch below. However, I'm sure you have spent more time with the algorithm than I have and may be able to think of a reason why that would be a bad idea.
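For comparison, a sketch of what that guard might look like; the Role enum and method signature are assumptions, not libraft's actual code:

```java
import java.util.logging.Logger;

// Sketch of the suggested alternative: a node that is not the leader drops
// a late AppendEntriesReply instead of tripping an IllegalStateException.
// The Role enum and method shape are assumptions, not libraft's actual code.
final class AppendEntriesReplyGuardSketch {

    private static final Logger LOGGER =
            Logger.getLogger(AppendEntriesReplyGuardSketch.class.getName());

    enum Role { FOLLOWER, CANDIDATE, LEADER }

    private Role role = Role.FOLLOWER;

    void onAppendEntriesReply(String server, long term, boolean applied) {
        if (role != Role.LEADER) {
            // A reply can legitimately arrive after this node has stepped
            // down or started an election; log it and move on.
            LOGGER.fine("ignoring AppendEntriesReply from " + server
                    + " for term " + term + " while in role " + role);
            return;
        }
        // ... normal leader-side processing of the reply ...
    }
}
```

As noted in the reply above, though, silently dropping messages can cost the cluster an opportunity to propagate a term change, so this may not be safe in all cases.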
Here is some evidence of the issue:
- The exception on the current leader
- The follower timing out while processing log entries