short lived leader election reported when starting a raft cluster #168
How is it possible if there are only nodes 3262, 3263, and 3264?
@sakno sorry for the noise, I used the log messages of the 10 nodes test run because they looked the same and forgot about it. Here are the actual entries for the 3 nodes run:
So the problem is with the 3263 node, right? It was elected as a leader, and that fact was not reflected by other nodes.
Yes, that is the issue.
In theory, it's possible (and correct from the Raft perspective). But it should be a very rare situation. I'll take a look.
Maybe related: PR #170
Can you turn on …?
I forgot to add that I set logging to trace and did not get anything extra.
Example code uses …
It turned out a lot of the logging is only enabled for HTTP. Here are the details of one of the runs after enabling it for TCP. Some things that I found interesting:
This is fine. Term 0 is the initial term for every fresh node. The transition to Candidate increases the term.
I missed that it says "Member is downgrading" vs. "Member is downgraded", so it's not a duplicate of that message. The timeout message does show up twice, though.
I found a way to reproduce the issue with an isolated unit test (see commit above). The test fails and reports that some nodes in the cluster see different leaders immediately after start.
I also tracked down some of it on my side. At least in the last capture I made, the election timer triggers 35ms before the node transitions to Candidate. In between those two steps, the node can vote for others.
I forgot to add …
I did not have time to post the details yesterday. It seems the root cause is that, in the process of becoming a candidate, the node can still grant votes, and when it does it still becomes a candidate. One potential solution is, in RaftCluster.MoveToCandidateState, after taking the transitionSync lock, to abort becoming a candidate if FollowerState.Refresh was called after the timeout was set. PreVote already seems to work similarly there. It seems the lock would prevent votes while we are checking this, which should remove the ambiguity between voting and becoming a candidate. Additionally, FollowerState.Refresh and the related caller logic seem to run under the lock too, so this could help avoid similar issues in other code paths that expect the timeout to have been refreshed if the node was still a follower. This is the capture from the node that I used to track it down:
The "timer timed out" log entry is an entry I added to FollowerState.Track when it exits the loop and sets timedOut = true. Here is my interpretation of it:
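The candidacy guard proposed above can be sketched with a minimal model (Python here for illustration; `FollowerModel`, `transition_lock`, `refreshed`, and the method names are hypothetical stand-ins, not the library's actual API):

```python
import threading

class FollowerModel:
    """Toy model of the race: the election timer has fired, but a vote is
    granted before the node finishes transitioning to Candidate."""

    def __init__(self):
        self.transition_lock = threading.Lock()  # stands in for transitionSync
        self.refreshed = False  # set when the node hears from a live peer
        self.state = "Follower"

    def grant_vote(self):
        # Granting a vote means another node is actively campaigning, so
        # treat it like a refresh of the election timeout.
        with self.transition_lock:
            self.refreshed = True

    def on_election_timeout(self):
        # The proposed guard: after taking the lock, abort candidacy if a
        # refresh arrived after the timeout was armed.
        with self.transition_lock:
            if self.refreshed:
                self.refreshed = False
                return False  # stay a follower; no competing election
            self.state = "Candidate"
            return True

node = FollowerModel()
node.grant_vote()                  # vote granted just after the timer fired
print(node.on_election_timeout())  # False
print(node.state)                  # Follower
```

Because both the vote grant and the candidate transition run under the same lock, the check and the state change cannot interleave, which is the ambiguity the comment above describes.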
I'll check this out. However, I can tell that this part of the algorithm has remained unchanged for a long time. Also, I recommend you check …
FYI I pulled the latest develop after my last message and reproduced it again. |
By the way, the issue has been present at least since version 4.4.1, which is the one we have been running, so it makes sense that it relates to code that has not changed over time. We had some issues on our side that made it harder to notice.
Fixed as follows: check whether a refresh has been requested after taking the lock inside of …
Works great, thanks! I ran the example reproduction at least 15 times, and in most cases it properly elects the leader in term 1. The other case was when all nodes became candidates close to each other, rejected each other's votes, and then elected a leader in term 2 (as expected in Raft).
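For context, the term-2 case above is expected Raft behavior: when several candidates start elections at nearly the same time, votes split and a new term is needed, and randomized election timeouts make repeated splits unlikely. A rough illustrative model (not the library's code; a tie in timeouts stands in for a split vote):

```python
def round_elects_leader(timeouts_ms):
    # Simplified: a round succeeds when exactly one node times out first
    # and can gather votes before the others become candidates. Real
    # outcomes also depend on message timing, but a tie captures the idea.
    first = min(timeouts_ms)
    return timeouts_ms.count(first) == 1

print(round_elects_leader([150, 210, 280]))  # True: clear winner in term 1
print(round_elects_leader([150, 150, 280]))  # False: split vote, retry next term
```

Since each node re-randomizes its timeout before the next round, consecutive split votes quickly become improbable, which matches seeing a leader by term 2.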
Perfect, I'll publish a new release today. |
Fixed in 4.12.2 |
The issue can be reproduced with the example project by building it locally and adding a cluster.bat to the output folder with the following lines:
Run:
del -r node* && .\cluster.bat
I can reproduce it with 2-6 attempts in a windows x64 machine. The issue was originally found in a raspberry pi (arm - linux).
The leader prints this in its log:
Other nodes print (and the leader also prints the same messages):
When done with the run, the RaftNode processes need to be killed via Task Manager, since they keep running in the background.
Originally posted by @freddyrios in #167 (reply in thread)