sentinels votes for themselves #2243
Comments
I ran into the same problem. When I killed all Redis processes (`pkill -9 redis`) on machine A, machine B's Sentinels logged the following.

Log of sentinel 26380:

[4761] 26 Dec 17:41:18.927 # +sdown sentinel 192.168.163.91:26380 192.168.163.91 26380 @ master1 192.168.163.91 6379
[4761] 26 Dec 17:41:19.046 # +sdown master master1 192.168.163.91 6379
[4761] 26 Dec 17:41:19.120 # +new-epoch 50
[4761] 26 Dec 17:41:19.122 # +vote-for-leader 6c46bafb874d7b183f02ca5c084e7637a1825c9e 50
[4761] 26 Dec 17:41:20.165 # +odown master master1 192.168.163.91 6379 #quorum 2/2
[4761] 26 Dec 17:41:20.165 # Next failover delay: I will not start a failover before Fri Dec 26 17:47:19 2014

Log of sentinel 26379:

[4760] 26 Dec 17:41:18.984 # +sdown sentinel 192.168.163.91:26380 192.168.163.91 26380 @ master1 192.168.163.91 6379
[4760] 26 Dec 17:41:18.985 # +sdown sentinel 192.168.163.91:26379 192.168.163.91 26379 @ master1 192.168.163.91 6379
[4760] 26 Dec 17:41:19.048 # +sdown master master1 192.168.163.91 6379
[4760] 26 Dec 17:41:19.111 # +odown master master1 192.168.163.91 6379 #quorum 2/2
[4760] 26 Dec 17:41:19.111 # +new-epoch 50
[4760] 26 Dec 17:41:19.111 # +try-failover master master1 192.168.163.91 6379
[4760] 26 Dec 17:41:19.117 # +vote-for-leader 6c46bafb874d7b183f02ca5c084e7637a1825c9e 50
[4760] 26 Dec 17:41:19.123 # 192.168.163.90:26380 voted for 6c46bafb874d7b183f02ca5c084e7637a1825c9e 50
[4760] 26 Dec 17:41:29.601 # -failover-abort-not-elected master master1 192.168.163.91 6379
[4760] 26 Dec 17:41:29.653 # Next failover delay: I will not start a failover before Fri Dec 26 17:47:19 2014
At epoch 289, every Sentinel votes for the same leader:

[17193] 24 Dec 20:16:51.443 # +vote-for-leader 32f2d6d6bb62b7404263df90ca7ef23a64827276 289

At epoch 290, two Sentinels vote. But at epoch 291, every Sentinel votes only for itself (!):

[17193] 24 Dec 20:20:35.315 # +vote-for-leader d2e257f5af8becf766f350139af3526efa5d2741 291
[17193] 24 Dec 20:20:35.316 # 10.120.45.88:26379 voted for 32f2d6d6bb62b7404263df90ca7ef23a64827276 291
[17193] 24 Dec 20:20:35.316 # 10.120.45.89:26379 voted for ccdbea66faa41eabc7482e07b47bf3c3c59d9ebd 291

The relevant code is in sentinelVoteLeader():

char *sentinelVoteLeader(sentinelRedisInstance *master, uint64_t req_epoch, char *req_runid, uint64_t *leader_epoch) {
    .
    .
    .
    /* If we did not vote for ourselves, set the master failover start
     * time to now, in order to force a delay before we can start a
     * failover for the same master. */
    if (strcasecmp(master->leader,server.runid))
        master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;

I'm not sure why at epoch 291 each Sentinel has given up believing in all the others.
Hello, this is usually the result of slow interaction between Sentinels, or of desynchronization that is not effective enough. A Sentinel votes for itself unless it has already received a request for a vote from another Sentinel. Usually one is faster than the others, since they are desynchronized; but if they are slow to communicate, the communication time becomes larger than the desync time, and a split brain condition happens, followed by a new desynchronization and a new vote attempt (so the failover will eventually happen anyway).

It is probably possible to improve the desynchronization (a long-time TODO item of mine...), but it would be more interesting here to see why the Sentinels are sometimes slow to communicate, assuming that is the case. Otherwise we need to understand whether the desynchronization is not effective enough... I'll investigate this issue next week and report back. Btw, in your environment, is this simple to reproduce by running the Sentinel unit tests? Thanks.
Hello! How is the work on this going? I'm currently having the same issue: the Sentinels keep voting for themselves until, at some point, they both vote for the same one.
@wangjn01 Were you able to find a solution for this? I am facing the same issue.
I have 2 Redis instances and 3 Sentinels. Sometimes when the master goes down, the Sentinels vote for themselves, so no leader is elected and the failover aborts.
sentinel 1 log
sentinel 2 log
sentinel 3 log
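For reference, a 3-Sentinel, quorum-2 deployment like the one described would typically be configured with something along these lines in each sentinel.conf (the master name, address, and timeouts here are illustrative, borrowed from the defaults and the values seen in this thread):

```
sentinel monitor master1 192.168.163.91 6379 2
sentinel down-after-milliseconds master1 30000
sentinel failover-timeout master1 180000
sentinel parallel-syncs master1 1
```

With quorum 2, two Sentinels agreeing on +odown is enough to trigger a failover attempt, but electing a leader still requires a majority of all Sentinels, which is why split self-votes abort the failover.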