Sentinel tries to elect odown master. #1821
Comments
Check the Redis Sentinel doc. Based on your master/slave config, you need at least 3 Sentinels with a quorum of 2, so that when 1 Sentinel goes down, the other 2 can still elect a leader and then kick off the failover process.
Or... a single one if you don't care about Sentinel being a single point of failure (discouraged approach). However, note that if you go for three, you need to set up Sentinel on three different computers (or virtual machines) that are likely to fail independently. Otherwise you have a setup that is only valid under the assumption of single processes failing (like a Redis server crashing) but that does not work on netsplits, since two or more Sentinels will run on the same physical host (and so will always get partitioned together).
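To make the point of the two comments above concrete, here is a minimal sketch of the arithmetic, not Sentinel's actual code. It assumes the rule described in the Sentinel documentation: the quorum only decides when a master is flagged as objectively down, while the failover itself needs a leader that is voted for by a majority of all known Sentinels (and by at least quorum of them). The function names are hypothetical.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of votes that is more than half of all known Sentinels."""
    return total_sentinels // 2 + 1


def failover_outcome(total_sentinels: int, alive_sentinels: int, quorum: int):
    """Return (can_mark_odown, can_elect_leader) for a given setup.

    total_sentinels: Sentinels known for this master, including dead ones.
    alive_sentinels: Sentinels that are still running and can vote.
    quorum: the last argument of `sentinel monitor`.
    """
    # ODOWN only needs `quorum` Sentinels to agree the master is down.
    can_mark_odown = alive_sentinels >= quorum
    # Leader election needs a majority of ALL known Sentinels, and at least quorum.
    votes_needed = max(majority(total_sentinels), quorum)
    can_elect_leader = alive_sentinels >= votes_needed
    return can_mark_odown, can_elect_leader


# Two Sentinels, quorum 1, one Sentinel dies together with the master:
print(failover_outcome(total_sentinels=2, alive_sentinels=1, quorum=1))
# Three Sentinels, quorum 2, one Sentinel dies together with the master:
print(failover_outcome(total_sentinels=3, alive_sentinels=2, quorum=2))
```

Under these assumptions, the two-Sentinel setup can reach `+odown` but can never win the election once one Sentinel is gone, which is consistent with the repeated `-failover-abort-not-elected` lines in the report.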
@DutchMark So did you figure out what the issue was? I'm having the exact same issue as you. My quorum is set to 1 since I'm just testing it out. Eventually I'll have 3 separate nodes with quorum set to 1.
Why is it closed? I am facing the same issue.
I have a very simple setup, one master and one slave. On both instances I have also a sentinel, with the same configuration file on both:
sentinel monitor mymaster 10.99.13.107 6379 1
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 10000
loglevel verbose
When I kill the master instance, the failover procedure kicks in correctly and the slave gets promoted to master. However, when I kill both the master and the sentinel on the same instance (I want to simulate what happens when an instance crashes or goes down completely) then the failover procedure does not happen. The sentinel that lives on the slave instance keeps trying to elect the original master. The log of that sentinel is this:
[28069] 17 Jun 22:57:20.302 # Sentinel runid is 7d08ab54ddce7931c745459996aa0cf1e33f98c1
[28069] 17 Jun 22:57:20.302 # +monitor master mymaster 10.99.13.107 6379 quorum 1
[28069] 17 Jun 22:57:20.900 * +sentinel sentinel 10.99.13.107:26379 10.99.13.107 26379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:57:20.914 # +new-epoch 283
[28069] 17 Jun 22:57:22.395 - Accepted 10.99.13.107:49615
[28069] 17 Jun 22:57:40.395 * +slave slave 10.194.250.140:6379 10.194.250.140 6379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:58:50.896 - Client closed connection
[28069] 17 Jun 22:59:00.940 # +sdown sentinel 10.99.13.107:26379 10.99.13.107 26379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.196 # +sdown master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.196 # +odown master mymaster 10.99.13.107 6379 #quorum 1/1
[28069] 17 Jun 22:59:01.196 # +new-epoch 284
[28069] 17 Jun 22:59:01.196 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.203 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 284
[28069] 17 Jun 22:59:11.538 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:11.628 # Next failover delay: I will not start a failover before Tue Jun 17 22:59:21 2014
[28069] 17 Jun 22:59:21.475 # +new-epoch 285
[28069] 17 Jun 22:59:21.475 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:21.482 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 285
[28069] 17 Jun 22:59:32.449 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:32.525 # Next failover delay: I will not start a failover before Tue Jun 17 22:59:42 2014
And so on; it keeps retrying. As you can see from this log, it knows about the slave (+slave). It also knows that the master went down ("+sdown master mymaster" and "+odown master mymaster"). So why does it keep doing "+try-failover master mymaster 10.99.13.107 6379", apparently trying to elect the master that it knows is down?
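For anyone comparing their own logs against this one, a small sketch like the following can tally the Sentinel events and make the retry loop obvious. It is only an illustration: the regular expression assumes the 2.8-style log line format shown above, and `count_events` is a hypothetical helper, not part of Redis.

```python
import re
from collections import Counter

# Matches lines such as:
# [28069] 17 Jun 22:59:01.196 # +try-failover master mymaster 10.99.13.107 6379
# and captures the event name (+odown, +try-failover, ...).
EVENT_RE = re.compile(r"^\[\d+\] \d+ \w+ [\d:.]+ [#*-] ([+-][a-z-]+)")


def count_events(log_text: str) -> Counter:
    """Count Sentinel events (tokens starting with + or -) in a log excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Excerpt copied from the log in this report.
SAMPLE = """\
[28069] 17 Jun 22:59:01.196 # +sdown master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.196 # +odown master mymaster 10.99.13.107 6379 #quorum 1/1
[28069] 17 Jun 22:59:01.196 # +new-epoch 284
[28069] 17 Jun 22:59:01.196 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.203 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 284
[28069] 17 Jun 22:59:11.538 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:21.475 # +new-epoch 285
[28069] 17 Jun 22:59:21.475 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:21.482 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 285
[28069] 17 Jun 22:59:32.449 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
"""

for event, n in sorted(count_events(SAMPLE).items()):
    print(event, n)
```

In this excerpt every `+try-failover` is matched by a `-failover-abort-not-elected`, so each attempt fails at the leader election step rather than at failure detection.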
redis-server --version: Redis server v=2.8.10 sha=00000000:0 malloc=tcmalloc-2.0 bits=64 build=176d015270bbec54