Sentinel tries to elect odown master. #1821

Closed
DutchMark opened this issue Jun 17, 2014 · 4 comments

@DutchMark

I have a very simple setup: one master and one slave. Each instance also runs a Sentinel, with the same configuration file on both:

sentinel monitor mymaster 10.99.13.107 6379 1
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 10000
loglevel verbose

When I kill the master instance, the failover procedure kicks in correctly and the slave gets promoted to master. However, when I kill both the master and the Sentinel on the same instance (to simulate what happens when an instance crashes or goes down completely), the failover procedure does not happen. The Sentinel that lives on the slave instance keeps trying to elect the original master. The log of that Sentinel is this:

[28069] 17 Jun 22:57:20.302 # Sentinel runid is 7d08ab54ddce7931c745459996aa0cf1e33f98c1
[28069] 17 Jun 22:57:20.302 # +monitor master mymaster 10.99.13.107 6379 quorum 1
[28069] 17 Jun 22:57:20.900 * +sentinel sentinel 10.99.13.107:26379 10.99.13.107 26379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:57:20.914 # +new-epoch 283
[28069] 17 Jun 22:57:22.395 - Accepted 10.99.13.107:49615
[28069] 17 Jun 22:57:40.395 * +slave slave 10.194.250.140:6379 10.194.250.140 6379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:58:50.896 - Client closed connection
[28069] 17 Jun 22:59:00.940 # +sdown sentinel 10.99.13.107:26379 10.99.13.107 26379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.196 # +sdown master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.196 # +odown master mymaster 10.99.13.107 6379 #quorum 1/1
[28069] 17 Jun 22:59:01.196 # +new-epoch 284
[28069] 17 Jun 22:59:01.196 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.203 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 284
[28069] 17 Jun 22:59:11.538 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:11.628 # Next failover delay: I will not start a failover before Tue Jun 17 22:59:21 2014
[28069] 17 Jun 22:59:21.475 # +new-epoch 285
[28069] 17 Jun 22:59:21.475 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:21.482 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 285
[28069] 17 Jun 22:59:32.449 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:32.525 # Next failover delay: I will not start a failover before Tue Jun 17 22:59:42 2014

and so on; it keeps retrying. As you can see from this log, it knows about the slave (+slave). It also knows the master has gone down ("+sdown master mymaster" and "+odown master mymaster"). So why does it keep issuing "+try-failover master mymaster 10.99.13.107 6379" and trying to elect the master that it knows is down?

redis-server --version: Redis server v=2.8.10 sha=00000000:0 malloc=tcmalloc-2.0 bits=64 build=176d015270bbec54

@icyice80

Check the Redis Sentinel docs: based on your master/slave config, you need a minimum of 3 Sentinels with a quorum of 2, so that when one Sentinel goes down the other two can still elect a leader and kick off the failover process.
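
For illustration only (not part of the original report), a minimal sentinel.conf along those lines, reusing the master address from above but with a quorum of 2, might look like this on each of the three Sentinel hosts:

# same file on all three Sentinel hosts; quorum 2 means two Sentinels
# must agree the master is down before a failover can be authorized
sentinel monitor mymaster 10.99.13.107 6379 2
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 10000

With that layout, losing one host (its Redis server plus its Sentinel) still leaves two Sentinels, which is a majority of three, so a failover leader can be elected.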

@antirez
Contributor

antirez commented Jun 18, 2014

Or... a single one, if you don't care about Sentinel being a single point of failure (a discouraged approach). However, note that if you go for three, you need to set up Sentinel on three different computers (or virtual machines) that are likely to fail independently; otherwise you have a setup that is only valid under the assumption of single processes failing (like the Redis server crashing), but that does not work on netsplits, since two or more Sentinels would run on the same physical host (and so would always get partitioned together).
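
As a rough sketch (the host names and file path here are assumptions for illustration, not from this thread), the three-machine layout would look something like:

# host-a: redis-server (master) + one Sentinel
# host-b: redis-server (slave)  + one Sentinel
# host-c: one Sentinel only
# each Sentinel is started the same way, with quorum 2 in its config:
redis-sentinel /etc/redis/sentinel.conf

Because each Sentinel runs on hardware that fails independently, a whole-machine crash or a netsplit still leaves a majority of Sentinels able to agree on a leader.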

@mattsta mattsta closed this as completed Oct 29, 2014
@jeuniii

jeuniii commented Oct 29, 2017

@DutchMark So did you figure out what the issue was? I'm having the exact same issue as you. My quorum is set to 1 since I'm just testing it out. Eventually I'll have 3 separate nodes with quorum set to 1.

@feigyfroilich

Why is this closed? I am facing the same issue.
@DutchMark Have you found a solution?
