Sentinel does not trigger failover in case of master node reboot #1297

Open
kaarelk opened this Issue Sep 24, 2013 · 10 comments

kaarelk commented Sep 24, 2013

A master reboot is detected by Sentinel, but it doesn't trigger a failover.

[7678] 24 Sep 10:53:16.164 * +reboot master redis1 192.168.250.128 2148

Keeping the old master creates the possibility of data loss, since slave sync is usually faster than persistence. Using AOF with a sync to disk on every write is the only case in which the master should have newer data.
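The persistence trade-off described here corresponds to standard redis.conf directives (shown as a sketch; `everysec` is the usual default):

```conf
# redis.conf (sketch)
appendonly yes         # log every write to the append-only file

appendfsync always     # fsync on every write: the only mode in which a
                       # rebooted master is guaranteed not to be behind
                       # its slaves (at a significant latency cost)
# appendfsync everysec # the usual default: may lose up to ~1s of writes
```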

jasongoodwin commented Oct 21, 2013

See also: #1281


jasongoodwin commented Oct 21, 2013

See also: #1281
I agree - I don't like this behaviour - it is fairly high risk in production. A slave restarting in production could take down an app if someone doesn't understand this behaviour.

The current solution I'm moving toward is to use a client (or, in my case, extend an existing Redis client, as there aren't many Redis clients with Sentinel support out there) that subscribes to the -slave-restart-as-master pub/sub channel. That message should be treated the same as a -failover-end message, triggering reconfiguration of the client's state.

The problem is that if you're using the reconfigure script hooks and someone triggers a slave-restart-as-master, your production system could get into a bad state, because that hook isn't called on -slave-restart-as-master.

This behaviour really needs to be documented in the redis-sentinel documentation.
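A client-side workaround along these lines can be sketched with the third-party redis-py client (an assumption; the -slave-restart-as-master channel name comes from the comment above, +switch-master is the event Sentinel publishes after a failover, and the reconfiguration logic is left as a callback you supply):

```python
# Sentinel event channels that should make the client re-resolve the
# master address. "-slave-restart-as-master" is the event discussed
# above; "+switch-master" is published after a completed failover.
RECONFIGURE_EVENTS = {"-slave-restart-as-master", "+switch-master"}

def should_reconfigure(channel: str) -> bool:
    """True if a Sentinel event channel signals a topology change."""
    return channel in RECONFIGURE_EVENTS

def watch_sentinel(host="localhost", port=26379, on_change=lambda data: None):
    """Subscribe to all Sentinel events; call on_change on topology changes."""
    import redis  # third-party: pip install redis
    pubsub = redis.Redis(host=host, port=port).pubsub()
    pubsub.psubscribe("*")  # Sentinel publishes one channel per event type
    for msg in pubsub.listen():
        if msg["type"] == "pmessage" and should_reconfigure(msg["channel"].decode()):
            on_change(msg["data"])
```

In practice `on_change` would re-query Sentinel for the current master address and rebuild the connection pool.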


nikolayspb commented Oct 3, 2017

@antirez this is still happening, even in version 4.0.1.

1:X 03 Oct 17:43:05.094 * +reboot master master-name 10.10.10.243 6379

and data got lost.


antirez commented Oct 3, 2017

I have the feeling that failing over on reboot by default is a pretty bold move; however, I can definitely see the case for having an option for this... Please note that there was also a bug leading to AOF persistence problems on reboots: normally the master will have all the data as long as it's not a hard reboot, like a power outage, but because of the bug even gentle shutdowns could lose AOF data written in the last second. Btw, I'm planning to add such a feature regardless of the fact that the AOF problem should be gone now in the latest 4.0.x and 3.2.x.

antirez added this to the Urgent milestone Oct 3, 2017


antirez commented Oct 3, 2017

Btw @nikolayspb, 4.0.1 still had that bug; 4.0.2 is the first release with the fix...


nikolayspb commented Oct 3, 2017

Ok, I will use 4.0.2. I don't use AOF/RDB because Redis would use an unexpected amount of memory due to the spawned child process, so it either hangs the VM if swap is enabled or gets killed by the OOM killer.
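The fork-time memory spike described here is usually mitigated at the OS level rather than by disabling persistence entirely; Redis's own startup warning recommends allowing memory overcommit (shown as a sketch):

```shell
# Copy-on-write means the BGSAVE/AOF-rewrite child rarely needs a full
# copy of the dataset, but with the default vm.overcommit_memory=0 the
# fork can still fail or invite the OOM killer. Redis's startup log
# warning recommends:
sysctl vm.overcommit_memory=1
# and persisting it across reboots in /etc/sysctl.conf:
#   vm.overcommit_memory = 1
```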


kaarelk commented Oct 3, 2017

I would assume that the slave has more recent data in memory than the master has on disk, as disk flushes are done at an interval (usually 1 second), but data is pushed to the slave immediately.


antirez commented Oct 3, 2017

@kaarelk we had this exact conversation a few minutes ago in an internal chat about this issue. Things are a bit more complex; copying and pasting what I wrote:

Only in the very odd condition of an immediate power outage, basically, because while fsync() happens every second, the write to flush data to disk should happen continuously.
[This is the moment when, after writing a sentence, you go check the code... wait a minute please :-)]

Ok, I told the truth :-) Basically the function flushAppendOnlyFile() is called in the beforeSleep() function, so every time the event loop returns, after serving the clients that were already ready to serve, we write the data to disk. However, that is just a write(2) call; the flushing (fsync) only happens every second. So if Redis is rebooted in any way other than a sudden kernel crash or power outage, the master should have all the data. Examples: Redis crashes, Redis is manually SHUTDOWN, the kernel kills Redis for OOM, the virtual machine or host computer is cleanly shut down and restarted by the provider. For this reason, failing over on reboot by default looks a bit too harsh: we detect the reboot when the master is already available again, so the configured non-availability time was longer than the actual downtime in the first place. Now that the master is working again, why trigger a failover at this point? The only reason is that it may have less data, but it may very well have more, now that AOF is fixed :-D
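The timing described above (data handed to the kernel on every event-loop iteration, but only fsync'ed once per second) can be illustrated with a small simulation; this is a toy sketch, not the real Redis code, though the method names mirror the functions mentioned:

```python
import time

class AofSketch:
    """Toy model of appendfsync-everysec: write(2) to the kernel on
    every event-loop pass, fsync once per second. (Illustrative only.)"""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.pending = []        # commands not yet handed to the kernel
        self.kernel_buffer = []  # written with write(2): survives process death
        self.on_disk = []        # fsync'ed: survives power loss / kernel crash
        self.last_fsync = clock()

    def command(self, data):
        self.pending.append(data)

    def before_sleep(self):
        # Like beforeSleep() -> flushAppendOnlyFile(): the write(2)
        # happens on every event-loop iteration...
        self.kernel_buffer.extend(self.pending)
        self.pending.clear()
        # ...but the fsync only once per second.
        now = self.clock()
        if now - self.last_fsync >= 1.0:
            self.on_disk = list(self.kernel_buffer)
            self.last_fsync = now
```

Everything in `kernel_buffer` survives a clean shutdown or process crash (the kernel still flushes it), which is why only a power outage or kernel crash can lose the last second of AOF data.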


nikolayspb commented Oct 3, 2017

So if I don't use AOF/RDB, would I still lose all the data? The master would force the slaves to erase all their entries.


antirez commented Oct 3, 2017

@nikolayspb yes, a master configured to come back without data is dangerous. In the documentation, we advise that persistence-less setups be configured so that the machine does not automatically restart the Redis instance on reboot, so instances cannot recover from a reboot at all. Note that the feature proposed here would not avoid this problem with persistence-less masters: the slaves would reconnect and be emptied anyway.

There is a proposed feature called "protected mode" to address this problem. It is an explicit Redis feature that prevents a master from becoming available after a restart, "unblocking" it only once it is reconfigured as the slave of some other master; this way a restart always triggers a failover. However, that feature was conceived to work in certain ways, and there are probably better implementations of the same idea with PSYNC now.

About the RDB persistence you mentioned: in that case, yes, it could be much better to promote a slave when the master reboots, but right now the problem is the same as with no persistence, just less severe: the slaves will reconnect once the master is up again. This can be avoided with a delay in the startup script of the Redis instances, so that the Sentinel failover timeout is shorter and triggers first; or it could be another good use case for "protected mode". In practical terms, to make no-persistence or RDB-persistence HA setups much more reliable, we could just:

  1. Add a PSYNC2 feature that refuses to replicate from a master whose replication ID belongs to an unknown history, or is backward compared to the current history. This is a simple addition.
  2. Implement the feature proposed here of treating a restart as a permanent failure, in the case of masters.

With these additions, slaves would refuse to connect to restarted masters unless the masters have all the data needed (in the specific case of a master just turned into a slave, so without prior knowledge of the master's history), and masters would be failed over on restart even if they restarted fast enough not to trigger Sentinel's down-after-milliseconds limit.
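Point 1 above could look roughly like the following replica-side check (a sketch of the proposed idea, not the actual PSYNC2 implementation; all names are illustrative):

```python
def may_replicate_from(master_replid, master_offset, known_replids, my_offset):
    """Sketch of the proposed safety check: refuse to sync from a master
    whose replication history is unknown or behind ours.

    known_replids: replication IDs this replica has followed so far.
    """
    if master_replid not in known_replids:
        # Unknown history: the master likely restarted empty or from old data.
        return False
    if master_offset < my_offset:
        # Same history but the master is behind us: syncing would lose writes.
        return False
    return True
```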
