We are running a Redis Cluster (3.0.7) on 3 physical nodes. We have 9 master processes with 2 slaves each (18 slaves in total), for a total of 27 processes. We ran into a situation where 2 of the 3 physical nodes crashed, resulting in an unexpected loss of 1/9 of the data in the cluster.
Here's the state of our cluster during the crash, with only one physical node left alive:
We expected that the last physical node left alive would have 9 master processes containing all of the data in the cluster. We did indeed see 9 processes, but instead of 9 masters, this last surviving node had a slave process for one of the masters (so 8 masters and 1 slave). It looks like we thus lost 1/9th of the data in our cluster.
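For a loss like this to be possible, at least one shard must have had no copy (master or slave) on the surviving host. A placement like that can be audited ahead of time from `redis-cli cluster nodes` output; the sketch below parses that output and flags shards whose copies do not span every physical host (the node IDs and addresses in the sample are hypothetical, not taken from our actual cluster):

```python
# Sketch: flag shards that would lose all copies if the hosts they
# live on went down together. Input is text in the format produced
# by `redis-cli cluster nodes` (node-id addr flags master-id ...).
from collections import defaultdict

# Hypothetical sample: shard "aaa" has a copy on every host,
# shard "bba" has both of its slaves on the master's peer host.
SAMPLE = """\
aaa 10.0.0.1:7000 master - 0 0 1 connected 0-8191
aab 10.0.0.2:7000 slave aaa 0 0 1 connected
aac 10.0.0.3:7000 slave aaa 0 0 1 connected
bba 10.0.0.1:7001 master - 0 0 2 connected 8192-16383
bbb 10.0.0.2:7001 slave bba 0 0 2 connected
bbc 10.0.0.2:7002 slave bba 0 0 2 connected
"""

def under_covered_shards(cluster_nodes_text):
    copies = defaultdict(set)  # master node-id -> hosts holding a copy
    all_hosts = set()
    for line in cluster_nodes_text.strip().splitlines():
        node_id, addr, flags, master_id = line.split()[:4]
        host = addr.split(":")[0]
        all_hosts.add(host)
        # Masters have "-" in the master-id field; slaves name their master.
        shard = node_id if "master" in flags else master_id
        copies[shard].add(host)
    # A shard not present on every host loses all its copies when
    # the hosts it does live on fail together.
    return sorted(m for m, hosts in copies.items() if hosts != all_hosts)

print(under_covered_shards(SAMPLE))  # → ['bba']
```

In a 3-host, 2-replicas-per-shard layout, every shard should cover all 3 hosts; any shard this check flags is exactly the kind that disappears when two hosts crash at once.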
What could have gone wrong? Is it possible that we misconfigured something, and that we could avoid such a situation in the future with a config change? Or is it possible that there is a bug in the cluster algorithm?
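For reference, two settings in `redis.conf` that influence behavior in scenarios like this (a sketch of a config fragment; the values shown are the Redis 3.0 defaults, and whether they apply to our exact failure is an open question):

```conf
# Replica migration: a slave may move to a master left without slaves,
# but only if its own master keeps at least this many slaves. Migration
# can change which host a shard's copies end up on over time.
cluster-migration-barrier 1

# With "yes" (the default), the cluster stops serving queries when some
# hash slots are uncovered, instead of silently answering for a subset.
cluster-require-full-coverage yes
```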
There is one more possibility. All slots are now distributed among the 8 masters. Is it possible that Redis regrouped the 9 masters into 8, migrated the data to those 8 masters, and simply kept one slave for one of them? It seems strange, but that would be one way the described flow could mean Redis is actually fine.
Spikhalskiy changed the title to "Corrupted RedisCluster with data loss" on Apr 7, 2016.