Redis Cluster: set node in error mode if after a restart there are foreign keys. #115

Closed
antirez opened this Issue Oct 3, 2011 · 0 comments

antirez commented Oct 3, 2011

Redis Cluster nodes save the cluster configuration to a file as soon as possible every time the cluster configuration changes. After a resharding, for instance, the cluster setup changes because keys were moved from node A to node B, e.g. while migrating hash slot 100. However, if A or B does not persist its data to disk just as promptly and there is a server restart, one of the servers can start with keys that belong to hash slots no longer associated with that node.

When this happens the node should start in an error state. This way redis-trib can be run to fix the condition by moving all the foreign keys to the node associated with the hash slot according to the current configuration.
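A minimal sketch of the startup check described above, in Python for brevity. The function names, the "ok"/"error" return values, and the `slot_owner` mapping are hypothetical and not taken from the Redis source. The slot function assumes the CRC16 (XMODEM) modulo 16384 mapping used by released versions of Redis Cluster, and it ignores hash tags.

```python
import binascii

HASH_SLOTS = 16384  # assumption: slot count used by released Redis Cluster


def key_hash_slot(key: bytes, slots: int = HASH_SLOTS) -> int:
    """Map a key to a hash slot: CRC16/XMODEM of the key, modulo the slot count.

    Hash tags ({...}) are deliberately ignored in this sketch.
    """
    return binascii.crc_hqx(key, 0) % slots


def find_foreign_keys(keys, owned_slots):
    """Return {slot: [keys]} for keys whose slot is not owned by this node."""
    foreign = {}
    for key in keys:
        slot = key_hash_slot(key)
        if slot not in owned_slots:
            foreign.setdefault(slot, []).append(key)
    return foreign


def startup_state(keys, owned_slots):
    """Hypothetical startup check: "error" if any foreign key is present."""
    return "error" if find_foreign_keys(keys, owned_slots) else "ok"


def plan_repair(foreign, slot_owner):
    """Group foreign keys by the node that owns their slot in the current config.

    slot_owner is a hypothetical {slot: node_id} view of the cluster config,
    roughly what a repair tool would need in order to move the keys.
    """
    plan = {}
    for slot, keys in foreign.items():
        plan.setdefault(slot_owner[slot], []).extend(keys)
    return plan
```

A node holding keys for a slot it no longer owns would report "error" here, and `plan_repair` lists which keys would have to go to which node. How the real server scans its keyspace and how redis-trib moves the keys is outside the scope of this sketch.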

For the same reason, the best persistence to use in Redis Cluster is AOF rather than RDB-based persistence.

There are other alternatives. For instance, when a node starts and detects "foreign keys" (keys that, according to the current node configuration, do not belong to the node), it can simply update its table to map these hash slots to itself. This way the same hash slot will be assigned to multiple nodes in the cluster. When nodes start propagating this information via gossip, they can automatically set up a routing table where, of the N nodes that have the same hash slot associated, one is the real owner and sets the slot as IMPORTING, while all the other nodes set the slot as MIGRATING, in a cascade. For example, if A, B, and C detect that they all have slot 1000 associated, they configure things like this:

A: migrating to B.
B: migrating to C.
C: importing from B.

The chain can be made unambiguous by sorting the node IDs lexicographically, from smallest to largest.
redis-trib can later detect this setup and fix it by performing the migration.
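The chain construction above can be sketched as follows, again in Python. The function name and the `(state, peer)` tuples are hypothetical and only illustrate the ordering rule, assuming node IDs are compared as plain strings.

```python
def build_recovery_chain(claiming_node_ids):
    """Return {node_id: (state, peer_id)} for nodes claiming the same slot.

    Nodes are ordered by node ID with a plain lexicographic sort. Every node
    except the last is MIGRATING to its successor; the last node is treated
    as the real owner and is IMPORTING from its predecessor.
    """
    ordered = sorted(set(claiming_node_ids))
    if len(ordered) < 2:
        return {}  # zero or one claimant: there is no conflict to resolve
    chain = {}
    for current, successor in zip(ordered, ordered[1:]):
        chain[current] = ("migrating", successor)
    chain[ordered[-1]] = ("importing", ordered[-2])
    return chain


if __name__ == "__main__":
    # Same example as in the text: A, B and C all claim slot 1000.
    for node, (state, peer) in sorted(build_recovery_chain(["C", "A", "B"]).items()):
        print(node, state, peer)
```

Because every node sorts the same set of IDs, each one derives the same chain independently, whatever order the gossip messages arrive in.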

The advantage of the above system is that the cluster will recover when restarted.

This issue definitely needs more thought.

@antirez antirez closed this Jul 22, 2013
