(Please see http://groups.google.com/group/redis-db/browse_thread/thread/67d1b0bbe7669071 for full details, here I'm just summarizing the problem with some cut&paste from that thread).
If you are familiar with the design you know that the key space is split into 4096 parts.
Every part is called a "hash slot", and every node has a routing table that maps every hash slot to a cluster node.
This way, if a client sends a query to a node that is not responsible for the keys mentioned in the query, it gets a -MOVED message redirecting it to the right node.
However, we also have the ability to reconfigure the cluster while it is running. For instance, say hash slot 100 is assigned to node A and I want to move it to node B.
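As a rough illustration of the routing table and the -MOVED reply, here is a minimal sketch. The table layout, the `handle_query` helper, and the reply strings are assumptions made for the example (node names stand in for real node addresses); only the 4096 slot count comes from the text.

```python
NUM_SLOTS = 4096  # slot count given in the text

# Hypothetical routing table: slots 0-2047 on node A, the rest on node B.
routing_table = {slot: ("A" if slot < 2048 else "B") for slot in range(NUM_SLOTS)}

def handle_query(receiving_node, slot):
    """Toy reply a node would give for a query touching `slot`."""
    owner = routing_table[slot]
    if owner == receiving_node:
        return "SERVE"                # this node is responsible for the slot
    return f"-MOVED {slot} {owner}"   # redirect the client to the owner

reply_from_a = handle_query("A", 100)
reply_from_b = handle_query("B", 100)
```

Here node A serves the query for slot 100, while node B answers with a redirection pointing at A.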
The move is accomplished (and redis-trib is already able to do it for you automatically) with the following steps.
1) Node A hash slot 100 is marked as "Migrating to B" (using the CLUSTER SETSLOT MIGRATING command).
2) Node B hash slot 100 is marked as "Importing from A" (using the CLUSTER SETSLOT IMPORTING command).
3) An external client, usually redis-trib, starts using the CLUSTER GETKEYSINSLOT and MIGRATE commands to atomically move keys from A to B.
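The three steps above can be sketched with a toy in-memory model. Everything here (the `Node` class, `get_keys_in_slot`, `migrate`, `reshard`) is a hypothetical stand-in that only mimics the idea of the real commands, and the model covers a single hash slot.

```python
class Node:
    """Toy stand-in for a cluster node; models only hash slot 100."""
    def __init__(self, name):
        self.name = name
        self.keys = {}          # key -> value, all assumed to live in slot 100
        self.slot_state = None  # None, ("migrating", target) or ("importing", source)

def get_keys_in_slot(node, count):
    # Mimics the idea of CLUSTER GETKEYSINSLOT: return up to `count` keys.
    return list(node.keys)[:count]

def migrate(source, target, key):
    # Mimics the idea of MIGRATE: the key leaves source and appears on target in one step.
    target.keys[key] = source.keys.pop(key)

def reshard(a, b, batch=10):
    a.slot_state = ("migrating", b.name)   # step 1
    b.slot_state = ("importing", a.name)   # step 2
    while a.keys:                          # step 3, what redis-trib would drive
        for key in get_keys_in_slot(a, batch):
            migrate(a, b, key)
    # Stand-in for the final consolidation of the slot configuration.
    a.slot_state = None
    b.slot_state = None

a, b = Node("A"), Node("B")
a.keys = {"foo": ["x"], "bar": ["y"]}
reshard(a, b)
```

After `reshard` returns, every key that started on A lives on B and neither node is in a transitional state.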
What is interesting is that while the hash slot is set as "Migrating to B", node A still replies to every request about a key of hash slot 100 that is still present in A. If instead a request is about a key that belongs to hash slot 100 but is not found in A's key space, A generates a "-ASK" error. It is like "-MOVED" but means: please send only this exact query to the specified node, and don't update your table. Keep sending new queries about hash slot 100 to me.
This way all the new keys in hash slot 100 are created directly in B, while A handles all the queries about keys that are still in A. At the same time redis-trib moves keys from A to B. Eventually all the keys are moved and the hash slot configuration is consolidated to the new one, using other CLUSTER SETSLOT subcommands.
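A small sketch of the difference between the two redirections, as seen by the migrating node and by a client with a routing table. The function names and the reply strings (with node names instead of addresses) are assumptions made for illustration.

```python
def migrating_node_reply(local_keys, key, target="B", slot=100):
    """Toy reply of node A while slot 100 is marked as migrating to `target`."""
    if key in local_keys:
        return "SERVE"                 # key not moved yet: A still answers
    return f"-ASK {slot} {target}"     # unknown key: try the importing node

def follow_redirect(table, reply):
    """Return the node to retry on; only -MOVED changes the client's table."""
    kind, slot, node = reply.split()
    if kind == "-MOVED":
        table[int(slot)] = node        # permanent change of ownership
    return node                        # -ASK: one-off retry, table untouched

table = {100: "A"}
reply = migrating_node_reply({"old_key"}, "new_key")
next_node = follow_redirect(table, reply)
```

The client retries this one query on B, but its table still maps slot 100 to A, so later queries for the slot go to A again.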
So far this is pretty cool. But there is a subtle problem with this.
When the cluster is stable, that is, there is no resharding in progress, a client may send a query to a random node.
Only one node will reply to queries related to a specific hash slot. All the other nodes will redirect the client to this node. However, when resharding is in progress there are two nodes that will reply to queries for a given hash slot: the MIGRATING node and the IMPORTING node.
If the client is a "smart" client with an internal routing table, it starts every connection to a cluster by asking for the slot->node map, and makes sure to update the table when -MOVED messages are received. But there are also clients that are not smart and have no table, or clients that are smart but have a stale table because they were idle for a long time while the cluster moved a lot of hash slots. To make things simpler, let's just focus on the stupid client that has no internal map. It just sends queries to a random node among a list of configured nodes, expecting to get redirected if the wrong node was selected.
Such a simple client is only able to deal with -MOVED and -ASK redirections, and it handles the two messages in the same way: it just resends the query to the node specified in the redirection message. It is easy to see how this client may create a race condition, like this:
1) We are migrating slot 100 from A to B.
2) Node A will only accept queries about slot 100 that are related to keys already in the key space. Otherwise it will reply with -ASK.
3) Node B instead will accept all queries about hash slot 100.
4) Our stupid client needs to perform an LPUSH against a key in hash slot 100. It picks a random node.
5) If it picks "C" or "D" it will be redirected to "A". "A" in turn will redirect it to "B" with -ASK if the key is not present in A's key space.
6) If it picks "B" directly the query will be accepted by "B", but what if "A" already had that key? RACE!
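The race in step 6 can be reproduced with a toy model. The two dictionaries and the helper names are hypothetical; `node_b_before_fix` encodes the behaviour from step 3, where B accepts every query for slot 100.

```python
a_keys = {"foo": ["old"]}   # "foo" has not been migrated yet
b_keys = {}                 # B is importing slot 100

def lpush(store, key, value):
    # Toy LPUSH: create the list if needed, then push on the left.
    store.setdefault(key, []).insert(0, value)
    return len(store[key])

def node_b_before_fix(key, value):
    # B accepts the query without checking whether A still holds the key.
    return lpush(b_keys, key, value)

# The client happened to pick B as its random node.
node_b_before_fix("foo", "new")
```

After this call "foo" exists on both nodes with different contents, and the later migration of "foo" from A to B has no obvious correct outcome.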
The fix is to make the importing node more selective:
1) Node B is importing hash slot 100.
2) Node B receives a query about key "foo" in hash slot 100. If it already has "foo" the query is served. Otherwise it issues a "-MOVED" to redirect the client to A.
3) Node B however will serve the query if the client started the chat using the command "ASKING", that indicates that this query was issued after being redirected by a -ASK message. If the client comes from a -ASK redirection we are sure we can serve the client.
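The three rules for the importing node can be condensed into one decision function. This is a sketch under the same toy conventions as the rest of the thread (slot 100 only, node names instead of addresses); `importing_node_reply` is a hypothetical name.

```python
def importing_node_reply(local_keys, key, asking, source="A", slot=100):
    """Toy reply of node B while it is importing slot 100 from `source`."""
    if key in local_keys:
        return "SERVE"                  # rule 2: B already has the key
    if asking:
        return "SERVE"                  # rule 3: client arrived via -ASK
    return f"-MOVED {slot} {source}"    # rule 2: otherwise send it back to A

has_key = importing_node_reply({"foo"}, "foo", asking=False)
no_key = importing_node_reply(set(), "foo", asking=False)
after_ask = importing_node_reply(set(), "foo", asking=True)
```

Only a client that has already been through node A (and therefore knows A does not hold the key) can make B create a new key.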
So in the case above what happens is that all the smart clients will have no problems, after a -ASK redirection they will send:
ASKING
LPUSH foo bar
ASKING sets a flag that is cleared after the command is executed. If a client is dumb (no internal routing table cache) but is still able to remember that after a -ASK redirection it should start the next query with ASKING, everything is fine as well.
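To make the one-shot nature of the flag concrete, here is a toy model of a single client connection to the importing node. The class, its method, and the reply values are assumptions for illustration, not the server's actual code.

```python
class ImportingNodeConnection:
    """Toy model of one client connection to node B (slot 100 only)."""
    def __init__(self, keys, source="A"):
        self.keys = keys        # B's key space for slot 100
        self.source = source    # node the slot is being imported from
        self.asking = False     # set by ASKING, valid for one command

    def execute(self, command, *args):
        if command == "ASKING":
            self.asking = True
            return "+OK"
        key = args[0]
        allowed = self.asking or key in self.keys
        self.asking = False     # the flag never survives the next command
        if not allowed:
            return f"-MOVED 100 {self.source}"
        if command == "LPUSH":
            self.keys.setdefault(key, []).insert(0, args[1])
            return len(self.keys[key])
        raise ValueError(f"command not modelled in this sketch: {command}")

conn = ImportingNodeConnection({})
first = conn.execute("LPUSH", "foo", "bar")    # no ASKING: redirected to A
conn.execute("ASKING")
second = conn.execute("LPUSH", "foo", "bar")   # served, creates "foo" on B
third = conn.execute("LPUSH", "other", "x")    # flag already cleared
```

The second LPUSH is the only one that creates a key on B; the third is redirected again because the flag covered a single command.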
A completely stupid client that is not able to start the chat with ASKING will simply ping/pong from A to B until the hash slot migration is completed, and will finally be served.
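The bouncing can be sketched as follows. The model is deliberately crude: time is replaced by a hypothetical `migration_ends_after` counter (the number of redirections after which the slot is fully owned by B), and all function names are made up for the example.

```python
def node_a(key, a_keys, migrating):
    if not migrating:
        return "-MOVED 100 B"           # after consolidation B owns the slot
    return "SERVE" if key in a_keys else "-ASK 100 B"

def node_b(key, b_keys, importing):
    if not importing or key in b_keys:
        return "SERVE"
    return "-MOVED 100 A"               # no ASKING was sent, so back to A

def dumb_client(key, a_keys, b_keys, migration_ends_after):
    """Follow redirections blindly; return (serving node, redirections followed)."""
    node, hops = "A", 0
    while True:
        in_progress = hops < migration_ends_after
        if node == "A":
            reply = node_a(key, a_keys, in_progress)
        else:
            reply = node_b(key, b_keys, in_progress)
        if reply == "SERVE":
            return node, hops
        node = reply.split()[2]         # -ASK and -MOVED are treated alike
        hops += 1

result = dumb_client("foo", a_keys={}, b_keys={}, migration_ends_after=5)
```

With these inputs the client bounces between A and B five times and is finally served by B, without ever creating a key on the wrong node.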
This was implemented, but the implementation still has to be verified. Keeping the issue open for now.
Implemented a long time ago, but the issue was not closed. Closing.