[Inconsistent cluster state] Slave Migrates to new master but does not migrate back #3054

irfanurrehman · 2016-02-01T13:09:33Z

A very peculiar bug, albeit easy to replicate, found in some in-house tests for production

I have checked it on 3.0.5 and on the latest build from the 3.0 branch on GitHub.

Steps are as below:

build a cluster with 6 nodes and replication turned on

./redis-trib.rb create --replicas 1 192.168.10.25:8000 192.168.10.25:8001 192.168.10.25:8002 192.168.10.25:8003 192.168.10.25:8004 192.168.10.25:8005

the cluster will have 3 masters and 3 slaves, one slave replicating each master

192.168.10.25:8004> cluster nodes
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072496887 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072495885 1 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 6d367efab8a48baf7d1c0e924049e86099dbb272 0 1454072497888 4 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072497088 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072497389 2 connected 5461-10922

slots info

192.168.10.25:8004> cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "192.168.10.25"
      2) (integer) 8000
   4) 1) "192.168.10.25"
      2) (integer) 8003
3) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004

migrate all slots of a particular master to another master (from node on port 8000 to node on port 8001 in this case)

./redis-trib.rb reshard --from 6d367efab8a48baf7d1c0e924049e86099dbb272 --to 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --slots 5461 --yes 192.168.10.25:8001

the slave of the original slot holder also migrates

slot info

1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
   5) 1) "192.168.10.25"
      2) (integer) 8003

node info (node on port 8001 has 2 slaves now and that on 8000 is left without slaves)

74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072791540 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072792041 1 connected
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072792542 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072789534 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072790538 7 connected 0-10922

migrate all the slots back

./redis-trib.rb reshard --from 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --to 6d367efab8a48baf7d1c0e924049e86099dbb272 --slots 5461 --yes 192.168.10.25:8001

the slave doesn't migrate back

slot info

1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "192.168.10.25"
      2) (integer) 8000
3) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
   5) 1) "192.168.10.25"
      2) (integer) 8003

node info

74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072917769 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072917769 8 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072918772 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072916766 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072916766 7 connected 5461-10922

If I then migrate all slots away from the remaining master and migrate them back again, the cluster ends up with one master holding all 3 slaves and the other 2 masters with no slaves, even though the slots are equally distributed.
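
The end state described above (slots spread evenly, slaves not) can be spotted mechanically from the node info. Below is a minimal sketch in Ruby, assuming the usual `cluster nodes` field order (id, address, flags, master id, ping, pong, config epoch, link state, then slot ranges). `uncovered_masters` is a hypothetical helper name and not part of redis-trib.rb; the sample text is the final node info from this report.

```ruby
# Sketch only: list masters that serve slots but have no slave attached,
# based purely on the text returned by CLUSTER NODES.
NODES_OUTPUT = <<~TEXT
  74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072917769 3 connected 10923-16383
  6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072917769 8 connected 0-5460
  478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072918772 7 connected
  bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072916766 6 connected
  532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
  07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072916766 7 connected 5461-10922
TEXT

# Hypothetical helper: returns the addresses of masters with slots and zero slaves.
def uncovered_masters(cluster_nodes_text)
  masters = {}                 # node id => { addr:, slots: }
  slave_count = Hash.new(0)    # master id => number of slaves pointing at it

  cluster_nodes_text.each_line do |line|
    fields = line.split
    next if fields.size < 8    # skip blank or malformed lines
    id, addr, master_id = fields[0], fields[1], fields[3]
    flags = fields[2].split(",")
    if flags.include?("master")
      masters[id] = { addr: addr, slots: fields[8..-1] }
    elsif flags.include?("slave")
      slave_count[master_id] += 1
    end
  end

  masters.select { |id, m| !m[:slots].empty? && slave_count[id].zero? }
         .map { |_, m| m[:addr] }
end

puts uncovered_masters(NODES_OUTPUT).inspect
```

For the node info above this reports only the node on port 8000, which matches the observation that it got its slots back but not its slave.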

I can't say for certain what the expected behaviour should be, but logically it should be one of the following:

either there is no mechanism that detects "oh, all slots migrated, so let's migrate the slave too",
or, if the above mechanism is in place, then when the original master gets some/all of its slots back it should also get a slave (not necessarily the original one), provided there are enough slaves

NOW THE PECULIAR BITS.. :)

These are things I found after spending some more time trying to understand the issue; they may be useful for a correct/clear analysis.

The issue does not exist on 3.0.5 (with redis-trib.rb from 3.0.5). The slave does not migrate in the first place (even if the master loses all its slots), even though this code exists in function clusterUpdateSlotsConfigWith():

    /* If at least one slot was reassigned from a node to another node
     * with a greater configEpoch, it is possible that:
     * 1) We are a master left without slots. This means that we were
     *    failed over and we should turn into a replica of the new
     *    master.
     * 2) We are a slave and our master is left without slots. We need
     *    to replicate to the new slots owner. */
    if (newmaster && curmaster->numslots == 0) {
        redisLog(REDIS_WARNING,
            "Configuration change detected. Reconfiguring myself "
            "as a replica of %.40s", sender->name);
        clusterSetMaster(sender);
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|
                             CLUSTER_TODO_UPDATE_STATE|
                             CLUSTER_TODO_FSYNC_CONFIG);
    }

The issue happens on the latest build from the 3.2 branch (with redis-trib.rb from the latest 3.2 branch). The slave migrates when all slots are moved away, but does not migrate back when the slots are moved back, even though this code exists in function clusterCron():

    /* Orphaned master check, useful only if the current instance
     * is a slave that may migrate to another master. */
    if (nodeIsSlave(myself) && nodeIsMaster(node) && !nodeFailed(node)) {
        int okslaves = clusterCountNonFailingSlaves(node);

        /* A master is orphaned if it is serving a non-zero number of
         * slots, have no working slaves, but used to have at least one
         * slave, or failed over a master that used to have slaves. */
        if (okslaves == 0 && node->numslots > 0 &&
            node->flags & REDIS_NODE_MIGRATE_TO)
        {
            orphaned_masters++;
        }
        if (okslaves > max_slaves) max_slaves = okslaves;
        if (nodeIsSlave(myself) && myself->slaveof == node)
            this_slaves = okslaves;
    }
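
To make the condition in the clusterCron() snippet easier to reason about, here is a small Ruby model of it. This is a hedged sketch, not Redis code: `Master` and `orphaned?` are hypothetical names, and the three fields simply mirror `numslots`, the result of `clusterCountNonFailingSlaves()` and the `REDIS_NODE_MIGRATE_TO` flag from the quoted C. Whether that flag is still set on the node on port 8000 after it gets its slots back is not something this report establishes, so both cases are shown.

```ruby
# Hypothetical model of the quoted "orphaned master" condition.
Master = Struct.new(:name, :numslots, :ok_slaves, :migrate_to)

# A master counts as orphaned when it serves slots, has no working
# slaves, and carries the MIGRATE_TO flag (all three must hold).
def orphaned?(master)
  master.ok_slaves == 0 && master.numslots > 0 && master.migrate_to
end

candidates = [
  Master.new("8000, slots moved away",              0,    0, true),
  Master.new("8000, slots back, flag still set",    5461, 0, true),
  Master.new("8000, slots back, flag not set",      5461, 0, false),
  Master.new("8001, two working slaves",            5462, 2, true)
]

candidates.each do |m|
  puts "#{m.name}: orphaned=#{orphaned?(m) ? 'yes' : 'no'}"
end
```

Under this model, a master that regained its slots is only picked up as orphaned if the flag is set at that moment, which is one place the "does not migrate back" behaviour could come from.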

The most peculiar bit.
The difference in behaviour is caused by an update in redis-trib.rb

the move slot flow is
set slot to importing (receiving) in destination
set slot to migrating in source
actual setslot on all nodes (or only on master nodes) <<-- this is the difference

if "cluster setslot <slot> node <node-id>" is sent only to master nodes, the behaviour reported in this issue is observed (the slave migrates, but does not migrate back)
if "cluster setslot <slot> node <node-id>" is sent to all nodes, the other behaviour is observed (the slave does not migrate in the first place)

The above is consistent from 3.0.5 onward.. :-)
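
The move slot flow above can be written out as the list of commands each node would receive. This is a minimal sketch, not the redis-trib.rb implementation: `Node` and `move_slot_plan` are hypothetical names, nothing is sent to a server, and the key migration that the real tool performs between the second and third step is left out. The node ids and addresses are the ones from this report.

```ruby
# Sketch only: build the per-node command list for moving one slot,
# with the final "setslot ... node" sent to all nodes or to masters only.
Node = Struct.new(:id, :addr, :role)

NODES = [
  Node.new("6d367efab8a48baf7d1c0e924049e86099dbb272", "192.168.10.25:8000", "master"),
  Node.new("07598e66b97494e18dd55ce3c8cd44d6ace0a2c0", "192.168.10.25:8001", "master"),
  Node.new("74efdfbbacd99745a27d43aabce947d80d3a9051", "192.168.10.25:8002", "master"),
  Node.new("478e1f5a49363ff8a78dd4192cee0389c9344763", "192.168.10.25:8003", "slave"),
  Node.new("532b72609d8264caeab21fc39bb380b98b81cc34", "192.168.10.25:8004", "slave"),
  Node.new("bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a", "192.168.10.25:8005", "slave")
]
SOURCE = NODES[0]
TARGET = NODES[1]

def move_slot_plan(nodes, source, target, slot, masters_only:)
  plan = []
  # Step 1: destination marks the slot as importing from the source.
  plan << [target.addr, ["cluster", "setslot", slot, "importing", source.id]]
  # Step 2: source marks the slot as migrating to the destination.
  plan << [source.addr, ["cluster", "setslot", slot, "migrating", target.id]]
  # (The keys themselves are moved here in the real tool; omitted in this sketch.)
  # Step 3: announce the new owner, optionally skipping slaves.
  nodes.each do |n|
    next if masters_only && n.role == "slave"
    plan << [n.addr, ["cluster", "setslot", slot, "node", target.id]]
  end
  plan
end

[true, false].each do |masters_only|
  plan = move_slot_plan(NODES, SOURCE, TARGET, 0, masters_only: masters_only)
  puts "masters_only=#{masters_only}: #{plan.size} commands"
  plan.each { |addr, cmd| puts "  #{addr}: #{cmd.join(' ')}" }
end
```

The only difference between the two plans is whether the three slaves are told about the new owner directly or have to learn it from the cluster afterwards.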

Executing the final setslot only on masters was introduced somewhere after 3.0.6
in redis-trib.rb
move_slot...

    # Set the new node as the owner of the slot in all the known nodes.
    if !o[:cold]
        @nodes.each{|n|
            next if n.has_flag?("slave")    # <<-- this is the added line
            n.r.cluster("setslot",slot,"node",target.info[:name])
        }
    end

I am guessing that, as per the server code, both the migration and the migration back should happen, but the eventual propagation of the info across the cluster doesn't happen or gets overridden by older info; but I am just guessing..!

Hope the above would be of some use to resolve this.

I believe the system info is not very relevant; all the config details are as below (the ports change per node)

port 8000
dir ./
bind 192.168.10.25
dbfilename redis-2-0.rdb
pidfile ./rdbredis-2-0.pid
logfile ./rdbredis-2-0.log
syslog-ident test-db1
daemonize yes
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 7000
tcp-backlog 511

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
# A reasonable value for this option is 60 seconds.
tcp-keepalive 0
slave-serve-stale-data yes
# administrative / dangerous commands.
slave-read-only no
# works better.
repl-diskless-sync no
# it entirely just set it to 0 seconds and the transfer will start ASAP.
repl-diskless-sync-delay 5
# be a good idea.
repl-disable-tcp-nodelay no
# By default the priority is 100.
slave-priority 100
appendonly no
# appendfsync always
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

irfanurrehman · 2016-02-01T13:11:30Z

Raised by mistake while refreshing the browser from an older state.. hence closing it immediately

antirez · 2016-02-01T13:12:12Z

no problem, I replied in the original issue in the meantime.

irfanurrehman closed this as completed Feb 1, 2016
