If I migrate all slots away from the remaining master and then migrate all of them back, the cluster ends up with one master
holding all 3 slaves and the other 2 masters with no slaves, even though the slots are all equally distributed.
I can't say definitively what the expected behaviour should be, but logically it should be one of the
following:
either there is no mechanism that detects "oh, all slots migrated away" and migrates the slave accordingly,
or, if such a mechanism is in place, then when the original master gets some/all slots back it
should also get some slave (not necessarily the original one), if there are enough slaves
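The second point could be sketched as a tiny rebalancing rule. This is a pure Python toy model with made-up structures, not the actual server logic: a master that serves slots but has no slave should attract a spare slave from the best-provisioned master.

```python
# Toy model of the expected rebalancing (hypothetical, not Redis source):
# a master serving slots with zero slaves should attract a spare slave
# from the master that currently has the most slaves.

def rebalance(masters):
    """masters: dict name -> {'slots': int, 'slaves': list[str]}.
    Moves one slave to each orphaned master, if a donor can spare one."""
    for name, m in masters.items():
        if m['slots'] > 0 and not m['slaves']:
            donor = max(masters.values(), key=lambda x: len(x['slaves']))
            if donor is not m and len(donor['slaves']) >= 2:
                # donor keeps at least one slave for itself
                m['slaves'].append(donor['slaves'].pop())
    return masters

cluster = {
    'A': {'slots': 5461, 'slaves': []},            # got its slots back, no slave
    'B': {'slots': 5462, 'slaves': ['s1', 's2']},  # kept the migrated slave
    'C': {'slots': 5461, 'slaves': ['s3']},
}
rebalance(cluster)
```

With this rule every slot-serving master ends up with exactly one slave, which is what the second point argues should happen after the slots move back.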
NOW THE PECULIAR BITS.. :)
I found the following after spending some more time trying to understand the issue; it may be useful for a
correct/clear analysis.
The issue does not exist on 3.0.5 (with redis-trib.rb from 3.0.5). The slave does not migrate
in the first place (even if the master loses all its slots), even though this code exists in
clusterUpdateSlotsConfigWith():
/* If at least one slot was reassigned from a node to another node
 * with a greater configEpoch, it is possible that:
 * 1) We are a master left without slots. This means that we were
 *    failed over and we should turn into a replica of the new
 *    master.
 * 2) We are a slave and our master is left without slots. We need
 *    to replicate to the new slots owner. */
if (newmaster && curmaster->numslots == 0) {
    redisLog(REDIS_WARNING,
        "Configuration change detected. Reconfiguring myself "
        "as a replica of %.40s", sender->name);
    clusterSetMaster(sender);
    clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|
                         CLUSTER_TODO_UPDATE_STATE|
                         CLUSTER_TODO_FSYNC_CONFIG);
}
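For comparison, the decision above boils down to this standalone sketch (dict stand-ins for the cluster structs, not the actual cluster.c code; the node IDs are the ones from the repro):

```python
# Standalone sketch of the decision in clusterUpdateSlotsConfigWith()
# (hypothetical dict-based nodes, not the actual cluster.c structs).

def should_become_replica(curmaster, newmaster):
    """A master (or a slave whose master) left with zero slots should
    start replicating the node that took the slots over."""
    return newmaster is not None and curmaster['numslots'] == 0

# Master on port 8000 after losing all 5461 slots to the master on 8001:
curmaster = {'name': '6d367efab8a48baf7d1c0e924049e86099dbb272', 'numslots': 0}
newmaster = {'name': '07598e66b97494e18dd55ce3c8cd44d6ace0a2c0'}
print(should_become_replica(curmaster, newmaster))  # True: reconfigure as replica
```

So, going by the snippet alone, the slave of the drained master should follow the slots, which makes the 3.0.5 observation (no migration at all) the surprising one.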
The issue happens on the latest build from the 3.2 branch (with redis-trib.rb from the latest 3.2
branch). The slave migrates when all the slots are moved away, but does not migrate back when the slots are
moved back, even though this code exists in clusterCron():
/* Orphaned master check, useful only if the current instance
 * is a slave that may migrate to another master. */
if (nodeIsSlave(myself) && nodeIsMaster(node) && !nodeFailed(node)) {
    int okslaves = clusterCountNonFailingSlaves(node);

    /* A master is orphaned if it is serving a non-zero number of
     * slots, have no working slaves, but used to have at least one
     * slave, or failed over a master that used to have slaves. */
    if (okslaves == 0 && node->numslots > 0 &&
        node->flags & REDIS_NODE_MIGRATE_TO)
    {
        orphaned_masters++;
    }
    if (okslaves > max_slaves) max_slaves = okslaves;
    if (nodeIsSlave(myself) && myself->slaveof == node)
        this_slaves = okslaves;
}
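The same check can be exercised in isolation (a Python toy model with dict nodes and a stand-in flag constant, not the server structs). With the post-migrate-back topology from the repro below, the master on 8000 does count as orphaned, so the puzzle is why no slave acts on it:

```python
# Standalone sketch of the orphaned-master check from clusterCron()
# (hypothetical dict-based nodes, not the actual server structs).
MIGRATE_TO = 1 << 0  # stand-in for REDIS_NODE_MIGRATE_TO

def count_orphaned(nodes):
    """A master is orphaned if it serves slots, has no working slaves,
    and carries the MIGRATE_TO flag (i.e. it used to have slaves)."""
    orphaned = 0
    for n in nodes:
        if (n['ok_slaves'] == 0 and n['numslots'] > 0
                and n['flags'] & MIGRATE_TO):
            orphaned += 1
    return orphaned

# Topology after the slots are moved back: 8000 serves slots again
# but has no slave, while 8001 kept both slaves.
nodes = [
    {'name': '8000', 'numslots': 5461, 'ok_slaves': 0, 'flags': MIGRATE_TO},
    {'name': '8001', 'numslots': 5462, 'ok_slaves': 2, 'flags': MIGRATE_TO},
    {'name': '8002', 'numslots': 5461, 'ok_slaves': 1, 'flags': MIGRATE_TO},
]
print(count_orphaned(nodes))  # 1 -- yet in practice no slave migrates back
```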
The most peculiar bit:
The difference in behaviour is due to an update in redis-trib.rb.
The move-slot flow is:
set the slot to importing state on the destination
set the slot to migrating state on the source
final SETSLOT on all nodes (or only on the master nodes) <<-- this is the difference
If the final "CLUSTER SETSLOT <slot> NODE <node-id>" is issued only on the master nodes, the 3.2 behaviour is
observed (the slave migrates, but does not migrate back).
If it is issued on all nodes, the 3.0.5 behaviour is observed
(the slave does not migrate in the first place).
The above is consistent from 3.0.5 onward.. :-)
Executing the final SETSLOT only on the masters was introduced somewhere after 3.0.6
in redis-trib.rb:
in move_slot:
# Set the new node as the owner of the slot in all the known nodes.
if !o[:cold]
    @nodes.each{|n|
        next if n.has_flag?("slave")    # <-- the added line
        n.r.cluster("setslot",slot,"node",target.info[:name])
    }
end
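The effect of that one-line filter can be mimicked with a toy broadcast model (pure Python, my own simplification of what redis-trib does, using the node names from the repro): with the filter, the slaves never receive the final SETSLOT and only learn about the new owner later, through the cluster bus.

```python
# Toy model of the final "CLUSTER SETSLOT <slot> NODE <owner>" broadcast,
# mirroring the redis-trib filter above (a simplification, not the real code).

def broadcast_setslot(nodes, masters_only):
    """Return the names of the nodes that receive the final SETSLOT."""
    return [n['name'] for n in nodes
            if not (masters_only and 'slave' in n['flags'])]

nodes = [
    {'name': '8000', 'flags': ['master']},
    {'name': '8001', 'flags': ['master']},
    {'name': '8002', 'flags': ['master']},
    {'name': '8003', 'flags': ['slave']},
    {'name': '8004', 'flags': ['slave']},
    {'name': '8005', 'flags': ['slave']},
]
print(broadcast_setslot(nodes, masters_only=True))   # 3.2 redis-trib: masters only
print(broadcast_setslot(nodes, masters_only=False))  # 3.0.5 redis-trib: all nodes
```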
I am guessing that, per the server code, both the migration and the migration back should happen, but the
eventual percolation of the info across the cluster either doesn't happen or gets overridden by older info;
that is just a guess, though..!
Hope the above is of some use in resolving this.
I believe the system info is not very relevant; all the config details are below.
A very peculiar bug, albeit easy to replicate, found in some in-house tests for production.
I have checked it on 3.0.5 and the latest build from the 3.0 branch on GitHub.
build a cluster with 6 nodes and replication turned on
./redis-trib.rb create --replicas 1 192.168.10.25:8000 192.168.10.25:8001 192.168.10.25:8002
192.168.10.25:8003 192.168.10.25:8004 192.168.10.25:8005
the cluster will have 3 masters and 3 slaves replicating each of the masters
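As a quick sanity check on the slot ranges that show up in the `cluster nodes` output, the three masters together cover all 16384 hash slots:

```python
# The slot ranges redis-trib assigned to the three masters, as seen in
# the `cluster nodes` output; together they must cover all 16384 slots.
ranges = [(0, 5460), (5461, 10922), (10923, 16383)]
covered = sum(hi - lo + 1 for lo, hi in ranges)
print(covered)  # 16384
```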
192.168.10.25:8004> cluster nodes
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072496887 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072495885 1 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 6d367efab8a48baf7d1c0e924049e86099dbb272 0 1454072497888 4 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072497088 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072497389 2 connected 5461-10922
slots info
192.168.10.25:8004> cluster slots
migrate all slots of a particular master to another master (from node on port 8000 to node on port 8001 in this case)
./redis-trib.rb reshard --from 6d367efab8a48baf7d1c0e924049e86099dbb272 --to 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --slots 5461 --yes 192.168.10.25:8001
the slave of the original slot holder also migrates
slot info
node info (node on port 8001 has 2 slaves now and that on 8000 is left without slaves)
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072791540 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072792041 1 connected
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072792542 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072789534 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072790538 7 connected 0-10922
migrate all the slots back
./redis-trib.rb reshard --from 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --to 6d367efab8a48baf7d1c0e924049e86099dbb272 --slots 5461 --yes 192.168.10.25:8001
the slave doesn't migrate back
slot info
node info
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072917769 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072917769 8 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072918772 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072916766 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072916766 7 connected 5461-10922
(ports will change)
port 8000
dir ./
bind 192.168.10.25
dbfilename redis-2-0.rdb
pidfile ./rdbredis-2-0.pid
logfile ./rdbredis-2-0.log
syslog-ident test-db1
daemonize yes
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 7000
tcp-backlog 511
timeout 0
tcp-keepalive 0
slave-serve-stale-data yes
slave-read-only no
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes