Unexpected resync with master #828
Comments
Cool work @jokea, thank you... I'll work on this today.
More tests on this issue:

    redis 127.0.0.1:6379> slaveof 10.68.6.165 6379   // unreachable
    OK
    redis 127.0.0.1:6379> slaveof 10.68.6.165 6380   // unreachable
    OK
    redis 127.0.0.1:6379> slaveof 10.68.6.165 6381   // unreachable
    OK
    redis 127.0.0.1:6379> slaveof 10.68.6.165 6382   // unreachable
    OK
    redis 127.0.0.1:6379> slaveof 127.0.0.1 6380     // the right one

Wait for all the connections to time out. Here is the info on the master:

    redis 127.0.0.1:6380> info Replication
    # Replication
    role:master
    connected_slaves:5
    slave0:127.0.0.1,6379,online
    slave1:127.0.0.1,6379,online
    slave2:127.0.0.1,6379,online
    slave3:127.0.0.1,6379,online
    slave4:127.0.0.1,6379,online
    redis 127.0.0.1:6380> client list
    addr=127.0.0.1:60663 fd=5 age=345 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sync
    addr=127.0.0.1:60678 fd=6 age=331 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client
    addr=127.0.0.1:60815 fd=7 age=209 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sync
    addr=127.0.0.1:46090 fd=8 age=181 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sync
    addr=127.0.0.1:46098 fd=9 age=174 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sync
    addr=127.0.0.1:46104 fd=10 age=169 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sync

Info on the slave:

    redis 127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:127.0.0.1
    master_port:6380
    master_link_status:up
    master_last_io_seconds_ago:0
    master_sync_in_progress:0
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    redis 127.0.0.1:6379> client list
    addr=127.0.0.1:6380 fd=10 age=384 idle=9 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
    addr=127.0.0.1:6380 fd=5 age=248 idle=9 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
    addr=127.0.0.1:6380 fd=6 age=220 idle=9 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
    addr=127.0.0.1:6380 fd=7 age=213 idle=9 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
    addr=127.0.0.1:6380 fd=8 age=208 idle=9 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
    addr=127.0.0.1:53536 fd=9 age=16 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client

An INCR against the master gets executed 5 times on the slave:

    redis 127.0.0.1:6380> incr foo
    (integer) 1
    ------------------------------------------
    redis 127.0.0.1:6379> get foo
    "5"
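The symptom above (one INCR on the master, a value of 5 on the slave) can be pictured with a small toy model. This is only a sketch, not Redis code: `ToyReplica`, `toy_slaveof_link` and `toy_propagate_incr` are hypothetical names, and the model simply assumes that every SLAVEOF whose attempt is never cancelled eventually leaves one extra live link to the current master, as the CLIENT LIST output suggests.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LINKS 8

/* Toy model (hypothetical, not Redis internals): a replica that may hold
 * several live links to the same master. */
typedef struct {
    int links;      /* number of master links the replica holds open */
    long long foo;  /* value of key "foo" on the replica */
} ToyReplica;

/* Model one SLAVEOF call. With cancel_pending == 0 (the buggy behaviour)
 * earlier attempts are never torn down, so links accumulate. */
void toy_slaveof_link(ToyReplica *r, int cancel_pending) {
    assert(r != NULL);
    if (cancel_pending)
        r->links = 0;              /* drop every earlier attempt first */
    assert(r->links < TOY_MAX_LINKS);
    r->links++;
}

/* The master runs INCR foo once and streams it over every link it
 * believes is a replica, so the replica applies it once per link. */
void toy_propagate_incr(ToyReplica *r) {
    assert(r != NULL);
    for (int i = 0; i < r->links; i++)
        r->foo++;
}
```

Calling `toy_slaveof_link` five times without cancelling and then `toy_propagate_incr` once leaves `foo` at 5 in this model, while cancelling on each call leaves it at 1.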
I think it's the same problem as issue #680.
Hey, I think the bug is in replication.c, slaveofCommand(). The fix is to undo the connection attempt with the master when the SLAVEOF command is issued. Tested against the unstable branch.

    if (server.repl_state == REDIS_REPL_TRANSFER)
        replicationAbortSyncTransfer();
    else if (server.repl_state == REDIS_REPL_CONNECTING ||
             server.repl_state == REDIS_REPL_RECEIVE_PONG)
        undoConnectWithMaster();
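To make the idea behind this patch easier to follow outside the Redis source tree, here is a minimal standalone sketch. It is a toy state machine, not the actual fix: the `Toy*` types and the `toy_*` functions are hypothetical, the states only loosely mirror the REDIS_REPL_* names quoted above, and "closing a socket" is modelled by a counter rather than a real close() call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy replication states (hypothetical, for illustration only). */
typedef enum {
    TOY_REPL_NONE,
    TOY_REPL_CONNECT,       /* must (re)connect to the master */
    TOY_REPL_CONNECTING,    /* non-blocking connect in flight */
    TOY_REPL_RECEIVE_PONG,  /* connected, waiting for the PING reply */
    TOY_REPL_TRANSFER,      /* receiving the initial dataset */
    TOY_REPL_CONNECTED      /* link fully established */
} ToyReplState;

typedef struct {
    ToyReplState state;
    int sock;    /* socket of the current attempt or link, -1 if none */
    int closed;  /* how many sockets this model has released */
} ToyServer;

/* Release the current socket, if any (stands in for close()). */
static void toy_release_socket(ToyServer *s) {
    if (s->sock != -1) {
        s->sock = -1;
        s->closed++;
    }
}

/* Cancel an in-flight connect, handshake or transfer.
 * Returns 1 if something was cancelled, 0 otherwise. */
int toy_cancel_pending(ToyServer *s) {
    assert(s != NULL);
    switch (s->state) {
    case TOY_REPL_TRANSFER:
    case TOY_REPL_CONNECTING:
    case TOY_REPL_RECEIVE_PONG:
        toy_release_socket(s);
        s->state = TOY_REPL_CONNECT;
        return 1;
    default:
        return 0;
    }
}

/* Point the replica at a new master: tear down whatever was in
 * progress before starting the new non-blocking connect. */
void toy_slaveof(ToyServer *s, int new_sock) {
    assert(s != NULL && new_sock >= 0);
    toy_cancel_pending(s);
    toy_release_socket(s);   /* also drops an already established link */
    s->sock = new_sock;
    s->state = TOY_REPL_CONNECTING;
}
```

The point of the sketch is the ordering inside `toy_slaveof`: the previous attempt is released before the new one starts, so no stale socket is left behind to time out later.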
Issue #828 shows that Redis did not correctly undo a non-blocking connection attempt with the previous master when the master was set to a new address using the SLAVEOF command. This was also a result of a lack of refactoring, so there is now a function that cancels the non-blocking handshake with the master. The new function is used when SLAVEOF NO ONE is called or when SLAVEOF is used to set the master to a different address.
Closing since the fix was merged into all the branches.
A Redis slave will resync with its master if it was previously assigned an unreachable host as its master.
How to reproduce:
(1). Find a host for which a TCP connection attempt takes a long time to fail:
Here it takes telnet more than 3 minutes to detect the connection timeout.
(2). Start two servers, on ports 6379 and 6380:
(3). Assign the unreachable host as the master of the Redis instance running on port 6379:
(4). Wait 60 seconds, before Redis detects the timeout (this step is not necessary; it just confirms that it takes some time before a connect error happens).
(5). Assign the server running on port 6380 as the new master:
(6). After 2 minutes, the timeout is detected for the connection from step 3, and Redis resyncs with its new master.
log below:
Slave:
Master:
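The timing in the reproduction steps above can be summarized with a small helper. This is a hedged sketch rather than anything taken from Redis: `toy_unexpected_resync_at` is a hypothetical function, times are plain seconds, and the 180-second figure used in the note below is only an approximation derived from "wait 60 seconds" plus "after 2 minutes".

```c
#include <assert.h>

/* Toy timeline model (hypothetical): returns the time, in seconds, at
 * which the stale connection attempt from the first SLAVEOF fails and
 * triggers an unexpected resync, or -1 if no such resync happens. */
long toy_unexpected_resync_at(long first_slaveof_at,
                              long connect_timeout,
                              long second_slaveof_at,
                              int cancel_on_slaveof) {
    assert(connect_timeout >= 0);
    assert(second_slaveof_at >= first_slaveof_at);

    long fails_at = first_slaveof_at + connect_timeout;

    /* The attempt already failed before the master was changed, so
     * nothing stale is left behind. */
    if (fails_at <= second_slaveof_at)
        return -1;

    /* With the fix, the second SLAVEOF cancels the pending attempt. */
    if (cancel_on_slaveof)
        return -1;

    /* Buggy behaviour: the old attempt times out later and disturbs
     * the link to the new master. */
    return fails_at;
}
```

With the first SLAVEOF at 0, a connect timeout of about 180 and the second SLAVEOF at 60, the model reports a resync at 180 when nothing is cancelled, and no resync once the pending attempt is cancelled.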