Skip to content
Browse files

Undo slave-master handshake when SLAVEOF sets a new slave.

Issue #828 shows how Redis was not correctly undoing a non-blocking
connection attempt with the previous master when the master was set to a
new address using the SLAVEOF command.

This was also a result of lack of refactoring, so now there is a
function to cancel the non blocking handshake with the master.
The new function is now used when SLAVEOF NO ONE is called or when
SLAVEOF is used to set the master to a different address.
  • Loading branch information...
1 parent a33869c commit f8e40cdcbd6bdf81918d3e7895a359aec67d40d8 @antirez committed Jan 14, 2013
Showing with 23 additions and 7 deletions.
  1. +23 −7 src/replication.c
View
30 src/replication.c
@@ -695,18 +695,35 @@ void undoConnectWithMaster(void) {
server.repl_state = REDIS_REPL_CONNECT;
}
+/* This function aborts a non blocking replication attempt if there is one
+ * in progress, by canceling the non-blocking connect attempt or
+ * the initial bulk transfer.
+ *
+ * If there was a replication handshake in progress 1 is returned and
+ * the replication state (server.repl_state) set to REDIS_REPL_CONNECT.
+ *
+ * Otherwise zero is returned and no operation is perforemd at all. */
+int cancelReplicationHandshake(void) {
+ if (server.repl_state == REDIS_REPL_TRANSFER) {
+ replicationAbortSyncTransfer();
+ } else if (server.repl_state == REDIS_REPL_CONNECTING ||
+ server.repl_state == REDIS_REPL_RECEIVE_PONG)
+ {
+ undoConnectWithMaster();
+ } else {
+ return 0;
+ }
+ return 1;
+}
+
void slaveofCommand(redisClient *c) {
if (!strcasecmp(c->argv[1]->ptr,"no") &&
!strcasecmp(c->argv[2]->ptr,"one")) {
if (server.masterhost) {
sdsfree(server.masterhost);
server.masterhost = NULL;
if (server.master) freeClient(server.master);
- if (server.repl_state == REDIS_REPL_TRANSFER)
- replicationAbortSyncTransfer();
- else if (server.repl_state == REDIS_REPL_CONNECTING ||
- server.repl_state == REDIS_REPL_RECEIVE_PONG)
- undoConnectWithMaster();
+ cancelReplicationHandshake();
server.repl_state = REDIS_REPL_NONE;
redisLog(REDIS_NOTICE,"MASTER MODE enabled (user request)");
}
@@ -730,8 +747,7 @@ void slaveofCommand(redisClient *c) {
server.masterport = port;
if (server.master) freeClient(server.master);
disconnectSlaves(); /* Force our slaves to resync with us as well. */
- if (server.repl_state == REDIS_REPL_TRANSFER)
- replicationAbortSyncTransfer();
+ cancelReplicationHandshake();
server.repl_state = REDIS_REPL_CONNECT;
redisLog(REDIS_NOTICE,"SLAVE OF %s:%d enabled (user request)",
server.masterhost, server.masterport);

5 comments on commit f8e40cd

@charsyam

It misses repl_state is REDIS_REPL_CONNECTED. at that time, Redis won't close replication handle.

How about this. add checking code for REDIS_REPL_CONNECTED in undoConnectWithMaster and

int cancelReplicationHandshake(void) {
   if (server.repl_state == REDIS_REPL_TRANSFER) {
        replicationAbortSyncTransfer();
    } else if (server.repl_state == REDIS_REPL_CONNECTING ||
             server.repl_state == REDIS_REPL_RECEIVE_PONG ||
             server.repl_state == REDIS_REPL_CONNECTED)
    {
        undoConnectWithMaster();
    } else {
        return 0;
    }
    return 1;
}

void undoConnectWithMaster(void) {
    int fd = server.repl_transfer_s;

    redisAssert(server.repl_state == REDIS_REPL_CONNECTING ||
                     server.repl_state == REDIS_REPL_RECEIVE_PONG ||
                     server.repl_state == REDIS_REPL_CONNECTED);
    aeDeleteFileEvent(server.el,fd,AE_READABLE|AE_WRITABLE);
    close(fd);
    server.repl_transfer_s = -1;
    server.repl_state = REDIS_REPL_CONNECT;
}
@charsyam

and I also think.
It is better to add to initialize code for repl_transfer_s = -1

@srned
srned commented on f8e40cd Jan 15, 2013

@charsyam I think it is taken care by freeClient(server.master). server.master = createClient(server.repl_transfer_s) is done when state changes to REDIS_REPL_CONNECTED from REDIS_REPL_TRANSFER.

@antirez
Owner

@charsyam, @snerd AFAIK if there is an event currently registered to connect with the master or to transfer the initial RDB file, the state must be either REDIS_REPL_CONNECTING or RECEIVE_POING or TRANSFER.

Because of the bug the state was CONNECTED while there was still the event handler registered, since Redis was not canceling it at the right time.

About freeClient(), it actually when server.master is not NULL, the state is always CONNECTED, so when there was a previous master and we freeClient(server.master) what happen is that the resulting state will always be CONNECT and no handler should be registered.

In short, we should be safe, but now I'll test manually if my fix appears to work as I honestly did not. Unfortunately a regression test will be hard to add...

@antirez
Owner

I can confirm the fix works as expected simulating the problem as @jokea advised to do. Without the patch it reconnects after 3 minutes. With the patch everything works as expected. Thanks for the help, I'll release Redis 2.6.9 today including this fix.

Please sign in to comment.
Something went wrong with that request. Please try again.