New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

min-slaves-to-write configuration on a slave breaks replication #1434

Closed
shanemadden opened this Issue Dec 5, 2013 · 12 comments

Comments

Projects
None yet
5 participants
@shanemadden

shanemadden commented Dec 5, 2013

In Redis 2.8.2 slaves which have min-slaves-to-write set in their config file (I was putting it in there for when that slave gets promoted to master - I would guess that it'd be inert on a slave, but hopefully active once that slave became a master?) seem to have problems with accepting replication of writes from the master.

Master and two slaves set up and running..

root@redis3:~# redis-cli info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.99.82,port=6379,state=online,offset=2973,lag=1
slave1:ip=192.168.99.81,port=6379,state=online,offset=2973,lag=1
master_repl_offset:2973
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:2972

root@redis3:~# redis-cli info keyspace
# Keyspace
db0:keys=1,expires=0,avg_ttl=0

One slave, redis1, has min-slaves-to-write 1 in its config file. redis2 does not. Otherwise their configurations are identical.

On startup, both slaves do their initial sync great.. but redis1 has a slightly different output for info replication (with min_slaves_good_slaves:0):

root@redis2:~# redis-cli info replication
# Replication
role:slave
master_host:192.168.99.83
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:2973
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

root@redis2:~# redis-cli info keyspace
# Keyspace
db0:keys=1,expires=0,avg_ttl=0


root@redis1:~# redis-cli info replication
# Replication
role:slave
master_host:192.168.99.83
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:2973
slave_priority:100
slave_read_only:1
connected_slaves:0
min_slaves_good_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

root@redis1:~# redis-cli info keyspace
# Keyspace
db0:keys=1,expires=0,avg_ttl=0

A key change on the master is replicated to redis2 just fine, but not to redis1, despite it still being apparently synced:

root@redis3:~# redis-cli set foo 1
OK

root@redis2:~# redis-cli get foo
"1"

root@redis1:~# redis-cli get foo
(nil)

root@redis3:~# redis-cli info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.99.82,port=6379,state=online,offset=66350,lag=0
slave1:ip=192.168.99.81,port=6379,state=online,offset=66350,lag=0
master_repl_offset:66350
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:66349
root@redis3:~# redis-cli info keyspace
# Keyspace
db0:keys=2,expires=0,avg_ttl=0

root@redis2:~# redis-cli info replication
# Replication
role:slave
master_host:192.168.99.83
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:66350
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
root@redis2:~# redis-cli info keyspace
# Keyspace
db0:keys=2,expires=0,avg_ttl=0

root@redis1:~# redis-cli info replication
# Replication
role:slave
master_host:192.168.99.83
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:66488
slave_priority:100
slave_read_only:1
connected_slaves:0
min_slaves_good_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
root@redis1:~# redis-cli info keyspace
# Keyspace
db0:keys=1,expires=0,avg_ttl=0

root@redis3:~# redis-cli flushall

root@redis2:~# redis-cli info keyspace
# Keyspace

root@redis1:~# redis-cli info keyspace
# Keyspace
db0:keys=1,expires=0,avg_ttl=0

A config rewrite on the malfunctioning slave removes the min-slaves-to-write 1 line, then a restart of the process gets it back into a normal replication state.

@rhoml

This comment has been minimized.

Show comment
Hide comment
@rhoml

rhoml Jun 5, 2014

Contributor

I have the same issue on redis 2.8.8, the weird escenario is that data replicates when you attach the nodes to the master node, and then sync stops.

A work around to fix this is to use the reconfigure script from sentinel so when promoting a new master it sets the 'min-slaves-to-write 1'.

Contributor

rhoml commented Jun 5, 2014

I have the same issue on redis 2.8.8, the weird escenario is that data replicates when you attach the nodes to the master node, and then sync stops.

A work around to fix this is to use the reconfigure script from sentinel so when promoting a new master it sets the 'min-slaves-to-write 1'.

@charsyam

This comment has been minimized.

Show comment
Hide comment
@charsyam

charsyam Jun 5, 2014

Contributor

@rhoml @shanemadden
I just search code. and if repl_good_slaves_count < repl_min_slaves_to_write then
it rejects all write_command.

so. I might think. one of them is not good slave. so if 1 < 1 will not accept write_commands
it checks good slave

void refreshGoodSlavesCount(void) {
    listIter li;
    listNode *ln;
    int good = 0;

    if (!server.repl_min_slaves_to_write ||
        !server.repl_min_slaves_max_lag) return;

    listRewind(server.slaves,&li);
    while((ln = listNext(&li))) {
        redisClient *slave = ln->value;
        time_t lag = server.unixtime - slave->repl_ack_time;

        if (slave->replstate == REDIS_REPL_ONLINE &&
            lag <= server.repl_min_slaves_max_lag) good++;
    }
    server.repl_good_slaves_count = good;
}
Contributor

charsyam commented Jun 5, 2014

@rhoml @shanemadden
I just search code. and if repl_good_slaves_count < repl_min_slaves_to_write then
it rejects all write_command.

so. I might think. one of them is not good slave. so if 1 < 1 will not accept write_commands
it checks good slave

void refreshGoodSlavesCount(void) {
    listIter li;
    listNode *ln;
    int good = 0;

    if (!server.repl_min_slaves_to_write ||
        !server.repl_min_slaves_max_lag) return;

    listRewind(server.slaves,&li);
    while((ln = listNext(&li))) {
        redisClient *slave = ln->value;
        time_t lag = server.unixtime - slave->repl_ack_time;

        if (slave->replstate == REDIS_REPL_ONLINE &&
            lag <= server.repl_min_slaves_max_lag) good++;
    }
    server.repl_good_slaves_count = good;
}
@rhoml

This comment has been minimized.

Show comment
Hide comment
@rhoml

rhoml Jun 5, 2014

Contributor

@charsyam that is exactly what is happening, if you have a cluster of 1 redis master and 2 redis slaves with a min-slaves-to-write 1 on each node, replication won't work because slaves doesn't comply with the min-slaves-to-write 1 so replication will fail.

Either this option is not meant to be written on the slaves so that why I'll do the workaround with sentinel or it shouldn't apply for slaves so the code must be aware of the redis node role when deciding of can replicate.

Contributor

rhoml commented Jun 5, 2014

@charsyam that is exactly what is happening, if you have a cluster of 1 redis master and 2 redis slaves with a min-slaves-to-write 1 on each node, replication won't work because slaves doesn't comply with the min-slaves-to-write 1 so replication will fail.

Either this option is not meant to be written on the slaves so that why I'll do the workaround with sentinel or it shouldn't apply for slaves so the code must be aware of the redis node role when deciding of can replicate.

@antirez

This comment has been minimized.

Show comment
Hide comment
@antirez

antirez Jun 5, 2014

Owner

Horrible bug, fixing.

Owner

antirez commented Jun 5, 2014

Horrible bug, fixing.

@rhoml

This comment has been minimized.

Show comment
Hide comment
@rhoml

rhoml Jun 5, 2014

Contributor

@antirez thank you.

Contributor

rhoml commented Jun 5, 2014

@antirez thank you.

antirez added a commit that referenced this issue Jun 5, 2014

Don't process min-slaves-to-write for slaves.
Replication is totally broken when a slave has this option, since it
stops accepting updates from masters.

This fixes issue #1434.

antirez added a commit that referenced this issue Jun 5, 2014

Don't process min-slaves-to-write for slaves.
Replication is totally broken when a slave has this option, since it
stops accepting updates from masters.

This fixes issue #1434.

antirez added a commit that referenced this issue Jun 5, 2014

Don't process min-slaves-to-write for slaves.
Replication is totally broken when a slave has this option, since it
stops accepting updates from masters.

This fixes issue #1434.
@antirez

This comment has been minimized.

Show comment
Hide comment
@antirez

antirez Jun 5, 2014

Owner

Fixed and added tests for min-slaves-to-write. It is very bad that I didn't noticed this issue when @shanemadden originally reported it 4 months ago. Redis 2.8.10 will go live today.

Owner

antirez commented Jun 5, 2014

Fixed and added tests for min-slaves-to-write. It is very bad that I didn't noticed this issue when @shanemadden originally reported it 4 months ago. Redis 2.8.10 will go live today.

@antirez antirez closed this Jun 5, 2014

@antirez

This comment has been minimized.

Show comment
Hide comment
@antirez

antirez Jun 5, 2014

Owner

You are welcome, 2.8.10 is live if you want to upgrade.

Owner

antirez commented Jun 5, 2014

You are welcome, 2.8.10 is live if you want to upgrade.

@rhoml

This comment has been minimized.

Show comment
Hide comment
@rhoml

rhoml Jun 5, 2014

Contributor

@antirez awesome, thanks this was fast.

Contributor

rhoml commented Jun 5, 2014

@antirez awesome, thanks this was fast.

@rhoml

This comment has been minimized.

Show comment
Hide comment
@rhoml

rhoml Jun 11, 2014

Contributor

@antirez How will this behave when you have a cluster like M - S - S? Will the first slave will still be affected by the min-slaves-to-write?

Contributor

rhoml commented Jun 11, 2014

@antirez How will this behave when you have a cluster like M - S - S? Will the first slave will still be affected by the min-slaves-to-write?

@antirez

This comment has been minimized.

Show comment
Hide comment
@antirez

antirez Jun 11, 2014

Owner

No, it will not because, to start, slaves are by default read-only, so it is basically meaningless most of the times, and also, slaves totally ignore the option now, regardless of the fact they have additional slaves or not.

Owner

antirez commented Jun 11, 2014

No, it will not because, to start, slaves are by default read-only, so it is basically meaningless most of the times, and also, slaves totally ignore the option now, regardless of the fact they have additional slaves or not.

@rhoml

This comment has been minimized.

Show comment
Hide comment
@rhoml

rhoml Jun 11, 2014

Contributor

@antirez awesome, that's what I expected. Thanks.

Contributor

rhoml commented Jun 11, 2014

@antirez awesome, that's what I expected. Thanks.

@Natarajan77

This comment has been minimized.

Show comment
Hide comment
@Natarajan77

Natarajan77 Apr 10, 2018

minslavestowrite
maxlagnumberofsecs

we set on master and slaves in the conf file. Do we need to set in sentinel nodes also?

Natarajan77 commented Apr 10, 2018

minslavestowrite
maxlagnumberofsecs

we set on master and slaves in the conf file. Do we need to set in sentinel nodes also?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment