Don't expire keys when loading an RDB after a SYNC #296

Merged
merged 1 commit into from Jan 16, 2012

Contributor
pietern commented Jan 14, 2012

A user reported a slave that showed a monotonically increasing input buffer length shortly after completing a SYNC. Also, INFO output showed a single blocked client, which could only be the master link. Investigation showed that the BRPOP command was indeed fed by the master. This command can only end up in the stream of write operations when it did NOT block, and effectively executed RPOP. However, when the key involved in the BRPOP is expired BEFORE the command is executed, the client executing it will block. The client, in this case, is the master link.
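
The sequence described above can be illustrated with a small toy model. This is only a sketch in Python, not Redis code: `ToyReplica`, `replay`, the key name `jobs`, and the timestamps are all hypothetical and chosen for illustration. It assumes the master executed the `BRPOP` at t=104, before the key's expiry at t=105, and that the slave finished loading the snapshot at t=110.

```python
class ToyReplica:
    """Toy stand-in for a replica's keyspace (hypothetical, not Redis code)."""

    def __init__(self, expire_on_load):
        self.expire_on_load = expire_on_load
        self.lists = {}
        self.master_link_blocked = False

    def load_snapshot(self, snapshot, now):
        """Load {key: (items, expire_at_or_None)} as received after a SYNC."""
        for key, (items, expire_at) in snapshot.items():
            expired = expire_at is not None and expire_at <= now
            if expired and self.expire_on_load:
                # Dropping the key here modifies the master's snapshot.
                continue
            self.lists[key] = list(items)

    def apply_brpop(self, key):
        """Apply a BRPOP received from the master's write stream."""
        items = self.lists.get(key)
        if not items:
            # Nothing to pop: the client issuing BRPOP blocks, and on a
            # replica that client is the master link.
            self.master_link_blocked = True
            return None
        value = items.pop()
        if not items:
            del self.lists[key]
        return value


def replay(expire_on_load):
    # Snapshot as the master saw it: "jobs" expires at t=105. The master ran
    # BRPOP at t=104 (key still alive), so the command was propagated.
    snapshot = {"jobs": (["a", "b"], 105)}
    replica = ToyReplica(expire_on_load)
    replica.load_snapshot(snapshot, now=110)  # loading finishes after expiry
    replica.apply_brpop("jobs")
    return replica
```

With `expire_on_load=True` the model ends with the master link blocked; with `expire_on_load=False` the pop succeeds and the link stays free.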

@pietern pietern Don't expire keys when loading an RDB after a SYNC
The cron is responsible for expiring keys. When keys are expired at
load time, it is possible that the snapshot of a master node gets
modified. This can in turn lead to inconsistencies in the data set.

A more concrete example of this behavior follows. A user reported a
slave that showed a monotonically increasing input buffer length
shortly after completing a SYNC. Also, `INFO` output showed a single
blocked client, which could only be the master link. Investigation
showed that the `BRPOP` command was indeed fed by the master. This
command can only end up in the stream of write operations when it did
NOT block, and effectively executed `RPOP`. However, when the key
involved in the `BRPOP` is expired BEFORE the command is executed, the
client executing it will block. The client, in this case, is the master
link.
aa794ac
Contributor
pietern commented Jan 14, 2012

/cc @xb95

Contributor
pietern commented Jan 14, 2012

I was not able to reproduce the issue because it depends on timing. However, this patch is very likely to fix it.

zorkian commented Jan 14, 2012

I have deployed this patch on our slave and it is now synced and replicating as expected. Thank you!

Owner
antirez commented Jan 16, 2012

This is a good one: the slave should never expire keys by itself. To reason about the changes introduced by this patch: now that the slave no longer expires keys at loading time, the master still accumulates the DELs it synthesizes when keys expire on its side, so those keys will be handled correctly. However, it is strange that the replication test with expires was never able to catch this bug.
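
The reasoning about synthesized DELs can be sketched as follows. This is a toy model in Python under stated assumptions, not how Redis implements it: `master_expire_cycle`, `replica_apply`, the key names, and the timestamps are hypothetical. The slave keeps every key it loaded, including ones whose expiry time has already passed, and removes them only when the master's DEL arrives.

```python
def master_expire_cycle(master_keys, now):
    """Expire keys on the master and return the DELs to propagate.

    master_keys maps key -> (value, expire_at_or_None).
    """
    stream = []
    for key in list(master_keys):
        _value, expire_at = master_keys[key]
        if expire_at is not None and expire_at <= now:
            del master_keys[key]
            stream.append(("DEL", key))  # synthesized for the slaves
    return stream


def replica_apply(replica_keys, stream):
    """Apply the master's write stream; the slave never expires on its own."""
    for command, key in stream:
        if command == "DEL":
            replica_keys.pop(key, None)


master = {"jobs": (["a"], 105), "cfg": ("x", None)}
replica = {"jobs": ["a"], "cfg": "x"}  # loaded as-is from the snapshot

stream = master_expire_cycle(master, now=110)
replica_apply(replica, stream)
```

In this model the slave's data set changes only in response to commands from the master, so it cannot diverge from the snapshot on its own.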

I'm merging, but I'm keeping this issue open so that I can merge it into 2.4 and in the hope of writing a regression test for it, or at least of modifying the current replication-with-expires test so that it is likely to catch the bug even 1 time in every 1000 runs (the CI will make it consistent).

Thank you.

@antirez antirez merged commit 58bfbd1 into antirez:2.4 Jan 16, 2012