Replication inconsistent issue #2694
Comments
I think in step 2 C should reset its backlog, so that D can only full sync with C.
Thanks for submitting. I think I found the cause of this issue and am working on a fix right now.
It will probably never be useful again, but since I wrote it, we can use it to better document the bug for the future. Here is the script to reproduce it easily:

#!/bin/bash
mkdir -p /tmp/a; rm -rf /tmp/a/*
mkdir -p /tmp/b; rm -rf /tmp/b/*
mkdir -p /tmp/c; rm -rf /tmp/c/*
mkdir -p /tmp/d; rm -rf /tmp/d/*
A=8888
B=8889
C=8810
D=8811
BIN=~/hack/redis/src/redis-server
$BIN --logfile /tmp/a/redis.log --port $A &
$BIN --logfile /tmp/b/redis.log --port $B &
$BIN --logfile /tmp/c/redis.log --port $C &
$BIN --logfile /tmp/d/redis.log --port $D &
sleep 2
redis-cli -p $A SLAVEOF NO ONE
redis-cli -p $B SLAVEOF NO ONE
redis-cli -p $C SLAVEOF NO ONE
redis-cli -p $D SLAVEOF NO ONE
redis-cli -p $A FLUSHALL
redis-cli -p $B FLUSHALL
redis-cli -p $C FLUSHALL
redis-cli -p $D FLUSHALL
# Setup A, B <- C <- D
redis-cli -p $C SLAVEOF 127.0.0.1 $B
redis-cli -p $D SLAVEOF 127.0.0.1 $C
# Write the two keys
redis-cli -p $A set a 1
redis-cli -p $B set a 2
sleep 2
# Setup the SLEEP & RECONNECT condition for D
redis-cli -p $D client list
(echo -e "multi\nclient kill id 5\ndebug sleep 5\nexec\n" | redis-cli -p $D) &
# Make B slave of A
sleep 1
redis-cli -p $B SLAVEOF 127.0.0.1 $A
redis-cli -p $A ping
redis-cli -p $B ping
redis-cli -p $C ping
redis-cli -p $D ping
# Fetch the value
sleep 6
echo "The following value should be 1 but is 2 because of the bug:"
redis-cli -p $D get a
# Kill servers
redis-cli -p $A SHUTDOWN NOSAVE
redis-cli -p $B SHUTDOWN NOSAVE
redis-cli -p $C SHUTDOWN NOSAVE
redis-cli -p $D SHUTDOWN NOSAVE
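
The script above hardcodes `client kill id 5`, which only works if D's link to its master happens to get client id 5. A minimal sketch of a more robust lookup follows, assuming the `CLIENT LIST` output marks the master link with an `M` in its `flags=` field and exposes an `id=` field. The helper name `master_client_id` is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: read CLIENT LIST output on stdin and print the id of
# the first client whose flags contain "M" (the connection to the master).
# Prints nothing if no such client is listed.
master_client_id() {
    awk '{
        id = ""; flags = ""
        # Each line is a space-separated list of key=value fields.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^id=/)    id = substr($i, 4)
            if ($i ~ /^flags=/) flags = substr($i, 7)
        }
        if (flags ~ /M/ && id != "") { print id; exit }
    }'
}

# Possible usage inside the reproduction script (D is the port variable
# defined there); shown as comments because it needs running servers:
#   ID=$(redis-cli -p $D client list | master_client_id)
#   (echo -e "multi\nclient kill id $ID\ndebug sleep 5\nexec\n" | redis-cli -p $D) &
```

With this lookup the SLEEP & RECONNECT step would not depend on the order in which clients connected to D.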
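I wrote a first patch, then realized that this bug is just a manifestation of a deeper problem. Redis replication code used to do two things:
1. When a slave lost the connection with its master, it disconnected its chained slaves ASAP, which is not needed.
2. After a failed PSYNC it did not reset the replication backlog, so a chained slave could still PSYNC successfully even though the instance had done a full sync with its master.
So I'm writing a different fix that only forces a full SYNC of the connected slaves once the slave has to full SYNC with its master.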
Using chained replication where C is slave of B which is in turn slave of A, if B reconnects the replication link with A but discovers it is no longer possible to PSYNC, slaves of B must be disconnected and PSYNC not allowed, since the new B dataset may be completely different after the synchronization with the master.

Note that there are various semantic differences in the way this is handled now compared to the past. In the past the semantics were:

1. When a slave lost the connection with its master, it disconnected the chained slaves ASAP. This is not needed, since after a successful PSYNC with the master the slaves can continue and don't need to resync in turn.
2. However, after a failed PSYNC the replication backlog was not reset, so a slave was able to PSYNC successfully even if the instance did a full sync with its master and now contained an entirely different data set.

Now, instead, chained slaves are not disconnected when the slave loses the connection with its master, but only when it is forced to full SYNC with its master. This means that if the slave having chained slaves does a successful PSYNC, all its slaves can continue without trouble.

See issue #2694 for more details.
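
The new rule in the commit message can be summarized as a small decision function. This is only an illustrative model, not Redis source code: the function name `chained_slaves_action` and the action labels are hypothetical, and the only protocol detail assumed is that the master answers PSYNC with either `+CONTINUE` or `+FULLRESYNC <runid> <offset>`.

```shell
#!/bin/bash
# Hypothetical model of what an intermediate slave (B) does with its own
# chained slaves, given the master's reply to its PSYNC attempt.
# $1: the reply line, e.g. "+CONTINUE" or "+FULLRESYNC <runid> <offset>"
chained_slaves_action() {
    local reply=${1#+}   # drop the leading "+" of the status reply, if any
    reply=${reply%% *}   # keep only the first word
    case "$reply" in
        # Partial resync succeeded: the dataset is a continuation of the old
        # one, so chained slaves keep their links and their offsets.
        CONTINUE)   echo "keep-slaves" ;;
        # Full resync: the dataset may be entirely different, so chained
        # slaves are disconnected and the backlog is reset, which prevents
        # them from doing a partial resync against stale history.
        FULLRESYNC) echo "disconnect-slaves reset-backlog" ;;
        *)          echo "unknown reply: $1" >&2; return 1 ;;
    esac
}

chained_slaves_action "+CONTINUE"
chained_slaves_action "+FULLRESYNC abc 100"
```

Under the old semantics described above, the chained slaves were dropped as soon as the link to the master was lost, before the PSYNC outcome was even known, and the backlog was kept in the full resync case.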
Redis versions 2.8 and 3.0.3
Expected result: the value of a is 1 in D
Actual result: the value of a is 2 in D