-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Total data loss on failover #14
Comments
Your observation is correct, the problem in this (and the previous) case is with applying the commands to Redis. A missing re-entrancy check causes the applied commands to be re-raftized (i.e. sent again to the Raft log instead of to the FSM). |
Confirmed--this looks fixed as of d589127 |
sjpotter
added a commit
to sjpotter/redisraft
that referenced
this issue
Oct 7, 2021
raft_get_last_log_term() is used for filling in vote requests, except it didn't match 100% with the logic in should_grant_vote * update test to prove that change works as needed previous test tested when we don't snapshot everything, but code was broken when we did. modified test tested only when we snapshot everything, now we do both in sequence. * use raft_get_last_log_term in __should_grant_vote() also modify __should_grant_vote to take a raft_server_t instead of raft_server_privat_t so its not constantly casting it to void.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
With Redis f88f866 and Redisraft 1b3fbf6, every failover results in the complete loss of all data in the cluster. This can occur both in healthy clusters due to sporadic leader elections, by pausing or killing nodes, network partitions, etc.
redis monitor
does show leader-executed commands on followers, but I'm not clear if that's being proxied from the Raft leader, replicated but not actually applied to the local state machine, applied but then destroyed somehow, or what.For example:
Here, a leader election occurred during our set of t. We retry on n2...
And lo and behold, all the keys we wrote before are gone.
$ ssh n2 /opt/redis/redis-cli keys "'*'" t
If we kill n2 as a leader, even that disappears.
$ ssh n2 killall -9 redis-server $ ssh n3 /opt/redis/redis-cli keys "'*'" __raft_snapshot_info__
The text was updated successfully, but these errors were encountered: