After a network split, a node can make write progress and end up with a diverged local seqno #401
Hi, I took a look at the logs and noticed that view IDs jump backwards in them. This suggests that the cluster may have been re-bootstrapped between test runs. Jepsen seems to start the cluster so that the primary node is given the --wsrep-new-cluster option, which in turn bootstraps a new cluster. If the nodes were shut down in the order n1, n2, ... n5 so that n1 was not up to date with n5 (n5 was the most advanced node in the cluster, n1 was in a non-primary state), then restarting them in the order n1, n2, ... n5 makes n5 detect data divergence when it tries to join the cluster. While this kind of data divergence is real, it is not the result of a replication protocol malfunction but rather of the way the cluster is managed (use of --wsrep-new-cluster) and of current limitations of Galera cluster management. I can't be certain from the logs that this is the case, but it is the most probable explanation. Also, the Galera provider version used in the tests is rather old: 3.8, while the most recent release is 3.16.
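To illustrate the point about restart order, here is a rough sketch of choosing the bootstrap node by saved state rather than by a fixed order. It assumes each node's grastate.dat has been collected and that the file carries the usual `uuid:` and `seqno:` fields. The function names, node names, UUID, and seqno values are made up for the example, and this is not the logic of Jepsen or of the OCF RA.

```python
from typing import Dict, NamedTuple, Optional


class SavedState(NamedTuple):
    uuid: str
    seqno: int


def parse_grastate(text: str) -> SavedState:
    """Extract the `uuid:` and `seqno:` fields from grastate.dat contents."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# GALERA saved state" header
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return SavedState(fields["uuid"], int(fields["seqno"]))


def pick_seed(states: Dict[str, SavedState]) -> Optional[str]:
    """Return the most advanced node, or None when no safe choice exists."""
    if not states:
        return None
    if len({s.uuid for s in states.values()}) != 1:
        return None  # nodes disagree on the cluster state UUID
    if any(s.seqno < 0 for s in states.values()):
        return None  # seqno -1: position unknown, recover it before deciding
    return max(states, key=lambda name: states[name].seqno)


# Hypothetical saved states: n1 stopped behind n5.
TEMPLATE = "# GALERA saved state\nversion: 2.1\nuuid: {uuid}\nseqno: {seqno}\n"
UUID = "11111111-2222-3333-4444-555555555555"  # made-up value
states = {
    "n1": parse_grastate(TEMPLATE.format(uuid=UUID, seqno=100)),
    "n5": parse_grastate(TEMPLATE.format(uuid=UUID, seqno=120)),
}
print(pick_seed(states))  # n5
```

In this sketch, only the node returned by `pick_seed` would be started with --wsrep-new-cluster. A `None` result means the saved states alone are not enough to decide, for example after an unclean shutdown that left seqno at -1.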
Note, I modified the Jepsen test to rely on external cluster bootstrapping, which is driven by a Pacemaker OCF RA. It searches for a seed node (the one that is started with --wsrep-new-cluster), choosing the one that has:
However, the Pacemaker cluster's DC node may NOT be aligned with the Galera PRIMARY. I mean, could the prim node belong to the minority partition? Do you think that was the case with the "ids jump", and do you see such a pattern in the logs? Do you think this issue has nothing to do with Galera itself and is specific to bootstrapping (or to a seed/DC worldview of clusters)? Which way of searching for a seed node do you recommend?
The details were given here: https://bugs.launchpad.net/codership-mysql/+bug/1583521