After a network split, a node can make a write progress and end-up with a diverged local seqno #401

Closed
bogdando opened this issue May 19, 2016 · 2 comments

Comments

@bogdando

The details are given here: https://bugs.launchpad.net/codership-mysql/+bug/1583521

@temeo
Contributor

temeo commented May 20, 2016

Hi, I took a look at the logs and noticed that the view IDs jump backwards:

2016-05-18 09:06:00 12954 [Note] WSREP: New cluster view: global state: e71ec250-1919-11e6-9f84-be5f61687f0f:92582, view# 99: Primary, number of nodes: 5, my index: 4, protocol version 3
...
2016-05-18 09:12:20 17875 [Note] WSREP: New cluster view: global state: e71ec250-1919-11e6-9f84-be5f61687f0f:92604, view# 2: Primary, number of nodes: 4, my index: 3, protocol version 3

This suggests that the cluster may have been rebootstrapped between test runs. Jepsen seems to start a cluster so that the primary node is given the --wsrep-new-cluster option, which in turn bootstraps a new cluster. If the nodes were shut down in the order n1, n2, ... n5 so that n1 was not up to date with n5 (n5 was the most advanced in the cluster, n1 was in non-primary state), restarting the nodes in the order n1, n2, ... n5 causes n5 to detect data divergence when it tries to join the cluster.
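As a rough illustration of the check described above, the following Python sketch scans "New cluster view" log lines and reports places where the view number drops between consecutive entries. The regular expression is written against the two log lines quoted in this comment only; the function names are hypothetical, and other Galera versions may format the line differently.

```python
import re

# Pattern derived from the two log lines quoted above (an assumption, not a
# documented Galera log format): state UUID, seqno, view number, status.
VIEW_RE = re.compile(
    r"New cluster view: global state: (?P<uuid>[0-9a-f-]+):(?P<seqno>-?\d+), "
    r"view# (?P<view>-?\d+): (?P<status>[\w-]+)"
)


def parse_views(lines):
    """Extract (uuid, seqno, view, status) tuples from matching log lines."""
    views = []
    for line in lines:
        m = VIEW_RE.search(line)
        if m:
            views.append(
                (m.group("uuid"), int(m.group("seqno")),
                 int(m.group("view")), m.group("status"))
            )
    return views


def view_id_regressions(views):
    """Return pairs of consecutive views where the view number decreased."""
    return [(a, b) for a, b in zip(views, views[1:]) if b[2] < a[2]]
```

A backwards jump in the view number while the state UUID stays the same is only a hint that a rebootstrap happened; it does not prove it.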

While this kind of data divergence is real, it is not the result of a replication protocol malfunction but rather of the way the cluster is managed (use of --wsrep-new-cluster) and of current limitations of Galera cluster management.

While I can't be certain from the logs that this is the case, it is the most probable explanation. Also, the Galera provider version used in the tests is rather old, 3.8, while the most recent release is 3.16.

@bogdando
Author

bogdando commented May 23, 2016

Note, I modified the Jepsen test to rely on external cluster bootstrapping, which is driven by a Pacemaker OCF RA. It searches for a seed node (the one that should be started with --wsrep-new-cluster) as the one that has:

  • the most frequently seen UUID across a visible network partition that has Pacemaker quorum (so it is not in the minority partition), then
  • the max value of SEQNO across those that have that UUID.
    Note that, by the test env layout, the Galera network partitions are always aligned (by nodes) with the Pacemaker ones. This means that Galera perhaps must handle connections made from clients to nodes without write quorum, so that writes never make progress on the minority partition nodes.
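The two selection rules above can be sketched in Python as follows. This is not the OCF RA code; `NodeState`, `pick_seed`, and the `cluster_size` parameter are hypothetical names used for illustration. Quorum is assumed to mean a strict majority of the full cluster, and tie handling between equally common UUIDs is not specified in this thread, so the sketch simply takes the first one encountered.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class NodeState(NamedTuple):
    """Hypothetical record of what one node reports about its saved state."""
    name: str
    uuid: str
    seqno: int


def pick_seed(visible: List[NodeState], cluster_size: int) -> Optional[NodeState]:
    """Pick a seed node among the nodes visible in this partition.

    Returns None when the partition lacks a strict majority (assumed
    quorum rule), so a minority partition never bootstraps.
    """
    if len(visible) * 2 <= cluster_size:
        return None
    # Rule 1: the UUID reported by the most visible nodes.
    # Ties go to the first UUID encountered (an assumption of this sketch).
    best_uuid, _ = Counter(n.uuid for n in visible).most_common(1)[0]
    # Rule 2: the highest SEQNO among the nodes that have that UUID.
    candidates = [n for n in visible if n.uuid == best_uuid]
    return max(candidates, key=lambda n: n.seqno)
```

For example, with three visible nodes out of five that share one UUID, the node with the highest SEQNO among them is chosen, even if a fourth visible node with a different UUID reports a larger SEQNO.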

However, the Pacemaker cluster's DC node may NOT be aligned with the Galera PRIMARY. I mean, could the primary node belong to the minority partition? Do you think that was the case with the "ids jump", and do you see such a pattern in the logs? Do you think this issue has nothing to do with Galera itself, but is specific to the bootstrap (or to the seed/DC worldview of the clusters)? Which way of searching for a seed node do you recommend?
