Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcs protocol violation with JOIN and SYNC messages #101

Open
temeo opened this issue Aug 28, 2014 · 6 comments
Open

gcs protocol violation with JOIN and SYNC messages #101

temeo opened this issue Aug 28, 2014 · 6 comments

Comments

@temeo
Copy link
Contributor

temeo commented Aug 28, 2014

Setup: Two nodes (db1, db2) and garbd. Node db1 was joined and got SST from db2. After SST was finished garbd logged:

WARN: Protocol violation. JOIN message sender 1.0 (db2) is not in state transfer (SYNCED). Message ignored.
WARN: Rejecting JOIN message from 2.0 (db1): new State Transfer required.
WARN: SYNC message sender from non-JOINED 2.0 (db1). Ignored.

but no such a messages were found from db2 log and db1 join was successful.

The issue is why garbd had a wrong view about db1 state.

@dirtysalt
Copy link
Contributor

Seems garbd has wrong view of db1 and db2 state.

Rejecting JOIN message from 2.0 (db1): new State Transfer required.

db1 should be in JOINER state, but garbd thinks db1 is in PRIM state.

JOIN message sender 1.0 (db2) is not in state transfer (SYNCED)

db2 should be in DONOR state, but garbd thinks db2 is in SYNCED state.

@ronin13
Copy link

ronin13 commented Oct 14, 2014

Noticed same without garbd in a recent test:

2014-10-08 21:43:58 21085 [Note] WSREP: Signalling provider to continue.
2014-10-08 21:43:58 21085 [Note] WSREP: inited wsrep sidno 1
2014-10-08 21:43:58 21085 [Note] WSREP: SST received: e8fb9b5a-f7d6-11e3-813d-376a55693bec:8277710506
2014-10-08 21:43:58 21085 [Note] WSREP: Receiving IST: 58780 writesets, seqnos 8277710506-8277769286
2014-10-08 21:43:58 21085 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.21-69.0-56'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel69.0, Revision 909, WSREP version 25.8, wsrep_25.8.r4126
2014-10-08 21:43:59 21085 [Note] WSREP: (9869c3e6, 'tcp://0.0.0.0:4567') turning message relay requesting off
2014-10-08 21:44:17 21085 [Note] WSREP: IST received: e8fb9b5a-f7d6-11e3-813d-376a55693bec:8277769286
2014-10-08 21:44:17 21085 [Note] WSREP: 0.0 (nuc4): State transfer from 1.0 (nuc3) complete.
2014-10-08 21:44:17 21085 [Note] WSREP: Shifting JOINER -> JOINED (TO: 8277780572)
2014-10-08 21:44:17 21085 [Warning] WSREP: Protocol violation. JOIN message sender 1.0 (nuc3) is not in state transfer (SYNCED). Message ignored.
2014-10-08 21:44:22 21085 [Note] WSREP: Member 0.0 (nuc4) synced with group.
2014-10-08 21:44:22 21085 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 8277783013)
2014-10-08 21:44:22 21085 [Note] WSREP: Synchronized with group, ready for connections
2014-10-08 21:44:22 21085 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

@dirtysalt
Copy link
Contributor

related to #106

@philip-galera
Copy link
Contributor

Saw this when the garbd test is run parallel to other workload:

galera_3nodes.galera_garbd               w3 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2015-03-31 03:19:14
line
2015-03-31 13:18:48 41508 [Warning] WSREP: Protocol violation. JOIN message sender 1.0 (localhost.localdomain) is not in state transfer (SYNCED). Message ignored.
^ Found warnings in /home/philips/git/mysql-wsrep-parallel/mysql-test/var/3/log/mysqld.1.err
ok
line
2015-03-31 13:18:48 41599 [Warning] WSREP: Protocol violation. JOIN message sender 1.0 (localhost.localdomain) is not in state transfer (SYNCED). Message ignored.
^ Found warnings in /home/philips/git/mysql-wsrep-parallel/mysql-test/var/3/log/mysqld.2.err
ok
line
2015-03-31 13:18:48 41929 [Warning] WSREP: Protocol violation. JOIN message sender 1.0 (localhost.localdomain) is not in state transfer (SYNCED). Message ignored.
^ Found warnings in /home/philips/git/mysql-wsrep-parallel/mysql-test/var/3/log/mysqld.3.err
ok

 - saving '/home/philips/git/mysql-wsrep-parallel/mysql-test/var/3/log/galera_3nodes.galera_garbd/' to '/home/philips/git/mysql-wsrep-parallel/mysql-test/var/log/galera_3nodes.galera_garbd/'

@ronin13
Copy link

ronin13 commented Mar 31, 2015

Yes, these protocol violations are still logged during tests on our end as well. Some of the logs attached to other bugs do have them.

Not all messages are identical though.

So, is this indicative of any bug or harmless always. Asking for latter because I tend to see this in conjunction with other bugs.

@ayurchen
Copy link
Member

This is inevitable as long as different stack layers have asynchronous state machines. So it will show under stress. This is logged as something to look for. But its significance depends on the actual situation. For example, above, there is a stray JOIN message from a node that's already SYNCED. So it may be a simple redundancy, which is nothing. Or that everybody erroneously thinks that it is SYNCED, while it is not - and then it is a serious issue. That message by itself is not a statement of what is going on. It is an additional info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants