gcs protocol violation with JOIN and SYNC messages #101
It seems garbd has the wrong view of the db1 and db2 states.
db1 should be in the JOINER state, but garbd thinks db1 is in the PRIM state.
db2 should be in the DONOR state, but garbd thinks db2 is in the SYNCED state.
Noticed the same without garbd in a recent test:
Related to #106.
Saw this when the garbd test was run in parallel with another workload:
Yes, these protocol violations are still logged during tests on our end as well, and some of the logs attached to other bugs contain them. Not all of the messages are identical, though. So, does this indicate a bug, or is it always harmless? I'm asking because I tend to see it in conjunction with other bugs.
This is inevitable as long as different stack layers have asynchronous state machines, so it will show up under stress. It is logged as something to look for, but its significance depends on the actual situation. For example, above there is a stray JOIN message from a node that is already SYNCED. That may be a simple redundancy, which is harmless. Or everybody may erroneously think the node is SYNCED while it is not, and then it is a serious issue. The message by itself is not a statement of what is going on; it is additional information.
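To make the point about asynchronous state machines concrete, here is a minimal Python sketch of one member's local view of node states (for example, garbd's). It assumes a simplified transition model: a JOIN message moves a JOINER or DONOR to JOINED, a SYNC message moves a JOINED node to SYNCED, and any message that does not fit the local view is logged as a protocol violation and ignored. `GroupView`, `TRANSITIONS` and the log text are hypothetical illustrations, not the Galera gcs implementation.

```python
from enum import Enum, auto


class NodeState(Enum):
    # Simplified node states (assumed model, not the real gcs enum)
    NON_PRIM = auto()
    PRIM = auto()
    JOINER = auto()
    DONOR = auto()
    JOINED = auto()
    SYNCED = auto()


class Msg(Enum):
    JOIN = auto()  # assumed: sent by joiner/donor when state transfer completes
    SYNC = auto()  # assumed: sent by a JOINED node once it has caught up


# Hypothetical transition table: (sender's state in local view, message) -> new state
TRANSITIONS = {
    (NodeState.JOINER, Msg.JOIN): NodeState.JOINED,
    (NodeState.DONOR, Msg.JOIN): NodeState.JOINED,
    (NodeState.JOINED, Msg.SYNC): NodeState.SYNCED,
}


class GroupView:
    """One member's local view of every node's state (hypothetical)."""

    def __init__(self, states):
        self.states = dict(states)
        self.violations = []

    def deliver(self, sender, msg):
        """Apply msg from sender; return False if it does not fit the local view."""
        current = self.states[sender]
        nxt = TRANSITIONS.get((current, msg))
        if nxt is None:
            # The message contradicts what this member believes: log and ignore it.
            self.violations.append(
                f"Protocol violation: {msg.name} from {sender} "
                f"in state {current.name}; ignored"
            )
            return False
        self.states[sender] = nxt
        return True


# Stale view, as reported in this issue: db1 seen as PRIM, db2 seen as SYNCED.
stale = GroupView({"db1": NodeState.PRIM, "db2": NodeState.SYNCED})
stale_results = [stale.deliver("db1", Msg.JOIN), stale.deliver("db2", Msg.JOIN)]

# Correct view: both JOINs are accepted, then one stray, redundant JOIN arrives.
good = GroupView({"db1": NodeState.JOINER, "db2": NodeState.DONOR})
good.deliver("db1", Msg.JOIN)
good.deliver("db2", Msg.JOIN)
good.deliver("db1", Msg.SYNC)
good.deliver("db1", Msg.JOIN)  # db1 is already SYNCED, so this is flagged

for line in stale.violations + good.violations:
    print(line)
```

In this model the same kind of log line appears whether the sender is simply repeating itself (the stray JOIN from an already SYNCED db1, where ignoring it loses nothing) or the observer's view is stale (garbd seeing db1 as PRIM), which is why the message alone does not tell you which case applies.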
Setup: two nodes (db1, db2) and garbd. Node db1 joined the cluster and received SST from db2. After the SST finished, garbd logged:
But no such messages were found in the db2 log, and db1's join was successful.
The question is why garbd had the wrong view of db1's state.