Leader controller loses all the callback handlers after leadership switch #394
Comments
Do I understand it right? I just want to get a clearer picture of the whole thing, since a number of race-condition issues have come up recently.
Yes.
This is not confirmed. It may be GC, a network issue, or a combination of both. The result is obvious, though.
Please refer to the fix. The previous design could not gracefully handle more than one leader node change event.
The first controller does lose leadership, but that is not what causes the problem. The issue always occurs within a single controller: if that controller has a leftover controller change event that is still unprocessed, it is certain to end up in this bad state.
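To make the "leftover event" scenario easier to picture, here is a minimal toy simulation. It is not Helix code and does not claim to mirror the actual implementation or the fix: `ToyController`, its `idempotent` flag, the handler names, and the "clean up first, register only on a transition" logic are all assumptions made for illustration. It only shows how a second, already-queued leader change event could leave a controller that is still the leader with no handlers.

```python
from collections import deque


class ToyController:
    """Hypothetical stand-in for a controller; not Helix code."""

    def __init__(self, name, idempotent):
        self.name = name
        # False = fragile variant, True = tolerant variant (both are assumptions).
        self.idempotent = idempotent
        self.marked_leader = False
        self.handlers = []
        self.pending = deque()  # queued leader change events

    def _register_handlers(self):
        # Placeholder handler names, chosen only for the example.
        self.handlers = ["idealstate", "liveinstance", "message"]

    def process_pending(self, current_leader):
        while self.pending:
            self.pending.popleft()
            if self.idempotent:
                # Tolerant variant: rebuild everything from the current leader,
                # so processing the same kind of event twice is harmless.
                self.marked_leader = current_leader == self.name
                self.handlers = []
                if self.marked_leader:
                    self._register_handlers()
                continue
            # Fragile variant: always clean up, but only register handlers
            # on a non-leader -> leader transition.
            self.handlers = []
            if current_leader == self.name and not self.marked_leader:
                self.marked_leader = True
                self._register_handlers()
            elif current_leader != self.name:
                self.marked_leader = False


def simulate(idempotent):
    controller = ToyController("controller-1", idempotent)
    # Two leader change events queue up before either one is processed,
    # for example after back-to-back session expirations.
    controller.pending.extend(["leader-changed", "leader-changed"])
    controller.process_pending(current_leader="controller-1")
    return controller.marked_leader, len(controller.handlers)
```

In this toy model, `simulate(False)` ends with the controller marked as leader but holding zero handlers, while `simulate(True)` ends with all three placeholder handlers registered. The real root cause and fix should be read from the linked change itself.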
We found that the leader controller may lose all of its callback handlers after a leadership switch.
To reproduce the issue, the cluster must be running in leader election mode (DistributedLeaderElection). Frequent leadership switches caused by ZK session expiration may then trigger the problem.
The symptom is that, although a leader controller exists, it does not process any ZK notifications, so the cluster is effectively unmanaged.