
LeaderLatch reporting multiple leaders #216

Closed
coryb opened this Issue Dec 4, 2012 · 1 comment


@coryb
Netflix, Inc. member
coryb commented Dec 4, 2012

We had a period where 3 of 12 instances in a cluster were all reporting true for "hasLeadership". The problem persisted for at least 10 minutes (from when I noticed it until I restarted the cluster). Terminating a single leader instance just moved leadership to another instance, bringing the count back to 3. Restarting all instances resolved the problem.

Jordan suggests that "LeaderLatch code isn't good about clearing the internal leader state when there are connection problems".

Please look into it. Thanks,
-Cory

@Randgalt

I was never able to write a test that reproduces this. However, I can think of several edge cases that might cause it. In the end, I rewrote LeaderLatch to better handle connection/server instability. At the same time, I made most of the calls asynchronous, which will help concurrency and performance.

This will be in the next release.
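Below is a minimal, self-contained Java sketch of the failure mode Jordan describes. It rests on stated assumptions: `LatchStateSketch`, `SketchLatch`, and the `ConnectionState` enum are hypothetical stand-ins, not Curator's LeaderLatch API and not the actual rewrite. In the sketch, an instance sets a leadership flag when elected. If it never clears that flag after a suspended or lost connection, it keeps reporting true even though its session (and its ephemeral node) may be gone and another instance may have been elected. That would be consistent with several instances reporting leadership at once.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model for illustration only; this is NOT Curator's LeaderLatch.
public class LatchStateSketch {
    // Hypothetical connection events, loosely modeled on what a ZooKeeper
    // client reports. These are not Curator's classes.
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    static class SketchLatch {
        private final AtomicBoolean hasLeadership = new AtomicBoolean(false);
        // false models the suspected bug; true models clearing state on connection problems
        private final boolean clearOnConnectionProblem;

        SketchLatch(boolean clearOnConnectionProblem) {
            this.clearOnConnectionProblem = clearOnConnectionProblem;
        }

        // Hypothetical hook: called when this instance wins the election.
        void becameLeader() {
            hasLeadership.set(true);
        }

        // Hypothetical hook: called on every connection state change.
        void stateChanged(ConnectionState newState) {
            boolean connectionProblem =
                newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST;
            if (clearOnConnectionProblem && connectionProblem) {
                // The session may have expired and another instance may already
                // be leader, so stop claiming leadership until re-elected.
                hasLeadership.set(false);
            }
        }

        boolean hasLeadership() {
            return hasLeadership.get();
        }
    }

    public static void main(String[] args) {
        SketchLatch buggy = new SketchLatch(false);
        SketchLatch fixed = new SketchLatch(true);
        buggy.becameLeader();
        fixed.becameLeader();
        buggy.stateChanged(ConnectionState.LOST);
        fixed.stateChanged(ConnectionState.LOST);
        System.out.println("buggy: " + buggy.hasLeadership()); // buggy: true
        System.out.println("fixed: " + fixed.hasLeadership()); // fixed: false
    }
}
```

The sketch clears the flag on SUSPENDED as well as LOST as a deliberately conservative choice. While the connection is suspended, an instance cannot tell whether its session survived, so reporting no leadership is safer than reporting leadership that may be stale.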

Randgalt closed this Dec 16, 2012