ZK connectivity failure with multiple watchers leads to permanent failure #92

Closed
minkovich opened this issue Apr 18, 2017 · 3 comments

@minkovich

Setup:
Six nerve service watchers on the same instance, all connected to the same ZK pool

How to reproduce:

  1. The instance loses connectivity to the ZK pool
  2. Nerve logs:
    Nerve::Nerve: nerve: watcher service1 not alive; reaping and relaunching
    Nerve::ServiceWatcher: nerve: stopping service watch service1
    Nerve::Nerve: nerve: could not reap service1, got #<Zookeeper::Exceptions::NotConnected: Zookeeper::Exceptions::NotConnected>
  3. This continues in a loop for each service watcher until nerve is restarted.

Actual problem:
In start() in zookeeper.rb there is no check that the ZK connection is still alive before re-using it.
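
A minimal sketch of the kind of check being described, assuming the reporter caches a `zk` gem connection per pool; the module and method names below are hypothetical, not Nerve's actual internals:

```ruby
require 'zk'

# Hypothetical connection cache (not Nerve's actual code): before handing a
# cached ZooKeeper handle back to a watcher, verify it is still connected;
# otherwise discard it and open a fresh connection instead of reusing the
# dead one.
module ZkConnectionCache
  @connections = {}

  def self.connection_for(hosts)
    conn = @connections[hosts]

    if conn && !conn.connected?
      # Drop the dead handle; ignore errors raised while closing it.
      begin
        conn.close!
      rescue StandardError
        nil
      end
      conn = nil
    end

    conn ||= ZK.new(hosts)
    @connections[hosts] = conn
  end
end
```

With a check like this, a watcher being reaped and relaunched would get a fresh connection rather than the cached handle that is already raising NotConnected.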

@jolynch
Collaborator

jolynch commented Apr 19, 2017

@minkovich what is your desired behavior here? I suppose that we would like it if Nerve threw out the bad cached connection and tried again?

If the cluster is just not reachable, this would lead to a similar infinite retry loop, but perhaps crash-recover is sufficient here?
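
For comparison, a minimal sketch of the crash-recover option mentioned above, assuming nerve runs under a process supervisor (runit, systemd, etc.) that restarts it on exit; the helper names below are illustrative, not Nerve's actual internals:

```ruby
require 'zookeeper'

MAX_REAP_ATTEMPTS = 5

# Hypothetical: instead of looping forever on NotConnected, give up after a
# bounded number of attempts and exit so the supervisor restarts nerve with
# fresh ZooKeeper connections. reap_watcher is an assumed helper, not Nerve's
# actual method.
def reap_watcher_or_crash(name)
  attempts = 0
  begin
    reap_watcher(name) # assumed to raise Zookeeper::Exceptions::NotConnected when ZK is unreachable
  rescue Zookeeper::Exceptions::NotConnected => e
    attempts += 1
    retry if attempts < MAX_REAP_ATTEMPTS
    warn "nerve: giving up on #{name} after #{attempts} failed reap attempts: #{e}"
    exit 1
  end
end
```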

@minkovich
Author

@jolynch The cleaner solution would be for nerve to throw away the bad connection, but honestly in this situation crash-recovery would be just as good, since connectivity was already lost.

@panchr
Contributor

panchr commented Feb 13, 2020

Closed because this was fixed in #113.

panchr closed this as completed on Feb 13, 2020