Nodes won't die if zookeeper went away. #2016

KenjiTakahashi · 2015-11-27T02:02:07Z

Scenario:

1. Everything is up and running.
2. Zookeeper goes away (for whatever reason).
3. Nodes start spamming:

2015-11-27T01:44:08,767 INFO [Curator-Framework-0-SendThread(monitowl-dev:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server monitowl-dev/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-11-27T01:44:08,767 WARN [Curator-Framework-0-SendThread(monitowl-dev:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_66]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_66]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [zookeeper-3.4.6.jar:3.4.6-1569965]

(Which is "fine", I guess.)
4. I do kill -TERM <node_pid>.
5. Node says:

2015-11-27T01:44:09,446 INFO [Thread-42] com.metamx.common.lifecycle.Lifecycle - Running shutdown hook
2015-11-27 01:44:09,464 FATAL Unable to register shutdown hook because JVM is shutting down.

(Which seems to be normal, they always say that before dying.)
6. But... it does not die. The process is still shown as running (in htop, etc.), with no further logs or anything.

Using 0.8.2, happens for all kinds of nodes AFAICT.

Possibly worth noting: when I deliberately start a node while zookeeper is down, it exits fine on SIGTERM.

I probably wouldn't even have noticed it, but this confuses our systemd configs quite a bit :-/.
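
For illustration only, here is a minimal sketch of one way a JVM process can bound how long its shutdown work may take, so that a cleanup step stuck on an unreachable dependency cannot keep the process alive indefinitely after SIGTERM. This is not Druid's code and not a confirmed fix for this issue: `BoundedShutdown`, `runWithTimeout` and `install` are hypothetical names, and the sketch assumes the hang comes from cleanup work that the application itself registers.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Druid or Curator.
final class BoundedShutdown {

    /** Runs cleanup on a daemon thread; returns true if it ended within timeoutMillis. */
    static boolean runWithTimeout(Runnable cleanup, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-shutdown");
            t.setDaemon(true); // a stuck cleanup thread must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(cleanup);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck cleanup
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return true; // cleanup ended (with an error), so it is not hanging
        } finally {
            executor.shutdownNow();
        }
    }

    /** Registers a shutdown hook that halts the JVM if cleanup overruns its budget. */
    static void install(Runnable cleanup, long timeoutMillis) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            if (!runWithTimeout(cleanup, timeoutMillis)) {
                // halt() skips remaining hooks and finalization; last resort only.
                Runtime.getRuntime().halt(1);
            }
        }, "bounded-shutdown-hook"));
    }
}
```

A caller would wrap its own stop logic, for example `BoundedShutdown.install(() -> stopEverything(), 30_000)`, where `stopEverything` stands in for whatever the application does on shutdown. The trade-off is that `Runtime.halt` terminates the process without running the remaining hooks, so the timeout should be generous.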


gianm · 2015-11-27T20:45:40Z

I believe this is the same issue as #660.

KenjiTakahashi · 2015-12-17T01:26:27Z

I think you're right, although in my case it took ~40 minutes for the nodes to go down after receiving SIGTERM. Anyway, any chance of this being resolved anytime soon?
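
To find out what a node is waiting on during such a delay, a thread dump taken while the process is stuck is usually the most direct evidence (for example with the JDK's `jstack <node_pid>`). The same information can be collected from inside the JVM; the following is a minimal sketch using only the standard `Thread.getAllStackTraces()` API. `ThreadDump` is a hypothetical helper name, not something that exists in Druid.

```java
import java.util.Map;

// Hypothetical diagnostic helper, not part of Druid.
final class ThreadDump {

    /** Returns name, daemon flag, state and stack frames of every live thread in this JVM. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" daemon=").append(t.isDaemon())
              .append(" state=").append(t.getState())
              .append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Printing `ThreadDump.dump()` to stderr from a shutdown hook (or shortly before forcing an exit) would show which thread is blocked and in which call, which is the detail missing from the logs above.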

drcrallen · 2015-12-17T01:28:32Z

Absence of logging is probably because of #1387 not being in stable yet.

stale · 2019-06-21T17:13:35Z

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale · 2019-07-05T18:10:07Z

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

leventov added the Area - ZooKeeper/Curator label Jun 22, 2017

stale bot added the stale label Jun 21, 2019

stale bot closed this as completed Jul 5, 2019
