Nodes won't die if zookeeper went away. #2016

Closed
KenjiTakahashi opened this issue Nov 27, 2015 · 5 comments

@KenjiTakahashi
Contributor

Scenario:

  1. Everything is up and running.
  2. Zookeeper goes away (for whatever reason).
  3. Nodes start spamming:

     2015-11-27T01:44:08,767 INFO [Curator-Framework-0-SendThread(monitowl-dev:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server monitowl-dev/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
     2015-11-27T01:44:08,767 WARN [Curator-Framework-0-SendThread(monitowl-dev:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
     java.net.ConnectException: Connection refused
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_66]
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_66]
         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [zookeeper-3.4.6.jar:3.4.6-1569965]

     (Which is "fine", I guess.)
  4. I do kill -TERM <node_pid>.
  5. Node says:

     2015-11-27T01:44:09,446 INFO [Thread-42] com.metamx.common.lifecycle.Lifecycle - Running shutdown hook
     2015-11-27 01:44:09,464 FATAL Unable to register shutdown hook because JVM is shutting down.

     (Which seems to be normal, they always say that before dying.)
  6. But... it does not die. The process is still shown as running (in htop, etc.), but there are no further logs or anything.

This is on 0.8.2 and happens for all kinds of nodes, AFAICT.

Possibly worth noting: if I deliberately start a node while Zookeeper is down, it exits fine on SIGTERM.

I probably wouldn't even have noticed it, but this confuses our systemd configs quite a bit :-/.
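
For anyone else reading: the JVM only exits on SIGTERM once every shutdown hook has returned, so a hook that blocks (for example, while a client keeps retrying a dead connection) keeps the process alive. Below is a minimal sketch of bounding a stop action with a timeout and halting if it overruns. It is not Druid's lifecycle code; `BoundedShutdown`, `runWithTimeout`, `installHook`, and the timeout value are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Druid or Curator.
class BoundedShutdown {

    // Runs stopAction on a daemon thread and waits at most timeoutMillis.
    // Returns true if the action finished in time, false otherwise.
    static boolean runWithTimeout(Runnable stopAction, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                stopAction.run();
            } finally {
                done.countDown();
            }
        }, "bounded-stop");
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Registers a shutdown hook that forcibly halts the JVM if stopAction
    // does not finish within timeoutMillis. Runtime.halt skips any remaining
    // hooks, so it is a last resort.
    static Thread installHook(Runnable stopAction, long timeoutMillis) {
        Thread hook = new Thread(() -> {
            if (!runWithTimeout(stopAction, timeoutMillis)) {
                System.err.println("Stop action exceeded " + timeoutMillis + " ms, halting JVM");
                Runtime.getRuntime().halt(1);
            }
        }, "bounded-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Something like `BoundedShutdown.installHook(() -> service.stop(), 30_000)` would be the intended usage, with `service.stop()` standing in for whatever the node's real stop routine is. Halting skips the remaining cleanup, so whether that trade-off is acceptable depends on the node type.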

@gianm
Contributor

gianm commented Nov 27, 2015

I believe this is the same issue as #660.

@KenjiTakahashi
Contributor Author

I think you're right, although in my case it took ~40 minutes for the nodes to go down after receiving SIGTERM. Anyway, is there any chance of this being resolved anytime soon?

@drcrallen
Contributor

The absence of logging is probably because #1387 is not in a stable release yet.

@stale

stale bot commented Jun 21, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 21, 2019
@stale

stale bot commented Jul 5, 2019

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Jul 5, 2019