stop java process results in unclean shutdown with EXIT 143 or 130 depending on signal #1378

Closed
bjozet opened this issue May 21, 2015 · 10 comments
@bjozet
Contributor

bjozet commented May 21, 2015

When I try to stop my processes, they exit with status 143 or 130, depending on the signal I send (SIGTERM/SIGINT/SIGTTIN etc.). I would like a clean shutdown with exit status 0.
I'm running Druid 0.7.1.1 (from the binary package) as a non-privileged user on Ubuntu 12.04.5 LTS.

My Java is:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

The command line I use to start the process is:

$ java -cp /etc/druid/broker:/opt/druid/current/lib/* -server -Xmx4G -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/opt/druid/tmpdir -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager io.druid.cli.Main server broker

When killing it, I get:

INFO [Thread-43] com.metamx.common.lifecycle.Lifecycle - Running shutdown hook
FATAL Unable to register shutdown hook because JVM is shutting down.

I've pasted the full TRACE-level log here:
http://paste.ubuntu.com/11241783/

An exit status of 0 would be nice!

@gianm
Contributor

gianm commented May 22, 2015

I actually think it makes sense that they exit with those codes, since it lets you know what signal was used to kill them. That's a pretty standard thing that Unix processes do. Can you please explain a bit what your setup looks like and why an exit status of 0 would help you out?
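
For reference, the convention at work here is that a process terminated by signal N is reported with status 128 + N, so SIGTERM (15) shows up as 143 and SIGINT (2) as 130. A minimal Java sketch of decoding such a status is below; the class and method names are hypothetical and not part of Druid.

```java
/** Hypothetical helper (not part of Druid): decodes a Unix-style exit status. */
public final class ExitStatusDecoder {
    private ExitStatusDecoder() {}

    /** Returns the signal number encoded in the status, or -1 if none is encoded. */
    public static int signalOf(int exitStatus) {
        // By convention, 128 + N means "terminated by signal N".
        return (exitStatus > 128 && exitStatus < 256) ? exitStatus - 128 : -1;
    }

    /** Produces a short human-readable description of the status. */
    public static String describe(int exitStatus) {
        if (exitStatus == 0) {
            return "clean exit";
        }
        int signal = signalOf(exitStatus);
        if (signal == 15) {
            return "terminated by SIGTERM";
        }
        if (signal == 2) {
            return "interrupted by SIGINT";
        }
        if (signal > 0) {
            return "terminated by signal " + signal;
        }
        return "failed with status " + exitStatus;
    }

    public static void main(String[] args) {
        System.out.println(describe(143));
        System.out.println(describe(130));
    }
}
```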

@bjozet
Contributor Author

bjozet commented May 22, 2015

A non-zero exit status signifies an abnormal program termination. See references.

http://en.wikipedia.org/wiki/Exit_status
https://www.gnu.org/software/libc/manual/html_node/Exit-Status.html
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#exit(int)

An exit status of 0, which signifies success, will help my start/stop/monitoring scripts determine whether the process was shut down successfully as a result of an administrative action, or whether it failed during execution and should be respawned, trigger alerts, etc.

@drcrallen
Contributor

I fixed the shutdown FATAL message in #1387

I'm in agreement with @gianm that if the process terminates due to a signal, a status of 0 is improper, and the return code should make it possible to tell which signal was used to terminate the task.

Adding a way to cleanly shut down the server without relying on shutdown hooks is a reasonable request, though.

@gianm
Contributor

gianm commented May 22, 2015

I suppose I can see both arguments making sense. @bjozet, OpenSSH at least appears to agree with you. But I don't see an easy way to influence the exit status of a Java program that got SIGTERM'ed. You can register shutdown hooks to clean stuff up, but you can't call System.exit or Runtime.exit from within those hooks.
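
As a rough illustration of the hook mechanism, here is a minimal sketch using `Runtime.addShutdownHook`. The class name and the cleanup action are assumptions for illustration, not Druid's actual lifecycle code. The hook runs when the JVM receives SIGTERM or SIGINT (and also on a normal exit), but it does not change the status the JVM reports afterwards.

```java
/** Hypothetical sketch (not Druid's lifecycle code): registers a cleanup hook. */
public final class ShutdownHookSketch {
    private ShutdownHookSketch() {}

    /** Registers the cleanup action as a JVM shutdown hook and returns the hook thread. */
    public static Thread install(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "cleanup-hook");
        // addShutdownHook throws IllegalStateException if the JVM is
        // already in the process of shutting down.
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        install(() -> System.out.println("Running shutdown hook"));
        // A real server would keep running here; the hook fires when the
        // JVM begins shutting down, whether by signal or by normal exit.
        System.out.println("Hook installed");
    }
}
```

Returning the `Thread` lets a caller de-register the hook later with `Runtime.removeShutdownHook` if the cleanup has already been done by other means.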

Would it work to have your stop script touch a 'down' file so your monitoring script knows that the service is supposed to be down and isn't upset when it exits?
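
On the monitoring side, the 'down' file idea could look roughly like the sketch below. Everything here is an assumption for illustration: the marker path, the class name, and the respawn policy are hypothetical, and the accepted statuses (0, 143, 130) are simply the ones mentioned in this thread.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical monitor logic (not part of Druid) for the 'down' file approach. */
public final class DownFileMonitor {
    private DownFileMonitor() {}

    /** Decides whether a service that just exited should be restarted. */
    public static boolean shouldRespawn(int exitStatus, Path downFile) {
        // The stop script is assumed to create this marker before sending the signal.
        boolean stoppedOnPurpose = Files.exists(downFile);
        // 0 = clean exit, 143 = SIGTERM, 130 = SIGINT.
        boolean expectedStatus = exitStatus == 0 || exitStatus == 143 || exitStatus == 130;
        // Respawn unless an operator asked for the stop and the status matches.
        return !(stoppedOnPurpose && expectedStatus);
    }

    public static void main(String[] args) {
        // Hypothetical marker location; adjust to the local layout.
        Path downFile = Paths.get("/var/run/druid/broker.down");
        System.out.println("respawn? " + shouldRespawn(143, downFile));
    }
}
```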

@bjozet
Contributor Author

bjozet commented May 23, 2015

I suspect I might be asking the wrong questions, making false assumptions, and possibly missing something in the docs despite reading them more than once.

What I want is a way to stop the Druid processes gracefully, with a zero exit status.

I was under the impression that SIGTERM was the proper way to do it for Druid, and that it would run the shutdown hooks accordingly. If there is another way, for example using a start/stop jar in the same manner as Tomcat/Catalina does, that's totally fine too! Sure, I can work around issues such as these in various ways, including touching 'down' files here and there, but that also adds a layer of abstraction to the process logic (from an operations point of view).

Is there a proper way of stopping Druid cleanly that I've totally overlooked?

@drcrallen
Contributor

@bjozet: There is currently no way to cleanly shut down a general Druid JVM and get an exit status of 0. In general, the nodes are intended to never go down, and as such, having to shut one down is not a normal course of action (Peons excluded).

I have some experimental Mesos work going on, and I check for the return status of 143, as you have pointed out, to verify that the task was killed.

I can see two kinds of shutdowns requested:

  1. Shutdown now and forever. Meaning the instance never intends to come back up and should ensure that the items it is responsible for retain high availability before it shuts down.
  2. Restart. Meaning the instance expects to come back up within a few seconds of being shut down (due to an upgrade or simply a reset of the JVM/heap).

@drcrallen
Contributor

Note: #1521 adds the ability to shut down Peons via an HTTP endpoint.

@drcrallen
Contributor

The PR in question has been closed down until a proper replacement for Remote Task Runner is implemented.

MrAlias pushed a commit to MrAlias/druid that referenced this issue Sep 1, 2015
Accept the return of exit code 143 and 130 as these are the default
return codes for the standard way of stopping druid (either with a
SIGKILL, SIGINT, or SIGTERM).

This is intended to be a temporary fix until
[this](apache/druid#1378) bug is resolved.
@stale

stale bot commented Jun 21, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 21, 2019
@stale

stale bot commented Jul 5, 2019

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Jul 5, 2019