Do not kill process on service shutdown #12298

tlrx · 2015-07-16T19:44:38Z

When Elasticsearch is installed as a service from a DEB or RPM package, the service scripts should wait patiently for it to stop (flushing indices on close can take some time) and never kill the process.

Closes #11248
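
For illustration only, here is a minimal sketch of the wait-instead-of-kill behavior described above. It is not the code from this PR: the function names `wait_for_stop` and `graceful_stop` are hypothetical, and the timeout argument merely stands in for a configured value such as `packaging.elasticsearch.stopping.timeout`. How the real init scripts consume that property is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not taken from the Elasticsearch packaging scripts).
# POSIX sh has no "local", so the variable names are chosen to avoid clashes.

# wait_for_stop PID TIMEOUT_SECONDS
# Polls until the process is gone. Returns 0 if it exited within the
# timeout, 1 if it is still running. It never sends a signal itself.
wait_for_stop() {
    target_pid="$1"
    limit="$2"
    waited=0
    # "kill -0" delivers no signal; it only checks that the process exists.
    while kill -0 "$target_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# graceful_stop PID TIMEOUT_SECONDS
# Sends SIGTERM once, then waits. SIGKILL is deliberately never used.
graceful_stop() {
    # If the signal cannot be delivered, assume the process is already gone.
    kill -TERM "$1" 2>/dev/null || return 0
    wait_for_stop "$1" "$2"
}
```

A caller could run something like `graceful_stop "$pid" 86400` and report a failure when it returns non-zero, leaving any decision to force-kill to the operator.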

tlrx · 2015-07-16T19:47:15Z

@costin the Windows service manager accepts an ES_STOP_TIMEOUT variable, which is set to 0. I can't find any documentation about this. I guess procrun is used, but I'm not sure how the process is terminated on Windows. Do you have any clue or pointer? Thanks.

costin · 2015-07-16T20:16:01Z

From the [commons-daemon docs](http://commons.apache.org/proper/commons-daemon/procrun.html):

--StopTimeout (default: No Timeout): Defines the timeout in seconds that procrun waits for service to exit gracefully.

tlrx · 2015-07-16T20:19:31Z

@costin I wrongly assumed that 0 had a specific meaning, distinct from no timeout at all. So nothing needs to be done in the Windows scripts.

Thanks for your quick response.

nik9000 · 2015-08-03T14:01:06Z

distribution/rpm/src/main/packaging/packaging.properties

@@ -14,3 +14,6 @@ packaging.type=rpm
 # Custom header for package scripts
 packaging.scripts.header=
 packaging.scripts.footer=# Built for ${project.name}-${project.version} (${packaging.type})
+
+# Maximum time to wait for elasticsearch to stop (default to 1 day)
+packaging.elasticsearch.stopping.timeout=86400


Complains about no newline at end of file.

nik9000 · 2015-08-03T14:09:47Z

So I'm right there with you that kill -9 is going to cause trouble. But maybe we should have some more documentation on what to do if the process doesn't die? When you hit a bug that eats all the memory, nothing is going to stop Elasticsearch but kill -9. Maybe a --force option or something. It's just that I've relied on this kill -9 behavior in the past when some nodes have filled their heap.

This is a partial backport of elastic#12298. This fixes an issue that rpms could not be upgraded, because of a bad number check in the postrm script, which exits with a failure. Closes elastic#12606 Closes elastic#12630

clintongormley · 2015-08-05T12:27:23Z

I'm OK with this going in without a force option. What does any sysadmin do if any process won't stop? They kill -9 it. I don't think we need to document everything here. I'd rather get the fix in.

tlrx · 2015-08-07T08:37:36Z

@nik9000 I tend to agree with @clintongormley on this. I rebased the code, can I push this?

nik9000 · 2015-08-07T11:39:31Z

Sure

tlrx · 2015-08-10T08:10:37Z

@nik9000 thanks!

sathieu · 2017-09-15T08:59:16Z

I had a problem with this setting blocking a server shutdown. Maybe change this to 10 minutes or so?

(NB: the problem was coming from a snapshot directory on an unavailable NAS)

tlrx · 2017-09-15T09:12:43Z

> Maybe change this to 10 minutes or so?

We can't do that: the scripts must wait for Elasticsearch to stop nicely, and that can sometimes take more than 10 minutes. The good thing is that it gives you time to investigate the issue and take the appropriate action (here, kill the node manually?). Doing this automatically in the scripts would have obscured the underlying issue.

Also, if you have logs and a reproducing scenario, it might be worth creating a new issue for the snapshot vs. unavailable NAS problem.

sathieu · 2017-09-15T09:20:38Z

The problem is that it will prevent the shutdown, but I can't investigate anything because sshd is stopped and I can't even get a console (the ttys are already closed).

tlrx added the v2.0.0-beta1, review, and :Delivery/Packaging labels Jul 16, 2015

clintongormley added the >enhancement label Jul 17, 2015

tlrx force-pushed the do-not-kill-process branch from 1e12523 to 2f1ca0f Compare August 3, 2015 12:13

nik9000 reviewed Aug 3, 2015

tlrx mentioned this pull request Aug 4, 2015

1.7.1 RPM warns "post remove script called with unknown argument" #12606

Closed

spinscale mentioned this pull request Aug 4, 2015

Fix upgrade RPM script #12630

Merged

tlrx force-pushed the do-not-kill-process branch from 2f1ca0f to 0f99b18 Compare August 7, 2015 08:37

Do not kill process on service shutdown

b1fd0a6

When installed as a service with a DEB or RPM package, we should gently wait for elasticsearch to stop (flushing indices on closing can take some time) and never kill the process. Closes elastic#11248

tlrx force-pushed the do-not-kill-process branch from 0f99b18 to b1fd0a6 Compare August 10, 2015 08:05

tlrx merged commit b1fd0a6 into elastic:master Aug 10, 2015

tlrx removed the review label Aug 10, 2015

tlrx deleted the do-not-kill-process branch August 10, 2015 08:07

clintongormley mentioned this pull request Sep 19, 2015

Upgrading 1.7.1 to 1.7.2 with RPM causes postun script failed. #13565

Closed

jasontedor mentioned this pull request Jan 21, 2016

/etc/init.d/elasticsearch stop sleeps for 86400 seconds #15520

Closed

mark-vieira added the Team:Delivery label Nov 11, 2020
