Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not kill process on service shutdown #12298

Merged
merged 1 commit into from Aug 10, 2015
Merged

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Jul 16, 2015

When installed as a service with a DEB or RPM package, we should gently wait for elasticsearch to stop (flushing indices on closing can take some time) and never kill the process.

Closes #11248

@tlrx tlrx added v2.0.0-beta1 review :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts labels Jul 16, 2015
@tlrx
Copy link
Member Author

tlrx commented Jul 16, 2015

@costin the Windows service manager accepts a ES_STOP_TIMEOUT variable which is set to 0. I don't really find any documentation about this. I guess procrun is used but I'm not sure of how the process is terminated on Windows. Do you have any clue or pointer? Thanks.

@costin
Copy link
Member

costin commented Jul 16, 2015

From the (commons-daemon docs](
http://commons.apache.org/proper/commons-daemon/procrun.html):

--StopTimeoutNo TimeoutDefines the timeout in seconds that procrun waits
for service to exit gracefully.

On Thu, Jul 16, 2015 at 10:48 PM, Tanguy Leroux notifications@github.com
wrote:

@costin https://github.com/costin the Windows service manager accepts a
ES_STOP_TIMEOUT variable which is set to 0. I don't really find any
documentation about this. I guess procrun is used but I'm not sure of how
the process is terminated on Windows. Do you have any clue or pointer?
Thanks.


Reply to this email directly or view it on GitHub
#12298 (comment)
.

@tlrx
Copy link
Member Author

tlrx commented Jul 16, 2015

@costin I supposed - wrongly - that 0 had a specific signification compared with no timeout at all. So it means that nothing need to be done in windows scripts.

Thanks for your quick response.

@@ -14,3 +14,6 @@ packaging.type=rpm
# Custom header for package scripts
packaging.scripts.header=
packaging.scripts.footer=# Built for ${project.name}-${project.version} (${packaging.type})

# Maximum time to wait for elasticsearch to stop (default to 1 day)
packaging.elasticsearch.stopping.timeout=86400
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complains about no newline at end of file.

@nik9000
Copy link
Member

nik9000 commented Aug 3, 2015

So I'm right there with you that kill -9 is going to cause trouble. But maybe we should have some more documentation on what to do if the process doesn't die? When you hit a bug that eats all the memory nothing is going to stop elasticsearch but kill -9. Maybe a --force option or something. Its just that I've relied on this kill -9 behavior in the past when some nodes have filled their heap.

spinscale added a commit to spinscale/elasticsearch that referenced this pull request Aug 5, 2015
This is a partial backport of elastic#12298. This fixes an
issue that rpms could not be upgraded, because of a bad
number check in the postrm script, which exits with a
failure.

Closes elastic#12606
Closes elastic#12630
@clintongormley
Copy link

I'm OK with this going in without a force option. What does any sysadmin do if any process won't stop? They kill -9 it. I don't think we need to document everything here. I'd rather get the fix in.

@tlrx
Copy link
Member Author

tlrx commented Aug 7, 2015

@nik9000 I tend to agree with @clintongormley on this. I rebased the code, can I push this?

@nik9000
Copy link
Member

nik9000 commented Aug 7, 2015

Sure
On Aug 7, 2015 4:37 AM, "Tanguy Leroux" notifications@github.com wrote:

@nik9000 https://github.com/nik9000 I tend to agree with @clintongormley
https://github.com/clintongormley on this. I rebased the code, can I
push this?


Reply to this email directly or view it on GitHub
#12298 (comment)
.

When installed as a service with a DEB or RPM package, we should gently wait for elasticsearch to stop (flushing indices on closing can take some time) and never kill the process.

Closes elastic#11248
@tlrx tlrx merged commit b1fd0a6 into elastic:master Aug 10, 2015
@tlrx tlrx removed the review label Aug 10, 2015
@tlrx tlrx deleted the do-not-kill-process branch August 10, 2015 08:07
@tlrx
Copy link
Member Author

tlrx commented Aug 10, 2015

@nik9000 thanks!

@sathieu
Copy link

sathieu commented Sep 15, 2017

I had a problem with this setting blocking a server shutdown. Maybe change this to 10 minutes or so ?

(NB: the problem was coming from a snapshot directory on an unavailable NAS)

@tlrx
Copy link
Member Author

tlrx commented Sep 15, 2017

Maybe change this to 10 minutes or so ?

We can't do that: the scripts must wait for Elasticsearch to stop nicely and it can sometimes takes more than 10 min. The good thing is that it let you the time to investigate the issue and take the appropriate action (here, kill the node manually?). Doing this automatically in the scripts would have obfuscate the underlying issue.

Also, if you have logs and a reproducing scenario it might worth it to create a new issue for the snapshot vs unavailable NAS.

@sathieu
Copy link

sathieu commented Sep 15, 2017

The problem is that it will prevent the shutdown, but I can't investigate anything as sshd is stopped and even I can't have a console (ttys are already closed).

@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts >enhancement Team:Delivery Meta label for Delivery team v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not kill process on service shutdown
6 participants