
[centos][init] Wait for supervisord to stop before kill -9ing it #2349

Merged
1 commit merged on Mar 15, 2016

olivielpeau (Member)

Without this fix, the `stop` command (-> `killproc`) would only wait
0.1 second before `kill -9`ing supervisord. So if, for instance, the
collector took longer than that to stop, a subsequent `start` would
fail because we'd try to start a new collector before the old one had died.

This fix uses an option of `killproc` so that it waits for 30 seconds
(similar to our init script on Debian) before `kill -9`ing
supervisord. That is plenty of time, since by default supervisord
itself `kill -9`s the processes that haven't stopped after 10 seconds.
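The stop policy described above (send SIGTERM, allow a grace period, only then `kill -9`) can be sketched as a small POSIX shell function. This is a hypothetical illustration, not the actual `killproc` from CentOS's `/etc/init.d/functions` and not the agent's init script; on CentOS the real script gets this behavior from `killproc`'s `-d <delay>` option. The function name `stop_with_timeout`, its return codes, the one-second polling interval and the example pidfile path are all assumptions made for the sketch.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real killproc from /etc/init.d/functions.
# It only sketches the "SIGTERM, wait up to N seconds, then SIGKILL" policy.
#
# Usage: stop_with_timeout <pid> [delay_seconds]
# Returns 0 if the process exited on its own, 1 if it had to be SIGKILLed.
stop_with_timeout() {
    _pid=$1
    _delay=${2:-30}    # 30 s, the grace period chosen in this PR

    # kill -0 sends no signal; it only tests whether the PID still exists.
    # Note: an unreaped zombie child of the caller still counts as "alive";
    # a daemonized supervisord is reaped by init, so this is not an issue there.
    kill -0 "$_pid" 2>/dev/null || return 0

    # Ask politely first: supervisord handles SIGTERM by stopping its children.
    kill -TERM "$_pid" 2>/dev/null

    _waited=0
    while [ "$_waited" -lt "$_delay" ]; do
        kill -0 "$_pid" 2>/dev/null || return 0   # exited within the grace period
        sleep 1
        _waited=$((_waited + 1))
    done

    # Still alive after the grace period: escalate to kill -9.
    if kill -0 "$_pid" 2>/dev/null; then
        kill -KILL "$_pid" 2>/dev/null
        return 1
    fi
    return 0
}

# Example (hypothetical pidfile path):
# stop_with_timeout "$(cat /var/run/example-supervisord.pid)" 30
```

With a 30-second delay, supervisord has time to apply its own default 10-second `stopwaitsecs` to a stuck collector and exit by itself before any escalation. This matches the "With the fix" log, where the collector is SIGKILLed roughly 10 seconds after the SIGTERM.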

Without the fix, here is what can happen when we restart the agent (excerpt from supervisord.log; I removed the log lines about jmxfetch and go-metro since they're not relevant):

```
2016-03-11 21:50:28,437 WARN received SIGTERM indicating exit request
2016-03-11 21:50:28,437 INFO waiting for dogstatsd, forwarder, collector to die
2016-03-11 21:50:28,458 INFO stopped: forwarder (exit status 0)
2016-03-11 21:50:28,498 INFO stopped: dogstatsd (exit status 0)
2016-03-11 21:50:32,232 CRIT Set uid to user 497
2016-03-11 21:50:32,562 INFO RPC interface 'supervisor' initialized
2016-03-11 21:50:32,562 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-03-11 21:50:32,564 INFO daemonizing the supervisord process
2016-03-11 21:50:32,565 INFO supervisord started with pid 3688
2016-03-11 21:50:32,640 INFO spawned: 'dogstatsd' with pid 3694
2016-03-11 21:50:32,651 INFO spawned: 'forwarder' with pid 3696
2016-03-11 21:50:32,654 INFO spawned: 'collector' with pid 3697
2016-03-11 21:50:34,837 INFO exited: collector (exit status 1; not expected)
```

(and if the old collector process still hasn't died after 3 attempts to start a new collector process, supervisor gives up)

With the fix:

```
2016-03-11 22:25:02,841 WARN received SIGTERM indicating exit request
2016-03-11 22:25:02,847 INFO waiting for dogstatsd, forwarder, collector to die
2016-03-11 22:25:02,880 INFO stopped: forwarder (exit status 0)
2016-03-11 22:25:02,929 INFO stopped: dogstatsd (exit status 0)
2016-03-11 22:25:05,932 INFO waiting for collector to die
2016-03-11 22:25:08,939 INFO waiting for collector to die
2016-03-11 22:25:12,249 INFO waiting for collector to die
2016-03-11 22:25:13,250 WARN killing 'collector' (4210) with SIGKILL
2016-03-11 22:25:13,254 INFO stopped: collector (terminated by SIGKILL)
2016-03-11 22:25:14,867 CRIT Set uid to user 497
2016-03-11 22:25:14,918 INFO RPC interface 'supervisor' initialized
2016-03-11 22:25:14,918 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-03-11 22:25:14,921 INFO daemonizing the supervisord process
2016-03-11 22:25:14,922 INFO supervisord started with pid 4451
2016-03-11 22:25:15,174 INFO spawned: 'dogstatsd' with pid 4457
2016-03-11 22:25:15,206 INFO spawned: 'forwarder' with pid 4459
2016-03-11 22:25:15,208 INFO spawned: 'collector' with pid 4460
2016-03-11 22:25:20,428 INFO success: dogstatsd entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2016-03-11 22:25:20,428 INFO success: forwarder entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2016-03-11 22:25:20,428 INFO success: collector entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
```

Tested successfully on CentOS 5, 6 and 7.

olivielpeau added this to the 5.7.2 milestone Mar 11, 2016
truthbk (Member) commented Mar 15, 2016

makes sense 👍

olivielpeau added a commit that referenced this pull request Mar 15, 2016
…-init

[centos][init] Wait for supervisord to stop before `kill -9`ing it
olivielpeau merged commit eab4b5f into master Mar 15, 2016
olivielpeau deleted the olivielpeau/wait-on-stop-centos-init branch March 15, 2016 22:06
olivielpeau modified the milestones: 5.7.3, 5.7.2 Mar 21, 2016