Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

god continues to monitor and resurrect process after executing 'god stop <task name>' #250

Open
eanlain opened this issue Nov 10, 2017 · 0 comments

Comments

@eanlain
Copy link

eanlain commented Nov 10, 2017

God resurrects a process after god stop <task name> is run if the god config for that process uses keepalive, a custom stop command using -QUIT, and does not terminate within the stop_timeout period.

Reproduction

Setup

Create a simple script, fodder.rb, that just logs messages:

require 'logger'

@logger = Logger.new "/home/pair/fodder/output.log"
@logger.info "Start feeding!"
@feeding_time = true

def quit_feeding
  @feeding_time = false
end

Signal.trap("QUIT") { quit_feeding }

while @feeding_time do
  @logger.info "More fodder!"
  sleep 1
end

@logger.info "Starting graceful quit..."
sleep_time = 12
@logger.info "Sleeping for #{sleep_time}s..."
sleep sleep_time
@logger.info "We're finally quitting, BYE!"

Create the following simple god config, god-fodder.rb, which uses keepalive, a custom stop command, and an interval:

God.watch do |w|
  PID_PATH = "/var/run/god/fodder.pid"

  w.name = "fodder"
  w.start = "ruby /home/pair/fodder/fodder.rb"
  w.stop = "kill -QUIT `cat #{PID_PATH}`"
  w.keepalive

  w.interval = 1.second
end

Execution

Start god with the god-fodder config, then after a few seconds execute god's stop command:

$ god -c god-fodder.rb
$ god stop fodder

Output

The output from the fodder script:

I, [2017-10-24T15:10:29.797266 #30297]  INFO -- : Start feeding!
I, [2017-10-24T15:10:29.797423 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:30.797660 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:31.798036 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:32.798408 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:33.798786 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:34.799169 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:35.799517 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:36.799891 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:37.800278 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:38.800637 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:39.801020 #30297]  INFO -- : More fodder!
I, [2017-10-24T15:10:40.801379 #30297]  INFO -- : Starting graceful quit...
I, [2017-10-24T15:10:40.801521 #30297]  INFO -- : Sleeping for 12s...
I, [2017-10-24T15:10:50.689377 #30352]  INFO -- : Start feeding!
I, [2017-10-24T15:10:50.689526 #30352]  INFO -- : More fodder!
I, [2017-10-24T15:10:51.689765 #30352]  INFO -- : More fodder!
I, [2017-10-24T15:10:52.690183 #30352]  INFO -- : More fodder!
I, [2017-10-24T15:10:53.690517 #30352]  INFO -- : More fodder!
I, [2017-10-24T15:10:54.690874 #30352]  INFO -- : More fodder!
I, [2017-10-24T15:10:55.691255 #30352]  INFO -- : More fodder!
I, [2017-10-24T15:10:56.691608 #30352]  INFO -- : More fodder!

Output from god's log:

Oct 24 15:10:29 god[30255]: Syslog enabled.
Oct 24 15:10:29 god[30257]: Loading /etc/god/god.conf.rb
Oct 24 15:10:29 god[30257]: Using pid file directory: /var/run/god
Oct 24 15:10:29 god[30257]: Started on drbunix:///tmp/god.17165.sock
Oct 24 15:10:29 god[30257]: fodder move 'unmonitored' to 'init'
Oct 24 15:10:29 god[30257]: fodder moved 'unmonitored' to 'init'
Oct 24 15:10:29 god[30257]: fodder [trigger] process is not running (ProcessRunning)
Oct 24 15:10:29 god[30257]: fodder move 'init' to 'start'
Oct 24 15:10:29 god[30257]: fodder start: ruby /home/pair/fodder/fodder.rb
Oct 24 15:10:29 god[30257]: fodder moved 'init' to 'start'
Oct 24 15:10:29 god[30257]: fodder [trigger] process is running (ProcessRunning)
Oct 24 15:10:29 god[30257]: fodder move 'start' to 'up'
Oct 24 15:10:29 god[30257]: fodder registered 'proc_exit' event for pid 30297
Oct 24 15:10:29 god[30257]: fodder moved 'start' to 'up'
Oct 24 15:10:40 god[30257]: fodder stop: kill -QUIT `cat /var/run/god/fodder.pid`
Oct 24 15:10:40 god[30257]: fodder ensuring stop...
Oct 24 15:10:50 god[30257]: fodder still alive after 10s; sent SIGKILL
Oct 24 15:10:50 god[30257]: fodder move 'up' to 'unmonitored'
Oct 24 15:10:50 god[30257]: fodder deregistered 'proc_exit' event for pid 30322
Oct 24 15:10:50 god[30257]: fodder moved 'up' to 'unmonitored'
Oct 24 15:10:50 god[30257]: fodder [trigger] process 30297 exited {:pid=>30297, :exit_code=>9, :exit_signal=>17, :thread_group_id=>30297} (ProcessExits)
Oct 24 15:10:50 god[30257]: fodder move 'unmonitored' to 'start'
Oct 24 15:10:50 god[30257]: fodder start: ruby /home/pair/fodder/fodder.rb
Oct 24 15:10:50 god[30257]: fodder moved 'unmonitored' to 'start'
Oct 24 15:10:50 god[30257]: fodder [trigger] process is running (ProcessRunning)
Oct 24 15:10:50 god[30257]: fodder move 'start' to 'up'
Oct 24 15:10:50 god[30257]: fodder registered 'proc_exit' event for pid 30352
Oct 24 15:10:50 god[30257]: fodder moved 'start' to 'up' 

Investigation

During initialization, god creates a new thread for an EventHandler loop and then registers the :process_exits condition used in the keepalive method with the EventHandler.

When a command is sent through the CLI (e.g. god stop fodder), a new thread is created. This new thread pushes a :stop event, followed by an :unmonitored event onto the god event driver queue.

The :stop event will be processed on the original monitoring thread (not the EventHandler loop thread or the CLI command thread), and if the :stop_timeout time period is reached and the original process is still running, a KILL signal will be sent through this original thread. This particular stop logic never tries to detach or stop tracking the PID, instead, god continues to track our PID process.

At this point the KILL signal triggers the :process_exits condition on the EventHandler thread, which then pushes an event responsible for starting the process back up onto the event queue. The original monitoring thread immediately tries to handle that trigger event.

IF a stop_grace is added to the god config, the pause in code execution, via sleep in lib/god/watch.rb after executing the action actually allows one of the thread loops to clean up after itself and clear out our events queue containing the most recently triggered :process_exits event. Once that thread cleans up, there's no action for the original monitoring thread to take since our app is registered as :unmonitored at this point. This particular behavior skirts the problem and allows god to move the process to unmonitored, but it does not fix the underlying issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant