./script/delayed_job restart bounces processes, but no new PIDs #3

chrisfinne · 2009-08-28T17:21:51Z

The PID files are never created for the new process(es), so subsequent restarts and stops won't work.

Not sure if this is a problem with the daemons gem (I'm on the latest, 1.0.10) or with delayed_job.

I see this on my Mac and Ubuntu boxes.

ghost · 2009-08-31T02:47:22Z

I've had this problem too. I haven't been able to track it down, but it seems like an issue with the daemons library

chrisfinne · 2009-08-31T10:59:30Z

Here's my hack in my cap script to restart it. (I hard-code to launch 3 delayed_job processes).
http://gist.github.com/178397

jerodsanto · 2009-09-23T01:37:39Z

I have this problem as well. Will try chrisfinne's hack until somebody comes up with a fix.

rmm5t · 2009-09-23T17:41:53Z

I just ran into this problem as well. It appears to be a problem with the daemons gem. When the daemons gem "restarts", it stops, sleeps for 1 sec, and then starts. Pid file cleanup happens all over the place (I think incorrectly), and it looks like a race condition occurs. The daemons gem deletes pid files when you call stop, when the daemon exits normally, and when it traps a kill signal. It can take delayed_job a few seconds to respond to a kill signal because it first finishes what it was doing. Meanwhile, a new delayed_job daemon is forked (even after daemons' 1 sec delay) and a new pid file is written. After the restart, the cleanup tasks of the original daemon clean up the pid files again, incorrectly blowing away the new pid file.

In summary, the problem looks to be the daemons gem, and that gem should wait for the original process to stop running before restarting a new process.
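
For illustration, the "wait for the original process to stop" idea could look roughly like the Ruby below. This is only a sketch under assumptions, not the code in the daemons gem or in the ghazel fork: `process_alive?` and `stop_and_wait` are hypothetical helper names, and the timeout values are arbitrary.

```ruby
# Hypothetical helpers (not part of daemons or delayed_job).

# Returns true if a process with this PID currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks existence, nothing is delivered
  true
rescue Errno::ESRCH
  false                # no such process
rescue Errno::EPERM
  true                 # process exists but belongs to another user
end

# Sends TERM, then polls until the process is gone or the timeout expires.
# Returns true if the process exited in time, false otherwise.
def stop_and_wait(pid, timeout: 60, interval: 0.5)
  begin
    Process.kill("TERM", pid)
  rescue Errno::ESRCH
    return true # already gone
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while process_alive?(pid)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep interval
  end
  true
end
```

A restart would then start the new daemon only after `stop_and_wait(old_pid)` returns true, so the old daemon's cleanup can no longer delete the new pid file. Note that if the caller is the parent of the process, the child must be reaped (`Process.wait` or `Process.detach`), otherwise it lingers as a zombie and still counts as alive.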

rmm5t · 2009-09-23T17:55:54Z

Turns out this is a known problem with the daemons gem. Too bad the simple fix hasn't been applied.
http://rubyforge.org/tracker/index.php?func=detail&aid=21050&group_id=524&atid=2084

rmm5t · 2009-09-23T18:07:46Z

There's a fork of daemons that fixes the problem. http://github.com/ghazel/daemons

$ sudo gem uninstall daemons
$ sudo gem install ghazel-daemons

v1.0.11 works well for me and I'm going to start freezing my apps that use delayed_job to the ghazel-daemons gem.

chrisfinne · 2009-09-24T04:36:34Z

closing

rmm5t · 2009-09-24T13:02:02Z

I'm not sure this is worth closing until either daemons is fixed, delayed_job puts an explicit dependency on ghazel-daemons, or delayed_job implements a workaround.

jerodsanto · 2009-09-24T13:12:43Z

I agree. It's much easier for others to find if it is open, and technically the issue has not been resolved.

chrisfinne · 2009-09-24T17:16:46Z

I figured that since it was definitively proven to be another package's bug and a solid workaround beyond my ugly hack was detailed, I'd close it, but you make some good points, so I'll leave it open.

bkeepers · 2009-09-28T16:16:05Z

So is everyone still having this issue with the latest version?

rmm5t · 2009-09-28T16:58:37Z

Brandon, Yes, the restart problem is in daemons-1.0.10 and has been there for about a year.

Edit: To clarify, ghazel-daemons-1.0.11 fixes the problem, but that fork is not well known and most people running delayed_job have daemons-1.0.10 installed.

bkeepers · 2009-09-29T01:17:15Z

Has anyone tried to contact the maintainer of daemons? I'm thinking about just ditching it altogether and trying to figure out a different solution.

rmm5t · 2009-09-29T02:31:29Z

Brandon, The maintainer of daemons (Thomas Uehlinger) responded to the associated ticket about a year ago, but nothing since. I haven't tried to contact him myself. It doesn't look like there's been any activity in daemons since either.
http://rubyforge.org/tracker/index.php?func=detail&aid=21050&group_id=524&atid=2084

Perhaps daemon-spawn helps, though the gem on rubyforge is either not published yet or missing.
http://github.com/alexvollmer/daemon-spawn

tcocca · 2009-10-06T12:11:24Z

chrisfinne's hack worked perfectly for me after struggling with this for so long.

Can this be overridden in config/deploy.rb instead of editing the plugin or gem?

dlegg · 2009-10-08T13:19:23Z

One thing I have noticed out of all of this is that the restart option definitely doesn't work, regardless. The pidfile gets blown away but the process isn't actually stopped or started, which means that you can start another process with the zombie still hanging around. You have to do an explicit stop and then start if you want to restart delayed_job, and that seems to work. I haven't done enough testing yet, but with chrisfinne's script above, are we saying that a similar thing can happen with an explicit stop/start, hence the wait for the process to actually end?

tcocca · 2009-10-08T20:55:55Z

dlegg, correct, the restart doesn't work. chrisfinne's script does an explicit stop, then keeps checking until the process has actually stopped and its pid has disappeared, and only then calls start again. This worked great for me. In the cap deploy you will see how long it takes for the process to actually stop from the "waiting for process to stop ..." text.

I would recommend trying this.

I haven't tried the other version of the daemons gem though so I can't speak to that.

~ tom

scottj97 · 2009-10-30T01:57:11Z

chrisfinne's recipe works great, unless your deploy server is also your development server, in which case it will wait forever because it sees the 'cap delayed_job:restart' task in the process list. (I took care of that with another grep -v.)
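
The self-matching problem described above can be avoided by filtering the process list explicitly. The Ruby below is a sketch only, not chrisfinne's recipe: `delayed_job_pids` is a hypothetical helper, and the default ignore patterns (`cap`, `grep`) are assumptions that may need tightening if those words appear in your paths.

```ruby
# Hypothetical helper: picks PIDs of delayed_job processes out of
# "PID COMMAND" style output such as that of `ps -A -o pid,args`,
# skipping commands that merely mention delayed_job (cap task, grep).
def delayed_job_pids(ps_output, ignore: [/\bcap\b/, /\bgrep\b/])
  ps_output.each_line.map do |line|
    pid, command = line.strip.split(/\s+/, 2)
    next unless /\A\d+\z/.match?(pid.to_s) && command # header or blank line
    next unless command.include?("delayed_job")
    next if ignore.any? { |pattern| command =~ pattern }
    pid.to_i
  end.compact
end
```

It could be fed with ``delayed_job_pids(`ps -A -o pid,args`)``; an empty result means no worker is left and it is safe to call start.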

What I don't understand is why an idle delayed_job server should take 20 seconds or more to exit??

jimeh · 2010-03-26T22:49:34Z

I've gone a slightly different way in making sure that delayed_job stops properly. With a combination of lsof, grep, and awk I'm killing all ruby processes which have the specific application's delayed_jobs.log file open.

It's working quite well and fast for me so far:
http://gist.github.com/345494

UPDATE: Here's a MUCH better fix, which adds the changes from ghazel's fork to the 1.0.10 daemons gem via overloading:
http://gist.github.com/346160

sunkencity · 2010-04-12T13:21:43Z

I still get the problem with no pid file, whether I use the ghazel-daemons gem or monkeypatch the same changes in. Right now I'm using chrisfinne's script to ensure shutdown, and that works. Such a solution seems to be the right one anyway: what if there's a long-running email job or the like?

jimeh · 2010-04-12T13:24:38Z

@sunkencity: I actually ended up using the ghazel-daemons gem. It's a little tricky, as you need to do this in your environment.rb file:

config.gem "ghazel-daemons", :lib => "daemons"
gem "ghazel-daemons"
require "daemons"

sunkencity · 2010-04-12T13:34:53Z

OK, thanks! I thought I had uninstalled daemons and everything was fine but I had forgotten that my capistrano automatically installs any missing gem dependencies, so I guess I can switch to that now.

Here's an extra task I use to make sure that the reload went well

http://gist.github.com/363553

jimeh · 2010-04-12T15:34:38Z

With the above environment.rb config you don't have to uninstall the daemons gem; the ghazel-daemons gem is force-loaded.

Also, I see you took a similar approach to me when it comes to seeing if there are any orphaned DJ daemons running.

In case you find it useful, here are the final delayed_job Capistrano tasks I've ended up using: http://gist.github.com/345494

seboslaw · 2010-06-29T14:33:58Z

I am also seeing this problem :(
Running rails3-beta4 with delayed_job installed as a plugin (have tried it as a gem before), daemons (1.0.10 - ghazel-daemons-1.0.11 didn't make any difference) and ruby 1.8.7p249.

Weird thing is that it runs fine under OSX but quits right after the start on my production ubuntu box. "script/delayed_job run" runs fine on both...

euanmaxwell · 2010-07-21T13:41:46Z

I hit this problem on Ubuntu 10.04 today. I'd upgraded the daemons gem to 1.1.0 yesterday and DJ stopped working: it claimed to be forking the workers, but they were immediately dying and the log file had some funny binary input. Removing 1.1.0 to force DJ to use 1.0.10 seems to have solved the problem for me.

badnaam · 2010-08-02T21:36:18Z

I have this in my environment.rb:

config.gem 'delayed_job', :source => 'http://rubygems.org', :version => "2.1.0.pre"
config.gem "ghazel-daemons", :lib => "daemons", :source => 'http://gems.github.com'
gem "ghazel-daemons"
require "daemons"
But I still can't get delayed_job to restart from capistrano.

desc "Restart the delayed_job process"
task :delayed_job_restart, :roles => :app do
    run "cd #{current_path};#{get_rails_env} script/delayed_job restart"
end

MBO · 2010-09-07T10:36:38Z

I have the same problem, and I wrote this task to restart delayed_job without instantly killing all jobs or waiting in an infinite loop if there are more DJs running on the server. Works so far. Requires a *NIX environment with awk and lsof.

http://gist.github.com/568143

thoughtless · 2010-12-02T06:18:18Z

While this ticket is a real issue (and is related to https://github.com/collectiveidea/delayed_job/issues#issue/81 and https://github.com/collectiveidea/delayed_job/issues#issue/100) you must be careful when delayed job fails while writing nothing to delayed_job.log. Other problems besides this one can cause that problem. For example, if there is a database problem (such as an error in database.yml or a migration hasn't been run yet) you could get very similar symptoms.

Always make sure you check BOTH delayed_job.log and production.log (or whatever environment you are running delayed_job in). Delayed Job's catch-all exception handler outputs to the rails log, not to delayed_job.log.

christophercotton · 2011-03-08T02:12:44Z

In our case, when starting multiple delayed_job workers (-n 5) with restart, the PID files do not get created. The reason is that intermediate processes are created and die before getting a chance to write out the PIDs. For 'start', Daemons::Controller.run calls '@group.new_application.start'; for a restart it calls '@group.start_all'. start_all forks a new process for each application to start (even if there is only one), whereas 'start' just waits for the delayed_job to start correctly.

Processes in a restart

script/delayed_job (pid 1)
    (Daemons::ApplicationGroup) @group.start_all (which forks)
         application.start (pid 2)
              (since it isn't :ontop) call_as_daemon
                  delayed_worker_1 (pid 4)
         application.start (pid 3)
              (since it isn't :ontop) call_as_daemon
                  delayed_worker_1 (pid 5)

start_all (and pid 1) will exit immediately after all the forks. It doesn't wait for each fork to finish. pid 2 normally has enough time to write out the PID file, though I'm guessing that if your system is fast about launching everything, maybe none of them will write out PID files. In our case, delayed_job would start but the PID file wouldn't be written, and then we would start getting multiple processes because it thought it wasn't started.

Our solution was just to put a "sleep 5" at the end of script/delayed_job. This seemed to be enough time to allow the PID to get created.
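
Instead of a fixed sleep, the script could poll until the expected PID files actually exist. The Ruby below is a sketch of that idea, not something delayed_job or daemons provides: `wait_for_pid_files` is a hypothetical helper, and the glob pattern is an assumption about where and under what names your PID files are written.

```ruby
# Hypothetical helper: waits until at least `expected` non-empty files
# match `pattern`. Returns the sorted list of matching files, or nil if
# the timeout expires first.
def wait_for_pid_files(pattern, expected:, timeout: 30, interval: 0.25)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # File.size? is nil for missing or empty files, so half-written
    # (still empty) PID files are not counted yet.
    found = Dir.glob(pattern).select { |file| File.size?(file) }
    return found.sort if found.size >= expected
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep interval
  end
end
```

For five workers, a call such as `wait_for_pid_files("tmp/pids/delayed_job*.pid", expected: 5)` at the end of the script would keep pid 1 alive only as long as needed (the `tmp/pids` location is assumed here).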

Daemons really seems to be broken and should be fixed (either just waiting for the forks to finish, or just don't fork during the start_all) or Delayed_Job should move to something else as a main method of daemonizing.

airblade · 2011-10-19T09:48:12Z

Why does it take so long (20s or more) for an idle delayed_job worker to stop?

I'm on delayed_job 2.1.4, daemons 1.1.0, Rails 3.0.10, and Ruby 1.9.2-p290.

/cc @scottj97

thoughtless · 2011-10-19T11:15:38Z

The script Delayed Job uses to stop idle processes loads the entire Rails environment before shutting down the worker. As far as I know, this is not necessary. Theoretically all that is needed to make it faster is to write a shutdown script that doesn't load Rails. But I don't remember the specifics well enough to estimate how easy/hard that would be for Delayed Job.
I've been toying with something like delayed job (https://github.com/thoughtless/angael) which takes this approach. That gem uses a manager process which does not use Rails, but the worker processes can use Rails. You just need to send the worker manager SIGINT and it will perform a graceful shutdown. I'm not recommending my gem as a drop-in replacement for delayed job. Delayed job is a battle-tested solution. My gem aims to be better in certain circumstances, but it has only been used (to my knowledge) in a single production application. YMMV, etc.
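
As a rough illustration of a shutdown script that doesn't load Rails, the Ruby below reads PID files and signals the workers directly. It is a sketch under assumptions, not how delayed_job or angael does it: `signal_workers` is a hypothetical helper, the PID file location is assumed, and it assumes the workers shut down gracefully on TERM.

```ruby
# Hypothetical helper: sends `signal` to every PID found in the files
# matching `pid_glob`, without loading the Rails environment.
# Returns a hash of pid => :signaled, :not_running or :not_permitted.
def signal_workers(pid_glob, signal = "TERM")
  Dir.glob(pid_glob).each_with_object({}) do |file, result|
    pid = File.read(file).to_i
    next if pid <= 0 # empty or garbled PID file

    begin
      Process.kill(signal, pid)
      result[pid] = :signaled
    rescue Errno::ESRCH
      result[pid] = :not_running   # stale PID file
    rescue Errno::EPERM
      result[pid] = :not_permitted # process owned by another user
    end
  end
end
```

Something like `signal_workers("tmp/pids/delayed_job*.pid")` returns right after signaling; it does not wait for the workers to finish their current job.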

andrewdsmith · 2012-03-01T14:45:36Z

According to the release announcement of 1.1.0, the broken behaviour reported here is fixed now. The referenced bug report has also been (long) closed by the maintainer. Not sure if issue #81 stops people upgrading, however.

garethrees · 2012-08-22T15:09:09Z

The v2.0 branch still seems to use daemons 1.0.10. Daemons is now on 1.1.9. Is there a reason delayed_job v2.0 is not using this?

johncant · 2012-10-02T08:27:40Z

@garethrees, yeah. daemons 1.1.0 breaks delayed_job incurring issue #81, but downgrading to daemons 1.0.10 seems to fix it. Daemons 1.1.9 didn't work for me either.

rchampourlier · 2012-12-19T09:37:17Z

daemons 1.1.0 wasn't working for me (neither script/delayed_job run nor start), and reverting to 1.0.10 only solved the run part.

So I decided to try the approach described here, using the daemon-spawn gem instead. Check this gist too.

jaredmoody mentioned this issue Oct 26, 2012

script/delayed_job start fails with daemons 1.1.0 on Ubuntu 8.10 #81

Closed

albus522 closed this as completed Sep 24, 2014

mariodarco mentioned this issue May 13, 2015

Delayed jobs using outdated code after a restart. #811

Closed
