./script/delayed_job restart bounces processes, but no new PIDs #3
Comments
I've had this problem too. I haven't been able to track it down, but it seems like an issue with the daemons library |
Here's my hack in my cap script to restart it. (I hard-code to launch 3 delayed_job processes). |
I have this problem as well. Will try chrisfinne's hack until somebody comes up with a fix. |
I just ran into this problem as well. It appears to be a problem with the daemons gem. When the daemons gem "restarts", it stops, sleeps for 1 sec, and then starts. Pid file cleanup happens all over the place (I think incorrectly), and it looks like a race condition occurs. The daemons gem deletes pid files when you call stop, when the daemon exits normally, and when it traps a kill signal. It can take delayed_job a few seconds to respond to a kill signal because it first finishes what it was doing. Meanwhile, a new delayed_job daemon is forked (even after daemons' 1 sec delay) and a new pid file is written. After the restart, the cleanup tasks of the original daemon clean up the pid files again, incorrectly blowing away the new pid file. In summary, the problem looks to be in the daemons gem, which should wait for the original process to stop running before starting a new one. |
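To make the suggested fix concrete ("wait for the original process to stop running"), here is a minimal Ruby sketch. The helper names `process_alive?` and `wait_for_exit` are hypothetical and are not part of the daemons gem or delayed_job; the only library calls used are `Process.kill` with signal 0 and `Process.clock_gettime`. It assumes a *NIX system.

```ruby
# Hypothetical helpers, not part of the daemons gem or delayed_job.

# Returns true if a process with this pid exists.
# Signal 0 performs the existence/permission check without sending a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Polls until the process is gone or the timeout elapses.
# Returns true if the process exited, false if we gave up waiting.
def wait_for_exit(pid, timeout: 30, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while process_alive?(pid)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end
```

A restart would then call `wait_for_exit(old_pid)` after sending the stop signal and only start the new worker (and write its pid file) once that returns true, so the old daemon's cleanup can no longer delete the new pid file.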
Turns out this is a known problem with the daemons gem. Too bad the simple fix hasn't been applied. |
There's a fork of daemons that fixes the problem. http://github.com/ghazel/daemons
v1.0.11 works well for me and I'm going to start freezing my apps that use delayed_job to the ghazel-daemons gem. |
closing |
I'm not sure this is worth closing until either daemons is fixed, delayed_job puts an explicit dependency on ghazel-daemons, or delayed_job implements a workaround. |
I agree. It's much easier for others to find if it is open, and technically the issue has not been resolved. |
I figured that since it was definitively proven to be another package's bug and a solid workaround beyond my ugly hack was detailed, I'd close it, but you make some good points, so I'll leave it open. |
So is everyone still having this issue with the latest version? |
Brandon, Yes, the restart problem is in daemons-1.0.10 and has been there for about a year. Edit: To clarify, ghazel-daemons-1.0.11 fixes the problem, but that fork is not well known and most people running delayed_job have daemons-1.0.10 installed. |
Has anyone tried to contact the maintainer of daemons? I'm thinking about just ditching it altogether and trying to figure out a different solution. |
Brandon, The maintainer of daemons (Thomas Uehlinger) responded to the associated ticket about a year ago, but nothing since. I haven't tried to contact him myself. It doesn't look like there's been any activity in daemons since either. Perhaps daemon-spawn helps, though the gem on rubyforge is either not published yet or missing. |
chrisfinne's hack worked perfectly for me after struggling with this for so long. Can this be overridden in config/deploy.rb instead of editing the plugin or gem? |
One thing I have noticed out of all of this is that restart option definitely doesn't work regardless. The pidfile gets blown away but the process isn't actually stopped or started which means that you can start another process with the zombie hanging around. You have to do an explicit stop and then start if you want to restart delayed_job, and that seems to work. I haven't done enough testing yet, but with chrisfinne's script above are we saying that we can have a similar thing happen with an explicit stop/start, hence waiting for the process to actually end? |
dlegg, correct, the restart doesn't work. chrisfinne's script does an explicit stop, then keeps checking until the process has actually stopped and the pid has disappeared, and only then calls start. This worked great for me. In the cap deploy you will see how long it takes for the process to actually stop from the "waiting for process to stop ..." text. I would recommend trying this. I haven't tried the other version of the daemons gem, though, so I can't speak to that. ~ tom |
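chrisfinne's actual recipe is not reproduced in this thread, so the following is only a sketch of the stop / wait / start sequence described above. The method name `safe_restart` and its parameters are hypothetical; `stop`, `start` and `running` are callables the caller supplies (in a Capistrano recipe they would presumably wrap `script/delayed_job stop`, `script/delayed_job start`, and a process-list check).

```ruby
# Hypothetical restart helper: explicit stop, poll until no worker remains,
# and only then start. Nothing here is delayed_job or Capistrano API.
def safe_restart(stop:, start:, running:, timeout: 60, interval: 1)
  stop.call
  waited = 0
  while running.call
    # Give up rather than loop forever if the worker never goes away.
    raise "worker still running after #{timeout}s" if waited >= timeout
    puts "waiting for process to stop ..."
    sleep interval
    waited += interval
  end
  start.call
end
```

Passing the three steps in as callables keeps the ordering logic separate from how the commands are run, and the timeout avoids the "waits forever" failure mode mentioned later in the thread.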
chrisfinne's recipe works great, unless your deploy server is also your development server, in which case it will wait forever because it sees the 'cap delayed_job:restart' task in the process list. (I took care of that with another grep -v.) What I don't understand is why an idle delayed_job server should take 20 seconds or more to exit?? |
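The extra `grep -v` mentioned above can be illustrated in Ruby as a filter over process-list output. This is a sketch only: `worker_lines` is a hypothetical helper, and the patterns it matches are assumptions about how the worker and the Capistrano task appear in `ps` output on a given machine.

```ruby
# Hypothetical filter over `ps` output: keep delayed_job workers, but drop
# grep itself and the Capistrano task that is doing the waiting.
def worker_lines(ps_output)
  ps_output.each_line.map(&:chomp).select do |line|
    line.include?("delayed_job") &&
      !line.include?("grep") &&
      line !~ /\bcap\b.*delayed_job:restart/
  end
end
```

The restart loop can then treat an empty result as "the old worker is gone", even when `cap delayed_job:restart` is running on the same host.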
I've gone a slightly different way in making sure that delayed_job stops properly. With a combination of It's working quite well and fast for me so far: UPDATE: Here's a MUCH better fix, which adds the changes from ghazel's fork to the 1.0.10 daemons gem via overloading: |
I still get the problem with no pid file whether I use the ghazel-daemons gem or monkeypatch the same things. Right now I'm using chrisfinne's script to ensure shutdown, and that works. Such a solution seems to be the right one anyway: what if there's a long-running email job or such? |
@sunkencity: I've actually ended up using the ghazel-daemons gem in the end. It's a little tricky, as you need to do this in your
|
OK, thanks! I thought I had uninstalled daemons and everything was fine but I had forgotten that my capistrano automatically installs any missing gem dependencies, so I guess I can switch to that now. Here's an extra task I use to make sure that the reload went well |
With the above Also, I see you took a similar approach to me when it comes to seeing if there are any orphaned DJ daemons running. In case you might find it useful, here are the final delayed_job Capistrano tasks I've ended up using: http://gist.github.com/345494 |
I am also seeing this problem :( Weird thing is that it runs fine under OSX but quits right after the start on my production ubuntu box. "script/delayed_job run" runs fine on both... |
I hit this problem on Ubuntu 10.04 today. I'd upgraded the daemons gem to 1.1.0 yesterday and DJ stopped working: it claimed to be forking the workers, but they were immediately dying and the log file had some funny binary input. Removing 1.1.0 to force DJ to use 1.0.10 seems to have solved the problem for me. |
I have this in my environment.rb: config.gem 'delayed_job', :source => 'http://rubygems.org', :version => "2.1.0.pre"
|
I have the same problem, and I wrote this task to restart delayed_job without instantly killing all jobs or waiting in an infinite loop if there are more DJs running on the server. Works so far. Requires a *NIX environment with awk and lsof. |
While this ticket is a real issue (and is related to https://github.com/collectiveidea/delayed_job/issues#issue/81 and https://github.com/collectiveidea/delayed_job/issues#issue/100) you must be careful when delayed job fails while writing nothing to delayed_job.log. Other problems besides this one can cause that problem. For example, if there is a database problem (such as an error in database.yml or a migration hasn't been run yet) you could get very similar symptoms. Always make sure you check BOTH delayed_job.log and production.log (or whatever environment you are running delayed_job in). Delayed Job's catch-all exception handler outputs to the rails log, not to delayed_job.log. |
In our case, when starting multiple delayed_jobs (-n 5) with restart, the PID files do not get created. The reason is that intermediate processes are created and die before getting a chance to write out the PIDs. Daemons::Controller.run calls '@group.new_application.start' for a 'start' and '@group.start_all' for a 'restart'. start_all forks a new process for each application to start (even if there is only one), whereas 'start' just waits for the delayed_job to start correctly. Processes in a restart
The start_all (and pid 1) will exit immediately after all the forks; it doesn't wait for each fork to finish. pid2 normally has enough time to write out the PID file, though I'm guessing that if your system is fast about launching everything, maybe none of them will write out PID files. In our case the delayed_job would get started but the PID wouldn't be written out, and then we would start getting multiple processes because it thought it wasn't started. Our solution was just to put a "sleep 5" at the end of script/delayed_job. This seemed to be enough time to allow the PID to get created. Daemons really seems to be broken and should be fixed (either by waiting for the forks to finish, or by not forking during start_all), or Delayed_Job should move to something else as its main method of daemonizing. |
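The first proposed fix ("just waiting for the forks to finish") can be sketched in plain Ruby. This is not the daemons gem's code: `start_all_and_wait` and the `worker.N.pid` file names are hypothetical, and each child here only writes its pid file and exits, where a real daemon would detach and keep running. It assumes a platform with `fork`.

```ruby
# Hypothetical illustration: fork `count` children that each write a pid
# file, and do not return until every child has finished doing so.
def start_all_and_wait(dir, count)
  pids = Array.new(count) do |i|
    fork do
      File.write(File.join(dir, "worker.#{i}.pid"), Process.pid.to_s)
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
  end
  # The parent blocks here instead of exiting right after the forks.
  pids.each { |pid| Process.wait(pid) }
  Dir.glob(File.join(dir, "worker.*.pid")).sort
end
```

Compared with the "sleep 5" workaround, waiting on the children does not depend on guessing how long the pid files take to appear.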
Why does it take so long (20s or more) for an idle delayed_job worker to stop? I'm on delayed_job 2.1.4, daemons 1.1.0, Rails 3.0.10, and Ruby 1.9.2-p290. /cc @scottj97 |
The script Delayed Job uses to stop idle processes loads the entire Rails environment before shutting down the worker. As far as I know, this is not necessary. Theoretically all that is needed to make it faster is to write a shutdown script that doesn't load Rails. But I don't remember the specifics well enough to estimate how easy/hard that would be for Delayed Job. |
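As a rough sketch of what a "shutdown script that doesn't load Rails" could look like: read the pid file and send the signal directly. `stop_from_pid_file` is a hypothetical helper, and the pid file location (for example something like tmp/pids/delayed_job.pid) is an assumption that depends on how the worker was started. Whether this actually removes the 20 second delay is untested here.

```ruby
# Hypothetical stop helper that never loads the Rails environment.
# Returns the pid that was signalled, or nil if nothing was running.
def stop_from_pid_file(path, signal: "TERM")
  return nil unless File.exist?(path)
  pid = Integer(File.read(path).strip)
  Process.kill(signal, pid)
  pid
rescue Errno::ESRCH
  nil # stale pid file: the process is already gone
end
```

This only delivers the signal; the worker still finishes its current job before exiting, so a caller would normally wait for the process to disappear before starting a replacement.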
According to the release announcement of 1.1.0, the broken behaviour reported here is fixed now. The referenced bug report has also been (long) closed by the maintainer. Not sure if issue #81 stops people upgrading, however. |
The v2.0 branch still seems to use daemons 1.0.10. Daemons is now on 1.1.9. Is there a reason delayed_job v2.0 is not using this? |
@garethrees, yeah. daemons 1.1.0 breaks delayed_job incurring issue #81, but downgrading to daemons 1.0.10 seems to fix it. Daemons 1.1.9 didn't work for me either. |
the PID files are never created for the new process(es), so subsequent restarts and stops won't work.
Not sure if this is a problem with the daemons gem (I'm on the latest 1.0.10) or delayed_job.
I see this on my Mac and Ubuntu boxes.