Make raising signal exceptions a configurable option

Idea credit goes to @nevinera
commit 90579c3047099b6a58595d4025ab0f4b7f0aa67a 1 parent a5ac868
@albus522 albus522 authored
Showing 1 changed file with 20 additions and 2 deletions.

+20 −2 lib/delayed/worker.rb
@@ -47,6 +47,15 @@ def self.reset
# (perhaps to inspect the reason for the failure), set this to false.
self.destroy_failed_jobs = true
+ # By default, Signals INT and TERM set @exit, and the worker exits upon completion of the current job.
+ # If you would prefer to raise a SignalException and exit immediately you can use this.
+ # Be aware daemons uses TERM to stop and restart
+ # false - No exceptions will be raised
+ # :term - Will only raise an exception on TERM signals but INT will wait for the current job to finish
+ # true - Will raise an exception on TERM and INT
+ cattr_accessor :raise_signal_exceptions
+ self.raise_signal_exceptions = false
self.logger = if defined?(Rails)
Rails.logger
elsif defined?(RAILS_DEFAULT_LOGGER)
@@ -122,8 +131,17 @@ def name=(val)
def start
- trap('TERM') { say 'Exiting...'; stop }
- trap('INT') { say 'Exiting...'; stop }
+ trap('TERM') do
+ say 'Exiting...'
+ stop
+ raise SignalException.new('TERM') if self.class.raise_signal_exceptions
+ end
+ trap('INT') do
+ say 'Exiting...'
+ stop
+ raise SignalException.new('INT') if self.class.raise_signal_exceptions && self.class.raise_signal_exceptions != :term
+ end
say "Starting job worker"
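With the patch applied, the new option is set alongside the other worker settings, e.g. in a Rails initializer (the file path here is illustrative):

```ruby
# config/initializers/delayed_job.rb (path is illustrative)
#
# Raise a SignalException on TERM so the worker can release its job
# quickly, before a supervisor's follow-up SIGKILL arrives;
# INT still waits for the current job to finish.
Delayed::Worker.raise_signal_exceptions = :term
```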

9 comments on commit 90579c3


Hey folks. What's standard practice for DJ on Heroku? Does Delayed::Worker.raise_signal_exceptions = :term cover the case where a dyno restart gets fed up waiting and escalates to SIGKILL?


Heroku 'restarts' processes for a variety of reasons (too much memory in use is the one I was mostly running into). When they issue a restart, they do it by sending a SIGTERM, followed a few seconds later by a SIGKILL.

You can't do anything about SIGKILL, but if you respond to SIGTERM by cleaning up and terminating quickly, the SIGKILL never needs to be issued.


Thanks, nevinera.

So you use Delayed::Worker.raise_signal_exceptions = :term?

If I understand DJ correctly, the job gets flagged as failed and unlocked immediately, so when the process spins back up the workers can pick it up again without having to wait for max_run_time to pass.


My implementation of this solution was subtly different, but yes that should work.

It doesn't have to get 'flagged as failed' - that's a question of whether you re-raise the exception (or some exception).
It will usually make more sense to do so though, unless your jobs are being started continuously, and future jobs will pick up all the work being missed by this one.

The point of it is to allow you to unlock any resources the job was using, instead of just terminating. If you don't have any resources being locked, it's unnecessary.


I tested it with foreman and it works beautifully. Terminating the worker immediately quits the job, flags the DJ record with an error, then gets picked up again when the worker restarts, all thanks to this single config line. Cheers guys.


How does this deal with cases where, for example, a mailer is communicating with an SMTP server, has completely sent the request and the server has received it, and then a SignalException is raised before Ruby can close the connection and wrap up the response?

Delayed Job would then run the same job again. If there wasn't an emphasis on making jobs 100% atomic before, with this feature there surely will be. Is that assumption correct?


No, the whole point of getting exceptions is that you can handle them. You should be rescuing from SignalException, performing your cleanup, and then re-raising it.

Generally speaking, you will receive a SIGTERM, and then later a SIGKILL after you've ignored the SIGTERM for a while (times vary - it's a few seconds on Heroku). With the old model, you'd ignore the SIGTERM until you were done with the job, and if the SIGKILL came before you performed cleanup, you'd die in a dirty state. With this behavior though, you can receive the SIGTERM, initiate cleanup, and terminate the job all before the SIGKILL comes. That's the original motivation.
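In job code, that pattern looks like rescuing SignalException inside perform, releasing whatever the job holds, and re-raising so the worker still records the failure and unlocks the record. A minimal sketch — CleanupJob and its cleaned flag are hypothetical, not part of Delayed Job:

```ruby
# Sketch: a job that cleans up when the worker's trap handler raises.
# CleanupJob is illustrative; only the rescue/re-raise pattern matters.
class CleanupJob
  attr_reader :cleaned

  def initialize
    @cleaned = false
  end

  def perform
    do_work
  rescue SignalException
    @cleaned = true # stand-in for releasing locks, closing files, etc.
    raise           # re-raise so the job is marked failed and unlocked
  end

  private

  # Simulates the TERM handler firing while the job is mid-run.
  def do_work
    raise SignalException.new('TERM')
  end
end
```

Because the exception is re-raised, the failure is recorded and the lock released, so a restarted worker can pick the job up immediately instead of waiting out max_run_time.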


Can you offer up an example of what you are doing on cleanup? Just re-enqueuing the job?


Unlocking a resource in the database, deleting a file on disk, deleting rows from a database... basically anything.
If your job has state that is external to its memory, then you usually have the choice of (a) making the job able to resume, (b) making the job detect and clean up prior attempts before running, or (c) making the job clean up after itself in the event of failure.
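Strategies (b) and (c) can be as simple as a stale-state check at startup plus an ensure block. A sketch using a scratch directory — ScratchJob and the directory layout are made up for illustration:

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical job illustrating cleanup strategies (b) and (c):
# detect leftovers from a killed prior attempt, and tidy up after ourselves.
class ScratchJob
  def initialize(dir)
    @dir = dir
  end

  def perform
    # (b) a leftover scratch dir means a prior attempt died mid-run
    FileUtils.rm_rf(@dir) if Dir.exist?(@dir)
    Dir.mkdir(@dir)
    # ... real work would happen in @dir ...
  ensure
    # (c) clean up whether we finish, fail, or get a SignalException
    FileUtils.rm_rf(@dir)
  end
end
```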
