deleting lock for master? #152
Comments
I'm going to bet that that is not the issue. If you are running more than one scheduler, yes, one will take the "master" role and lock the others out. However, that lock only exists for 3 minutes (unless you override the default). If you want to inspect the key in redis, it takes the form of
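To make the takeover behavior described above concrete, here is a minimal in-memory sketch of a TTL'd master lock. node-resque's real lock is a Redis key with an expiry; the class name, method names, and timings below are illustrative only.

```javascript
// In-memory sketch of a "master" lock with a TTL. One scheduler holds the
// lock; others are locked out until it expires or is refreshed.
class MasterLock {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.holder = null;
    this.expiresAt = 0;
  }

  // Try to take the lock; returns true if schedulerId is now master.
  tryAcquire(schedulerId, now = Date.now()) {
    if (this.holder === null || now >= this.expiresAt) {
      this.holder = schedulerId; // previous holder's lock expired (or none)
    }
    if (this.holder === schedulerId) {
      this.expiresAt = now + this.ttlMs; // the leader refreshes its own TTL
      return true;
    }
    return false; // someone else holds an unexpired lock
  }
}

const lock = new MasterLock(3 * 60 * 1000); // 3-minute default, as above
console.log(lock.tryAcquire("scheduler-A")); // true: A becomes master
console.log(lock.tryAcquire("scheduler-B")); // false: locked out
// If A dies without releasing, B only wins once the TTL has elapsed:
console.log(lock.tryAcquire("scheduler-B", Date.now() + 4 * 60 * 1000)); // true
```

The "stuck lock" reported in this thread corresponds to the expiry never happening (or the key having no TTL), so `tryAcquire` keeps returning false for every live scheduler.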
I'm not sure how it got 'stuck', but that was indeed the case. I took a look at the redis key and it was locked by a scheduler that no longer existed. As soon as I deleted the key, all the delayed jobs kicked in.
I also ran into this. The lock had been stuck for a week.
My lock gets stuck all the time. It's a major problem. I'll be digging into the code to see why that happens, but if anyone has any thoughts on what's happening, I would appreciate it.
9 times out of 10 it is improper shutdown behavior. How are you running your workers, and how long do you give them before the SIGKILL signal and a hard shutdown (`kill -9`)? How long is your average job duration?
@evantahler I'm quite certain I have improper shutdown behavior, but I thought the timeout on the scheduler was meant to prevent improper shutdown from being an issue. Some of our jobs are a couple of minutes long, but most are less than a minute (and 10% or so are very short). I'm running on Heroku. I'll look at my shutdown handling and see how it can be improved.
I ended up with something like this and it seems to be working OK for now. Just thought I'd share; I'm not at all saying this is the best / most correct way.

```javascript
// Called from SIGINT and SIGTERM
const shutdownTimeout = 5000; // ms; tune to your longest-running job

async function gracefulShutdown(worker, scheduler, queue, librato) {
  const stopProcessTimeout = function () {
    // Thrown from a timer callback, this is uncaught and deliberately
    // crashes the process if a clean stop takes too long.
    throw new Error("process stop timeout reached. Terminating now.");
  };
  setTimeout(stopProcessTimeout, shutdownTimeout);
  worker.on("exit", process.exit);
  await Promise.all([worker.end(), scheduler.end(), queue.end()]);
}
```
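A variant of the same idea, for anyone who would rather not throw from a bare `setTimeout` callback (which surfaces as an uncaught exception): race the cleanup against a timeout promise, so the failure flows through the normal promise chain. The helper name, the 500 ms budget, and the stubbed `end()` calls are mine, not node-resque APIs.

```javascript
// Race a cleanup promise against a timeout; whichever settles first wins,
// and the pending timer is always cleared afterwards.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("process stop timeout reached")),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stub connections standing in for worker/scheduler/queue .end() calls:
const fastEnd = () => new Promise((resolve) => setTimeout(resolve, 10));

withTimeout(Promise.all([fastEnd(), fastEnd()]), 500)
  .then(() => console.log("clean shutdown"))
  .catch((err) => console.error(err.message));
```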
That's exactly what you should do if your application is controlled by signals (which it certainly is on Heroku). You should also stop your http server and close any connections you have open as well. Actionhero does something similar: https://github.com/actionhero/actionhero/blob/master/initializers/resque.js#L160-L166 (where those
... would you mind contributing something to the README about this? |
Absolutely. I'll send a pull request tomorrow or Monday. |
I just noticed that delayed jobs are no longer getting scheduled. Is there a way to check and see if a scheduler lock is 'stuck'? I suspect that clearing the redis db would restart the scheduler polling, but I would like to avoid that, if possible.