Seeing an issue here where jobs that raise exceptions and are rescheduled for a later retry remain stuck as Running according to GoodJob (as checked in the UI). As such, they never get reattempted.
GoodJob is running as a separate process. Killing/restarting the GJ process clears the stuck status and the jobs are then immediately run (assuming the retry time has been reached, of course).
Given that the Running state immediately clears upon restarting the process, I'm wondering if this is a stuck advisory lock?
I've been trying to track this for a bit as it's infrequent. Current version is 3.23, but have seen it going back to at least 3.19.4. It's happened in production and development.
Just saw it happen on a group of 13 simultaneously scheduled jobs where 2 of 13 jobs raised an exception for a later retry (the exception is expected in this case and not the issue itself). The retry is being handled using AJ's standard rescue/retry mechanism.
The 2 jobs that raised exceptions did so while other jobs were either starting or running. Both of those 2 ended up stuck as Running. Their retries were scheduled in the middle of other jobs completing (i.e., after some had finished and before others had).
The job class of the stuck jobs does not use concurrency limits. 12 of the 13 jobs were for that same class.
Older instances of this problem have returned a "Jobs were interrupted" message--at least sometimes. This latest instance didn't, so it's possible there was something else going on in the older cases.
Any thoughts? How can I help debug this?
I've seen a couple more instances of this. It turns out jobs only sometimes report Running; other times they report Queued. So the reported state may be more a symptom than a cause.
Perhaps a clue is that trying to Reschedule a Queued job results in an exception indicating the job is already advisory locked.
Restarting the GoodJob process continues to immediately unlock the affected jobs and let them run.
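One way to check the stuck-lock theory might be to look at PostgreSQL's own view of advisory locks while a job is stuck. Session-level advisory locks stay held by the connection that took them until they are released or the session ends, which would fit with a restart clearing things. Below is a minimal sketch, assuming a Rails console connected to the same database GoodJob uses. `advisory_locks` and `locks_by_pid` are hypothetical helper names, not part of GoodJob; the query only touches the standard `pg_locks` and `pg_stat_activity` views.

```ruby
# Hypothetical diagnostic helpers; these are NOT part of GoodJob.
# Lists every advisory lock PostgreSQL currently knows about, together
# with the backend (connection) that holds or is waiting on it.
ADVISORY_LOCKS_SQL = <<~SQL
  SELECT l.pid, l.classid, l.objid, l.granted,
         a.application_name, a.state, a.query_start
  FROM pg_locks l
  LEFT JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.locktype = 'advisory'
  ORDER BY l.pid
SQL

# `connection` is anything that responds to `select_all(sql)`,
# e.g. ActiveRecord::Base.connection.
def advisory_locks(connection)
  connection.select_all(ADVISORY_LOCKS_SQL).to_a
end

# Groups lock rows by the backend pid holding them, so a connection
# sitting on locks it should have released is easier to spot.
def locks_by_pid(rows)
  rows.group_by { |row| row["pid"] }
end
```

In a console this could be run as `locks_by_pid(advisory_locks(ActiveRecord::Base.connection))`. If a backend belonging to the GoodJob process shows advisory locks while none of its threads are actually executing a job, that would support the stuck-lock idea; comparing the output before and after restarting the process should show whether those rows disappear.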