Jobs that raise exceptions sometimes stay stuck as Running #1249

zarqman · 2024-02-15T20:19:05Z

Seeing an issue here where jobs that raise exceptions and are rescheduled for a later retry remain stuck as Running according to GoodJob (as checked in the UI). As such, they never get reattempted.

GoodJob is running as a separate process. Killing/restarting the GJ process clears the stuck status and the jobs are then immediately run (assuming the retry time has been reached, of course).

Given that the Running state immediately clears upon restarting the process, I'm wondering if this is a stuck advisory lock?
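One way to test the stuck-lock theory would be to list the advisory locks Postgres currently holds and see which backend owns them. The following is only a rough sketch, not something GoodJob provides: `pg_locks` and `pg_stat_activity` are standard PostgreSQL system views, but the helper names `held_advisory_locks` and `advisory_locks_by_pid` are made up for illustration, and `connection` is assumed to be anything that responds to `select_all(sql)` with hash-like rows (as `ActiveRecord::Base.connection` does).

```ruby
# Sketch: list granted advisory locks along with the backend that holds each.
# Both views queried here are standard PostgreSQL system views.
ADVISORY_LOCKS_SQL = <<~SQL
  SELECT l.pid, l.classid, l.objid, l.objsubid,
         a.application_name, a.state, a.state_change
  FROM pg_locks l
  LEFT JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.locktype = 'advisory' AND l.granted
  ORDER BY l.pid
SQL

# Hypothetical helper: `connection` is assumed to respond to #select_all(sql)
# and return an enumerable of hash-like rows with string keys.
def held_advisory_locks(connection)
  connection.select_all(ADVISORY_LOCKS_SQL).map(&:to_h)
end

# Hypothetical helper: group the held locks by the backend pid that owns them.
def advisory_locks_by_pid(connection)
  held_advisory_locks(connection).group_by { |row| row["pid"] }
end
```

From a Rails console this could be called as `advisory_locks_by_pid(ActiveRecord::Base.connection)`. If a lock is still listed for a backend whose `state` is `idle` well after the job raised, that would be consistent with a lock that was never released, though it would not by itself show why.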

I've been trying to track this for a bit as it's infrequent. Current version is 3.23, but have seen it going back to at least 3.19.4. It's happened in production and development.

Just saw it happen on a group of 13 simultaneously scheduled jobs where 2 of 13 jobs raised an exception for a later retry (the exception is expected in this case and not the issue itself). The retry is being handled using AJ's standard rescue/retry mechanism.

The 2 jobs that raised exceptions did so while other jobs were either starting or running. Both of those 2 ended up stuck as Running. Their retries were scheduled in the middle of other jobs completing (i.e., after some, before others).

The job class of the stuck jobs does not use concurrency limits. 12 of the 13 jobs were for that same class.

Older instances of this problem have, at least sometimes, returned a "Jobs were interrupted" message. This latest instance didn't, so it's possible there was something else going on in the older cases.

Any thoughts? How can I help debug this?

zarqman · 2024-02-26T05:43:59Z

I've seen a couple more instances of this. It turns out jobs only sometimes report Running. Other times they report Queued. So, the reported state may be more a symptom than a cause.

Perhaps a clue is that trying to Reschedule a Queued job results in an exception indicating the job is already advisory locked.

Restarting the GoodJob process continues to immediately unlock the affected jobs and let them run.
