Managing jobs with errors #929

oyeanuj · 2023-04-15T19:07:54Z

Hi @bensheldon, thank you for creating this library; it's amazing! This may be less of a bug report and more of a request for best practices or documentation.

Context: I'm using good_job on Heroku with an API-only Rails 6 app. Because it runs in API mode, I haven't set up the dashboard yet, so I'm looking for non-dashboard solutions to the problems below:

Sometimes I have jobs that fail, mostly related to email (inactive address, wrong address, and so on), and I notice that this creates hundreds of jobs in my good_jobs table. If that goes on for too long, my regular jobs start to be affected even if they are in an urgent queue. I've tried deleting the duplicate or problematic jobs, only to have them created again, and I've tried deleting the original records, only to have good_job try them again and raise an ActiveRecord::RecordNotFound error.

So I am wondering, what's the best practice when you have problematic records? Is there a recommended way to delete the job that works best with good_job and/or are there settings that I'm overlooking? FWIW, I've tried to look through the documentation but nothing has fixed it so far, hence the questions here :)


bensheldon · 2023-04-16T02:58:27Z

Thanks for opening this issue. That's not good!

It sounds like it could be one or both of these:

- You have config.good_job.retry_on_unhandled_error = true (or GoodJob.retry_on_unhandled_error = true). That will cause jobs to be retried immediately when an unrescued exception is raised. I made the default false as of GoodJob v3 to help avoid the situation I think you may be experiencing.
- You have a retry_on handler in your job that does not have wait: :exponentially_longer set (wait: is an option of Active Job's retry_on, not of rescue_from).
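To make the effect of the wait setting concrete, here is a rough sketch comparing a fixed retry delay with the schedule that Active Job documents for wait: :exponentially_longer, which is (executions**4) + 2 seconds before jitter is applied. The helper names are made up for illustration, the 3-second fixed delay is Active Job's documented default for retry_on, and none of this is GoodJob code.

```ruby
# Delay in seconds before the next attempt under Active Job's documented
# :exponentially_longer schedule, with jitter left out for clarity.
def exponentially_longer_delay(executions)
  (executions**4) + 2
end

# Hypothetical helper: total seconds spent waiting across `retries` retries,
# where the block returns the delay for a given execution count.
def total_wait(retries)
  (1..retries).sum { |executions| yield(executions) }
end

fixed = total_wait(5) { |_executions| 3 }
backoff = total_wait(5) { |executions| exponentially_longer_delay(executions) }
delays = (1..5).map { |executions| exponentially_longer_delay(executions) }

puts "fixed 3s wait, 5 retries: #{fixed}s"
puts "exponentially_longer, 5 retries: #{backoff}s"
puts "per-retry delays: #{delays.inspect}"
```

With a short fixed delay, a batch of permanently failing jobs comes back to the queue every few seconds; with the growing delay, the same jobs spread out over minutes and compete far less with other work.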

I do have a section in the readme that tries to explain this stuff, but I'll admit I haven't done a pass on it recently, so there is probably room for improvement:

https://github.com/bensheldon/good_job/#retries

(Be sure to read to the end of that section, because you have to duplicate the retry behavior for Action Mailer deliver_later jobs too.)
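As a rough illustration of the settings discussed above, the fragment below leaves automatic retries of unhandled errors off and gives both a regular job and Action Mailer's delivery job an explicit backoff. WelcomeEmailJob, the choice of StandardError, and attempts: 5 are placeholders rather than recommendations from this thread, and the mailer job class name can differ on older Rails versions, so check it against your own app.

```ruby
# config/initializers/good_job.rb
Rails.application.configure do
  # Already the default since GoodJob v3; written out here to be explicit.
  config.good_job.retry_on_unhandled_error = false
end

# deliver_later jobs do not inherit from ApplicationJob, so the retry
# behavior has to be declared on the mailer job class as well.
ActionMailer::MailDeliveryJob.retry_on StandardError,
                                       wait: :exponentially_longer,
                                       attempts: 5

# app/jobs/welcome_email_job.rb (hypothetical job for illustration)
class WelcomeEmailJob < ApplicationJob
  # Back off between attempts instead of retrying on a short fixed delay.
  # After the last attempt Active Job re-raises the error.
  retry_on StandardError, wait: :exponentially_longer, attempts: 5

  def perform(user_id)
    # ... send the email ...
  end
end
```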

In terms of recommendations, if you're able, I think running the Good Job Dashboard would give you the most administrative insights into what is happening and actions to take to discard these jobs: https://github.com/bensheldon/good_job/#api-only-rails-applications

Alternatively, please take a look at the methods defined on Jobs themselves, as it's likely that you're simply destroying the record while it's executing and you should instead use one of these methods that safely take a lock on the record:

good_job/app/models/good_job/job.rb, lines 201 to 204 in ec385cf:

    # Discard a job so that it will not be executed further.
    # This action will add a {DiscardJobError} to the job's {Execution} and mark it as finished.
    # @return [void]
    def discard_job(message)
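A minimal sketch of how that method might be used from a Rails console instead of deleting rows by hand. The helper name is hypothetical, and the query assumes that GoodJob::Job exposes a finished_at column and a serialized_params JSON column containing job_class, as in GoodJob v3; verify both against the version you have installed before running it.

```ruby
# Hypothetical helper: discard every unfinished job of one class through
# GoodJob's own model method rather than destroying the records directly.
# Returns the number of jobs that were discarded.
def discard_unfinished_jobs(job_class_name, reason)
  discarded = 0
  GoodJob::Job
    .where(finished_at: nil)
    .where("serialized_params->>'job_class' = ?", job_class_name)
    .find_each do |job|
      # discard_job records the reason and marks the job as finished.
      job.discard_job(reason)
      discarded += 1
    end
  discarded
end
```

For example, discard_unfinished_jobs("ActionMailer::MailDeliveryJob", "Recipient address is invalid") would target queued mail deliveries; the class name and message here are only examples.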

bensheldon closed this as completed Aug 13, 2023
