SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded #114

bananastalktome · 2024-02-28T18:29:23Z

https://guides.rubyonrails.org/active_job_basics.html#retrying-or-discarding-failed-jobs notes that failed jobs are not retried unless the jobs are configured otherwise, however SQS backed ActiveJobs that raise exceptions which are not explicitly discarded or retried continue to be run. This occurs with either the amazon_sqs or amazon_sqs_async adapters. For example, a simple job such as:

class SampleJob < ActiveJob::Base
  queue_as :default

  def perform
    raise Exception, "testing"
  end
end

never gets deleted from the queue, and will be fetched and run again after the messages visibility_timeout expires. I am not sure if this is desired behavior or not, but it caught me off guard as it is different than the above note in the ActiveJob rails docs. A similar test with the Resque gem (being the only other queueing service I have familiarity with and an active project using) removed the job after a single failed run.

If this is intended behavior of the Rails SQS ActiveJob backend, a note in the Readme that the behavior differs from the rails noted behavior would be helpful.

Versions:
Rails 7.1.3
Ruby 3.2.2
aws-sdk-rails 3.10.0
SQS standard queue (not FIFO)
OSX Ventura 13.6.4

edit:
It looks like even if a retry_on Exception, wait: 40.seconds, attempts: 2 is added to the job, it's not removed after 2 attempts as would be expected. Are retries with limited attempts not supported with the SQS backend?

The text was updated successfully, but these errors were encountered:

alextwoods · 2024-02-28T22:30:34Z

Yes - you are correct that Exceptions currently are being left on the queue (and therefore retried, based on the SQS queues policies). The original intention of leaving messages on the queue here was to ensure any generic/transient errors in the executor/message/job runner were retired. However, you are right that this ends up creating a conflicting and confusing experience - especially when you do configure a retry_on - which will result in the job being retried more than the intended number of times. In the example you gave with attempts: 2, active job will retry the job twice by deleting it from the queue and queueing another, new SQS message. Then once it reaches its max attempts, the exception bubbles up and is caught by the exception handler and left on the queue to be re-tried again and will continue until it reaches the queue's configured max reads (and then should be moved to a DLQ).

I do believe we should fix this behavior to align with the rails ActiveJob documentation and document the behavior and its interaction with SQS retry mechanisms in the readme.

bananastalktome changed the title ~~SQS ActiveJobs are retried if uncaught exceptions are raised~~ SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded Feb 28, 2024

alextwoods self-assigned this Feb 28, 2024

alextwoods added the bug This issue is a bug. label Feb 28, 2024

alextwoods mentioned this issue Feb 28, 2024

Add retry_standard_errors config for SQS ActiveJob #115

Merged

alextwoods closed this as completed in #115 Mar 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded #114

SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded #114

bananastalktome commented Feb 28, 2024 •

edited

Loading

alextwoods commented Feb 28, 2024

SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded #114

SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded #114

Comments

bananastalktome commented Feb 28, 2024 • edited Loading

alextwoods commented Feb 28, 2024

bananastalktome commented Feb 28, 2024 •

edited

Loading