
SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded #114

Closed
bananastalktome opened this issue Feb 28, 2024 · 1 comment · Fixed by #115
Labels
bug This issue is a bug.


bananastalktome commented Feb 28, 2024

https://guides.rubyonrails.org/active_job_basics.html#retrying-or-discarding-failed-jobs notes that failed jobs are not retried unless configured otherwise. However, SQS-backed ActiveJobs that raise exceptions which are not explicitly discarded or retried continue to be run. This occurs with both the amazon_sqs and amazon_sqs_async adapters. For example, a simple job such as:

class SampleJob < ActiveJob::Base
  queue_as :default

  def perform
    raise Exception, "testing"
  end
end

is never deleted from the queue, and is fetched and run again after the message's visibility_timeout expires. I am not sure whether this is the desired behavior, but it caught me off guard because it differs from the note in the Rails ActiveJob docs linked above. A similar test with the Resque gem (the only other queueing service I am familiar with and have an active project using) removed the job after a single failed run.

If this is the intended behavior of the Rails SQS ActiveJob backend, a note in the README that it differs from the behavior documented by Rails would be helpful.

Versions:
Rails 7.1.3
Ruby 3.2.2
aws-sdk-rails 3.10.0
SQS standard queue (not FIFO)
OSX Ventura 13.6.4

edit:
It looks like even if retry_on Exception, wait: 40.seconds, attempts: 2 is added to the job, the job is not removed from the queue after 2 attempts as would be expected. Are retries with limited attempts not supported with the SQS backend?

@bananastalktome bananastalktome changed the title SQS ActiveJobs are retried if uncaught exceptions are raised SQS ActiveJobs are retried if uncaught exceptions are raised or retry attempts are exceeded Feb 28, 2024
@alextwoods alextwoods self-assigned this Feb 28, 2024
@alextwoods
Contributor

Yes, you are correct: messages for jobs that raise exceptions are currently left on the queue (and therefore retried, based on the SQS queue's policies). The original intention of leaving messages on the queue was to ensure that any generic/transient errors in the executor/message/job runner were retried. However, you are right that this creates a conflicting and confusing experience, especially when you do configure a retry_on, because the job ends up being retried more than the intended number of times. In the example you gave with attempts: 2, Active Job performs its own retries by deleting the message from the queue and enqueueing a new SQS message. Once the job reaches its max attempts, the exception bubbles up, is caught by the exception handler, and the message is left on the queue to be retried again. This continues until the message reaches the queue's configured maximum receive count (at which point it should be moved to a DLQ).

I do believe we should fix this behavior to align with the Rails ActiveJob documentation, and document the behavior and its interaction with SQS retry mechanisms in the README.
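
To make the interaction above concrete, here is a rough toy model in plain Ruby. It makes no AWS or Rails calls, and every name in it (Message, simulate, max_receive_count, delete_on_exhaustion) is hypothetical and chosen only for illustration. It assumes that attempts counts the first run, that a message left on the queue is redelivered with an unchanged body, and that the queue dead-letters a message after max_receive_count receives. It is a sketch of the reasoning, not the gem's actual implementation.

```ruby
# Toy model only: no AWS or Rails calls. All names here are hypothetical.
# executions    = value serialized in the message body (fixed per message)
# receive_count = how many times SQS has delivered this particular message
Message = Struct.new(:executions, :receive_count)

# Simulates a job that always raises. Returns [total_runs, dead_lettered].
#   attempts:             the retry_on limit (assumed to count the first run)
#   max_receive_count:    the queue's redrive limit before the DLQ
#   delete_on_exhaustion: true models deleting the message once the job's
#                         retries are exhausted (the aligned behavior)
def simulate(attempts:, max_receive_count:, delete_on_exhaustion:)
  queue = [Message.new(0, 0)]
  total_runs = 0
  dead_lettered = 0

  until queue.empty?
    message = queue.shift
    message.receive_count += 1
    total_runs += 1
    executions = message.executions + 1 # this run fails

    if executions < attempts
      # retry_on path: a new message is enqueued and the current one deleted
      queue.push(Message.new(executions, 0))
    elsif delete_on_exhaustion
      # aligned behavior: the message is deleted, so nothing is redelivered
    elsif message.receive_count < max_receive_count
      # reported behavior: the same message reappears after its visibility timeout
      queue.push(message)
    else
      # receive limit reached: the message goes to the DLQ
      dead_lettered += 1
    end
  end

  [total_runs, dead_lettered]
end

reported = simulate(attempts: 2, max_receive_count: 3, delete_on_exhaustion: false)
aligned  = simulate(attempts: 2, max_receive_count: 3, delete_on_exhaustion: true)
puts "reported: runs=#{reported[0]} dead_lettered=#{reported[1]}"
puts "aligned:  runs=#{aligned[0]} dead_lettered=#{aligned[1]}"
```

Under these assumptions, a job with attempts: 2 on a queue with a receive limit of 3 runs four times in the reported model and twice in the aligned one, which is the "retried more than the intended number of times" effect described above.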
