Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference in error behavior when a job is undefined #23

Closed
daichi5 opened this issue Jan 20, 2023 · 4 comments · Fixed by #42
Closed

Difference in error behavior when a job is undefined #23

daichi5 opened this issue Jan 20, 2023 · 4 comments · Fixed by #42

Comments

@daichi5
Copy link

daichi5 commented Jan 20, 2023

Hi! We have migrated DelayedJob to Delayed and are running it.
While observing the worker, we came across an error and found a difference in behavior.

Here is the stack trace in the last_error column.

Job failed to load: undefined class/module Sample. Handler: "--- !ruby/object:Delayed::JobWrapper
job_data:
job_class: Sample
job_id: 0f9f613a-682b-439c-8169-d45a4984e8eb
provider_job_id:
queue_name: v2
priority:
arguments: &2 []
executions: 0
exception_executions: &1 {}
locale: en
timezone: UTC
enqueued_at: '2023-01-20T11:51:44Z'
job: !ruby/object:Sample
arguments: []
job_id: 0f9f613a-682b-439c-8169-d45a4984e8eb
queue_name: v2
priority:
executions: 0
exception_executions: *1
timezone: UTC
provider_job_id:
serialized_arguments: *2
locale: en
enqueued_at: '2023-01-20T11:51:44Z'
"
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/backend/base.rb:84:in rescue in payload_object' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/backend/base.rb:81:in payload_object'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/backend/base.rb:133:in max_run_time' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:194:in max_run_time'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:137:in block in run' /usr/local/lib/ruby/3.0.0/benchmark.rb:308:in realtime'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:136:in run' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:210:in block in run_job'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:61:in block in initialize' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:66:in execute'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:40:in run_callbacks' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:210:in run_job'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:102:in block (4 levels) in work_off' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/plugins/connection.rb:7:in block (3 levels) in class:Connection'
/usr/local/bundle/gems/activerecord-6.1.6.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:462:in with_connection' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/plugins/connection.rb:6:in block (2 levels) in class:Connection'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:79:in block (2 levels) in add' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:61:in block in initialize'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:79:in block in add' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:66:in execute'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/lifecycle.rb:40:in run_callbacks' /usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:121:in run_thread_callbacks'
/usr/local/bundle/gems/delayed-0.4.0/lib/delayed/worker.rb:101:in block (3 levels) in work_off' /usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:352:in run_task'
/usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:343:in block (3 levels) in create_worker' /usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334:in loop'
/usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334:in block (2 levels) in create_worker' /usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333:in catch'
/usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333:in `block in create_worker'

This error occurs when a job is enqueued even though the class definition of the job does not exist in the worker.
(For example, this can occur when adding a new job definition and trying to synchronize the webapp, worker Deployment with ArgoCD.)

I noticed that some kind of error occurs when executing a job, whether it is a DelayedJob or Delayed, but the difference is that Delayed does not retry and the failed_at column is updated.

The situation of enqueuing a non-existent job is not good, but we expect the worker to retry and complete the job execution once the code is up-to-date.

I am assuming that this difference in behavior exists because of the difference in serialization and deserialization of jobs.

In the case of Delayed, the job key is added to the serialization string of the job, and an error seems to occur when deserializing it.
DeserializationError is handled so that it is not retried.
(In DelayedJob, there is no job key and no error occurs during deserialization.)

--- !ruby/object:Delayed::JobWrapper
job_data:
 job_class: Sample
 job_id: c5d50b73-c8c9-4df0-8ea3-d2f6ca45b85d
 provider_job_id: 
 queue_name: v2
 priority: 
 arguments: &2 []
 executions: 0
 exception_executions: &1 {}
 locale: en
 timezone: UTC
 enqueued_at: '2023-01-20T04:17:09Z'
job: !ruby/object:Sample
 arguments: []
 job_id: c5d50b73-c8c9-4df0-8ea3-d2f6ca45b85d
 queue_name: v2
 priority: 
 executions: 0
 exception_executions: *1
 timezone: UTC
 provider_job_id: 
 serialized_arguments: *2
 locale: en
 enqueued_at: '2023-01-20T04:17:09Z'

I assume that the job can be executed even if this job key does not exist.

Is there any intention behind this difference in specifications?

@daichi5
Copy link
Author

daichi5 commented Jan 20, 2023

I thought one solution would be to not include the job key in the handler column. What do you think?👀

@smudge
Copy link
Member

smudge commented Jan 20, 2023

Hi @daichi5! Thanks for the report, and good catch! You're spot on -- that job: section is redundant and unnecessary, and was not the result of anything intentional. The @job ivar is simply there to support memoization of the instance of the payload class, which we delegate method calls to so that API methods like max_run_time and max_attempts are honored by ActiveJob-executed jobs.

@smudge
Copy link
Member

smudge commented Jan 20, 2023

Circling back! This behavior should be changed in the 0.5.0 release now.

@smudge smudge closed this as completed Jan 20, 2023
@okadato623
Copy link

okadato623 commented Jan 23, 2023

@smudge
Hi, I have checked the recent 0.5.0 update.

Although undefined jobs are now retried, the rescheduling does not seem to be working properly for undefined jobs as a NameError is generated when job.max_attempts is referenced.

As a result, attempts value is not incremented and the job is retried indefinitely. Could you please check this?

@smudge smudge reopened this Jan 23, 2023
smudge added a commit to smudge/delayed that referenced this issue Jun 26, 2024
This ensures that exceptions raised in thread callback hooks are rescued
and properly mark jobs as failed.

Fixes Betterment#23 and Betterment#41
@smudge smudge closed this as completed in fd97a38 Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants