Delayed job stops, the jobs are stuck in postgres #195

Open
Chandananimmu opened this issue Aug 30, 2021 · 6 comments

Comments

@Chandananimmu

@albus522 I'm using delayed_job 4.1.9 and delayed_job_active_record 4.1.6. When I try to send 10,000 emails via delayed job, it gets stuck and the delayed job worker stops working, while the jobs remain in Postgres. Could you please help me resolve this?

@kaylareopelle

I think I may be having a similar issue! I'm interested to learn about any solutions folks have found.

@davidkrider

I get these in my log:

I, [2022-02-10T21:29:22.975757 #3950] INFO -- : 2022-02-10T21:29:22-0500: [Worker(delayed_job host:miner pid:3950)] Error while reserving job: PG::UnableToSend: SSL SYSCALL error: EOF detected
I, [2022-02-10T21:29:27.977649 #3950] INFO -- : 2022-02-10T21:29:27-0500: [Worker(delayed_job host:miner pid:3950)] Error while reserving job: PG::UnableToSend: no connection to the server
... x9
F, [2022-02-10T21:30:07.993164 #3950] FATAL -- : PG::UnableToSend: no connection to the server

I get 9 copies of that second message, then it gives up, and the delayed_job daemon falls over.
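The pattern described here (log each failed reservation, then give up after a run of consecutive failures) can be sketched roughly as follows. This is a simplified illustration, not delayed_job's source: the class names `ReserveLoop` and `FatalReserveError` are hypothetical, and the threshold of 10 is an assumption chosen only because it matches the one-plus-nine error lines in the log above.

```ruby
# Hypothetical names throughout; this is NOT the gem's own code.
class FatalReserveError < StandardError; end

class ReserveLoop
  # Assumed threshold, picked to match the log above (1 + 9 errors, then FATAL).
  MAX_CONSECUTIVE_FAILURES = 10

  attr_reader :failed_count

  # The block stands in for whatever call fetches the next job from the database.
  def initialize(&reserve)
    @reserve = reserve
    @failed_count = 0
  end

  # Try to reserve one job. Returns the job, or nil after a recoverable failure.
  # Raises FatalReserveError once too many attempts in a row have failed.
  def reserve_once
    job = @reserve.call
    @failed_count = 0 # any success resets the counter
    job
  rescue StandardError => e
    @failed_count += 1
    warn "Error while reserving job: #{e.message}"
    raise FatalReserveError, e.message if @failed_count >= MAX_CONSECUTIVE_FAILURES
    nil
  end
end
```

With a design like this, a database that is unreachable for longer than the retry window takes the worker process down with it, so the daemon needs an external supervisor (or a manual restart) to come back once the database is available again.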

@brijeshs-atharvasystem

I am also facing the same issue, but for one specific queue only.

  • delayed_job (4.1.11)
  • delayed_job_active_record (4.1.7)
  • Postgres 15

In my case, it does not happen with every job. I have a few queues, and the issue occurs in only one of them. Even in that queue, jobs are performed and deleted most of the time, but sometimes a job is not deleted. When that happens, the pending jobs get stuck and are not processed, and I receive this error:

Error: execution expired (Delayed::Worker.max_run_time is only 14400 seconds) (Delayed::WorkerTimeout)

Note: I have been facing this issue since I upgraded Postgres from version 11 to 15.

Please let me know if anyone has found a solution for this issue.
Thanks

@davidkrider

My app and database run in Azure. In my case, I finally figured out that Microsoft was updating my Postgres instance and restarting it without any notification. (This seems crazy to me.) This was killing my long-running jobs, and leaving me with this error message. I thought the messages were confusing, because they were telling me that the database had dropped, but it seemed perfectly fine. However, they were actually correct. I finally found a setting on the Postgres service to limit updates and restarts to only critical fixes, and this has stopped happening for me.

@brijeshs-atharvasystem

@davidkrider Thanks for the reply. In my case, I upgraded Postgres to version 15, but I am not sure that is the cause of this issue. If the Postgres version were the problem, the other queues would be affected too, which is not the case here.

I observed that the job is performed but is not unlocked afterwards (locked_at is not updated), so it is not deleted.
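To illustrate why a job whose lock is never released can sit in the table for hours, here is a minimal sketch of a lock-expiry check. It rests on an assumption about the Active Record backend: that a locked row is only considered for pickup again once its `locked_at` is older than `Delayed::Worker.max_run_time` (14400 seconds in the error quoted earlier). The `Job` struct and the `reservable?` helper are hypothetical stand-ins, not the gem's API.

```ruby
# Hypothetical, simplified model of a jobs-table row; not the gem's own classes.
Job = Struct.new(:id, :run_at, :locked_at, :failed_at, keyword_init: true)

# Seconds; the value quoted in the WorkerTimeout error earlier in this thread.
MAX_RUN_TIME = 14_400

# Assumed rule: a job can be picked up when it is due, has not permanently
# failed, and is either unlocked or has held its lock longer than max_run_time.
def reservable?(job, now, max_run_time = MAX_RUN_TIME)
  return false unless job.failed_at.nil?
  return false if job.run_at > now

  job.locked_at.nil? || job.locked_at < now - max_run_time
end
```

Under that assumption, a job locked one hour ago is skipped by every worker, and only becomes eligible again after the full four hours have passed, which would be consistent with the timeout error reported above.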

@brijeshs-atharvasystem

It seems like there was some issue with Ruby 2.7.3. I downgraded Ruby to 2.6.5, and all queues are working fine.
