
Are exceptions in _maintain_queue unrecoverable? #88

Description

@ZadenRB

I'm running chancy as part of a FastAPI application. A couple of times, transient network or DB issues have caused errors in _maintain_queue or _maintain_queues. The root cause is typically a bad connection in the pool, or a connection being killed for exceeding the idle-in-transaction timeout (essentially, some psycopg exception being raised). Whatever the cause, my concern is what happens next. An exception like this is logged:

Task exception was never retrieved
future: <Task finished name='queue_name' coro=<Worker._maintain_queue() done, defined at /usr/local/lib/python3.11/site-packages/chancy/worker.py:320> exception=IdleInTransactionSessionTimeout('terminating connection due to idle-in-transaction timeout')>

After that, the worker stops picking up jobs from that queue entirely, and I have to manually restart the process before jobs are processed again. I'd expect the same kind of failure in _maintain_queues to cause a similar problem, where the worker would no longer pick up any queue updates.
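For reference, this failure mode is consistent with how asyncio treats any background task that raises. The task finishes with the exception stored on it, nothing restarts it, and asyncio only logs "Task exception was never retrieved" when the finished task is garbage-collected. Below is a minimal sketch, assuming the per-queue loop is started as a fire-and-forget task (which that log message suggests). `maintain_queue` is a hypothetical stand-in, not chancy's actual method:

```python
import asyncio


# Hypothetical stand-in for a per-queue loop like Worker._maintain_queue;
# it only records each poll and fails on the third one.
async def maintain_queue(polls: list) -> None:
    n = 0
    while True:
        n += 1
        polls.append(n)  # One "poll for jobs" iteration.
        if n == 3:
            # Simulates a transient DB error such as a killed connection.
            raise ConnectionError("terminating connection due to idle-in-transaction timeout")
        await asyncio.sleep(0.01)


async def main():
    polls: list = []
    # Fire-and-forget: nothing awaits this task or checks its result.
    task = asyncio.create_task(maintain_queue(polls), name="queue_name")
    await asyncio.sleep(0.2)  # The rest of the application keeps running.
    return polls, task


polls, task = asyncio.run(main())
print(polls)  # -> [1, 2, 3]: polling stopped for good after the error
print(task.done(), repr(task.exception()))
```

The event loop and the rest of the application stay healthy. That is why the only visible symptom is a stalled queue until the process is restarted.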

I really have two questions:

  1. Does the failure case I've outlined here match what you would expect? I'm 100% certain I've seen jobs stuck in the queue for 20+ minutes after an error like the one above, but I'm not very familiar with the Chancy codebase, so I'd love to be corrected if I'm wrong about the root cause.
  2. Would you consider changing this behavior or adding exception handling for it? Or am I better off forking or subclassing the worker (which I'll likely do in the meantime)?
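In the meantime, one workaround in the spirit of question 2 is to run each long-lived loop under a small supervisor. The supervisor logs the exception and restarts the loop with exponential backoff. This is a sketch under stated assumptions. `supervise` is a hypothetical helper and not part of chancy. How it would be wired into a Worker subclass depends on how chancy actually starts `_maintain_queue`, which isn't shown here.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

log = logging.getLogger(__name__)


async def supervise(
    factory: Callable[[], Awaitable[None]],
    *,
    name: str,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    max_restarts: Optional[int] = None,
) -> None:
    """Hypothetical helper: re-run factory() each time it raises, with backoff."""
    delay = initial_delay
    restarts = 0
    while True:
        try:
            await factory()  # A fresh coroutine on every attempt.
            return  # The loop exited normally; nothing to restart.
        except Exception:
            # CancelledError is a BaseException (Python 3.8+), so worker
            # shutdown is not swallowed here.
            restarts += 1
            if max_restarts is not None and restarts > max_restarts:
                raise
            log.exception("%s crashed (restart %d); retrying in %.2fs", name, restarts, delay)
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)


# Usage: a loop that hits two transient errors, then exits cleanly.
async def demo() -> int:
    attempts = 0

    async def flaky_loop() -> None:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("bad connection in pool")

    await supervise(flaky_loop, name="queue_name", initial_delay=0.01)
    return attempts


print(asyncio.run(demo()))  # -> 3
```

One trade-off: the backoff delay is never reset after a long healthy run. A fuller version might reset it once the loop has stayed up for a while, and it might also recycle the affected pool connection before retrying.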

Labels: bug
