I'm running Chancy as part of a FastAPI application. Due to transient network or database issues, I've seen errors occur in _maintain_queue or _maintain_queues a couple of times. The root cause is typically a bad connection in the pool, or a connection being killed for exceeding the idle-in-transaction timeout (essentially, some psycopg exception gets thrown). Whatever the cause, my concern is what happens next. An exception like this is logged:
Task exception was never retrieved
future: <Task finished name='queue_name' coro=<Worker._maintain_queue() done, defined at /usr/local/lib/python3.11/site-packages/chancy/worker.py:320> exception=IdleInTransactionSessionTimeout('terminating connection due to idle-in-transaction timeout')>
After that, the worker stops picking up jobs from that queue entirely, and I have to manually restart the process to begin processing jobs again. I'd imagine the same failure in _maintain_queues would lead to a similar issue, where queue updates would no longer be picked up by the worker.
I really have two questions:
- Does the failure case I've outlined here track with what you would expect? I'm 100% certain I've seen jobs stuck in the queue for 20+ minutes after an error like the one above, but I'm not so familiar with the Chancy codebase, so would love to be corrected if I'm wrong about the root cause.
- Is this behavior something you would consider changing, for example by adding exception handling around these loops? Or am I better off forking or subclassing the worker (which I'll likely do in the meantime)?