I have recently started upgrading my environment to Python 3.9, and all of a sudden I started getting deadlocks. Since only Linux uses the fork start method for ProcessPoolExecutor by default, this is a Linux-specific problem.
Reproducing the deadlock
After tracking down the cause of the problem, I found that the following snippet reproduces it (I ran it via docker for an easy reference point):
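A minimal sketch along the lines of my repro (simplified here; foo is a placeholder task that just echoes its argument):

```python
import concurrent.futures
from celery import Celery

app = Celery('tasks')

@app.task(bind=True)
def foo(self, x):
    return x

if __name__ == '__main__':
    futures = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        for x in range(10):
            futures[executor.submit(foo, x)] = x
        for fut in concurrent.futures.as_completed(futures):
            print(fut.result() + 1)  # counts 1..10 when nothing deadlocks
```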
You should see it count from 1 to 10, but it will most likely stop in the middle somewhere, as half the workers deadlock. If it doesn't hang, keep rerunning it until it deadlocks.
As this is a timing issue, it may be harder to reproduce on some machines, but every Linux machine I've tried deadlocks almost every time.
What I believe changed to cause this behavior
I've been able to track this down to Python 3.9.0a6: that release and every version after it has an extremely high chance of deadlocking when the number of workers is at least 4, while Python 3.6-3.8 and the 3.9 alphas up to 3.9.0a5 have never deadlocked for me. I suspect it has to do with https://bugs.python.org/issue40089, where _at_fork_reinit() was added.
It appears a thread in the parent process is holding a mutex at the moment another thread forks the worker processes, so the new workers start with locks already in the "locked" state and no thread left to release them.
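As a stripped-down illustration of that hazard (a hypothetical sketch of the general fork-while-locked problem, not the actual Celery/executor code):

```python
import os
import threading
import time

lock = threading.Lock()

def hold():
    with lock:
        time.sleep(1.0)  # hold the lock across the fork below

threading.Thread(target=hold).start()
time.sleep(0.1)  # give the helper thread time to acquire the lock

pid = os.fork()
if pid == 0:
    # Child: the lock was copied in the "locked" state, and the thread
    # that held it doesn't exist here, so this acquire blocks forever.
    lock.acquire()
    os._exit(0)
os.waitpid(pid, 0)  # the parent hangs here, demonstrating the deadlock
```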
I have confirmed in the debugger that this is happening: the locks start out in the "locked" state in the forked workers. And this is the part that gets very confusing, as it seems to be the exact opposite of what the fix in 3.9.0a6 claimed to do:
Add a private _at_fork_reinit() method to :class:`_thread.Lock`, :class:`_thread.RLock`, :class:`threading.RLock` and
:class:`threading.Condition` classes: reinitialize the lock at fork in the child process, reset the lock to the
unlocked state. Rename also the private _reset_internal_locks() method of :class:`threading.Event` to _at_fork_reinit().
Is this a regression introduced by this fix?
Deep dive into the problem
It appears that the code is getting deadlocked on (at least) two locks of concern.
The first lock comes from Celery.tasks, a cached_property, which takes a lock here.
Depending on the timing, there is a second lock it will hang on here, during app.finalize.
There may be additional locks I was unable to track down, but at least these two cause issues.
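For context, the lock-guarded lazy-attribute pattern involved looks roughly like this (a paraphrase of the functools.cached_property shape from Python 3.8+, not Celery's exact source):

```python
import threading

class locked_cached_property:
    """Paraphrased lock-guarded lazy attribute (not Celery's exact code)."""

    def __init__(self, func):
        self.func = func
        self.attrname = func.__name__
        self.lock = threading.RLock()

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            return instance.__dict__[self.attrname]
        except KeyError:
            with self.lock:  # a fork here leaves the child's copy locked
                # double-checked: another thread may have computed it already
                if self.attrname not in instance.__dict__:
                    instance.__dict__[self.attrname] = self.func(instance)
                return instance.__dict__[self.attrname]
```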
ProcessPoolExecutor pickles the function in its "QueueFeederThread" while another thread is forking new workers. During pickling, app.tasks is cached (which also finalizes the Celery app, seen here).
Parent stack trace (QueueFeederThread)
This covers why and when the parent is accessing app.tasks. The QueueFeederThread serializes _CallItems, which include the function, and so it ends up pickling the Celery task.
Worker stack trace
This covers when the worker hits the lock. During unpickling, it accesses app.tasks, which also finalizes and caches app.tasks.
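The mechanism is easy to see in isolation: Celery tasks pickle by name, so a round trip forces a registry lookup on the other side (a hypothetical illustration, reusing foo from the sketch above):

```python
import pickle

data = pickle.dumps(foo)       # parent side: the QueueFeederThread does this
restored = pickle.loads(data)  # worker side: the lookup touches app.tasks,
                               # which finalizes the app and takes its locks
```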
Workarounds
Pre-finalize and pre-cache tasks
Adding something like:
```python
from celery import _state as celery_state

app = celery_state.get_current_app()
# Just trigger the cached_property
app.tasks
```
before entering the executor block will pre-cache tasks and prevent the deadlocks, at least in my tests. This is not really ideal; it feels too fragile and might fail in other ways down the road.
Don't pass a Celery task to ProcessPoolExecutor
While this works on the surface, it breaks how I use Celery tasks: I use the bind=True flag, so my tasks all have a self object that won't be there if I don't go through the task machinery:
```python
from concurrent.futures import ProcessPoolExecutor

futures = {}
with ProcessPoolExecutor(max_workers=10) as executor:
    for x in range(10):
        futures[executor.submit(foo.__wrapped__, x)] = x
```
Additional notes
The problem is the same if I replace @shared_task with a simple:
```python
app = Celery('tasks')

@app.task
```
I'm mainly using the latest version of Celery and all dependencies, but the behavior is the same on both master and the latest Celery 4 release.
Background
Why am I passing a Celery task to a ProcessPoolExecutor? In my library, Celery is only one of the possible executors I use; I also use ThreadPoolExecutor and ProcessPoolExecutor, selected by external configuration, as sketched below. And up until Python 3.9.0a6, this has worked without issue.
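Schematically, the selection looks something like this (a hypothetical sketch; make_executor and the config values are illustrative, not my library's real API):

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor

def make_executor(kind: str, max_workers: int = 10) -> Executor:
    # 'kind' comes from external configuration
    if kind == 'process':
        return ProcessPoolExecutor(max_workers=max_workers)
    if kind == 'thread':
        return ThreadPoolExecutor(max_workers=max_workers)
    raise ValueError(f'unknown executor kind: {kind!r}')
```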
Questions
Is this a regression in Python 3.9?
Did I do something wrong?
What is the best way to handle this and move forward?