Closed
Description
Apologies - I'm not sure exactly how to provide a code example that triggers this condition, but here's what I observed:
Sometimes, at the end of long runs with thousands of tasks, I find straggler tasks that seem to be stuck on workers. These workers appear to be at the memory.pause threshold (0.8), and the number of stuck tasks equals the number of threads available to the dask worker. The workers are heartbeating just fine, but they don't seem to be doing anything with the tasks they're processing (the call stacks for each task are blank). Other workers aren't stealing these tasks. When I kill the workers, the scheduler reassigns those tasks and everything completes as normal.
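In lieu of a reproducer, here is a rough sketch of how one might flag workers sitting at or above the pause threshold. It is not a fix and not part of distributed. The helper name `workers_over_pause_threshold` is hypothetical, and the dictionary shape (a `memory_limit` key plus `metrics["memory"]` per worker) is an assumption modeled on what `Client.scheduler_info()["workers"]` appears to return; it may differ between versions. The sample data is made up for illustration.

```python
def workers_over_pause_threshold(workers_info, pause_fraction=0.8):
    """Return sorted addresses of workers whose memory use is at or
    above pause_fraction of their memory limit.

    workers_info is assumed to map worker address -> info dict with
    "memory_limit" (bytes) and "metrics" -> "memory" (bytes).
    """
    flagged = []
    for address, info in workers_info.items():
        limit = info.get("memory_limit") or 0
        used = info.get("metrics", {}).get("memory", 0)
        # A missing or zero limit means the fraction is undefined,
        # so such workers are skipped rather than flagged.
        if limit and used / limit >= pause_fraction:
            flagged.append(address)
    return sorted(flagged)


# Hypothetical sample data, not output captured from a real cluster.
sample_workers = {
    "tcp://10.0.0.1:4000": {"memory_limit": 1000, "metrics": {"memory": 500}},
    "tcp://10.0.0.2:4000": {"memory_limit": 1000, "metrics": {"memory": 850}},
    "tcp://10.0.0.3:4000": {"memory_limit": 0, "metrics": {"memory": 900}},
}

suspects = workers_over_pause_threshold(sample_workers)
print(suspects)
```

Against a live cluster, the same function could presumably be fed `client.scheduler_info()["workers"]`, and the flagged addresses compared with the workers that still hold the stuck tasks.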