In my use of dask, where I have some very long running tasks, I often encounter the following situation:
- submit long running tasks (as part of a compute graph or otherwise)
- an error occurs or I change my mind and cancel the future or release my Client
- the Scheduler transitions the tasks from processing->released->forgotten
- any tasks that were already executing keep running on the worker, but since the scheduler now sees that worker as available, it assigns new work to it, even when other workers are idle
- the tasks that are actually still executing appear nowhere in the dashboard or on the workers.html page; even the Call Stacks view in the worker dashboard does not show the old, cancelled task that is still running
- the cluster appears to have work and some idle workers, yet it looks hung because nothing is proceeding
Since I run dask workers with `--nthreads 1`, I could simply restart any worker that is still processing a task when the client cancels it or disconnects from the scheduler. However, when I tried to do this with a WorkerPlugin, I did not see any updates when the task transitioned from processing to released. Is this intentional?
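For reference, this is roughly the kind of plugin I registered: a minimal sketch using the worker plugin hook names (`setup`/`transition`/`teardown`). The class name and attributes are illustrative, not part of dask; I am exercising the hook directly here rather than on a live cluster.

```python
# Sketch of a worker-side plugin that records task state transitions.
# In real use this would subclass distributed.WorkerPlugin and be
# attached with client.register_worker_plugin(...); here the hooks are
# shown standalone so the call shape is clear without a cluster.

class TransitionLogger:
    """Records every (key, start_state, finish_state) reported to it."""

    def __init__(self):
        self.events = []

    def setup(self, worker):
        # Called once when the plugin is attached to a worker.
        self.worker = worker

    def transition(self, key, start, finish, **kwargs):
        # Note: a worker plugin sees *worker-side* states (e.g.
        # "executing", "memory", "released"), not the scheduler's
        # processing -> released -> forgotten transitions described
        # above -- which may be why I never saw the update I expected.
        self.events.append((key, start, finish))

    def teardown(self, worker):
        pass


if __name__ == "__main__":
    # Simulate the call the worker would make when a task finishes.
    plugin = TransitionLogger()
    plugin.transition("my-task-1", "executing", "released")
    print(plugin.events)
```

On a real cluster I attached it with `client.register_worker_plugin(TransitionLogger())`, but the `transition` hook never fired for the cancelled, still-running task.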