We are submitting multiple futures / tasks from within a worker using the worker_client context manager. The function uses client.submit to submit a list of futures.
We break the execution into batches; after each batch completes, we call client.cancel for cleanup and then submit the next batch. Time and again, an entire batch of futures (a lot of 95) immediately comes back with a status of 'cancelled'.
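For reference, a simplified sketch of the pattern we use (process_item, work_items, and the batching loop below are illustrative placeholders, not our exact code):

from distributed import worker_client

BATCH_SIZE = 95

def process_item(item):
    # Placeholder for the real per-item work.
    return item

def run_in_batches(work_items):
    results = []
    # Running inside a task on a worker, so we use worker_client
    # to get a client connected to the same scheduler.
    with worker_client() as client:
        for start in range(0, len(work_items), BATCH_SIZE):
            batch = work_items[start:start + BATCH_SIZE]
            # Submit one future per item in this batch.
            futures = [client.submit(process_item, item) for item in batch]
            # Wait for the whole batch and collect its results.
            results.extend(client.gather(futures))
            # Clean up before moving on; it is at this point that a
            # subsequent batch sometimes reports 'cancelled' immediately.
            client.cancel(futures)
    return results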
We checked the worker logs and found little there, and the same goes for the scheduler logs. We also tried to enable timestamps in the scheduler logs by adding the following lines to the YAML config, but it did not help.
distributed:
  admin:
    log-format: '%(name)s - %(levelname)s - %(message)s'
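(If we read Python's logging format fields correctly, actually emitting a timestamp would also require %(asctime)s, e.g.:

distributed:
  admin:
    log-format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

but so far the config change has had no visible effect.)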
We are using Dask 0.17.2 and distributed 1.21.6 running on Windows 2018R2 servers. We have 128 worker processes spread across 16 nodes, and each batch is around 95 futures at a time.
Could you please suggest what we should look into that could be causing this?