I have a graph containing 300 tasks that each take ~1 hour. If I start up a cluster with >300 nodes before submitting the graph, the work gets distributed nicely and the tasks run in parallel. If I cut it too close to 300 (especially if the nodes are preemptible), a few nodes end up with multiple tasks. This isn't too big a deal - I give it a few extra nodes, and sometimes it runs 2 or 3 hours.
On the other hand, if I use a fresh (initial size 0) autoscaling dask cluster and set it to scale from 0-350, a lot of my long tasks pile up on individual workers. This is because K8s can take time to add nodes, and the graph starts processing as soon as 1 or 2 workers are up. The scheduler doesn't yet know about the workers still coming, so a few workers get assigned many tasks. The autoscaling in this case stops at around 30 nodes.
Is avoiding autoscaling my best option for fresh clusters and graphs like this?
I'm using dask/distributed 2.19, and dask-gateway 0.7.1.
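For reference, the pre-scaling workflow described in the first paragraph looks roughly like the sketch below. It uses the public dask-gateway and distributed APIs (`GatewayCluster.scale`, `Client.wait_for_workers`); the function name and `build_graph` callable are hypothetical placeholders for my actual workload:

```python
def run_with_prescaled_cluster(build_graph, n_workers=300):
    """Scale a dask-gateway cluster up front and block until workers
    have registered before submitting the graph, so the scheduler can
    spread the long tasks evenly from the start.

    Sketch only: assumes a reachable dask-gateway deployment.
    """
    # Imported lazily so this module stays importable without dask-gateway.
    from dask_gateway import Gateway

    gateway = Gateway()
    cluster = gateway.new_cluster()
    cluster.scale(n_workers)  # fixed size instead of adapt(minimum=0, maximum=350)
    client = cluster.get_client()
    # Block until the requested workers have actually joined, so the
    # first few workers don't get assigned many tasks each.
    client.wait_for_workers(n_workers=n_workers)
    return client.compute(build_graph(), sync=True)
```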