I'm not sure how much detail to include, or whether this is the right place to ask, but I'm trying to run dask_searchcv on a Kubernetes cluster, and whenever I start a grid search all of my workers are killed immediately.
The scheduler logs on the cluster show:
distributed.scheduler - INFO - Register tcp://10.16.1.9:41940
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.16.1.9:41940
distributed.scheduler - INFO - Register tcp://10.16.0.9:46410
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.16.0.9:46410
distributed.scheduler - INFO - Register tcp://10.16.2.9:38646
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.16.2.9:38646
distributed.scheduler - INFO - Remove worker tcp://10.16.2.9:38646
distributed.scheduler - INFO - Remove worker tcp://10.16.1.9:41940
distributed.scheduler - INFO - Remove worker tcp://10.16.0.9:46410
distributed.scheduler - INFO - Lost all workers
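To make the log above easier to read at a glance, here is a small sketch that tallies which worker addresses were registered and which were later removed. It uses only the Python standard library; the function name `summarize_worker_events` and the regular expression are my own for illustration and are not part of dask or distributed.

```python
import re

# The scheduler log lines exactly as pasted above.
LOG = """\
distributed.scheduler - INFO - Register tcp://10.16.1.9:41940
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.16.1.9:41940
distributed.scheduler - INFO - Register tcp://10.16.0.9:46410
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.16.0.9:46410
distributed.scheduler - INFO - Register tcp://10.16.2.9:38646
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.16.2.9:38646
distributed.scheduler - INFO - Remove worker tcp://10.16.2.9:38646
distributed.scheduler - INFO - Remove worker tcp://10.16.1.9:41940
distributed.scheduler - INFO - Remove worker tcp://10.16.0.9:46410
distributed.scheduler - INFO - Lost all workers
"""

# Matches only the "Register" and "Remove worker" events (assumed log format,
# taken from the lines above).
EVENT = re.compile(r"INFO - (Register|Remove worker) (tcp://\S+)")


def summarize_worker_events(log_text):
    """Return (registered, removed) sets of worker addresses (hypothetical helper)."""
    registered, removed = set(), set()
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if match is None:
            continue  # skip lines that are not register/remove events
        kind, address = match.groups()
        if kind == "Register":
            registered.add(address)
        else:
            removed.add(address)
    return registered, removed


registered, removed = summarize_worker_events(LOG)
still_alive = registered - removed
print("registered:", sorted(registered))
print("removed:", sorted(removed))
print("still alive:", sorted(still_alive))
```

For this log, every registered address also shows up in a "Remove worker" line, which matches the final "Lost all workers" message.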
I have a cluster with 3 CPUs and 12 GB of RAM, so 3 workers are spawned by default.
One thing I noticed, though I'm not sure it's related: when I run the grid search and watch the workers, one worker is always at 90%+ utilization while the others sit at around 5%.