This isn't strictly a K8sClusterManagers.jl issue, but @omus pointed me here :).
I was running hyperparameter optimization on a model using `@phyperopt` from Hyperopt.jl with `pmap=Parallelism.robust_pmap` from Parallelism.jl. I would spin up the desired number of workers with `addprocs`, then essentially call `pmap` via these abstractions, and that's it. When the `pmap` is done, the manager writes out a summary and exits, and all the workers are released.
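For context, the setup looked roughly like this. This is an illustrative sketch only: `train_model` and the search space are invented, and I'm paraphrasing the macro invocation from memory rather than quoting my actual code.

```julia
using Distributed
addprocs(20)               # in my case, k8s pods via K8sClusterManagers.jl
@everywhere using Hyperopt
using Parallelism: robust_pmap

# Illustrative only: `train_model` and the learning-rate grid are made up.
ho = @phyperopt for i = 20, lr = exp10.(LinRange(-4, -1, 20))
    train_model(lr)        # hypothetical: trains one model, returns its loss
end
# (configured so the trials are distributed with robust_pmap rather than
#  the default Distributed.pmap)
```

Once the last trial finishes, the driver writes its summary and exits, but until then every worker that has already finished its trial just sits idle.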
I wanted to train 20 models this way quickly, so I spun up 20 workers and left them to train. However, some finished much faster than others, and those workers were left idling. Since this is running on k8s, if we had killed them as they went idle, we could have scaled in and saved a lot of resources.
It would be great to have something like `pmap` that could automatically remove workers when they are no longer needed.
Thinking about this slightly more: a nice "inversion of control" here would be for `pmap` to return workers to the pool (in fact, I think it already does), and for the pool to decide when to remove idle workers. (Perhaps the pool would wait a minute or two, and if a worker is still idle, `rmprocs` it.)
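The simpler, non-pool variant of the idea can be sketched directly with `Distributed`. All names below are mine, not a proposed API: a `pmap`-like function where each driver task removes its worker as soon as the shared job queue is drained, instead of returning it to the pool to idle.

```julia
using Distributed

# Sketch of a "shrinking" pmap: one local feeder task per worker pulls jobs
# from a shared channel; when the channel is drained, the worker is removed
# via rmprocs so that (on k8s) its pod can be scaled in.
function shrinking_pmap(f, xs; pids = workers())
    jobs = Channel{Tuple{Int,Any}}(length(xs))
    for (i, x) in enumerate(xs)
        put!(jobs, (i, x))
    end
    close(jobs)  # channel iteration below terminates once it is drained

    results = Vector{Any}(undef, length(xs))
    @sync for w in pids
        @async begin
            for (i, x) in jobs   # take jobs until the channel is empty
                results[i] = remotecall_fetch(f, w, x)
            end
            # No work left for this worker: release it immediately.
            w == myid() || rmprocs(w)
        end
    end
    return results
end
```

The pool-owned version would move the `rmprocs` decision into a custom `AbstractWorkerPool` with an idle timer, so that any `pmap` using the pool benefits; the sketch above just shows the eager end of the spectrum.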