pattern to rmprocs when there's no more work to do #87

Open
ericphanson opened this issue Sep 13, 2021 · 1 comment

ericphanson commented Sep 13, 2021

This isn't strictly a K8sClusterManagers.jl issue, but @omus pointed me here :).

I was running hyperparameter optimization on a model using `@phyperopt` from Hyperopt.jl with `pmap=Parallelism.robust_pmap` from Parallelism.jl. I would spin up the desired number of workers with `addprocs`, then essentially call `pmap` through these abstractions, and that's it. When the `pmap` finishes, the manager writes out a summary and exits, and all the workers are released.

I wanted to train 20 models quickly this way, so I launched 20 workers and left them to train. However, some models finished much faster than others, and their workers were left idling. Since this runs on k8s, killing those idle workers would have let the cluster scale in and saved a lot of resources.

It would be great to have something like `pmap` that automatically removes workers once they are no longer needed.
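
To make the request concrete, here is a rough sketch of what such a function might look like using only the `Distributed` standard library. The name `pmap_releasing` and its `pids` keyword are made up for illustration and are not part of Distributed, Parallelism.jl, or K8sClusterManagers.jl. It assumes `inputs` is an indexable collection and that `f` is already defined on every worker. It does no retrying or error handling, so a failed job leaves its worker in place.

```julia
using Distributed

# Hypothetical helper (not an existing API): run `f` over `inputs` on the
# given workers, and remove each worker as soon as the job queue is empty.
function pmap_releasing(f, inputs; pids = workers())
    jobs = Channel{Tuple{Int,Any}}(length(inputs))
    for (i, x) in enumerate(inputs)
        put!(jobs, (i, x))
    end
    close(jobs)  # iterating over `jobs` stops once it has been drained

    results = Vector{Any}(undef, length(inputs))
    @sync for pid in pids
        # One feeder task per worker, all running on the manager.
        @async begin
            for (i, x) in jobs
                results[i] = remotecall_fetch(f, pid, x)
            end
            # Nothing left for this worker: release it immediately,
            # but never try to remove the manager process itself.
            pid == 1 || rmprocs(pid)
        end
    end
    return results
end

# Possible usage (assumes workers can be added in this environment):
#   addprocs(4)
#   @everywhere train(x) = (sleep(x); x^2)
#   pmap_releasing(train, [0.1, 2.0, 0.1, 0.1])
```

With this shape, a worker whose last job finishes early is removed while the slow jobs are still running, rather than idling until the whole map completes.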

ericphanson (Member, Author) commented

Thinking about this slightly more, I think a nice "inversion of control" here is that the ideal `pmap` would return workers to the pool (in fact, I think it already does), and the pool would decide when to remove idle workers. (Perhaps the pool would wait a minute or two and, if they are still idle, remove them.)
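
A minimal sketch of that pool-side idea, again purely illustrative: `ReapingPool`, `checkout!`, `checkin!`, and `reap!` are hypothetical names, not anything provided by Distributed's `WorkerPool`. The pool records when each worker was returned, and `reap!` removes any worker that has been idle longer than a grace period. The `at` and `remove` keywords are assumptions added so the logic can be dry-run without real workers.

```julia
using Distributed

# Hypothetical pool that remembers when each worker became idle.
struct ReapingPool
    idle::Dict{Int,Float64}  # pid => time() at which it was last returned
    grace::Float64           # seconds a worker may stay idle
end

ReapingPool(pids; grace = 120.0) =
    ReapingPool(Dict{Int,Float64}(pid => time() for pid in pids), grace)

# Hand out an idle worker for a job, or `nothing` if none is idle.
function checkout!(pool::ReapingPool)
    isempty(pool.idle) && return nothing
    pid = first(keys(pool.idle))
    delete!(pool.idle, pid)
    return pid
end

# Return a worker to the pool and restart its idle clock.
checkin!(pool::ReapingPool, pid) = (pool.idle[pid] = time(); pool)

# Remove every worker that has been idle for longer than the grace period.
function reap!(pool::ReapingPool; at = time(), remove = rmprocs)
    expired = [pid for (pid, t) in pool.idle if at - t > pool.grace]
    # Claim all expired workers first, so none can be checked out
    # while `remove` is yielding to other tasks.
    for pid in expired
        delete!(pool.idle, pid)
    end
    for pid in expired
        remove(pid)
    end
    return expired
end
```

The manager could then call `reap!` periodically, for example with `Timer(_ -> reap!(pool), 60; interval = 60)`, so that the map function only checks workers in and out and the pool alone decides when to scale in.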
