loadbalancing, plan(multisession) and persistent workers. #63

Closed
hans-ekbrand opened this issue Mar 21, 2021 · 2 comments

hans-ekbrand commented Mar 21, 2021

Thanks for future and doFuture!

I use these in a package, and on Linux plan(multicore) with .future.options=list(prescheduling=FALSE) works perfectly. But I have to support MS Windows too.

My use case is one where computation time is very uneven and for most jobs, the computation time is relatively short compared to the time of starting R. The program uses many foreach loops, so having a way to keep the workers between these loops would be great, but as far as I can tell, each worker is discarded when it is done with its job(s), is that correct?

For MS Windows, I use plan(multisession) and if I use prescheduling=FALSE, then it seems a completely new R instance is started for each job, which is very bad if the average computation time for a job is in the same magnitude as the computation time to start R and load the required libraries. So for now I use prescheduling=TRUE for Windows, and while it is not optimal it works pretty OK. Is there a better way to do it for me?

The real problem, though, is that I cannot figure out how to make the workers persistent, which is very frustrating: I have a computer with 80 logical cores, but starting 80 new instances of R for every foreach loop is very slow.

This is my boilerplate code, and I have about 15 of these in the whole program. Is there a way to make the workers persistent through the whole program?

doFuture::registerDoFuture()
if (.Platform$OS.type == "unix") {
    plan(multicore)
    my.scheduling <- FALSE
} else {
    plan(multisession, gc = TRUE, workers = n.cores)
    my.scheduling <- TRUE
}
foreach::foreach(...,
                 .options.future = list(scheduling = my.scheduling))

Kind regards,

Hans Ekbrand

@HenrikBengtsson
Owner

Hi.

> ... each worker is discarded when it is done with its job(s), is that correct?

Nah, plan(multisession) launches R workers in the background and keeps them around until you shut them down, which you can do by switching plan, e.g. plan(sequential) ...
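
One way to check this yourself is to compare the process IDs that the workers report across two separate rounds of futures. The snippet below is a minimal sketch, assuming the future package is installed; the helper `worker_pids()` and the choice of two workers are made up for illustration.

```r
library(future)

# Start two background R sessions once.
plan(multisession, workers = 2)

# Hypothetical helper: launch n futures and collect the PID each one ran in.
worker_pids <- function(n) {
  fs <- lapply(seq_len(n), function(i) future(Sys.getpid()))
  unique(vapply(fs, value, integer(1)))
}

pids_first  <- worker_pids(4)   # first "loop"
pids_second <- worker_pids(4)   # second "loop", same plan

# If the workers are reused, both rounds report PIDs from the same small pool,
# and none of them is the PID of the main R session.
print(pids_first)
print(pids_second)

# Shut the workers down by switching plan.
plan(sequential)
```

If a fresh R process were started for every future, the two rounds together would show more than two distinct PIDs.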

> I use plan(multisession) and if I use prescheduling=FALSE, then it seems a completely new R instance is started for each job,

... so, that's not a correct conclusion.

> which is very bad if the average computation time for a job is in the same magnitude as the computation time to start R and load the required libraries. So for now I use prescheduling=TRUE for Windows, and while it is not optimal it works pretty OK.

Note that there's no option/argument called prescheduling, but I assume you meant scheduling as in your code snippet.

Have a look at Section 'Load balancing ("chunking")' in ?doFuture::doFuture. Specifically, note that you're using one of the two extremes right now:

  • .options.future = list(scheduling = TRUE) is equivalent to .options.future = list(scheduling = 1.0) (one chunk per worker), and
  • .options.future = list(scheduling = FALSE) is equivalent to .options.future = list(scheduling = +Inf) (one chunk per iteration).

Try with for instance .options.future = list(scheduling = 5.0). That way each worker will get approximately 5 chunks. That helps to deal with non-uniform runtimes for the different iterations.
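
To get a feel for what the `scheduling` value means, the chunk counts can be sketched in base R. This is only an illustration, not doFuture's actual implementation: `chunk_indices()` is a hypothetical helper, and it assumes the number of chunks is roughly workers × scheduling, capped at the number of iterations.

```r
library(parallel)  # for splitIndices()

# Hypothetical helper, not part of doFuture: split n iterations into chunks
# under the assumption that chunks = workers * scheduling (at most n).
chunk_indices <- function(n, workers, scheduling) {
  if (isTRUE(scheduling))  scheduling <- 1.0   # TRUE  -> one chunk per worker
  if (isFALSE(scheduling)) scheduling <- Inf   # FALSE -> one chunk per iteration
  n_chunks <- min(n, max(1, round(workers * scheduling)))
  splitIndices(n, n_chunks)
}

n_true  <- length(chunk_indices(100, workers = 4, scheduling = TRUE))   # 4
n_five  <- length(chunk_indices(100, workers = 4, scheduling = 5.0))    # 20
n_false <- length(chunk_indices(100, workers = 4, scheduling = FALSE))  # 100
```

With 4 workers and 100 iterations, `scheduling = 5.0` gives 20 chunks of 5 iterations each, so a worker that finishes a cheap chunk early can pick up another one instead of sitting idle.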

> This is my boilerplate code, and I have about 15 of these in the whole program. ...

Are you saying you're calling registerDoFuture() and plan(...) in each function call? If so, then, yes, you're most likely paying a large overhead from plan(multisession) setting up new workers in each function call. It's much better to leave it to the end-user to configure the plan() once before calling your functions.
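
A minimal sketch of that layout, assuming doFuture, foreach and future are installed: the function name `slow_sqrt()`, the worker count and the workload are made up for illustration; only `registerDoFuture()`, `plan()` and `.options.future = list(scheduling = ...)` come from the discussion above.

```r
library(doFuture)   # also attaches foreach and future

# --- Package side: no plan() call inside the function ---------------------
# Hypothetical package function; it only describes the parallel loop.
slow_sqrt <- function(xs, scheduling = 5.0) {
  foreach(x = xs, .combine = c,
          .options.future = list(scheduling = scheduling)) %dopar% {
    sqrt(x)
  }
}

# --- End-user side: configure the backend once ----------------------------
registerDoFuture()
plan(multisession, workers = 2)   # workers are started here, one time

a <- slow_sqrt(1:10)    # both calls run on the workers started above
b <- slow_sqrt(11:20)

plan(sequential)        # shut the workers down when finished
```

Because `plan()` is called once outside the function, repeated calls to the function do not pay the worker start-up cost again, and the same code can run under `plan(multicore)` on Linux without any platform check.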

@HenrikBengtsson
Owner

Closing because no follow-up after more than a year.
