Thanks for future and doFuture!

I use these in a package, and on Linux plan(multicore) with .future.options=list(prescheduling=FALSE) works perfectly. But I have to support MS Windows too.

My use case is one where computation time is very uneven, and for most jobs the computation time is relatively short compared to the time it takes to start R. The program uses many foreach loops, so having a way to keep the workers alive between these loops would be great, but as far as I can tell, each worker is discarded when it is done with its job(s), is that correct?

For MS Windows, I use plan(multisession) and if I use prescheduling=FALSE, then it seems a completely new R instance is started for each job, which is very bad if the average computation time for a job is of the same magnitude as the time needed to start R and load the required libraries. So for now I use prescheduling=TRUE on Windows, and while it is not optimal, it works pretty well. Is there a better way to do this?

The real problem, though, is that I cannot figure out how to make the workers persistent, which is very frustrating since I have a computer with 80 logical cores, but starting 80 new instances of R for all my foreach loops is very slow.

This is my boilerplate code, and I have about 15 of these in the whole program. Is there a way to make the workers persistent through the whole program?
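Roughly, each of these blocks looks like the following (a sketch with a placeholder loop body; the real arguments differ from loop to loop):

```r
library(doFuture)

registerDoFuture()             # route %dopar% through futures
plan(multisession)             # plan(multicore) on Linux

res <- foreach(i = seq_len(100),
               .options.future = list(scheduling = FALSE)) %dopar% {
  sqrt(i)                      # stand-in for my uneven per-job computation
}
```

Kind regards,
Hans Ekbrand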
... each worker is discarded when it is done with its job(s), is that correct?
Nah, plan(multisession) launches R workers in the background and keeps them around until you shut them down, which you can do by switching the plan, e.g. plan(sequential) ...
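For example (a minimal sketch; the worker count is arbitrary):

```r
library(future)

plan(multisession, workers = 4)   # launches 4 background R sessions once

f1 <- future(Sys.getpid())        # runs on one of the existing workers
f2 <- future(Sys.getpid())        # reuses the pool; no new R process starts
value(f1)
value(f2)

plan(sequential)                  # shuts down the multisession workers
```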
I use plan(multisession) and if I use prescheduling=FALSE, then it seems a completely new R instance is started for each job,
... so, that's not a correct conclusion.
which is very bad if the average computation time for a job is of the same magnitude as the time needed to start R and load the required libraries. So for now I use prescheduling=TRUE on Windows, and while it is not optimal, it works pretty well.
Note that there's no option/argument called prescheduling, but I assume you meant scheduling as in your code snippet.
Have a look at Section 'Load balancing ("chunking")' in ?doFuture::doFuture. Specifically, note that you're using one of the two extremes right now: scheduling = FALSE gives one future per iteration, whereas scheduling = TRUE gives a single future per worker.
Try with, for instance, .options.future = list(scheduling = 5.0). That way each worker will get approximately 5 chunks, which helps deal with non-uniform runtimes across iterations. For example:
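(a sketch; the worker count and toy payload are only for illustration)

```r
library(doFuture)

registerDoFuture()
plan(multisession, workers = 8)

## scheduling = 5.0 => ~5 futures ("chunks") per worker, i.e. ~40 chunks
## of ~25 iterations each here, so short and long iterations get mixed
## within every chunk.
res <- foreach(i = 1:1000,
               .options.future = list(scheduling = 5.0)) %dopar% {
  Sys.sleep(runif(1, max = 0.01))   # emulate non-uniform runtimes
  i^2
}
```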
This is my boilerplate code, and I have about 15 of these in the whole program. ...
Are you saying you're calling registerDoFuture() and plan(...) in each function call? If so, then, yes, you're most likely paying a large overhead from plan(multisession) setting up new workers in each function call. It's much better to leave it to the end-user to configure the plan() once before calling your functions.
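Something like this (a sketch; my_fun() and its body are placeholders):

```r
library(doFuture)

## Package code: uses %dopar% but never touches plan()
my_fun <- function(xs) {
  foreach(x = xs) %dopar% { sqrt(x) }   # sqrt() stands in for the real work
}

## End-user code: configure futures once, up front
registerDoFuture()
plan(multisession)    # workers are launched once and persist

r1 <- my_fun(1:40)    # runs on the existing workers
r2 <- my_fun(41:80)   # same workers again; no new R sessions are started
```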