worker "pool" for nested paralellization #361
Comments
Correct x 2.
I'm not sure I fully understand, but I can guess what you're after. Basically, if you do:

```r
a <- future_lapply(x, function(y) {
  future_lapply(y, function(z) {
    ...
  })
})
```

you want the inner and the outer "loops" to be able to pull from the same pool of "workers", correct? This is available if you use an external job scheduler, such as those available in HPC environments. Then you could use:

```r
plan(list(outer = batchtools_slurm, inner = batchtools_slurm))
```

Both layers will submit their jobs (= futures) to the same job queue, and it's up to the job scheduler to allocate resources as they become available. Trying to implement something similar in R is tedious but should be doable. Maybe one could build upon Gábor Csárdi's work in Multi Process Task Queue in 100 Lines of R Code, 2019-09-09. But the point is, this is not really something that should be implemented in the future package itself. Instead, it should/could be added as a new type of backend that futures can rely on - think:

```r
library(future.taskqueue)
plan(list(outer = taskqueue, inner = taskqueue))
...
```

The future.tests package can be used to validate that such a backend is properly implemented and meets the requirements of the future framework.
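To make the nested-plan idea above concrete, here is a minimal sketch (assuming the future and future.apply packages are installed; the worker counts and input data are arbitrary) of how a two-level topology can be declared today, which also illustrates the limitation under discussion - workers are hard-allocated per level rather than drawn from a shared pool:

```r
library(future)
library(future.apply)

## Two-level topology: the outer level gets two multisession workers,
## and each of those workers runs its inner future_lapply() sequentially.
## The allocation is fixed per level; the two levels cannot borrow
## idle workers from each other.
plan(list(
  outer = tweak(multisession, workers = 2L),
  inner = sequential
))

x <- list(1:3, 4:6)
a <- future_lapply(x, function(y) {
  future_lapply(y, function(z) z^2)
})
```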
Yes, that's what I meant. Though I was thinking less about nested loops in client code that are known to the user and easily configured with `plan()`, and more about parallelism buried inside packages. That would be my main argument for allowing a simple queue scheduler into `future` itself. Another argument might be that ... (I wish I could promise a PR, but it would be easier to promise that I'll never find the time...).
If I understand correctly,

```r
plan(tweak(multicore, workers = 8))
```

means that the first nesting level gets 8 parallel threads and the second nesting level gets no parallelism. I could hard-allocate threads to each level, but that's hard to do, since it means I have to know all thread usages down the tree of packages.

What I'm looking for is a "worker pool"-like implementation. A naive greedy allocation using a semaphore that decrements every time a thread is forked off would be a good start, so that if I have a loop of three calling a package that uses `future.apply` on a huge vector but takes very long to even get there, the NN workers can be busy for as much of the time as possible.

Interaction with OMP in particular is a problem, of course; a lot of things seem to use it. IIRC, Intel TBB auto-detects the number of "useful" threads to use and adjusts this value as it goes based on system load. Something like this would need extra housekeeping, but the concept of "don't start more threads if all my workers/CPUs are busy", or even "don't start more threads if we are at XY% memory", would be very useful for robustly running things in parallel.