WISH: It should be possible to adjust the number of assigned cores #7
I think you have misunderstood the use of the `mc.cores` option.

You can show that `availableCores()` prints 8.
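For intuition, the "+ 1" arithmetic that this whole thread revolves around can be sketched in base R, without the future package. The option value 7 below is just an illustrative assumption: `mc.cores` counts *additional* processes, so the total number of usable cores is `mc.cores + 1`.

```r
# mc.cores counts processes *in addition to* the main R process,
# so the total number of usable cores is mc.cores + 1.
# (The value 7 here is just an illustrative assumption.)
options(mc.cores = 7)
total <- getOption("mc.cores") + 1L
print(total)  # [1] 8
```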
Thanks for your comment. I agree with you in principle. With the objective of optimizing CPU usage on the machine, it certainly makes sense to use all available cores for workers. However, with futures, you can imagine the main R process also running at 100% load, because it doesn't have to be used for waiting only. A prototype example:

```r
x <- list()
for (kk in 1:8) x[[kk]] <- future(some_heavy_calc(n = 1e6))
x[[9]] <- some_heavy_calc(n = 1e6)
```

The above could very well result in 9 cores running at 100% load. So, for such a case, it does make sense to reserve one core for the main process. Of course, the question is how common the above use case is, where you use the main process for full processing too, or whether it is more common to use the main process to create futures and then wait for them to resolve (by low-load polling). Both cases show up in how I use futures myself, but I'd say the latter is indeed more common. It is of course annoying if you have access to, say, 4 cores, but due to the current defaults only 3 of them are used for background workers.

On the other hand, consider a compute cluster where you have to request the number of cores you plan to use. Say you request a 32-core job. The scheduler then launches your R process on an allocated machine and gives you 32 cores to work with. With the current defaults of the future package, the total load will stay within the 32 cores you were allocated.

I hope this clarifies the reason for the current defaults. Having said this, I've been going back and forth thinking about what the ideal default should be, and I'm open to change and suggestions. I would like to support both use cases in a simple way. Maybe there could be a global option for which style to use - I don't know.

See also HenrikBengtsson/Wishlist-for-R#7
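The conservative default described above can be sketched as simple arithmetic. This is a hedged illustration of the accounting, not the actual future implementation, and the 4-core machine is an assumption:

```r
# Suppose the machine has 4 cores. Reserving one core for the main R
# process leaves availableCores() - 1 = 3 background workers; if the
# main process also does heavy work, the total load stays at 4.
available  <- 4L
n_workers  <- max(1L, available - 1L)   # background workers
total_load <- n_workers + 1L            # workers + busy main process
print(c(workers = n_workers, load = total_load))
```

Without the reservation, the same situation would give 4 workers plus a busy main process, i.e. a load of 5 on a 4-core machine.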
The real problem is that I can't just unconditionally subtract 1 from the result of `availableCores()`.
Also, regarding the possibility of an overly strict scheduler killing one's job because it's running one too many processes, I don't think that is likely. Even if you ran a job with 32 R processes, there would still typically be at least a few other processes running, such as the shell script that started the parent R process. This is similar to the common case of a single-CPU job running a command that itself spawns child processes.
Thanks for all this. I'll reopen this issue with a new title, because I think it's worth a thorough discussion.

I think what is needed is for the developer to be able to specify whether their code is going to utilize C or C+1 cores. Basically, this configuration / option needs to be close to the code block that creates futures. This code could be hidden deep inside a package, so it's really only the developer who knows it's there and how it works. In other words, this should most likely not be something that the end user controls.

About killing jobs running too many processes: Yes, I agree it could be very complicated to automate policing of CPU resources, and the number of processes may be a very poor proxy for that. I have yet to see an HPC environment where there is a policy to automatically kill jobs that use more CPU resources than requested. However, I keep hearing it being discussed as an idea in different academic environments with shared HPC resources (my world). I basically know too little about how it works elsewhere, but I did take the conservative approach for the reasons I gave in previous comments.
Maybe you could have a pair of functions for reserving and releasing a worker.
I'm coming back to this old unresolved issue, and have been trying to come up with a nice API to handle both. Then peeking back here, it looks like I've come to the same idea / reinvented what you suggested above. Here's a follow-up for new followers.

## Problem

If we did not reserve one "worker" for the main R process with multiprocess (multicore or multisession), as is done currently, then:

```r
library("future")
plan(multiprocess, workers = 4)
x <- list()
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6))  # 4 cores
  some_other_heavy_calc()
}
x <- resolve(x)
```

would consume 4 background processes plus the main process, for a total load of 5 cores. Note that the current conservative approach allows only 4 - 1 = 3 background processes, so that we add a maximum load of 4 parallel processes to the current machine. The downside of this current approach is that:

```r
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6))  # 3 cores
}
```

would only run 3 parallel processes, although we have 4 cores available all in all. This is, for instance, what happens with the current defaults.

Note also that we apply this "protection" only to processes on the local machine. If we use, say, remote workers, then:

```r
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6))  # 4 cores
}
```

The rationale for this is that those local workers consume resources on the current machine.

## Proposal

Maybe the use case where we put heavy load on the main R process while once in a while polling the futures (here, local background processes) to see if they have completed is rather rare. That would argue that the current defaults are suboptimal. If we change the defaults such that the number of workers means the same regardless of backend, we need a mechanism for developers to play nice in the first use case:

```r
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6))  # 4 cores
  some_other_heavy_calc()
}
```

A solution could be something like:

```r
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6))  # 4 cores
  # wait for a free worker to be available, but
  # run in the main R process
  reserveWorker({
    some_other_heavy_calc()
  })
}
```

Another thought is to introduce an argument `sequential` such that:

```r
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6))  # 4 cores
  # wait for a free worker to be available, but
  # run in the main R process
  future({
    some_other_heavy_calc()
  }, sequential = TRUE)
}
```

which may be more natural.
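For concreteness, here is a minimal sketch of what a `reserveWorker()`-style helper might look like. This is purely hypothetical - no such function is confirmed in the future API - and the `free_slots` polling predicate is an assumption standing in for a real "is a worker slot free?" query:

```r
# Hypothetical reserveWorker(): block until a worker slot is free,
# then evaluate the expression in the main R process, so the total
# load never exceeds the number of configured workers.
reserveWorker <- function(expr, free_slots = function() 1L) {
  # Poll until at least one slot is free (trivially true by default).
  while (free_slots() < 1L) Sys.sleep(0.1)
  # Evaluate the (still unevaluated) expression in the caller's frame.
  eval.parent(substitute(expr))
}

res <- reserveWorker(1 + 1)
print(res)  # [1] 2
```

In the loop above, `some_other_heavy_calc()` would then only start in the main process once a slot is free, keeping the total load bounded by the number of workers.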
FYI, I've now updated the develop branch (= 1.4.0-9000, to become the next release) of the package accordingly.

That could be considered to fix the first part of this issue. I still haven't decided on what to do with the remaining part.
2016-11-03: This issue was originally titled '`availableCores("mc.cores")` should return `getOption("mc.cores") + 1L`', but it recently turned into a more general discussion on how to maximize core utilization. See below.

`future::availableCores("mc.cores")` should really return `getOption("mc.cores") + 1L`, because according to `help('options')`, `mc.cores` specifies the maximum number of *additional* R processes allowed to run in parallel to the current R process.

Further clarification: if multicore processing is not supported, it would effectively correspond to `options(mc.cores = 0)`, and in that case `availableCores()` returns 1.