
WISH: It should be possible to adjust the number of assigned cores #7

Open
HenrikBengtsson opened this issue Jul 14, 2015 · 9 comments


@HenrikBengtsson
Owner

HenrikBengtsson commented Jul 14, 2015

2016-11-03: This issue was originally about 'availableCores("mc.cores") should return getOptions("mc.cores") + 1L', but it has since turned into a more general discussion on how to maximize core utilization. See below.

future::availableCores("mc.cores") should really return getOptions("mc.cores") + 1L, because from help('options'):

mc.cores:
a integer giving the maximum allowed number of additional R processes allowed to be run in parallel to the current R process. Defaults to the setting of the environment variable MC_CORES if set. Most applications which use this assume a limit of 2 if it is unset.

Further clarification: if multicore processing is not supported, that effectively corresponds to options(mc.cores = 0), in which case availableCores() returns 1.

@DarwinAwardWinner

DarwinAwardWinner commented Nov 2, 2016

I think you have misunderstood the use of the mc.cores option. If your system has 8 cores, then mc.cores should be set to 8, not 7. In this case, if you run mclapply, there will be 9 R processes: the parent process, which does no computation other than receiving and deserializing return values, and 8 forked processes performing computations. Since the parent process isn't doing any calculations, it doesn't need its own core, so using mc.cores=8 allows 8 R processes to do computations in parallel. Hence, mc.cores represents exactly the level of parallelism the user requests, and it is wrong to add 1 to the value.

@DarwinAwardWinner

DarwinAwardWinner commented Nov 2, 2016

You can show that mc.cores=8 results in 8 processes performing computations like this:

library(parallel)
library(magrittr)
options(mc.cores=8)
mclapply(1:100, function(x) Sys.getpid()) %>% unlist %>% unique %>% length

which prints 8.
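A complementary check (a minimal sketch along the same lines; it assumes a platform where mclapply() can fork) shows that the parent's PID is not among the worker PIDs, i.e. all 8 computing processes are forked children:

library(parallel)
options(mc.cores = 8)
# Collect the PIDs of the processes that actually ran the computations ...
worker_pids <- unique(unlist(mclapply(1:100, function(x) Sys.getpid())))
# ... and check that the parent process is not among them.
Sys.getpid() %in% worker_pids  # FALSE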

@HenrikBengtsson
Owner Author

Thanks for your comment. I agree with you in the mclapply() case, where the main / calling R process is only waiting for the mc.cores additional forked R processes to finish. In this case there is very little CPU load from the main R process, because it's mostly spending its time waiting.

With the objective of maximizing CPU usage on the machine, it certainly makes sense to have mc.cores equal the number of available cores on the machine, as you suggest.

However, with futures, the main R process may also run at 100% load, because it doesn't have to be used only for waiting. A prototype example:

x <- list()
for (kk in 1:8) x[[kk]] <- future(some_heavy_calc(n = 1e6))
x[[9]] <- some_heavy_calc(n = 1e6)

The above could very well result in 9 cores running at 100% load. So, for such a case, it does make sense to have mc.cores be one less than the number of cores, in order not to put more load on the machine than intended.

Of course, the question is how common the above use case is, where the main process also does full processing, versus the case where the main process only creates futures and then waits for them to resolve (via low-load polling). Both cases show up in how I use futures myself, but I'd say the latter is indeed more common.

It is of course annoying if you have access to, say, 4 cores, but, due to the default of availableCores(), you'll really utilize only 4-1 = 3 cores. One can, of course, override this behavior by specifying plan(multicore, workers = availableCores() + 1) and get what you want. But that's a bit hacky, so maybe it would make sense to change the default behavior.
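Spelled out, that hacky workaround would be (a sketch of the idea above):

library(future)
# Under the then-current defaults, multicore reserves one worker for the
# main R process; asking for one extra compensates for that, e.g. giving
# 4 busy background workers on a 4-core machine.
plan(multicore, workers = availableCores() + 1)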

On the other hand, consider a compute cluster where you have to request the number of cores you plan to use. Say you request a 32-core job. The scheduler launches your R process on an allocated machine and gives you 32 cores to work with. With the current defaults of the future package, availableCores() will result in you running 32 parallel processes on that machine. If it instead gave us 33, we would have 33 R processes running in total. That's ok if one of them only uses 1-2% CPU, so effectively we're only at ~32*100% load. However, a very strict and conservative scheduler might monitor the number of processes you run, and if you run more than you were allotted, it might simply terminate your job. This latter scenario is another reason why I decided on the "conservative" definition of how availableCores() uses mc.cores.
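For instance, in a 32-core Slurm allocation (assuming availableCores() consults SLURM_CPUS_PER_TASK, as mentioned below), one would see roughly:

# Inside a job allocated 32 CPUs by the scheduler:
Sys.getenv("SLURM_CPUS_PER_TASK")  # "32"
future::availableCores()           # 32, i.e. up to 32 parallel R processes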

I hope this clarifies the reason for the current defaults. Having said this, I've been going back and forth on what the ideal default should be, and I'm open to changes and suggestions. I would like to support both use cases in a simple way. Maybe there could be a global option for which style to use - I don't know.

See also HenrikBengtsson/Wishlist-for-R#7

@DarwinAwardWinner

The real problem is that I can't just unconditionally subtract 1 from the result of availableCores in order to undo the +1, because if it uses, say, SLURM_CPUS_PER_TASK to decide the number of cores, it doesn't add one to it. This seems inconsistent to me. I think it should either always add one or never add one to anything. Maybe you could just have a "reservedCores" argument that is unconditionally subtracted from the returned number. If you're going to be doing processing in the main process as well as in parallel futures, use availableCores(reservedCores=1) to tell future that one core is reserved for the parent; otherwise, if the parent is just waiting for results, use availableCores(reservedCores=0).
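Until something like that exists, a user-level helper with the proposed semantics could look like this (a hypothetical sketch, not part of the future API; the name usableCores is made up):

# Hypothetical helper mimicking the proposed reservedCores argument:
# subtract the cores reserved for the parent from what future reports.
usableCores <- function(reservedCores = 0L) {
  max(1L, future::availableCores() - as.integer(reservedCores))
}
usableCores(reservedCores = 1L)  # parent does heavy work as well
usableCores(reservedCores = 0L)  # parent only waits for results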

@DarwinAwardWinner

Also, regarding the possibility of an overly strict scheduler killing one's job because it's running one too many processes, I don't think that is realistic. Even if you ran a job with 32 R processes, there would still typically be at least a few other processes running, such as the shell script that started the parent R process. This is similar to the common case of a single-CPU job running a command such as cat input.txt | grep something | sort | uniq -c | tee output.txt. That pipeline runs 4 processes, plus the shell itself, but no sane job manager would kill or disallow that job because it requested fewer than 5 cores.

@HenrikBengtsson HenrikBengtsson changed the title availableCores("mc.cores") should return getOptions("mc.cores") + 1L WISH: It should be to adjust the number of assigned cores Nov 3, 2016
@HenrikBengtsson
Owner Author

Thanks for all this. I'll reopen this issue with a new title, because I think it's worth a thorough discussion.

I think what is needed is for the developer to be able to specify whether his or her code is going to utilize C or C+1 cores. Basically, this configuration/option needs to live close to the code block that creates the futures. That code could be hidden deep inside a package, so it's really only the developer who knows it's there and how it works. In other words, this should most likely not be something that the end user controls via plan(...) per se (unless she writes futures herself).

About killing jobs that run too many processes: yes, I agree it could be very complicated to automate policing of CPU resources, and the number of processes may be a very poor proxy for that. I have yet to see an HPC environment with a policy of automatically killing jobs that use more CPU resources than requested. However, I keep hearing it discussed as an idea in different academic environments with shared HPC resources (my world). I basically know too little about how it works elsewhere, but I did take the conservative approach for the reasons given in previous comments.

@DarwinAwardWinner

Maybe you could have a pair of functions named reserveCores and releaseCores that can be used to tell future not to use a certain number of the available cores. Or instead of releaseCores, just use reserveCores(0). Something like that. Obviously, you'll need to figure out how to handle the case of reserving more cores than are available.
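As a rough illustration of that idea (hypothetical names, not an existing API), the bookkeeping could look like this:

# Hypothetical sketch of reserveCores()/releaseCores(): remember the
# reservation in an R option and cap it so at least one worker remains.
reserveCores <- function(n) {
  n <- max(0L, min(as.integer(n), future::availableCores() - 1L))
  options(demo.reserved.cores = n)
  invisible(n)
}
releaseCores <- function() reserveCores(0L)
workersToUse <- function() {
  future::availableCores() - getOption("demo.reserved.cores", 0L)
}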

@HenrikBengtsson HenrikBengtsson changed the title WISH: It should be to adjust the number of assigned cores WISH: It should be possible to adjust the number of assigned cores Dec 27, 2016
@HenrikBengtsson
Owner Author

I'm coming back to this old unresolved issue and have been trying to come up with a nice API to handle both cases. Peeking back here, it looks like I've come to the same idea / reinvented what you suggested above. Here's a follow-up for new followers.

Problem

If we did not reserve one "worker" for the main R process with multiprocess (multicore or multisession), as is currently done, then:

library("future")
plan(multiprocess, workers = 4)

x <- list()
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6)) # 4 cores
  some_other_heavy_calc()
}
x <- resolve(x)

would consume 4 background processes plus the main process, for a total load of 5 cores.

Note that the current conservative approach allows only 4-1 = 3 background processes, so that we add at most a load of 4 parallel processes to the current machine. The downside of this approach is that:

for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6)) # 3 cores
}

would only run 3 parallel processes, although we have 4 cores available in total. This is, for instance, what happens when using future_lapply() right now, cf. Issue #146.

Note also that we apply this "protection" only to local-machine processes. If we use, say, plan(cluster, workers = c("n1", "n2", "n3", "n4")), there will truly be four background workers available:

for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6)) # 4 cores
}

The rationale for this is that local workers consume resources on the current machine, whereas these cluster workers do not.

Proposal

Maybe the use case where we put a heavy load on the main R process while only once in a while polling the futures (here local background processes) to see whether they have completed is rather rare. That argues that the current defaults are suboptimal.

If we change the defaults such that the number of workers means the same regardless of backend, we need a mechanism that lets developers play nicely in the first use case:

for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6)) # 4 cores
  some_other_heavy_calc()
}

A solution could be something like:

for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6)) # 4 cores
  # wait for a free worker to be available, but
  # run in the main R process
  reserveWorker({
    some_other_heavy_calc()
  })
}

Another thought is to introduce an argument sequential (better name?) for all futures, e.g.

for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6)) # 4 cores
  # wait for a free worker to be available, but
  # run in the main R process
  future({
    some_other_heavy_calc()
  }, sequential = TRUE)
}

which may be more natural.
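As a stopgap, the reserveWorker() behavior can be roughly approximated with the existing API, because creating a future blocks while all workers are busy (a hypothetical sketch; some_heavy_calc() and some_other_heavy_calc() are the placeholders used above):

library(future)
plan(multiprocess, workers = 4)

x <- list()
for (kk in 1:8) {
  x[[kk]] <- future(some_heavy_calc(n = 1e6))
  slot <- future(NULL)     # blocks here until at least one worker is free
  invisible(value(slot))   # the trivial future resolves and frees its slot
  some_other_heavy_calc()  # heavy work in the main R process; the total
                           # load stays at or below the number of workers
}
x <- resolve(x)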

HenrikBengtsson added a commit that referenced this issue May 19, 2017
… uses now equals then number of 'workers'; in the past it was one less.

This solves Issue #146 and is also related to Issue #7.
@HenrikBengtsson
Owner Author

FYI, I've now updated the develop branch (= 1.4.0-9000, to become next release) of the package such that:

SIGNIFICANT CHANGES:

 o Multicore and multisession futures no longer reserve one core for the
   main R process, which was done to lower the risk for producing a higher
   CPU load than the number of cores available for the R session.

That could be considered to fix the first part of this issue. I still haven't decided on what to do with reserveWorker() or its alternatives.
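With this change, a developer who does want to keep one core free for heavy work in the main R process can say so explicitly, e.g. (a sketch, not a built-in mechanism):

library(future)
# 'workers' now means exactly that many background processes, so reserve
# one core for the main R process by asking for one fewer worker.
plan(multisession, workers = max(1L, availableCores() - 1L))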

@HenrikBengtsson HenrikBengtsson modified the milestones: 1.3.0 (submitted), Next release May 23, 2017