Why future use more CPU than worker number? #343

Closed
ShixiangWang opened this issue Sep 30, 2019 · 7 comments

@ShixiangWang

I specified workers = 16, but all 24 cores are used when I set future::plan("multiprocess", workers = 16).

@renkun-ken

It looks like the workers are using multi-threading. For example, data.table uses OpenMP for subsetting, and some model-fitting functions use OpenMP or LAPACK.

@ShixiangWang
Author

ShixiangWang commented Sep 30, 2019

@renkun-ken Thanks. Do you mean that future starts 16 threads rather than using 16 cores, and that these 16 threads then use all available cores for the computation?

I still don't understand the difference between cores and workers, given the following output:

> parallel::detectCores()
[1] 8
> future::availableCores()
system 
     8 
> future::availableWorkers()
[1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"

Is there a way to limit the number of cores used for computation in future? Or is there a way to convert between the number of cores and the number of workers?

@renkun-ken

future only starts processes, i.e. workers. Some packages use OpenMP or other multi-threading techniques to improve performance. If multi-threading is enabled, each worker may start multiple threads, so overall CPU usage can be very high when many threads are busy. Even if you start only 2 workers, if each worker also starts 20 threads, you will see all CPUs occupied.
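
The arithmetic behind that example can be sketched in a few lines of base R. The numbers are only illustrative: 2 workers and 20 threads per worker come from the example above, and 24 is the core count mentioned in the original question.

```r
# Illustrative numbers only: 2 workers x 20 threads each, on a 24-core machine.
workers <- 2
threads_per_worker <- 20
cores <- 24

# Each worker is a separate process; each process may start its own threads.
busy_threads <- workers * threads_per_worker

# More runnable threads than cores means every core can be kept busy.
oversubscribed <- busy_threads > cores

cat("threads:", busy_threads, "| cores:", cores,
    "| oversubscribed:", oversubscribed, "\n")
```

So the number of workers bounds the number of processes, not the number of threads those processes start.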

@ShixiangWang
Author

@renkun-ken Yes, that is exactly what I observed when I set different numbers of workers. But I cannot find a way to limit it 😢. It would be a disaster if all CPUs were occupied for a long time, since that would affect other programs and users on a Linux server.

@renkun-ken

There are plenty of ways to limit the number of threads. Some packages provide their own functions to get or set the thread count, e.g. data.table::setDTthreads() and fst::threads_fst(), which basically set the number of OpenMP threads. data.table also lets you limit the number of threads with the environment variable R_DATATABLE_NUM_THREADS, and fst lets you do the same with an R option such as options(fst_threads = 10). To limit OpenMP threads in general, you can also add export OMP_NUM_THREADS=20 to your shell profile (.bashrc, .zshrc, etc.).
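
As a rough sketch of the environment-variable approach from within R, the variables can be set with Sys.setenv() before any workers are started. This assumes that worker processes launched afterwards inherit the parent session's environment and that the libraries involved read these variables at startup; the value 1 and the choice of 2 workers are arbitrary.

```r
# Cap thread counts through environment variables before starting workers.
# Assumption: workers launched later inherit these from the parent session.
Sys.setenv(
  OMP_NUM_THREADS = "1",          # generic OpenMP limit
  R_DATATABLE_NUM_THREADS = "1"   # data.table-specific limit
)

# Optional check, only if the future package is installed:
# ask one worker what it sees for OMP_NUM_THREADS.
if (requireNamespace("future", quietly = TRUE)) {
  future::plan("multisession", workers = 2)
  f <- future::future(Sys.getenv("OMP_NUM_THREADS"))
  worker_value <- future::value(f)
  future::plan("sequential")  # shut the workers down again
  cat("OMP_NUM_THREADS seen by worker:", worker_value, "\n")
}
```

Setting the variables in the current session may not change libraries that have already been loaded there, so it is safest to do this at the very start of a script.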

Some functions or packages use BLAS and do not provide a function to limit the number of threads. In this case, you can try RhpcBLASctl, which provides functions to limit the number of threads for both OpenMP and BLAS.
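
A minimal sketch of how RhpcBLASctl might be used for this is shown below. The helper name limit_threads() is hypothetical and not part of any package; the sketch assumes RhpcBLASctl is installed and does nothing otherwise.

```r
# Hypothetical helper: cap BLAS and OpenMP threads in the current R process.
# Returns TRUE if the limits were applied, FALSE if RhpcBLASctl is missing.
limit_threads <- function(n = 1L) {
  if (!requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    return(FALSE)
  }
  RhpcBLASctl::blas_set_num_threads(n)  # threads used by the BLAS library
  RhpcBLASctl::omp_set_num_threads(n)   # threads used by OpenMP code
  TRUE
}

applied <- limit_threads(1L)
cat("thread limits applied:", applied, "\n")
```

Because the limits apply only to the process that sets them, a helper like this would presumably need to be called inside the code that runs on each worker as well.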

On a Linux server with multiple users, I suggest that every user always limit the maximum number of threads they use.

@ShixiangWang
Author

@renkun-ken Thanks, I will give it a try.

@mxblsdl

mxblsdl commented Jan 29, 2020

I would like to add that data.table::setDTthreads() needs to be called inside the function that runs on each worker, since each worker is a new R session with its own settings. Running

library(future)
library(future.apply)

plan(multisession, workers = works)
future_lapply(list, function(x) {
  # runs in the worker's own (new) R session
  data.table::setDTthreads(1)
})

works for me, while

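library(future)
library(future.apply)

# only affects the main session, not the workers
data.table::setDTthreads(1)
plan(multisession, workers = works)
future_lapply(list, function(x) {
  # runs in the worker's own (new) R session
})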

does not work, and I see CPU usage above 100% when looking at top.
