Best practice for using max 2 cores on CRAN? #5658

Closed
tdhock opened this issue Jun 14, 2023 · 60 comments

@tdhock
Member

tdhock commented Jun 14, 2023

CRAN has a policy that limits the number of cores that can be used during package checks, and they have recently implemented a mechanism for checking compliance. Upon recent CRAN submission of one of my R packages which imports data.table, I got the following rejection message from CRAN:

Flavor: r-devel-linux-x86_64-debian-gcc
Check: examples, Result: NOTE
  Examples with CPU (user + system) or elapsed time > 5s
                         user system elapsed
  aum_line_search      12.349  0.322   1.935
  aum_line_search_grid 10.033  0.308   1.781
  Examples with CPU time > 2.5 times elapsed time
                         user system elapsed ratio
  aum_line_search      12.349  0.322   1.935 6.548
  aum_line_search_grid 10.033  0.308   1.781 5.806
  aum_diffs_penalty     4.730  0.169   1.635 2.996

These messages can be suppressed by adding data.table::setDTthreads(1) at the start of each example.
Is there another/recommended way to avoid using too many cores when checking a CRAN package which uses data.table? The default number of threads in data.table is half the available cores, which can exceed the CRAN limit (max 2 cores, I believe).
Also, is there documentation about this somewhere? It would probably be good to mention in the datatable-importing vignette.
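For illustration, a minimal sketch of that suppression in a roxygen example block (the example body here is made up):

#' @examples
#' \dontshow{data.table::setDTthreads(1)} # stay under CRAN's core limit
#' dt <- data.table::data.table(x = 1:3)
#' dt[, y := x + 1]
#' dt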

@eddelbuettel
Contributor

they have recently implemented a mechanism for checking compliance

I don't think it is that recent. Package tiledb got dinged three or so years ago, and I then implemented a scheme that works.

Here is what the package does now. In essence, in every example file or unit test file (i.e. files that CRAN runs) I explicitly call a function that throttles cores down to

  • the given function value, if there is one, in this throttler function
  • or the value of option Ncpu with a fallback value of 2 (and that is what CRAN gets; as I recall they set Ncpu)
  • or the value of environment variable OMP_THREAD_LIMIT, commonly used to set OpenMP and MKL thread counts.

So this gets us the max of 2 in everything that CRAN runs, yet it can be overridden (if, say, your CI has more cores). I already had Ncpu set locally to e.g. install multiple packages in parallel via install.packages() / update.packages().

Importantly, it leaves general startup (i.e. .onLoad(), .onAttach() ) alone: it still gets full core count and performance.

So normal users are not affected and get full speed, but CRAN gets its limit. I can detail that some more for you if you care; it is in the tiledb file R/config.R (but a little tied to how the config object propagates for us).
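A minimal sketch of such a throttler, following the precedence in the list above (the helper name is made up; the real code is in tiledb's R/config.R):

# Hypothetical throttling helper, called from every example/test file.
# Option 'Ncpus' falls back to 2 (which is what CRAN gets); the
# OMP_THREAD_LIMIT environment variable, if set, caps the result.
limit_cores <- function(n) {
  if (missing(n)) {
    n <- getOption("Ncpus", 2L)
    omp <- suppressWarnings(as.integer(Sys.getenv("OMP_THREAD_LIMIT")))
    if (!is.na(omp)) n <- min(n, omp)
  }
  data.table::setDTthreads(n)
  invisible(n)
}

Calling this only from examples and tests leaves .onLoad() / .onAttach() untouched, as described.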

@jangorecki
Member

Also, reading the DT NEWS file may point to the commits that set that up for DT.

@tdhock
Member Author

tdhock commented Jun 15, 2023

@eddelbuettel thanks for the quick feedback, that approach seems similar to what I did, but more flexible. You wrote that CRAN sets option Ncpu -- do you know where that is documented? On https://cran.r-project.org/web/packages/policies.html I see "If running a package uses multiple threads/cores it must never use more than two simultaneously: the check farm is a shared resource and will typically be running many checks simultaneously." but no mention of option Ncpu.

Also, I was thinking that if CRAN indeed sets option Ncpu then the data.table default number of threads should respect that (which avoids having to change example/test code in the 1000+ packages with a hard dependency on data.table).

@jangorecki I checked https://github.com/Rdatatable/data.table/blob/master/NEWS.md but I did not see mention of commits, can you please clarify?

@tdhock
Member Author

tdhock commented Jun 15, 2023

An alternative to option Ncpu would be to just detect whether we are on CRAN, for example using the code below, and then throttle to two cores as the data.table default.

> testthat:::on_cran
function () 
{
    !env_var_is_true("NOT_CRAN")
}

I do not see mention of the NOT_CRAN env var in the CRAN Repository Policy either; do you know where that is documented?

@eddelbuettel
Contributor

eddelbuettel commented Jun 15, 2023

Relying on the absence of NOT_CRAN has always been, and still is, a hack I do not recommend, or use. YMMV.

I do not recall where, but it is documented somewhere that they allow two threads, and AFAICR Ncpus is the one vessel for that payload.

@tdhock
Member Author

tdhock commented Jun 16, 2023

There is a related R-devel thread, https://stat.ethz.ch/pipermail/r-devel/2021-November/081289.html, that explains that the env var _R_CHECK_LIMIT_CORES_=TRUE is set when running R CMD check --as-cran.
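For reference, a hedged way to detect that flag (it can reportedly also take the value "warn", so testing for a non-empty value is safer than comparing to TRUE):

# Sketch: TRUE when R CMD check --as-cran has set the core-limit flag.
under_core_limit_check <- nzchar(Sys.getenv("_R_CHECK_LIMIT_CORES_"))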

@tdhock
Member Author

tdhock commented Jun 16, 2023

?tools::check_packages_in_dir says the Ncpus option is used to specify how many packages are checked in parallel:

   Ncpus: the number of parallel processes to use for parallel
          installation and checking.

@eddelbuettel
Contributor

But note that what you quoted does not limit it to package checks as your statement implies but to general 'checking'. Which is where we started: how to play nice at CRAN and not exceed limits.

@tdhock
Member Author

tdhock commented Jun 16, 2023

This check is implemented via this code https://github.com/wch/r-source/blob/1c0545ba7c6c07e8c358eda552b875b1e4d6826d/src/library/tools/R/check.R#L4123 which gets the 2.5 ratio from the environment variable _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_, so we could check whether that is set and round it down as a default for the number of threads.
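A sketch of that idea, assuming the variable is only set during CRAN's checks:

# Derive a default thread count from CRAN's CPU/elapsed threshold, if set.
ratio <- suppressWarnings(as.numeric(
  Sys.getenv("_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_")))
if (!is.na(ratio)) {
  data.table::setDTthreads(max(1L, as.integer(ratio)))  # 2.5 truncates to 2
}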

@eddelbuettel
Contributor

eddelbuettel commented Jun 16, 2023

Also:

Rprintf(_(" OMP_THREAD_LIMIT %s\n"), mygetenv("OMP_THREAD_LIMIT", "unset")); // CRAN sets to 2

Recall that my approach was about taking the lower limit from Ncpu and OMP_NUM_THREADS (and I no longer recall why I do not / did not also set OMP_THREAD_LIMIT).

Also

The number of logical CPUs is determined by the OpenMP function \code{omp_get_num_procs()} whose meaning may vary across platforms and OpenMP implementations. \code{setDTthreads()} will not allow more than this limit. Neither will it allow more than \code{omp_get_thread_limit()} nor the current value of \code{Sys.getenv("OMP_THREAD_LIMIT")}. Note that CRAN's daily test system (results for data.table \href{https://cran.r-project.org/web/checks/check_results_data.table.html}{here}) sets \code{OMP_THREAD_LIMIT} to 2 and should always be respected; e.g., if you have written a package that uses data.table and your package is to be released on CRAN, you should not change \code{OMP_THREAD_LIMIT} in your package to a value greater than 2.

@tdhock
Member Author

tdhock commented Jun 16, 2023

Related to #5573 and #5620, about other environment variables to look at to determine the default number of threads.

@TimTaylor

I also ran into this on a submission this week, and so did others recently (see the r-devel thread where Dirk gives similar guidance).

I was confused, as I assumed this would always be handled on the data.table side. I wonder if the CRAN test machine is no longer setting the OMP_THREAD_LIMIT environment variable, which data.table assumes it does?

@jangorecki
Member

I haven't found anything useful in the NEWS file either.
#3300 seems to be related.

@eddelbuettel
Contributor

Pondering this a little more, I think this may not really belong here as data.table has long done 'The Right Thing (TM)'.

What may help @tdhock, @TimTaylor and likely others is a new helper function to throttle cores when running tests or examples in their packages. And while those packages may well use data.table, this is a general problem at CRAN so shouldn't there be a generic little helper somewhere in a (preferably zero-dependency) package?

@tdhock
Member Author

tdhock commented Jun 20, 2023

#3300 quotes an email from Brian Ripley saying that CRAN sets OMP_THREAD_LIMIT=2, but that was back in 2019, so I suspect that @TimTaylor is right that CRAN is no longer doing that. They are clearly setting _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5, because that is the source of the NOTE in my original post, so I would suggest using this env var instead.

@tdhock
Member Author

tdhock commented Jul 17, 2023

#3300 (comment) is an issue originally opened in 2019 with a new comment in July 2023 quoting Uwe Ligges saying that CRAN no longer sets the OMP_THREAD_LIMIT env var, so I suggest we instead look at _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_ and round down.

@eddelbuettel
Contributor

Your call between a second-hand quote (!!) in a 2019 ticket (!!) on one hand and the CRAN Repository Policy on the other.

The latter still says

If running a package uses multiple threads/cores it must never use more than two simultaneously: the check farm is a shared resource and will typically be running many checks simultaneously.

Examples should run for no more than a few seconds each: they are intended to exemplify to the would-be user how to use the functions in the package.

topepo added a commit to tidymodels/parsnip that referenced this issue Aug 17, 2023
* version bump

* Documented arguments not in \usage

* changes for CRAN check; see Rdatatable/data.table#5658

* avoid tests on cran

* remove old checks; no longer supported in current python
@lrberge

lrberge commented Aug 24, 2023

Is there something against placing:

if(any(grepl("_R_CHECK", names(Sys.getenv()), fixed = TRUE))){
  setDTthreads(2)
}

in data.table's .onLoad()?

It would save all dependent package maintainers from having to do that down the road. But maybe I'm missing something!

@TimTaylor

TimTaylor commented Aug 24, 2023

The impression I get is that CRAN are nudging packages to default to being single (or very low) threaded unless the package user explicitly says otherwise. There are related discussions as to whether users are even aware of the default behaviour in #5620 (comment).

EDIT: Added a link to the most recent R-package-devel thread with associated comments from Uwe, Dirk et al: https://stat.ethz.ch/pipermail/r-package-devel/2023q3/009454.html

@eddelbuettel
Contributor

@lrberge That is pretty clever but it would bite folks like me who have values in a dotfile for R CMD check:

> grep("_R_", names(Sys.getenv()), value=TRUE)
[1] "_R_CHECK_COMPILATION_FLAGS_KNOWN_" "_R_CHECK_TESTS_NLINES_"           
> 

It would be better if we could nudge Uwe towards setting the OMP variable on his machine! He is the one who wants the lower core and process count there!

@lrberge

lrberge commented Aug 24, 2023

@eddelbuettel how come you have "_R_CHECK" env vars lying around?! :-)
I agree the cleanest solution would be the OMP variable.

@TimTaylor: the current default for many packages with multithreading is usually to max out resources (or close to it), which in many cases isn't the best.
So nudging users to set it themselves can be a solution. The big issue is... it makes R very user-unfriendly.

# A)
library(data.table)
setDTthreads(percent = 0.5)

# B)
library(data.table)

Running A) instead of B) is really a pain and nobody will do it: it follows that R functions with multithreading disabled will look slow on large tasks compared to other, non-R software.
Placing A) in an .Rprofile is too much to ask of new R users.
If R takes the single-core path, it must be written in bold, large font how to change that (in particular the .Rprofile mechanism) in as many places as possible; otherwise new users will simply be disappointed with performance on demanding tasks and won't look further.

So in my view: a) using all available (or many) threads as a default isn't great; b) using single-thread mode as the default and nudging users to change it has big caveats and isn't really user-friendly. So maybe the way is c) "smart" thread setting, with the number of threads decided from the parameters of the task at hand? (A bit like the throttle argument of setDTthreads() but with more detailed heuristics.)
It seems to me that the official R core team's position and direction isn't really clear (yet).
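As a toy illustration of (c) — the threshold and formula below are invented for this comment, not taken from data.table:

# Hypothetical heuristic: stay single-threaded for small tasks where
# threading overhead dominates, then grow slowly with task size.
smart_threads <- function(n_rows, cap = data.table::getDTthreads()) {
  if (n_rows < 1e6) return(1L)
  min(cap, max(2L, as.integer(log10(n_rows))))
}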

@jangorecki
Member

A) is already the default; no need to specify that value.

@eddelbuettel
Contributor

using all available (or many) threads as a default isn't great

Wit my "work" hat on: I disagree. We build software for 'big enough' problems and e.g. our underlying library maxes out by default. I strongly believe that is the right thing. (And in our package I put a throttler in as described above).

So I actually hold both views: I like what data.table does as a default and would not want it to default to 2, but as a widely-used package it would also be nice to help other packages using it.

Back to: How do we get Uwe to change his mind? 😁

@jangorecki
Member

jangorecki commented Aug 24, 2023

The most critical OpenMP pieces of DT do not speed up linearly with the number of cores. We observed that the difference between 50% and 100% was not very big, and decided to stay with the 50% default to avoid problems on shared machines, which have been reported a couple of times already. There are other pieces in DT which scale more linearly, like froll, and for those 100% would be a better default.

@aitap

aitap commented Oct 1, 2023

In addition to what @jangorecki says:

If you're using parallel-style clusters together with data.table, you need to split your core allowance between the two. This means either makeCluster(1) and setDTthreads(2) or the other way around. (This problem isn't novel or unique to data.table. Anyone with OpenBLAS/MKL/other parallel BLAS and a parallel cluster has already had to manage it somehow.)

It's best to minimise changes to global state from the examples. Start your examples with \dontshow{.prev.dt.threads <- data.table::setDTthreads(2)} and end them with \dontshow{data.table::setDTthreads(.prev.dt.threads); rm(.prev.dt.threads)}. (Use a variable name starting with a dot because those are by convention not for the user to care about despite being present in the global environment.) Otherwise a user who had set up their DT threads to their liking will be surprised to see data.table slowed down again after running example(your_function).
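In roxygen form, that save/restore pattern might look like this (setDTthreads() returns the previous thread count, which makes the restore easy; the example body is made up):

#' @examples
#' \dontshow{.prev.dt.threads <- data.table::setDTthreads(2)}
#' dt <- data.table::data.table(x = 1:3)
#' dt[, y := x + 1]
#' \dontshow{data.table::setDTthreads(.prev.dt.threads); rm(.prev.dt.threads)}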

@EmilHvitfeldt

@jangorecki, your suggested solution does not work in my use case.

with tidymodels/textrecipes#251 I still get the CRAN submission NOTE with

* checking examples ... [17s/12s] NOTE
Examples with CPU time > 2.5 times elapsed time
                 user system elapsed ratio
step_dummy_hash 1.579   0.11   0.273 6.187
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ... [24s/19s] OK
  Running ‘testthat.R’ [24s/18s]

I don't know if it is because I don't use {data.table} directly but instead use a package, {text2vec}, that uses {data.table}.

Last release I ended up simply deleting the documentation for the affected functions, but that is not a long-term solution.

@jangorecki
Member

@EmilHvitfeldt hard for me to imagine why that could still cause a problem, unless there is some nested parallelism, like mclapply + data.table; then the topmost parallel code should contain a call setting any nested code to single-threaded, not just for data.table but for any multithreaded computations. Are you able to narrow down where the parallel processing comes from?

@TimTaylor

@EmilHvitfeldt - Are you seeing this on all platforms? I haven't dug further, but text2vec has this in R/zzz.R:

.onLoad = function(libname, pkgname) {
  n_cores = 1L
  if(.Platform$OS.type == "unix")
    n_cores = parallel::detectCores(logical = FALSE)
  options("text2vec.mc.cores" = n_cores)

  logger = lgr::get_logger('text2vec')
  logger$set_threshold('info')
  assign('logger', logger, envir = parent.env(environment()))
}

As @jangorecki alludes, could there be more going on than just data.table?

@eddelbuettel
Contributor

It again demonstrates that CRAN was not really helpful here, insisting that each and every package fix this on its own.

@cdriveraus

Given that they are considering a general solution to the problem, waiting for that before rigidly enforcing the 2-core thing would have seemed sensible to me...

@eddelbuettel
Contributor

@cdriveraus News to me. Can you supply a reference to back up that claim? The last we all saw was Uwe Ligges telling everybody to set up 'max 2 cores' in each package needing it.

@cdriveraus

@eddelbuettel I thought it was mentioned somewhere on the pkg devel list at one point, but I'm guessing you're more familiar than me so I'm assuming I misinterpreted / imagined one of the posts I read...

@EmilHvitfeldt

EmilHvitfeldt commented Oct 19, 2023

One of the frustrating parts of this problem is that I'm not able to reproduce it locally. Unless something is broken, the {text2vec} parallelism is only for certain functions which I'm not using (as far as I know).

I'm only seeing it on the Debian incoming check machine.

Right now my examples have the following, to no avail:

#' \dontshow{library(data.table)}
#' \dontshow{data.table::setDTthreads(2)}
#' \dontshow{Sys.setenv("OMP_THREAD_LIMIT" = 2)}
#' \dontshow{library(text2vec)}
#' \dontshow{options("text2vec.mc.cores" = 1)}

Also, I very much appreciate all the help coming in on this thread!

@helske

helske commented Oct 24, 2023

I think there is something else going on at CRAN than just the data.table behaviour. I have similar problems with the bssm package, which does not use data.table, but uses OpenMP in some selected functions where the number of threads is set by the user in the R function call (the default being one). I tried using Sys.setenv("OMP_THREAD_LIMIT" = 2) at the start of the problematic example, but I still get the same pretest check NOTE from CRAN on Debian.

@jangorecki
Member

Yes, the problem is not with data.table but with OpenMP (or other multithreaded code).

It is interesting that even in your case, where it uses a single thread by default, you still have the problem. Then I don't think there is any reliable solution that package maintainers can employ.

jangorecki removed the omp label Nov 7, 2023
polettif added a commit to r-transit/tidytransit that referenced this issue Dec 7, 2023
levenc added a commit to levenc/posologyr that referenced this issue Dec 8, 2023
ruthkr added a commit to ruthkr/greatR that referenced this issue Feb 8, 2024
@stitam

stitam commented Feb 16, 2024

Not sure this is a proper solution, but this is how I handle the issue within the webseq package. The package is not yet on CRAN, but it does pass devtools::check() with the default cran = TRUE.

In each of my functions that use parallelisation, I set the default value for the number of cores to NULL and validate the number of cores with an internal get_mc_cores() function which 1. returns the number of cores if it has been set, 2. otherwise looks at getOption("Ncpu") and 3. if that returns NULL, falls back to parallel::detectCores().

Within a regular R session I do not set "Ncpu" so R will not find it and will fall back to using many cores by default. However, I start each of my test files with options("Ncpu" = 2L) so R CMD check will always work with 2 cores.

Do you folks think this will work on CRAN? Thanks.
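A minimal sketch of the helper described above (argument names assumed):

get_mc_cores <- function(cores = NULL) {
  if (!is.null(cores)) return(as.integer(cores))  # 1. explicit value wins
  opt <- getOption("Ncpu")
  if (!is.null(opt)) return(as.integer(opt))      # 2. honour options("Ncpu")
  parallel::detectCores()                         # 3. fallback; can be NA
}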

@jangorecki
Member

DT uses setDTthreads() rather than global options. Please read the manual.

@aitap

aitap commented Feb 17, 2024 via email

@stitam

stitam commented Feb 17, 2024

Thanks @aitap! Oh, some of my examples were wrapped in \dontrun{} so they were not evaluated by R CMD check. I unwrapped them to see what happens and R CMD check threw an error, as expected. I'm okay with wrapping, and AFAIK it is not against CRAN policies, but if that's not acceptable, then unfortunately setting options("Ncpu" = 2L) within tests is not enough to pass the checks. Thanks for flagging the NA; currently nothing handles that case. I am unsure exactly when parallel::detectCores() returns NA, so I'll probably just add stop() for now.

@aadler

aadler commented Feb 22, 2024

You may want to look into the parallelly package's availableCores() function, which is written to always return an integer. It increases your dependency count but may be worth it.
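For example:

# availableCores() honours common CRAN/CI settings and never returns NA.
n_cores <- parallelly::availableCores()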
