Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R_DATATABLE_NUM_THREADS env var alone ignored #4514

Closed
jangorecki opened this issue May 30, 2020 · 5 comments · Fixed by #4562
Closed

R_DATATABLE_NUM_THREADS env var alone ignored #4514

jangorecki opened this issue May 30, 2020 · 5 comments · Fixed by #4562
Milestone

Comments

@jangorecki
Copy link
Member

@jangorecki jangorecki commented May 30, 2020

Shouldn't we default num_proc_percent to 100% when num_threads is provided?

R_DATATABLE_NUM_THREADS=2 Rscript -e 'data.table::getDTthreads(T)'
#  omp_get_num_procs()            2
#  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
#  R_DATATABLE_NUM_THREADS        2
#  omp_get_thread_limit()         2147483647
#  omp_get_max_threads()          2
#  OMP_THREAD_LIMIT               unset
#  OMP_NUM_THREADS                unset
#  RestoreAfterFork               true
#  data.table is using 1 threads. See ?setDTthreads.
#[1] 1

ans = imax(ans*perc/100, 1);

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented May 30, 2020

Hmm I think it's slightly different. Actually what's happening is R_DATATABLE_NUM_THREADS is too high so it's ignored:

ans = imin(ans, getIntEnv("R_DATATABLE_NUM_THREADS", INT_MAX));

the min here is imax(ans*perc/100, 1), i.e. 1.

For me I have omp_get_num_procs()=8 so your code runs as expected:

R_DATATABLE_NUM_THREADS=2 Rscript -e 'data.table::getDTthreads(T)'
#   omp_get_num_procs()            8
#   R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
#   R_DATATABLE_NUM_THREADS        2
#   omp_get_thread_limit()         2147483647
#   omp_get_max_threads()          8
#   OMP_THREAD_LIMIT               unset
#   OMP_NUM_THREADS                unset
#   RestoreAfterFork               true
#   data.table is using 2 threads. See ?setDTthreads.
# [1] 2

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented May 30, 2020

I guess your point is if NUM_THREADS is set and NUM_PROCS_PERCENT is not, NUM_THREADS the default of NUM_PROCS_PERCENT becomes 100...

@jangorecki
Copy link
Member Author

@jangorecki jangorecki commented May 30, 2020

try using 5

@jangorecki
Copy link
Member Author

@jangorecki jangorecki commented May 30, 2020

maybe that is actually good, so it is more flexible to cap threads to min of two provided. Then we just need to mention that in manual.

@jangorecki jangorecki added this to the 1.12.9 milestone Jun 16, 2020
@mattdowle
Copy link
Member

@mattdowle mattdowle commented Jun 20, 2020

On the principle of the first-thought-best, let's go with Jan's first thought and change it. When you use setDTthreads() at the R prompt it works at you expect, so fixing the environment variable is more like making its behaviour consistent with setDTthreads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants