Cgroups awareness for XGBoost when using n_jobs=-1 #7653

Closed
daschnerm opened this issue Feb 14, 2022 · 2 comments · Fixed by #7654

Comments

@daschnerm

The performance of XGBoost can be significantly affected when running with default parameters in a container whose CPU is limited using cgroups.

Projects such as https://mybinder.org/ and other container-based platforms use cgroups to limit resource usage. This causes a large performance hit when the default parameters (n_jobs=-1) are used: XGBoost reads the host CPU count through OpenMP, which leads to over-subscription and causes the container to be throttled immediately. In a small test using XGBClassifier on the Wisconsin breast cancer dataset (569 rows and 32 columns), in a container limited by cgroups to 1 CPU core, execution is about 90x slower than when specifying n_jobs = container_cpus:

n_jobs   Execution time   Container CPUs   Host CPUs
-1       28s              1                16
1        0.3s             1                16

Looking at the number of threads spawned during the test, one can see that the host CPU count is indeed used to determine the number of threads.

Therefore, I suggest that:

  • XGBoost should respect limits imposed by cgroups when using default parameters / specifying n_jobs=-1
  • The limits may be overridden by OMP_THREAD_LIMIT, if the variable is set
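
The two suggestions above could be sketched as follows. This is only an illustration of the proposed policy, not XGBoost's actual code: resolve_n_jobs and its parameters are hypothetical names, and treating OMP_THREAD_LIMIT as a full override of the cgroup-derived count is one reading of the second bullet.

```python
import os


def resolve_n_jobs(n_jobs, host_cpus, container_cpus=None, environ=None):
    """Hypothetical helper: pick a thread count for a given n_jobs setting.

    container_cpus is the cgroup-derived limit, or None if there is none.
    """
    environ = os.environ if environ is None else environ

    # An explicit positive n_jobs is taken as-is.
    if n_jobs is not None and n_jobs > 0:
        return n_jobs

    # Assumption: OMP_THREAD_LIMIT, when set to a positive integer,
    # overrides the cgroup-derived limit.
    limit = environ.get("OMP_THREAD_LIMIT", "").strip()
    if limit.isdigit() and int(limit) > 0:
        return int(limit)

    # Default / n_jobs=-1: never exceed the container's CPU limit.
    if container_cpus is None:
        return max(1, host_cpus)
    return max(1, min(host_cpus, container_cpus))
```

With the numbers from the test above (16 host CPUs, 1 container CPU), resolve_n_jobs(-1, host_cpus=16, container_cpus=1, environ={}) yields 1 instead of 16.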

sklearn, for example, respects the limits imposed by cgroups and can therefore correctly determine the number of suitable cores:

[...] take cgroups quotas into account when deciding the number of threads used by OpenMP. This avoids performance problems caused by over-subscription when using those classes in a docker container for instance

An implementation could look similar to the one in sklearn:

  • min(openmp.omp_get_max_threads(), cpu_count())
  • where cpu_count = min(physical_cpu_count(), container_cpu_count())

container_cpu_count could be implemented similarly to joblib, by reading /sys/fs/cgroup/cpu/cpu.cfs_quota_us and /sys/fs/cgroup/cpu/cpu.cfs_period_us and calculating the number of CPU cores usable by XGBoost as int(math.ceil(cfs_quota_us / cfs_period_us)).
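
A minimal sketch of that calculation, assuming the cgroup v1 file layout named above (cgroup v2 exposes the quota differently and is not handled here). The cgroup_dir parameter and the usable_cpu_count helper are hypothetical, added only so the logic can be tried against a test directory; this is not joblib's or XGBoost's actual code.

```python
import math
import os
from pathlib import Path


def container_cpu_count(cgroup_dir="/sys/fs/cgroup/cpu"):
    """Return the CPU limit from the cgroup v1 CFS quota, or None if unset."""
    base = Path(cgroup_dir)
    try:
        cfs_quota_us = int((base / "cpu.cfs_quota_us").read_text())
        cfs_period_us = int((base / "cpu.cfs_period_us").read_text())
    except (OSError, ValueError):
        # Files missing or unreadable: not in a cgroup v1 CPU-limited container.
        return None
    if cfs_quota_us <= 0 or cfs_period_us <= 0:
        # A quota of -1 means no limit is configured.
        return None
    return int(math.ceil(cfs_quota_us / cfs_period_us))


def usable_cpu_count(cgroup_dir="/sys/fs/cgroup/cpu"):
    """Hypothetical helper: min of host CPU count and container limit."""
    host = os.cpu_count() or 1
    limit = container_cpu_count(cgroup_dir)
    return host if limit is None else max(1, min(host, limit))
```

Until a fix is available, a caller could pass the result explicitly, e.g. n_jobs=usable_cpu_count(), rather than relying on n_jobs=-1.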

@trivialfis
Member

Thank you for raising the issue. Would you like to review #7654?

@daschnerm
Author

@trivialfis Sorry, I was a bit too late. Many thanks for the fast fix!!
