The performance of XGBoost can degrade significantly when it runs with default parameters in a container whose CPU is limited via cgroups.

Projects such as https://mybinder.org/ and other container-based platforms use cgroups to limit resource usage. This causes a large performance hit when the default parameters (`n_jobs=-1`) are used: XGBoost reads the host CPU count via OpenMP, which leads to over-subscription and causes the container to be throttled immediately. In a small test using `XGBClassifier` with the Wisconsin breast cancer dataset (569 rows, 32 columns) in a container limited by cgroups to 1 CPU core, the execution time was about 90x slower than with `n_jobs = container_cpus`:
| n_jobs | Execution time | Container CPUs | Host CPUs |
|--------|----------------|----------------|-----------|
| -1     | 28 s           | 1              | 16        |
| 1      | 0.3 s          | 1              | 16        |
Looking at the number of threads spawned during the test, one can see that the host CPU count is indeed what determines the number of threads.

Therefore, I suggest that:

- XGBoost should respect the limits imposed by cgroups when running with default parameters / `n_jobs=-1`.
- These limits may be overridden by `OMP_THREAD_LIMIT`, if that variable is set.
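The override could be resolved in a few lines. This is only a sketch of one possible reading of the suggestion above (the environment variable takes precedence over the cgroup-derived count when set to a positive integer); `resolve_n_threads` is a hypothetical helper name, not an existing XGBoost API:

```python
import os

def resolve_n_threads(cgroup_threads):
    """Return the thread count to use, letting OMP_THREAD_LIMIT
    override the cgroup-derived default when it is set.

    Hypothetical helper; the precedence rule is an assumption."""
    raw = os.environ.get("OMP_THREAD_LIMIT")
    # Only honor the variable if it parses as a positive integer.
    if raw is not None and raw.isdigit() and int(raw) > 0:
        return int(raw)
    return cgroup_threads
```

For example, with `OMP_THREAD_LIMIT=4` exported, `resolve_n_threads(1)` would return 4; with the variable unset, the cgroup-derived value is used unchanged.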
sklearn, for example, respects the limits imposed by cgroups and can therefore determine the number of suitable cores correctly:

> [...] take cgroups quotas into account when deciding the number of threads used by OpenMP. This avoids performance problems caused by over-subscription when using those classes in a docker container for instance
An implementation could look similar to sklearn's: `min(openmp.omp_get_max_threads(), cpu_count())`, where `cpu_count = min(physical_cpu_count(), container_cpu_count())`.

`container_cpu_count` could be implemented similarly to joblib, by reading `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` and `/sys/fs/cgroup/cpu/cpu.cfs_period_us` and computing the usable CPU cores as `int(math.ceil(cfs_quota_us / cfs_period_us))`.
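As a rough sketch, the joblib-style lookup could look like the following (cgroup v1 paths only, as in the description above; on hosts without a CFS quota, or under cgroup v2, it falls back to the OS-reported CPU count — `container_cpu_count` and `effective_n_jobs` are illustrative names, not an existing API):

```python
import math
import os

def container_cpu_count():
    """Number of CPUs usable under a cgroup v1 CFS quota.

    Sketch modeled on joblib's approach; falls back to os.cpu_count()
    when no quota is readable (e.g. bare metal or cgroup v2 hosts)."""
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            cfs_quota_us = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            cfs_period_us = int(f.read())
    except OSError:
        # Quota files absent or unreadable: no container limit known.
        return os.cpu_count() or 1
    if cfs_quota_us <= 0 or cfs_period_us <= 0:
        # A quota of -1 means "no limit" in cgroup v1.
        return os.cpu_count() or 1
    # e.g. quota=150000, period=100000 -> ceil(1.5) = 2 usable cores
    return int(math.ceil(cfs_quota_us / cfs_period_us))

def effective_n_jobs():
    """The suggested default: never exceed host CPUs or the cgroup quota."""
    return min(os.cpu_count() or 1, container_cpu_count())
```

With such a helper, `n_jobs=-1` inside a 1-core-limited container would resolve to 1 thread instead of the host's 16, avoiding the throttling measured above.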