Merged
5 changes: 5 additions & 0 deletions autosklearn/evaluation/abstract_evaluator.py
Original file line number Diff line number Diff line change
@@ -12,6 +12,8 @@

from smac.tae import StatusType

from threadpoolctl import threadpool_limits

import autosklearn.pipeline.classification
import autosklearn.pipeline.regression
from autosklearn.constants import (
@@ -193,6 +195,9 @@ def __init__(
budget_type: Optional[str] = None,
):

# Limit the number of threads that numpy uses
threadpool_limits(limits=1)

self.starttime = time.time()

self.configuration = configuration
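The effect of the `threadpool_limits(limits=1)` call added above can be sketched as follows, assuming `threadpoolctl` and a BLAS-backed numpy are installed (the inspection via `threadpool_info` is illustrative, not part of the patch):

```python
from threadpoolctl import threadpool_limits, threadpool_info

# Cap every native thread pool threadpoolctl can detect (OpenBLAS, MKL,
# OpenMP) at a single thread, as the evaluator now does before training.
threadpool_limits(limits=1)

# Every detected pool should now report at most one thread.
thread_counts = {info["num_threads"] for info in threadpool_info()}
assert thread_counts <= {1}
```

Note that calling `threadpool_limits` as a plain function changes the limits for the rest of the process; `threadpoolctl` also offers a context-manager form when the restriction should be temporary.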
18 changes: 11 additions & 7 deletions doc/manual.rst
@@ -172,13 +172,17 @@ is exhausted.

**Note:** *auto-sklearn* requires all workers to have access to a shared file system for storing training data and models.

Furthermore, depending on the installation of scikit-learn and numpy,
the model building procedure may use up to all cores. Such behaviour is
unintended by *auto-sklearn* and is most likely due to numpy being installed
from `pypi` as a binary wheel (`see here <https://scikit-learn-general.narkive
.com/44ywvAHA/binary-wheel-packages-for-linux-are-coming>`_). Executing
``export OPENBLAS_NUM_THREADS=1`` should disable such behaviours and make numpy
only use a single core at a time.
*auto-sklearn* employs `threadpoolctl <https://github.com/joblib/threadpoolctl/>`_ to control the number of threads used by scientific libraries such as numpy and scikit-learn. This limit applies only while models are built, not during inference: each pipeline may use at most one thread during training, and the restriction is not enforced at prediction or scoring time. You can further control the resources
used by the pipelines by setting the following environment variables before running *auto-sklearn*:

.. code-block:: shell-session

$ export OPENBLAS_NUM_THREADS=1
$ export MKL_NUM_THREADS=1
$ export OMP_NUM_THREADS=1


For further information about how scikit-learn handles multiprocessing, please check the `Parallelism, resource management, and configuration <https://scikit-learn.org/stable/computing/parallelism.html>`_ documentation from the library.
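The same limits can also be set from Python rather than the shell, provided this happens before numpy or scikit-learn is first imported (a minimal sketch; the variable names are the ones from the shell block above):

```python
import os

# The BLAS/OpenMP runtimes read these variables when they are loaded,
# so they must be set before importing numpy or scikit-learn.
for var in ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS", "OMP_NUM_THREADS"):
    os.environ[var] = "1"
```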

Model persistence
=================
1 change: 1 addition & 0 deletions requirements.txt
@@ -11,6 +11,7 @@ distributed>=2.2.0
pyyaml
pandas>=1.0
liac-arff
threadpoolctl

ConfigSpace>=0.4.14,<0.5
pynisher>=0.6.3