RAMitchell Add native support for Dask (#4473)
* Add native support for Dask

* Add multi-GPU demo

* Add sklearn example
Latest commit 09b90d9 May 27, 2019

Dask Integration

Dask is a parallel computing library built on Python. It makes distributed workers easy to manage and excels at handling large, distributed data science workflows.

The simple demo shows how to train an xgboost model and make predictions in a distributed Dask environment. It uses xgboost's first-class support for launching Dask workers. Workers launched this way are automatically connected through Rabit, xgboost's underlying communication framework. The calls to xgb.train() and xgb.predict() run in parallel on each worker and are synchronized.
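The train-then-predict flow described above can be sketched as follows. This is a minimal illustration, not the demo script itself: it assumes the `xgboost.dask` interface found in later xgboost releases (`DaskDMatrix`, `train`, `predict`), which may differ from the API shipped with this commit. The helper names `make_params` and `run_demo`, the random data, and all parameter values are hypothetical.

```python
def make_params(max_depth=4):
    """Booster parameters shared by every worker (illustrative values only)."""
    return {
        "objective": "reg:squarederror",
        "tree_method": "hist",
        "max_depth": max_depth,
    }


def run_demo(n_workers=2, n_rows=10_000, n_cols=10):
    """Train and predict on a local Dask cluster (hypothetical helper)."""
    # Imported here so this file still loads where Dask or xgboost is missing.
    import dask.array as da
    from dask.distributed import Client, LocalCluster
    from xgboost import dask as dxgb  # assumption: a release that ships xgboost.dask

    chunk_rows = max(1, n_rows // n_workers)
    with LocalCluster(n_workers=n_workers, threads_per_worker=1) as cluster:
        with Client(cluster) as client:
            # Random data, split into row chunks that Dask spreads over the workers.
            X = da.random.random((n_rows, n_cols), chunks=(chunk_rows, n_cols))
            y = da.random.random(n_rows, chunks=chunk_rows)

            dtrain = dxgb.DaskDMatrix(client, X, y)
            # Training runs on every worker; xgboost keeps the workers in sync.
            output = dxgb.train(client, make_params(), dtrain, num_boost_round=10)
            # Prediction is distributed as well and yields a lazy Dask array.
            preds = dxgb.predict(client, output["booster"], dtrain)
            return preds.compute()
```

Call `run_demo()` from inside an `if __name__ == "__main__":` guard, since a local cluster may start worker processes that re-import the script.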

The GPU demo shows how to configure and use GPUs on the local machine for training on a large dataset.


Dask can be installed with either pip or conda. See the official Dask installation documentation for details.

The GPU demo requires GPUtil for detecting system GPUs.

Install via pip install gputil
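As a rough sketch of how GPU detection might feed into worker setup, the snippet below lists GPU IDs with `GPUtil.getGPUs()` and spreads workers over them round-robin. The helper names `detect_gpu_ids` and `assign_gpus` are hypothetical, and the round-robin mapping is an assumption for illustration; the GPU demo may configure its workers differently.

```python
def detect_gpu_ids():
    """Return the IDs of GPUs reported by GPUtil, or [] if GPUtil is unavailable."""
    try:
        import GPUtil
    except ImportError:
        # GPUtil is not installed (or cannot be imported), so report no GPUs.
        return []
    return [gpu.id for gpu in GPUtil.getGPUs()]


def assign_gpus(n_workers, gpu_ids):
    """Map each worker index to a GPU ID, cycling through the IDs in order."""
    if not gpu_ids:
        return {}
    return {worker: gpu_ids[worker % len(gpu_ids)] for worker in range(n_workers)}


if __name__ == "__main__":
    ids = detect_gpu_ids()
    print("Detected GPU IDs:", ids)
    print("Worker to GPU mapping:", assign_gpus(4, ids))
```

With two GPUs and four workers, `assign_gpus(4, [0, 1])` pairs workers 0 and 2 with GPU 0 and workers 1 and 3 with GPU 1.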

Running the scripts
