ThunderGBM: Fast GBDTs and Random Forests on GPUs

Documentation | Installation | Parameters | Python (scikit-learn) interface

Overview

The mission of ThunderGBM is to help users easily and efficiently apply GBDTs and Random Forests to solve problems. ThunderGBM exploits GPUs to achieve high efficiency. Key features of ThunderGBM are as follows.

  • Fast training: often 10x faster than other libraries.
  • Provides a Python (scikit-learn) interface (see the sketch below).
  • Supported operating systems: Linux and Windows.
  • Supports classification, regression and ranking.
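
As a quick illustration of the scikit-learn interface mentioned above, here is a minimal sketch. The class name TGBMClassifier and the parameters depth and n_trees are assumptions rather than confirmed API; please check the Python (scikit-learn) interface documentation for the exact names.

# Minimal sketch of the scikit-learn-style interface (assumed API:
# thundergbm.TGBMClassifier with depth/n_trees parameters).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from thundergbm import TGBMClassifier  # assumed import path

x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

clf = TGBMClassifier(depth=6, n_trees=40)  # assumed parameter names
clf.fit(x_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(x_test)))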

Why accelerate GBDTs and Random Forests: A survey conducted by Kaggle in 2017 shows that 50%, 46% and 24% of data mining and machine learning practitioners use Decision Trees, Random Forests and GBMs, respectively.

GBDTs and Random Forests are often used for creating state-of-the-art data science solutions, and many competition-winning solutions are built on GBDTs. Please check out the XGBoost website for examples of winning solutions and use cases.

Getting Started

Prerequisites

  • cmake 2.8 or above
  • C++ Boost
  • For Linux: gcc 4.8 or above, CUDA 8 or above
  • For Windows: Visual C++, CUDA 10

Download

git clone https://github.com/zeyiwen/thundergbm.git
cd thundergbm
#under the directory of thundergbm
git submodule init cub && git submodule update

Build on Linux (see the documentation for Windows build instructions)

#under the directory of thundergbm
mkdir build && cd build && cmake .. && make -j

Quick Start

./bin/thundergbm-train ../dataset/machine.conf
./bin/thundergbm-predict ../dataset/machine.conf

You will see RMSE = 0.489562 after a successful run.
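
For reference, a rough Python counterpart of the commands above is sketched below. It assumes the bundled machine dataset is stored in LIBSVM format at dataset/machine.txt and that the scikit-learn wrapper exposes a TGBMRegressor class; adjust the path and class name to your installation.

# Rough Python counterpart of the quick start (the dataset path and
# TGBMRegressor are assumptions, not confirmed names).
import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import mean_squared_error
from thundergbm import TGBMRegressor  # assumed import path

x, y = load_svmlight_file("dataset/machine.txt")  # assumed file location
x = np.asarray(x.todense())  # densify in case sparse input is not accepted

reg = TGBMRegressor()
reg.fit(x, y)
rmse = np.sqrt(mean_squared_error(y, reg.predict(x)))
print("training RMSE:", rmse)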

How to cite ThunderGBM

If you use ThunderGBM in your paper, please cite our work (preprint).

@article{wenthundergbm19,
 author = {Wen, Zeyi and Shi, Jiashuai and He, Bingsheng and Li, Qinbin and Chen, Jian},
 title = {{ThunderGBM}: Fast {GBDTs} and Random Forests on {GPUs}},
 journal = {To appear in arXiv},
 year = {2019}
}

Other related paper

  • Zeyi Wen, Bingsheng He, Kotagiri Ramamohanarao, Shengliang Lu, and Jiashuai Shi. Efficient Gradient Boosted Decision Tree Training on GPUs. The 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 234-243, 2018.

Key members of ThunderGBM

  • Zeyi Wen, NUS
  • Jiashuai Shi, SCUT (a visiting student at NUS)
  • Qinbin Li, NUS
  • Advisor: Bingsheng He, NUS
  • Collaborators: Jian Chen (SCUT), Kotagiri Ramamohanarao (The University of Melbourne)

Other information

  • This work is supported by a MoE AcRF Tier 2 grant (MOE2017-T2-1-122) and an NUS startup grant in Singapore.

Related libraries

  • ThunderSVM: a fast SVM library on GPUs and CPUs.