ThunderGBM How To

This page gives the key instructions for installing, using and contributing to ThunderGBM. Everyone in the community is welcome to contribute to ThunderGBM to make it better.

How to install ThunderGBM

First of all, you need to install the prerequisite libraries and tools. Then you can download and install ThunderGBM.

Prerequisites

  • cmake 2.8 or above (3.4 or above on Windows)
  • Linux: gcc 4.8 or above, with CUDA 8 or above
  • Windows: Visual C++, with CUDA 10

Download

git clone https://github.com/zeyiwen/thundergbm.git
cd thundergbm
#under the directory of thundergbm
git submodule init cub && git submodule update

Build on Linux

#under the directory of "thundergbm"
mkdir build && cd build && cmake .. && make -j

Quick Start

./bin/thundergbm-train ../dataset/machine.conf
./bin/thundergbm-predict ../dataset/machine.conf

After a successful run, the output reports RMSE = 0.489562.
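For readers unfamiliar with the metric, RMSE (root mean squared error) is the square root of the mean squared difference between the true labels and the predictions. The sketch below only illustrates the formula; the function name and the sample values are hypothetical and are not taken from ThunderGBM or from the machine dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels and predictions, for illustration only.
labels = [3.0, 5.0, 2.0]
preds = [2.0, 5.0, 4.0]
value = rmse(labels, preds)  # sqrt((1 + 0 + 4) / 3)
print(value)
```

A lower RMSE means the predictions are closer to the labels on average.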

Build on Windows

You can build the ThunderGBM library as follows:

cd thundergbm
mkdir build
cd build
cmake .. -DCMAKE_WINDOWS_EXPORT_ALL_SYMBOLS=TRUE -DBUILD_SHARED_LIBS=TRUE -G "Visual Studio 15 2017 Win64"

If you are using a different version of Visual Studio, change the generator name accordingly. Visual Studio can be downloaded from this link. The above commands generate Visual Studio project files; open the generated project in Visual Studio to build ThunderGBM. Please note that CMake must be 3.4 or above on Windows.

How to use ThunderGBM using command line

First, install ThunderGBM by following the instructions above. Then run the demo with the following commands.

./bin/thundergbm-train ../dataset/machine.conf
./bin/thundergbm-predict ../dataset/machine.conf

If you would like to know more about the options of the binary, use the -help option as follows.

./bin/thundergbm-train -help

In ThunderGBM, the command line options can be added in the machine.conf file under the dataset folder. All the options are listed in the Parameters page.

How to improve the documentation

Most of the documents can be viewed on GitHub. They can also be viewed on Read the Docs. The HTML files of our documents are generated by Sphinx, and the source files are written in Markdown. In the following, we describe how to set up the Sphinx environment.

  • Install sphinx
pip install sphinx
  • Install the Markdown parser
pip install recommonmark

Note that recommonmark has a bug when working with Sphinx on some platforms, so you may need to patch transform.py yourself to fix the problem. Instructions for the patch can be found in this link.

  • Install Sphinx theme
pip install sphinx_rtd_theme
  • Generate HTML

    Go to the "docs" directory of ThunderGBM and run:

make html

After this step, the HTML documentation of ThunderGBM has been generated on your machine, and you can open it locally to check the result.

Contribute to ThunderGBM

You need to fetch the latest version of ThunderGBM before submitting a pull request.

git remote add upstream https://github.com/Xtra-Computing/thundergbm.git
git fetch upstream
git rebase upstream/master

How to build tests for ThunderGBM

For building test cases, you also need to obtain googletest using the following command.

#under the thundergbm directory
git submodule update --init src/test/googletest

After obtaining the googletest submodule, you can build the test cases by the following commands.

cd thundergbm
mkdir build && cd build && cmake -DBUILD_TESTS=ON .. && make -j

How to use ThunderGBM for ranking

There are two key steps to use ThunderGBM for ranking.

  • First, set the objective of ThunderGBM to either rank:pairwise or rank:ndcg.
  • Second, provide a file called [train_file_name].group that specifies the number of instances in each query.

The remaining part is the same as classification and regression. Please refer to Parameters for more information about setting the parameters.
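The group file can be produced from per-instance query ids. The following is a minimal sketch, assuming that the group file holds one instance count per line, that the counts follow the order of the queries in the training file, and that instances of the same query are stored contiguously. The helper name write_group_file, the file name train.txt.group and the query ids are hypothetical; check the Parameters page for the exact layout ThunderGBM expects.

```python
import tempfile
from itertools import groupby
from pathlib import Path


def write_group_file(query_ids, group_path):
    """Write one line per query, holding that query's number of instances.

    Assumes instances belonging to the same query are contiguous and that
    query_ids follows the order of the instances in the training file.
    """
    counts = [sum(1 for _ in run) for _, run in groupby(query_ids)]
    Path(group_path).write_text("".join(f"{count}\n" for count in counts))
    return counts


# Hypothetical query ids, one per training instance, in file order.
query_ids = [1, 1, 1, 2, 2, 3, 3, 3, 3]

with tempfile.TemporaryDirectory() as tmp:
    # For a training file named train.txt, the group file is train.txt.group.
    group_path = Path(tmp) / "train.txt.group"
    counts = write_group_file(query_ids, group_path)
    content = group_path.read_text()

print(counts)  # [3, 2, 4]
```

Here the three queries contain 3, 2 and 4 instances, so the group file has three lines. In practice the file would be written next to the training file rather than into a temporary directory.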

How to build the Python wheel file for Linux

Make sure your local repository is up to date with the latest version.

  • Clone ThunderGBM repository
git clone https://github.com/zeyiwen/thundergbm.git
cd thundergbm
#under the directory of thundergbm
git submodule init cub && git submodule update
  • Build the binary
mkdir build && cd build && cmake .. && make -j
  • Build the Python wheel file
    • change to the python directory with cd ../python
    • update the version you are going to release in setup.py
    • if needed, install the wheel dependency with pip3 install wheel
python3 setup.py bdist_wheel

How to build the Python wheel file for Windows

Make sure your local repository is up to date with the latest version.

  • Requirements
    • Visual Studio
    • CUDA 10.0 or above
    • python3.x
  • Clone ThunderGBM repository
git clone https://github.com/zeyiwen/thundergbm.git
cd thundergbm
#under the directory of thundergbm
git submodule init && git submodule update
  • Run CMake from the Visual Studio Developer Command Prompt
mkdir build && cd build
cmake .. -DCMAKE_WINDOWS_EXPORT_ALL_SYMBOLS=TRUE -DBUILD_SHARED_LIBS=TRUE -G "Visual Studio 15 2017 Win64"

If you are using a different version of Visual Studio, change the generator name accordingly.

  • Build the binary using Visual Studio
    • Open 'thundergbm/build/thundergbm.sln' with Visual Studio
    • Click 'Build all' in Visual Studio
  • Build the Python wheel file
    • change to the python directory with cd ../python
    • update the version you are going to release in setup.py
    • if needed, install the wheel dependency with pip3 install wheel
python3 setup.py bdist_wheel
  • Upload the wheel file to PyPI (pypi.org)
    • if needed, install the twine dependency with pip3 install twine
    • if twine is not on your PATH, use python3 -m twine upload dist/* instead
twine upload dist/* --verbose
  • [Recommended] Draft a new release on the Releases page
    • state the bugs fixed or the new features.