
GB Classifier

Cheng Li edited this page Sep 29, 2017 · 8 revisions

Pyramid implements the gradient boosting algorithm with KL divergence/cross entropy loss for binary and multi-class classification tasks, as described in the paper:

Friedman, Jerome H. "Greedy function approximation: a gradient boosting machine." Annals of Statistics 29.5 (2001): 1189–1232.

In the literature, this algorithm goes by several names: gradient boosting (GB), gradient boosted decision trees (GBDT), and K-class logistic boost (LK Boost).

This tutorial provides a gentle introduction to gradient boosting.

For datasets where each instance has multiple matched classes/tags, gradient boosting can also be extended to perform multi-label classification through the CBM reduction method.
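To make the idea concrete, here is a toy sketch of the binary case (this is illustrative only, not Pyramid's implementation): each boosting round fits a small regression tree, here reduced to a one-split stump, to the negative gradient of the cross-entropy loss, and adds it to the additive model scaled by the shrinkage factor. All function names and the stump learner are invented for this sketch; Pyramid's actual trees, leaf-value fitting, and multi-class handling are more elaborate.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a single feature by choosing
    the threshold that minimizes squared error against the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]  # threshold, left value, right value

def fit_gb(xs, ys, num_iterations=100, shrinkage=0.1):
    """ys in {0, 1}; returns the list of fitted stumps (two-leaf trees)."""
    F = [0.0] * len(ys)  # current additive score per instance
    stumps = []
    for _ in range(num_iterations):
        # negative gradient of the cross-entropy loss w.r.t. the score F
        residuals = [y - sigmoid(f) for y, f in zip(ys, F)]
        t, lm, rm = fit_stump(xs, residuals)
        F = [f + shrinkage * (lm if x <= t else rm) for f, x in zip(F, xs)]
        stumps.append((t, lm, rm))
    return stumps

def predict_proba(stumps, x, shrinkage=0.1):
    """Probability of class 1 for a single feature value x."""
    f = 0.0
    for t, lm, rm in stumps:
        f += shrinkage * (lm if x <= t else rm)
    return sigmoid(f)
```

The `num_iterations` and `shrinkage` arguments play the same roles as the `train.numIterations` and `train.shrinkage` properties described below; a real implementation would also re-fit the leaf values with a Newton step rather than reuse the regression means.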

Usage

To run the gradient boosting classifier, type

./pyramid config/gb_classifier.properties

where the gb_classifier.properties file specifies all the algorithm parameters, as explained below.

Program Properties

The properties file (a plain text file with one key-value pair per line) specifies all input, output and hyper parameters required by the program. A sample properties file is shown below. The same file can also be found in the config folder associated with the code release. You can modify this file to set up the correct dataset paths on your computer and experiment with different model parameters.

# path to input train data
input.trainData=/Users/chengli/Datasets/mnist/train

# path to input test data
input.testData=/Users/chengli/Datasets/mnist/test

# matrix format; can be dense or sparse
# use the sparse format for text data with a large number of sparse features to save memory
input.matrixType=dense

# folder for the program output
output.folder=/Users/chengli/tmp/out

# train the model on the training set
train=true

# test the model on the test set
test=true

# number of training iterations
train.numIterations=100

# number of leaves in each tree
train.numLeaves=10

# shrinkage parameter applied to each tree; 
# 0.1 is usually good
train.shrinkage=0.1

# whether to show training performance during training
train.showTrainProgress=true

# whether to show test performance during training
train.showTestProgress=true

# show progress every k iterations
train.showProgress.interval=1

# the internal Java class name for this application. 
# users do not need to modify this.
pyramid.class=GBClassifier

Sample Datasets

Some sample classification datasets can be downloaded here.

Source Code

The source code files related to gradient boosting classifier can be found here and here.
