# Configuration advice from R

The gradient boosting algorithm is implemented in R as the gbm package. Reviewing the package documentation, the gbm() function specifies sensible defaults:

- n.trees = 100 (number of trees).
- interaction.depth = 1 (maximum depth of each tree; a value of 1 gives decision stumps).
- n.minobsinnode = 10 (minimum number of samples in tree terminal nodes). 
- shrinkage = 0.001 (learning rate).

# Configuration advice from SKLearn

The scikit-learn Python library implements gradient boosting for classification as the GradientBoostingClassifier class and for regression as the GradientBoostingRegressor class. It is useful to review the default configuration for the algorithm in this library. There are many parameters, but below are a few key defaults:

- learning_rate = 0.1 (shrinkage).
- n_estimators = 100 (number of trees).
- max_depth = 3.
- min_samples_split = 2.
- min_samples_leaf = 1.
- subsample = 1.0.
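
The defaults above can be checked against an installed copy of scikit-learn. The following is a minimal sketch, assuming scikit-learn is available; the variable names `model`, `keys`, and `defaults` are illustrative only.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Construct the estimator with no arguments so every parameter takes its default.
model = GradientBoostingClassifier()
params = model.get_params()

# Keep only the parameters discussed in the list above.
keys = [
    "learning_rate",
    "n_estimators",
    "max_depth",
    "min_samples_split",
    "min_samples_leaf",
    "subsample",
]
defaults = {name: params[name] for name in keys}

for name, value in defaults.items():
    print(f"{name} = {value}")
```

The same check works for GradientBoostingRegressor, since both classes expose `get_params()`.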

# Configuration advice from XGBoost

The XGBoost library is dedicated to the gradient boosting algorithm. It too specifies default parameters that are worth noting. First, the defaults listed on the XGBoost Parameters page:

- eta = 0.3 (a.k.a. learning rate).
- max_depth = 6.
- subsample = 1.

A second set of defaults uses the scikit-learn-style parameter names of the XGBoost Python API:

- max_depth = 3.
- learning_rate = 0.1.
- n_estimators = 100.
- subsample = 1.

## Owen Zhang's configuration tips:

- Target 500-to-1000 trees and then tune the learning rate (n_estimators).
- Set the number of samples in the leaf nodes to enough observations needed to make a good mean estimate (min_child_weight).
- Configure the interaction depth to about 10 or more (max_depth).
- Number of Trees (n_estimators) set to a fixed value between 100 and 1000, depending on the dataset size.
- Learning Rate (learning_rate) simplified to the ratio [2 to 10] / trees, depending on the number of trees.
- Row Sampling (subsample) grid searched values in the range [0.5, 0.75, 1.0].
- Column Sampling (colsample_bytree and maybe colsample_bylevel) grid searched values in the range [0.4, 0.6, 0.8, 1.0].
- Min Leaf Weight (min_child_weight) simplified to the ratio 3 / rare_events, where rare_events is the percentage of rare event observations in the dataset.
- Tree Size (max_depth) grid searched values in the range [4, 6, 8, 10].
- Min Split Gain (gamma) fixed with a value of zero.
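
The ratio-based rules in this list can be sketched as a small helper. This is an illustration, not code from the original slides: the function name `zhang_starting_point` and the dictionary layout are hypothetical, and `rare_event_pct` is assumed to be given as a percentage (5.0 for 5%), since the notes do not say whether a fraction is intended.

```python
def zhang_starting_point(n_trees, rare_event_pct):
    """Turn the rules of thumb above into a starting configuration.

    n_trees: fixed number of trees (the notes suggest 100 to 1000).
    rare_event_pct: rare-event share of the data, assumed to be a percentage.
    """
    if n_trees <= 0 or rare_event_pct <= 0:
        raise ValueError("n_trees and rare_event_pct must be positive")
    return {
        "n_estimators": n_trees,
        # Learning rate somewhere between 2 / trees and 10 / trees.
        "learning_rate_range": (2.0 / n_trees, 10.0 / n_trees),
        # Min leaf weight from the ratio 3 / rare_events.
        "min_child_weight": 3.0 / rare_event_pct,
        # Min split gain fixed at zero.
        "gamma": 0.0,
        # Values to grid search.
        "grid": {
            "subsample": [0.5, 0.75, 1.0],
            "colsample_bytree": [0.4, 0.6, 0.8, 1.0],
            "max_depth": [4, 6, 8, 10],
        },
    }


config = zhang_starting_point(n_trees=500, rare_event_pct=5.0)
print(config["learning_rate_range"])
print(config["min_child_weight"])
```

Fixing the number of trees first makes the learning rate the only value that depends on it, which keeps the remaining grid small.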

### Differences that may be relevant

- Number of Trees and Learning Rate: Fix the number of trees at around 100 (rather than 1000) and then tune the learning rate.
- Max Tree Depth: Start with a value of 6 and presumably tune from there.
- Min Leaf Weight: Use a modified ratio of 1 / sqrt(rare_events), where rare_events is the percentage of rare event observations in the dataset.
- Column Sampling: Grid search values in the range of 0.3 to 0.5 (more constrained).
- Row Sampling: Fixed at the value 1.0.
- Min Split Gain: Fixed at the value 0.0.
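
As a rough sketch, these alternative settings can be collected into a single starting configuration. The function name `alternative_starting_point` and the dictionary keys are hypothetical, and `rare_event_pct` is again assumed to be a percentage (4.0 for 4%).

```python
import math


def alternative_starting_point(rare_event_pct, n_trees=100):
    """Collect the alternative rules of thumb into one dictionary."""
    if rare_event_pct <= 0:
        raise ValueError("rare_event_pct must be positive")
    return {
        # Around 100 trees, with the learning rate tuned afterwards.
        "n_estimators": n_trees,
        # Start at depth 6 and tune from there.
        "max_depth": 6,
        # Modified ratio: 1 / sqrt(rare_events).
        "min_child_weight": 1.0 / math.sqrt(rare_event_pct),
        # Row sampling and min split gain are fixed.
        "subsample": 1.0,
        "gamma": 0.0,
        # Column sampling is searched within this narrower interval.
        "colsample_bytree_range": (0.3, 0.5),
    }


config = alternative_starting_point(rare_event_pct=4.0)
print(config["min_child_weight"])
```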

## Abhishek Thakur's tips:

The tuning ranges for each parameter are much the same, with some notable differences. Specifically, he suggests grid searching values for the Min Split Gain (gamma) and the regularization penalty terms (reg_alpha and reg_lambda). He also explores large tree sizes (max_depth values above 10) as well as fixed Min Leaf Weight (min_child_weight) values in the range of about 1 to 10.
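
A grid over these parameters can be enumerated with the standard library. This is a sketch only: the notes give no candidate values for gamma, reg_alpha, or reg_lambda, so the values below are hypothetical placeholders, and the max_depth and min_child_weight lists merely follow the ranges described above.

```python
from itertools import product

# Candidate values per parameter; the first three lists are hypothetical.
grid = {
    "gamma": [0.0, 0.1, 0.5],
    "reg_alpha": [0.0, 0.1, 1.0],
    "reg_lambda": [0.1, 1.0, 10.0],
    # Tree sizes of 10 and above.
    "max_depth": [10, 12, 14],
    # Fixed min leaf weights between about 1 and 10.
    "min_child_weight": [1, 5, 10],
}

names = list(grid)
# One dictionary per combination of candidate values.
candidates = [
    dict(zip(names, combo))
    for combo in product(*(grid[name] for name in names))
]

print(len(candidates))
print(candidates[0])
```

Each dictionary in `candidates` could then be passed as keyword arguments to a model constructor inside a cross-validation loop. The grid grows multiplicatively, so each list should stay short.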