# Tuning Number/Size of Decision Trees
Gradient boosting builds decision trees sequentially and adds them to the model, each attempting to correct the mistakes of the learners that came before it. This raises two questions: how many trees (weak learners or estimators) to configure in your gradient boosting model, and how big each tree should be.

## Tuning Number of Trees

Most implementations of gradient boosting are configured by default with a relatively small number of trees, in the hundreds or thousands. On most problems, adding more trees beyond a certain limit does not improve the performance of the model. This follows from the way the boosted model is constructed: trees are added sequentially, and each new tree attempts to model and correct the errors made by the sequence of previous trees. As the remaining errors shrink, the model quickly reaches a point of diminishing returns.
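The diminishing returns described above can be seen in a toy setting. The sketch below is not XGBoost: it is a minimal pure-Python boosting loop, assuming squared error, one-split trees (stumps), a fixed learning rate, and a made-up 1-D dataset. The names `fit_stump`, `boost`, and `mse_history` are hypothetical and exist only for this illustration.

```python
import math

def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return thr, lmean, rmean

def boost(xs, ys, n_rounds, learning_rate=0.3):
    """Add one stump per round, each fit to the errors left by the previous rounds.

    Returns the training mean squared error after each round.
    """
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the mean prediction
    history = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lmean, rmean = fit_stump(xs, residuals)
        preds = [p + learning_rate * (lmean if x <= thr else rmean)
                 for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return history

# Toy data (an assumption for illustration): 60 points on a sine curve
xs = [i / 10.0 for i in range(60)]
ys = [math.sin(x) for x in xs]

mse_history = boost(xs, ys, n_rounds=200)
for n in (1, 10, 50, 100, 200):
    print("rounds=%d training MSE=%.5f" % (n, mse_history[n - 1]))
```

In this sketch most of the error reduction happens in the early rounds, and later rounds add progressively less. Note that it tracks training error only; on held-out data, extra rounds can also start to hurt.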

The number of trees (or rounds) in an XGBoost model is specified to the XGBClassifier or XGBRegressor class in the n_estimators argument. The default in the XGBoost library is 100.

### Tuning n_estimators with the Otto Dataset

In [2]:
#Load Library
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as pt

In [3]:
#Load Data and Split X/Y; encode target class
data = read_csv('train.csv')
X = data.values[:,0:94]
Y = data.values[:, 94]
encoded_y = LabelEncoder().fit_transform(Y)

In [None]:
# Grid Search
model = XGBClassifier()
n_estimators = range(50, 400, 50)
param_grid = dict(n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits = 10, shuffle=True, random_state = 7)
grid_search = GridSearchCV(model, param_grid, scoring = "neg_log_loss", n_jobs = -1, cv = kfold)
grid_results = grid_search.fit(X, encoded_y)

#Summarize Results
# Scores are negated log loss, so the best score is the one closest to zero
print("Best: %f using %s" % (grid_results.best_score_, grid_results.best_params_))
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
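The scores reported by the grid search are negative because the "neg_log_loss" scorer negates the log loss so that larger is better. The following is a minimal pure-Python sketch of that convention, not scikit-learn's implementation; the function `log_loss` and the example probabilities are made up for illustration.

```python
import math

def log_loss(y_true, probas, eps=1e-15):
    """Mean multi-class log loss.

    y_true holds class indices; probas holds one row of class
    probabilities per sample. Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for label, row in zip(y_true, probas):
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)

# Hypothetical predictions from two models on three samples
y_true = [0, 2, 1]
confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
uncertain = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4], [0.3, 0.4, 0.3]]

# Negate the loss so that "higher is better", as a grid search expects
scores = {
    "confident": -log_loss(y_true, confident),
    "uncertain": -log_loss(y_true, uncertain),
}
best = max(scores, key=scores.get)
print(scores)
print("best:", best)
```

Picking the maximum of the negated values is the same as picking the minimum log loss, which is why the best configuration is the one whose score is closest to zero.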

In [5]:
#Plot
pt.errorbar(n_estimators, means, yerr=stds)
pt.title("XGBoost n_estimators vs. Log Loss")
pt.xlabel('n_estimators')
pt.ylabel('Log Loss')
pt.savefig('n_estimators.png')


## Tuning Size of Trees

In gradient boosting, we can control the size of the decision trees, also called the number of layers or the depth. Shallow trees are expected to perform poorly on their own because they capture few details of the problem; they are generally referred to as weak learners. Deeper trees generally capture too many details of the problem and overfit the training dataset, limiting their ability to make good predictions on new data. Boosting algorithms are therefore generally configured with weak learners: decision trees with few layers, sometimes as simple as a single split at the root node, which is called a decision stump rather than a decision tree. In XGBoost, the depth of each tree is set with the max_depth argument.
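To give a feel for how quickly tree size grows with depth, here is a small sketch. It assumes binary trees in which depth counts the levels of splits, so that a depth of 1 is a decision stump; the helper names `max_leaves` and `max_splits` are hypothetical, and real trees are often smaller than these upper bounds because not every node is split.

```python
def max_leaves(depth):
    """Upper bound on the number of leaves in a binary tree with `depth` levels of splits."""
    return 2 ** depth

def max_splits(depth):
    """Upper bound on the number of internal (split) nodes in the same tree."""
    return 2 ** depth - 1

# The same depth values as range(1, 11, 2)
for depth in range(1, 11, 2):
    print("depth=%d max_splits=%d max_leaves=%d"
          % (depth, max_splits(depth), max_leaves(depth)))
```

Each extra layer can double the number of leaves, which is one way to see why deep trees can fit fine details of the training data while a stump can only separate it into two groups.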

In [None]:
# Load Libraries
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as pt

#Load Data and split X/Y; encode Target Class
data = read_csv('train.csv')
X = data.values[:,0:94]
Y = data.values[:, 94]
encoded_y = LabelEncoder().fit_transform(Y)

#Grid Search
model = XGBClassifier()
kfold = StratifiedKFold(n_splits = 10, shuffle=True, random_state=7)
max_depth = range(1,11,2)
param_grid = dict(max_depth=max_depth)
grid_search = GridSearchCV(model, param_grid, scoring='neg_log_loss', n_jobs=1, cv=kfold, verbose=1)
grid_results = grid_search.fit(X, encoded_y)

#Summarize Results
print("Best: %f using %s" %(grid_results.best_score_, grid_results.best_params_))
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with %r" % (mean, stdev, param))

In [None]:
#Plot
pt.errorbar(max_depth, means, yerr=stds)
pt.title("XGBoost max_depth vs. Log Loss")
pt.xlabel('max_depth')
pt.ylabel('Log Loss')
pt.savefig('max_depth.png')

## Tuning Number and Size 

In [None]:
#Load Libraries
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as pt

#Load Data and Split X/Y; encode target Variable
data = read_csv('train.csv')
X = data.values[:,0:94]
Y = data.values[:, 94]
encoded_y = LabelEncoder().fit_transform(Y)

#Grid Search
model = XGBClassifier()
n_estimators = [50,100,150,200]
max_depth = [2,4,6,8]
param_grid = dict(max_depth = max_depth, n_estimators = n_estimators)
kfold = StratifiedKFold(n_splits = 10, shuffle = True, random_state = 7)

grid_search = GridSearchCV(model, param_grid, scoring='neg_log_loss', n_jobs = -1, cv=kfold, verbose = 1)
grid_results = grid_search.fit(X, encoded_y)

#Summarize Results
print("Best: %f using %s" % (grid_results.best_score_, grid_results.best_params_))
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with %r" % (mean, stdev, param))
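Searching over two parameters at once multiplies the amount of work, since every combination is evaluated on every fold. The sketch below counts that work for the grid used above using only the standard library; it assumes exhaustive search with 10-fold cross-validation and does not count any final refit on the full dataset. The variable names `combinations`, `cv_fits`, and `total_rounds` are illustrative.

```python
from itertools import product

n_estimators = [50, 100, 150, 200]
max_depth = [2, 4, 6, 8]
n_splits = 10  # folds in the cross-validation

# Every pairing of max_depth and n_estimators is one candidate configuration
combinations = [
    dict(max_depth=d, n_estimators=n)
    for d, n in product(max_depth, n_estimators)
]

# Each candidate is fit once per fold
cv_fits = len(combinations) * n_splits

# Total boosting rounds trained across all candidates and folds
total_rounds = sum(c["n_estimators"] for c in combinations) * n_splits

print("candidates:", len(combinations))
print("cross-validation fits:", cv_fits)
print("boosting rounds trained:", total_rounds)
```

Counting the fits up front helps when deciding how fine a grid is affordable, because adding one more value to either list adds a full row or column of candidates.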