# Tune Learning Rate and Number of Trees
A problem with gradient boosted decision trees is that they learn quickly and tend to overfit the training data. One effective way to slow down learning in a gradient boosting model is to use a learning rate, also called shrinkage (or eta in the XGBoost documentation).

### Slow Learning in Gradient Boosting 

Gradient boosting involves creating and adding trees to the model sequentially. New trees are created to correct the residual errors in the predictions from the existing sequence of trees. The effect is that the model can quickly fit, then overfit the training dataset. A technique to slow down the learning in the gradient boosting model is to apply a weighting factor for the corrections by new trees when added to the model. This weighting is called the shrinkage factor or the learning rate, depending on the literature or the tool.

Naive gradient boosting is the same as gradient boosting with shrinkage where the shrinkage factor is set to 1.0. Setting values less than 1.0 has the effect of making smaller corrections for each tree added to the model. This in turn means that more trees must be added to the model to reach the same fit. It is common to use small values in the range of 0.1 to 0.3, as well as values less than 0.1.
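The weighting described above can be illustrated with a toy example. The following is a minimal sketch, not XGBoost's implementation: it assumes squared-error loss, one-dimensional inputs, and single-split trees (stumps). The helper names `fit_stump` and `boost` and the sine-wave data are hypothetical and exist only for illustration.

```python
import math

def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on the residuals."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            threshold = (xs[order[k - 1]] + xs[order[k]]) / 2.0
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, learning_rate, n_trees):
    """Add stumps sequentially; each one corrects the current residuals,
    scaled by learning_rate. Returns the training MSE after each stump."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    history = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # Shrinkage: only a fraction of the stump's correction is applied.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return history

# Toy data (an assumption for illustration): 50 points on a sine wave.
xs = [i * 6.0 / 49 for i in range(50)]
ys = [math.sin(x) for x in xs]

for eta in (1.0, 0.3, 0.1):
    mse = boost(xs, ys, eta, n_trees=20)
    print("learning_rate=%.1f  training MSE after 20 trees: %.5f" % (eta, mse[-1]))
```

With the same number of trees, the run with a learning rate of 1.0 fits the training data more closely than the run with 0.1, which is why smaller learning rates are usually paired with more trees.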

### Tuning Learning Rate

When creating gradient boosting models with XGBoost using the scikit-learn wrapper, the `learning_rate` parameter can be set to control the weighting of new trees added to the model. We can use the grid search capability in scikit-learn to evaluate the effect on logarithmic loss of training a gradient boosting model with different learning rate values.
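For readers unfamiliar with the metric, here is a minimal sketch of how multiclass logarithmic loss can be computed by hand. The function `log_loss` and the probability tables are hypothetical illustrations, not part of the dataset or of scikit-learn. Note that scikit-learn's `'neg_log_loss'` scorer reports the negated value so that larger scores are always better.

```python
import math

def log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability that the model assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probabilities):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)

# Three samples, three classes; each row holds predicted class probabilities.
y_true = [0, 2, 1]
confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]]
hesitant = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4], [0.3, 0.4, 0.3]]

print("confident model log loss:", log_loss(y_true, confident))
print("hesitant model log loss: ", log_loss(y_true, hesitant))
# The same quantity with its sign flipped, as a grid search score would report it.
print("confident model, negated:", -log_loss(y_true, confident))
```

The model that puts more probability on the correct class gets the lower log loss, and therefore the negated score closer to zero.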

In [None]:
# Load libraries
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; plots are saved to file
from matplotlib import pyplot as pt

# Load data, split into features (X) and target (Y), and encode the target as integers
data = read_csv('train.csv')
X = data.values[:,0:94]
Y = data.values[:,94]
encoded_y = LabelEncoder().fit_transform(Y)

# Grid search over learning_rate with 10-fold stratified cross-validation
model = XGBClassifier(n_jobs=-1)  # n_jobs replaces the older nthread argument
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]
grid_params = dict(learning_rate=learning_rate)
grid_search = GridSearchCV(model, grid_params, scoring='neg_log_loss', cv=kfold)
grid_results = grid_search.fit(X, encoded_y)

# Summarize results (scores are negated log loss, so values closer to 0 are better)
print("Best: %f using %s" % (grid_results.best_score_, grid_results.best_params_))
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) using %r" % (mean, stdev, param))

In [None]:
# Plot
pt.errorbar(learning_rate, means, yerr=stds)
pt.title("XGBoost learning_rate vs Log Loss")
pt.xlabel("learning_rate")
pt.ylabel("Log Loss")
pt.savefig("learning_rate.png")


### Tuning Learning Rate and Number of Trees
Smaller learning rates generally require more trees to be added to the model. We can explore this relationship by evaluating a grid of parameter pairs. The number of decision trees is varied from 50 to 250 and the learning rate is varied on a log10 scale from 0.0001 to 0.1.
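To make the size of this search concrete, the sketch below enumerates the parameter pairs using only the standard library, with the same candidate values as the grid search code in this section. It assumes the ordering that scikit-learn's `ParameterGrid` uses (keys sorted alphabetically, with the last key varying fastest); the variable names `combos` and `total_fits` are hypothetical.

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 150, 200, 250],
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
}
n_folds = 10

# Sort the keys, then take the Cartesian product of their candidate values.
keys = sorted(param_grid)
combos = [dict(zip(keys, values))
          for values in product(*(param_grid[k] for k in keys))]

# Every pair is trained once per cross-validation fold.
total_fits = len(combos) * n_folds
print(len(combos), "parameter pairs,", total_fits, "cross-validation fits")
for combo in combos[:5]:
    print(combo)
```

Under that ordering, all `n_estimators` values for the first learning rate come first, then all values for the second, and so on. This is what allows the flat list of mean scores to be reshaped into a (learning rates × tree counts) array for plotting.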

In [2]:
# Load libraries
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; plots are saved to file
from matplotlib import pyplot as pt
import numpy

# Load data, split into features (X) and target (Y), and encode the target as integers
data = read_csv('train.csv')
X = data.values[:, 0:94]
Y = data.values[:,94]
encoded_y = LabelEncoder().fit_transform(Y)

# Grid search over every (n_estimators, learning_rate) pair
model = XGBClassifier(n_jobs=-1)  # n_jobs replaces the older nthread argument
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
n_estimators = [50, 150, 200, 250]
learning_rate = [0.0001, 0.001, 0.01, 0.1]
param_grid = dict(n_estimators=n_estimators, learning_rate=learning_rate)
grid_search = GridSearchCV(model, param_grid=param_grid, scoring='neg_log_loss', n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, encoded_y)

# Summarize results (scores are negated log loss, so values closer to 0 are better)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) using %r" % (mean, stdev, param))


In [None]:
# Plot results: one line per learning rate, score against number of trees
scores = numpy.array(means).reshape(len(learning_rate), len(n_estimators))
for i, value in enumerate(learning_rate):
    pt.plot(n_estimators, scores[i], label='learning_rate: ' + str(value))
pt.legend()
pt.xlabel('n_estimators')
pt.ylabel('Log Loss')
pt.savefig('n_estimators_vs_learning_rate.png')