In [None]:
%matplotlib inline


# Early stopping of Gradient Boosting

Gradient boosting is an ensemble technique that iteratively combines several
weak learners (regression trees) into a single powerful model.

Early stopping support in Gradient Boosting lets us find the smallest number
of iterations sufficient to build a model that generalizes well to unseen
data.

The concept of early stopping is simple. We specify a ``validation_fraction``,
the fraction of the training data that is set aside from training to assess the
validation loss of the model. The gradient boosting model is trained on the
remaining training data and evaluated on this validation set. Each time a new
stage (regression tree) is added, the model is scored on the validation set.
This continues until the scores over the last ``n_iter_no_change`` stages fail
to improve by at least ``tol``. At that point the model is considered to have
converged, and the addition of further stages is "stopped early". Early
stopping is only active when ``n_iter_no_change`` is set; its default of
``None`` disables it.
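
The stopping rule described above can be sketched in plain Python. This is a simplified patience-style illustration, not scikit-learn's internal implementation; the function name `stages_before_stop` and the list of validation losses are hypothetical and exist only for this example.

```python
def stages_before_stop(val_losses, n_iter_no_change=10, tol=1e-4):
    """Return how many stages are kept under a simple patience rule.

    val_losses: validation loss after each added stage (hypothetical input).
    A stage counts as an improvement only if it beats the best loss so far
    by more than tol.
    """
    best = float("inf")
    stale = 0  # consecutive stages without sufficient improvement
    for stage, loss in enumerate(val_losses, start=1):
        if loss < best - tol:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= n_iter_no_change:
                return stage  # training would stop here
    return len(val_losses)  # never triggered: all stages are kept


# The loss plateaus at 0.4, so with a patience of 3 training stops at stage 6.
losses = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
print(stages_before_stop(losses, n_iter_no_change=3))
```

A larger ``n_iter_no_change`` tolerates longer plateaus before stopping, while a larger ``tol`` demands a bigger gain for a stage to count as an improvement.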

The number of stages in the final model is available in the attribute
``n_estimators_``.
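
As a small illustration of this attribute, the sketch below fits a classifier on a synthetic dataset and compares the requested upper bound ``n_estimators`` with the fitted value ``n_estimators_``. The synthetic data from `make_classification` and all parameter values are assumptions chosen to keep the example quick; they are not part of the digits experiment in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic problem (an assumption for illustration; not the digits data).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=200,         # upper bound on the number of stages
    validation_fraction=0.2,  # share of the training data held out for validation
    n_iter_no_change=5,       # setting this enables early stopping
    tol=1e-4,
    random_state=0,
)
clf.fit(X_demo, y_demo)

print("Requested upper bound:", clf.n_estimators)
print("Stages actually fitted:", clf.n_estimators_)
print("Shape of fitted tree array:", clf.estimators_.shape)
```

The first dimension of ``estimators_`` matches ``n_estimators_``, so the stages that were never fitted are not stored in the model.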

This example illustrates how early stopping can be used in the
:class:`~sklearn.ensemble.GradientBoostingClassifier` model to achieve
almost the same accuracy as a model built without early stopping, while
using far fewer estimators. This can significantly reduce training time,
memory usage and prediction latency.


In [1]:

import time

import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split




- Load digits data set using `load_digits()`
- Train `GradientBoostingClassifier` with and without early stopping
- Keep a timer for both cases and report the time it takes to train both models
- Report the scores for both models

In [None]:
from sklearn.metrics import accuracy_score, roc_auc_score

# Load digits dataset
digits = load_digits()
X = digits.data
y = digits.target

# Convert to a binary problem (digit 1 vs. the rest) so ROC AUC applies directly
y = (y == 1).astype(int)  # digit 1 is the positive class

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)


In [3]:
# Initialize the classifier without early stopping
gb_clf_no_early_stopping = GradientBoostingClassifier(n_estimators=1000, random_state=42)

# Measure the time taken to train the model without early stopping
start_time = time.perf_counter()  # monotonic clock intended for timing intervals
gb_clf_no_early_stopping.fit(X_train, y_train)
time_no_early_stopping = time.perf_counter() - start_time

# Predict and calculate the score
y_proba_no_early_stopping = gb_clf_no_early_stopping.predict_proba(X_test)[:, 1]
roc_auc_no_early_stopping = roc_auc_score(y_test, y_proba_no_early_stopping)
accuracy_no_early_stopping = accuracy_score(y_test, gb_clf_no_early_stopping.predict(X_test))


In [4]:
# Initialize the classifier with early stopping
gb_clf_early_stopping = GradientBoostingClassifier(
    n_estimators=1000,
    validation_fraction=0.1,
    n_iter_no_change=10,
    tol=1e-4,
    random_state=42
)

# Measure the time taken to train the model with early stopping
start_time = time.perf_counter()  # monotonic clock intended for timing intervals
gb_clf_early_stopping.fit(X_train, y_train)
time_early_stopping = time.perf_counter() - start_time

# Predict and calculate the score
y_proba_early_stopping = gb_clf_early_stopping.predict_proba(X_test)[:, 1]
roc_auc_early_stopping = roc_auc_score(y_test, y_proba_early_stopping)
accuracy_early_stopping = accuracy_score(y_test, gb_clf_early_stopping.predict(X_test))

# Number of boosting stages actually fitted before early stopping triggered
n_estimators_used = gb_clf_early_stopping.n_estimators_


In [5]:
print(f'Training time without early stopping: {time_no_early_stopping:.2f} seconds')
print(f'ROC AUC score without early stopping: {roc_auc_no_early_stopping:.2f}')
print(f'Accuracy without early stopping: {accuracy_no_early_stopping:.2f}')

print(f'Training time with early stopping: {time_early_stopping:.2f} seconds')
print(f'ROC AUC score with early stopping: {roc_auc_early_stopping:.2f}')
print(f'Accuracy with early stopping: {accuracy_early_stopping:.2f}')
print(f'Number of estimators used with early stopping: {n_estimators_used}')


Training time without early stopping: 4.38 seconds
ROC AUC score without early stopping: 1.00
Accuracy without early stopping: 0.99
Training time with early stopping: 0.63 seconds
ROC AUC score with early stopping: 1.00
Accuracy with early stopping: 0.99
Number of estimators used with early stopping: 125
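
To put the output above in perspective, the short calculation below derives the training speedup and the fraction of stages kept. The numbers are copied from this single run; timings vary by machine and run, so the ratio is only indicative. The variable names are introduced here for illustration.

```python
# Figures copied from the output above (a single run; timings will vary).
time_full, time_early = 4.38, 0.63     # training time in seconds
stages_full, stages_early = 1000, 125  # boosting stages fitted

speedup = time_full / time_early
stage_fraction = stages_early / stages_full

print(f"Training speedup: {speedup:.1f}x")
print(f"Fraction of stages kept: {stage_fraction:.1%}")
```

In this run the early-stopped model keeps one eighth of the stages and trains roughly seven times faster, while the reported ROC AUC and accuracy are unchanged at two decimal places.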
