# Gradient boosting | Gradient Boosting Machine | GBM

**Gradient boosting** is an ensemble algorithm that builds decision trees sequentially, fitting each new tree to the gradient of the loss (the error) of the ensemble built so far. Many implementations of gradient boosting are available in Python.
We will train a classifier with each of them and compare their speed and accuracy.
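
To make the idea concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, where the negative gradient is simply the residual. It is a toy under stated assumptions: the weak learners are one-split stumps on a single feature, and the names `fit_stump`, `predict_stump`, `fit_gradient_boosting`, and `predict_gradient_boosting` are hypothetical helpers, not the API of any library compared in this notebook.

```python
import numpy as np


def fit_stump(x, residuals):
    """Least-squares decision stump on a 1-D feature: (threshold, left value, right value)."""
    best = None
    for threshold in np.unique(x)[:-1]:  # every split that leaves both sides non-empty
        mask = x <= threshold
        left, right = residuals[mask], residuals[~mask]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Toy gradient boosting for squared-error loss (assumed setup, for illustration only)."""
    base = float(y.mean())                   # initial constant prediction
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - prediction           # negative gradient of 0.5 * (y - F)^2 w.r.t. F
        stump = fit_stump(x, residuals)      # weak learner fitted to the gradient
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_gradient_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Synthetic 1-D regression problem, chosen only for illustration
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(0, 0.1, size=200)

model = fit_gradient_boosting(x, y)
mse_before = np.mean((y - y.mean()) ** 2)
mse_after = np.mean((y - predict_gradient_boosting(model, x)) ** 2)
print(f"Training MSE: {mse_before:.3f} -> {mse_after:.3f}")
```

Each round shrinks the training error because the stump is a least-squares fit to the current residuals. The libraries compared in this notebook use full trees, other loss functions, and many optimizations on top of this basic loop.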

In [4]:
# Import the required libraries

# Basics
import warnings
import numpy as np
import pandas as pd
from time import time
from IPython.core.debugger import set_trace
from sklearn.datasets import make_classification
warnings.filterwarnings('ignore')

# Generate an example dataset
X, y = make_classification(
    n_samples=100000, 
    n_features=20, 
    n_informative=15, 
    n_redundant=5, 
    random_state=0)

print(f'X shape = {X.shape}')
print(f'y shape = {y.shape}')

# Create empty dictionaries to store the results of each model
speed = {}
accuracy = {}

X shape = (100000, 20)
y shape = (100000,)


### 1) Gradient Boosting implementation with Scikit-Learn

In [5]:
# Scikit-Learn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold # Repeated Stratified K-Fold is a cross validator

In [6]:
model = GradientBoostingClassifier()

start = time()

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
score = cross_val_score(model, X, y, scoring="accuracy", cv=cv, n_jobs=-1)

speed["GradientBoosting"] = np.round(time() - start, 3)
accuracy["GradientBoosting"] = np.mean(score).round(3)

print(f"Mean Accuracy: {accuracy['GradientBoosting']}")
print(f"Std: {np.std(score):.3f}")  # standard deviation of the accuracy, not a time
print(f"Run time: {speed['GradientBoosting']}s")

Mean Accuracy: 0.894
Std: 0.003
Run time: 393.908s
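
The same cross-validation and timing steps are used for every library in this notebook. As a hedged sketch, that pattern could be wrapped in a small helper. `benchmark` is a hypothetical name that does not appear in the original notebook, and `perf_counter` is swapped in for `time` because it is better suited to measuring elapsed time. The run below uses a small dataset and few trees so that it finishes quickly. It does not reproduce the figures reported here.

```python
from time import perf_counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score


def benchmark(model, X, y, n_splits=5, n_repeats=2, n_jobs=-1):
    """Return (mean accuracy, std of accuracy, wall-clock seconds) for one model."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)
    start = perf_counter()
    scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv, n_jobs=n_jobs)
    elapsed = perf_counter() - start
    return float(np.mean(scores)), float(np.std(scores)), elapsed


# Small illustrative run (500 rows instead of the 100,000 used in the notebook)
X_small, y_small = make_classification(
    n_samples=500, n_features=20, n_informative=15, n_redundant=5, random_state=0
)
mean_acc, std_acc, seconds = benchmark(
    GradientBoostingClassifier(n_estimators=20, random_state=0),
    X_small,
    y_small,
    n_jobs=1,
)
print(f"Mean Accuracy: {mean_acc:.3f}")
print(f"Std: {std_acc:.3f}")
print(f"Run time: {seconds:.3f}s")
```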


##### Alternative: HistGradientBoostingClassifier

In [11]:
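# Only required for scikit-learn < 1.0, where this estimator was still experimental;
# from version 1.0 onward the import below is unnecessary
from sklearn.experimental import enable_hist_gradient_boosting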
from sklearn.ensemble import HistGradientBoostingClassifier

In [12]:
model = HistGradientBoostingClassifier()

start = time()

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
score = cross_val_score(model, X, y, scoring="accuracy", cv=cv, n_jobs=-1)

speed["HistGradientBoosting"] = np.round(time() - start, 3)
accuracy["HistGradientBoosting"] = np.mean(score).round(3)

print(f"Mean Accuracy: {accuracy['HistGradientBoosting']}")
print(f"Std: {np.std(score):.3f}")  # standard deviation of the accuracy, not a time
print(f"Run time: {speed['HistGradientBoosting']}s")

Mean Accuracy: 0.962
Std: 0.001
Run time: 29.388s


### 2) Gradient Boosting implementation with XGBoost

In [17]:
#!pip install xgboost
from xgboost import XGBClassifier

In [18]:
model = XGBClassifier()

start = time()

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
score = cross_val_score(model, X, y, scoring="accuracy", cv=cv, n_jobs=-1)

speed["XGB"] = np.round(time() - start, 3)
accuracy["XGB"] = np.mean(score).round(3)

print(f"Mean Accuracy: {accuracy['XGB']}")
print(f"Std: {np.std(score):.3f}")  # standard deviation of the accuracy, not a time
print(f"Run time: {speed['XGB']}s")

Mean Accuracy: 0.976
Std: 0.001
Run time: 273.073s


### 3) Gradient Boosting implementation with LightGBM

In [19]:
#!pip install lightgbm
from lightgbm import LGBMClassifier

In [20]:
model = LGBMClassifier()

start = time()

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
score = cross_val_score(model, X, y, scoring="accuracy", cv=cv, n_jobs=-1)

speed["LGBM"] = np.round(time() - start, 3)
accuracy["LGBM"] = np.mean(score).round(3)

print(f"Mean Accuracy: {accuracy['LGBM']}")
print(f"Std: {np.std(score):.3f}")  # standard deviation of the accuracy, not a time
print(f"Run time: {speed['LGBM']}s")

Mean Accuracy: 0.963
Std: 0.001
Run time: 31.802s


### 4) Gradient Boosting implementation with CatBoost

In [23]:
#!pip install catboost
from catboost import CatBoostClassifier

In [24]:
model = CatBoostClassifier()

start = time()

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
score = cross_val_score(model, X, y, scoring="accuracy", cv=cv, n_jobs=-1)

speed["CatBoost"] = np.round(time() - start, 3)
accuracy["CatBoost"] = np.mean(score).round(3)

print(f"Mean Accuracy: {accuracy['CatBoost']}")
print(f"Std: {np.std(score):.3f}")  # standard deviation of the accuracy, not a time
print(f"Run time: {speed['CatBoost']}s")

Mean Accuracy: 0.983
Std: 0.001
Run time: 285.221s


### Final comparison of the results for all the algorithms tested

In [25]:
print("Accuracy:")
{k: v for k, v in sorted(accuracy.items(), key=lambda i: i[1], reverse=True)}

Accuracy:


{'CatBoost': 0.983,
 'XGB': 0.976,
 'LGBM': 0.963,
 'HistGradientBoosting': 0.962,
 'GradientBoosting': 0.894}

In [26]:
print("Speed:")
{k: v for k, v in sorted(speed.items(), key=lambda i: i[1], reverse=False)}

Speed:


{'HistGradientBoosting': 29.388,
 'LGBM': 31.802,
 'XGB': 273.073,
 'CatBoost': 285.221,
 'GradientBoosting': 393.908}
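
The two dictionaries can also be read side by side. The sketch below copies the values printed above into literals so that it runs on its own, and adds a speed-up column relative to scikit-learn's `GradientBoostingClassifier`. The table layout and the `speedup` column are illustrative additions, not part of the original notebook.

```python
# Values copied from the outputs above
accuracy = {
    "CatBoost": 0.983,
    "XGB": 0.976,
    "LGBM": 0.963,
    "HistGradientBoosting": 0.962,
    "GradientBoosting": 0.894,
}
speed = {
    "HistGradientBoosting": 29.388,
    "LGBM": 31.802,
    "XGB": 273.073,
    "CatBoost": 285.221,
    "GradientBoosting": 393.908,
}

baseline = speed["GradientBoosting"]  # slowest model, used as the reference
rows = sorted(accuracy, key=accuracy.get, reverse=True)  # most accurate first

print(f"{'Model':<22}{'Accuracy':>10}{'Time (s)':>12}{'Speed-up':>10}")
for name in rows:
    speedup = baseline / speed[name]
    print(f"{name:<22}{accuracy[name]:>10.3f}{speed[name]:>12.3f}{speedup:>9.1f}x")
```

These figures come from a single synthetic dataset with default hyperparameters, so the ranking may differ on other data or after tuning.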