# **Gradient Boost**

Gradient Boost is an ensemble learning technique that builds models sequentially, where each new model attempts to correct the errors made by the previous ones. It combines weak learners (often decision trees) to create a strong predictive model.

It has become a standard tool in the machine learning toolkit and is widely used across many domains.

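It supports both regression and classification tasks and works with numerical and categorical features. Note that scikit-learn's `GradientBoostingClassifier` and `GradientBoostingRegressor` expect numeric input, so categorical columns must be encoded first; `HistGradientBoostingClassifier`, LightGBM and CatBoost can handle categorical features natively.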

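Gradient Boosting minimizes a loss function by gradient descent in function space: each new tree is fitted to the negative gradient of the loss with respect to the current ensemble's predictions (the *pseudo-residuals*), and its output, scaled by the learning rate, is added to the ensemble. For squared-error loss the pseudo-residuals are exactly the ordinary residuals y - F(x), which is why the method is often described as fitting each new model to the errors of the previous ones.
This iterative process continues until a specified number of models has been built or until validation performance stops improving.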
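To make the pseudo-residual update concrete, here is a minimal from-scratch sketch for squared-error regression, assuming scikit-learn's `DecisionTreeRegressor` as the weak learner. The helper names `fit_gradient_boosting` and `predict_gradient_boosting` and the noisy-sine toy data are hypothetical; this is a simplified illustration of the update F_m(x) = F_{m-1}(x) + learning_rate * h_m(x), not a substitute for a library implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=2):
    """Minimal gradient boosting for squared-error regression (illustration only)."""
    init = y.mean()                          # F_0: the constant that minimizes squared error
    pred = np.full(len(y), init)             # current ensemble predictions F_{m-1}(x)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred                 # negative gradient of 0.5 * (y - F)^2 with respect to F
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)               # weak learner fitted to the pseudo-residuals
        pred += learning_rate * tree.predict(X)  # shrunken step in function space
        trees.append(tree)
    return init, learning_rate, trees


def predict_gradient_boosting(model, X):
    init, learning_rate, trees = model
    return init + learning_rate * sum(tree.predict(X) for tree in trees)


# Usage on hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_gradient_boosting(X, y)
mse = np.mean((y - predict_gradient_boosting(model, X)) ** 2)
baseline = np.mean((y - y.mean()) ** 2)
print(f"training MSE: {mse:.4f} (constant baseline: {baseline:.4f})")
```

Each tree only has to explain what the ensemble so far gets wrong, and the learning rate keeps any single tree from dominating.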

It is widely used in machine learning competitions and real-world applications due to its high accuracy and flexibility.
However, it can overfit if not properly regularized (for example with a smaller learning rate, shallower trees, row subsampling, or early stopping), and it usually requires careful hyperparameter tuning to achieve good performance.
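Several of these regularization options are parameters of scikit-learn's `GradientBoostingClassifier`: `subsample` (stochastic gradient boosting), and `validation_fraction` with `n_iter_no_change` (early stopping on an internal validation split). A sketch on the wine dataset used below; the specific values are illustrative assumptions, not tuned settings:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = GradientBoostingClassifier(
    n_estimators=1000,        # upper bound; early stopping may fit far fewer trees
    learning_rate=0.05,       # smaller steps usually need more trees
    max_depth=3,              # shallow trees keep each learner weak
    subsample=0.8,            # each tree is fitted on a random 80% of the training rows
    validation_fraction=0.2,  # hold out 20% of the training set internally
    n_iter_no_change=10,      # stop if the validation loss has not improved for 10 iterations
    random_state=42,
)
clf.fit(X_train, y_train)

test_acc = clf.score(X_test, y_test)
print("Trees actually fitted:", clf.n_estimators_)
print("Test accuracy:", test_acc)
```

`n_estimators_` reports how many boosting stages were actually kept, which is a quick way to see whether early stopping kicked in.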

Gradient Boosting is implemented in various libraries, including scikit-learn, XGBoost, LightGBM, and CatBoost, each with its own optimizations and features.

Gradient Boosting is particularly effective for structured data and is often used in applications such as fraud detection, customer churn prediction, and ranking tasks.

It is important to note that while Gradient Boosting can achieve high accuracy, it may require more computational resources and time to train compared to simpler models. Therefore, it is essential to balance model complexity with training time and resource availability.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import datasets

In [2]:
# as_frame=True loads the dataset as a pandas DataFrame
# This is useful for easier data manipulation and visualization.
wine = datasets.load_wine(as_frame=True)
display(wine)

{'data':      alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
 0      14.23        1.71  2.43               15.6      127.0           2.80   
 1      13.20        1.78  2.14               11.2      100.0           2.65   
 2      13.16        2.36  2.67               18.6      101.0           2.80   
 3      14.37        1.95  2.50               16.8      113.0           3.85   
 4      13.24        2.59  2.87               21.0      118.0           2.80   
 ..       ...         ...   ...                ...        ...            ...   
 173    13.71        5.65  2.45               20.5       95.0           1.68   
 174    13.40        3.91  2.48               23.0      102.0           1.80   
 175    13.27        4.28  2.26               20.0      120.0           1.59   
 176    13.17        2.59  2.37               20.0      120.0           1.65   
 177    14.13        4.10  2.74               24.5       96.0           2.05   
 
      flavanoids  nonflavanoid

In [3]:
wine.feature_names

['alcohol',
 'malic_acid',
 'ash',
 'alcalinity_of_ash',
 'magnesium',
 'total_phenols',
 'flavanoids',
 'nonflavanoid_phenols',
 'proanthocyanins',
 'color_intensity',
 'hue',
 'od280/od315_of_diluted_wines',
 'proline']

In [4]:
X = wine['data']
Y = wine['target']

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
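The split above is not stratified. With only 178 samples and three unequally sized classes, passing `stratify` keeps the class proportions of the test set close to those of the full dataset. A small sketch comparing the two (the variable names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Same split settings as in the notebook (no stratification) ...
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=42)
# ... and a stratified version that preserves the class proportions of y
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("full dataset :", np.round(np.bincount(y) / len(y), 3))
print("plain split  :", np.round(np.bincount(y_test_plain, minlength=3) / len(y_test_plain), 3))
print("stratified   :", np.round(np.bincount(y_test_strat, minlength=3) / len(y_test_strat), 3))
```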

In [6]:
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
# For classification tasks, use GradientBoostingClassifier
# For regression tasks, use GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, accuracy_score, r2_score

In [None]:
# n_estimators: number of boosting stages (trees); more stages can improve the fit but also raise the risk of overfitting
# learning_rate: shrinks the contribution of each tree; smaller values usually need more trees (the two trade off)
# max_depth: maximum depth of each individual regression tree; limiting it helps prevent overfitting
# (max_depth=15 is much deeper than the default of 3, so each tree here is far from a "weak" learner)
gbc = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.1, max_depth=15, random_state=42)
gbc.fit(X_train, Y_train)

| Parameter | Value |
|---|---|
| loss | 'log_loss' |
| learning_rate | 0.1 |
| n_estimators | 1000 |
| subsample | 1.0 |
| criterion | 'friedman_mse' |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_depth | 15 |
| min_impurity_decrease | 0.0 |


In [8]:
y_pred = gbc.predict(X_test)
display(y_pred)

array([0, 0, 2, 0, 1, 0, 1, 2, 1, 0, 0, 2, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       1, 2, 2, 2, 1, 0, 1, 0, 0, 1, 2, 0, 0, 0])
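Because this is a classification task, per-class metrics such as a confusion matrix and precision/recall are more informative than regression metrics like MSE. A self-contained sketch using scikit-learn's `confusion_matrix` and `classification_report`, assuming default hyperparameters for brevity:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42
)

clf = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
print(classification_report(y_test, y_pred, labels=[0, 1, 2], target_names=wine.target_names))
```

Off-diagonal entries of the matrix show exactly which cultivars are being confused with each other.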

In [9]:
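# Note: MSE and R^2 are regression metrics; here they treat the class labels 0/1/2 as numbers,
# so they are not meaningful for this classification task. Accuracy is the relevant metric.
print("Mean Squared Error:", mean_squared_error(Y_test, y_pred))
print("R^2 Score:", r2_score(Y_test, y_pred))
# cross_val_score refits a clone of gbc on each of 5 folds of the full dataset (X, Y)
print("Cross-Validation Mean Score:", cross_val_score(gbc, X, Y, cv=5, n_jobs=4).mean())
# predict() already returns integer class labels, so no rounding is needed
print("Accuracy Score:", accuracy_score(Y_test, y_pred))
print("Feature Importances:", gbc.feature_importances_)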

Mean Squared Error: 0.1388888888888889
R^2 Score: 0.7619047619047619
Cross-Validation Mean Score: 0.9219047619047618
Accuracy Score: 0.9444444444444444
Feature Importances: [1.05873521e-02 6.63360207e-03 1.65354337e-02 2.05675386e-03
 1.26650579e-02 1.72606898e-17 8.49016812e-02 7.27258720e-18
 8.02326142e-04 3.04932452e-01 9.08186088e-03 2.51322869e-01
 3.00480611e-01]
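The raw `feature_importances_` array is hard to read because it is not labelled. A sketch that pairs it with the column names in a pandas Series (fitted on the full dataset with default hyperparameters for brevity, so the exact values will differ from the array above):

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier

wine = load_wine(as_frame=True)
X, y = wine.data, wine.target

clf = GradientBoostingClassifier(random_state=42).fit(X, y)

# Label each importance with its feature name and sort from most to least important
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(5))
```

In scikit-learn these impurity-based importances are normalized to sum to 1, so each value reads as a share of the total.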


In [10]:
# Default hyperparameters: n_estimators=100, learning_rate=0.1, max_depth=3 (no random_state set)
gbc2 = GradientBoostingClassifier()
gbc2.fit(X_train, Y_train)

| Parameter | Value |
|---|---|
| loss | 'log_loss' |
| learning_rate | 0.1 |
| n_estimators | 100 |
| subsample | 1.0 |
| criterion | 'friedman_mse' |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_depth | 3 |
| min_impurity_decrease | 0.0 |


In [11]:
y_pred_2 = gbc2.predict(X_test)
display(y_pred_2)

array([0, 0, 2, 0, 1, 0, 1, 2, 1, 2, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       1, 2, 2, 2, 1, 0, 1, 0, 0, 1, 2, 0, 0, 0])

In [12]:
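# MSE and R^2 are regression metrics; on class labels 0/1/2 they are not meaningful for classification
print("Mean Squared Error:", mean_squared_error(Y_test, y_pred_2))
print("R^2 Score:", r2_score(Y_test, y_pred_2))
# Per-fold accuracies from 8-fold cross-validation (different folds from the 5-fold run for gbc)
print("Cross-Validation Scores:", cross_val_score(gbc2, X, Y, cv=8, n_jobs=4))
# predict() already returns integer class labels, so no rounding is needed
print("Accuracy Score:", accuracy_score(Y_test, y_pred_2))
print("Feature Importances:", gbc2.feature_importances_)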

Mean Squared Error: 0.05555555555555555
R^2 Score: 0.9047619047619048
Cross-Validation Scores: [0.73913043 0.82608696 0.95454545 0.86363636 1.         0.95454545
 1.         0.95454545]
Accuracy Score: 0.9444444444444444
Feature Importances: [8.74104707e-03 2.43820320e-03 1.70349757e-02 1.87436133e-03
 1.50375648e-02 1.64268526e-06 1.08466225e-01 5.60748095e-06
 8.38260279e-05 3.05126370e-01 2.55976952e-03 2.36406700e-01
 3.02223707e-01]
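The two models above were cross-validated with different numbers of folds (5 and 8), so their scores are not directly comparable. A fairer comparison evaluates both on the same splits. A sketch using one shared `StratifiedKFold` object; to keep it fast it assumes the default 100 trees for both models, so it is not an exact re-run of `gbc`:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_wine(return_X_y=True)

# One splitter shared by all models, so every model sees identical folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "shallow (max_depth=3)": GradientBoostingClassifier(max_depth=3, random_state=42),
    "deep (max_depth=15)": GradientBoostingClassifier(max_depth=15, random_state=42),
}

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv)
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Reporting the standard deviation alongside the mean makes it easier to judge whether a difference between configurations is larger than the fold-to-fold noise.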


In [13]:
param_grid = {
    'n_estimators': [50, 100, 150, 200],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'max_depth': [3, 5, 7, 9]
}

**Grid Search CV**

Grid Search CV (`GridSearchCV` in scikit-learn) finds good hyperparameters for a model by exhaustively searching through a specified parameter grid.

It trains and evaluates the model for every combination of parameter values using cross-validation, and selects the combination with the best mean validation score (accuracy by default for classifiers).

Because each combination is scored on held-out folds, the selected settings are less likely to simply fit the training data; cross-validation estimates, but does not guarantee, how well the model will generalize to unseen data.

This is particularly useful for models like Gradient Boosting, where several interacting hyperparameters (number of trees, learning rate, tree depth) strongly affect performance.

The cost grows multiplicatively: the number of fits is the product of the number of values tried for each parameter, times the number of folds. Grid Search CV can therefore be computationally expensive, especially with large datasets and complex models, but it is a powerful tool for model optimization.
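For the grid defined above, the number of fits can be checked before running the search. scikit-learn's `ParameterGrid` enumerates the same combinations `GridSearchCV` will try:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [50, 100, 150, 200],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'max_depth': [3, 5, 7, 9],
}

n_candidates = len(ParameterGrid(param_grid))  # 4 * 4 * 4 combinations
n_folds = 5
print(n_candidates, "candidates x", n_folds, "folds =", n_candidates * n_folds, "fits")
# prints: 64 candidates x 5 folds = 320 fits
```

With the default `refit=True`, `GridSearchCV` performs one additional fit of the best configuration on the whole training set after these cross-validation fits.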

In [14]:
from sklearn.model_selection import GridSearchCV

# Create a GridSearchCV object that searches for the best hyperparameters using cross-validation
# estimator: the model to tune, here gbc; its other settings (e.g. random_state=42) are kept,
#   while n_estimators, learning_rate and max_depth are overridden by each grid candidate
# param_grid: the dictionary of hyperparameters to search over
# cv: the number of cross-validation folds
# n_jobs: number of parallel jobs (6 here; n_jobs=-1 would use all available CPU cores)
# verbose: controls how much progress output is printed; verbose=1 prints a short summary of the search
grid_search = GridSearchCV(estimator=gbc, param_grid=param_grid, cv=5, n_jobs=6, verbose=1)

In [15]:
grid_search.fit(X_train, Y_train)

Fitting 5 folds for each of 64 candidates, totalling 320 fits


GridSearchCV parameters:

| Parameter | Value |
|---|---|
| estimator | GradientBoost...ndom_state=42) |
| param_grid | {'learning_rate': [0.01, 0.05, ...], 'max_depth': [3, 5, ...], 'n_estimators': [50, 100, ...]} |
| scoring | |
| n_jobs | 6 |
| refit | True |
| cv | 5 |
| verbose | 1 |
| pre_dispatch | '2*n_jobs' |
| error_score | |
| return_train_score | False |

Best estimator (refit on the full training set):

| Parameter | Value |
|---|---|
| loss | 'log_loss' |
| learning_rate | 0.2 |
| n_estimators | 100 |
| subsample | 1.0 |
| criterion | 'friedman_mse' |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_depth | 3 |
| min_impurity_decrease | 0.0 |


In [16]:
grid_search.best_params_

{'learning_rate': 0.2, 'max_depth': 3, 'n_estimators': 100}
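`best_params_` shows only the winner. The `cv_results_` attribute holds the mean and spread of the validation score for every candidate, which helps judge whether the winner is clearly better than its neighbours. A sketch with a hypothetical reduced grid and 50 trees so it runs quickly:

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical reduced grid, for speed only
small_grid = {"learning_rate": [0.05, 0.2], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingClassifier(n_estimators=50, random_state=42), small_grid, cv=5)
search.fit(X_train, y_train)

# One row per candidate, sorted by rank (1 = best mean validation score)
results = pd.DataFrame(search.cv_results_)
cols = ["param_learning_rate", "param_max_depth", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score"))
```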

In [17]:
y_pred_3 = grid_search.predict(X_test)

In [19]:
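# MSE and R^2 are regression metrics; on class labels 0/1/2 they are not meaningful for classification
print("Mean Squared Error:", mean_squared_error(Y_test, y_pred_3))
print("R^2 Score:", r2_score(Y_test, y_pred_3))
# Passing the GridSearchCV object to cross_val_score runs nested cross-validation:
# the full 320-fit grid search is repeated inside each of the 8 outer folds, which is slow
print("Cross-Validation Scores:", cross_val_score(grid_search, X, Y, cv=8, n_jobs=4))
# predict() already returns integer class labels, so no rounding is needed
print("Accuracy Score:", accuracy_score(Y_test, y_pred_3))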

Mean Squared Error: 0.05555555555555555
R^2 Score: 0.9047619047619048
Cross-Validation Scores: [0.91304348 0.82608696 0.95454545 0.90909091 0.95454545 0.95454545
 1.         0.95454545]
Accuracy Score: 0.9444444444444444
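Cross-validating a `GridSearchCV` object is nested cross-validation: an inner loop picks hyperparameters and an outer loop estimates how well the whole tuning procedure generalizes. Writing the two splitters out explicitly makes this structure visible. A sketch with a hypothetical reduced grid and fold counts chosen only to keep it fast:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = load_wine(return_X_y=True)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # selects hyperparameters
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # estimates performance

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    param_grid={"learning_rate": [0.05, 0.2], "max_depth": [2, 3]},  # hypothetical reduced grid
    cv=inner_cv,
)

# Each outer training fold runs its own grid search; the outer test fold is never seen by it
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print("Nested CV accuracy: %.3f +/- %.3f" % (outer_scores.mean(), outer_scores.std()))
```

Because the outer test folds never influence hyperparameter selection, the nested score is a less optimistic estimate than `grid_search.best_score_`.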
