* GBM
* XGBoost
* LightGBM
* CatBoost

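### Gradient boosting
Gradient boosting (GB) is a machine learning ensemble technique that combines the predictions of multiple weak learners, typically shallow decision trees, by adding them sequentially.
* In AdaBoost-style boosting, before feeding the observations to the second model M2, the weights of the wrongly classified observations are increased
* (so the probability of selecting a wrongly classified observation increases).
* Gradient boosting does not reweight observations: each new regression tree is fit on the negative gradient of the loss function with respect to the current predictions (the pseudo-residuals).
* Since trees are added sequentially and each contribution is shrunk by a learning rate, boosting algorithms learn slowly. In statistical learning, models that learn slowly often generalize better, at the cost of needing more trees.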
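
The sequential fitting described above can be sketched in a few lines. This is a toy version for squared-error loss only, where the negative gradient is simply the residual; the helper names `fit_toy_gbm` and `predict_toy_gbm` and the synthetic sine data are made up for illustration and are not how scikit-learn implements it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_toy_gbm(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting for squared-error loss (illustrative only)."""
    init = float(np.mean(y))              # F_0: constant initial prediction
    current = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        # For L = 0.5 * (y - F)^2 the negative gradient is the residual y - F
        residuals = y - current
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # Shrink each tree's contribution by the learning rate
        current = current + learning_rate * tree.predict(X)
        trees.append(tree)
    return init, learning_rate, trees

def predict_toy_gbm(model, X):
    init, learning_rate, trees = model
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred

# Synthetic data (assumption: a noisy sine curve, chosen only for the demo)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_toy_gbm(X_demo, y_demo)
mse_start = np.mean((y_demo - model[0]) ** 2)
mse_end = np.mean((y_demo - predict_toy_gbm(model, X_demo)) ** 2)
print(mse_start, mse_end)  # training error shrinks as trees are added
```

A smaller `learning_rate` makes each tree contribute less, so more trees are needed to reach the same training error.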



### Gradient boosting
* Gradient Boosting does not update sample weights; it fits each new learner to the negative gradient of the loss function with respect to the current predicted output.
* Gradient Boosting can in principle use a wide range of base learners, such as decision trees and linear models; the scikit-learn estimators use decision trees.
* Gradient Boosting can be made more robust to outliers than AdaBoost by choosing a loss (for example absolute error or Huber) whose gradients stay bounded for large errors.
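
To make the point about outliers concrete, here is a small sketch of the negative gradient for two regression losses. The function name `negative_gradient` and the example values are assumptions for illustration; only the two formulas in the comments are standard.

```python
import numpy as np

def negative_gradient(y, pred, loss="squared_error"):
    """Pseudo-residuals that the next tree would be fit to."""
    if loss == "squared_error":
        return y - pred               # L = 0.5 * (y - pred)^2
    if loss == "absolute_error":
        return np.sign(y - pred)      # L = |y - pred|
    raise ValueError(f"unsupported loss: {loss}")

y_true = np.array([1.0, 2.0, 3.0, 100.0])   # the last target is an outlier
pred = np.full(4, 2.0)

grad_sq = negative_gradient(y_true, pred, "squared_error")
grad_abs = negative_gradient(y_true, pred, "absolute_error")
print(grad_sq)    # values: -1, 0, 1, 98
print(grad_abs)   # values: -1, 0, 1, 1
```

With squared error the outlier dominates what the next tree is asked to fit; with absolute error its pull is capped at 1.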

Boosting estimators available in sklearn.ensemble:

* GradientBoostingClassifier
* GradientBoostingRegressor
* AdaBoostClassifier: each new model is trained on the initial dataset with updated sample weights
* AdaBoostRegressor
* HistGradientBoostingClassifier
* HistGradientBoostingRegressor

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

In [26]:
from sklearn.datasets import load_digits
# 10 classes (digits 0-9), 64 features (8x8 pixel images)
X,y = load_digits(return_X_y=True)

In [27]:
from sklearn.model_selection import train_test_split 

X_train, X_test, y_train, y_test = train_test_split(X, y , test_size=0.2, random_state=42) 
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((1437, 64), (360, 64), (1437,), (360,))

In [42]:
from sklearn.ensemble import GradientBoostingClassifier

gbc = GradientBoostingClassifier(loss="log_loss", # "log_loss" (default) or "exponential" (binary targets only); "deviance" is the removed old name for "log_loss"
                                learning_rate=0.05,
                                n_estimators=10,
                                subsample=1.0,
                                criterion="friedman_mse", # friedman_mse, squared_error
                                min_samples_split=2,
                                min_samples_leaf=1,
                                min_weight_fraction_leaf=0.0,
                                max_depth=3,
                                min_impurity_decrease=0.0,
                                init=None,
                                random_state=42,
                                max_features="log2", # "sqrt", "log2", an int, a float, or None ("auto" was removed)
                                verbose=0,
                                max_leaf_nodes=None,
                                warm_start=True,
                                validation_fraction=0.1,
                                n_iter_no_change=None,
                                tol=1e-4,
                                ccp_alpha=0.0)

# Fit to training set
gbc.fit(X_train, y_train)
 
# Predict on test set
prediction = gbc.predict(X_test)
prediction

array([6, 9, 3, 7, 8, 1, 5, 2, 5, 2, 1, 4, 4, 0, 4, 2, 3, 7, 8, 8, 4, 3,
       9, 7, 5, 6, 3, 5, 6, 3, 4, 9, 1, 4, 4, 6, 9, 4, 7, 6, 6, 9, 1, 3,
       6, 1, 3, 0, 6, 5, 5, 1, 9, 5, 6, 0, 9, 0, 0, 1, 0, 4, 5, 2, 4, 5,
       7, 0, 7, 5, 9, 3, 5, 4, 7, 0, 4, 5, 9, 9, 9, 0, 2, 3, 8, 0, 6, 4,
       4, 9, 1, 2, 8, 3, 5, 2, 9, 1, 4, 4, 4, 3, 5, 3, 1, 8, 5, 9, 4, 2,
       7, 7, 4, 4, 1, 9, 2, 7, 8, 7, 2, 6, 9, 4, 0, 7, 2, 7, 5, 8, 7, 5,
       7, 9, 0, 6, 6, 4, 2, 8, 0, 9, 4, 6, 1, 9, 6, 9, 0, 1, 9, 6, 6, 0,
       6, 4, 2, 9, 3, 7, 7, 2, 9, 0, 0, 5, 8, 6, 5, 7, 9, 8, 4, 2, 1, 3,
       7, 7, 2, 2, 3, 9, 8, 0, 3, 8, 2, 5, 6, 9, 9, 4, 1, 5, 4, 2, 3, 6,
       4, 8, 5, 9, 5, 7, 8, 9, 4, 8, 1, 5, 4, 4, 9, 6, 1, 8, 6, 0, 4, 5,
       2, 7, 4, 6, 4, 5, 6, 0, 3, 2, 3, 6, 7, 1, 9, 1, 4, 7, 6, 5, 1, 5,
       5, 1, 0, 2, 8, 8, 7, 7, 7, 6, 2, 2, 2, 3, 4, 8, 8, 3, 6, 0, 3, 7,
       8, 0, 1, 0, 4, 5, 1, 5, 3, 6, 0, 4, 1, 0, 0, 3, 6, 5, 9, 7, 3, 5,
       9, 9, 9, 8, 5, 3, 3, 2, 0, 5, 8, 3, 4, 0, 2,

In [43]:
y_test

array([6, 9, 3, 7, 2, 1, 5, 2, 5, 2, 1, 9, 4, 0, 4, 2, 3, 7, 8, 8, 4, 3,
       9, 7, 5, 6, 3, 5, 6, 3, 4, 9, 1, 4, 4, 6, 9, 4, 7, 6, 6, 9, 1, 3,
       6, 1, 3, 0, 6, 5, 5, 1, 9, 5, 6, 0, 9, 0, 0, 1, 0, 4, 5, 2, 4, 5,
       7, 0, 7, 5, 9, 5, 5, 4, 7, 0, 4, 5, 5, 9, 9, 0, 2, 3, 8, 0, 6, 4,
       4, 9, 1, 2, 8, 3, 5, 2, 9, 0, 4, 4, 4, 3, 5, 3, 1, 3, 5, 9, 4, 2,
       7, 7, 4, 4, 1, 9, 2, 7, 8, 7, 2, 6, 9, 4, 0, 7, 2, 7, 5, 8, 7, 5,
       7, 7, 0, 6, 6, 4, 2, 8, 0, 9, 4, 6, 9, 9, 6, 9, 0, 3, 5, 6, 6, 0,
       6, 4, 3, 9, 3, 9, 7, 2, 9, 0, 4, 5, 3, 6, 5, 9, 9, 8, 4, 2, 1, 3,
       7, 7, 2, 2, 3, 9, 8, 0, 3, 2, 2, 5, 6, 9, 9, 4, 1, 5, 4, 2, 3, 6,
       4, 8, 5, 9, 5, 7, 8, 9, 4, 8, 1, 5, 4, 4, 9, 6, 1, 8, 6, 0, 4, 5,
       2, 7, 4, 6, 4, 5, 6, 0, 3, 2, 3, 6, 7, 1, 5, 1, 4, 7, 6, 8, 8, 5,
       5, 1, 6, 2, 8, 8, 9, 9, 7, 6, 2, 2, 2, 3, 4, 8, 8, 3, 6, 0, 9, 7,
       7, 0, 1, 0, 4, 5, 1, 5, 3, 6, 0, 4, 1, 0, 0, 3, 6, 5, 9, 7, 3, 5,
       5, 9, 9, 8, 5, 3, 3, 2, 0, 5, 8, 3, 4, 0, 2,

In [44]:
accuracy = np.sum(y_test == prediction) / len(y_test)
accuracy

0.9166666666666666

In [45]:
gbc.max_features_

6
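
The value 6 follows from the `max_features="log2"` setting: the digits data has 64 features, and the number considered at each split is the base-2 logarithm of that count, truncated to an integer. A quick check (the variable names are illustrative):

```python
import math

n_features = 64  # load_digits images are 8x8 pixels

# Number of features examined per split under each string setting
max_features_log2 = max(1, int(math.log2(n_features)))
max_features_sqrt = max(1, int(math.sqrt(n_features)))
print(max_features_log2, max_features_sqrt)  # 6 8
```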

In [48]:
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

In [49]:
train_X, test_X, train_y, test_y = train_test_split(X, y, 
                                                    test_size = 0.25, 
                                                    random_state = 23)
train_X.shape, test_X.shape, train_y.shape, test_y.shape

((331, 10), (111, 10), (331,), (111,))

In [53]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
 
gbr = GradientBoostingRegressor(loss='absolute_error',
                                learning_rate=0.1,
                                n_estimators=100,
                                max_depth = 5, 
                                random_state = 23,
                                max_features = 5)
 
gbr.fit(train_X, train_y)
prediction = gbr.predict(test_X)
prediction

array([283.88578911, 103.60458892, 151.30595716,  88.41778663,
       149.5224744 , 154.7324027 , 200.72221524, 159.63019406,
       206.11411364, 211.33836085, 124.96244952, 121.57916732,
       163.19006777,  83.67545538, 114.9991849 , 211.18534265,
       251.90425203, 164.72844538, 129.46897752, 100.21991019,
       224.83614726, 159.51225101, 118.45875952, 148.59044144,
       132.20830552, 153.34353502, 134.80697867, 130.20754792,
       309.93276796,  92.06416738, 149.06351918, 262.24233467,
       203.25513475,  75.9403632 , 192.39799314, 248.54137892,
       123.26850847, 146.61054564, 153.59277799, 105.94786293,
       153.89825258, 210.47625413,  80.44656692, 132.17722462,
        80.83352621, 231.40685688, 222.82515018,  70.17964747,
       200.6056077 , 100.30364518, 167.85944652,  96.84430375,
       221.94263066, 162.09417355, 210.3989053 ,  88.28049319,
       249.56959098,  84.06701545, 160.71408909, 233.1350084 ,
       201.3103988 , 173.72525384,  92.59623727, 177.99

In [54]:
test_rmse = mean_squared_error(test_y, prediction) ** (1 / 2)
print('Root mean Square error: {:.2f}'.format(test_rmse))

Root mean Square error: 56.87


### EXTREME GRADIENT BOOSTING MACHINE (XGBM)

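* Adds regularization to the training objective: L1 and L2 penalties on the leaf weights and a minimum loss reduction (gamma) required to make a split.
* Trees are still built sequentially, but the search for the best split within a tree is parallelized, so it is usually faster than GBM.
* Removes most of the worry about missing values or sparse data: it learns a default direction for them at each split.
* Using trees helps it handle a large number of features.
* The base learners are CART trees; with column subsampling enabled, a tree does not use all features at a time.
* Because each new tree corrects the errors of the current ensemble, XGBoost refines its fit iteratively, whereas a random forest (RF) grows its trees independently; whether this helps on streaming data depends on how the model is updated.
* Does not update the weights of the samples after misclassification (unlike AdaBoost); each new tree is fit to the gradients of the loss instead.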
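
The effect of the regularization terms can be seen in the closed-form leaf weight and split gain of the second-order objective described in the XGBoost paper. The sketch below is plain Python, not the library's code; `leaf_weight` and `split_gain` are illustrative names, and the gradient and Hessian sums are made-up numbers.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Optimal leaf value: w* = -G / (H + lambda)."""
    return -grad_sum / (hess_sum + reg_lambda)

def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from a split, minus the per-leaf penalty gamma."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# A leaf holding three samples with summed gradient -6 and summed Hessian 3
print(leaf_weight(-6.0, 3.0, reg_lambda=0.0))  # 2.0 (no shrinkage)
print(leaf_weight(-6.0, 3.0, reg_lambda=1.0))  # 1.5 (L2 penalty shrinks the leaf)

# The same candidate split is kept or rejected depending on gamma
print(split_gain(-6.0, 3.0, 6.0, 3.0, reg_lambda=1.0, gamma=0.0))   # 9.0
print(split_gain(-6.0, 3.0, 6.0, 3.0, reg_lambda=1.0, gamma=10.0))  # -1.0
```

A negative gain means the split does not pay for the extra leaf, so it would be pruned.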

In [55]:
from sklearn.datasets import load_digits

X,y = load_digits(return_X_y=True)

In [56]:
from sklearn.model_selection import train_test_split 

X_train, X_test, y_train, y_test = train_test_split(X, y , test_size=0.2, random_state=42) 
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((1437, 64), (360, 64), (1437,), (360,))

In [59]:
from xgboost import XGBClassifier

model = XGBClassifier()

# fit the model with the training data
model.fit(X_train,y_train)
prediction = model.predict(X_test)
prediction

array([6, 9, 3, 7, 2, 1, 5, 2, 5, 2, 1, 9, 4, 0, 4, 2, 3, 7, 8, 8, 4, 3,
       9, 7, 5, 6, 3, 5, 6, 3, 4, 9, 1, 4, 4, 6, 9, 4, 7, 6, 6, 9, 1, 3,
       6, 1, 3, 0, 6, 5, 5, 1, 9, 5, 6, 0, 9, 0, 0, 1, 0, 4, 5, 2, 4, 5,
       7, 0, 7, 5, 9, 5, 5, 4, 7, 0, 1, 5, 5, 9, 9, 0, 2, 3, 8, 0, 6, 4,
       4, 9, 1, 2, 8, 3, 5, 2, 9, 4, 4, 4, 4, 3, 5, 3, 1, 3, 5, 9, 4, 2,
       7, 7, 4, 4, 1, 9, 2, 7, 8, 7, 2, 6, 9, 4, 0, 7, 2, 7, 5, 8, 7, 5,
       7, 9, 0, 6, 6, 4, 2, 8, 0, 9, 4, 6, 9, 9, 6, 9, 0, 5, 5, 6, 6, 0,
       6, 4, 3, 9, 3, 7, 7, 2, 9, 0, 4, 5, 8, 6, 5, 8, 9, 8, 4, 2, 1, 3,
       7, 7, 2, 2, 3, 9, 8, 0, 3, 2, 2, 5, 6, 9, 9, 4, 1, 5, 4, 2, 3, 6,
       4, 8, 5, 9, 5, 7, 1, 9, 4, 8, 1, 5, 4, 4, 9, 6, 1, 8, 6, 0, 4, 5,
       2, 7, 4, 6, 4, 5, 6, 0, 3, 2, 3, 6, 7, 1, 5, 1, 4, 7, 6, 5, 8, 5,
       5, 1, 5, 2, 8, 8, 9, 9, 7, 6, 2, 2, 2, 3, 4, 8, 8, 3, 6, 0, 9, 7,
       7, 0, 1, 0, 4, 5, 1, 5, 3, 6, 0, 4, 1, 0, 0, 3, 6, 5, 9, 7, 3, 5,
       5, 9, 9, 8, 5, 3, 3, 2, 0, 5, 8, 3, 4, 0, 2,

In [61]:
accuracy = np.sum(y_test == prediction) / len(prediction)
accuracy

0.9694444444444444

### Light Gradient Boosting Machines (Light GBM)

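* Scales to very large datasets: histogram-based split finding keeps training fast and memory use low.
* Less suited to small datasets, where leaf-wise growth tends to overfit.
* Grows trees leaf-wise (best-first): it splits the leaf with the largest loss reduction instead of growing the tree level by level.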
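
Leaf-wise growth can be illustrated with a toy priority queue. The `SPLITS` table below is entirely hypothetical (node names and gains are invented), and the function is a sketch of the best-first idea rather than LightGBM's implementation.

```python
import heapq

# Hypothetical loss reduction for splitting each node, plus its two children
SPLITS = {
    "root": (10.0, "A", "B"),
    "A": (8.0, "A1", "A2"),
    "B": (1.0, "B1", "B2"),
    "A1": (5.0, "A1a", "A1b"),
    "A2": (0.5, "A2a", "A2b"),
}

def leaf_wise_order(splits, root, num_leaves):
    """Return the nodes split, in order, until the tree has num_leaves leaves."""
    heap = [(-splits[root][0], root)]   # max-heap via negated gains
    order, leaves = [], 1
    while heap and leaves < num_leaves:
        _, node = heapq.heappop(heap)   # leaf with the largest gain
        order.append(node)
        leaves += 1                     # one leaf becomes two
        for child in splits[node][1:]:
            if child in splits:         # only children that can be split further
                heapq.heappush(heap, (-splits[child][0], child))
    return order

order = leaf_wise_order(SPLITS, "root", num_leaves=4)
print(order)  # ['root', 'A', 'A1']
```

A level-wise learner with the same budget would split `root`, `A` and `B`; the leaf-wise order skips the low-gain node `B` and goes deeper on the `A` side, which is also why it can overfit small datasets.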

In [None]:
from lightgbm import LGBMClassifier

model = LGBMClassifier()

model.fit(X_train, y_train)
prediction = model.predict(X_test)

### CATBOOST
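* Designed for datasets with categorical features: it encodes them internally (using ordered target statistics), so manual one-hot encoding is usually unnecessary.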
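
As a rough idea of how a categorical column can be turned into numbers without leaking the current row's label, here is a simplified sketch of an ordered target statistic: each row is encoded using only the targets of earlier rows with the same category, smoothed towards a prior. The function name, the `prior_weight` parameter and the toy data are assumptions; CatBoost's actual procedure adds details such as random permutations.

```python
import numpy as np

def ordered_target_statistic(categories, targets, prior, prior_weight=1.0):
    """Encode each row from the targets of *preceding* rows in its category."""
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for i, (cat, target) in enumerate(zip(categories, targets)):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded[i] = (s + prior_weight * prior) / (c + prior_weight)
        # Update the running statistics only after encoding the current row
        sums[cat] = s + target
        counts[cat] = c + 1
    return encoded

cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
encoded = ordered_target_statistic(cats, ys, prior=float(np.mean(ys)))
print(encoded)
```

The first occurrence of each category falls back to the prior, and later occurrences move towards that category's running target mean.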


In [62]:
from sklearn.datasets import load_iris

data = load_iris()
X = data.data
y = data.target
y  # class labels: 0, 1, 2

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

### AdaBoost

* AdaBoost takes a different approach: each subsequent model focuses on the samples that were misclassified by the previous model.
* AdaBoost does not include the regularization techniques that XGBoost builds into its objective.
* AdaBoost also builds its models one after another; it is often slower than XGBoost, which parallelizes the split search inside each tree.
* AdaBoost generally requires explicit imputation of missing values.
* The original AdaBoost is a binary classifier, so multi-class problems need either a One-vs-All scheme or a multi-class variant such as SAMME, which scikit-learn's AdaBoostClassifier uses.

### AdaBoost
* During each iteration, AdaBoost increases the weights of incorrectly classified samples so that the next weak learner focuses more on them.
* AdaBoost typically uses decision stumps (decision trees with a single split) as its weak learners.
* AdaBoost is more susceptible to noise and outliers in the data, because it keeps assigning higher weights to misclassified samples.
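
One round of the weight update can be written out directly. This is a sketch of the classic binary (discrete) AdaBoost rule with labels in {-1, +1}; the function name `adaboost_round` and the five-sample example are illustrative, and the rule assumes the weighted error lies strictly between 0 and 0.5.

```python
import numpy as np

def adaboost_round(y_true, y_pred, weights):
    """Return the learner's vote (alpha) and the re-normalized sample weights."""
    miss = y_true != y_pred
    err = np.sum(weights[miss]) / np.sum(weights)   # weighted error rate
    alpha = 0.5 * np.log((1 - err) / err)           # say of this weak learner
    # Misclassified samples are scaled up, correct ones scaled down
    new_weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return alpha, new_weights / new_weights.sum()

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])   # only the last sample is wrong
weights = np.full(5, 0.2)

alpha, new_weights = adaboost_round(y_true, y_pred, weights)
print(alpha)        # about 0.693, i.e. ln(2)
print(new_weights)  # values: 0.125, 0.125, 0.125, 0.125, 0.5
```

After the update the single misclassified sample carries half of the total weight, which is also why a mislabeled or noisy point can end up dominating later rounds.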

In [70]:
from sklearn.ensemble import AdaBoostClassifier

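# Note: recent scikit-learn releases deprecate the algorithm argument, since SAMME is the only remaining option
adagbm = AdaBoostClassifier(n_estimators=300, learning_rate=0.05, random_state=None, algorithm="SAMME")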

adagbm.fit(X_train, y_train)
prediction = adagbm.predict(X_test)
prediction

array([6, 9, 3, 8, 6, 5, 5, 3, 4, 8, 1, 9, 4, 0, 6, 2, 3, 7, 1, 8, 4, 3,
       9, 7, 5, 6, 3, 5, 6, 3, 6, 9, 3, 6, 4, 6, 9, 4, 7, 6, 6, 9, 6, 3,
       6, 1, 3, 0, 6, 5, 1, 1, 9, 5, 6, 0, 9, 0, 0, 1, 0, 4, 9, 5, 6, 5,
       7, 0, 7, 1, 9, 5, 5, 4, 7, 0, 6, 5, 0, 0, 9, 4, 2, 3, 8, 0, 6, 4,
       6, 9, 1, 2, 1, 3, 9, 8, 9, 4, 4, 4, 4, 3, 9, 3, 4, 1, 4, 9, 4, 8,
       1, 7, 4, 6, 4, 0, 1, 8, 8, 3, 2, 6, 9, 9, 0, 6, 6, 7, 5, 3, 7, 5,
       7, 9, 0, 6, 6, 4, 2, 8, 0, 9, 4, 6, 9, 9, 6, 0, 0, 1, 5, 6, 6, 0,
       6, 4, 9, 9, 3, 8, 7, 6, 0, 0, 6, 5, 8, 6, 5, 8, 9, 8, 4, 2, 1, 3,
       7, 7, 2, 2, 3, 0, 8, 0, 3, 2, 2, 5, 6, 9, 0, 4, 6, 6, 9, 2, 3, 6,
       4, 8, 5, 9, 5, 7, 8, 9, 4, 2, 1, 5, 4, 4, 9, 6, 1, 8, 6, 0, 9, 4,
       2, 4, 4, 6, 4, 6, 6, 7, 3, 2, 3, 6, 7, 9, 9, 1, 4, 4, 6, 9, 1, 5,
       4, 8, 4, 8, 8, 1, 8, 8, 7, 6, 2, 8, 2, 3, 6, 8, 8, 3, 6, 0, 9, 7,
       7, 0, 1, 0, 4, 5, 4, 5, 3, 6, 0, 4, 2, 0, 0, 3, 6, 5, 9, 7, 3, 5,
       5, 9, 9, 8, 5, 3, 3, 2, 0, 4, 1, 3, 4, 0, 8,

In [71]:
accuracy = np.sum(y_test == prediction) / len(y_test)
accuracy

0.7222222222222222