#### About

> Gradient Boosting

Gradient Boosting is a popular machine learning technique used for both regression and classification tasks. It is an ensemble method that combines the predictions of multiple base learners (often referred to as "weak learners") to create a stronger overall model. 

Three widely used gradient boosting libraries are LightGBM, XGBoost, and CatBoost, which are known for their efficiency and effectiveness in handling large datasets and providing high-quality predictions.


Gradient Boosting is a type of boosting algorithm that iteratively builds a predictive model by adding base learners that reduce the residual errors of the previous iterations. The basic idea is to start with a simple initial model (often just a constant prediction, such as the mean of the targets) and then sequentially add small models, typically shallow decision trees, each of which corrects the errors of the ensemble built so far.

The key intuition behind gradient boosting is to perform gradient descent on the loss function in the space of predicted values. At each iteration, the negative gradient of the loss with respect to the current predicted values is computed for every training point, and a new base learner is fit to these values (often called pseudo-residuals). Adding that learner to the ensemble moves the predictions in the direction of steepest descent of the loss.
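As a worked example of the negative gradient, the pseudo-residual of training point i at iteration t can be written as follows (the symbol r_{i,t} is introduced here for illustration and does not appear elsewhere in this notebook):

$$
r_{i,t} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{t-1}}
$$

For the squared-error loss this reduces to the ordinary residual, which is why boosting is often described as "fitting the residuals":

$$
L(y, F) = \tfrac{1}{2}\,(y - F)^2 \quad\Longrightarrow\quad r_{i,t} = y_i - F_{t-1}(x_i)
$$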

> Mathematics

Given a training dataset with input features denoted as X and corresponding target labels denoted as y, the objective is to find an optimal model F(x) that minimizes the loss function L(y, F(x)).

At each iteration t, the gradient boosting algorithm fits a base learner h_t(x) to the negative gradient of the loss function L(y, F(x)) with respect to the predicted values F(x), evaluated at the previous model F_{t-1}(x). The output of the base learner is multiplied by a learning rate denoted as eta, which controls the step size of the model updates, and added to the previous model:

F_t(x) = F_{t-1}(x) + eta * h_t(x),

where F_{t-1}(x) is the predicted values of the model at the previous iteration, eta is the learning rate, h_t(x) is the base learner added at the current iteration, and F_t(x) is the updated predicted values. The model after the last iteration is the final prediction F(x).
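The update rule can be sketched from scratch in a few lines. This is a minimal illustration, not how any of the libraries below implement it. It assumes squared-error loss, a single input feature, and one-split regression stumps as base learners. The helper names `fit_stump`, `gradient_boost`, and `predict` are hypothetical, and `eta` stands for the learning rate.

```python
import numpy as np


def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    # Try every unique value except the largest, so both sides are non-empty.
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value = residuals[left].mean()
        right_value = residuals[~left].mean()
        sse = ((residuals[left] - left_value) ** 2).sum()
        sse += ((residuals[~left] - right_value) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)


def gradient_boost(x, y, n_rounds=50, eta=0.1):
    """Boost stumps on squared-error loss; returns F_0 and the learners h_t."""
    f0 = y.mean()  # constant initial model F_0
    prediction = np.full_like(y, f0, dtype=float)
    learners = []
    for _ in range(n_rounds):
        residuals = y - prediction  # negative gradient of 0.5 * (y - F)^2
        h = fit_stump(x, residuals)
        prediction = prediction + eta * h(x)  # F_t = F_{t-1} + eta * h_t
        learners.append(h)
    return f0, learners


def predict(f0, learners, x, eta=0.1):
    """Evaluate F_T(x) = F_0 + eta * sum of h_t(x)."""
    return f0 + eta * sum(h(x) for h in learners)


# Toy regression problem: a noisy sine wave.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

f0, learners = gradient_boost(x, y)
mse_start = np.mean((y - f0) ** 2)
mse_end = np.mean((y - predict(f0, learners, x)) ** 2)
print("Training MSE before boosting:", mse_start)
print("Training MSE after boosting: ", mse_end)
```

A smaller `eta` makes each step more cautious, so more rounds are needed to reach the same training error, which is the usual trade-off between the learning rate and the number of trees.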


> LightGBM, XGBoost, and CatBoost

LightGBM, XGBoost, and CatBoost are three gradient boosting libraries that have become popular due to their speed, scalability, and accuracy. They are optimized for large datasets and offer various advanced features to enhance model performance.

> LightGBM

LightGBM is a gradient boosting framework developed by Microsoft that is known for its efficiency and speed. It uses a combination of techniques, such as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), to reduce the memory usage and improve the training speed. LightGBM also supports parallel and distributed training, which makes it well-suited for handling large datasets. The algorithm is implemented in C++ and provides APIs for various programming languages, including Python, R, and Java.

> Mathematics

The update rule for LightGBM can be expressed as follows:

F_t(x) = F_{t-1}(x) + eta * h_t(x),

where F_{t-1}(x) is the predicted values of the model at the previous iteration, eta is the learning rate, and h_t(x) is the base learner added at the current iteration.

The base learners in LightGBM are typically decision trees, which can be trained on data selected by the GOSS technique. GOSS keeps all data points with large gradient magnitudes, randomly samples a fraction of the points with small gradient magnitudes, and up-weights the sampled points when computing split gains so that the data distribution is not distorted too much. This reduces the number of data points used for training while retaining the under-trained samples that contribute most to the information gain, which speeds up training with little loss of accuracy.
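The GOSS sampling rule can be sketched with NumPy as below. This is an illustration of the idea only, not LightGBM's internal code. The function name `goss_sample` is hypothetical, and the parameter names `top_rate` and `other_rate` are chosen to mirror the corresponding LightGBM parameters.

```python
import numpy as np


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) chosen by a GOSS-style rule."""
    rng = np.random.default_rng(seed)
    n = len(gradients)
    n_top = int(round(top_rate * n))
    n_other = int(round(other_rate * n))

    order = np.argsort(-np.abs(gradients))  # largest |gradient| first
    top_idx = order[:n_top]                 # large gradients are always kept
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    indices = np.concatenate([top_idx, other_idx])
    weights = np.ones(len(indices))
    # Up-weight the sampled small-gradient points to compensate for the
    # points that were dropped.
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return indices, weights


gradients = np.random.default_rng(1).normal(size=1000)
indices, weights = goss_sample(gradients)
print("Points kept:", len(indices), "of", len(gradients))
```

With these settings only 30% of the points are used to evaluate splits, and each sampled small-gradient point counts (1 - 0.2) / 0.1 = 8 times.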

LightGBM also uses Exclusive Feature Bundling (EFB) to combine mutually exclusive features (sparse features that rarely take non-zero values at the same time, such as one-hot encoded columns) into a single feature, which reduces the number of features used in the model and helps in reducing the memory usage and training time.
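The bundling idea behind EFB can be shown on two toy features that are never non-zero on the same row. This sketch assumes non-negative integer (binned) feature values and merges the features by shifting one of them by an offset. The variable names are illustrative, and LightGBM's actual bundling works on histogram bins and tolerates a small number of conflicts.

```python
import numpy as np

# Two sparse features that are never non-zero on the same row.
feature_a = np.array([0, 3, 0, 1, 0, 0])
feature_b = np.array([2, 0, 0, 0, 4, 0])
assert not np.any((feature_a != 0) & (feature_b != 0))  # mutually exclusive

# Shift feature_b past feature_a's range so the value ranges do not collide.
offset = feature_a.max()
bundle = np.where(feature_b != 0, feature_b + offset, feature_a)
print(bundle)  # [5 3 0 1 7 0]

# Both original features can be recovered from the single bundled feature.
recovered_a = np.where(bundle <= offset, bundle, 0)
recovered_b = np.where(bundle > offset, bundle - offset, 0)
```

Because the two value ranges do not overlap inside the bundle, a tree split on the bundled feature can still separate rows by either original feature.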


> XGBoost

XGBoost (eXtreme Gradient Boosting) is another popular gradient boosting library that is known for its efficiency and accuracy. XGBoost supports both regression and classification tasks and offers various advanced features, such as regularized learning, built-in handling of missing values, and tree pruning, to improve model performance.

> Mathematics

The update rule for XGBoost can be expressed as follows:

F_t(x) = F_{t-1}(x) + eta * h_t(x),

where F_{t-1}(x) is the predicted values of the model at the previous iteration, eta is the learning rate, and h_t(x) is the base learner added at the current iteration.

The base learners in XGBoost are typically decision trees. At each iteration, XGBoost computes the first-order gradient and the second-order derivative (Hessian) of the loss function with respect to the predicted values, and uses both to score candidate splits and to set the leaf values. XGBoost also adds regularization terms to the objective, including L1 (Lasso) and L2 (Ridge) penalties on the leaf weights and a penalty on the number of leaves, to control the complexity of the model and prevent overfitting.
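The way gradients, Hessians, and regularization combine can be sketched with the leaf-weight and split-gain formulas from the XGBoost paper. The sketch covers only the L2 penalty (`reg_lambda`) and the per-leaf penalty (`gamma`); the L1 term is left out for brevity. The function names are hypothetical and this is not XGBoost's internal code.

```python
import numpy as np


def leaf_weight(grad, hess, reg_lambda=1.0):
    """Optimal leaf value: w = -G / (H + lambda)."""
    return -grad.sum() / (hess.sum() + reg_lambda)


def split_gain(grad, hess, left_mask, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left and right children."""
    def score(g, h):
        return g.sum() ** 2 / (h.sum() + reg_lambda)

    gain = score(grad[left_mask], hess[left_mask])
    gain += score(grad[~left_mask], hess[~left_mask])
    gain -= score(grad, hess)
    return 0.5 * gain - gamma


# Squared-error loss 0.5 * (y - F)^2: the gradient is F - y, the Hessian is 1.
y = np.array([1.0, 1.2, 3.0, 3.2])
prediction = np.zeros(4)
grad = prediction - y
hess = np.ones(4)

good_split = np.array([True, True, False, False])  # separates low from high
bad_split = np.array([True, False, True, False])   # mixes low and high

print("Leaf weight without a split:", leaf_weight(grad, hess))
print("Gain of the good split:", split_gain(grad, hess, good_split))
print("Gain of the bad split: ", split_gain(grad, hess, bad_split))
```

A larger `reg_lambda` shrinks leaf values towards zero, and a larger `gamma` raises the gain a split must reach before it is worth keeping, which is how pruning follows from the regularized objective.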


> CatBoost 

CatBoost is another gradient boosting library developed by Yandex that is known for its ability to handle categorical features without requiring explicit feature encoding. It uses a combination of ordered boosting and categorical feature handling techniques to achieve high model performance. 

> Mathematics of CatBoost

The update rule for CatBoost can be expressed as follows:

F_t(x) = F_{t-1}(x) + eta * h_t(x),

where F_{t-1}(x) is the predicted values of the model at the previous iteration, eta is the learning rate, and h_t(x) is the base learner added at the current iteration.

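The base learners in CatBoost are decision trees, by default symmetric (oblivious) trees in which every node at a given depth uses the same split. They are trained with the ordered boosting technique: CatBoost draws random permutations of the training examples and, for each example, computes the gradient using a model fit only on the examples that precede it in the permutation, which reduces the target leakage that biases standard gradient boosting. Categorical features are handled in the same spirit with ordered target statistics: each category value is replaced by a statistic of the target computed only from the preceding examples, so no explicit encoding is required.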
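The idea of an ordered target statistic can be sketched as follows. Each example is encoded using only the targets of examples that come earlier in a random permutation, smoothed by a prior. The function name and the `prior` and `prior_weight` parameters are assumptions made for this illustration and are not CatBoost's API.

```python
import numpy as np


def ordered_target_statistic(categories, targets, prior=0.5,
                             prior_weight=1.0, seed=0):
    """Encode each category value from the targets of preceding examples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(categories))
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for i in order:
        c = categories[i]
        s = sums.get(c, 0.0)
        n = counts.get(c, 0)
        # Only examples already visited contribute, so the example's own
        # target never leaks into its encoding.
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded, order


categories = ["red", "blue", "red", "red", "blue", "green"]
targets = np.array([1, 0, 1, 0, 1, 1])
encoded, order = ordered_target_statistic(categories, targets)
print("Visit order:", order)
print("Encoded values:", encoded)
```

The first example visited for each category receives exactly the prior, because no earlier targets are available for it.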



In [2]:
# Import libraries
import lightgbm as lgb
import xgboost as xgb
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [3]:
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)


In [4]:
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [5]:
# LightGBM
lgb_model = lgb.LGBMClassifier()
lgb_model.fit(X_train, y_train)
y_pred_lgb = lgb_model.predict(X_test)

In [6]:
accuracy_lgb = accuracy_score(y_test, y_pred_lgb)
print("Accuracy (LightGBM):", accuracy_lgb)

Accuracy (LightGBM): 0.88


In [7]:
# XGBoost
xgb_model = xgb.XGBClassifier()
xgb_model.fit(X_train, y_train)

In [8]:
y_pred_xgb = xgb_model.predict(X_test)



In [9]:
accuracy_xgb = accuracy_score(y_test, y_pred_xgb)
print("Accuracy (XGBoost):", accuracy_xgb)

Accuracy (XGBoost): 0.88


In [10]:
# CatBoost
cat_model = CatBoostClassifier()
cat_model.fit(X_train, y_train)

Learning rate set to 0.009366
0:	learn: 0.6847054	total: 50.8ms	remaining: 50.8s
1:	learn: 0.6750957	total: 54.1ms	remaining: 27s
2:	learn: 0.6652065	total: 57.9ms	remaining: 19.3s
3:	learn: 0.6560905	total: 60.5ms	remaining: 15.1s
4:	learn: 0.6482287	total: 66.9ms	remaining: 13.3s
5:	learn: 0.6393761	total: 69.6ms	remaining: 11.5s
6:	learn: 0.6291191	total: 71.9ms	remaining: 10.2s
7:	learn: 0.6214537	total: 77.7ms	remaining: 9.63s
8:	learn: 0.6131741	total: 79.9ms	remaining: 8.8s
9:	learn: 0.6057145	total: 81.7ms	remaining: 8.09s
10:	learn: 0.5983343	total: 83.6ms	remaining: 7.52s
11:	learn: 0.5904946	total: 85.4ms	remaining: 7.03s
12:	learn: 0.5831570	total: 87.3ms	remaining: 6.63s
13:	learn: 0.5775119	total: 89ms	remaining: 6.27s
14:	learn: 0.5707378	total: 91.1ms	remaining: 5.98s
15:	learn: 0.5625348	total: 92.9ms	remaining: 5.71s
16:	learn: 0.5549398	total: 94.8ms	remaining: 5.48s
17:	learn: 0.5483592	total: 96.7ms	remaining: 5.28s
18:	learn: 0.5415257	total: 98.7ms	remaining: 5.0

<catboost.core.CatBoostClassifier at 0x7f6e1f7aad60>

In [11]:
y_pred_cat = cat_model.predict(X_test)

In [12]:
accuracy_cat = accuracy_score(y_test, y_pred_cat)
print("Accuracy (CatBoost):", accuracy_cat)

Accuracy (CatBoost): 0.875
