## Gradient Boosting

Gradient boosting is a machine learning technique for both regression and classification problems. It is an ensemble learning method that combines the predictions of many base models (typically shallow decision trees) into a stronger predictive model. The ensemble is built sequentially: each new weak learner is trained to correct the errors of the ensemble built so far. The final model is a weighted sum of these weak learners, with each learner's contribution typically scaled by a learning rate.
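One common way to write this weighted sum is shown here. The symbols $F_0$, $h_m$, $\nu$, and $M$ are notation introduced for this illustration and do not come from the text above.

$$
F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
$$

Here $F_0$ is the initial model, $h_m$ is the weak learner added at round $m$, $\nu \in (0, 1]$ is the learning rate, and $M$ is the number of boosting rounds.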

Here's a step-by-step explanation of how gradient boosting works:

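1. **Initialize the Model**: Gradient boosting starts with a simple initial model. This is typically a constant prediction, such as the mean of the target for squared-error regression or the log-odds of the positive class for binary classification. It is often referred to as the "base model" or the "first learner."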

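2. **Calculate Residuals**: The next step is to calculate the residuals, which are the differences between the actual target values and the predictions made by the current model. These residuals represent the errors made by the model and will be used to fit the next model. More generally, gradient boosting uses "pseudo-residuals": the negative gradient of the loss function with respect to the current predictions. For squared-error loss these reduce to the ordinary residuals, which is where the method gets its name.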

3. **Fit a Weak Learner**: A new model (usually a decision tree) is fitted to the residuals from the previous step. The goal of this new model is to capture the patterns or relationships that the current ensemble has failed to capture. The new model is typically a weak learner: a simple model, such as a tree of limited depth, which helps prevent overfitting.

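4. **Update the Model**: The new model is then added to the ensemble with a weighted contribution. In gradient boosting this weight is usually a fixed learning rate (also called shrinkage) applied to every learner, sometimes combined with a step size chosen to minimize the loss. A smaller learning rate makes each correction more conservative and usually requires more iterations to reach the same training error.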

5. **Repeat Steps 2-4**: Steps 2-4 are repeated iteratively. At each step, a new weak learner is fitted to the errors of the current ensemble and added to it. This process continues for a predefined number of iterations or until a stopping criterion is met, such as the validation performance no longer improving.

6. **Final Prediction**: The final prediction is made by summing the initial prediction and the weighted predictions of all the weak learners. For regression tasks this sum is the continuous prediction. For classification tasks it is a raw score that is converted to class probabilities (for example, with a sigmoid or softmax function), from which final class labels can be derived.
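The steps above can be sketched in plain Python for the squared-error case. This is a minimal illustration, not how XGBoost or any other library is implemented. It assumes a single numeric feature, a constant initial model (the target mean), one-split trees ("stumps") as weak learners, and a fixed learning rate as every learner's weight. All function names are made up for this example.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree on a single feature by least squares."""
    best = None
    values = sorted(set(xs))
    # Try a split halfway between each pair of neighbouring feature values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return (initial_prediction, learning_rate, stumps)."""
    # Step 1: initialize with a constant prediction (assumed: the target mean).
    initial = sum(ys) / len(ys)
    predictions = [initial] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Step 2: residuals, i.e. the negative gradient of 0.5 * (y - f)^2.
        residuals = [y - p for y, p in zip(ys, predictions)]
        # Step 3: fit a weak learner to the residuals.
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Step 4: add the learner's correction, scaled by the learning rate.
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return initial, learning_rate, stumps


def predict(model, x):
    """Step 6: initial prediction plus the sum of all scaled corrections."""
    initial, learning_rate, stumps = model
    return initial + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy data (made up for this sketch): y = x^2 on a small grid.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

model = fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
print(f"MSE of the constant model: {baseline_mse:.4f}")
print(f"MSE after 50 rounds:       {boosted_mse:.4f}")
```

Each round can only lower the training error in this setting, so the second number is smaller than the first. Whether the model also generalizes depends on the learning rate and the number of rounds.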

Gradient boosting offers several advantages:

- **High Predictive Power**: Gradient boosting often produces highly accurate models, particularly on tabular data, and is a frequent component of winning solutions in machine learning competitions.

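- **Robustness to Overfitting**: Shallow weak learners, a small learning rate, and a limited number of rounds all act as regularization, so a well-tuned gradient boosting model tends to be less prone to overfitting than a single complex model. It can still overfit if too many rounds are added, which is why early stopping on a validation set is commonly used.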

- **Flexibility**: It can be used with various loss functions and weak learner types, making it adaptable to a wide range of problems.
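The flexibility with respect to loss functions comes from the fact that only the target fitted by each weak learner changes: it is the negative gradient of the loss with respect to the current prediction. As a rough illustration, here are those negative gradients for three common losses. The function names are hypothetical, and the log loss is assumed to be parameterized by a raw score that is passed through a sigmoid.

```python
import math


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def squared_error_pseudo_residual(y, score):
    # Loss: 0.5 * (y - score)^2  ->  negative gradient: y - score
    return y - score


def absolute_error_pseudo_residual(y, score):
    # Loss: |y - score|  ->  negative gradient: sign(y - score)
    diff = y - score
    return (diff > 0) - (diff < 0)


def log_loss_pseudo_residual(y, score):
    # Loss: binary log loss with y in {0, 1} and p = sigmoid(score)
    # ->  negative gradient with respect to the raw score: y - p
    return y - sigmoid(score)


# Regression example: target 3.0, current prediction 1.0.
print(squared_error_pseudo_residual(3.0, 1.0))   # 2.0
print(absolute_error_pseudo_residual(3.0, 1.0))  # 1

# Classification example: true label 1, current raw score 0.0 (p = 0.5).
print(log_loss_pseudo_residual(1, 0.0))          # 0.5
```

The absolute-error version passes only the direction of each error to the next learner, which is one reason that loss is less sensitive to outliers than squared error.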

However, it also has some limitations, such as being computationally intensive and potentially requiring careful hyperparameter tuning. Popular implementations of gradient boosting include XGBoost, LightGBM, and CatBoost, which have been optimized for performance and scalability.

In [1]:
import xgboost as xgb
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
iris = load_iris()
X, y = iris.data, iris.target

In [3]:
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [4]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [5]:
# We'll use only two classes for binary classification
X = X[y != 2]
y = y[y != 2]

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [8]:
# Creating an XGBoost DMatrix for efficient data handling
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

In [9]:
# Setting hyperparameters for the XGBoost model
params = {
    'max_depth': 3,  # Maximum depth of the trees
    'eta': 0.1,      # Learning rate
    'objective': 'binary:logistic',
    'eval_metric': 'logloss'
}

In [10]:
# Training the XGBoost model
num_round = 100  # Number of boosting rounds (iterations)
model = xgb.train(params, dtrain, num_round)

In [11]:
# With the 'binary:logistic' objective, predict() returns the probability of class 1
y_pred = model.predict(dtest)

In [12]:
# Convert the predicted probabilities to class labels using a 0.5 threshold
y_pred_binary = [1 if pred > 0.5 else 0 for pred in y_pred]
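The thresholding step can be looked at in isolation. With a logistic objective, each probability is the sigmoid of the summed tree outputs (the raw score), so a 0.5 threshold on the probability is the same as a threshold of 0 on the raw score. The raw scores and the helper name `to_labels` below are hypothetical values chosen for this sketch, not output from the model above.

```python
import math


def sigmoid(raw_score):
    return 1.0 / (1.0 + math.exp(-raw_score))


def to_labels(probabilities, threshold=0.5):
    """Map class-1 probabilities to 0/1 labels."""
    return [1 if p > threshold else 0 for p in probabilities]


raw_scores = [-2.0, -0.1, 0.3, 4.0]  # hypothetical summed tree outputs
probabilities = [sigmoid(s) for s in raw_scores]

print(to_labels(probabilities))                 # [0, 0, 1, 1]
print(to_labels(probabilities, threshold=0.7))  # [0, 0, 0, 1]
```

Raising the threshold trades recall on class 1 for precision. A threshold of 0.5 is a common default, not a requirement.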


In [13]:
# Calculating accuracy on the test set
accuracy = accuracy_score(y_test, y_pred_binary)
print(f"Accuracy: {accuracy}")

Accuracy: 1.0


In [14]:
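# 'weight' counts how many times each feature is used in a split;
# features that are never used in any split are omitted from the result
importance = model.get_score(importance_type='weight')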
print("Feature Importance:")
for feature, score in importance.items():
    print(f"{feature}: {score}")

Feature Importance:
f2: 40.0
