# Gradient Boosting Machines (GBMs)

Gradient Boosting Machines (GBMs) are an ensemble learning technique for both classification and regression tasks. They build models sequentially, with each new model trained to correct the errors of the ensemble built so far. Training amounts to gradient descent on a chosen loss function, carried out on the model's predictions rather than on a fixed set of parameters.

In this notebook, we cover the intuition behind GBMs, their advantages and disadvantages, and common use cases, and then demonstrate an implementation in Python using `scikit-learn`'s `GradientBoostingClassifier`.

---

## Table of Contents

1. [What is Gradient Boosting?](#1-what-is-gradient-boosting)
2. [How Gradient Boosting Works](#2-how-gradient-boosting-works)
3. [Advantages and Disadvantages of Gradient Boosting](#3-advantages-and-disadvantages-of-gradient-boosting)
4. [Use Cases of Gradient Boosting Machines](#4-use-cases-of-gradient-boosting-machines)
5. [Implementing Gradient Boosting in Python](#5-implementing-gradient-boosting-in-python)
6. [Evaluating the Gradient Boosting Model](#6-evaluating-the-gradient-boosting-model)

---

## 1. What is Gradient Boosting?

Gradient Boosting is an ensemble technique that creates a strong predictive model by combining the outputs of many weaker models, typically decision trees. The models are built sequentially, with each new model focusing on correcting the errors made by the previous models.

### Key Concepts:
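- **Boosting**: An iterative process that adds weak learners one at a time, each trained to reduce the errors left by the current ensemble (by reweighting examples, as in AdaBoost, or by fitting residuals, as in gradient boosting).
- **Gradient Descent**: An optimization algorithm that minimizes a loss function by repeatedly stepping in the direction of its negative gradient. In gradient boosting, each step is a new weak learner added to the ensemble rather than an update to a fixed parameter vector.

Each new model is fit to the residual errors of the ensemble built so far (for losses other than squared error, to the negative gradient of the loss, often called pseudo-residuals). Summing the predictions of the individual models then reduces the chosen loss function step by step.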
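
In symbols, this can be sketched as an additive model. The notation ($F_m$, $h_m$, $\nu$) is introduced here for illustration and does not come from the text above; some formulations also include a separate step size per tree, which this simplified form omits.

$$
F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
$$

Here $F_0$ is the initial constant prediction, $h_m$ is the weak learner added in round $m$, and $\nu \in (0, 1]$ is the learning rate that shrinks each learner's contribution.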

---


## 2. How Gradient Boosting Works

1. **Initialization**: Start with a constant prediction for every training example. Typically, this is the mean of the targets for regression or the class prior probabilities (expressed as log-odds) for classification.
2. **Build Weak Learner (Tree)**: Fit a weak learner, typically a shallow decision tree, to the residual errors or, more generally, to the negative gradients of the loss.
3. **Update Prediction**: Add the weak learner’s predictions, scaled by the learning rate, to the ensemble’s current predictions.
4. **Iterate**: Repeat steps 2 and 3 until a stopping criterion is met (for example, no further improvement on a validation set) or the maximum number of weak learners (trees) is reached.
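
The four steps above can be sketched from scratch for regression with squared-error loss. This is a minimal illustration and not how `scikit-learn` implements boosting internally; the synthetic data, the variable names, and the `boosted_predict` helper are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

n_trees = 50          # maximum number of weak learners
learning_rate = 0.1   # shrinkage applied to each tree's contribution

# Step 1: initialize with the mean, the constant that minimizes squared error.
init = y.mean()
pred = np.full_like(y, init)
trees = []
mse_history = [np.mean((y - pred) ** 2)]

for _ in range(n_trees):
    # Step 2: fit a shallow tree to the current residuals, which are the
    # negative gradient of 0.5 * (y - pred)**2 with respect to pred.
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    trees.append(tree)

    # Step 3: add the shrunken tree output to the running prediction.
    pred += learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - pred) ** 2))


def boosted_predict(X_new):
    """Sum the initial constant and all shrunken tree outputs."""
    out = np.full(X_new.shape[0], init)
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out


print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

Step 4 is the `for` loop itself, which here simply runs for a fixed number of rounds. A smaller `learning_rate` usually needs more trees to reach the same training error, which is the trade-off behind the `learning_rate` and `n_estimators` hyperparameters.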

### Loss Function and Gradient Descent
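In each iteration, the gradient of the loss function with respect to the model’s current predictions is calculated. The new tree is fit to the negative of this gradient, so adding it moves the predictions in the direction that reduces the loss. For squared-error loss, the negative gradient is simply the residual.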
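
To make the role of the gradient concrete, the sketch below computes the negative gradient for binary log loss and checks it against a finite-difference estimate. The choice of log loss, the toy labels and scores, and the helper name `log_loss` are assumptions for illustration only.

```python
import numpy as np


def log_loss(y, raw):
    """Per-example binary log loss for labels in {0, 1} and raw scores (log-odds)."""
    p = 1.0 / (1.0 + np.exp(-raw))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


y = np.array([1.0, 0.0, 1.0, 0.0])
raw = np.array([0.3, -1.2, -0.5, 2.0])   # current model output per example

# Analytic negative gradient (the "pseudo-residual"): y - sigmoid(raw).
p = 1.0 / (1.0 + np.exp(-raw))
pseudo_residuals = y - p

# Central finite-difference estimate of -dL/d(raw) as a sanity check.
eps = 1e-6
numeric = -(log_loss(y, raw + eps) - log_loss(y, raw - eps)) / (2 * eps)

print(np.round(pseudo_residuals, 4))
print(np.allclose(pseudo_residuals, numeric, atol=1e-6))
```

The pseudo-residual is positive where the model under-predicts a positive example and negative where it over-predicts a negative one, so a tree fit to these values pushes the raw scores in the corrective direction.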

---

## 3. Advantages and Disadvantages of Gradient Boosting

### Advantages:
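- **High Accuracy**: GBMs are often among the strongest performers on structured (tabular) data for both classification and regression tasks.
- **Handles Missing Data**: Several tree-based implementations (for example `scikit-learn`'s `HistGradientBoostingClassifier`, XGBoost, and LightGBM) handle missing values natively. Other estimators, including `GradientBoostingClassifier`, may require imputing missing values first; check the documentation for your version.
- **Versatility**: Because the loss function can be swapped, the same procedure applies to both regression and classification tasks.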
- **Feature Importance**: Provides a measure of feature importance, which is helpful in understanding the model.

### Disadvantages:
- **Prone to Overfitting**: If not properly tuned, GBMs can overfit, especially when too many trees are added.
- **Computationally Intensive**: Because trees are built one after another, training is hard to parallelize across trees. It can be slow and memory-hungry compared with simpler models.
- **Parameter Sensitivity**: GBMs have several hyperparameters (learning rate, number of trees, max depth, etc.) that must be carefully tuned for optimal performance.
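
One way to see the effect of the number of trees is to track accuracy on held-out data as trees are added. The sketch below uses `staged_predict` for that; the split, the choice of 200 trees, and the variable names are assumptions for illustration, and on a dataset as small as Iris the curve may be nearly flat.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gbc = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=42
)
gbc.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees.
val_acc = [accuracy_score(y_val, y_stage) for y_stage in gbc.staged_predict(X_val)]

# Smallest number of trees that reaches the best validation accuracy.
best_n = val_acc.index(max(val_acc)) + 1
print(f"best validation accuracy {max(val_acc):.2f} reached with {best_n} trees")
```

If accuracy peaks early and then declines, the extra trees are fitting noise. `GradientBoostingClassifier` also accepts an `n_iter_no_change` parameter that stops training automatically when an internal validation score stops improving.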

---

## 4. Use Cases of Gradient Boosting Machines

GBMs are used in a wide range of applications:
- **Financial Fraud Detection**: Flagging fraudulent transactions.
- **Risk Modeling**: Predicting risk in insurance and finance.
- **Marketing Analytics**: Predicting customer behavior and supporting customer segmentation.
- **Healthcare**: Predicting patient outcomes and supporting personalized medicine.

---

## 5. Implementing Gradient Boosting in Python

Let’s implement Gradient Boosting using the `GradientBoostingClassifier` from `scikit-learn` on the famous **Iris dataset**.


```python
# Imports for data handling, modeling, and evaluation
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
```

```python
# Load the Iris data and hold out 20% of the samples for testing
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
# Train 100 depth-3 trees with a learning rate of 0.1, then predict on the test set
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbc.fit(X_train, y_train)
y_pred = gbc.predict(X_test)
```
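
Once a model is fitted, the feature importances mentioned under Advantages can be read from its `feature_importances_` attribute. The sketch below is self-contained and repeats the same split and hyperparameters as above; the sorting and print formatting are only one possible presentation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

gbc = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gbc.fit(X_train, y_train)

# Impurity-based importances, normalized to sum to 1.
importances = gbc.feature_importances_
for name, score in sorted(
    zip(iris.feature_names, importances), key=lambda pair: pair[1], reverse=True
):
    print(f"{name:<20} {score:.3f}")
```

These scores reflect how much each feature reduced impurity across all trees during training, so they describe the fitted model rather than any causal relationship in the data.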

---

## 6. Evaluating the Gradient Boosting Model

```python
# Overall accuracy on the held-out test set
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of Gradient Boosting model: {accuracy:.2f}')

# Rows are true classes, columns are predicted classes
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Per-class precision, recall, and F1 score
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)
```


```text
Accuracy of Gradient Boosting model: 1.00
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```
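
A perfect score on a 30-sample test set is encouraging but is a noisy estimate of how the model generalizes. As a complementary check, the sketch below runs 5-fold stratified cross-validation with the same hyperparameters. The fold count and shuffling seed are arbitrary choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
gbc = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)

# 5-fold stratified cross-validation: every sample is used for testing once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(gbc, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread across folds gives a rough sense of how much the accuracy depends on which samples happen to land in the test set.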

