***Gradient Boosting Classifier*** 

Gradient boosting is a powerful machine learning algorithm. It builds an ensemble of weak learners, typically shallow decision trees, sequentially: each new tree attempts to correct the errors made by the trees before it. The final model is the sum of all the individual trees, each scaled by the learning rate, which improves accuracy. Keeping the trees shallow and the learning rate small helps limit overfitting.

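Why "gradient"? Instead of explicitly computing the residuals, the algorithm uses the negative gradient of the loss function with respect to the current model's predictions as a proxy for them (often called pseudo-residuals). For squared-error loss the negative gradient equals the ordinary residual up to a constant factor, so the two views coincide. For other losses, such as log loss in classification, the gradient formulation is what lets the same procedure be reused, because any differentiable loss can be plugged in.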
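
To make the pseudo-residual idea concrete, here is a minimal from-scratch sketch for binary classification with log loss. It is a simplified illustration, not how scikit-learn implements it internally: it fits each tree to the negative gradient `y - p` and skips the per-leaf value refinement that production implementations add. The helper names (`fit_boosted_trees`, `predict_proba`) and the synthetic dataset are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Toy binary gradient boosting on log loss (labels must be 0 or 1)."""
    # Start from a constant prediction: the log-odds of the positive class.
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    init_score = np.log(p0 / (1 - p0))
    raw = np.full(len(y), init_score)
    trees = []
    for _ in range(n_rounds):
        # Negative gradient of log loss w.r.t. the raw score is y - p.
        pseudo_residuals = y - sigmoid(raw)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, pseudo_residuals)
        # Take a small step in the direction the tree suggests.
        raw += learning_rate * tree.predict(X)
        trees.append(tree)
    return init_score, trees


def predict_proba(X, init_score, trees, learning_rate=0.1):
    """Sum the initial score and all scaled tree outputs, then squash."""
    raw = np.full(X.shape[0], init_score)
    for tree in trees:
        raw += learning_rate * tree.predict(X)
    return sigmoid(raw)


# Synthetic data, used only for illustration.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
init_score, trees = fit_boosted_trees(X_demo, y_demo)
proba = predict_proba(X_demo, init_score, trees)
train_accuracy = np.mean((proba > 0.5) == y_demo)
print("Training accuracy:", train_accuracy)
```

Note that the trees are regressors even though the task is classification: each one predicts a correction to the raw score, and the sigmoid turns the accumulated score into a probability.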


In [None]:
# Gradient boosting classifier using scikit-learn

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split

# Load data (binary classification)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model
gbc = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)

# Train
gbc.fit(X_train, y_train)
preds = gbc.predict(X_test)        # hard class labels
proba = gbc.predict_proba(X_test)  # class probabilities, needed for log loss

# Evaluate
print("Accuracy:", accuracy_score(y_test, preds))
print("Log loss:", log_loss(y_test, proba))

***Key Hyperparameters***

| Parameter       | Meaning                           | Typical Range                     |
| --------------- | --------------------------------- | --------------------------------- |
| `n_estimators`  | Number of boosting stages         | 100–1000                          |
| `learning_rate` | Shrinks contribution of each tree | 0.01–0.1                          |
| `max_depth`     | Tree depth                        | 3–5                               |
| `subsample`     | Fraction of samples used per tree | 0.5–1.0                           |
| `loss`          | Objective function                | `log_loss` (classification), `squared_error` (regression), etc. |
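
Since `n_estimators` and `learning_rate` trade off against each other, one way to pick the number of stages is to track validation loss stage by stage. The sketch below does this with `staged_predict_proba`. The synthetic dataset, the split, and the specific hyperparameter values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration.
X_syn, y_syn = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0
)

clf = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8,  # each tree sees a random 80% of the training rows
    random_state=0,
)
clf.fit(X_tr, y_tr)

# staged_predict_proba yields the predictions after 1, 2, ..., n_estimators trees.
val_losses = [log_loss(y_val, p) for p in clf.staged_predict_proba(X_val)]
best_n = int(np.argmin(val_losses)) + 1

print("Best number of stages:", best_n)
print("Validation log loss at that point:", val_losses[best_n - 1])
```

If the best stage is well below `n_estimators`, the later trees are likely fitting noise. A smaller `learning_rate` usually pushes the best stage later.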


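Log loss measures how well a classification model predicts the probability of class membership, with lower values indicating better performance. For each sample it is the negative natural logarithm of the probability the model assigned to the true class, so confident wrong predictions are penalized heavily.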

| True Label | Predicted P(class 1) | Log Loss (per sample) |
|-----------:|---------------------:|----------------------:|
| 1          | 0.9                  | 0.105 (small)         |
| 0          | 0.1                  | 0.105 (small)         |
| 1          | 0.8                  | 0.223 (medium)        |
| 1          | 0.4                  | 0.916 (large)         |
| **Mean**   |                      | **0.338**             |
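
The table values can be reproduced with a few lines of NumPy. This sketch assumes the predicted probability column is the model's probability for class 1, which is the reading under which the second row (label 0, probability 0.1) gives 0.105. The variable names are illustrative.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1])
p_class1 = np.array([0.9, 0.1, 0.8, 0.4])  # predicted probability of class 1

# Binary log loss per sample: -[y*log(p) + (1 - y)*log(1 - p)]
per_sample = -(y_true * np.log(p_class1) + (1 - y_true) * np.log(1 - p_class1))
mean_loss = per_sample.mean()

print(np.round(per_sample, 3))     # per-sample values: 0.105, 0.105, 0.223, 0.916
print(round(float(mean_loss), 3))  # 0.338
```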