# Day 6 — Practical Gradient Boosting Tuning
### Machine Learning Roadmap — Week 4
### Author — N Manish Kumar
---

Gradient Boosting is a very powerful model, but it is also sensitive to
hyperparameters.

Unlike Random Forest, boosting models can easily overfit if not tuned
carefully.

Key parameters that control boosting behavior:

- n_estimators → number of boosting stages
- learning_rate → contribution of each tree
- max_depth → complexity of individual trees
- min_samples_split → minimum number of samples a node needs before it can be split (regularization)

In this notebook we will:
- Train a baseline Gradient Boosting model
- Tune its hyperparameters using cross-validation
- Study the interaction between learning rate and number of trees
- Compare tuned vs default boosting performance

Dataset used: **Breast Cancer Dataset (sklearn)**

---

## 1. Dataset Loading and Train/Test Split

In [1]:
import pandas as pd
import numpy as np

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)

Training set shape: (455, 30)
Test set shape: (114, 30)


---
## 2. Default Gradient Boosting Baseline

Before tuning hyperparameters, we first train a Gradient Boosting model with
default settings.

This baseline performance will help us measure whether tuning actually leads
to better generalization on unseen data.
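Here, "default settings" means the constructor defaults of GradientBoostingClassifier. The standard scikit-learn estimator method get_params() shows them for the four parameters listed at the top of this notebook. The names key_params and default_values are illustrative.

```python
from sklearn.ensemble import GradientBoostingClassifier

# The four hyperparameters discussed in this notebook
key_params = ["n_estimators", "learning_rate", "max_depth", "min_samples_split"]

# get_params() returns every constructor argument with its current (here: default) value
defaults = GradientBoostingClassifier().get_params()
default_values = {name: defaults[name] for name in key_params}

print(default_values)
# {'n_estimators': 100, 'learning_rate': 0.1, 'max_depth': 3, 'min_samples_split': 2}
```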

In [2]:
# Train the default Gradient Boosting model
gb_default = GradientBoostingClassifier(random_state=42)
gb_default.fit(X_train, y_train)

# Evaluate baseline performance
train_acc_default = accuracy_score(y_train, gb_default.predict(X_train))
test_acc_default = accuracy_score(y_test, gb_default.predict(X_test))

print("Default GB -> Train Accuracy:", train_acc_default)
print("Default GB -> Test Accuracy:", test_acc_default)

Default GB -> Train Accuracy: 1.0
Default GB -> Test Accuracy: 0.956140350877193


### Interpretation

The baseline Gradient Boosting model fits the training set perfectly (training
accuracy 1.0) but reaches 0.956 on the test set. This gap of about 4.4
percentage points indicates some overfitting, even though test accuracy is
still high.

This baseline result will serve as a reference point to evaluate whether
hyperparameter tuning improves generalization.
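To make the gap concrete, a small sketch can express it both in percentage points and as a raw count of test errors. It recreates the Section 1 split with load_breast_cancer(return_X_y=True) so that it runs on its own; the variable names gap_pp and n_test_errors are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Same data and split as in Section 1
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

gb_default = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

train_acc = accuracy_score(y_train, gb_default.predict(X_train))
test_acc = accuracy_score(y_test, gb_default.predict(X_test))

# Generalization gap in percentage points, and the raw number of test errors
gap_pp = 100 * (train_acc - test_acc)
n_test_errors = int(np.sum(gb_default.predict(X_test) != y_test))

print(f"Train-test gap: {gap_pp:.1f} percentage points")
print(f"Misclassified test samples: {n_test_errors} of {len(y_test)}")
```

With the accuracies reported above (1.0 and 0.956), this corresponds to a gap of about 4.4 percentage points, or 5 of the 114 test samples misclassified.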

---

## 3. Hyperparameter Grid and GridSearchCV

Gradient Boosting performance is highly sensitive to its hyperparameters.

The most important parameters are:

- n_estimators: number of boosting stages (trees)
- learning_rate: contribution of each tree
- max_depth: complexity of individual trees
- min_samples_split: regularization to avoid overly specific splits

Unlike Random Forest, boosting models can easily overfit if these parameters
are not chosen carefully.

We use GridSearchCV to test different combinations of these parameters and
select the configuration that provides the best cross-validated performance.
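The size of the search follows directly from the grid: each of the four parameters has two candidate values, so there are 2 × 2 × 2 × 2 = 16 combinations, and each one is fitted once per fold. scikit-learn's ParameterGrid enumerates the same combinations that GridSearchCV iterates over, so counting it is a cheap sanity check before launching a larger search. The grid is copied from the next cell; n_folds is an illustrative name.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV cell
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "min_samples_split": [2, 5],
}

n_candidates = len(ParameterGrid(param_grid))  # 2 * 2 * 2 * 2
n_folds = 5

print("Candidates:", n_candidates)             # 16
print("CV fits:", n_candidates * n_folds)      # 80
```

With the default refit=True, GridSearchCV also refits the best configuration once on the full training set after these 80 cross-validation fits.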


In [3]:
# Hyperparameter grid for Gradient Boosting
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "min_samples_split": [2, 5],
}
gb = GradientBoostingClassifier(random_state=42)

grid_search = GridSearchCV(
    gb,
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
    verbose=1,
)
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Best CV Accuracy:", grid_search.best_score_)

Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best Parameters: {'learning_rate': 0.1, 'max_depth': 2, 'min_samples_split': 2, 'n_estimators': 200}
Best CV Accuracy: 0.9714285714285715


### Interpretation

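GridSearchCV evaluates all 16 configurations in the grid with 5-fold
cross-validation and selects the one with the highest mean validation accuracy.

Because boosting models are sensitive to hyperparameters, selecting them by
cross-validated accuracy rather than training accuracy favors configurations
that generalize well over those that simply memorize the training data.

Compared with the library defaults (max_depth=3, n_estimators=100), the best
configuration uses shallower trees (max_depth=2) and twice as many boosting
stages (n_estimators=200); learning_rate and min_samples_split keep their
default values.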

The best parameters represent a balance between:
- model complexity
- learning speed
- regularization
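best_score_ alone hides how close the other candidates came. A sketch that lists the top-ranked rows of cv_results_ (a standard attribute of a fitted GridSearchCV) shows the mean and standard deviation of the fold accuracies for each configuration. It reruns the same grid on the same split, in a single process, so that it is self-contained; cv_table is an illustrative name.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

# Recreate the Section 1 split so the sketch runs on its own
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same grid and CV setup as the grid search above (no n_jobs, single process)
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "min_samples_split": [2, 5],
}
grid_search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
grid_search.fit(X_train, y_train)

# One row per candidate: parameters, mean/std of the 5 fold accuracies, and rank
cv_table = (
    pd.DataFrame(grid_search.cv_results_)
    [["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
)
print(cv_table.head(5).to_string(index=False))
```

The default configuration (n_estimators=100, learning_rate=0.1, max_depth=3, min_samples_split=2) is itself one of the 16 candidates, so its cross-validated accuracy also appears in this table. If its mean lies within about one standard deviation of the best row, the two configurations are hard to tell apart on this data.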
---

## 4. Comparing Tuned Gradient Boosting with Default Model

After performing hyperparameter tuning using cross-validation, we evaluate the
best model on the test set.

This comparison answers the most important question:

Did tuning actually improve generalization, or did it only improve performance
on the training data?

We compare training and test accuracy for both the default and tuned models.


In [4]:
# Best tuned model from grid search
gb_tuned = grid_search.best_estimator_

# Evaluate Best Model
train_acc_tuned = accuracy_score(y_train, gb_tuned.predict(X_train))
test_acc_tuned = accuracy_score(y_test, gb_tuned.predict(X_test))

print("Default GB -> Train:", train_acc_default, "Test:", test_acc_default)
print("Tuned GB   -> Train:", train_acc_tuned, "Test:", test_acc_tuned)

Default GB -> Train: 1.0 Test: 0.956140350877193
Tuned GB   -> Train: 1.0 Test: 0.9473684210526315


### Interpretation

If the tuned model shows higher test accuracy than the default model, it means
that hyperparameter tuning successfully improved generalization.

If training accuracy increases but test accuracy does not, the tuning process
may have overfit to the cross-validation folds.

The ideal outcome is a tuned model with:
- similar or slightly higher training accuracy
- clearly improved test accuracy


In this case, the default Gradient Boosting model achieved slightly higher
test accuracy than the tuned model (0.956 vs 0.947). On the 114-sample test
set, that difference is exactly one sample (109 vs 108 correct predictions).

This does not mean tuning failed. It simply indicates that:

- The default parameters were already very strong for this dataset
- A one-sample difference is well within the variation expected from a single small test split
- GridSearchCV optimized cross-validation accuracy, not test accuracy

In real-world practice, when the default settings perform as well as the tuned
ones, it is perfectly valid to keep the default configuration, which needs no
search at all.
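A fairer comparison than a single 114-sample test split is to score both models on the same cross-validation folds of the training set. A minimal sketch with cross_val_score, assuming the best parameters printed by the grid search above (they are hard-coded here, so rerun the grid if your scikit-learn version selects differently):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

# Recreate the Section 1 split so the sketch runs on its own
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "default": GradientBoostingClassifier(random_state=42),
    # Best parameters reported by the grid search in Section 3
    "tuned": GradientBoostingClassifier(
        learning_rate=0.1,
        max_depth=2,
        min_samples_split=2,
        n_estimators=200,
        random_state=42,
    ),
}

# cv=5 on a classifier gives the same unshuffled stratified folds GridSearchCV used
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name:>7}: mean={scores.mean():.4f}  std={scores.std():.4f}")
```

Because the folds and random_state match the grid search, the tuned model's mean should reproduce best_score_ from Section 3, and the spread across folds gives a sense of how much a one-sample test difference really means.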

---

## 5. Interaction Between Learning Rate and Number of Trees

In Gradient Boosting, two hyperparameters are strongly connected:

- learning_rate
- n_estimators

These parameters work together:

- A **smaller learning rate** requires **more trees** to reach good performance  
- A **larger learning rate** may need **fewer trees** but risks overfitting

This trade-off is one of the most important practical aspects of boosting.
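A simplified view of the boosting update explains this coupling. Write $\nu$ for learning_rate, $M$ for n_estimators, $F_0$ for the initial prediction, and $h_m$ for the tree fitted at stage $m$ (symbols introduced here for illustration; for classification, scikit-learn builds $F$ in log-odds space). Each stage adds a shrunken tree:

$$
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M
\quad\Longrightarrow\quad
F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
$$

Since every tree is scaled by $\nu$, halving the learning rate roughly halves how far each stage moves the prediction, so on the order of twice as many trees are needed to cover the same ground. This is only a heuristic, because each $h_m$ is fitted to what the earlier stages left unexplained.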

To observe this relationship, we train multiple models with different
combinations of learning rate and number of trees.


In [5]:
rates = [0.01, 0.05, 0.1, 0.5]
trees = [50, 100, 200, 300]

results = []

for lr in rates:
    for n in trees:
        model = GradientBoostingClassifier(
            learning_rate=lr,
            n_estimators=n,
            random_state=42
        )
        model.fit(X_train, y_train)

        test_acc = accuracy_score(y_test, model.predict(X_test))

        results.append({
            "Learning Rate": lr,
            "Trees": n,
            "Test Accuracy": test_acc
        })

pd.DataFrame(results)

| | Learning Rate | Trees | Test Accuracy |
|---|---|---|---|
| 0 | 0.01 | 50 | 0.938596 |
| 1 | 0.01 | 100 | 0.921053 |
| 2 | 0.01 | 200 | 0.929825 |
| 3 | 0.01 | 300 | 0.921053 |
| 4 | 0.05 | 50 | 0.929825 |
| 5 | 0.05 | 100 | 0.947368 |
| 6 | 0.05 | 200 | 0.95614 |
| 7 | 0.05 | 300 | 0.95614 |
| 8 | 0.1 | 50 | 0.947368 |
| 9 | 0.1 | 100 | 0.95614 |

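The loop above refits a separate model for every (learning rate, trees) pair. Since the first n stages of a boosting run do not depend on how many stages follow, one fit per learning rate is in principle enough: GradientBoostingClassifier.staged_predict yields the predictions after 1, 2, ..., n_estimators stages. A sketch under that assumption (checkpoints and staged_results are illustrative names):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Recreate the Section 1 split so the sketch runs on its own
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rates = [0.01, 0.05, 0.1, 0.5]
checkpoints = [50, 100, 200, 300]

rows = []
for lr in rates:
    # Fit once with the largest number of trees for this learning rate
    model = GradientBoostingClassifier(
        learning_rate=lr, n_estimators=max(checkpoints), random_state=42
    ).fit(X_train, y_train)

    # staged_predict yields the test predictions after each boosting stage
    for n_trees, y_pred in enumerate(model.staged_predict(X_test), start=1):
        if n_trees in checkpoints:
            rows.append({
                "Learning Rate": lr,
                "Trees": n_trees,
                "Test Accuracy": accuracy_score(y_test, y_pred),
            })

staged_results = pd.DataFrame(rows)
print(staged_results.to_string(index=False))
```

This needs 4 fits instead of 16, and the same loop can record accuracy at every stage to draw a full accuracy-versus-trees curve for each learning rate.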

### Interpretation

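The printed rows show the trade-off, although small differences here are
within test-set noise (one test sample is worth about 0.009 accuracy):

- With a very small learning rate (0.01), even 300 trees are not enough:
  test accuracy stays between 0.921 and 0.939 with no clear trend, and the
  differences amount to only one or two test samples.

- With a moderate rate (0.05), accuracy rises steadily from 0.930 at 50 trees
  to 0.956 at 200 and 300 trees.

- With a higher rate (0.1), the same 0.956 is reached with only 100 trees,
  half as many as at 0.05. Because each tree makes larger corrections at
  high rates, adding many more trees can lead to overfitting.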

This demonstrates that optimal boosting performance depends on balancing
learning rate and number of estimators together rather than tuning them
independently.
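One way to read the printed rows is to multiply learning rate by number of trees, a rough "total step size" under the shrinkage view of boosting. The sketch below only regroups rows 0-9 of the results table above (no new training); the column name "lr x trees" is illustrative.

```python
import pandas as pd

# Rows 0-9 exactly as printed in the results table above
rows = [
    (0.01, 50, 0.938596), (0.01, 100, 0.921053), (0.01, 200, 0.929825),
    (0.01, 300, 0.921053), (0.05, 50, 0.929825), (0.05, 100, 0.947368),
    (0.05, 200, 0.95614), (0.05, 300, 0.95614), (0.1, 50, 0.947368),
    (0.1, 100, 0.95614),
]
table = pd.DataFrame(rows, columns=["Learning Rate", "Trees", "Test Accuracy"])

# learning_rate * n_estimators, rounded to remove floating-point noise
table["lr x trees"] = (table["Learning Rate"] * table["Trees"]).round(6)

# Test accuracies observed for each total step size
by_product = table.groupby("lr x trees")["Test Accuracy"].apply(list)
print(by_product)
```

In these rows, 0.05 × 100 and 0.1 × 50 (both 5) give the same 0.947, and 0.05 × 200 and 0.1 × 100 (both 10) give the same 0.956, while the small products at 0.01 (0.5 to 3) stay below that level. With ten rows and a 114-sample test set this is a suggestive pattern, not a rule.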

---

# Notebook Summary — Week 4 Day 6

In this notebook, we focused on practical hyperparameter tuning for
Gradient Boosting models and understanding how to control overfitting.

### What was done
- Trained a default Gradient Boosting model as a baseline
- Tuned key hyperparameters using GridSearchCV
- Compared tuned and default model performance
- Analyzed the interaction between learning rate and number of trees
- Observed how boosting behavior changes with different parameter choices

### Key Learnings
- Gradient Boosting is highly sensitive to hyperparameters
- Lower learning rates need more trees; given enough trees they often generalize better, but with too few trees they can underfit
- Tuning does not always guarantee better test accuracy
- Cross-validation optimizes validation performance, not final test performance
- Understanding parameter interactions is more important than blind search

### Final Outcome
We gained a deeper understanding of how to tune boosting models effectively
and how to interpret cases where default settings may outperform tuned models.
