# Gradient Boosting: Advanced Tutorial

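**Gradient Boosting** is an ensemble technique that builds an additive model in a forward stage-wise fashion.
Each stage fits a new weak learner (typically a shallow decision tree) to the negative gradient of the loss at the current ensemble's predictions, so every new tree corrects the errors left by the trees before it. It is widely used for tabular data, including in ML competitions.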
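
The stage-wise idea can be sketched from scratch for the simplest case. The example below is a minimal illustration, not scikit-learn's implementation. It assumes squared-error loss (where the negative gradient is just the residual), one-split regression stumps on a synthetic 1-D dataset, and hypothetical helper names (`fit_stump`, `predict_stump`, `fit_boosting`). `GradientBoostingClassifier` uses a classification loss and deeper trees, but the loop has the same shape.

```python
import numpy as np


def fit_stump(x, residuals):
    """Fit a one-split regression stump to 1-D inputs by minimizing squared error."""
    best = None
    # Candidate thresholds: midpoints between consecutive sorted unique values.
    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2.0
    for t in thresholds:
        left = residuals[x <= t]
        right = residuals[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict_stump(stump, x):
    """Return the stump's constant prediction on each side of its threshold."""
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_boosting(x, y, n_stages=50, learning_rate=0.1):
    """Forward stage-wise boosting for squared-error loss (illustrative only)."""
    init = y.mean()  # stage 0: a constant prediction
    prediction = np.full_like(y, init, dtype=float)
    stumps, losses = [], []
    for _ in range(n_stages):
        residuals = y - prediction  # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(x, residuals)
        # Shrink each stump's contribution by the learning rate.
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
        losses.append(float(np.mean((y - prediction) ** 2)))
    return init, stumps, losses


# Synthetic data (an assumption for the demo, unrelated to the dataset used later).
rng = np.random.default_rng(0)
x_demo = np.sort(rng.uniform(0, 6, size=200))
y_demo = np.sin(x_demo) + rng.normal(scale=0.1, size=200)

init, stumps, losses = fit_boosting(x_demo, y_demo)
print("Training MSE after stage 1: ", round(losses[0], 4))
print("Training MSE after stage 50:", round(losses[-1], 4))
```

Each stage adds only `learning_rate` times the stump's output, which is why a smaller learning rate generally needs more stages to reach the same training error.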

## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# set_theme is the preferred seaborn interface; sns.set is an alias for it
sns.set_theme(style='whitegrid')


## 2. Load Dataset

In [None]:
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# stratify=y keeps the class balance the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

X.head()


## 3. Train Gradient Boosting Model

In [None]:
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)


## 4. Evaluate Model

In [None]:
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


## 5. Feature Importance

In [None]:
importances = model.feature_importances_
indices = np.argsort(importances)[::-1]
features = X.columns

plt.figure(figsize=(10, 6))
plt.title("Feature Importances")
sns.barplot(x=importances[indices], y=features[indices])
plt.show()
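
The `feature_importances_` values plotted above are impurity-based and computed from the training data, so they can overstate features the trees happened to split on often. As a cross-check, the sketch below uses scikit-learn's `permutation_importance` on a held-out split. It refits its own model so that it runs on its own; the variable names (`gb_pi`, `X_tr_pi`, and so on) and `n_repeats=5` are illustrative choices, not part of the tutorial's pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data_pi = load_breast_cancer()
X_tr_pi, X_te_pi, y_tr_pi, y_te_pi = train_test_split(
    data_pi.data, data_pi.target, test_size=0.3, random_state=42
)

gb_pi = GradientBoostingClassifier(random_state=42).fit(X_tr_pi, y_tr_pi)

# Shuffle one feature at a time on the held-out data and measure the drop in accuracy.
result = permutation_importance(gb_pi, X_te_pi, y_te_pi, n_repeats=5, random_state=42)

# Show the five features whose shuffling hurts accuracy the most.
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{data_pi.feature_names[i]:<25} "
          f"{result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```

In the notebook itself, the already fitted `model` together with `X_test` and `y_test` could be passed instead of refitting.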


## 6. Cross-Validation

In [None]:
cv_scores = cross_val_score(model, X, y, cv=5)
print("CV Accuracy Scores:", np.round(cv_scores, 3))
print("Mean CV Accuracy:", round(np.mean(cv_scores), 3))


## 7. Effect of Learning Rate

In [None]:
learning_rates = [0.001, 0.01, 0.1, 0.2, 0.5]
cv_means = []

for lr in learning_rates:
    gb = GradientBoostingClassifier(n_estimators=100, learning_rate=lr, random_state=42)
    score = cross_val_score(gb, X, y, cv=5).mean()
    cv_means.append(score)

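plt.plot(learning_rates, cv_means, marker='o')
# The rates span several orders of magnitude, so a log axis keeps the small values readable
plt.xscale("log")
plt.title("Effect of Learning Rate on CV Accuracy")
plt.xlabel("Learning Rate (log scale)")
plt.ylabel("CV Accuracy")
plt.show()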
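
The comparison above holds `n_estimators` fixed at 100, but the learning rate and the number of trees trade off against each other: a small rate may simply need more trees. The sketch below is one way to look at that interaction, using `staged_predict` to score a single fitted model after every boosting stage. The split, the three rates, and `n_estimators=200` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_lr, y_lr = load_breast_cancer(return_X_y=True)
X_tr_lr, X_val_lr, y_tr_lr, y_val_lr = train_test_split(
    X_lr, y_lr, test_size=0.3, random_state=42, stratify=y_lr
)

staged_accuracy = {}
for rate in [0.01, 0.1, 0.5]:
    gb_lr = GradientBoostingClassifier(n_estimators=200, learning_rate=rate, random_state=42)
    gb_lr.fit(X_tr_lr, y_tr_lr)
    # staged_predict yields the ensemble's predictions after each boosting stage.
    staged_accuracy[rate] = [
        accuracy_score(y_val_lr, pred) for pred in gb_lr.staged_predict(X_val_lr)
    ]

for rate, scores in staged_accuracy.items():
    best_stage = max(range(len(scores)), key=scores.__getitem__) + 1
    print(f"learning_rate={rate}: best validation accuracy "
          f"{max(scores):.3f} at {best_stage} trees")
```

Plotting each list in `staged_accuracy` against the stage number shows how quickly each rate converges and whether accuracy degrades with more trees.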


## 8. Summary

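- Gradient Boosting builds trees sequentially, each one fit to the errors of the current ensemble
- Highly effective for structured/tabular data
- Tune `n_estimators`, `learning_rate`, and `max_depth` together; a lower `learning_rate` usually needs more `n_estimators`
- XGBoost and LightGBM are optimized implementations of the same idea and are typically faster on large datasets, so they are worth comparing against this baseline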
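
As a starting point for the tuning step, the three hyperparameters can be searched jointly with `GridSearchCV`. This is a minimal sketch under stated assumptions: the grid values and `cv=3` are arbitrary, kept small so the search runs quickly, and the data is reloaded so the block runs on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X_gs, y_gs = load_breast_cancer(return_X_y=True)

# A deliberately small, illustrative grid; widen it for a real search.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_gs, y_gs)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```

In the notebook, fitting the search on `X_train` and `y_train` only would keep the test split untouched for the final evaluation.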