# ⚡️ AdaBoost Regression

## 📖 What Is AdaBoost?

**AdaBoost (Adaptive Boosting)** is an ensemble method that combines multiple **weak learners** (typically shallow decision trees). It trains them one after another, reweights the training examples after each round, and weights each learner's contribution by how well it performed on the training data.

Unlike Gradient Boosting, which fits each new learner to the residuals of the current ensemble (more generally, the negative gradient of the loss), **AdaBoost adjusts the weights of the training data points**: harder-to-predict points receive more focus in the next iteration.

---

### 🔍 Key Ideas

- Weak learners are trained sequentially.
- After each round, examples with higher errors are given **more weight**.
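- The final prediction combines all weak learners, each weighted by its accuracy. scikit-learn's `AdaBoostRegressor` (AdaBoost.R2) takes the **weighted median** of their predictions rather than a weighted sum.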
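
The loop described above can be sketched in a few lines. This is a minimal illustration loosely following AdaBoost.R2 with a linear loss, not scikit-learn's actual implementation: the function names (`fit_adaboost_r2`, `predict_adaboost_r2`, `weighted_median`) and the `_demo` variables are hypothetical, and the sketch passes the weights to each tree through `sample_weight`, whereas scikit-learn resamples the training set according to the weights. Results will therefore not match `AdaBoostRegressor` exactly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_adaboost_r2(X, y, n_rounds=50, learning_rate=1.0, max_depth=3):
    """Sketch of AdaBoost.R2 (linear loss) using sample_weight instead of resampling."""
    n = len(y)
    sample_w = np.full(n, 1.0 / n)          # start with uniform example weights
    learners, learner_w = [], []
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, y, sample_weight=sample_w)
        abs_err = np.abs(tree.predict(X) - y)
        if abs_err.max() == 0:              # perfect fit: keep it and stop
            learners.append(tree)
            learner_w.append(1.0)
            break
        loss = abs_err / abs_err.max()      # per-example loss scaled to [0, 1]
        avg_loss = np.sum(sample_w * loss)  # weighted average loss of this round
        if avg_loss >= 0.5:                 # learner too weak: discard and stop
            break
        beta = avg_loss / (1.0 - avg_loss)  # small beta means an accurate learner
        learners.append(tree)
        learner_w.append(learning_rate * np.log(1.0 / beta))
        # Well-predicted examples (loss near 0) are multiplied by roughly beta < 1,
        # so after renormalising, poorly predicted examples carry more weight.
        sample_w *= beta ** ((1.0 - loss) * learning_rate)
        sample_w /= sample_w.sum()
    return learners, np.array(learner_w)


def weighted_median(preds, est_weights):
    """Column-wise weighted median; preds has shape (n_learners, n_samples)."""
    order = np.argsort(preds, axis=0)
    sorted_preds = np.take_along_axis(preds, order, axis=0)
    cum_w = np.cumsum(est_weights[order], axis=0)
    idx = (cum_w >= 0.5 * cum_w[-1]).argmax(axis=0)  # first learner past half the weight
    return sorted_preds[idx, np.arange(preds.shape[1])]


def predict_adaboost_r2(learners, learner_w, X):
    preds = np.array([tree.predict(X) for tree in learners])
    return weighted_median(preds, learner_w)


# Small usage example on noisy sine data (hypothetical demo variables)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(100, 1))
y_demo = np.sin(2 * np.pi * X_demo).ravel() + rng.normal(0, 0.2, size=100)
learners_demo, weights_demo = fit_adaboost_r2(X_demo, y_demo, n_rounds=30)
pred_demo = predict_adaboost_r2(learners_demo, weights_demo, X_demo)
print(f"learners kept: {len(learners_demo)}, train MSE: {np.mean((pred_demo - y_demo) ** 2):.4f}")
```

The weighted median is what makes the regressor robust to a few badly behaved rounds: a single learner with an extreme prediction cannot drag the output the way it could in a weighted sum.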

---

### ⚙️ Key Parameters (in `AdaBoostRegressor`)

| Parameter | Description |
|-----------|-------------|
| `n_estimators` | Number of boosting rounds |
| `learning_rate` | Shrinks contribution of each regressor |
| `estimator` | Base learner, typically a shallow `DecisionTreeRegressor` (named `base_estimator` before scikit-learn 1.2) |

---

### 🧪 Simulated Example

We fit:

\[
y = \sin(2\pi x) + \varepsilon
\]

and plot the AdaBoost fit against the noisy samples.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Simulated data
np.random.seed(0)
X = np.sort(np.random.rand(100, 1), axis=0)
y = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.2, size=100)
X_plot = np.linspace(0, 1, 500).reshape(-1, 1)

# Fit AdaBoost
ada = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=4),
    n_estimators=100,
    learning_rate=0.5,
    random_state=0
)
ada.fit(X, y)

# Predict
y_pred = ada.predict(X_plot)

# Plot
plt.figure(figsize=(8, 5))
plt.scatter(X, y, s=20, label="Data")
plt.plot(X_plot, y_pred, color="red", label="AdaBoost Prediction")
plt.title("AdaBoost Regression on Simulated Data")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.tight_layout()
plt.show()

## 🏠 Real Data: California Housing

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Load data
data = fetch_california_housing()
X_real = data.data
y_real = data.target
feature_names = data.feature_names

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X_real, y_real, test_size=0.2, random_state=1)

# Fit AdaBoost
ada_real = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=4),
    n_estimators=200,
    learning_rate=0.5,
    random_state=1
)
ada_real.fit(X_train, y_train)
y_pred_real = ada_real.predict(X_test)

# Evaluation
print(f"R^2: {r2_score(y_test, y_pred_real):.4f}")
print(f"MSE: {mean_squared_error(y_test, y_pred_real):.4f}")

# Plot predicted vs. true
plt.figure(figsize=(7, 5))
plt.scatter(y_test, y_pred_real, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel("True Value")
plt.ylabel("Predicted Value")
plt.title("AdaBoost Regression on California Housing")
plt.tight_layout()
plt.show()

In [None]:
# Feature importance
importances = ada_real.feature_importances_
importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': importances})
importance_df = importance_df.sort_values(by='Importance', ascending=False)

plt.figure(figsize=(8, 5))
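# Assign hue explicitly: seaborn >= 0.13 deprecates `palette` without `hue`
ax = sns.barplot(data=importance_df, x='Importance', y='Feature',
                 hue='Feature', palette='viridis', dodge=False)
if ax.get_legend() is not None:  # some seaborn versions draw a redundant legend
    ax.get_legend().remove()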
plt.title("Feature Importance from AdaBoost")
plt.tight_layout()
plt.show()