# AdaBoost
1. AdaBoost (Adaptive Boosting) is an ensemble learning algorithm that combines many weak learners (usually decision stumps, i.e. one-split decision trees) into a single strong learner.

2. Key idea: the learners are trained sequentially, and each one tries to correct the mistakes of its predecessors by giving more weight to the samples they got wrong (misclassified samples in classification, high-error samples in regression).
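
To make the reweighting idea concrete, here is a minimal sketch of one round of the classic (discrete) AdaBoost update for binary labels in {-1, +1}. The function name `reweight` and the toy labels are illustrative assumptions. This is the classification variant; the `AdaBoostRegressor` used later in this notebook implements AdaBoost.R2, which reweights samples by their regression error instead.

```python
import math

def reweight(weights, y_true, y_pred):
    """One round of the discrete AdaBoost weight update (labels in {-1, +1})."""
    missed = [t != p for t, p in zip(y_true, y_pred)]
    # Weighted error rate of the current weak learner.
    # This sketch assumes 0 < err < 0.5 (better than chance, not perfect).
    err = sum(w for w, m in zip(weights, missed) if m) / sum(weights)
    # The learner's vote strength: lower error gives a larger alpha.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Up-weight the misclassified samples, down-weight the rest, then normalise.
    scaled = [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, missed)]
    norm = sum(scaled)
    return [w / norm for w in scaled], alpha

# Toy example (hypothetical data): only the last sample is misclassified.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [0.2] * 5

new_weights, alpha = reweight(weights, y_true, y_pred)
print([round(w, 3) for w in new_weights], round(alpha, 3))
# [0.125, 0.125, 0.125, 0.125, 0.5] 0.693
```

After one round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.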

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


In [4]:
from sklearn.datasets import fetch_california_housing
# Step 1: Load the dataset
data = fetch_california_housing()

df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

In [6]:
df.head(10)

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|--------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |
| 5 | 4.0368 | 52.0 | 4.761658 | 1.103627 | 413.0 | 2.139896 | 37.85 | -122.25 | 2.697 |
| 6 | 3.6591 | 52.0 | 4.931907 | 0.951362 | 1094.0 | 2.128405 | 37.84 | -122.25 | 2.992 |
| 7 | 3.12 | 52.0 | 4.797527 | 1.061824 | 1157.0 | 1.788253 | 37.84 | -122.25 | 2.414 |
| 8 | 2.0804 | 42.0 | 4.294118 | 1.117647 | 1206.0 | 2.026891 | 37.84 | -122.26 | 2.267 |
| 9 | 3.6912 | 52.0 | 4.970588 | 0.990196 | 1551.0 | 2.172269 | 37.84 | -122.25 | 2.611 |


In [8]:
# Step 2: Separate the independent features (x) from the dependent target (y)
x = df.drop('target', axis='columns')
y = df['target']

In [20]:
# Step 3: Split the data into train (80%) and test (20%) sets
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [26]:
# Step 4: AdaBoost Regressor with base estimator
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

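# A shallow tree serves as the weak learner that AdaBoost will boost
base_estimator = DecisionTreeRegressor(max_depth=4)
# Note: the `estimator` argument was named `base_estimator` before scikit-learn 1.2
model = AdaBoostRegressor(estimator=base_estimator, random_state=42)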

In [28]:
# Step 5: Define hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.5, 1.0]
}


In [30]:
# Step 6: GridSearchCV
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 12 candidates, totalling 60 fits
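
The "12 candidates" and "60 fits" in the log above follow directly from the grid: every combination of the listed values is tried, and each combination is fitted once per fold. A minimal standard-library sketch of that bookkeeping (the variable names `candidates`, `cv_folds`, and `total_fits` are illustrative, not part of scikit-learn):

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 100, 150],
    "learning_rate": [0.01, 0.1, 0.5, 1.0],
}
cv_folds = 5

# Every combination of the listed values is one candidate configuration.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

# Each candidate is fitted once per cross-validation fold.
total_fits = len(candidates) * cv_folds
print(len(candidates), total_fits)  # 12 60
```

The count grows multiplicatively with every added hyperparameter, so it is worth checking it before widening the grid.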


In [40]:
# Step 7: Evaluation
from sklearn.metrics import mean_squared_error, r2_score
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)


In [42]:
# Step 8: Output metrics
print("Best Parameters:", grid_search.best_params_)
# Report the MSE in its natural units (no extra scaling factor)
print(f"MSE: {mean_squared_error(y_test, y_pred):.4f}")
print("R² Score:", r2_score(y_test, y_pred))

Best Parameters: {'learning_rate': 0.1, 'n_estimators': 100}
MSE: 0.4893
R² Score: 0.6265955188757737
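
For reference, the two metrics reported above can be computed by hand. The sketch below uses plain Python and a tiny made-up example (the numbers are hypothetical, not taken from the housing data). The helper names `mse` and `r2` are illustrative, and they follow the standard definitions of mean squared error and the coefficient of determination.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions, for illustration only.
y_true_demo = [3.0, 2.0, 4.0, 1.0]
y_pred_demo = [2.5, 2.0, 4.5, 1.0]

print(mse(y_true_demo, y_pred_demo), r2(y_true_demo, y_pred_demo))  # 0.125 0.9
```

Because MSE is in squared target units, its square root (RMSE) is often easier to interpret: it is expressed in the same units as the target itself.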


## 🔍 Notes:
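`estimator` is the weak learner that AdaBoost boosts; here it is a shallow decision tree (`max_depth=4`).

You can try other base estimators as well. If none is given, `AdaBoostRegressor` defaults to `DecisionTreeRegressor(max_depth=3)`.

`scoring='neg_mean_squared_error'` ranks candidates by negated MSE, because `GridSearchCV` always treats a higher score as better.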
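
The negated score can look odd at first, so here is a small sketch of why it works. The cross-validated MSE values below are hypothetical and purely illustrative; the point is that picking the largest negated MSE selects the candidate with the smallest MSE.

```python
# Hypothetical cross-validated MSE values for three candidates (illustrative only).
cv_mse = {
    "learning_rate=0.01": 0.62,
    "learning_rate=0.1": 0.50,
    "learning_rate=1.0": 0.71,
}

# Scorers follow a "greater is better" convention, so MSE is reported negated.
neg_mse = {name: -value for name, value in cv_mse.items()}

# Picking the highest negated score selects the lowest MSE.
best = max(neg_mse, key=neg_mse.get)
print(best, -neg_mse[best])  # learning_rate=0.1 0.5
```

In the notebook itself, the same sign flip applies to `grid_search.best_score_`: negating it gives the mean cross-validated MSE of the best candidate.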