# Ensemble Methods in Machine Learning

## 1. Introduction

Ensemble methods are techniques that create multiple models and then combine them to produce improved results. The main idea is that by combining models, we can reduce the likelihood of errors and increase the overall accuracy and robustness of the predictions. This notebook will cover three popular ensemble methods: Bagging, Boosting, and Stacking.

## 2. Problem Definition

We will use scikit-learn's California Housing dataset to demonstrate the effectiveness of ensemble methods. The goal is to predict the median house value of California districts, expressed in units of $100,000, from features such as median income, median house age, and average number of rooms per household.

## 3. Ensemble Methods

### 3.1 Bagging
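Bagging, or Bootstrap Aggregating, is an ensemble method that fits multiple versions of a model on bootstrap samples of the training set (rows drawn at random with replacement) and averages their predictions. Averaging mainly reduces variance, so it helps most with unstable base models such as fully grown decision trees.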
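
To make the idea concrete, here is a minimal pure-Python sketch of bagging on a toy one-feature problem. It is an illustration under stated assumptions, not what `BaggingRegressor` does internally: the helper names (`fit_1nn`, `bagged_fit`, `bagged_predict`) and the synthetic data are hypothetical, and a 1-nearest-neighbour predictor stands in for the decision tree as the high-variance base model.

```python
import random

def fit_1nn(xs, ys):
    """High-variance base model: predict the target of the nearest training point."""
    points = list(zip(xs, ys))

    def predict(x):
        return min(points, key=lambda p: abs(p[0] - x))[1]

    return predict

def bagged_fit(xs, ys, n_models=25, seed=0):
    """Fit one base model per bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagged_predict(models, x):
    """Aggregate by averaging the individual predictions."""
    return sum(m(x) for m in models) / len(models)

# Toy data (an assumption for illustration): y = 2x plus Gaussian noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + data_rng.gauss(0, 0.5) for x in xs]

models = bagged_fit(xs, ys)
prediction = bagged_predict(models, 2.55)
print(f"Bagged prediction at x=2.55: {prediction:.3f}")
```

Because each bootstrap sample omits some rows and repeats others, the individual models disagree, and the average is smoother than any single one of them.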

### 3.2 Boosting
Boosting is an ensemble technique that combines weak learners to create a strong learner by training models sequentially, each new model focusing on the errors of the previous ones.
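
The sequential idea can be sketched in a few lines of pure Python. This is a simplified illustration of gradient boosting for squared error, assuming a single feature and one-split "stumps" as weak learners; the function names and toy data are hypothetical and the sketch omits most of what `GradientBoostingRegressor` actually does (deeper trees, subsampling, other loss functions).

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that minimises squared error."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Toy data (an assumption for illustration): y = x^2 on a small grid.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

mean_y = sum(ys) / len(ys)
baseline_mse = mse(ys, [mean_y] * len(ys))
model = boost(xs, ys)
boosted_mse = mse(ys, [model(x) for x in xs])
print(f"Training MSE, mean only: {baseline_mse:.4f}")
print(f"Training MSE, boosted:   {boosted_mse:.4f}")
```

Each round corrects part of the error left by the rounds before it; the learning rate controls how large a correction each stump is allowed to make.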

### 3.3 Stacking
Stacking involves training multiple models and then using another model (meta-learner) to combine their predictions.
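
A minimal pure-Python sketch of the mechanism follows. It assumes a toy one-feature dataset, two hypothetical base models (a least-squares line and a 3-nearest-neighbour average), and a very simple meta-learner that finds least-squares blending weights. The key detail it tries to show is that the meta-learner is trained on out-of-fold predictions, so it never sees a base model's prediction for a point that model was fitted on. The helper names are illustrative and not part of any library.

```python
import math
import random

def fit_line(xs, ys):
    """Base model 1: ordinary least squares on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    """Base model 2: average target of the k nearest training points."""
    pts = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict

def out_of_fold(fit, xs, ys, n_folds=5):
    """Predict each point with a model that did not see it during fitting."""
    oof = [0.0] * len(xs)
    for f in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, len(xs), n_folds):
            oof[i] = model(xs[i])
    return oof

def fit_blend(p1, p2, ys):
    """Meta-learner: least-squares weights for y ~ w1*p1 + w2*p2 (normal equations)."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * y for a, y in zip(p1, ys))
    b2 = sum(b * y for b, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Toy data (an assumption for illustration): a linear trend plus a sine wave.
rng = random.Random(0)
xs = [i / 10 for i in range(60)]
ys = [0.5 * x + math.sin(x) + rng.gauss(0, 0.1) for x in xs]

oof_line = out_of_fold(fit_line, xs, ys)
oof_knn = out_of_fold(fit_knn, xs, ys)
w_line, w_knn = fit_blend(oof_line, oof_knn, ys)
oof_blend = [w_line * a + w_knn * b for a, b in zip(oof_line, oof_knn)]

# Final model: refit the base models on all data and combine with the learned weights.
line, knn = fit_line(xs, ys), fit_knn(xs, ys)

def stacked_predict(x):
    return w_line * line(x) + w_knn * knn(x)

print(f"Out-of-fold MSE  line: {mse(ys, oof_line):.4f}  "
      f"knn: {mse(ys, oof_knn):.4f}  blend: {mse(ys, oof_blend):.4f}")
```

On the out-of-fold predictions the blend can never do worse than either base model alone, because using only one of them is a special case of the weights being fitted. That guarantee does not carry over to unseen test data.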

## 4. Evaluation Criteria

The following criteria will be used to evaluate the ensemble methods:
- **Mean Squared Error (MSE)**: The average of the squared differences between predicted and actual values on the test set. Lower is better.
- **Training Time**: The wall-clock time taken to fit the model, in seconds.
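
Both criteria are simple to compute by hand. The sketch below is illustrative only: `mean_squared_error_manual` and `timed` are hypothetical helper names, and the three target values are made up. It uses `time.perf_counter()`, which is generally better suited to measuring short intervals than `time.time()`.

```python
import time

def mean_squared_error_manual(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Made-up values: errors of -0.5, +0.5 and 0 give (0.25 + 0.25 + 0) / 3.
y_true = [3.0, 2.5, 4.0]
y_pred = [2.5, 3.0, 4.0]
error, elapsed = timed(mean_squared_error_manual, y_true, y_pred)
print(f"MSE: {error:.4f} (computed in {elapsed:.6f} s)")
```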

## 5. Experiment Setup

We will split the data into training and testing sets and standardize the features using statistics computed on the training set only. We will then train each ensemble method and evaluate it against the criteria defined above.
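
As a rough illustration of what standardization does, the sketch below rescales one column to zero mean and unit variance. The helper names are hypothetical; the numbers are the first five `MedInc` values of the dataset, with the first four treated as "training" rows and the fifth as a "test" row purely for the example. The mean and standard deviation come from the training rows only and are then reused for the test row, which avoids leaking test information into preprocessing.

```python
import statistics

def fit_scaler(column):
    """Learn the mean and population standard deviation from training data."""
    return statistics.fmean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(v - mean) / std for v in column]

train_income = [8.3252, 8.3014, 7.2574, 5.6431]
test_income = [3.8462]

mean, std = fit_scaler(train_income)
train_scaled = transform(train_income, mean, std)
test_scaled = transform(test_income, mean, std)  # reuses the training statistics

print("Scaled training values:", [round(v, 3) for v in train_scaled])
print("Scaled test value:", round(test_scaled[0], 3))
```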

## 6. Implementation and Comparison


In [21]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import time

# Load the California Housing Prices dataset
housing = fetch_california_housing(as_frame=True)
data = housing.frame

# Display the first few rows of the dataset
data.head()


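|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |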


### Data Preprocessing
We will split the data into features (X) and target (y), then standardize the features.


In [22]:
# Split the data into features (X) and target (y)
X = data.drop('MedHouseVal', axis=1)
y = data['MedHouseVal']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


### Bagging
We will use the BaggingRegressor from scikit-learn to implement bagging with a base estimator of DecisionTreeRegressor.


In [23]:
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

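# Initialize the base estimator
base_estimator = DecisionTreeRegressor()

# Initialize the Bagging Regressor with the base estimator.
# The estimator is passed positionally because its keyword name differs
# between scikit-learn versions (`estimator` vs. the older `base_estimator`).
bagging = BaggingRegressor(base_estimator, n_estimators=100, random_state=42)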

# Train the model and measure the training time
start_time = time.time()
bagging.fit(X_train, y_train)
training_time_bagging = time.time() - start_time

# Make predictions
y_pred_bagging = bagging.predict(X_test)

# Evaluate the model
mse_bagging = mean_squared_error(y_test, y_pred_bagging)

# Display results
print(f"Bagging MSE: {mse_bagging}")
print(f"Bagging Training Time: {training_time_bagging}")


Bagging MSE: 0.2557488658168742
Bagging Training Time: 14.415130853652954


### Boosting
We will use the GradientBoostingRegressor from scikit-learn to implement boosting.


In [24]:
from sklearn.ensemble import GradientBoostingRegressor

# Initialize the Gradient Boosting Regressor
boosting = GradientBoostingRegressor(n_estimators=100, random_state=42)

# Train the model and measure the training time
start_time = time.time()
boosting.fit(X_train, y_train)
training_time_boosting = time.time() - start_time

# Make predictions
y_pred_boosting = boosting.predict(X_test)

# Evaluate the model
mse_boosting = mean_squared_error(y_test, y_pred_boosting)

# Display results
print(f"Boosting MSE: {mse_boosting}")
print(f"Boosting Training Time: {training_time_boosting}")


Boosting MSE: 0.29399901242474274
Boosting Training Time: 4.080959796905518


### Stacking
We will use the StackingRegressor from scikit-learn to implement stacking with LinearRegression, DecisionTreeRegressor, and KNeighborsRegressor as base estimators and GradientBoostingRegressor as the meta-learner.


In [25]:
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Initialize the Stacking Regressor
estimators = [
    ('lr', LinearRegression()),
    ('dt', DecisionTreeRegressor()),
    ('knn', KNeighborsRegressor())
]
stacking = StackingRegressor(estimators=estimators, final_estimator=GradientBoostingRegressor())

# Train the model and measure the training time
start_time = time.time()
stacking.fit(X_train, y_train)
training_time_stacking = time.time() - start_time

# Make predictions
y_pred_stacking = stacking.predict(X_test)

# Evaluate the model
mse_stacking = mean_squared_error(y_test, y_pred_stacking)

# Display results
print(f"Stacking MSE: {mse_stacking}")
print(f"Stacking Training Time: {training_time_stacking}")


Stacking MSE: 0.3385051504083837
Stacking Training Time: 4.0293190479278564


## 7. Conclusion

Based on our experiments, we can summarize the performance of each ensemble method:

- **Bagging**
  - MSE: 0.2557
  - Training Time: 14.42 seconds

- **Boosting**
  - MSE: 0.2940
  - Training Time: 4.08 seconds

- **Stacking**
  - MSE: 0.3385
  - Training Time: 4.03 seconds

Bagging achieved the lowest MSE of the three methods (0.2557) but took the longest to train, at about 14.42 seconds. Boosting had a moderately higher MSE (0.2940, roughly 15% above Bagging) and trained about 3.5 times faster, in 4.08 seconds. Stacking had the highest MSE (0.3385), with a training time similar to Boosting (4.03 seconds). All three models were run with mostly default hyperparameters on a single train/test split, so the ranking could change with tuning. The dataset is relatively small, so training time matters little here; on larger datasets, the cost of fitting 100 fully grown trees with Bagging could become an issue.
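
Since the target is expressed in units of $100,000, taking the square root of each MSE gives a root mean squared error (RMSE) that can be read as a typical prediction error in dollars. The short sketch below only post-processes the numbers printed earlier in this notebook; the `results` dictionary and the variable names are introduced here for illustration.

```python
import math

# (MSE, training time in seconds) as printed by the cells above.
results = {
    "Bagging": (0.2557488658168742, 14.415130853652954),
    "Boosting": (0.29399901242474274, 4.080959796905518),
    "Stacking": (0.3385051504083837, 4.0293190479278564),
}

# Rank the methods by MSE (lowest first) and convert each MSE to an RMSE.
ranking = sorted(results, key=lambda name: results[name][0])
rmse = {name: math.sqrt(mse) for name, (mse, _) in results.items()}

for name in ranking:
    mse, seconds = results[name]
    dollars = rmse[name] * 100_000  # target unit is $100,000
    print(f"{name:<9} MSE={mse:.4f}  RMSE={rmse[name]:.4f} "
          f"(about ${dollars:,.0f})  time={seconds:.2f}s")
```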

## 8. References
- Scikit-learn documentation: https://scikit-learn.org/
- California Housing Prices dataset: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html
