# **Part A: Data Preprocessing & Baseline**

### 1.  Data Loading and Feature Engineering:

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
import numpy as np

In [3]:
df = pd.read_csv("hour.csv")

In [9]:
df.isnull().sum().sum()

np.int64(0)

In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      17379 non-null  int64  
 1   yr          17379 non-null  int64  
 2   mnth        17379 non-null  int64  
 3   hr          17379 non-null  int64  
 4   holiday     17379 non-null  int64  
 5   weekday     17379 non-null  int64  
 6   workingday  17379 non-null  int64  
 7   weathersit  17379 non-null  int64  
 8   temp        17379 non-null  float64
 9   atemp       17379 non-null  float64
 10  hum         17379 non-null  float64
 11  windspeed   17379 non-null  float64
 12  cnt         17379 non-null  int64  
dtypes: float64(4), int64(9)
memory usage: 1.7 MB


In [7]:
df = df.drop(columns=["instant", "dteday", "casual", "registered"])

In [8]:
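# Integer-coded categorical features that will be one-hot encoded.
categorical_cols = ["season", "weathersit", "mnth", "hr", "weekday", "workingday", "holiday"]
# Remaining features (yr, temp, atemp, hum, windspeed); kept for reference, they reach the model via remainder="passthrough".
numerical_cols = [col for col in df.columns if col not in categorical_cols + ["cnt"]]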

In [9]:
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(drop="first", handle_unknown="ignore"), categorical_cols)
    ],
    remainder="passthrough"
)

In [10]:
X = df.drop(columns=["cnt"])
y = df["cnt"]

In [11]:
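# Fit on all rows before the split; one-hot encoding learns only category levels, but fit statistic-learning transformers on training rows only.
X_processed = preprocessor.fit_transform(X)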

### 2.  Train/Test Split:

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.2, random_state=42)

### 3.  Baseline Model (Single Regressor):

In [13]:
dt = DecisionTreeRegressor(max_depth=6, random_state=42)
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
dt_rmse = np.sqrt(mean_squared_error(y_test, dt_pred))

In [14]:
lr = LinearRegression()
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
lr_rmse = np.sqrt(mean_squared_error(y_test, lr_pred))

In [15]:
print("Decision Tree RMSE:", dt_rmse)
print("Linear Regression RMSE:", lr_rmse)

Decision Tree RMSE: 118.45551730357617
Linear Regression RMSE: 100.44594634649623


# **Part B: Ensemble Techniques**

### 1.  Bagging (Variance Reduction):

In [16]:
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

In [18]:
bagging = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=6, random_state=42),
    n_estimators=50,
    random_state=42
)
bagging.fit(X_train, y_train)
bagging_pred = bagging.predict(X_test)
bagging_rmse = np.sqrt(mean_squared_error(y_test, bagging_pred))

print("Bagging RMSE:", bagging_rmse)

Bagging RMSE: 112.34963461581316


### 2.  Boosting (Bias Reduction): 

In [19]:
gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(X_train, y_train)
gbr_pred = gbr.predict(X_test)
gbr_rmse = np.sqrt(mean_squared_error(y_test, gbr_pred))

print("Gradient Boosting RMSE:", gbr_rmse)

Gradient Boosting RMSE: 78.96518555055427


# **Part C: Stacking Regressor**

### Explanation: Principle of Stacking and Meta-Learner Behavior

Stacking (stacked generalization) is an ensemble technique that trains a second-level model on the predictions of several diverse base models. The combined prediction is often, though not always, more accurate than that of any single base model.

#### How Stacking Works

**Level-0 learners (base models).** Each base model is trained independently on the same training data. This notebook uses three:

- KNN Regressor (instance-based, non-parametric)
- Bagging Regressor (variance-reduction ensemble)
- Gradient Boosting Regressor (bias-reduction ensemble)

Each brings a different inductive bias and captures different aspects of the data.

**Meta-learner (level-1 model).** The base models' predictions become the input features for a second model, the meta-learner. scikit-learn's `StackingRegressor` builds these features from cross-validated (out-of-fold) predictions on the training set, so the meta-learner is not trained on predictions that the base models made for rows they had already seen.

In this notebook, Ridge Regression is the meta-learner.
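The two levels can be sketched by hand to show what the meta-learner actually receives. This is a simplified illustration, not the notebook's pipeline: it uses synthetic data from `make_regression` as a stand-in for the bike-sharing features, and the names `X_demo`, `base_models`, `meta_train`, and `meta_model` are hypothetical. `StackingRegressor` performs comparable steps internally.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption: NOT the bike-sharing dataset).
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Level 0: three base models with different inductive biases.
base_models = {
    "knn": KNeighborsRegressor(),
    "bagging": BaggingRegressor(n_estimators=10, random_state=0),
    "gbr": GradientBoostingRegressor(n_estimators=50, random_state=0),
}

# Out-of-fold predictions: each training row is predicted by a model
# that did not see that row during fitting. One column per base model.
meta_train = np.column_stack(
    [cross_val_predict(model, Xtr, ytr, cv=5) for model in base_models.values()]
)

# Level 1: Ridge is trained on the prediction columns only.
meta_model = Ridge().fit(meta_train, ytr)

# For new data, refit each base model on all training rows, predict,
# and feed those predictions to the meta-learner.
meta_test = np.column_stack(
    [model.fit(Xtr, ytr).predict(Xte) for model in base_models.values()]
)
manual_stack_pred = meta_model.predict(meta_test)

print("Meta-feature matrix shape:", meta_train.shape)
```

The meta-feature matrix has one row per training sample and one column per base model, which is why the meta-learner can stay a very simple model.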

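#### How the Meta-Learner Learns the Combination

The meta-learner receives one feature per base model:

- Predictions from KNN
- Predictions from Bagging
- Predictions from Gradient Boosting

These are treated like ordinary numeric features. With `passthrough=False`, as in this notebook, they are the only inputs the meta-learner sees.

Ridge fits a linear combination of these three columns by minimizing squared error plus an L2 penalty on the weights. In effect it:

- Learns how much to trust each base model overall, giving larger weights to models whose predictions track the target more closely.
- Shrinks the weights toward zero, which stabilizes them when the base models' predictions are highly correlated.
- Can offset the errors of one base model with another, which may lower both bias and variance relative to the individual models.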
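The learned weights can be inspected after fitting. The sketch below assumes synthetic data and a smaller two-model stack (`demo_stack` and `weights` are hypothetical names), so the numbers it prints say nothing about the bike-sharing results. It relies on the `final_estimator_` attribute of a fitted `StackingRegressor` and the `coef_` attribute of Ridge.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption: NOT the bike-sharing dataset).
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

demo_stack = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor()),
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=Ridge(),
    passthrough=False,  # the meta-learner sees only the base predictions
)
demo_stack.fit(X_demo, y_demo)

# With passthrough=False there is one Ridge coefficient per base model,
# in the same order as the `estimators` list.
names = [name for name, _ in demo_stack.estimators]
weights = dict(zip(names, demo_stack.final_estimator_.coef_))

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Applied to the notebook's fitted `stack` object, the same lookup would show how Ridge weighted KNN, Bagging, and Gradient Boosting.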

In short:

- The base learners provide diverse viewpoints.
- The meta-learner learns how to weight them.

When the base models make different kinds of errors, this blend often generalizes better than any single model or homogeneous ensemble.

![image.png](image.png)

In [20]:
from sklearn.ensemble import StackingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Ridge

In [None]:
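# StackingRegressor clones and refits each estimator during fit, so the
# already-fitted bagging and gbr objects serve only as configuration templates.
level0 = [
    ("knn", KNeighborsRegressor()),
    ("bagging", bagging),
    ("gbr", gbr)
]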

In [22]:
meta = Ridge()

In [23]:
stack = StackingRegressor(
    estimators=level0,
    final_estimator=meta,
    passthrough=False
)

In [24]:
stack.fit(X_train, y_train)
stack_pred = stack.predict(X_test)
stack_rmse = np.sqrt(mean_squared_error(y_test, stack_pred))

print("Stacking Regressor RMSE:", stack_rmse)

Stacking Regressor RMSE: 67.05385916199337
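
The results from Parts A to C can be collected in one place for comparison. The sketch below copies the RMSE values from the printed outputs above, rounded to two decimals, so that it runs on its own. The name `rmse_summary` is hypothetical. Inside the notebook, the variables `dt_rmse`, `lr_rmse`, `bagging_rmse`, `gbr_rmse`, and `stack_rmse` could be used instead of the literals.

```python
import pandas as pd

# RMSE values copied from the notebook's printed outputs (rounded).
rmse_summary = pd.Series(
    {
        "Decision Tree": 118.46,
        "Linear Regression": 100.45,
        "Bagging": 112.35,
        "Gradient Boosting": 78.97,
        "Stacking": 67.05,
    },
    name="RMSE",
).sort_values()  # lowest (best) error first

print(rmse_summary)
```

All five figures come from a single 80/20 split with `random_state=42`, so the ranking may shift under a different split or under cross-validation.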
