# Part A

### Data Loading and Feature Engineering


In [17]:
import pandas as pd
import requests
import os
import zipfile


zip_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip'
filename = 'hour.csv'
temp_dir = 'temp_data'
zip_file_path = os.path.join(temp_dir, 'Bike-Sharing-Dataset.zip')


os.makedirs(temp_dir, exist_ok=True)


if not os.path.exists(zip_file_path):
    print(f"Downloading {zip_url}...")
    response = requests.get(zip_url, timeout=30)
    response.raise_for_status()
    with open(zip_file_path, 'wb') as f:
        f.write(response.content)
    print("Zip file downloaded successfully.")
else:
    print("Zip file already exists, skipping download.")


print(f"Extracting {zip_file_path}...")
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(temp_dir)
print("Zip file extracted successfully.")


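# Load the hourly records.
df = pd.read_csv(os.path.join(temp_dir, filename))

# Drop the row index and raw date, plus 'casual' and 'registered':
# those two sum to the target 'cnt', so keeping them would leak it.
df.drop(['instant', 'dteday', 'casual', 'registered'], axis=1, inplace=True)

# Integer-coded categorical features to one-hot encode.
categorical_cols = ['season', 'yr', 'mnth', 'hr', 'weekday', 'weathersit', 'holiday']

# drop_first=True keeps k-1 indicator columns per feature, avoiding perfect collinearity.
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)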

print("DataFrame after loading, dropping columns, and one-hot encoding:")
print(df.head())
print("Shape of the DataFrame:", df.shape)

Downloading https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip...
Zip file downloaded successfully.
Extracting temp_data/Bike-Sharing-Dataset.zip...
Zip file extracted successfully.
DataFrame after loading, dropping columns, and one-hot encoding:
   workingday  temp   atemp   hum  windspeed  cnt  season_2  season_3  \
0           0  0.24  0.2879  0.81        0.0   16     False     False   
1           0  0.22  0.2727  0.80        0.0   40     False     False   
2           0  0.22  0.2727  0.80        0.0   32     False     False   
3           0  0.24  0.2879  0.75        0.0   13     False     False   
4           0  0.24  0.2879  0.75        0.0    1     False     False   

   season_4   yr_1  ...  weekday_1  weekday_2  weekday_3  weekday_4  \
0     False  False  ...      False      False      False      False   
1     False  False  ...      False      False      False      False   
2     False  False  ...      False      False      False      F
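
As a small illustration of what `drop_first=True` does in the encoding step, here is a minimal sketch on a made-up frame. The `toy` DataFrame and its values are hypothetical and are not taken from `hour.csv`.

```python
import pandas as pd

# Hypothetical toy frame; the values are invented for illustration only.
toy = pd.DataFrame({
    "season": [1, 2, 3, 4, 2],
    "temp": [0.24, 0.22, 0.30, 0.41, 0.35],
})

# One indicator column per level except the first (season == 1).
encoded = pd.get_dummies(toy, columns=["season"], drop_first=True)

# The dropped level acts as the reference: a row with season == 1
# has all remaining season indicators switched off.
print(encoded.columns.tolist())
print(encoded.head())
```

The first level of each encoded feature becomes the implicit reference category, which is why the notebook output shows `season_2` to `season_4` but no `season_1`.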

### Train/Test Split


In [18]:
from sklearn.model_selection import train_test_split


X = df.drop('cnt', axis=1)
y = df['cnt']


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)

Shape of X_train: (13903, 53)
Shape of X_test: (3476, 53)
Shape of y_train: (13903,)
Shape of y_test: (3476,)


### Baseline Model Training and Evaluation


In [19]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Baseline 1: a single depth-limited decision tree.
dtr_model = DecisionTreeRegressor(max_depth=6, random_state=42)
dtr_model.fit(X_train, y_train)
dtr_predictions = dtr_model.predict(X_test)
dtr_rmse = np.sqrt(mean_squared_error(y_test, dtr_predictions))

# Baseline 2: ordinary least squares on the one-hot encoded features.
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
lr_predictions = lr_model.predict(X_test)
lr_rmse = np.sqrt(mean_squared_error(y_test, lr_predictions))

# Report test-set RMSE for both baselines.
print(f"Decision Tree Regressor RMSE: {dtr_rmse:.4f}")
print(f"Linear Regression RMSE: {lr_rmse:.4f}")

Decision Tree Regressor RMSE: 118.4555
Linear Regression RMSE: 100.4459


# Part B

## Bagging (Variance Reduction)


In [24]:
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

decision_tree_base = DecisionTreeRegressor(max_depth=6, random_state=42)

bagging_model = BaggingRegressor(estimator=decision_tree_base, n_estimators=50, random_state=42, n_jobs=-1)

bagging_model.fit(X_train, y_train)

bagging_predictions = bagging_model.predict(X_test)

bagging_rmse = np.sqrt(mean_squared_error(y_test, bagging_predictions))

print(f"Bagging Regressor RMSE: {bagging_rmse:.4f}")
print(f"Single Decision Tree Regressor RMSE (baseline for comparison): {dtr_rmse:.4f}")

Bagging Regressor RMSE: 112.3521
Single Decision Tree Regressor RMSE (baseline for comparison): 118.4555
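
To make the variance-reduction idea behind bagging concrete, here is a minimal sketch on synthetic data (not the bike-sharing set). It repeatedly draws a fresh training set from an assumed function `y = sin(x) + noise`, fits a fully grown tree and a bagged ensemble of such trees, and measures how much their predictions at fixed points fluctuate from one training set to the next. The data generator, sample sizes, and number of repetitions are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x_eval = np.linspace(0, 6, 50).reshape(-1, 1)  # fixed points where predictions are compared


def draw_training_set(n=200, noise=0.5):
    """Sample a fresh noisy training set from y = sin(x) + noise (synthetic)."""
    x = rng.uniform(0, 6, size=(n, 1))
    y = np.sin(x[:, 0]) + rng.normal(0, noise, size=n)
    return x, y


tree_preds, bag_preds = [], []
for seed in range(30):
    x_tr, y_tr = draw_training_set()
    tree = DecisionTreeRegressor(random_state=seed).fit(x_tr, y_tr)
    bag = BaggingRegressor(
        DecisionTreeRegressor(), n_estimators=25, random_state=seed
    ).fit(x_tr, y_tr)
    tree_preds.append(tree.predict(x_eval))
    bag_preds.append(bag.predict(x_eval))

# Variance of the prediction at each evaluation point across training sets,
# averaged over the evaluation points.
tree_var = np.var(np.array(tree_preds), axis=0).mean()
bag_var = np.var(np.array(bag_preds), axis=0).mean()
print(f"single tree variance: {tree_var:.3f}, bagged variance: {bag_var:.3f}")
```

Fully grown trees are used here on purpose, because they are the high-variance case where averaging helps most. Depth-limited trees such as the `max_depth=6` ones in the notebook would be expected to show a smaller gap.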


### Discussion: Bagging (Variance Reduction)

The Bagging Regressor achieved an RMSE of **112.3521**, about 5% lower than the single Decision Tree Regressor (**118.4555**). Bagging trains each base estimator on a bootstrap sample of the training data (drawn with replacement) and averages the predictions, which smooths out the fluctuations of individual trees and reduces variance without adding much bias. The modest gain is consistent with the hypothesis that bagging primarily targets variance reduction, but it also shows the limits of that approach here: with `max_depth=6` the trees are already fairly constrained, and the bagged ensemble still trails the Linear Regression baseline (**100.4459**), which suggests that much of the remaining error comes from bias that averaging cannot remove.

## Boosting (Bias Reduction)


In [25]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

gbr_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

gbr_model.fit(X_train, y_train)

gbr_predictions = gbr_model.predict(X_test)

gbr_rmse = np.sqrt(mean_squared_error(y_test, gbr_predictions))

print(f"Gradient Boosting Regressor RMSE: {gbr_rmse:.4f}")
print(f"Single Decision Tree Regressor RMSE: {dtr_rmse:.4f}")
print(f"Linear Regression RMSE (Baseline): {lr_rmse:.4f}")
print(f"Bagging Regressor RMSE: {bagging_rmse:.4f}")

Gradient Boosting Regressor RMSE: 78.9652
Single Decision Tree Regressor RMSE: 118.4555
Linear Regression RMSE (Baseline): 100.4459
Bagging Regressor RMSE: 112.3521


### Discussion: Boosting (Bias Reduction)

The Gradient Boosting Regressor achieved an RMSE of **78.9652**, roughly 33% below the single Decision Tree Regressor (**118.4555**), 30% below the Bagging Regressor (**112.3521**), and 21% below the Linear Regression baseline (**100.4459**).

This outcome is consistent with the hypothesis that boosting primarily targets bias reduction. Gradient boosting builds its trees sequentially: each new tree is fit to the residuals (for squared error, the negative gradient of the loss) of the ensemble built so far, and its prediction is added with a small learning rate. Shallow trees that are individually too simple can therefore combine into a model that captures complex patterns and removes systematic error. A single train/test split cannot separate bias from variance directly, but the fact that boosting beats both the single models and the variance-reducing Bagging ensemble indicates that bias, not variance, was the main limitation of the depth-limited trees.
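
The residual-correcting loop that the boosting discussion in this part describes can be sketched from scratch. This is a simplified illustration for squared-error loss on synthetic data, not the internals of `GradientBoostingRegressor`; the data generator and the number of rounds are assumptions chosen to mirror the notebook's `learning_rate=0.1` and `max_depth=3`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=300)  # synthetic target

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # stage 0: predict the mean everywhere
train_rmse = [np.sqrt(np.mean((y - prediction) ** 2))]

for _ in range(100):
    # For squared error, the negative gradient is simply the residual.
    residuals = y - prediction
    # Fit a shallow tree to what the current ensemble still gets wrong.
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
    # Add a damped correction to the running prediction.
    prediction += learning_rate * tree.predict(X)
    train_rmse.append(np.sqrt(np.mean((y - prediction) ** 2)))

print(f"training RMSE: {train_rmse[0]:.3f} -> {train_rmse[-1]:.3f}")
```

Each round shrinks the training error a little, which is the bias-reduction mechanism at work. The sketch tracks training error only; whether the gain carries over to unseen data has to be checked on a held-out set, as the notebook does.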

# Part C

## Stacking Principle Explained

Stacking (Stacked Generalization) is an advanced ensemble learning technique that combines multiple diverse models, called **Base Learners**, and uses another model, called a **Meta-Learner** (or blender), to learn how to optimally combine their predictions. The core idea behind Stacking is to harness the strengths of different models by allowing a meta-learner to make the final prediction based on the outputs of the base learners.

### Role of Base Learners

Base Learners are the individual models that are trained on the original dataset (or subsets of it). These models can be of different types (e.g., Decision Trees, Linear Regression, Support Vector Machines, Neural Networks) and are chosen for their diverse predictive capabilities. Each base learner makes its own set of predictions on the data. For instance, in a regression problem, each base learner would output a predicted numerical value.

### Role of the Meta-Learner

The Meta-Learner is a model that is trained on the *predictions* generated by the base learners, rather than directly on the original features. The predictions from the base learners serve as the input features for the meta-learner, which learns how to weight or combine them into a final, more robust prediction. A common practice is to use a simple, regularized model as the meta-learner to limit overfitting: Linear Regression or Ridge for regression, Logistic Regression or a Ridge Classifier for classification. More complex models can also be used.

### How it Works (in a nutshell):

1.  **Training Base Learners**: Multiple diverse base models are trained on the training data.
2.  **Generating Meta-Features**: Using k-fold cross-validation, each base learner is trained on k-1 folds and predicts the held-out fold, so every training row receives an *out-of-fold* prediction from a model that never saw it. This avoids leaking the training targets into the meta-features, which become the input features for the meta-learner. For test-time predictions, the base learners are refit on the full training set.
3.  **Training Meta-Learner**: The meta-learner is trained on the meta-features (predictions from base learners) and the true labels of the training data.
4.  **Final Prediction**: For new, unseen data, the base learners first make their predictions. These predictions are then fed into the trained meta-learner, which produces the final output.
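
The four steps above can be sketched by hand with `cross_val_predict`. This is a minimal illustration on synthetic data from `make_regression`, with base learners and hyperparameters chosen only for the example; it mirrors the general procedure and is not a claim about how `StackingRegressor` is implemented internally.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for a real regression problem.
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: choose diverse base learners.
base_learners = [
    KNeighborsRegressor(n_neighbors=5),
    DecisionTreeRegressor(max_depth=4, random_state=0),
    LinearRegression(),
]

# Step 2: out-of-fold predictions on the training set become the meta-features.
meta_train = np.column_stack(
    [cross_val_predict(model, X_tr, y_tr, cv=5) for model in base_learners]
)

# Step 3: train the meta-learner on the meta-features and the true targets.
meta_learner = Ridge(alpha=1.0).fit(meta_train, y_tr)

# Step 4: refit each base learner on the full training set, predict the test
# set, and let the meta-learner combine those predictions.
meta_test = np.column_stack(
    [model.fit(X_tr, y_tr).predict(X_te) for model in base_learners]
)
final_pred = meta_learner.predict(meta_test)

print("stacked test RMSE:", np.sqrt(np.mean((y_te - final_pred) ** 2)))
```

The key design choice is that `meta_train` contains only predictions for rows the predicting model did not train on, which is what keeps the meta-learner from simply rewarding the base learner that overfits most.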

### Advantages of Stacking

-   **Leverages Diverse Strengths**: Stacking effectively combines the unique predictive powers of different base models. If one model is good at capturing certain patterns and another excels at different ones, the meta-learner can learn to optimally utilize both.
-   **Improved Performance**: By learning how to best combine predictions, Stacking often (though not always) achieves better predictive performance (e.g., lower RMSE for regression or higher accuracy for classification) than its individual base models, and it can improve on a single Bagging or Boosting ensemble when those are included as base learners.
-   **Reduced Bias and Variance**: It can help reduce both bias (by combining strong learners that capture different patterns) and variance (by averaging out errors, similar to bagging, but with learned rather than equal weights).


## Implement Stacking Regressor


In [26]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Ridge
from sklearn.ensemble import StackingRegressor, BaggingRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

knn_model = KNeighborsRegressor(n_neighbors=5)

bagging_base_estimator = DecisionTreeRegressor(max_depth=6, random_state=42)
bagging_model = BaggingRegressor(estimator=bagging_base_estimator, n_estimators=50, random_state=42, n_jobs=-1)

gbr_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

meta_learner = Ridge(random_state=42)

estimators = [
    ('knn', knn_model),
    ('bagging', bagging_model),
    ('gbr', gbr_model)
]

stacking_regressor = StackingRegressor(estimators=estimators, final_estimator=meta_learner, n_jobs=-1)

stacking_regressor.fit(X_train, y_train)

print("Stacking Regressor initialized and fitted.")

Stacking Regressor initialized and fitted.
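
Once a `StackingRegressor` is fitted, the weights the meta-learner assigned to each base learner can be read from `final_estimator_.coef_`. The sketch below shows this on synthetic data with a smaller, hypothetical set of base learners so that it runs on its own; the resulting weights say nothing about the bike-sharing model.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data; base learners and settings are illustrative assumptions.
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor(n_neighbors=5)),
        ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=Ridge(),
    cv=5,  # number of folds used to build the out-of-fold meta-features
)
stack.fit(X, y)

# With the default passthrough=False, the Ridge meta-learner has one
# coefficient per base learner, in the order the estimators were given.
names = [name for name, _ in stack.estimators]
weights = dict(zip(names, stack.final_estimator_.coef_))
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Applied to the notebook's `stacking_regressor`, the same attribute would show how much the Ridge meta-learner relies on KNN, Bagging, and Gradient Boosting respectively.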


## Evaluate Stacking Regressor


In [27]:
from sklearn.metrics import mean_squared_error
import numpy as np

stacking_predictions = stacking_regressor.predict(X_test)

stacking_rmse = np.sqrt(mean_squared_error(y_test, stacking_predictions))

print(f"Stacking Regressor RMSE: {stacking_rmse:.4f}")
print(f"Single Decision Tree Regressor RMSE: {dtr_rmse:.4f}")
print(f"Linear Regression RMSE (Baseline): {lr_rmse:.4f}")
print(f"Bagging Regressor RMSE: {bagging_rmse:.4f}")
print(f"Gradient Boosting Regressor RMSE: {gbr_rmse:.4f}")

Stacking Regressor RMSE: 67.0291
Single Decision Tree Regressor RMSE: 118.4555
Linear Regression RMSE (Baseline): 100.4459
Bagging Regressor RMSE: 112.3521
Gradient Boosting Regressor RMSE: 78.9652


### Discussion: Stacking Regressor Performance

The Stacking Regressor achieved an RMSE of **67.0291**. This is the lowest RMSE among all the models implemented:
-   Single Decision Tree Regressor: **118.4555**
-   Linear Regression (Baseline): **100.4459**
-   Bagging Regressor: **112.3521**
-   Gradient Boosting Regressor: **78.9652**


# Part D

### Comparative Table of Model Performance (RMSE)

| Model                               | RMSE       |
| :---------------------------------- | :--------- |
| Single Decision Tree Regressor      | 118.4555   |
| Linear Regression (Baseline)        | 100.4459   |
| Bagging Regressor                   | 112.3521   |
| Gradient Boosting Regressor         | 78.9652    |
| **Stacking Regressor**              | **67.0291**|
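
For a quick sense of scale, the relative RMSE reduction of the best model over each of the others can be computed directly from the table. The snippet uses only the values reported above.

```python
# RMSE values copied from the comparative table.
rmse = {
    "Single Decision Tree Regressor": 118.4555,
    "Linear Regression (Baseline)": 100.4459,
    "Bagging Regressor": 112.3521,
    "Gradient Boosting Regressor": 78.9652,
    "Stacking Regressor": 67.0291,
}

best = min(rmse, key=rmse.get)

# Percentage by which the best model's RMSE is lower than each other model's.
reduction = {
    name: 100 * (value - rmse[best]) / value
    for name, value in rmse.items()
    if name != best
}

for name, pct in reduction.items():
    print(f"{best} vs {name}: {pct:.1f}% lower RMSE")
```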

### Conclusion

Based on the Root Mean Squared Error (RMSE) values, the **Stacking Regressor** is the best-performing model on this test split, achieving an RMSE of **67.0291**. That is about 15% below the next-best model (Gradient Boosting) and about 43% below the single Decision Tree, so it outperforms both the single models (Decision Tree and Linear Regression) and the other ensemble techniques (Bagging and Gradient Boosting).

#### Why Stacking Outperformed the Single Model Baseline:

The Stacking Regressor's superior performance can be attributed to its sophisticated approach to ensemble learning, which effectively leverages the principles of the bias-variance trade-off and model diversity:

1.  **Bias-Variance Trade-off**: The individual base learners (K-Nearest Neighbors, Bagging Regressor, and Gradient Boosting Regressor) inherently possess different strengths and weaknesses concerning bias and variance. For instance:
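    *   **Decision Trees** (and thus the Bagging Regressor's base estimators) are low-bias, high-variance models when grown deep. Bagging reduces this variance by averaging trees trained on different bootstrap samples. With `max_depth=6`, as used here, each tree is more constrained, so some bias remains that averaging cannot remove.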
    *   **Gradient Boosting** primarily targets bias reduction by sequentially building models that correct the errors of previous models, thereby reducing systematic errors.
    *   **K-Nearest Neighbors** can be sensitive to local data structure and feature scaling, offering another distinct predictive perspective.
    
    Stacking allows the Meta-Learner (Ridge Regression in this case) to learn how to combine these diverse predictions. With a Ridge meta-learner the combination is a regularized linear weighting of the base predictions, fit on out-of-fold predictions, so base models that generalize better tend to receive more weight. By integrating models that address different components of the prediction error, the ensemble can reach a better point in the bias-variance trade-off than the individual models or the simpler ensembles reach on their own.

2.  **Model Diversity**: The key to Stacking's success is the diversity of its base learners. By including models with fundamentally different learning paradigms (e.g., instance-based learning with KNN, tree-based ensembles with Bagging and Gradient Boosting), the Stacking Regressor captures a wider range of patterns and relationships within the data. Each base learner might excel at predicting different subsets of the data or handling different types of relationships. The meta-learner then synthesizes these varied 'opinions' into a final prediction that, in this experiment, is more accurate than any single model. Diversity makes it less likely that the errors of one base model are systematically repeated by the others, which is what gives the meta-learner room to correct for them and reduce the overall prediction error.
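
One rough way to check the diversity argument is to look at how strongly the test-set residuals of the base learners correlate: the lower the correlation, the more room a meta-learner has to cancel errors. The sketch below does this on synthetic data from `make_friedman1` with three illustrative models; the models and settings are assumptions and the numbers do not describe the bike-sharing experiment.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic nonlinear regression problem.
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "knn": KNeighborsRegressor(n_neighbors=5),
    "linear": LinearRegression(),
    "gbr": GradientBoostingRegressor(n_estimators=100, random_state=0),
}

# One column of test-set residuals per model.
residuals = np.column_stack(
    [y_te - model.fit(X_tr, y_tr).predict(X_te) for model in models.values()]
)

# Pairwise correlation between the models' residuals (columns as variables).
residual_corr = np.corrcoef(residuals, rowvar=False)
print(list(models))
print(np.round(residual_corr, 2))
```

Off-diagonal values close to 1 would indicate that the models make nearly the same mistakes, in which case stacking has little to gain. Clearly lower values point to complementary errors.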