# 📝 Exercise M6.01

The aim of this notebook is to investigate whether tuning the hyperparameters
of a bagging regressor improves its performance, and to evaluate the gain
obtained.

We will load the California housing dataset and split it into a training and
a testing set.

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

data, target = fetch_california_housing(as_frame=True, return_X_y=True)
target *= 100  # rescale the target in k$
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0, test_size=0.5)

<div class="admonition note alert alert-info">
<p class="first admonition-title" style="font-weight: bold;">Note</p>
<p class="last">If you want a deeper overview regarding this dataset, you can refer to the
Appendix - Datasets description section at the end of this MOOC.</p>
</div>

Create a `BaggingRegressor` and provide a `DecisionTreeRegressor`
to its parameter `base_estimator` (renamed `estimator` in scikit-learn 1.2).
Train the regressor and evaluate its generalization performance on the
testing set using the mean absolute error.

In [3]:
# Write your code here.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

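# Pass the tree positionally: this argument is named `base_estimator` before
# scikit-learn 1.2 and `estimator` from 1.2 onwards.
bagging_reg = BaggingRegressor(DecisionTreeRegressor())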
bagging_reg.fit(data_train, target_train)

preds = bagging_reg.predict(data_test)
print(f"MAE: {mean_absolute_error(target_test, preds)}")

MAE: 36.435431531007744


Now, create a `RandomizedSearchCV` instance using the previous model and
tune the important parameters of the bagging regressor. Find the best
parameters and check whether any set of parameters improves on the default
regressor, still using the mean absolute error as the metric.

<div class="admonition tip alert alert-warning">
<p class="first admonition-title" style="font-weight: bold;">Tip</p>
<p class="last">You can list the bagging regressor's parameters using the <tt class="docutils literal">get_params</tt>
method.</p>
</div>
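
As a minimal sketch of the tip above, the cell below lists the parameter names that a search could target. Passing the tree positionally is a choice made here so the snippet does not depend on whether the installed scikit-learn calls that argument `estimator` or `base_estimator`; the variable names are illustrative.

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Pass the tree positionally: the keyword is `estimator` in recent
# scikit-learn releases and `base_estimator` in older ones.
bagging = BaggingRegressor(DecisionTreeRegressor())

# get_params() returns a dict; names containing "__" belong to the nested tree.
param_names = sorted(bagging.get_params())
top_level = [name for name in param_names if "__" not in name]
nested = [name for name in param_names if "__" in name]

print("Bagging parameters:", top_level)
print("Tree parameters:", nested)
```

Names from the second list, such as the one ending in `__max_depth`, can be used as keys in `param_distributions` to tune the trees inside the ensemble.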

In [17]:
# Write your code here.
from sklearn.model_selection import RandomizedSearchCV

# Candidate ensemble sizes: 1, 11, 21, ..., 91 (10 values in total).
params = {"n_estimators": range(1, 100, 10)}

# No `scoring` is passed, so candidates are ranked with the regressor's
# default score (R²); `best_score_` below is therefore an R², not an MAE.
search = RandomizedSearchCV(BaggingRegressor(), param_distributions=params)
search.fit(data_train, target_train)
preds = search.best_estimator_.predict(data_test)
print("Best Parameters: ", search.best_params_,
      "\nBest Score: ", search.best_score_)
print(f"MAE with best parameters: {mean_absolute_error(target_test, preds)}")

Best Parameters:  {'n_estimators': 91} 
Best Score:  0.7920353231042266
MAE with best parameters: 34.58756028303093
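
The search above does not pass `scoring`, so the reported best score is an R² and only `n_estimators` is explored. The following is a sketch, not a reference solution, of a search ranked directly by mean absolute error over a wider space. The helper name `search_bagging_with_mae`, the value ranges, and the choice of `max_samples`, `max_features` and the tree's `max_depth` as extra parameters are all assumptions made for illustration.

```python
from scipy.stats import randint
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor


def search_bagging_with_mae(data, target, n_iter=10, cv=5, random_state=0):
    """Hypothetical helper: randomized search ranked by mean absolute error."""
    bagging = BaggingRegressor(DecisionTreeRegressor(), random_state=random_state)
    # The nested tree is exposed as `estimator` or `base_estimator`
    # depending on the scikit-learn version.
    top_level = bagging.get_params(deep=False)
    prefix = "estimator" if "estimator" in top_level else "base_estimator"
    # Illustrative ranges only; they are not taken from the exercise.
    param_distributions = {
        "n_estimators": randint(10, 50),
        "max_samples": [0.5, 0.8, 1.0],
        "max_features": [0.5, 0.8, 1.0],
        f"{prefix}__max_depth": randint(3, 15),
    }
    search = RandomizedSearchCV(
        bagging,
        param_distributions=param_distributions,
        n_iter=n_iter,
        scoring="neg_mean_absolute_error",
        cv=cv,
        random_state=random_state,
    )
    search.fit(data, target)
    # Scores are negated errors, so flip the sign to report an MAE.
    return search, -search.best_score_
```

With the split used in this notebook, `search_bagging_with_mae(data_train, target_train)` would return the fitted search and the cross-validated MAE of the best candidate, in the same unit as the target (k$).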


Tuning `n_estimators` lowers the test MAE from about 36.4 k$ to about
34.6 k$, a modest gain. We see that the bagging regressor provides a predictor
for which fine-tuning is not as important as for a single decision tree.
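
One way to probe the comparison with a single decision tree is to vary one tree hyperparameter and look at how much the test error moves for each model. The sketch below does this for `max_depth`; the helper name `mae_by_depth` and the depth grid are hypothetical, and the snippet makes no claim about what the numbers will be on this dataset.

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor


def mae_by_depth(data_train, data_test, target_train, target_test,
                 depths=(3, 6, 12, None), random_state=0):
    """Hypothetical helper: test MAE per tree depth, alone and inside bagging."""
    results = {"single tree": {}, "bagging": {}}
    for depth in depths:
        candidates = {
            "single tree": DecisionTreeRegressor(
                max_depth=depth, random_state=random_state
            ),
            "bagging": BaggingRegressor(
                DecisionTreeRegressor(max_depth=depth),
                random_state=random_state,
            ),
        }
        for name, model in candidates.items():
            model.fit(data_train, target_train)
            predictions = model.predict(data_test)
            results[name][depth] = mean_absolute_error(target_test, predictions)
    return results
```

Calling `mae_by_depth(data_train, data_test, target_train, target_test)` returns one dictionary of errors per model, keyed by depth. A smaller spread of values for `"bagging"` than for `"single tree"` would be consistent with the statement above.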