## Hyperparameter Tuning

In this lesson we tune hyperparameters to improve our predictions of California housing prices.

Hyperparameters are settings chosen before training, such as the number of trees or the maximum tree depth, rather than values learned from the data. They shape how a model behaves, and tuning them can improve both predictive accuracy and generalization.

#### Loading and preparing the data

In [1]:
from sklearn.datasets import fetch_california_housing
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor, AdaBoostRegressor, GradientBoostingRegressor

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

In [2]:
california = fetch_california_housing()
print(california["DESCR"])

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per ce

In [3]:
df_cali = pd.DataFrame(california["data"], columns = california["feature_names"])
df_cali["median_house_value"] = california["target"]

df_cali.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


#### Normalization & Feature Selection

As in the Feature Engineering lesson, we normalize the data and select a subset of columns as our features.

#### Train Test Split

In [4]:
features = df_cali.drop(columns = ["median_house_value","AveOccup", "Population", "AveBedrms"])
target = df_cali["median_house_value"]

In [5]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.20, random_state=0)

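Create an instance of the normalizer and fit it on the training set only, so that no information from the test set leaks into the scaling.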

In [6]:
normalizer = MinMaxScaler()

normalizer.fit(X_train)

In [7]:
X_train_norm = normalizer.transform(X_train)

X_test_norm = normalizer.transform(X_test)

In [8]:
X_train_norm = pd.DataFrame(X_train_norm, columns = X_train.columns)
X_test_norm = pd.DataFrame(X_test_norm, columns = X_test.columns)
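The cells above fit the scaler on the training rows and then reuse it on the test rows. The following is a minimal sketch of what that implies, using made-up one-feature arrays (`X_train_toy` and `X_test_toy` are illustrative names, not part of the lesson data): test values that fall outside the training range are mapped outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy one-feature data (hypothetical values, not from the housing dataset).
X_train_toy = np.array([[1.0], [2.0], [3.0]])
X_test_toy = np.array([[4.0]])

# Fit on the training rows only: the scaler learns min=1 and max=3.
scaler = MinMaxScaler().fit(X_train_toy)

train_scaled = scaler.transform(X_train_toy)
test_scaled = scaler.transform(X_test_toy)

print(train_scaled.ravel())  # training values span exactly [0, 1]
print(test_scaled.ravel())   # (4 - 1) / (3 - 1) = 1.5, outside [0, 1]
```

Values slightly below 0 or above 1 in `X_test_norm` are therefore expected and do not indicate a bug.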

# Grid Search

**Grid Search** - we define a grid of hyperparameter values we want to try. Grid Search evaluates every possible combination with cross-validation.

So far, our best model has been AdaBoost, with an R-squared of 0.83.

Let's fine-tune this model. To do that, we will optimize the following hyperparameters:

- **n_estimators:** number of estimators, in this case, number of trees

- **max_leaf_nodes:** maximum number of leaves in each tree

- **max_depth:** maximum number of levels in each tree

- First, we define the grid of values to consider; every possible combination will be trained. The `estimator__` prefix routes a parameter to the decision tree wrapped inside AdaBoost.

In [9]:
grid = {"n_estimators": [50, 100, 500],
        "estimator__max_leaf_nodes": [250, 500, 1000, None],
        "estimator__max_depth":[10,30,50]}
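Because Grid Search tries every combination, the cost grows multiplicatively with each hyperparameter added. As a quick sanity check, the sketch below counts the candidates in the grid above with scikit-learn's `ParameterGrid`; the 5 folds are an assumption matching the `cv=5` used in this lesson.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above.
grid = {"n_estimators": [50, 100, 500],
        "estimator__max_leaf_nodes": [250, 500, 1000, None],
        "estimator__max_depth": [10, 30, 50]}

candidates = ParameterGrid(grid)

n_candidates = len(candidates)  # 3 * 4 * 3 combinations
n_fits = n_candidates * 5       # one fit per candidate per fold (assuming cv=5)

print(n_candidates, n_fits)     # 36 180
```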

In [10]:
ada_reg = AdaBoostRegressor(DecisionTreeRegressor())

In [12]:
model = GridSearchCV(estimator = ada_reg, param_grid = grid, cv=5, verbose=4, n_jobs = -1)

In [13]:
model.fit(X_train_norm, y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV 1/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=50;, score=0.776 total time=   2.9s
[CV 2/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=50;, score=0.795 total time=   2.9s
[CV 4/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=50;, score=0.780 total time=   2.9s
[CV 3/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=50;, score=0.778 total time=   2.9s
[CV 5/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=50;, score=0.782 total time=   2.9s
[CV 1/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=100;, score=0.780 total time=   5.3s
[CV 2/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=100;, score=0.791 total time=   5.3s
[CV 3/5] END estimator__max_depth=10, estimator__max_leaf_nodes=250, n_estimators=100;, score=0.775 total time

- After training, we check which of the tested hyperparameter values performed best.

In [15]:
model.best_params_

{'estimator__max_depth': 50,
 'estimator__max_leaf_nodes': None,
 'n_estimators': 500}
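`best_params_` reports only the winner. To see how close the other candidates came, the full cross-validation results are available in the `cv_results_` attribute. The sketch below shows one way to rank them with pandas; it uses a small synthetic dataset and a plain decision tree (both assumptions chosen for speed), but the same `pd.DataFrame(model.cv_results_)` call should work on the fitted search above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Small synthetic data so the sketch runs in seconds (not the housing data).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

demo_search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                           param_grid={"max_depth": [2, 4, 8]}, cv=3)
demo_search.fit(X_demo, y_demo)

# One row per candidate; scores are averaged over the CV folds.
results = pd.DataFrame(demo_search.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
ranked = results[cols].sort_values("rank_test_score")

print(ranked)
```

Comparing `mean_test_score` against `std_test_score` helps judge whether the best candidate is clearly ahead or within fold-to-fold noise of the runners-up.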

- You can retrieve the best model, refit on the whole training set with the best parameters, through the **best_estimator_** attribute

In [16]:
best_model = model.best_estimator_

- Evaluate our model

In [17]:
pred = best_model.predict(X_test_norm)

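# scikit-learn metrics expect the true values first, then the predictions
print("MAE", mean_absolute_error(y_test, pred))
# np.sqrt avoids the squared=False argument, which recent scikit-learn releases deprecate
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))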
print("R2 score", best_model.score(X_test_norm, y_test))

MAE 0.287548003875969
RMSE 0.4635229770149839
R2 score 0.8352293090509296
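Since the target is expressed in units of $100,000, an MAE of about 0.2875 corresponds to an average error of roughly $28,750. To make the three metrics concrete, here is a small sketch that computes them by hand with NumPy on made-up values (`y_true` and `y_hat` are illustrative, not model output) and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 1.5, 2.0, 4.0])
y_hat = np.array([2.5, 1.0, 2.0, 5.0])

errors = y_true - y_hat

# MAE: average size of the error, in the target's own units.
mae = np.mean(np.abs(errors))

# RMSE: like MAE, but large errors are penalized more heavily.
rmse = np.sqrt(np.mean(errors ** 2))

# R2: share of the target's variance explained by the predictions.
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, rmse, r2)

# The hand-written formulas agree with scikit-learn's implementations.
print(np.isclose(mae, mean_absolute_error(y_true, y_hat)))
print(np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_hat))))
print(np.isclose(r2, r2_score(y_true, y_hat)))
```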


# Random Search

**Random Search** - we define a probability distribution (or a list of candidate values) for each hyperparameter, and random combinations are sampled from them. It's up to the researcher to set how many combinations are tried, through the `n_iter` parameter.

In [18]:
grid = {"n_estimators": [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)],
        "estimator__max_leaf_nodes": [int(x) for x in np.linspace(start = 500, stop = 3000, num = 10)],
        "estimator__max_depth":[int(x) for x in np.linspace(10, 110, num = 11)]}
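The grid above uses lists, so each hyperparameter is sampled uniformly from 10 or 11 fixed values (10 × 10 × 11 = 1,100 possible combinations, of which `n_iter = 10` tries fewer than 1%). `RandomizedSearchCV` also accepts distribution objects with an `rvs` method, such as those in `scipy.stats`. The sketch below is one possible alternative, not the setup used in this lesson: it draws integers over the same ranges with `scipy.stats.randint` and previews the sampled candidates with `ParameterSampler`. The `random_state=0` is an arbitrary choice for reproducibility.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# randint(low, high) draws integers from low up to high - 1.
param_distributions = {
    "n_estimators": randint(200, 2001),               # 200..2000
    "estimator__max_leaf_nodes": randint(500, 3001),  # 500..3000
    "estimator__max_depth": randint(10, 111),         # 10..110
}

# Preview the candidates a random search would try, without fitting anything.
samples = list(ParameterSampler(param_distributions, n_iter=10, random_state=0))

for s in samples[:3]:
    print(s)
```

The same dictionary could be passed as `param_distributions` to `RandomizedSearchCV` in place of the list-based grid.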

In [19]:
ada_reg = AdaBoostRegressor(DecisionTreeRegressor())

model = RandomizedSearchCV(estimator = ada_reg, param_distributions = grid, n_iter = 10, cv = 5, n_jobs = -1)

In [20]:
model.fit(X_train_norm,y_train)

In [22]:
model.best_params_

{'n_estimators': 1000,
 'estimator__max_leaf_nodes': 2722,
 'estimator__max_depth': 50}

- You can retrieve the best model, refit on the whole training set with the best parameters, through the **best_estimator_** attribute

In [24]:
best_model = model.best_estimator_

- Evaluate our model

In [25]:
pred = best_model.predict(X_test_norm)

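# scikit-learn metrics expect the true values first, then the predictions
print("MAE", mean_absolute_error(y_test, pred))
# np.sqrt avoids the squared=False argument, which recent scikit-learn releases deprecate
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))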
print("R2 score", best_model.score(X_test_norm, y_test))

MAE 0.2904758570574426
RMSE 0.46306714366719065
R2 score 0.8355532241464539


We can't guarantee that these hyperparameters are optimal! We can only say that they are the best among the ones we tried.