# Hyperparameter Sweeps
A notebook on using hyperparameter sweeps to find good hyperparameters for a simple model, comparing grid search, random search and Bayesian optimisation.

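We will use a simple random forest regressor (`RandomForestRegressor`) on scikit-learn's diabetes dataset and tune two of its hyperparameters: `n_estimators` (the number of trees) and `max_depth` (the maximum depth of each tree).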

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from itertools import product
from dataclasses import dataclass
from tqdm import tqdm

In [18]:
dataset_df = load_diabetes(as_frame=True).frame
# Hold out 20% of the rows as a test set; the fixed seed makes the split reproducible
training_data, test_data = train_test_split(dataset_df, test_size=0.2, random_state=42)
training_data_X = training_data.drop(columns=["target"])
training_data_y = training_data["target"]
test_data_X = test_data.drop(columns=["target"])
test_data_y = test_data["target"]

In [6]:
@dataclass
class Model:
    """One fitted model from a sweep, with its hyperparameters and test-set metrics."""
    model: object
    n_estimators: int
    max_depth: int
    y_pred: np.ndarray
    rmse: float
    r2_score: float


In [41]:
def run_model(X_train, X_test, y_train, y_test, n_estimators, max_depth, random_state=42):
    """Fit a random forest with the given hyperparameters and score it on the test set."""
    model = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # root mean squared error
    r2 = r2_score(y_test, y_pred)  # coefficient of determination
    return Model(model, n_estimators, max_depth, y_pred, rmse, r2)

## Grid search - all products

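Now we create every combination of the hyperparameter values we want to try: the Cartesian product of `n_estimators` (50 to 1000 in steps of 50) and `max_depth` (1 to 20), giving 20 × 20 = 400 models. This is thorough within the chosen ranges but inefficient: the number of fits multiplies with every hyperparameter added, and values between grid points are never tried.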

In [14]:
hyperparameter_n_estimators = range(50, 1050, 50)
hyperparameter_max_depth = range(1, 21)

hyperparameters = list(product(hyperparameter_n_estimators, hyperparameter_max_depth))
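The same Cartesian product can also be built with scikit-learn's `ParameterGrid`, which yields each combination as a dictionary of keyword arguments that could be passed straight to `RandomForestRegressor(**params)`. This sketch checks that both approaches enumerate the same 400 combinations; the variable names are illustrative.

```python
from itertools import product

from sklearn.model_selection import ParameterGrid

n_estimators_values = list(range(50, 1050, 50))  # 50, 100, ..., 1000 (20 values)
max_depth_values = list(range(1, 21))  # 1, 2, ..., 20 (20 values)

# itertools.product yields (n_estimators, max_depth) tuples, as in the notebook
tuples = list(product(n_estimators_values, max_depth_values))

# ParameterGrid yields the same combinations as dicts of keyword arguments
grid = list(ParameterGrid({"n_estimators": n_estimators_values, "max_depth": max_depth_values}))

print(len(tuples), len(grid))
print(grid[0])
```

Adding a third hyperparameter with 10 values would multiply both counts by 10, which is why exhaustive grids scale poorly.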

In [40]:
models = []
for n_estimators, max_depth in tqdm(hyperparameters):
    model = run_model(training_data_X, test_data_X, training_data_y, test_data_y, n_estimators, max_depth)
    models.append(model)

 68%|██████▊   | 270/400 [03:59<01:55,  1.13it/s]


KeyboardInterrupt: 

In [24]:
# RMSE and R² are computed on the same test set, so the lowest-RMSE model also has the highest R²
best_model = min(models, key=lambda model: model.rmse)

print(f"Best model RMSE: {best_model.rmse}")
print(f"Best model R2: {best_model.r2_score}")
print(f"Best model n_estimators: {best_model.n_estimators}")
print(f"Best model max_depth: {best_model.max_depth}")

Best model RMSE: 52.63947369649441
Best model R2: 0.4770036188903538
Best model n_estimators: 300
Best model max_depth: 3
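Because `run_model` scores every candidate on `test_data`, picking the lowest RMSE selects hyperparameters using the test set itself, so the reported RMSE is optimistically biased. A common remedy is to carve a validation set out of the training data, select on it, and use the test set only once at the end. The sketch below assumes the same 80/20 split as above; the helper `select_on_validation`, the 25% validation fraction and the small candidate list are hypothetical choices for illustration.

```python
from itertools import product

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def select_on_validation(candidates, random_state=42):
    """Hypothetical helper: choose (n_estimators, max_depth) on a validation
    split, then report RMSE on the untouched test split exactly once."""
    df = load_diabetes(as_frame=True).frame
    X, y = df.drop(columns=["target"]), df["target"]
    # Same 80/20 train/test split as the notebook
    X_train_full, X_test, y_train_full, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    # Hold out 25% of the training rows (20% of all rows) for validation
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_full, y_train_full, test_size=0.25, random_state=random_state
    )

    best_params, best_val_rmse = None, float("inf")
    for n_estimators, max_depth in candidates:
        model = RandomForestRegressor(
            n_estimators=n_estimators, max_depth=max_depth, random_state=random_state
        )
        model.fit(X_train, y_train)
        val_rmse = rmse(y_val, model.predict(X_val))
        if val_rmse < best_val_rmse:
            best_params, best_val_rmse = (n_estimators, max_depth), val_rmse

    # Refit the chosen configuration on all training rows, then test once
    n_estimators, max_depth = best_params
    final = RandomForestRegressor(
        n_estimators=n_estimators, max_depth=max_depth, random_state=random_state
    )
    final.fit(X_train_full, y_train_full)
    return best_params, best_val_rmse, rmse(y_test, final.predict(X_test))


if __name__ == "__main__":
    # A deliberately small grid so the sketch runs quickly
    params, val_rmse, test_rmse = select_on_validation(list(product([50, 100], [2, 4, 8])))
    print(f"chosen={params} validation RMSE={val_rmse:.2f} test RMSE={test_rmse:.2f}")
```

The test RMSE from this procedure is an honest estimate for the chosen configuration, at the cost of training on fewer rows during selection.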


## Random Sampling
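Perhaps surprisingly, random search often matches or beats grid search for the same number of trials. When only a few hyperparameters strongly affect performance, random sampling tries many more distinct values of each one, whereas a grid keeps repeating the same handful of values.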
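To make the coverage argument concrete, this sketch counts how many distinct values of each hyperparameter the two strategies try for the same 400-trial budget. It assumes NumPy's `default_rng` instead of the legacy `np.random.seed` API used in the next cell, and samples `n_estimators` from 50 to 1000 so both strategies share the same range.

```python
from itertools import product

import numpy as np

BUDGET = 400  # same number of trials as the grid search

# Grid: the 20 x 20 Cartesian product from the grid-search section
grid = list(product(range(50, 1050, 50), range(1, 21)))

# Random: independent uniform integer draws (upper bounds are exclusive)
rng = np.random.default_rng(42)
random_trials = list(
    zip(rng.integers(50, 1001, size=BUDGET), rng.integers(1, 21, size=BUDGET))
)


def distinct_values(trials, index):
    """Number of distinct values tried for the hyperparameter at `index`."""
    return len({trial[index] for trial in trials})


print("grid:   n_estimators", distinct_values(grid, 0), "max_depth", distinct_values(grid, 1))
print("random: n_estimators", distinct_values(random_trials, 0), "max_depth", distinct_values(random_trials, 1))
```

The grid only ever tries 20 values of `n_estimators`, while random sampling tries hundreds with the same budget; if `n_estimators` barely matters, random search effectively spends its budget exploring `max_depth` just as thoroughly.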

In [42]:
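np.random.seed(42)  # reproducible draws
hyperparameters = []
for _ in range(400):  # same budget as the 20 x 20 grid
    n_estimators = np.random.randint(50, 1050)  # upper bound is exclusive: 50 to 1049
    max_depth = np.random.randint(1, 21)  # 1 to 20
    hyperparameters.append((n_estimators, max_depth))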


In [43]:
models = []
for n_estimators, max_depth in tqdm(hyperparameters):
    model = run_model(training_data_X, test_data_X, training_data_y, test_data_y, n_estimators, max_depth)
    models.append(model)

100%|██████████| 400/400 [08:14<00:00,  1.24s/it]


In [44]:
# RMSE and R² are computed on the same test set, so the lowest-RMSE model also has the highest R²
best_model = min(models, key=lambda model: model.rmse)

print(f"Best model RMSE: {best_model.rmse}")
print(f"Best model R2: {best_model.r2_score}")
print(f"Best model n_estimators: {best_model.n_estimators}")
print(f"Best model max_depth: {best_model.max_depth}")

Best model RMSE: 52.61843998996702
Best model R2: 0.47742149368353737
Best model n_estimators: 282
Best model max_depth: 3
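The same random search can be expressed with scikit-learn's `RandomizedSearchCV`, which draws candidates from distributions and, like `BayesSearchCV` in the next section, scores them by cross-validation on the training data rather than on the test set. To keep the runtime short, this sketch assumes a much smaller budget than the notebook (`n_iter=8`, `cv=3`, `n_estimators` up to 200) and scores by RMSE.

```python
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

df = load_diabetes(as_frame=True).frame
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["target"]), df["target"], test_size=0.2, random_state=42
)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={
        # scipy's randint(low, high) samples integers in [low, high)
        "n_estimators": randint(50, 201),
        "max_depth": randint(1, 21),
    },
    n_iter=8,  # number of sampled configurations
    cv=3,  # 3-fold cross-validation on the training data only
    scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
    random_state=42,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV RMSE:", -search.best_score_)
# best_estimator_ is refit on all of X_train (refit=True by default)
print("test RMSE:", -search.score(X_test, y_test))
```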


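## Bayesian Optimisation

Bayesian optimisation fits a probabilistic surrogate model of the score as a function of the hyperparameters and uses it to choose each next candidate, balancing exploration of uncertain regions against exploitation of promising ones. We use scikit-optimize's `BayesSearchCV` with 50 iterations. Unlike the two sweeps above, it scores each candidate with 5-fold cross-validation on the training data only, so `best_score_` is a mean cross-validated R² (the default score for a regressor), not a test-set score.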

In [33]:
from skopt import BayesSearchCV
from skopt.space import Integer

np.int = np.int64  # Workaround: older scikit-optimize releases use np.int, which NumPy 1.24 removed

In [34]:
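optimiser = BayesSearchCV(
    RandomForestRegressor(),
    {
        "n_estimators": Integer(50, 1000),  # integer range, both bounds inclusive
        "max_depth": Integer(1, 20),
    },
    n_iter=50,  # number of hyperparameter settings evaluated
    cv=5,  # 5-fold cross-validation on the training data
    random_state=42
)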
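`BayesSearchCV` hides the optimisation loop. As an illustration only, the sketch below writes that loop out by hand on a toy one-dimensional objective: a scikit-learn Gaussian-process surrogate is refit after every evaluation, and the next point is the candidate with the highest expected improvement. The toy `objective`, the Matérn kernel, the candidate grid and the iteration counts are all assumptions, and scikit-optimize's own implementation differs in detail.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Toy loss to minimise; stands in for a cross-validated error."""
    return np.sin(3 * x) + 0.1 * (x - 2) ** 2


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement for minimisation over the best loss seen so far."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero at observed points
    improvement = best - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(n_init=3, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0, 5, 501).reshape(-1, 1)
    X = rng.uniform(0, 5, size=(n_init, 1))  # random initial design
    y = objective(X).ravel()
    for _ in range(n_iter):
        # Surrogate: GP posterior over the loss given the observations so far
        gp = GaussianProcessRegressor(
            kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True, random_state=seed
        )
        gp.fit(X, y)
        mu, sigma = gp.predict(candidates, return_std=True)
        # Acquisition: evaluate the candidate with the highest expected improvement
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    best = np.argmin(y)
    return float(X[best, 0]), float(y[best])


x_best, y_best = bayes_opt()
print(f"best x = {x_best:.3f}, loss = {y_best:.3f}")
```

Each iteration costs one cheap GP fit plus one expensive objective evaluation, which is why this approach pays off when each evaluation (here, a 5-fold cross-validated forest) is slow.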


In [35]:
optimiser.fit(training_data_X, training_data_y)



In [36]:
optimiser.best_params_

OrderedDict([('max_depth', 4), ('n_estimators', 739)])

In [39]:
optimiser.best_score_

0.41542172750369655
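To compare this result with the test-set numbers from the grid and random sweeps, the chosen configuration should be evaluated once on the held-out test set. In the notebook that would be `evaluate_on_test(optimiser.best_estimator_, test_data_X, test_data_y)`, since `BayesSearchCV` refits the best configuration on the full training data by default (`refit=True`). The standalone sketch below instead refits the reported parameters with an added `random_state=42`, so its numbers need not match the optimiser's own fit exactly; `evaluate_on_test` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_on_test(estimator, X_test, y_test):
    """Return (RMSE, R²) of an already-fitted estimator on held-out data."""
    y_pred = estimator.predict(X_test)
    return float(np.sqrt(mean_squared_error(y_test, y_pred))), float(r2_score(y_test, y_pred))


df = load_diabetes(as_frame=True).frame
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["target"]), df["target"], test_size=0.2, random_state=42
)

# Stand-in for optimiser.best_estimator_: refit the reported best parameters
# on the whole training split (random_state=42 is an added assumption)
best = RandomForestRegressor(n_estimators=739, max_depth=4, random_state=42)
best.fit(X_train, y_train)
rmse, r2 = evaluate_on_test(best, X_test, y_test)
print(f"Test RMSE: {rmse:.2f}, test R2: {r2:.3f}")
```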

## Alternative Approaches
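Another approach sometimes suggested for hyperparameter tuning is gradient-based optimisation. While gradient descent is highly effective for training the weights and biases of deep learning models, it has significant drawbacks for hyperparameter sweeps:
1. Gradient descent can only be relied on to reach the global minimum when the objective is convex; otherwise it may stop at a local minimum.
2. Gradient descent requires an objective that is smooth (differentiable) with respect to the parameters being optimised.

The loss as a function of the hyperparameters is often non-convex, so local minima or non-convergence are likely. It is also not smooth: integer-valued hyperparameters such as `n_estimators` and `max_depth` make the loss piecewise constant, so its gradient is zero almost everywhere and undefined at the jumps, which gives the descent algorithm no useful signal.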
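The zero-gradient problem can be seen directly with a finite difference. This sketch treats `max_depth` as a continuous variable by truncating it with `int()` (an assumption needed to make the derivative meaningful at all) and, for speed, swaps the random forest for a single `DecisionTreeRegressor` with test RMSE as the loss.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


def loss(max_depth):
    """RMSE as a function of a continuous 'max_depth'.

    The tree only accepts integers, so the value is truncated, which makes
    the loss piecewise constant in max_depth.
    """
    tree = DecisionTreeRegressor(max_depth=int(max_depth), random_state=42)
    tree.fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_test, tree.predict(X_test)))


h = 1e-3
for depth in [2.5, 3.5, 4.5]:
    # Central finite difference: both evaluations truncate to the same integer depth
    grad = (loss(depth + h) - loss(depth - h)) / (2 * h)
    print(f"d loss / d max_depth at {depth}: {grad}")
```

Between integers the estimated gradient is exactly zero, and across an integer boundary the loss jumps, so gradient descent has nothing to follow in either case.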