# Hyperparameter tuning

Most machine learning models expose settings that are fixed before training rather than learned from the data. Examples are the maximum depth of a tree in a tree-based model, or the number of trees to use. These are called hyperparameters, and the best combination depends on the problem at hand.

Hyperparameters can be tuned in different ways. Usually, you define a dictionary that maps each hyperparameter name to the values (or the distribution of values) to try.

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import lightgbm as lgb

In [11]:
df = pd.read_csv('data/chl_regression_tutorial.csv')
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

features = ['rho_443_a', 'rho_492_a', 'rho_560_a', 'rho_665_a', 'rho_704_a', 'rho_740_a', 'rho_783_a', 'rho_865_a']
target = 'CHL'

X_train = df_train[features]
y_train = df_train[target]

X_test = df_test[features]
y_test = df_test[target]

## Grid search

One way to find good hyperparameters is the "brute-force" method: try out every combination. This is called grid search. You define a "hyperparameter grid" and evaluate the performance of each combination using cross-validation.

In [22]:
from sklearn.model_selection import GridSearchCV

model = lgb.LGBMRegressor()

hyperparameter_search_space = {
    'n_estimators': [10, 100, 1000],
    'learning_rate': [0.01, 0.1, 1],
    'max_depth': [3, 5, 7],
}

# scikit-learn maximizes the score, so the MSE is passed in negated form.
search = GridSearchCV(
    model,
    hyperparameter_search_space,
    n_jobs=-1,
    cv=5,
    scoring='neg_mean_squared_error',
    verbose=1,
)
search.fit(X_train, y_train)

print('Best hyperparameters:', search.best_params_)

Fitting 5 folds for each of 27 candidates, totalling 135 fits
Best hyperparameters: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 1000}
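
The candidate count in the log above follows directly from the grid: every value of one hyperparameter is paired with every value of the others. As a minimal sketch using only the standard library (the `grid` dictionary mirrors the one above; `n_folds` is an illustrative name for the `cv=5` setting), the combinations can be enumerated like this:

```python
from itertools import product

# Same grid as in the cell above.
grid = {
    'n_estimators': [10, 100, 1000],
    'learning_rate': [0.01, 0.1, 1],
    'max_depth': [3, 5, 7],
}
n_folds = 5  # illustrative name, matches cv=5

# Cartesian product of all value lists: one dict per candidate.
names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(candidates))            # 27 candidates
print(len(candidates) * n_folds)  # 135 cross-validation fits
print(candidates[0])
```

Adding a fourth value to each list would already give 64 candidates and 320 fits, which is why grid search scales poorly as the grid grows.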


We can then retrain the model with the best hyperparameters found and evaluate it on the held-out test set.

In [23]:
model = lgb.LGBMRegressor(**search.best_params_)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, y_pred))

Test MSE: 3.017702545136178


## Random search

When the search space is large, an exhaustive search like grid search can become infeasible. In many cases random search also does a good job: a fixed number of random combinations are sampled from the search space, and the combination with the best cross-validation score is kept.

In [24]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint

model = lgb.LGBMRegressor()

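hyperparameter_search_space = {
    'n_estimators': sp_randint(10, 1000),  # integers in [10, 1000), upper bound excluded
    'learning_rate': [0.01, 0.1, 1],       # lists are sampled uniformly
    'max_depth': sp_randint(3, 10),        # integers in [3, 10), upper bound excluded
}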

search = RandomizedSearchCV(model, hyperparameter_search_space, n_iter=10, n_jobs=-1, cv=5, scoring='neg_mean_squared_error', verbose=1)
search.fit(X_train, y_train)
print('Best hyperparameters:', search.best_params_)

model = lgb.LGBMRegressor(**search.best_params_)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, y_pred))

Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best hyperparameters: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 416}
Test MSE: 2.971126907243025
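
To make the sampling step concrete, here is a rough standard-library sketch of how candidates could be drawn from a search space like the one above. It is an illustration only, not how `RandomizedSearchCV` is implemented, and `sample_candidate` is a hypothetical helper. Note that `scipy.stats.randint(low, high)` excludes `high`, which `random.randrange` mirrors.

```python
import random

rng = random.Random(42)  # seeded so the draws are reproducible

def sample_candidate(rng):
    """Draw one hyperparameter combination (hypothetical helper)."""
    return {
        'n_estimators': rng.randrange(10, 1000),      # integers 10..999
        'learning_rate': rng.choice([0.01, 0.1, 1]),  # uniform over the list
        'max_depth': rng.randrange(3, 10),            # integers 3..9
    }

# n_iter=10 in the search above corresponds to ten such draws.
candidates = [sample_candidate(rng) for _ in range(10)]
for candidate in candidates:
    print(candidate)
```

With `n_iter=10`, only 10 combinations are evaluated however large the search space is. Passing `random_state` to `RandomizedSearchCV` makes its draws reproducible in the same way the seed does here.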


## Optuna library

There are also Python packages, such as Optuna, that implement more sophisticated hyperparameter tuning strategies. Instead of sampling at random, Optuna's algorithm tries to identify the most promising areas of the search space based on prior evaluations.

In [31]:
# using Optuna
import optuna

optuna.logging.set_verbosity(optuna.logging.INFO)

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 10, 1000)
    learning_rate = trial.suggest_float('learning_rate', 1e-3, 1, log=True)
    max_depth = trial.suggest_int('max_depth', 3, 10)

    model = lgb.LGBMRegressor(n_estimators=n_estimators, learning_rate=learning_rate, max_depth=max_depth)
    model.fit(X_train, y_train)

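    # NOTE: each trial is scored on the test set, so the hyperparameters are tuned
    # on the same data used for the final evaluation and the reported test MSE is
    # optimistic. Scoring on a validation split, or with cross-validation on the
    # training data, avoids this.
    y_pred = model.predict(X_test)
    return mean_squared_error(y_test, y_pred)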

sampler = optuna.samplers.TPESampler(seed=42)
study = optuna.create_study(direction='minimize', sampler=sampler)
study.optimize(objective, n_trials=100)

print('Best hyperparameters:', study.best_params)

model = lgb.LGBMRegressor(**study.best_params)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, y_pred))

[I 2024-10-30 16:03:31,147] A new study created in memory with name: no-name-d6ef3045-1624-4a6d-984c-98c7d6daa4a2
[I 2024-10-30 16:03:31,461] Trial 0 finished with value: 3.9582895354072942 and parameters: {'n_estimators': 381, 'learning_rate': 0.711447600934342, 'max_depth': 8}. Best is trial 0 with value: 3.9582895354072942.
[I 2024-10-30 16:03:31,733] Trial 1 finished with value: 6.276020288307635 and parameters: {'n_estimators': 603, 'learning_rate': 0.0029380279387035343, 'max_depth': 4}. Best is trial 0 with value: 3.9582895354072942.
[I 2024-10-30 16:03:31,790] Trial 2 finished with value: 3.5012417575314347 and parameters: {'n_estimators': 67, 'learning_rate': 0.39676050770529875, 'max_depth': 7}. Best is trial 2 with value: 3.5012417575314347.
[I 2024-10-30 16:03:32,406] Trial 3 finished with value: 9.618441508331184 and parameters: {'n_estimators': 711, 'learning_rate': 0.00115279871282324, 'max_depth': 10}. Best is trial 2 with value: 3.5012417575314347.
[... output truncated ...]

Best hyperparameters: {'n_estimators': 773, 'learning_rate': 0.14482642335895063, 'max_depth': 4}
Test MSE: 2.7783331409279315
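
The `log=True` argument to `trial.suggest_float` asks for the learning rate to be sampled on a logarithmic scale, so that each order of magnitude between 1e-3 and 1 gets roughly equal attention. The sketch below illustrates the idea with plain random draws from the standard library. It is not Optuna's TPE sampler, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value uniformly in log space between low and high (hypothetical helper)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
log_draws = [sample_log_uniform(rng, 1e-3, 1) for _ in range(10_000)]
lin_draws = [rng.uniform(1e-3, 1) for _ in range(10_000)]

# Fraction of draws below 0.1: about two thirds on the log scale
# (two of the three decades), about one tenth on the linear scale.
print(sum(d < 0.1 for d in log_draws) / len(log_draws))
print(sum(d < 0.1 for d in lin_draws) / len(lin_draws))
```

On a linear scale, most draws would land between 0.1 and 1, and small learning rates would rarely be tried.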
