# Hyperparameter tuning

Most machine learning models expose settings that are fixed before training rather than learned from the data. Examples are the maximum depth of a tree in a tree-based model, or the number of trees to use. These settings are called hyperparameters, and the best combination depends on the problem at hand.

Hyperparameters can be tuned in different ways. Usually, a dictionary is set up that maps each hyperparameter to the values (or range of values) to try out.
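As a minimal sketch of such a dictionary: the keys have to match the estimator's constructor arguments, which scikit-learn-style estimators list through `get_params()`. The snippet uses scikit-learn's `GradientBoostingRegressor` as a stand-in so that it does not depend on LightGBM (an assumption for illustration), and the candidate values are only examples.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in estimator (assumption): any scikit-learn-style model behaves the same way here.
model = GradientBoostingRegressor()

# Each key is a hyperparameter name, each value the list of candidates to try.
search_space = {
    'n_estimators': [10, 100, 1000],
    'learning_rate': [0.01, 0.1, 1],
    'max_depth': [3, 5, 7],
}

# get_params() returns the hyperparameters the estimator accepts, with their current values.
available = model.get_params()

# Names in the search space that the estimator does not know would make the search fail.
unknown = [name for name in search_space if name not in available]
print('Unknown hyperparameters:', unknown)
```

Checking the names up front is cheap and catches typos before a long search is started.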

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import lightgbm as lgb

In [2]:
df = pd.read_csv('data/chl_regression_tutorial.csv')
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

features = ['rho_443_a', 'rho_492_a', 'rho_560_a', 'rho_665_a', 'rho_704_a', 'rho_740_a', 'rho_783_a', 'rho_865_a']
target = 'CHL'

X_train = df_train[features]
y_train = df_train[target]

X_test = df_test[features]
y_test = df_test[target]

## Grid search
One way to find the optimal parameters is the "brute-force" method of trying out every combination. This is called grid search: you define a "hyperparameter grid" and evaluate the performance of each combination using cross-validation.

In [3]:
from sklearn.model_selection import GridSearchCV

model = lgb.LGBMRegressor()

hyperparameter_search_space = {
    'n_estimators': [10, 100, 1000],
    'learning_rate': [0.01, 0.1, 1],
    'max_depth': [3, 5, 7],
}

# scikit-learn maximizes the score, so the MSE is passed in negated form
search = GridSearchCV(model, hyperparameter_search_space, n_jobs=-1, cv=5, scoring='neg_mean_squared_error', verbose=1)
search.fit(X_train, y_train)

print('Best hyperparameters:', search.best_params_)

Fitting 5 folds for each of 27 candidates, totalling 135 fits
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000253 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 8
[LightGBM] [Info] Start training from score 4.527730
Best hyperparameters: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 1000}
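The "27 candidates, totalling 135 fits" in the output follows directly from the grid: 3 × 3 × 3 combinations, each evaluated on 5 folds. A small sketch using only the standard library enumerates the same combinations (the variable names are illustrative):

```python
from itertools import product

grid = {
    'n_estimators': [10, 100, 1000],
    'learning_rate': [0.01, 0.1, 1],
    'max_depth': [3, 5, 7],
}

# The Cartesian product of all value lists gives every combination in the grid.
names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

n_folds = 5
print(len(candidates), 'candidates,', len(candidates) * n_folds, 'fits')
# prints: 27 candidates, 135 fits
```

Adding a fourth hyperparameter with three values would triple the number of fits, which is why grid search scales poorly.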


We can then train the model with the best parameters found and evaluate it on the test set.

In [4]:
model = lgb.LGBMRegressor(**search.best_params_)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, y_pred))

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000256 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 8
[LightGBM] [Info] Start training from score 4.527730
Test MSE: 3.017702545136178
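With the default `refit=True`, `GridSearchCV` already retrains the best configuration on the full training data, so the manual refit above could also be replaced by `search.best_estimator_`. Because the scoring is `'neg_mean_squared_error'`, `search.best_score_` is the negated mean cross-validated MSE. The sketch below illustrates both on synthetic data with scikit-learn's `DecisionTreeRegressor`; the data and the estimator are stand-ins chosen to keep the example small, not part of the tutorial's setup.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 8 features, like the tutorial's dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    {'max_depth': [2, 4, 6]},
    cv=5,
    scoring='neg_mean_squared_error',
)
search.fit(X_tr, y_tr)

# best_score_ is the negated MSE, so flip the sign to read it as an MSE.
cv_mse = -search.best_score_

# best_estimator_ is already refit on all of X_tr with the best parameters.
best_model = search.best_estimator_
test_mse = mean_squared_error(y_te, best_model.predict(X_te))

print('CV MSE:', cv_mse)
print('Test MSE:', test_mse)
```

Comparing the cross-validated MSE with the test MSE gives a rough indication of whether the selected configuration generalizes.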


## Random search

When the search space is large, an exhaustive search as done in grid search can become infeasible. In many cases random search also does a good job: random combinations are drawn from the search space, and you retain the combination with the best results.

In [5]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint

model = lgb.LGBMRegressor()

hyperparameter_search_space = {
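    'n_estimators': sp_randint(10, 1000),  # random integers from 10 to 999 (upper bound is exclusive)
    'learning_rate': [0.01, 0.1, 1],       # lists are sampled uniformly
    'max_depth': sp_randint(3, 10),        # random integers from 3 to 9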
}

search = RandomizedSearchCV(model, hyperparameter_search_space, n_iter=10, n_jobs=-1, cv=5, scoring='neg_mean_squared_error', verbose=1)
search.fit(X_train, y_train)
print('Best hyperparameters:', search.best_params_)

model = lgb.LGBMRegressor(**search.best_params_)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, y_pred))

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000328 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 8
[LightGBM] [Info] Start training from score 4.527730
Best hyperparameters: {'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 560}
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000360 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 8
[LightGBM] [Info] Start training from score 4.527730
Test MSE: 2.938300557657747
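To see what a random search actually draws, `ParameterSampler` from scikit-learn can be used on its own. The sketch below is an illustration under a few assumptions: it widens the integer ranges by one so that the upper values 1000 and 10 are included (`randint` excludes its upper bound), and it samples the learning rate from `scipy.stats.loguniform` instead of a fixed list, which is a common choice for parameters spanning several orders of magnitude.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import ParameterSampler

distributions = {
    'n_estimators': randint(10, 1001),      # integers 10..1000
    'learning_rate': loguniform(1e-3, 1),   # log-uniform between 0.001 and 1
    'max_depth': randint(3, 11),            # integers 3..10
}

# random_state makes the draws reproducible; RandomizedSearchCV accepts the same argument.
samples = list(ParameterSampler(distributions, n_iter=5, random_state=42))
for params in samples:
    print(params)
```

Passing `random_state` to `RandomizedSearchCV` in the same way would make the selected hyperparameters repeatable between runs.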


## Optuna library

There are also Python packages such as Optuna that implement more complex hyperparameter tuning strategies. Instead of doing a random search, Optuna's algorithm tries to identify the most promising areas of the search space based on prior evaluations.

In [10]:
# using Optuna
import optuna

optuna.logging.set_verbosity(optuna.logging.ERROR)

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 10, 1000)
    learning_rate = trial.suggest_float('learning_rate', 1e-3, 1, log=True)
    max_depth = trial.suggest_int('max_depth', 3, 10)

    model = lgb.LGBMRegressor(n_estimators=n_estimators, learning_rate=learning_rate, max_depth=max_depth)
    model.fit(X_train, y_train)

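    # Caution: scoring each trial on the test set lets the tuning see the test data,
    # so the final test MSE is optimistic; a validation split or cross-validation
    # on the training data avoids this.
    y_pred = model.predict(X_test)
    return mean_squared_error(y_test, y_pred)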

sampler = optuna.samplers.TPESampler(seed=42)
study = optuna.create_study(direction='minimize', sampler=sampler)
study.optimize(objective, n_trials=100)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000113 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 8
[LightGBM] [Info] Start training from score 4.527730

In [11]:
print('Best hyperparameters:', study.best_params)

model = lgb.LGBMRegressor(**study.best_params)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, y_pred))

Best hyperparameters: {'n_estimators': 773, 'learning_rate': 0.14482642335895063, 'max_depth': 4}
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000313 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 8000, number of used features: 8
[LightGBM] [Info] Start training from score 4.527730
Test MSE: 2.7783331409279315
