[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/datarootsio/rootlabs-hyperparameter-optimization/blob/main/tutorial_notebook.ipynb)

**Set-up**


In [None]:
%%capture
!pip install optuna


In [None]:
# Import Packages
## for data and preprocessing
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

## for model fitting
import lightgbm as lgb
import xgboost as xgb
import sklearn.metrics as metric

## for hyperparameter optimization
import optuna

In [None]:
train = pd.read_csv('/content/sample_data/california_housing_train.csv')
test = pd.read_csv('/content/sample_data/california_housing_test.csv')

names = train.columns

scaler = StandardScaler()
train = pd.DataFrame(scaler.fit_transform(train),columns=names)
test = pd.DataFrame(scaler.transform(test), columns=names)


X_train = train.drop(['median_house_value'],axis=1)
X_test  = test.drop(['median_house_value'],axis=1)
y_train = train.median_house_value
y_test  = test.median_house_value

# **OPTUNA**

## **General Overview**

Optuna optimizes any objective function. This objective function takes a set of arguments (e.g., hyperparameters) and returns a single value (e.g., validation score).  

In Optuna, we create a study. A study is defined by the objective function and the hyperparameter space, and it consists of a set of trials. 
Each trial is a single selection from the hyperparameter space for which we evaluate the objective function.

The optimization algorithm (the sampler) decides which trial to evaluate next, ideally using the results of earlier trials, until the trial budget is exhausted or a satisfactory value is found.

In practice, every hyperparameter optimization exercise consists of 4 steps:

* define a function which **trains a model** and **returns the validation score**

* define the **hyperparameter space** through which the optimization algorithm can search (trials are instances/realizations of this space)

* create a **study**, which describes the optimization exercise: 
    * *Direction* : 
        * minimize: (root) mean squared error, negative log-likelihood, ...
        * maximize: r2_score, auc, accuracy, precision, recall, f1_score, ...
    * *Sampler* : the chosen optimization technique **(Optimization)**
    * *Pruner* : early stopping of unpromising trials **(Steroids)**

* **optimize** the study using different trials in a smart way **(worker function)**


In [None]:
%%script false --no-raise-error
# STEP 1 #
#========#

def train_evaluate(params):
    train_data = lgb.Dataset(X_train, label=y_train)
    test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)
    # Train a Model
    model = lgb.train(params, train_data,
                      num_boost_round=params['NUM_BOOST_ROUND'],
                      early_stopping_rounds=params['EARLY_STOPPING_ROUNDS'],
                      valid_sets=[test_data],
                      valid_names=['valid'],
                      )
    # Evaluate the model
    # Booster.predict expects raw features, not an lgb.Dataset
    preds = model.predict(X_test,num_iteration=model.best_iteration)
    truth = test_data.get_label()
    score = metric.mean_squared_error(truth, preds, squared=False)
      
    #score = model.best_score['valid']['rmse']
    # Return the validation score
    return score

# STEP 2 #
#========#

def objective(trial):
    # Define the Hyper-parameter Space
    params = {'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 0.5),
              'max_depth': trial.suggest_int('max_depth', 1, 30, step=1),
              'num_leaves': trial.suggest_int('num_leaves', 2, 100),
              'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
              'feature_fraction': trial.suggest_uniform('feature_fraction', 0.1, 1.0),
              'subsample': trial.suggest_discrete_uniform('subsample', 0.1, 1.0,.1),
              'colsample_by_tree': 1,
              'lambda_l1': trial.suggest_float('lambda_l1', 0, 10),
              'lambda_l2': trial.suggest_float('lambda_l2', 0, 10),
              'NUM_BOOST_ROUND': 200,
              'EARLY_STOPPING_ROUNDS': 20,
              'objective': 'rmse',
              }
    # Train the model and return the validation score
    score = train_evaluate(params)
    # Return the validation score
    return score

# STEP 3 #
#========#

study = optuna.create_study(
    direction = 'minimize',
    sampler = optuna.samplers.RandomSampler(),      # GridSampler, RandomSampler, CmaEsSampler, TPESampler (default), ...
    pruner = optuna.pruners.NopPruner()             # NopPruner, MedianPruner (default), SuccessiveHalvingPruner, HyperbandPruner,...
    )

# STEP 4 #
#========#

study.optimize(objective, n_trials=100)

In [None]:
N_TRIALS = 200

## **Grid Search**

In [None]:
%%time
%%capture

def train_evaluate(params):
    #X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1234)

    train_data = lgb.Dataset(X_train, label=y_train)
    test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

    model = lgb.train(params, train_data,
                      num_boost_round=params['NUM_BOOST_ROUND'],
                      early_stopping_rounds=params['EARLY_STOPPING_ROUNDS'],
                      valid_sets=[test_data],
                      valid_names=['valid'],
                      verbose_eval=params['verbose']
                      )
    preds = model.predict(X_test,num_iteration=model.best_iteration)
    truth = test_data.get_label()
    score = metric.mean_squared_error(truth, preds, squared=False)
    return score

def objective(trial):
    # Define the Hyper-parameter Space
    params = {'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.5),
              'max_depth': trial.suggest_int('max_depth', 1, 50),
              'num_leaves': trial.suggest_int('num_leaves', 2, 200),
              'NUM_BOOST_ROUND': 200,
              'EARLY_STOPPING_ROUNDS': 20,
              'objective': 'rmse',
              'verbose': -1,
              }
    
    score = train_evaluate(params)
    return score

search_space = {'learning_rate': [0.01, 0.10, 0.50],
              'max_depth': [1, 10, 20, 30],
              'num_leaves': [2, 10, 20, 100]}
study1 = optuna.create_study(
    direction='minimize',
    sampler=optuna.samplers.GridSampler(search_space)
    )
study1.optimize(objective, n_trials=N_TRIALS)

[I 2021-10-07 01:21:12,512] A new study created in memory with name: no-name-e2b93d56-ffbe-4cd1-8542-e01b4f5164df
[I 2021-10-07 01:21:14,180] Trial 0 finished with value: 0.4737538089007302 and parameters: {'learning_rate': 0.01, 'max_depth': 20, 'num_leaves': 100}. Best is trial 0 with value: 0.4737538089007302.
[I 2021-10-07 01:21:14,772] Trial 1 finished with value: 0.5502752208435101 and parameters: {'learning_rate': 0.01, 'max_depth': 20, 'num_leaves': 20}. Best is trial 0 with value: 0.4737538089007302.
[I 2021-10-07 01:21:15,968] Trial 2 finished with value: 0.402072317736397 and parameters: {'learning_rate': 0.1, 'max_depth': 30, 'num_leaves': 100}. Best is trial 2 with value: 0.402072317736397.
[I 2021-10-07 01:21:16,110] Trial 3 finished with value: 0.5988083394846583 and parameters: {'learning_rate': 0.1, 'max_depth': 1, 'num_leaves': 20}. Best is trial 2 with value: 0.402072317736397.
[I 2021-10-07 01:21:

CPU times: user 35.3 s, sys: 836 ms, total: 36.2 s
Wall time: 18.9 s


In [None]:
gridsearch = {'score': study1.best_value, 'params': study1.best_params}
print(gridsearch)

{'score': 0.4009600455057644, 'params': {'learning_rate': 0.1, 'max_depth': 10, 'num_leaves': 100}}
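
The grid above contains only 3 × 4 × 4 = 48 combinations, fewer than `N_TRIALS = 200`. To my understanding of the Optuna documentation, `GridSampler` stops the study once every combination has been evaluated, so the extra budget goes unused. The sketch below enumerates the grid with the standard library only; it is an illustration, not how Optuna builds its grid internally.

```python
from itertools import product

# Same search space as in the grid search cell
search_space = {'learning_rate': [0.01, 0.10, 0.50],
                'max_depth': [1, 10, 20, 30],
                'num_leaves': [2, 10, 20, 100]}

# Cartesian product of all candidate values, one dict per combination
grid = [dict(zip(search_space, values))
        for values in product(*search_space.values())]

print(len(grid))  # 3 * 4 * 4 = 48
```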


## **Random Search**

In [None]:
%%time
%%capture
def train_evaluate(params):
    train_data = lgb.Dataset(X_train, label=y_train)
    test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)
    # Train a Model
    model = lgb.train(params, train_data,
                      num_boost_round=params['NUM_BOOST_ROUND'],
                      early_stopping_rounds=params['EARLY_STOPPING_ROUNDS'],
                      valid_sets=[test_data],
                      valid_names=['valid'],
                      )
    # Evaluate the model 
    preds = model.predict(X_test,num_iteration=model.best_iteration)
    truth = test_data.get_label()
    score = metric.mean_squared_error(truth, preds, squared=False)
    # Return the validation score
    return score

def objective(trial):
    # Define the Hyper-parameter Space
    params = {'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 0.5),
              'max_depth': trial.suggest_int('max_depth', 1, 50),
              'num_leaves': trial.suggest_int('num_leaves', 2, 200),
              'feature_fraction': trial.suggest_uniform('feature_fraction', 0.1, 1.0),
              'subsample': trial.suggest_discrete_uniform('subsample', 0.1, 1.0, .1),
              'colsample_by_tree': 1,
              'lambda_l1': trial.suggest_float('lambda_l1', 0, 10),
              'lambda_l2': trial.suggest_float('lambda_l2', 0, 10),
              'bagging_fraction':trial.suggest_uniform('bagging_fraction', 0, 1),
              'bagging_freq':trial.suggest_int('bagging_freq', 0, 10),
              'NUM_BOOST_ROUND': 200,
              'EARLY_STOPPING_ROUNDS': 20,
              'objective': 'rmse',
              }
    # Train the model and return the validation score
    score = train_evaluate(params)

    # Return the validation score
    return score

study2 = optuna.create_study(
    direction = 'minimize',
    sampler = optuna.samplers.RandomSampler()
    )

study2.optimize(objective, n_trials=N_TRIALS)

[I 2021-10-07 01:21:31,498] A new study created in memory with name: no-name-d1eb4a82-e3e8-4e8c-9d10-eb7e5f1dafae
[I 2021-10-07 01:21:31,870] Trial 0 finished with value: 0.5120763365095197 and parameters: {'learning_rate': 0.02676860698231233, 'max_depth': 5, 'num_leaves': 148, 'feature_fraction': 0.33437769373274373, 'subsample': 0.30000000000000004, 'lambda_l1': 7.7580423677684465, 'lambda_l2': 2.5783070946178785, 'bagging_fraction': 0.9607039423632531, 'bagging_freq': 0}. Best is trial 0 with value: 0.5120763365095197.
[I 2021-10-07 01:21:32,374] Trial 1 finished with value: 0.4618767039039299 and parameters: {'learning_rate': 0.12270954718004036, 'max_depth': 17, 'num_leaves': 161, 'feature_fraction': 0.8508318097759415, 'subsample': 0.4, 'lambda_l1': 8.259095591833736, 'lambda_l2': 0.056614271026483864, 'bagging_fraction': 0.07423096430454479, 'bagging_freq': 7}. Best is trial 1 with value: 0.4618767039039299.
[I 2021-10-07 01:21:33,135

CPU times: user 3min 59s, sys: 8.08 s, total: 4min 7s
Wall time: 2min 15s


In [None]:
randomsearch = {'score': study2.best_value, 'params': study2.best_params}
print(randomsearch)

{'score': 0.3949193696028479, 'params': {'learning_rate': 0.09812167934287964, 'max_depth': 26, 'num_leaves': 118, 'feature_fraction': 0.8374528130365796, 'subsample': 1.0, 'lambda_l1': 1.4648167623075892, 'lambda_l2': 0.9579344812307833, 'bagging_fraction': 0.16990975490921567, 'bagging_freq': 0}}
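
The random search samples `learning_rate` on a log scale, which spends far more of the budget on small values than a plain uniform draw would. A minimal standard-library sketch of log-uniform sampling follows; `sample_loguniform` is a hypothetical helper and not Optuna's implementation.

```python
import math
import random

def sample_loguniform(low, high, rng):
    """Draw uniformly in log space, then map back with exp."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # seeded for repeatability
samples = [sample_loguniform(0.01, 0.5, rng) for _ in range(10_000)]

# Share of draws below 0.1: ln(10)/ln(50), roughly 0.59, in expectation,
# versus about 0.18 for a uniform draw on [0.01, 0.5]
share_below_01 = sum(s < 0.1 for s in samples) / len(samples)
print(round(share_below_01, 2))
```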


## **CMAES**

In [None]:
%%time
%%capture
def train_evaluate(params):
    train_data = lgb.Dataset(X_train, label=y_train)
    test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)
    # Train a Model
    model = lgb.train(params, train_data,
                      num_boost_round=params['NUM_BOOST_ROUND'],
                      early_stopping_rounds=params['EARLY_STOPPING_ROUNDS'],
                      valid_sets=[test_data],
                      valid_names=['valid'],
                      )
    # Evaluate the model 
    preds = model.predict(X_test,num_iteration=model.best_iteration)
    truth = test_data.get_label()
    score = metric.mean_squared_error(truth, preds, squared=False)
    # Return the validation score
    return score

def objective(trial):
    # Define the Hyper-parameter Space
    params = {'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 0.5),
              'max_depth': trial.suggest_int('max_depth', 1, 50),
              'num_leaves': trial.suggest_int('num_leaves', 2, 200),
              'feature_fraction': trial.suggest_uniform('feature_fraction', 0.1, 1.0),
              'subsample': trial.suggest_discrete_uniform('subsample', 0.1, 1.0, .1),
              'colsample_by_tree': 1,
              'lambda_l1': trial.suggest_float('lambda_l1', 0, 10),
              'lambda_l2': trial.suggest_float('lambda_l2', 0, 10),
              'bagging_fraction':trial.suggest_uniform('bagging_fraction', 0, 1),
              'bagging_freq':trial.suggest_int('bagging_freq',0,10),
              'NUM_BOOST_ROUND': 200,
              'EARLY_STOPPING_ROUNDS': 20,
              'objective': 'rmse',
              }
    # Train the model and return the validation score
    score = train_evaluate(params)
    
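    # Check pruning. Note: the score is reported only after training has
    # finished, so a pruned trial is discarded without saving any training time.
    trial.report(score, 200)
    if trial.should_prune():
        raise optuna.TrialPruned()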
    
    # Return the validation score
    return score

study3 = optuna.create_study(
    direction = 'minimize',
    sampler = optuna.samplers.CmaEsSampler()
    )

study3.optimize(objective, n_trials=N_TRIALS)

[I 2021-10-07 01:23:46,584] A new study created in memory with name: no-name-5bf6ecf1-28eb-402f-a136-2ba6c5c58e39
[I 2021-10-07 01:23:47,005] Trial 0 finished with value: 0.5655526032872678 and parameters: {'learning_rate': 0.02057851511553912, 'max_depth': 32, 'num_leaves': 181, 'feature_fraction': 0.6446725656399469, 'subsample': 0.9, 'lambda_l1': 9.773342467208998, 'lambda_l2': 2.9053839133184187, 'bagging_fraction': 0.04430321364823009, 'bagging_freq': 10}. Best is trial 0 with value: 0.5655526032872678.
[I 2021-10-07 01:23:48,046] Trial 1 finished with value: 0.416784488628391 and parameters: {'learning_rate': 0.06358947330854328, 'max_depth': 26, 'num_leaves': 101, 'feature_fraction': 0.44536891179013033, 'subsample': 0.9, 'lambda_l1': 4.818242676907324, 'lambda_l2': 5.031557248950617, 'bagging_fraction': 0.46010718442317455, 'bagging_freq': 5}. Best is trial 1 with value: 0.416784488628391.
[I 2021-10-07 01:23:49,153] Trial 2 finis

CPU times: user 8min, sys: 25.1 s, total: 8min 25s
Wall time: 4min 22s


In [None]:
cmaessearch = {'score': study3.best_value, 'params': study3.best_params}
print(cmaessearch)

{'score': 0.3961867047381666, 'params': {'learning_rate': 0.09775495158772657, 'max_depth': 27, 'num_leaves': 101, 'feature_fraction': 0.9401156742658524, 'subsample': 0.6, 'lambda_l1': 4.944264108252659, 'lambda_l2': 5.299814512899757, 'bagging_fraction': 0.9352250272022755, 'bagging_freq': 5}}


## **BOHB**

In [None]:
%%time
%%capture
def train_evaluate(params):
    train_data = lgb.Dataset(X_train, label=y_train)
    test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)
    # Train a Model
    model = lgb.train(params, train_data,
                      num_boost_round=params['NUM_BOOST_ROUND'],
                      early_stopping_rounds=params['EARLY_STOPPING_ROUNDS'],
                      valid_sets=[test_data],
                      valid_names=['valid'],
                      )
    # Evaluate the model 
    preds = model.predict(X_test,num_iteration=model.best_iteration)
    truth = test_data.get_label()
    score = metric.mean_squared_error(truth, preds, squared=False)
    # Return the validation score
    return score

def objective(trial):
    # Define the Hyper-parameter Space
    params = {'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 0.5),
              'max_depth': trial.suggest_int('max_depth', 1, 50),
              'num_leaves': trial.suggest_int('num_leaves', 2, 200),
              'feature_fraction': trial.suggest_uniform('feature_fraction', 0.1, 1.0),
              'subsample': trial.suggest_discrete_uniform('subsample', 0.1, 1.0, .1),
              'colsample_by_tree': 1,
              'lambda_l1': trial.suggest_float('lambda_l1', 0, 10),
              'lambda_l2': trial.suggest_float('lambda_l2', 0, 10),
              'bagging_fraction':trial.suggest_uniform('bagging_fraction', 0, 1),
              'bagging_freq':trial.suggest_int('bagging_freq',0,10),
              'NUM_BOOST_ROUND': 200,
              'EARLY_STOPPING_ROUNDS': 20,
              'objective': 'rmse',
              }
    # Train the model and return the validation score
    score = train_evaluate(params)
    
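    # Check pruning. Note: the score is reported only after training has
    # finished, so a pruned trial is discarded without saving any training time.
    trial.report(score, 200)
    if trial.should_prune():
        raise optuna.TrialPruned()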
    
    # Return the validation score
    return score

study4 = optuna.create_study(
    direction = 'minimize',
    sampler = optuna.samplers.TPESampler(),
    pruner = optuna.pruners.HyperbandPruner()
    )

study4.optimize(objective, n_trials=N_TRIALS)

[I 2021-10-07 01:28:47,316] A new study created in memory with name: no-name-926323c4-ae49-4777-b2e0-d27f778fa5a7
[I 2021-10-07 01:28:47,583] Trial 0 finished with value: 0.7282144152548657 and parameters: {'learning_rate': 0.020933708674259855, 'max_depth': 38, 'num_leaves': 44, 'feature_fraction': 0.994753117665759, 'subsample': 0.1, 'lambda_l1': 5.695641975735559, 'lambda_l2': 3.0028851474907716, 'bagging_fraction': 0.005327893455257748, 'bagging_freq': 8}. Best is trial 0 with value: 0.7282144152548657.
[I 2021-10-07 01:28:48,081] Trial 1 finished with value: 0.4521620775958458 and parameters: {'learning_rate': 0.12010621545467294, 'max_depth': 23, 'num_leaves': 91, 'feature_fraction': 0.5861256965609413, 'subsample': 0.6, 'lambda_l1': 9.965239547144773, 'lambda_l2': 4.718510941183589, 'bagging_fraction': 0.14917764261089583, 'bagging_freq': 6}. Best is trial 1 with value: 0.4521620775958458.
[I 2021-10-07 01:28:48,927] Trial 2 pruned

CPU times: user 7min 42s, sys: 14.8 s, total: 7min 57s
Wall time: 4min 11s


In [None]:
bohbsearch = {'score': study4.best_value, 'params': study4.best_params}
print(bohbsearch)

{'score': 0.3939383912104589, 'params': {'learning_rate': 0.03367031552814103, 'max_depth': 30, 'num_leaves': 173, 'feature_fraction': 0.7430356033606765, 'subsample': 0.4, 'lambda_l1': 0.8221561100753121, 'lambda_l2': 5.678242496254832, 'bagging_fraction': 0.8560807493513717, 'bagging_freq': 5}}
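
The Hyperband pruner used above builds on successive halving: many configurations start with a small budget, and only the best fraction moves on to a larger one. The sketch below shows a single successive-halving bracket in plain Python. It is a simplified illustration under stated assumptions, not Optuna's `HyperbandPruner`; `successive_halving`, `toy_loss`, and the budget values are hypothetical.

```python
import random

def successive_halving(configs, evaluate, min_budget=1, max_budget=27, eta=3):
    """Keep the best 1/eta of the configs at each rung and give the
    survivors eta times more budget, until one config or max_budget remains."""
    budget = min_budget
    survivors = list(configs)
    while True:
        # Lower loss is better, so the best configs come first
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        if budget >= max_budget or len(scored) == 1:
            return scored[0]
        survivors = scored[:max(1, len(scored) // eta)]
        budget *= eta

# Toy loss (an assumption): distance to 0.3 plus a term that shrinks with budget
def toy_loss(config, budget):
    return abs(config - 0.3) + 1.0 / budget

rng = random.Random(0)
configs = [rng.random() for _ in range(27)]
best = successive_halving(configs, toy_loss)
print(best)
```

With 27 starting configurations and `eta=3`, the rungs keep 9, then 3, then 1 configuration, so most of the budget is spent on the few candidates that looked promising early.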


In [None]:
pd.DataFrame([gridsearch['score'],randomsearch['score'],cmaessearch['score'],bohbsearch['score']],index=['Grid','Random','CMAES','BOHB'],columns=['RMSE'])

| | RMSE |
|---|---|
| Grid | 0.40096 |
| Random | 0.394919 |
| CMAES | 0.396187 |
| BOHB | 0.393938 |


## Further Topics

### **Visualization**

In [None]:
#History: 
trials_df = study4.trials_dataframe()
trials_df

,number,value,datetime_start,datetime_complete,duration,params_bagging_fraction,params_bagging_freq,params_feature_fraction,params_lambda_l1,params_lambda_l2,params_learning_rate,params_max_depth,params_num_leaves,params_subsample,system_attrs_completed_rung_0,system_attrs_completed_rung_1,system_attrs_completed_rung_2,system_attrs_completed_rung_3,system_attrs_completed_rung_4,state
0,0,0.728214,2021-10-07 01:28:47.320324,2021-10-07 01:28:47.582601,0 days 00:00:00.262277,0.005328,8,0.994753,5.695642,3.002885,0.020934,38,44,0.1,,,,,,COMPLETE
1,1,0.452162,2021-10-07 01:28:47.588875,2021-10-07 01:28:48.081290,0 days 00:00:00.492415,0.149178,6,0.586126,9.965240,4.718511,0.120106,23,91,0.6,0.452162,0.452162,0.452162,0.452162,,COMPLETE
2,2,0.552877,2021-10-07 01:28:48.086254,2021-10-07 01:28:48.927727,0 days 00:00:00.841473,0.669446,3,0.232559,1.667155,9.138452,0.092936,18,174,0.9,0.552877,,,,,PRUNED
3,3,0.458643,2021-10-07 01:28:48.932395,2021-10-07 01:28:50.284379,0 days 00:00:01.351984,0.851838,0,0.727590,1.763611,4.196262,0.011385,10,154,0.2,0.458643,0.458643,0.458643,,,COMPLETE
4,4,0.441069,2021-10-07 01:28:50.299204,2021-10-07 01:28:50.900319,0 days 00:00:00.601115,0.260925,9,0.712155,4.145490,4.941619,0.040973,6,123,0.2,0.441069,0.441069,0.441069,0.441069,0.441069,COMPLETE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,195,0.393938,2021-10-07 01:32:52.151528,2021-10-07 01:32:54.155205,0 days 00:00:02.003677,0.856081,5,0.743036,0.822156,5.678242,0.033670,30,173,0.4,0.393938,0.393938,0.393938,0.393938,0.393938,COMPLETE
196,196,0.405264,2021-10-07 01:32:54.158235,2021-10-07 01:32:55.607008,0 days 00:00:01.448773,0.896642,6,0.971414,8.308732,7.632739,0.147189,32,114,0.3,0.405264,0.405264,,,,PRUNED
197,197,0.414432,2021-10-07 01:32:55.609497,2021-10-07 01:32:56.179608,0 days 00:00:00.570111,0.959051,0,0.985552,1.628162,5.005721,0.345267,23,160,0.8,0.414432,,,,,PRUNED
198,198,0.405928,2021-10-07 01:32:56.182120,2021-10-07 01:32:57.094646,0 days 00:00:00.912526,0.888340,1,0.941228,2.921307,5.621442,0.261995,8,119,0.5,0.405928,0.405928,,,,PRUNED


In [None]:
optuna.visualization.plot_optimization_history(study4)
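
For a minimization study, the "best value" curve in an optimization history is the running minimum of the trial values. The sketch below computes it for the first five values of the trials table above, ignoring trial states; it is a plain-Python illustration and does not call Optuna's plotting code.

```python
from itertools import accumulate

# Objective values of trials 0-4, copied from the trials dataframe above
values = [0.728214, 0.452162, 0.552877, 0.458643, 0.441069]

# Running minimum: the best value seen so far after each trial
best_so_far = list(accumulate(values, min))
print(best_so_far)  # [0.728214, 0.452162, 0.452162, 0.452162, 0.441069]
```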

In [None]:
optuna.visualization.plot_param_importances(study4)

In [None]:
optuna.visualization.plot_slice(study4)

In [None]:
optuna.visualization.plot_parallel_coordinate(study4)