# Optuna: A hyperparameter optimization framework

### This is modified public notebook from Kaggle, showing how to use Optuna.
### I have modified it to do cross-validation within Optuna.

* [1.Basic Concepts](#chapter1)
* [2. Let's build our optimization function using optuna](#chapter2)
* [3. XGBoost using Optuna](#chapter3)
* [5. Submission](#chapter5)

* <h4> In This Kernel I will use an amazing framework called <b>Optuna</b> to find the best hyparameters of our XGBoost and CatBoost </h4>

**So, Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.<br> The code written with Optuna enjoys high modularity, and the user of Optuna can dynamically construct the search spaces for the hyperparameters.** 
* To learn more about Optuna check this [link](https://optuna.org/)

MP: Good fast introduction to Optuna: https://towardsdatascience.com/why-is-everyone-at-kaggle-obsessed-with-optuna-for-hyperparameter-tuning-7608fdca337c

# 1. Basic Concepts <a class="anchor" id="chapter1"></a>
So, We use the terms study and trial as follows:
* <b>Study</b> : optimization based on an objective function
* <b>Trial</b> : a single execution of the objective function

In [1]:
#import optuna 
import optuna

from xgboost import XGBRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, KFold, RepeatedKFold
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import time, warnings
#from optuna.visualization.matplotlib import plot_param_importances

warnings.filterwarnings('ignore')

train = pd.read_csv('../input/tabular-playground-series-jan-2021/train.csv')
test  = pd.read_csv('../input/tabular-playground-series-jan-2021/test.csv')
sub = pd.read_csv('../input/tabular-playground-series-jan-2021/sample_submission.csv')

train.head()

print(train.shape)
columns = [col for col in train.columns.to_list() if col not in ['id','target']]

data=train[columns]
target=train['target']

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.15,random_state=4)

(300000, 16)


# 2. Let's build our optimization function using optuna <a class="anchor" id="chapter2"></a>

### The following optimization function uses XGBoostRegressor model, so it takes the following arguments:
* the data
* the target
* trial (How many executions we will do)  
### and returns
* RMSE (Root Mean Squared Rrror)

## Notes:
* Note that I used some XGBoostRegressor hyperparameters from Xgboost official site. 
* So if you like to add more parameters or change them, check this [link](https://xgboost.readthedocs.io/en/latest/parameter.html) 
* Also I used early_stopping_rounds to avoid overfiting
* to speedup the training process we can use the GPU or you can comment the first param argument (the training process will takes a lot of time by only using the cpu 😩) 

# 3. XGBoost using Optuna <a class="anchor" id="chapter3"></a>

In [2]:
def evaluate_model_rkf(model, X_df, y_df, n_splits=4, random_state=3):
    X_values = X_df.values
    y_values = y_df.values
    rkf = KFold(n_splits=n_splits, random_state=random_state)
    y_pred = np.zeros_like(y_values)
    for train_index, test_index in rkf.split(X_values):
        X_t, X_v = X_values[train_index, :], X_values[test_index, :]
        y_t = y_values[train_index]
        model.fit(
            X_t, y_t,
        )
        y_pred[test_index] += model.predict(X_v)
    y_pred
    return np.sqrt(mean_squared_error(y_train, y_pred))


# First, try raw XGBoost to make sure that evaluate_model_rkf works:

model = XGBRegressor(tree_method = 'gpu_hist', gpu_id=0, max_depth=8, eta=0.03, n_estimators=200)
evaluate_model_rkf(model, X_train, y_train, n_splits=4, random_state=2)

0.703764751179221

In [3]:
def objective(trial, random_state=1, n_splits=4, n_jobs=-1, early_stopping_rounds=50):
    params = {
        "tree_method": 'gpu_hist',
        "gpu_id": 0,
        "verbosity": 0,  # 0 (silent) - 3 (debug)
        "objective": "reg:squarederror",
        "n_estimators": 400,
        "max_depth": trial.suggest_int("max_depth", 2, 15),
        "learning_rate": trial.suggest_uniform("learning_rate", 0.01, 0.08),
        "colsample_bytree": trial.suggest_uniform("colsample_bytree", 0.2, 1),
        "subsample": trial.suggest_uniform("subsample", 0.3, 1),
        "alpha": trial.suggest_loguniform("alpha", 0.01, 10.0),
        "lambda": trial.suggest_loguniform("lambda", 1e-8, 10.0),
        "gamma": trial.suggest_loguniform("lambda", 1e-8, 10.0),
        "min_child_weight": trial.suggest_uniform("min_child_weight", 10, 1000),
        "seed": random_state,
        "n_jobs": n_jobs,
    }

    X = X_train
    y = y_train
    
    model = XGBRegressor(**params)
    rkf = KFold(n_splits=n_splits, random_state=random_state)
    X_values = X.values
    y_values = y.values
    y_pred = np.zeros_like(y_values)
    for train_index, test_index in rkf.split(X_values):
        X_A, X_B = X_values[train_index, :], X_values[test_index, :]
        y_A, y_B = y_values[train_index], y_values[test_index]
        model.fit(X_A, y_A, eval_set=[(X_B, y_B)],
            eval_metric="rmse", early_stopping_rounds=early_stopping_rounds, verbose = False)
        y_pred[test_index] += model.predict(X_B)
    return (mean_squared_error(y_train, y_pred, squared=False))



time1 = time.time()
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print('Total time ', time.time()-time1)

# display params
hp = study.best_params
for key, value in hp.items():
    print(f"{key:>20s} : {value}")
print(f"{'best objective value':>20s} : {study.best_value}")

[32m[I 2022-06-04 20:23:41,049][0m A new study created in memory with name: no-name-39ad46d8-a842-4e2a-971b-153ccc60183e[0m
[32m[I 2022-06-04 20:23:45,542][0m Trial 0 finished with value: 0.7122660786862711 and parameters: {'max_depth': 2, 'learning_rate': 0.05195085272826953, 'colsample_bytree': 0.47952588987255984, 'subsample': 0.5602922191370104, 'alpha': 0.21187056065297036, 'lambda': 0.00010545779009505479, 'min_child_weight': 195.47451947284634}. Best is trial 0 with value: 0.7122660786862711.[0m
[32m[I 2022-06-04 20:24:17,451][0m Trial 1 finished with value: 0.7017116049114462 and parameters: {'max_depth': 14, 'learning_rate': 0.07729284434572192, 'colsample_bytree': 0.5775365343700496, 'subsample': 0.8309369882567488, 'alpha': 0.12032459070247384, 'lambda': 0.00025089875473915946, 'min_child_weight': 55.88260673224424}. Best is trial 1 with value: 0.7017116049114462.[0m
[32m[I 2022-06-04 20:24:46,829][0m Trial 2 finished with value: 0.702229356941937 and parameters: 

Total time  441.1238269805908
           max_depth : 13
       learning_rate : 0.039802844302637114
    colsample_bytree : 0.6952356940618502
           subsample : 0.9164605438407302
               alpha : 0.037490266214095305
              lambda : 2.9193514677158114e-05
    min_child_weight : 730.7930557677373
best objective value : 0.6982600215100139


In [4]:
# compare timing with GridSearchCV on XGBoost:

time1 = time.time()
xgbb = XGBRegressor(tree_method='gpu_hist', gpu_id=0)
param_grid = {'n_estimators':[400], 'eta':[0.02, 0.04, 0.06], 'max_depth':[4,7,10,15]}
xgbm = GridSearchCV(xgbb, param_grid, cv=4, scoring='neg_root_mean_squared_error')
xgbm.fit(X_train, y_train)
print('XGB ', xgbm.best_params_, xgbm.best_score_, time.time()-time1)


XGB  {'eta': 0.06, 'max_depth': 7, 'n_estimators': 400} -0.7003756237179315 1632.807345867157


In [5]:
optuna_xgb = XGBRegressor(tree_method = 'gpu_hist', gpu_id = 0,
                         n_estimators=300, max_depth=12, eta=0.04, subsample=0.35, colsample_bytree=0.55,
                         min_child_weight=200)
optuna_xgb.fit(X_train, y_train)
gs_xgb = xgbm

print('Optuna model:', mean_squared_error(optuna_xgb.predict(X_test), y_test, squared = False))
print('GS model:', mean_squared_error(gs_xgb.predict(X_test), y_test, squared = False))

Optuna model: 0.6973884455545938
GS model: 0.6986591315057581


In [6]:
study.trials_dataframe()

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_alpha,params_colsample_bytree,params_lambda,params_learning_rate,params_max_depth,params_min_child_weight,params_subsample,state
0,0,0.712266,2022-06-04 20:23:41.053397,2022-06-04 20:23:45.541948,0 days 00:00:04.488551,0.211871,0.479526,0.0001054578,0.051951,2,195.474519,0.560292,COMPLETE
1,1,0.701712,2022-06-04 20:23:45.543298,2022-06-04 20:24:17.450749,0 days 00:00:31.907451,0.120325,0.577537,0.0002508988,0.077293,14,55.882607,0.830937,COMPLETE
2,2,0.702229,2022-06-04 20:24:17.452199,2022-06-04 20:24:46.829045,0 days 00:00:29.376846,0.044953,0.906941,7.538432e-06,0.077284,15,72.597651,0.751108,COMPLETE
3,3,0.69929,2022-06-04 20:24:46.830384,2022-06-04 20:25:06.181459,0 days 00:00:19.351075,1.430926,0.985106,5.250087e-06,0.062516,11,266.577237,0.615163,COMPLETE
4,4,0.709055,2022-06-04 20:25:06.196676,2022-06-04 20:25:12.086617,0 days 00:00:05.889941,8.422817,0.242845,0.003831639,0.048508,3,594.948634,0.505094,COMPLETE
5,5,0.702092,2022-06-04 20:25:12.087939,2022-06-04 20:25:34.162130,0 days 00:00:22.074191,0.974338,0.343585,0.0003399284,0.018212,8,185.608269,0.780009,COMPLETE
6,6,0.707587,2022-06-04 20:25:34.163573,2022-06-04 20:25:38.920749,0 days 00:00:04.757176,7.538362,0.647522,6.248142,0.064169,4,216.43184,0.903823,COMPLETE
7,7,0.704453,2022-06-04 20:25:38.922080,2022-06-04 20:26:24.730964,0 days 00:00:45.808884,3.87817,0.449901,2.393077e-06,0.011962,11,110.939403,0.47905,COMPLETE
8,8,0.700375,2022-06-04 20:26:24.732299,2022-06-04 20:26:36.131062,0 days 00:00:11.398763,0.098074,0.326429,0.4803113,0.074102,6,418.712588,0.635002,COMPLETE
9,9,0.699625,2022-06-04 20:26:36.132416,2022-06-04 20:26:50.377500,0 days 00:00:14.245084,0.013117,0.936703,6.245877e-08,0.072361,9,165.589576,0.587511,COMPLETE


# Let's do some Quick Visualization for Hyperparameter Optimization Analysis
#### Optuna provides various visualization features in optuna.visualization to analyze optimization results visually

In [7]:
#plot_optimization_histor: shows the scores from all trials as well as the best score so far at each point.
optuna.visualization.plot_optimization_history(study)

In [8]:
#plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study)

In [9]:
'''plot_slice: shows the evolution of the search. You can see where in the hyperparameter space your search
went and which parts of the space were explored more.'''
optuna.visualization.plot_slice(study)

In [10]:
#plot_contour: plots parameter interactions on an interactive chart. You can choose which hyperparameters you would like to explore.
optuna.visualization.plot_contour(study, params=['alpha',
                            #'max_depth',
                            'lambda',
                            'subsample',
                            'learning_rate',
                            'subsample'])

In [11]:
#Visualize parameter importances.
optuna.visualization.plot_param_importances(study)

In [12]:
#Visualize empirical distribution function
optuna.visualization.plot_edf(study)

### Optuna with cross-validation code was developed using this: https://aetperf.github.io/2021/02/16/Optuna-+-XGBoost-on-a-tabular-dataset.html

# Let's create an XGBoostRegressor model with the best hyperparameters

In [13]:
#Best_trial = study.best_trial.params
#Best_trial["n_estimators"], Best_trial["tree_method"] = 10000, 'gpu_hist'
#Best_trial

In [14]:
#preds = np.zeros(test.shape[0])
#kf = KFold(n_splits=5,random_state=48,shuffle=True)
#rmse=[]  # list contains rmse for each fold
#n=0
#for trn_idx, test_idx in kf.split(train[columns],train['target']):
#    X_tr,X_val=train[columns].iloc[trn_idx],train[columns].iloc[test_idx]
#    y_tr,y_val=train['target'].iloc[trn_idx],train['target'].iloc[test_idx]
#    model = xgb.XGBRegressor(**Best_trial)
#    model.fit(X_tr,y_tr,eval_set=[(X_val,y_val)],early_stopping_rounds=100,verbose=False)
#    preds+=model.predict(test[columns])/kf.n_splits
#    rmse.append(mean_squared_error(y_val, model.predict(X_val), squared=False))
#    print(f"fold: {n+1} ==> rmse: {rmse[n]}")
#    n+=1

In [15]:
#np.mean(rmse)

# 5. Submission <a class="anchor" id="chapter5"></a>

In [16]:
#sub['target']=preds
#sub.to_csv('submission.csv', index=False)