# Optuna: A hyperparameter optimization framework

### This is modified public notebook from Kaggle, showing how to use Optuna.
### I have modified it to do cross-validation within Optuna.

* [1.Basic Concepts](#chapter1)
* [2. Let's build our optimization function using optuna](#chapter2)
* [3. XGBoost using Optuna](#chapter3)
* [5. Submission](#chapter5)

* <h4> In This Kernel I will use an amazing framework called <b>Optuna</b> to find the best hyparameters of our XGBoost and CatBoost </h4>

**So, Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.<br> The code written with Optuna enjoys high modularity, and the user of Optuna can dynamically construct the search spaces for the hyperparameters.** 
* To learn more about Optuna check this [link](https://optuna.org/)

MP: Good fast introduction to Optuna: https://towardsdatascience.com/why-is-everyone-at-kaggle-obsessed-with-optuna-for-hyperparameter-tuning-7608fdca337c

# 1. Basic Concepts <a class="anchor" id="chapter1"></a>
So, We use the terms study and trial as follows:
* <b>Study</b> : optimization based on an objective function
* <b>Trial</b> : a single execution of the objective function

In [20]:
#import optuna 
import optuna

from xgboost import XGBRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, KFold, RepeatedKFold
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import time, warnings
#from optuna.visualization.matplotlib import plot_param_importances

warnings.filterwarnings('ignore')

train = pd.read_csv('../input/tabular-playground-series-jan-2021/train.csv')
test  = pd.read_csv('../input/tabular-playground-series-jan-2021/test.csv')
sub = pd.read_csv('../input/tabular-playground-series-jan-2021/sample_submission.csv')

train.head()

print(train.shape)
columns = [col for col in train.columns.to_list() if col not in ['id','target']]

data=train[columns]
target=train['target']

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.15,random_state=5)

(300000, 16)


# 2. Let's build our optimization function using optuna <a class="anchor" id="chapter2"></a>

### The following optimization function uses XGBoostRegressor model, so it takes the following arguments:
* the data
* the target
* trial (How many executions we will do)  
### and returns
* RMSE (Root Mean Squared Rrror)

## Notes:
* Note that I used some XGBoostRegressor hyperparameters from Xgboost official site. 
* So if you like to add more parameters or change them, check this [link](https://xgboost.readthedocs.io/en/latest/parameter.html) 
* Also I used early_stopping_rounds to avoid overfiting
* to speedup the training process we can use the GPU or you can comment the first param argument (the training process will takes a lot of time by only using the cpu 😩) 

# 3. XGBoost using Optuna <a class="anchor" id="chapter3"></a>

In [21]:
def evaluate_model_rkf(model, X_df, y_df, n_splits=4, random_state=3):
    X_values = X_df.values
    y_values = y_df.values
    rkf = KFold(n_splits=n_splits, random_state=random_state)
    y_pred = np.zeros_like(y_values)
    for train_index, test_index in rkf.split(X_values):
        X_t, X_v = X_values[train_index, :], X_values[test_index, :]
        y_t = y_values[train_index]
        model.fit(
            X_t, y_t,
        )
        y_pred[test_index] += model.predict(X_v)
    y_pred
    return np.sqrt(mean_squared_error(y_train, y_pred))


# First, try raw XGBoost to make sure that evaluate_model_rkf works:

model = XGBRegressor(tree_method = 'gpu_hist', gpu_id=0, max_depth=8, eta=0.03, n_estimators=200)
evaluate_model_rkf(model, X_train, y_train, n_splits=4, random_state=2)

0.7038075728686064

In [22]:
def objective(trial, random_state=1, n_splits=4, n_jobs=-1, early_stopping_rounds=50):
    params = {
        "tree_method": 'gpu_hist',
        "gpu_id": 0,
        "verbosity": 0,  # 0 (silent) - 3 (debug)
        "objective": "reg:squarederror",
        "n_estimators": 300,
        "max_depth": trial.suggest_int("max_depth", 2, 15),
        "learning_rate": trial.suggest_uniform("learning_rate", 0.01, 0.09),
        "colsample_bytree": trial.suggest_uniform("colsample_bytree", 0.2, 1),
        "subsample": trial.suggest_uniform("subsample", 0.3, 1),
        "alpha": trial.suggest_loguniform("alpha", 0.01, 10.0),
        "lambda": trial.suggest_loguniform("lambda", 1e-8, 10.0),
        "gamma": trial.suggest_loguniform("lambda", 1e-8, 10.0),
        "min_child_weight": trial.suggest_uniform("min_child_weight", 10, 1000),
        "seed": random_state,
        "n_jobs": n_jobs,
    }

    X = X_train
    y = y_train
    
    model = XGBRegressor(**params)
    rkf = KFold(n_splits=n_splits, random_state=random_state)
    X_values = X.values
    y_values = y.values
    y_pred = np.zeros_like(y_values)
    for train_index, test_index in rkf.split(X_values):
        X_A, X_B = X_values[train_index, :], X_values[test_index, :]
        y_A, y_B = y_values[train_index], y_values[test_index]
        model.fit(X_A, y_A, eval_set=[(X_B, y_B)],
            eval_metric="rmse", early_stopping_rounds=early_stopping_rounds, verbose = False)
        y_pred[test_index] += model.predict(X_B)
    return (mean_squared_error(y_train, y_pred, squared=False))



time1 = time.time()
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print('Total time ', time.time()-time1)

# display params
hp = study.best_params
for key, value in hp.items():
    print(f"{key:>20s} : {value}")
print(f"{'best objective value':>20s} : {study.best_value}")

[32m[I 2022-06-04 21:13:51,134][0m A new study created in memory with name: no-name-0ffaf289-958d-46c3-93dd-a32867d76004[0m
[32m[I 2022-06-04 21:14:10,541][0m Trial 0 finished with value: 0.6994663963926937 and parameters: {'max_depth': 9, 'learning_rate': 0.06482032108086883, 'colsample_bytree': 0.5569373394796577, 'subsample': 0.5011762194874223, 'alpha': 0.05968826543208823, 'lambda': 0.0006774184815367551, 'min_child_weight': 324.67143644693823}. Best is trial 0 with value: 0.6994663963926937.[0m
[32m[I 2022-06-04 21:14:35,421][0m Trial 1 finished with value: 0.7016600053807593 and parameters: {'max_depth': 12, 'learning_rate': 0.021797214345854066, 'colsample_bytree': 0.9413363521767146, 'subsample': 0.7554959619175221, 'alpha': 0.5035967548008482, 'lambda': 4.682324020403498e-07, 'min_child_weight': 873.6215392715981}. Best is trial 0 with value: 0.6994663963926937.[0m
[32m[I 2022-06-04 21:15:09,459][0m Trial 2 finished with value: 0.699116020640101 and parameters: {'m

Total time  641.1749036312103
           max_depth : 15
       learning_rate : 0.04643603962788352
    colsample_bytree : 0.3715574236018544
           subsample : 0.9866775967772321
               alpha : 0.14814617354838014
              lambda : 0.21918711620744202
    min_child_weight : 194.64263571322925
best objective value : 0.6984142427154156


In [23]:
# compare timing with GridSearchCV on XGBoost:

time1 = time.time()
xgbb = XGBRegressor(tree_method='gpu_hist', gpu_id=0)
param_grid = {'n_estimators':[300], 'eta':[0.02, 0.04, 0.06], 'max_depth':[4,7,10,12,15]}
xgbm = GridSearchCV(xgbb, param_grid, cv=4, scoring='neg_root_mean_squared_error')
xgbm.fit(X_train, y_train)
print('XGB ', xgbm.best_params_, xgbm.best_score_, time.time()-time1)


XGB  {'eta': 0.06, 'max_depth': 7, 'n_estimators': 300} -0.7009206350927327 1650.7947299480438


In [24]:
optuna_hyperpars = study.best_params
optuna_hyperpars['tree_method']='gpu_hist'
optuna_hyperpars['gpu_id']=0
optuna_hyperpars['n_estimators']=300
#optuna_hyperpars
optuna_xgb = XGBRegressor(**optuna_hyperpars)
optuna_xgb.fit(X_train, y_train)

print('Optuna model:', mean_squared_error(optuna_xgb.predict(X_test), y_test, squared = False))
print('GS model:', mean_squared_error(xgbm.predict(X_test), y_test, squared = False))

Optuna model: 0.6962171161578563
GS model: 0.6986172860276971


In [25]:
study.trials_dataframe()

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_alpha,params_colsample_bytree,params_lambda,params_learning_rate,params_max_depth,params_min_child_weight,params_subsample,state
0,0,0.699466,2022-06-04 21:13:51.139114,2022-06-04 21:14:10.541222,0 days 00:00:19.402108,0.059688,0.556937,0.0006774185,0.06482,9,324.671436,0.501176,COMPLETE
1,1,0.70166,2022-06-04 21:14:10.543684,2022-06-04 21:14:35.421340,0 days 00:00:24.877656,0.503597,0.941336,4.682324e-07,0.021797,12,873.621539,0.755496,COMPLETE
2,2,0.699116,2022-06-04 21:14:35.423370,2022-06-04 21:15:09.459279,0 days 00:00:34.035909,0.136678,0.906234,2.38148e-05,0.043532,15,343.954587,0.448787,COMPLETE
3,3,0.727395,2022-06-04 21:15:09.460925,2022-06-04 21:15:40.634793,0 days 00:00:31.173868,1.517065,0.308176,2.845794e-08,0.012461,15,916.551701,0.793155,COMPLETE
4,4,0.712364,2022-06-04 21:15:40.636573,2022-06-04 21:16:19.153961,0 days 00:00:38.517388,8.01723,0.60095,8.020958e-05,0.014344,11,72.13104,0.305627,COMPLETE
5,5,0.707849,2022-06-04 21:16:19.155884,2022-06-04 21:16:24.792801,0 days 00:00:05.636917,6.21319,0.863639,9.309031,0.043665,12,515.200482,0.489237,COMPLETE
6,6,0.701183,2022-06-04 21:16:24.795029,2022-06-04 21:16:36.771666,0 days 00:00:11.976637,8.867475,0.642956,2.159479e-07,0.063582,7,669.913161,0.337164,COMPLETE
7,7,0.699906,2022-06-04 21:16:36.773475,2022-06-04 21:16:55.598960,0 days 00:00:18.825485,1.244505,0.830288,2.300949e-07,0.085133,13,959.788039,0.492521,COMPLETE
8,8,0.700769,2022-06-04 21:16:55.600689,2022-06-04 21:17:16.107737,0 days 00:00:20.507048,0.073326,0.605702,2.130845,0.031272,13,774.319439,0.648789,COMPLETE
9,9,0.704992,2022-06-04 21:17:16.109789,2022-06-04 21:17:25.448871,0 days 00:00:09.339082,0.259194,0.286014,5.292791,0.039773,9,309.609521,0.663777,COMPLETE


# Let's do some Quick Visualization for Hyperparameter Optimization Analysis
#### Optuna provides various visualization features in optuna.visualization to analyze optimization results visually

In [26]:
#plot_optimization_histor: shows the scores from all trials as well as the best score so far at each point.
optuna.visualization.plot_optimization_history(study)

In [27]:
#plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study)

In [28]:
'''plot_slice: shows the evolution of the search. You can see where in the hyperparameter space your search
went and which parts of the space were explored more.'''
optuna.visualization.plot_slice(study)

In [29]:
#plot_contour: plots parameter interactions on an interactive chart. You can choose which hyperparameters you would like to explore.
optuna.visualization.plot_contour(study, params=['alpha',
                            #'max_depth',
                            'lambda',
                            'subsample',
                            'learning_rate',
                            'subsample'])

In [30]:
#Visualize parameter importances.
optuna.visualization.plot_param_importances(study)

In [31]:
#Visualize empirical distribution function
optuna.visualization.plot_edf(study)

### Optuna with cross-validation code was developed using this: https://aetperf.github.io/2021/02/16/Optuna-+-XGBoost-on-a-tabular-dataset.html

# Let's create an XGBoostRegressor model with the best hyperparameters

In [32]:
#Best_trial = study.best_trial.params
#Best_trial["n_estimators"], Best_trial["tree_method"] = 10000, 'gpu_hist'
#Best_trial

In [33]:
#preds = np.zeros(test.shape[0])
#kf = KFold(n_splits=5,random_state=48,shuffle=True)
#rmse=[]  # list contains rmse for each fold
#n=0
#for trn_idx, test_idx in kf.split(train[columns],train['target']):
#    X_tr,X_val=train[columns].iloc[trn_idx],train[columns].iloc[test_idx]
#    y_tr,y_val=train['target'].iloc[trn_idx],train['target'].iloc[test_idx]
#    model = xgb.XGBRegressor(**Best_trial)
#    model.fit(X_tr,y_tr,eval_set=[(X_val,y_val)],early_stopping_rounds=100,verbose=False)
#    preds+=model.predict(test[columns])/kf.n_splits
#    rmse.append(mean_squared_error(y_val, model.predict(X_val), squared=False))
#    print(f"fold: {n+1} ==> rmse: {rmse[n]}")
#    n+=1

In [34]:
#np.mean(rmse)

# 5. Submission <a class="anchor" id="chapter5"></a>

In [35]:
#sub['target']=preds
#sub.to_csv('submission.csv', index=False)