# Optuna: A hyperparameter optimization framework

### This is a modified public notebook from Kaggle showing how to use Optuna.
### I have modified it to do cross-validation within Optuna.

* [1. Basic Concepts](#chapter1)
* [2. Let's build our optimization function using optuna](#chapter2)
* [3. XGBoost using Optuna](#chapter3)
* [5. Submission](#chapter5)

* <h4> In this kernel I will use an amazing framework called <b>Optuna</b> to find the best hyperparameters for our XGBoost model </h4>

**Optuna is an automatic hyperparameter optimization framework designed for machine learning. It features an imperative, define-by-run user API.<br> Code written with Optuna is highly modular, and the user can construct the search spaces for the hyperparameters dynamically.**
* To learn more about Optuna, check this [link](https://optuna.org/)

MP: A good, quick introduction to Optuna: https://towardsdatascience.com/why-is-everyone-at-kaggle-obsessed-with-optuna-for-hyperparameter-tuning-7608fdca337c

# 1. Basic Concepts <a class="anchor" id="chapter1"></a>
We use the terms study and trial as follows:
* <b>Study</b> : optimization based on an objective function
* <b>Trial</b> : a single execution of the objective function

In [45]:
import optuna

from xgboost import XGBRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, RepeatedKFold
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import time, warnings
#from optuna.visualization.matplotlib import plot_param_importances

warnings.filterwarnings('ignore')

train = pd.read_csv('../input/tabular-playground-series-jan-2021/train.csv')
test  = pd.read_csv('../input/tabular-playground-series-jan-2021/test.csv')
sub = pd.read_csv('../input/tabular-playground-series-jan-2021/sample_submission.csv')

train.head()

print(train.shape)
columns = [col for col in train.columns.to_list() if col not in ['id','target']]

data=train[columns]
target=train['target']

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.15,random_state=2)

# 2. Let's build our optimization function using optuna <a class="anchor" id="chapter2"></a>

### The following optimization function uses an XGBRegressor model. It works with:
* the data (read from `X_train`)
* the target (read from `y_train`)
* trial (a single execution of the objective; Optuna passes it in and we use it to sample hyperparameters)
### and returns
* RMSE (Root Mean Squared Error)
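
For reference, RMSE is the square root of the mean of the squared differences between targets and predictions. A dependency-free sketch (the helper name `rmse` is just for this illustration):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse([7.0, 8.0, 9.0], [7.0, 8.0, 9.0]))   # perfect predictions -> 0.0
print(rmse([7.0, 8.0, 9.0], [8.0, 7.0, 10.0]))  # every prediction off by 1 -> 1.0
```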

## Notes:
* I used some XGBRegressor hyperparameters from the official XGBoost documentation.
* If you would like to add more parameters or change them, check this [link](https://xgboost.readthedocs.io/en/latest/parameter.html)
* I also used early_stopping_rounds to avoid overfitting
* To speed up training we can use the GPU. Without one, comment out the first two params entries (tree_method and gpu_id); training will take much longer on the CPU alone 😩
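
To illustrate the early-stopping note, here is a plain-Python sketch of the idea: training halts once the validation error has not improved for a given number of rounds. The function `best_round_with_patience` and the score list are hypothetical; XGBoost implements this internally.

```python
def best_round_with_patience(val_scores, patience):
    """Return (best_round, stop_round) for a list of per-round validation
    errors, stopping once `patience` rounds pass without improvement."""
    best_round, best_score = 0, float("inf")
    for rnd, score in enumerate(val_scores):
        if score < best_score:
            best_round, best_score = rnd, score
        elif rnd - best_round >= patience:
            return best_round, rnd  # stopped early
    return best_round, len(val_scores) - 1  # ran all rounds


# Made-up validation RMSE per boosting round: improves, then overfits.
scores = [0.80, 0.75, 0.72, 0.71, 0.715, 0.72, 0.73, 0.74]
print(best_round_with_patience(scores, patience=3))  # -> (3, 6)
```

The best round is 3 and training stops at round 6, so the last rounds of the made-up curve are never trained.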

# 3. XGBoost using Optuna <a class="anchor" id="chapter3"></a>

In [47]:
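def evaluate_model_rkf(model, X_df, y_df, n_splits=4, random_state=3):
    """Return the out-of-fold RMSE of `model` under K-fold cross-validation."""
    X_values = X_df.values
    y_values = y_df.values
    # random_state only takes effect with shuffle=True; recent scikit-learn
    # versions raise an error if it is set without shuffling
    rkf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    y_pred = np.zeros_like(y_values, dtype=float)
    for train_index, test_index in rkf.split(X_values):
        X_t, X_v = X_values[train_index, :], X_values[test_index, :]
        y_t = y_values[train_index]
        model.fit(X_t, y_t)
        # each row is predicted exactly once, by the model that did not train on it
        y_pred[test_index] = model.predict(X_v)
    # score against the targets that were passed in, not the global y_train
    return np.sqrt(mean_squared_error(y_values, y_pred))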


# First, try raw XGBoost to make sure that evaluate_model_rkf works:

model = XGBRegressor(tree_method = 'gpu_hist', gpu_id=0, max_depth=8, eta=0.03, n_estimators=200)
evaluate_model_rkf(model, X_train, y_train, n_splits=4, random_state=2)
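
The pattern in `evaluate_model_rkf` is out-of-fold prediction: every row is predicted by a model that never saw it during training, and the score is computed over all rows at once. A dependency-free sketch of that pattern follows; `contiguous_folds` and the mean predictor are hypothetical stand-ins for `KFold` and `XGBRegressor`.

```python
import math


def contiguous_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) lists, splitting the indices into
    n_splits contiguous blocks (no shuffling)."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
oof_pred = [None] * len(y)

for train_idx, test_idx in contiguous_folds(len(y), n_splits=4):
    # "Fit": the stand-in model just memorises the mean of the training targets.
    train_mean = sum(y[i] for i in train_idx) / len(train_idx)
    # "Predict": fill in only the held-out rows.
    for i in test_idx:
        oof_pred[i] = train_mean

oof_rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y, oof_pred)) / len(y))
print(oof_pred)
print(oof_rmse)
```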

In [65]:
def objective(trial, random_state=1, n_splits=4, n_jobs=-1, early_stopping_rounds=50):
    params = {
        "tree_method": 'gpu_hist',
        "gpu_id": 0,
        "verbosity": 0,  # 0 (silent) - 3 (debug)
        "objective": "reg:squarederror",
        "n_estimators": 300,
        "max_depth": trial.suggest_int("max_depth", 2, 12),
        "learning_rate": trial.suggest_uniform("learning_rate", 0.01, 0.08),
        "colsample_bytree": trial.suggest_uniform("colsample_bytree", 0.2, 1),
        "subsample": trial.suggest_uniform("subsample", 0.3, 1),
        "alpha": trial.suggest_loguniform("alpha", 0.01, 10.0),
        "lambda": trial.suggest_loguniform("lambda", 1e-8, 10.0),
        "gamma": trial.suggest_loguniform("gamma", 1e-8, 10.0),  # own name, so gamma is sampled independently of lambda
        "min_child_weight": trial.suggest_uniform("min_child_weight", 10, 1000),
        "seed": random_state,
        "n_jobs": n_jobs,
    }

    X = X_train
    y = y_train
    
    model = XGBRegressor(**params)
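    # shuffle=True is needed for random_state to take effect (recent scikit-learn raises otherwise)
    rkf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)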
    X_values = X.values
    y_values = y.values
    y_pred = np.zeros_like(y_values)
    for train_index, test_index in rkf.split(X_values):
        X_A, X_B = X_values[train_index, :], X_values[test_index, :]
        y_A, y_B = y_values[train_index], y_values[test_index]
        model.fit(X_A, y_A, eval_set=[(X_B, y_B)],
            eval_metric="rmse", early_stopping_rounds=early_stopping_rounds, verbose = False)
        y_pred[test_index] += model.predict(X_B)
    # np.sqrt(MSE) gives the same RMSE without relying on the squared= keyword
    return np.sqrt(mean_squared_error(y_values, y_pred))



time1 = time.time()
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print('Total time ', time.time()-time1)

# display params
hp = study.best_params
for key, value in hp.items():
    print(f"{key:>20s} : {value}")
print(f"{'best objective value':>20s} : {study.best_value}")

In [None]:
# compare timing with GridSearchCV on XGBoost:

time1 = time.time()
xgbb = XGBRegressor(tree_method='gpu_hist', gpu_id=0)
param_grid = {'n_estimators':[300], 'eta':[0.02, 0.04, 0.06], 'max_depth':[4,8,12]}
xgbm = GridSearchCV(xgbb, param_grid, cv=2, scoring='neg_root_mean_squared_error')
xgbm.fit(X_train, y_train)
print('XGB ', xgbm.best_params_, xgbm.best_score_, time.time()-time1)
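
When reading the two timings, it helps to count how many model fits each search performs. The sketch below uses only the settings that appear in the cells above (the grid and `cv=2` for GridSearchCV; 20 trials with 4 folds for Optuna).

```python
from itertools import product

# Grid from the GridSearchCV cell above.
param_grid = {"n_estimators": [300], "eta": [0.02, 0.04, 0.06], "max_depth": [4, 8, 12]}
grid_cv_folds = 2

combinations = list(product(*param_grid.values()))
# +1 because GridSearchCV refits the best configuration on all data by default
grid_fits = len(combinations) * grid_cv_folds + 1

# Optuna run from the study cell: 20 trials, 4 folds each.
optuna_trials, optuna_folds = 20, 4
optuna_fits = optuna_trials * optuna_folds

print(len(combinations), grid_fits, optuna_fits)  # -> 9 19 80
```

The Optuna study therefore trains roughly four times as many models as the grid search, so the wall-clock times are not a like-for-like comparison.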


In [61]:
optuna_xgb = XGBRegressor(tree_method = 'gpu_hist', gpu_id = 0,
                         n_estimators=300, max_depth=12, eta=0.04, subsample=0.35, colsample_bytree=0.55,
                         min_child_weight=200)
optuna_xgb.fit(X_train, y_train)
gs_xgb = xgbm

# RMSE on the held-out split, with arguments in (y_true, y_pred) order
print('Optuna model:', np.sqrt(mean_squared_error(y_test, optuna_xgb.predict(X_test))))
print('GS model:', np.sqrt(mean_squared_error(y_test, gs_xgb.predict(X_test))))

Optuna model: 0.6948667148516773
GS model: 0.6976312353994483


In [10]:
study.trials_dataframe()

| number | value | datetime_start | datetime_complete | duration | params_alpha | params_colsample_bytree | params_lambda | params_learning_rate | params_max_depth | params_min_child_weight | params_random_state | params_subsample | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.696114 | 2022-06-04 18:52:09.899173 | 2022-06-04 18:52:14.644990 | 0 days 00:00:04.745817 | 10 | 0.7 | 0.084875 | 0.04 | 10 | 336 | 2020 | 0.4 | COMPLETE |
| 1 | 0.6969 | 2022-06-04 18:52:14.646994 | 2022-06-04 18:52:17.375151 | 0 days 00:00:02.728157 | 10 | 0.4 | 0.075186 | 0.05 | 8 | 310 | 2020 | 0.4 | COMPLETE |
| 2 | 0.698246 | 2022-06-04 18:52:17.377366 | 2022-06-04 18:52:23.380972 | 0 days 00:00:06.003606 | 10 | 0.6 | 0.007114 | 0.02 | 10 | 257 | 2020 | 0.8 | COMPLETE |
| 3 | 0.697535 | 2022-06-04 18:52:23.385967 | 2022-06-04 18:52:26.452551 | 0 days 00:00:03.066584 | 10 | 0.5 | 0.00219 | 0.035 | 8 | 365 | 2020 | 0.8 | COMPLETE |
| 4 | 0.695742 | 2022-06-04 18:52:26.454154 | 2022-06-04 18:52:30.969949 | 0 days 00:00:04.515795 | 10 | 0.7 | 0.018284 | 0.05 | 10 | 319 | 2020 | 1.0 | COMPLETE |
| 5 | 0.696791 | 2022-06-04 18:52:30.971854 | 2022-06-04 18:52:36.393449 | 0 days 00:00:05.421595 | 10 | 0.8 | 0.351967 | 0.025 | 12 | 311 | 2020 | 0.5 | COMPLETE |
| 6 | 0.703829 | 2022-06-04 18:52:36.395216 | 2022-06-04 18:52:37.587388 | 0 days 00:00:01.192172 | 10 | 0.4 | 1.223062 | 0.05 | 4 | 388 | 2020 | 1.0 | COMPLETE |
| 7 | 0.698129 | 2022-06-04 18:52:37.590177 | 2022-06-04 18:52:40.767488 | 0 days 00:00:03.177311 | 10 | 0.7 | 0.002422 | 0.03 | 8 | 274 | 2020 | 1.0 | COMPLETE |
| 8 | 0.704032 | 2022-06-04 18:52:40.769124 | 2022-06-04 18:52:42.981718 | 0 days 00:00:02.212594 | 10 | 0.8 | 0.004568 | 0.02 | 6 | 281 | 2020 | 0.4 | COMPLETE |
| 9 | 0.700566 | 2022-06-04 18:52:42.983455 | 2022-06-04 18:52:46.098812 | 0 days 00:00:03.115357 | 10 | 0.5 | 0.065198 | 0.02 | 8 | 339 | 2020 | 0.6 | COMPLETE |


# Let's do some Quick Visualization for Hyperparameter Optimization Analysis
#### Optuna provides various visualization features in optuna.visualization to analyze optimization results visually

In [56]:
#plot_optimization_history: shows the scores from all trials as well as the best score so far at each point.
optuna.visualization.plot_optimization_history(study)

In [57]:
#plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study)

In [58]:
#plot_slice: shows the evolution of the search. You can see where in the hyperparameter space your search
#went and which parts of the space were explored more.
optuna.visualization.plot_slice(study)

In [36]:
#plot_contour: plots parameter interactions on an interactive chart. You can choose which hyperparameters you would like to explore.
optuna.visualization.plot_contour(study, params=['alpha',
                            #'max_depth',
                            'lambda',
                            'subsample',
                            'learning_rate'])

In [59]:
#Visualize parameter importances.
optuna.visualization.plot_param_importances(study)

In [38]:
#Visualize empirical distribution function
optuna.visualization.plot_edf(study)
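
The empirical distribution function answers the question "what fraction of trials reached an objective value of at most t?". A small sketch of the quantity being plotted, using the ten objective values from the trials table shown earlier in this notebook (the helper `edf` is only for illustration):

```python
# Objective values copied from the trials table shown earlier in this notebook.
values = [0.696114, 0.6969, 0.698246, 0.697535, 0.695742,
          0.696791, 0.703829, 0.698129, 0.704032, 0.700566]


def edf(samples, threshold):
    """Fraction of samples that are <= threshold."""
    return sum(v <= threshold for v in samples) / len(samples)


for threshold in (0.696, 0.698, 0.700, 0.705):
    print(threshold, edf(values, threshold))
```

For these values, half of the trials reach an RMSE of 0.698 or better, and only one trial gets below 0.696.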

### Try Optuna with cross-validation

See this: https://aetperf.github.io/2021/02/16/Optuna-+-XGBoost-on-a-tabular-dataset.html

# Let's create an XGBoostRegressor model with the best hyperparameters

In [None]:
Best_trial = study.best_trial.params
Best_trial["n_estimators"], Best_trial["tree_method"] = 10000, 'gpu_hist'
Best_trial
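
`study.best_trial.params` contains only the values that were sampled through `trial.suggest_*` calls. Settings that were fixed inside the objective are not stored there, which is why `n_estimators` and `tree_method` are added back by hand above. A sketch of that merge, with a made-up `sampled` dictionary standing in for the study result:

```python
# Hypothetical stand-in for study.best_trial.params (values are made up).
sampled = {"max_depth": 10, "learning_rate": 0.05, "subsample": 0.4}

# Settings that were fixed inside the objective and therefore never sampled.
fixed = {"tree_method": "gpu_hist", "gpu_id": 0,
         "objective": "reg:squarederror", "n_estimators": 10000}

# Later keys win, so a sampled value overrides a fixed default of the same name.
final_params = {**fixed, **sampled}
print(final_params)
```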

In [None]:
preds = np.zeros(test.shape[0])
kf = KFold(n_splits=5,random_state=48,shuffle=True)
rmse=[]  # list contains rmse for each fold
n=0
for trn_idx, test_idx in kf.split(train[columns],train['target']):
    X_tr,X_val=train[columns].iloc[trn_idx],train[columns].iloc[test_idx]
    y_tr,y_val=train['target'].iloc[trn_idx],train['target'].iloc[test_idx]
    model = XGBRegressor(**Best_trial)  # XGBRegressor is imported directly; there is no `xgb` alias in this notebook
    model.fit(X_tr,y_tr,eval_set=[(X_val,y_val)],early_stopping_rounds=100,verbose=False)
    preds+=model.predict(test[columns])/kf.n_splits
    rmse.append(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
    print(f"fold: {n+1} ==> rmse: {rmse[n]}")
    n+=1
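
The loop above averages the test predictions across folds by adding each fold's prediction divided by the number of folds. A small sketch with made-up per-fold predictions shows that this running sum equals the plain mean:

```python
# Made-up predictions for 3 test rows from 5 fold models.
fold_preds = [
    [7.9, 8.1, 7.5],
    [8.0, 8.0, 7.6],
    [8.1, 7.9, 7.4],
    [7.8, 8.2, 7.5],
    [8.2, 7.8, 7.5],
]
n_splits = len(fold_preds)

preds = [0.0] * 3
for fold in fold_preds:
    # same accumulation as `preds += model.predict(...) / kf.n_splits`
    preds = [acc + p / n_splits for acc, p in zip(preds, fold)]

plain_mean = [sum(col) / n_splits for col in zip(*fold_preds)]
print(preds)
print(plain_mean)
```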

In [None]:
np.mean(rmse)

# 5. Submission <a class="anchor" id="chapter5"></a>

In [None]:
sub['target']=preds
sub.to_csv('submission.csv', index=False)