# Optuna: A hyperparameter optimization framework

* [1.Basic Concepts](#chapter1)
* [2. Let's build our optimization function using optuna](#chapter2)
* [3. XGBoost using Optuna](#chapter3)
* [4. CatBoost using Optuna](#chapter4)
* [5. Submission](#chapter5)

* <h4> In This Kernel I will use an amazing framework called <b>Optuna</b> to find the best hyparameters of our XGBoost and CatBoost </h4>

**So, Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.<br> The code written with Optuna enjoys high modularity, and the user of Optuna can dynamically construct the search spaces for the hyperparameters.** 
* To learn more about Optuna check this [link](https://optuna.org/)

MP: Good fast introduction to Optuna: https://towardsdatascience.com/why-is-everyone-at-kaggle-obsessed-with-optuna-for-hyperparameter-tuning-7608fdca337c

# 1. Basic Concepts <a class="anchor" id="chapter1"></a>
So, We use the terms study and trial as follows:
* <b>Study</b> : optimization based on an objective function
* <b>Trial</b> : a single execution of the objective function

In [18]:
#import optuna 
import optuna

from xgboost import XGBRegressor
from catboost import CatBoostRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, KFold, RepeatedKFold
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import time, warnings
#from optuna.visualization.matplotlib import plot_param_importances

warnings.filterwarnings('ignore')

In [3]:
train = pd.read_csv('../input/tabular-playground-series-jan-2021/train.csv')
test  = pd.read_csv('../input/tabular-playground-series-jan-2021/test.csv')
sub = pd.read_csv('../input/tabular-playground-series-jan-2021/sample_submission.csv')

train.head()

Unnamed: 0,id,cont1,cont2,cont3,cont4,cont5,cont6,cont7,cont8,cont9,cont10,cont11,cont12,cont13,cont14,target
0,1,0.67039,0.8113,0.643968,0.291791,0.284117,0.855953,0.8907,0.285542,0.558245,0.779418,0.921832,0.866772,0.878733,0.305411,7.243043
1,3,0.388053,0.621104,0.686102,0.501149,0.64379,0.449805,0.510824,0.580748,0.418335,0.432632,0.439872,0.434971,0.369957,0.369484,8.203331
2,4,0.83495,0.227436,0.301584,0.293408,0.606839,0.829175,0.506143,0.558771,0.587603,0.823312,0.567007,0.677708,0.882938,0.303047,7.776091
3,5,0.820708,0.160155,0.546887,0.726104,0.282444,0.785108,0.752758,0.823267,0.574466,0.580843,0.769594,0.818143,0.914281,0.279528,6.957716
4,8,0.935278,0.421235,0.303801,0.880214,0.66561,0.830131,0.487113,0.604157,0.874658,0.863427,0.983575,0.900464,0.935918,0.435772,7.951046


In [4]:
print(train.shape)
columns = [col for col in train.columns.to_list() if col not in ['id','target']]

data=train[columns]
target=train['target']

(300000, 16)


# 2. Let's build our optimization function using optuna <a class="anchor" id="chapter2"></a>

### The following optimization function uses XGBoostRegressor model, so it takes the following arguments:
* the data
* the target
* trial (How many executions we will do)  
### and returns
* RMSE (Root Mean Squared Rrror)

## Notes:
* Note that I used some XGBoostRegressor hyperparameters from Xgboost official site. 
* So if you like to add more parameters or change them, check this [link](https://xgboost.readthedocs.io/en/latest/parameter.html) 
* Also I used early_stopping_rounds to avoid overfiting
* to speedup the training process we can use the GPU or you can comment the first param argument (the training process will takes a lot of time by only using the cpu 😩) 

# 3. XGBoost using Optuna <a class="anchor" id="chapter3"></a>

In [5]:
def objective(trial,data=data,target=target):
    
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.15,random_state=42)
    param = {
        'tree_method':'gpu_hist',  # this means using the GPU to speedup the training process
        'lambda': trial.suggest_loguniform('lambda', 1e-3, 10.0),
        'alpha': trial.suggest_int('alpha', 10, 10),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.4,0.5,0.6,0.7,0.8]),
        'subsample': trial.suggest_categorical('subsample', [0.3,0.4,0.5,0.6,0.7,0.8,1]),
        'learning_rate': trial.suggest_categorical('learning_rate', 
                        [0.02,0.025,0.03,0.035,0.04,0.05]),
        'n_estimators': 300,
        'max_depth': trial.suggest_categorical('max_depth', [4,6,8,10,12]),
        'random_state': trial.suggest_categorical('random_state', [2020]),
        'min_child_weight': trial.suggest_int('min_child_weight', 200, 400),
    }
    model = XGBRegressor(**param)  
    
    model.fit(train_x,train_y,eval_set=[(test_x,test_y)],early_stopping_rounds=50,verbose=False)
    
    preds = model.predict(test_x)
    
    rmse = mean_squared_error(test_y, preds,squared=False)
    
    return rmse

## All thing is ready So let's start 🏄‍
* Note that the objective of our fuction is to minimize the RMSE that's why I set <b>direction='minimize'</b>
* you can modify n_trials (number of executions) 

In [6]:
time1 = time.time()
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=10)
print('Number of finished trials:', len(study.trials))
print('Best trial:', study.best_trial.params)
print('Time to run Optuna: ', time.time()-time1)

[32m[I 2022-06-04 18:52:09,896][0m A new study created in memory with name: no-name-d310499e-ecc5-4cb4-9f84-775d493b327b[0m
[32m[I 2022-06-04 18:52:14,645][0m Trial 0 finished with value: 0.6961135401621487 and parameters: {'lambda': 0.08487537304641818, 'alpha': 10, 'colsample_bytree': 0.7, 'subsample': 0.4, 'learning_rate': 0.04, 'max_depth': 10, 'random_state': 2020, 'min_child_weight': 336}. Best is trial 0 with value: 0.6961135401621487.[0m
[32m[I 2022-06-04 18:52:17,375][0m Trial 1 finished with value: 0.6969003155166433 and parameters: {'lambda': 0.07518624098765248, 'alpha': 10, 'colsample_bytree': 0.4, 'subsample': 0.4, 'learning_rate': 0.05, 'max_depth': 8, 'random_state': 2020, 'min_child_weight': 310}. Best is trial 0 with value: 0.6961135401621487.[0m
[32m[I 2022-06-04 18:52:23,381][0m Trial 2 finished with value: 0.6982460013412469 and parameters: {'lambda': 0.007113965054491007, 'alpha': 10, 'colsample_bytree': 0.6, 'subsample': 0.8, 'learning_rate': 0.02, 'ma

Number of finished trials: 10
Best trial: {'lambda': 0.0182844415097029, 'alpha': 10, 'colsample_bytree': 0.7, 'subsample': 1, 'learning_rate': 0.05, 'max_depth': 10, 'random_state': 2020, 'min_child_weight': 319}
Time to run Optuna:  36.211008071899414


In [7]:
# compare timing with GridSearchCV on XGBoost:

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.15,random_state=42)

#time1 = time.time()
#xgbb = xgb.XGBRegressor(tree_method='gpu_hist', gpu_id=0)
#param_grid = {'n_estimators':[1000], 'eta':[0.008, 0.012, 0.016], 'max_depth':[4,6]}
#xgbm = GridSearchCV(xgbb, param_grid, cv=2, scoring='neg_root_mean_squared_error')
#xgbm.fit(temptrain_x, temptrain_y)
#print('XGB ', xgbm.best_params_, xgbm.best_score_, time.time()-time1)


In [8]:
kf = KFold(n_splits=4, shuffle=False, random_state=2)
for train_idx, valid_idx in kf.split(X_train, y_train):
    print(train_idx, valid_idx)

[ 63750  63751  63752 ... 254997 254998 254999] [    0     1     2 ... 63747 63748 63749]
[     0      1      2 ... 254997 254998 254999] [ 63750  63751  63752 ... 127497 127498 127499]
[     0      1      2 ... 254997 254998 254999] [127500 127501 127502 ... 191247 191248 191249]
[     0      1      2 ... 191247 191248 191249] [191250 191251 191252 ... 254997 254998 254999]




In [9]:
print(X_train.shape, y_train.shape)

(255000, 14) (255000,)


In [10]:
study.trials_dataframe()

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_alpha,params_colsample_bytree,params_lambda,params_learning_rate,params_max_depth,params_min_child_weight,params_random_state,params_subsample,state
0,0,0.696114,2022-06-04 18:52:09.899173,2022-06-04 18:52:14.644990,0 days 00:00:04.745817,10,0.7,0.084875,0.04,10,336,2020,0.4,COMPLETE
1,1,0.6969,2022-06-04 18:52:14.646994,2022-06-04 18:52:17.375151,0 days 00:00:02.728157,10,0.4,0.075186,0.05,8,310,2020,0.4,COMPLETE
2,2,0.698246,2022-06-04 18:52:17.377366,2022-06-04 18:52:23.380972,0 days 00:00:06.003606,10,0.6,0.007114,0.02,10,257,2020,0.8,COMPLETE
3,3,0.697535,2022-06-04 18:52:23.385967,2022-06-04 18:52:26.452551,0 days 00:00:03.066584,10,0.5,0.00219,0.035,8,365,2020,0.8,COMPLETE
4,4,0.695742,2022-06-04 18:52:26.454154,2022-06-04 18:52:30.969949,0 days 00:00:04.515795,10,0.7,0.018284,0.05,10,319,2020,1.0,COMPLETE
5,5,0.696791,2022-06-04 18:52:30.971854,2022-06-04 18:52:36.393449,0 days 00:00:05.421595,10,0.8,0.351967,0.025,12,311,2020,0.5,COMPLETE
6,6,0.703829,2022-06-04 18:52:36.395216,2022-06-04 18:52:37.587388,0 days 00:00:01.192172,10,0.4,1.223062,0.05,4,388,2020,1.0,COMPLETE
7,7,0.698129,2022-06-04 18:52:37.590177,2022-06-04 18:52:40.767488,0 days 00:00:03.177311,10,0.7,0.002422,0.03,8,274,2020,1.0,COMPLETE
8,8,0.704032,2022-06-04 18:52:40.769124,2022-06-04 18:52:42.981718,0 days 00:00:02.212594,10,0.8,0.004568,0.02,6,281,2020,0.4,COMPLETE
9,9,0.700566,2022-06-04 18:52:42.983455,2022-06-04 18:52:46.098812,0 days 00:00:03.115357,10,0.5,0.065198,0.02,8,339,2020,0.6,COMPLETE


# Let's do some Quick Visualization for Hyperparameter Optimization Analysis
#### Optuna provides various visualization features in optuna.visualization to analyze optimization results visually

In [46]:
#plot_optimization_histor: shows the scores from all trials as well as the best score so far at each point.
optuna.visualization.plot_optimization_history(study)

In [47]:
#plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study)

In [48]:
'''plot_slice: shows the evolution of the search. You can see where in the hyperparameter space your search
went and which parts of the space were explored more.'''
optuna.visualization.plot_slice(study)

In [49]:
#plot_contour: plots parameter interactions on an interactive chart. You can choose which hyperparameters you would like to explore.
optuna.visualization.plot_contour(study, params=['alpha',
                            #'max_depth',
                            'lambda',
                            'subsample',
                            'learning_rate',
                            'subsample'])

[33m[W 2022-06-04 15:04:19,979][0m Param alpha unique value length is less than 2.[0m
[33m[W 2022-06-04 15:04:19,999][0m Param alpha unique value length is less than 2.[0m
[33m[W 2022-06-04 15:04:20,013][0m Param alpha unique value length is less than 2.[0m
[33m[W 2022-06-04 15:04:20,036][0m Param alpha unique value length is less than 2.[0m
[33m[W 2022-06-04 15:04:20,111][0m Param alpha unique value length is less than 2.[0m
[33m[W 2022-06-04 15:04:20,187][0m Param alpha unique value length is less than 2.[0m


In [16]:
#Visualize parameter importances.
optuna.visualization.plot_param_importances(study)

In [51]:
#Visualize empirical distribution function
optuna.visualization.plot_edf(study)

### Scikit-learn way to run Optuna

It seems to be too raw to be reliable...

In [28]:
X_train

Unnamed: 0,cont1,cont2,cont3,cont4,cont5,cont6,cont7,cont8,cont9,cont10,cont11,cont12,cont13,cont14
54807,0.271308,0.619347,0.580420,0.245118,0.998111,0.346283,0.433870,0.399867,0.112511,0.339100,0.347212,0.306502,0.386577,0.852761
214026,0.256430,0.422118,0.669676,0.835266,0.320534,0.345452,0.436800,0.662903,0.266653,0.244176,0.209488,0.293948,0.310545,0.291418
265476,0.904672,0.277264,0.150361,0.468047,0.440187,0.968863,0.496451,0.525797,0.944639,0.849013,0.690002,0.851408,0.908065,0.388172
76732,0.499994,0.786427,0.770304,0.271402,0.281790,0.671330,0.745339,0.317400,0.517296,0.489400,0.729259,0.714750,0.787323,0.702515
143667,0.475216,0.800231,0.672264,0.317738,0.731492,0.746349,0.548399,0.403514,0.423718,0.814599,0.730931,0.740659,0.751890,0.596217
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119879,0.602387,0.739987,0.362214,0.636481,0.920860,0.945076,0.850839,0.834863,0.576828,0.753136,0.961337,0.738973,0.766339,0.690419
259178,0.347795,0.553133,0.551271,0.318879,0.281284,0.346903,0.496220,0.346433,0.143852,0.393676,0.395426,0.377345,0.640165,0.604640
131932,0.321901,0.737304,0.889965,0.277251,0.282989,0.394406,0.539155,0.452907,0.211294,0.495081,0.803394,0.545659,0.279963,0.824950
146867,0.617149,0.811662,0.310680,0.618253,0.516217,0.961608,0.990442,0.872274,0.578402,0.845213,0.810798,0.642220,0.905399,0.291935


In [17]:
time1 = time.time()
xgbb = xgb.XGBRegressor(tree_method='gpu_hist', gpu_id=0, alpha=10, n_estimators=500)
param_distributions = {
        'lambda': optuna.distributions.LogUniformDistribution(1e-3, 10.0),
        'colsample_bytree': optuna.distributions.CategoricalDistribution([0.3,0.4,0.5,0.6,0.7,0.8,1]),
        'subsample': optuna.distributions.CategoricalDistribution([0.3,0.4,0.5,0.6,0.7,0.8,1]),
        'learning_rate': optuna.distributions.CategoricalDistribution(
                        [0.014,0.016,0.018,0.02,0.025,0.03,0.035,0.04,0.05]),
        'max_depth': optuna.distributions.CategoricalDistribution([4,6,8,10,12,15,18,21]),
        'random_state': optuna.distributions.CategoricalDistribution([2020]),
        'min_child_weight': optuna.distributions.UniformDistribution(200, 400),
    }
optuna_search = optuna.integration.OptunaSearchCV(xgbb, param_distributions)
optuna_search.fit(temptrain_x, temptrain_y)
print(time1 - time.time())

-0.006150007247924805


  del sys.path[0]


### Try Optuna with cross-validation

see this: https://aetperf.github.io/2021/02/16/Optuna-+-XGBoost-on-a-tabular-dataset.html

In [11]:
def evaluate_model_rkf(model, X_df, y_df, n_splits=4, random_state=2):
    X_values = X_df.values
    y_values = y_df.values
    rkf = KFold(n_splits=n_splits, random_state=random_state)
    y_pred = np.zeros_like(y_values)
    for train_index, test_index in rkf.split(X_values):
        X_t, X_v = X_values[train_index, :], X_values[test_index, :]
        y_t = y_values[train_index]
        model.fit(
            X_t, y_t,
        )
        y_pred[test_index] += model.predict(X_v)
    y_pred
    return np.sqrt(mean_squared_error(y_train, y_pred))

In [12]:
model = XGBRegressor(tree_method = 'gpu_hist', gpu_id=0, max_depth=8, eta=0.03, n_estimators=200)
evaluate_model_rkf(model, X_train, y_train, n_splits=4, random_state=2)



0.7038241995767399

In [21]:
def objective(trial, random_state=2, n_splits=4, n_jobs=-1, early_stopping_rounds=50):
    params = {
        "tree_method": 'gpu_hist',
        "gpu_id": 0,
        "verbosity": 0,  # 0 (silent) - 3 (debug)
        "objective": "reg:squarederror",
        "n_estimators": 200,
        "max_depth": trial.suggest_int("max_depth", 2, 12),
        "learning_rate": trial.suggest_loguniform("learning_rate", 0.005, 0.06),
        "colsample_bytree": trial.suggest_loguniform("colsample_bytree", 0.3, 0.8),
        "subsample": trial.suggest_loguniform("subsample", 0.4, 1),
        "alpha": trial.suggest_loguniform("alpha", 0.01, 10.0),
        "lambda": trial.suggest_loguniform("lambda", 1e-8, 10.0),
        "gamma": trial.suggest_loguniform("lambda", 1e-8, 10.0),
        "min_child_weight": trial.suggest_loguniform("min_child_weight", 10, 1000),
        "seed": random_state,
        "n_jobs": n_jobs,
    }

    X = X_train
    y = y_train
    
    model = XGBRegressor(**params)
    rkf = KFold(n_splits=n_splits, random_state=random_state)
    X_values = X.values
    y_values = y.values
    y_pred = np.zeros_like(y_values)
    for train_index, test_index in rkf.split(X_values):
        X_A, X_B = X_values[train_index, :], X_values[test_index, :]
        y_A, y_B = y_values[train_index], y_values[test_index]
        model.fit(X_A, y_A, eval_set=[(X_B, y_B)],
            eval_metric="rmse", early_stopping_rounds=early_stopping_rounds, verbose = False)
        y_pred[test_index] += model.predict(X_B)
    return np.sqrt(mean_squared_error(y_train, y_pred, squared=False))

In [None]:
time1 = time.time()
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print('Total time ', time.time()-time1)

[32m[I 2022-06-04 19:06:48,083][0m A new study created in memory with name: no-name-dc8d6e26-cbce-4a48-b779-63ee1fe05ad9[0m
[32m[I 2022-06-04 19:06:53,319][0m Trial 0 finished with value: 1.1626604161292025 and parameters: {'max_depth': 5, 'learning_rate': 0.009287685264487804, 'colsample_bytree': 0.6648657120976464, 'subsample': 0.5171172775266405, 'alpha': 0.6442659231252804, 'lambda': 4.90607273924417e-06, 'min_child_weight': 218.2348324954758}. Best is trial 0 with value: 1.1626604161292025.[0m
[32m[I 2022-06-04 19:06:58,416][0m Trial 1 finished with value: 1.6353790670968888 and parameters: {'max_depth': 5, 'learning_rate': 0.005267420768531392, 'colsample_bytree': 0.40801182644482376, 'subsample': 0.8633382362327411, 'alpha': 0.010077376973542823, 'lambda': 0.027567066542012197, 'min_child_weight': 763.0213542104512}. Best is trial 0 with value: 1.1626604161292025.[0m
[32m[I 2022-06-04 19:07:07,387][0m Trial 2 finished with value: 0.9467905366031112 and parameters: {'m

In [None]:
# display params
hp = study.best_params
for key, value in hp.items():
    print(f"{key:>20s} : {value}")
print(f"{'best objective value':>20s} : {study.best_value}")

[32m[I 2022-06-04 16:51:34,439][0m A new study created in memory with name: no-name-1a59d2a1-2743-44b5-8b43-f7a07d68a943[0m


[0]	validation_0-rmse:7.29564
Will train until validation_0-rmse hasn't improved in 50 rounds.
[0]	validation_0-rmse:7.37502
Will train until validation_0-rmse hasn't improved in 50 rounds.
[1]	validation_0-rmse:7.15434
[1]	validation_0-rmse:7.31072
[2]	validation_0-rmse:7.01580
[2]	validation_0-rmse:7.24704
[3]	validation_0-rmse:6.88005
[3]	validation_0-rmse:7.18391
[4]	validation_0-rmse:6.74698
[4]	validation_0-rmse:7.12136
[5]	validation_0-rmse:6.61651
[5]	validation_0-rmse:7.05940
[6]	validation_0-rmse:6.48862
[6]	validation_0-rmse:6.99786
[7]	validation_0-rmse:6.36320
[7]	validation_0-rmse:6.93699
[8]	validation_0-rmse:6.24030
[8]	validation_0-rmse:6.87658
[9]	validation_0-rmse:6.11984
[9]	validation_0-rmse:6.81675
[10]	validation_0-rmse:6.00176
[10]	validation_0-rmse:6.75746
[11]	validation_0-rmse:5.88607
[11]	validation_0-rmse:6.69867
[12]	validation_0-rmse:5.77260
[12]	validation_0-rmse:6.64045
[13]	validation_0-rmse:5.66141
[13]	validation_0-rmse:6.58270
[14]	validation_0-rmse

In [24]:
train_x.values

array([[0.27130786, 0.6193467 , 0.58042033, ..., 0.30650217, 0.38657695,
        0.85276088],
       [0.25643036, 0.4221183 , 0.669676  , ..., 0.29394827, 0.31054504,
        0.29141789],
       [0.90467162, 0.27726399, 0.15036077, ..., 0.85140828, 0.90806536,
        0.38817156],
       ...,
       [0.32190082, 0.73730373, 0.88996544, ..., 0.54565879, 0.27996302,
        0.82494994],
       [0.61714883, 0.81166156, 0.31068042, ..., 0.6422202 , 0.90539935,
        0.29193467],
       [0.7362509 , 0.62114242, 0.34387394, ..., 0.88526466, 0.82528192,
        0.42100269]])

In [None]:
def objective(trial: Trial, fast_check=True, target_meter=0, return_info=False):
    
    param = {
        'tree_method':'gpu_hist',  'gpu_id':0,
        'lambda': trial.suggest_loguniform('lambda', 1e-3, 10.0),
        'alpha': trial.suggest_int('alpha', 1, 100),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.4,0.5,0.6,0.7,0.8]),
        'subsample': trial.suggest_categorical('subsample', [0.3,0.4,0.5,0.6,0.7,0.8,1]),
        'learning_rate': trial.suggest_categorical('learning_rate', 
                        [0.02,0.025,0.03,0.035,0.04,0.05]),
        'n_estimators': 300,
        'max_depth': trial.suggest_categorical('max_depth', [4,6,8,10,12]),
        'random_state': trial.suggest_categorical('random_state', [2020]),
        'min_child_weight': trial.suggest_int('min_child_weight', 200, 400),
    }
    model = xgb.XGBRegressor(**param) 
    
    
    folds = 5
    seed = 666
    kf = KFold(n_splits=folds, shuffle=False, random_state=seed)

    valid_score = 0
    for train_idx, valid_idx in kf.split(X_train, y_train):
        train_data = X_train.iloc[train_idx,:], y_train[train_idx]
        valid_data = X_train.iloc[valid_idx,:], y_train[valid_idx]

        print('train', len(train_idx), 'valid', len(valid_idx))
        model, y_pred_valid, log = fit_lgbm(trial, train_data, valid_data)
        y_valid_pred_total[valid_idx] = y_pred_valid
        models.append(model)
        gc.collect()
        valid_score += log["valid/l2"]
        if fast_check:
            break
    valid_score /= len(models)
    if return_info:
        return valid_score, models, y_pred_valid, y_train
    else:
        return valid_score

# Let's create an XGBoostRegressor model with the best hyperparameters

In [None]:
Best_trial = study.best_trial.params
Best_trial["n_estimators"], Best_trial["tree_method"] = 10000, 'gpu_hist'
Best_trial

In [None]:
preds = np.zeros(test.shape[0])
kf = KFold(n_splits=5,random_state=48,shuffle=True)
rmse=[]  # list contains rmse for each fold
n=0
for trn_idx, test_idx in kf.split(train[columns],train['target']):
    X_tr,X_val=train[columns].iloc[trn_idx],train[columns].iloc[test_idx]
    y_tr,y_val=train['target'].iloc[trn_idx],train['target'].iloc[test_idx]
    model = xgb.XGBRegressor(**Best_trial)
    model.fit(X_tr,y_tr,eval_set=[(X_val,y_val)],early_stopping_rounds=100,verbose=False)
    preds+=model.predict(test[columns])/kf.n_splits
    rmse.append(mean_squared_error(y_val, model.predict(X_val), squared=False))
    print(f"fold: {n+1} ==> rmse: {rmse[n]}")
    n+=1

In [None]:
np.mean(rmse)

# 5. Submission <a class="anchor" id="chapter5"></a>

In [None]:
sub['target']=preds
sub.to_csv('submission.csv', index=False)