# Extreme Fine-Tuning of LGBM Using Incremental Training


In my efforts to climb the leaderboard, I stumbled across a small trick that improves predictions in the 4th to 5th decimal place using the same parameters and a single model. Essentially, it squeezes a little more out of your best parameters. The trick is executed in the following steps:

* Find the best parameters for your LGBM, manually or using an optimization method of your choice.

* Train the model to the best RMSE you can get in one training round, using a high early-stopping patience.

* Train the model for 1 or 2 more rounds with a reduced learning rate.

* Once these first few rounds are over, start reducing the regularization parameters by a constant factor at each incremental training iteration. You will start to see improvements in the 5th decimal place of the validation RMSE, which is enough to move your model's leaderboard score in the 5th decimal place.
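
The schedule in the last two steps can be sketched in plain Python, without training anything. This is only an illustration: the `incremental_schedule` helper is hypothetical, its defaults mirror the values used in the training loop of this notebook (7 rounds, learning rate 0.003, factor 0.9, 40 extra leaves per round), and the `base` values are rounded stand-ins for the tuned parameters.

```python
def incremental_schedule(base_params, n_rounds=7, warmup_rounds=2,
                         low_lr=0.003, decay=0.9, extra_leaves=40):
    """Yield (round_number, params) for each incremental training round."""
    params = dict(base_params)  # work on a copy, leave the tuned params intact
    for i in range(1, n_rounds + 1):
        if i > warmup_rounds:
            # After the warm-up rounds, loosen regularization a little each round.
            params["reg_lambda"] *= decay
            params["reg_alpha"] *= decay
            params["num_leaves"] += extra_leaves
        params["learning_rate"] = low_lr  # every incremental round uses the low rate
        yield i, dict(params)


# Rounded, illustrative starting values (assumption, not the tuned ones).
base = {"learning_rate": 0.01, "reg_lambda": 10.0, "reg_alpha": 20.0, "num_leaves": 66}

for i, p in incremental_schedule(base):
    print(i, p["learning_rate"], round(p["reg_lambda"], 4),
          round(p["reg_alpha"], 4), p["num_leaves"])
```

Rounds 1 and 2 change only the learning rate; from round 3 onward both regularization terms shrink geometrically while `num_leaves` grows linearly.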

At the top of the leaderboard this makes a huge difference: I moved from rank `39` at **0.84202** to my best, `6th place` (17th Feb 2021), with **0.84193**.

Let's check it out.

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import KFold, GridSearchCV, cross_validate, train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import LabelEncoder

from lightgbm import LGBMRegressor

import optuna
from functools import partial

import warnings
warnings.filterwarnings('ignore')

In [8]:
train = pd.read_csv('..\\kaggle_data\\train.csv')
test = pd.read_csv('..\\kaggle_data\\test.csv')

In [9]:
X_train = train.drop(['id', 'target'], axis=1)
y_train = train.target
X_test = test.drop(['id'], axis=1)

In [10]:
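cat_cols = [feature for feature in train.columns if 'cat' in feature]

def label_encoder(train_df, test_df):
    # Fit each encoder on the union of train and test values so that
    # both frames share one category-to-integer mapping.
    for feature in cat_cols:
        le = LabelEncoder()
        le.fit(pd.concat([train_df[feature], test_df[feature]]))
        train_df[feature] = le.transform(train_df[feature])
        test_df[feature] = le.transform(test_df[feature])
    return train_df, test_df

In [11]:
X_train, X_test = label_encoder(X_train, X_test)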
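
A note on encoding consistency: `LabelEncoder` assigns integer codes in sorted order of the values it was fitted on, so two encoders agree only when they see the same set of categories. A minimal sketch in plain Python; the `fit_codes` helper is hypothetical and only imitates that sorted-order behaviour, and the toy columns are made up.

```python
def fit_codes(values):
    """Map each distinct value to its rank in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


# Hypothetical toy columns: category "A" never appears in the test column.
train_col = ["A", "B", "C", "B"]
test_col = ["B", "C", "C"]

separate_train = fit_codes(train_col)
separate_test = fit_codes(test_col)
shared = fit_codes(train_col + test_col)

# Fitted separately, the same category gets different codes.
print("separate:", separate_train["B"], separate_test["B"])
# Fitted on the union, there is a single mapping for both columns.
print("shared:", shared["B"])
```

When train and test contain exactly the same categories in every column, separate fits give identical mappings, so this only matters if the sets differ.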

In [12]:
split = KFold(n_splits=5)

In [13]:
lgbm_params = {'max_depth': 16, 
                'subsample': 0.8032697250789377, 
                'colsample_bytree': 0.21067140508531404, 
                'learning_rate': 0.009867383057779643,
                'reg_lambda': 10.987474846877767, 
                'reg_alpha': 17.335285595031994, 
                'min_child_samples': 31, 
                'num_leaves': 66, 
                'max_bin': 522, 
                'cat_smooth': 81, 
                'cat_l2': 0.029690334194270022, 
                'metric': 'rmse', 
                'n_jobs': -1, 
                'n_estimators': 20000}

In [14]:
preds_list_base = []
preds_list_final_iteration = []
preds_list_all = []

# Positions of the categorical columns (cat0..cat9) in X_train
cat_idx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

def rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

# Note: early_stopping_rounds= and verbose= are fit() arguments in LightGBM < 4.0;
# newer releases expect callbacks such as lightgbm.early_stopping() instead.
for train_idx, val_idx in split.split(X_train):
    X_tr = X_train.iloc[train_idx]
    X_val = X_train.iloc[val_idx]
    y_tr = y_train.iloc[train_idx]
    y_val = y_train.iloc[val_idx]

    # Base model: tuned parameters, one long run with high early stopping
    Model = LGBMRegressor(**lgbm_params).fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
                                             eval_metric=['rmse'],
                                             early_stopping_rounds=250,
                                             categorical_feature=cat_idx,
                                             verbose=0)

    base_test_preds = Model.predict(X_test)
    preds_list_base.append(base_test_preds)
    preds_list_all.append(base_test_preds)
    first_rmse = rmse(y_val, Model.predict(X_val))
    print(f'RMSE for Base model is {first_rmse}')
    params = lgbm_params.copy()

    for i in range(1, 8):
        if i > 2:
            # From the 3rd incremental round on, loosen regularization
            # by a factor of 0.9 and allow 40 more leaves per round
            params['reg_lambda'] *= 0.9
            params['reg_alpha'] *= 0.9
            params['num_leaves'] += 40

        # Every incremental round continues from the previous model
        # (init_model) with a much lower learning rate
        params['learning_rate'] = 0.003
        Model = LGBMRegressor(**params).fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
                                            eval_metric=['rmse'],
                                            early_stopping_rounds=200,
                                            categorical_feature=cat_idx,
                                            verbose=0,
                                            init_model=Model)

        preds_list_all.append(Model.predict(X_test))
        print(f'RMSE for Incremental trial {i} model is {rmse(y_val, Model.predict(X_val))}')

    last_rmse = rmse(y_val, Model.predict(X_val))
    print('', end='\n\n')
    print(f'Improvement of : {first_rmse - last_rmse}')
    print('-' * 100)
    preds_list_final_iteration.append(Model.predict(X_test))

RMSE for Base model is 0.8417931738282095
RMSE for Incremental trial 1 model is 0.8417915156558853
RMSE for Incremental trial 2 model is 0.8417684019997734
RMSE for Incremental trial 3 model is 0.8417660944976237
RMSE for Incremental trial 4 model is 0.8417479412660887
RMSE for Incremental trial 5 model is 0.8417466948697506
RMSE for Incremental trial 6 model is 0.8417400362504809
RMSE for Incremental trial 7 model is 0.8417363457753021


Improvement of : 5.682805290740944e-05
----------------------------------------------------------------------------------------------------
RMSE for Base model is 0.8414227087407661
RMSE for Incremental trial 1 model is 0.8414146015100101
RMSE for Incremental trial 2 model is 0.8414112084375642
RMSE for Incremental trial 3 model is 0.8414093495351639
RMSE for Incremental trial 4 model is 0.8414090703491379
RMSE for Incremental trial 5 model is 0.8414084284185339
RMSE for Incremental trial 6 model is 0.8414080942426032
RMSE for Incremental trial 7 mode
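
The per-round gains are easier to read as differences between consecutive rounds. A small sketch using the first fold's logged values copied from the output above; the variable names are my own.

```python
# Validation RMSE of the first fold, copied from the log above.
fold1_rmse = [
    0.8417931738282095,  # base model
    0.8417915156558853,  # incremental trial 1
    0.8417684019997734,  # incremental trial 2
    0.8417660944976237,  # incremental trial 3
    0.8417479412660887,  # incremental trial 4
    0.8417466948697506,  # incremental trial 5
    0.8417400362504809,  # incremental trial 6
    0.8417363457753021,  # incremental trial 7
]

# Gain of each trial over the round before it (positive means lower RMSE).
improvements = [prev - cur for prev, cur in zip(fold1_rmse, fold1_rmse[1:])]
for trial, gain in enumerate(improvements, start=1):
    print(f"trial {trial}: {gain:.2e}")

total = fold1_rmse[0] - fold1_rmse[-1]
print(f"total: {total:.3e}")
```

For this fold every trial lowers the validation RMSE, the largest single drops come at trials 2 and 4, and the total matches the logged improvement of about 5.68e-05.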

Great! We see some further improvement in all the folds. Let's point out a few findings:

* The first two iterations only use a very low `learning_rate`. After the 2nd iteration there are iterations with very good improvement, obtained by reducing regularization.

* There are also later iterations where the loss increased slightly compared to the previous iteration, showing that in some folds the limit was reached a few iterations before the last one.

* If you set verbose=1, you will see that these improvements occur only in the first few trees created. After that the loss starts to increase, and LGBM keeps the best model. But reducing regularization does improve the loss for the first few trees!
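
The "keeps the best model" behaviour in the last point is early stopping. Here is a simplified sketch of the idea in plain Python. It is not LightGBM's actual implementation, and the `early_stop` helper and the loss values are hypothetical.

```python
def early_stop(val_losses, patience):
    """Return (best_iteration, stop_iteration) for a sequence of validation losses.

    Training stops once `patience` consecutive iterations fail to beat the
    best loss seen so far; the model from best_iteration is the one kept.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            return best_iter, i
    return best_iter, len(val_losses) - 1


# Hypothetical losses: a few early trees help, then the loss drifts upward.
losses = [0.5, 0.4, 0.39, 0.41, 0.42, 0.43, 0.38]
best_iter, stop_iter = early_stop(losses, patience=3)
print(best_iter, stop_iter)
```

With a small patience the run stops before it reaches the late improvement at the end of the list, which is one reason to keep the patience high, as in the steps at the top.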

I have 3 different sets of predictions: one from the base model only, one averaging all the predictions made (the base model plus every incremental iteration), and one from the final iteration only. Their leaderboard scores:

* `y_preds_base`: **0.84196 - 0.84199** (keeps jumping between these)

* `y_preds_all`: **0.84195 - 0.84196**

* `y_preds_final_iteration`: **0.84193**
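
How the three sets differ is easiest to see on toy numbers. A minimal sketch with NumPy, assuming 2 folds, 3 test rows, and a base model plus 2 incremental rounds per fold; all prediction values are made up.

```python
import numpy as np

# Hypothetical test predictions per fold: [base, round 1, round 2].
fold_preds = [
    [np.array([7.0, 8.0, 7.5]), np.array([7.1, 7.9, 7.4]), np.array([7.2, 7.8, 7.3])],
    [np.array([7.4, 8.2, 7.7]), np.array([7.3, 8.1, 7.6]), np.array([7.2, 8.0, 7.5])],
]

preds_list_base = [rounds[0] for rounds in fold_preds]              # one array per fold
preds_list_final_iteration = [rounds[-1] for rounds in fold_preds]  # one array per fold
preds_list_all = [p for rounds in fold_preds for p in rounds]       # every model of every fold

# Each set is averaged element-wise over its models (axis 0).
y_preds_base = np.array(preds_list_base).mean(axis=0)
y_preds_all = np.array(preds_list_all).mean(axis=0)
y_preds_final_iteration = np.array(preds_list_final_iteration).mean(axis=0)

print(y_preds_base, y_preds_all, y_preds_final_iteration)
```

Note that the "all" average gives every intermediate model the same weight as the final one, so it sits between the base and final-iteration averages.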

In [15]:
y_preds_base = np.array(preds_list_base).mean(axis=0)
y_preds_base

array([7.61265617, 7.78756299, 7.60367127, ..., 7.53725979, 7.50144961,
       7.27430572])

In [16]:
y_preds_all = np.array(preds_list_all).mean(axis=0)
y_preds_all

array([7.61422684, 7.78663938, 7.60104474, ..., 7.5370382 , 7.50130437,
       7.27273157])

In [17]:
y_preds_final_iteration = np.array(preds_list_final_iteration).mean(axis=0)
y_preds_final_iteration

array([7.61448924, 7.78590933, 7.5991315 , ..., 7.53653038, 7.50137286,
       7.27124171])

In [18]:
submission = pd.DataFrame({'id':test.id,
              'target':y_preds_final_iteration})

In [19]:
submission.to_csv('submission.csv', index=False)

In [20]:
pd.read_csv('submission.csv')

| | id | target |
|---|---|---|
| 0 | 0 | 7.614489 |
| 1 | 5 | 7.785909 |
| 2 | 15 | 7.599132 |
| 3 | 16 | 7.526873 |
| 4 | 17 | 7.258177 |
| ... | ... | ... |
| 199995 | 499987 | 7.492011 |
| 199996 | 499990 | 7.247003 |
| 199997 | 499991 | 7.536530 |
| 199998 | 499994 | 7.501373 |


Finally, I am still experimenting to understand why this actually works.

**Although it is a small trick, it took a few days of hard work, so if you like it and find it useful, show your support by upvoting!**