# Spaceship. Part 5. (continued)
## Hyperparameter tuning

Here we'll continue the hyperparameter search described in ['05_hyperparameters.ipynb'](05_hyperparameters.ipynb).

This notebook can be re-run over and over to continue searching.

Choose the running time for this session:

In [210]:
HOURS = 0
MINUTES = 20
SECONDS = 0

RUNNING_TIME = HOURS * 3600 + MINUTES * 60 + SECONDS

Let's load our data and our Optuna study, and define all the necessary functions.

For speed, we'll fix `n_estimators` at 100 during the search. Larger values may improve the scores. For the scores table and submissions we'll use 500 estimators.

In [211]:
# Random seed for reproducibility
SEED = 123

import pandas as pd
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
import joblib
import optuna
import optuna.visualization as vis

train = pd.read_csv('04_train_prepared.csv', index_col=0)
test =  pd.read_csv('04_test_prepared.csv', index_col=0)
scores_df = pd.read_csv('05_scores_df.csv', index_col=0)
test_Ids = pd.read_csv('test_Ids.csv', index_col=0).reset_index(drop=True)

train['Transported'] = [1 if i else 0 for i in train['Transported']]

study = joblib.load("05_RF.pkl")
total_seconds = pd.read_csv('05_total_seconds.csv', index_col=0)

print('Before current session: ')
print("Best trial:", study.best_trial.number)
print("Best average cross-validation ROC AUC:", study.best_trial.value)
print("Best hyperparameters:", study.best_params)


def train_evaluate(params):
    '''
    Take a dict of hyperparameters for a RandomForestClassifier.

    Return the average cross-validated ROC AUC score.
    '''

    # Build the base estimator (RandomForestClassifier is imported at the top of the cell)
    model = RandomForestClassifier(random_state=SEED,
                                   n_estimators=100,
                                   n_jobs=-1
                                   )

    # Apply the hyperparameters under evaluation
    model.set_params(**params)
    
    # Create a StratifiedKFold object (6 splits with equal proportion of positive target values)
    skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=SEED)
    
    # An empty list for collecting scores
    test_roc_auc_scores = []
    
    # Iterate through folds
    for train_index, cv_index in skf.split(train.drop('Transported', axis=1), train['Transported']):
        # Obtain training and testing folds
        cv_train, cv_test = train.iloc[train_index], train.iloc[cv_index]
        
        # Fit the model
        model.fit(cv_train.drop('Transported', axis=1), cv_train['Transported']) 
        
        # Calculate the ROC AUC score and append it to the scores list
        test_pred_proba = model.predict_proba(cv_test.drop('Transported', axis=1))[:, 1]
        test_roc_auc_scores.append(roc_auc_score(cv_test['Transported'], test_pred_proba))
        
    return np.mean(test_roc_auc_scores)
        

Before current session: 
Best trial: 11
Best average cross-validation ROC AUC: 0.8822049875726843
Best hyperparameters: {'criterion': 'gini', 'max_depth': 48, 'max_features': 8, 'max_leaf_nodes': 433, 'min_impurity_decrease': 5.23655263053275e-08, 'min_samples_leaf': 2, 'ccp_alpha': 1.1113424607141138e-07, 'max_samples': 0.686348320267558}
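
The stratified split used in `train_evaluate` keeps the share of positive targets roughly equal in every fold. A minimal sketch on synthetic labels (the toy array `y` with 30% positives is an assumption, not the competition data):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

SEED = 123
rng = np.random.default_rng(SEED)

# Hypothetical toy target: 600 labels, 30% positive, in random order
y = np.array([1] * 180 + [0] * 420)
rng.shuffle(y)
X = np.zeros((len(y), 1))  # feature values do not affect the split

skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=SEED)

# Share of positive labels in each validation fold
fold_rates = [y[cv_index].mean() for _, cv_index in skf.split(X, y)]
print(fold_rates)
```

Because every fold sees about the same class balance, the per-fold ROC AUC values are comparable and their mean is a more stable objective for the search.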


In [212]:
def objective(trial):
    params = {
        # n_estimators is not searched here; it is fixed inside train_evaluate
        'criterion': trial.suggest_categorical('criterion', ['log_loss', 'gini']),
        'max_depth': trial.suggest_int('max_depth', 2, 50),
        'max_features': trial.suggest_int('max_features', 1, 16),
        'max_leaf_nodes': trial.suggest_int('max_leaf_nodes', 20, 500),
        'min_impurity_decrease': trial.suggest_float('min_impurity_decrease', 1e-9, 1e-1, log=True),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 2, 30),
        'ccp_alpha': trial.suggest_float('ccp_alpha', 1e-7, 4e-1, log=True),
        'max_samples': trial.suggest_float('max_samples', 0.3, 1),
    }
    return train_evaluate(params)
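
Two of the ranges above (`min_impurity_decrease` and `ccp_alpha`) use `log=True`, which makes the search treat the range in the log domain. The sketch below only illustrates what that means for a range like `1e-9` to `1e-1`, using plain NumPy draws. It does not reproduce how Optuna's sampler picks values, and the threshold `1e-5` is an arbitrary choice for the comparison.

```python
import numpy as np

rng = np.random.default_rng(123)
low, high = 1e-9, 1e-1
n = 10_000

# Uniform in log space: every order of magnitude is equally likely
log_samples = np.exp(rng.uniform(np.log(low), np.log(high), size=n))

# Uniform in linear space: almost all draws land near the upper bound
lin_samples = rng.uniform(low, high, size=n)

# Share of draws below 1e-5, the midpoint of the range on a log scale
share_small_log = np.mean(log_samples < 1e-5)
share_small_lin = np.mean(lin_samples < 1e-5)
print(share_small_log, share_small_lin)
```

On a log scale about half of the draws fall below `1e-5`, while on a linear scale almost none do, which is why very small regularization values would otherwise go nearly unexplored.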

In [213]:
def get_cv_scores(train, test, model, scores_df, comment="", verbose=False, prepare_submission=False):
    '''
    Cross-validate a model and append its average scores to scores_df.

    Takes the train and test sets, a model to cross-validate and a DataFrame with previous scores.
    It also takes an optional comment string describing the changes.

    Setting verbose to True makes the function print the updated scores.

    scores_df is updated in place with a new row containing:
        1) Average training ROC AUC score.
        2) Average cross-validation ROC AUC score.
        3) Average training accuracy score.
        4) Average cross-validation accuracy score.

    Returns a DataFrame for a new submission if prepare_submission is True,
    otherwise the placeholder string "prepare_submission=False".
    '''
    
    # Create a StratifiedKFold object (6 splits with equal proportion of positive target values)
    skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=SEED)
    
    # Empty lists for collecting scores
    train_roc_auc_scores = []
    cv_roc_auc_scores = []
    train_accuracy_scores = []
    cv_accuracy_scores = []
    
    # Iterate through folds
    for train_index, cv_index in skf.split(train.drop('Transported', axis=1), train['Transported']):
        # Obtain training and testing folds
        cv_train, cv_test = train.iloc[train_index], train.iloc[cv_index]
        
        # Fit the model
        model.fit(cv_train.drop('Transported', axis=1), cv_train['Transported']) 
        
        # Calculate scores and append to the scores lists
        train_pred_proba = model.predict_proba(cv_train.drop('Transported', axis=1))[:, 1]
        train_roc_auc_scores.append(roc_auc_score(cv_train['Transported'], train_pred_proba))
        cv_pred_proba = model.predict_proba(cv_test.drop('Transported', axis=1))[:, 1]
        cv_roc_auc_scores.append(roc_auc_score(cv_test['Transported'], cv_pred_proba))
        train_accuracy_scores.append(model.score(cv_train.drop('Transported', axis=1), cv_train['Transported']))
        cv_accuracy_scores.append(model.score(cv_test.drop('Transported', axis=1), cv_test['Transported']))
        

    # Update the scores DataFrame with average scores:
    
    scores_df.loc[len(scores_df)] = [comment, np.mean(train_roc_auc_scores), np.mean(cv_roc_auc_scores), \
                                     np.mean(train_accuracy_scores), np.mean(cv_accuracy_scores), np.nan]
    #scores_df.index = scores_df.index + 1
    #scores_df.sort_index()
    
    # Print the updated scores DataFrame
    if verbose:
        print(scores_df)
        
    submission = "prepare_submission=False"
        
    if prepare_submission:
    
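        # Prepare the submission DataFrame.
        # Note: the model is used as left by the last CV iteration,
        # so it was fitted on the final training fold, not on the full train set.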
        test_pred = model.predict(test)
        test_pred = ["True" if i == 1 else "False" for i in test_pred]
        test_pred = pd.DataFrame(test_pred, columns=['Transported'])
        submission = pd.concat([test_Ids, test_pred], axis=1)

    
    return submission
                         

Now let's run the optimization and observe the results:

In [214]:
study.optimize(objective, timeout=RUNNING_TIME, n_jobs=-1)

[I 2023-07-29 01:04:44,749] Trial 24 finished with value: 0.8776490815321253 and parameters: {'criterion': 'gini', 'max_depth': 41, 'max_features': 3, 'max_leaf_nodes': 369, 'min_impurity_decrease': 1.841738033030017e-05, 'min_samples_leaf': 10, 'ccp_alpha': 4.865219190746255e-07, 'max_samples': 0.8439509818935584}. Best is trial 11 with value: 0.8822049875726843.
[I 2023-07-29 01:04:44,812] Trial 33 finished with value: 0.8771272570375244 and parameters: {'criterion': 'gini', 'max_depth': 42, 'max_features': 3, 'max_leaf_nodes': 393, 'min_impurity_decrease': 3.331390255795582e-05, 'min_samples_leaf': 11, 'ccp_alpha': 9.614421064144038e-07, 'max_samples': 0.8316424785532691}. Best is trial 11 with value: 0.8822049875726843.
[I 2023-07-29 01:04:44,828] Trial 28 finished with value: 0.8777031502934793 and parameters: {'criterion': 'gini', 'max_depth': 41, 'max_features': 3, 'max_leaf_nodes': 367, 'min_impurity_decrease': 1.984728601238456e-05, 'min_samples_leaf': 11, 'ccp_alpha': 7.17793

[I 2023-07-29 01:05:04,263] Trial 46 finished with value: 0.8818410961438468 and parameters: {'criterion': 'gini', 'max_depth': 35, 'max_features': 8, 'max_leaf_nodes': 433, 'min_impurity_decrease': 2.0970516246097184e-07, 'min_samples_leaf': 2, 'ccp_alpha': 2.303808588442855e-07, 'max_samples': 0.7699001345114612}. Best is trial 45 with value: 0.8835212825865123.
[I 2023-07-29 01:05:25,919] Trial 51 finished with value: 0.8829669716882819 and parameters: {'criterion': 'log_loss', 'max_depth': 36, 'max_features': 10, 'max_leaf_nodes': 324, 'min_impurity_decrease': 3.150593958074969e-08, 'min_samples_leaf': 8, 'ccp_alpha': 1.291804922191193e-05, 'max_samples': 0.7113891255805488}. Best is trial 45 with value: 0.8835212825865123.
[I 2023-07-29 01:05:25,945] Trial 53 finished with value: 0.8837114704354191 and parameters: {'criterion': 'log_loss', 'max_depth': 25, 'max_features': 10, 'max_leaf_nodes': 307, 'min_impurity_decrease': 1.3675604692244532e-08, 'min_samples_leaf': 7, 'ccp_alpha'

[I 2023-07-29 01:05:48,057] Trial 70 finished with value: 0.8829884242729545 and parameters: {'criterion': 'log_loss', 'max_depth': 30, 'max_features': 12, 'max_leaf_nodes': 276, 'min_impurity_decrease': 5.0415542770867115e-09, 'min_samples_leaf': 5, 'ccp_alpha': 6.0746397850187525e-06, 'max_samples': 0.6351118869803005}. Best is trial 69 with value: 0.884385498561835.
[I 2023-07-29 01:05:50,439] Trial 71 finished with value: 0.8830504566869951 and parameters: {'criterion': 'log_loss', 'max_depth': 21, 'max_features': 12, 'max_leaf_nodes': 281, 'min_impurity_decrease': 5.001083241930563e-09, 'min_samples_leaf': 5, 'ccp_alpha': 6.356312948463186e-06, 'max_samples': 0.6418473823014179}. Best is trial 69 with value: 0.884385498561835.
[I 2023-07-29 01:06:10,560] Trial 73 finished with value: 0.8829378781502997 and parameters: {'criterion': 'log_loss', 'max_depth': 30, 'max_features': 13, 'max_leaf_nodes': 278, 'min_impurity_decrease': 6.172278102464618e-09, 'min_samples_leaf': 4, 'ccp_alp

[I 2023-07-29 01:06:34,602] Trial 90 finished with value: 0.8845241859863947 and parameters: {'criterion': 'log_loss', 'max_depth': 17, 'max_features': 14, 'max_leaf_nodes': 212, 'min_impurity_decrease': 1.3701025608403912e-09, 'min_samples_leaf': 3, 'ccp_alpha': 4.028860775121203e-06, 'max_samples': 0.6017294789550123}. Best is trial 86 with value: 0.8849882932341754.
[I 2023-07-29 01:06:38,615] Trial 94 finished with value: 0.8845197469778968 and parameters: {'criterion': 'log_loss', 'max_depth': 16, 'max_features': 14, 'max_leaf_nodes': 214, 'min_impurity_decrease': 2.015037456535638e-09, 'min_samples_leaf': 3, 'ccp_alpha': 3.7317529155376906e-06, 'max_samples': 0.5888900349337702}. Best is trial 86 with value: 0.8849882932341754.
[I 2023-07-29 01:06:39,700] Trial 95 finished with value: 0.8840532646854448 and parameters: {'criterion': 'log_loss', 'max_depth': 14, 'max_features': 14, 'max_leaf_nodes': 212, 'min_impurity_decrease': 1.402836980631062e-09, 'min_samples_leaf': 3, 'ccp_a

[I 2023-07-29 01:07:22,332] Trial 108 finished with value: 0.8822644683742346 and parameters: {'criterion': 'log_loss', 'max_depth': 7, 'max_features': 15, 'max_leaf_nodes': 253, 'min_impurity_decrease': 3.021808949766333e-09, 'min_samples_leaf': 2, 'ccp_alpha': 0.0001617943418072297, 'max_samples': 0.5741375198992603}. Best is trial 100 with value: 0.8854398755282276.
[I 2023-07-29 01:07:22,352] Trial 116 finished with value: 0.8847725270115271 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 16, 'max_leaf_nodes': 160, 'min_impurity_decrease': 2.97992858739369e-09, 'min_samples_leaf': 2, 'ccp_alpha': 0.00024235840441154833, 'max_samples': 0.5515669989044942}. Best is trial 100 with value: 0.8854398755282276.
[I 2023-07-29 01:07:22,432] Trial 117 finished with value: 0.8835585773294415 and parameters: {'criterion': 'log_loss', 'max_depth': 8, 'max_features': 16, 'max_leaf_nodes': 246, 'min_impurity_decrease': 2.947122930242282e-09, 'min_samples_leaf': 2, 'ccp_

[I 2023-07-29 01:08:07,780] Trial 141 finished with value: 0.8849354343550101 and parameters: {'criterion': 'log_loss', 'max_depth': 15, 'max_features': 13, 'max_leaf_nodes': 127, 'min_impurity_decrease': 8.656150355158794e-09, 'min_samples_leaf': 4, 'ccp_alpha': 8.914524892838188e-06, 'max_samples': 0.5004696359923675}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:08:07,840] Trial 139 finished with value: 0.8844473608691832 and parameters: {'criterion': 'log_loss', 'max_depth': 16, 'max_features': 13, 'max_leaf_nodes': 110, 'min_impurity_decrease': 8.634643432182545e-09, 'min_samples_leaf': 4, 'ccp_alpha': 2.9023071078023915e-05, 'max_samples': 0.5130822049016862}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:08:07,845] Trial 133 finished with value: 0.8852701480878812 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 13, 'max_leaf_nodes': 146, 'min_impurity_decrease': 1.635644038785075e-09, 'min_samples_leaf': 3, 'ccp

[I 2023-07-29 01:08:49,075] Trial 163 finished with value: 0.8841921206800624 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 14, 'max_leaf_nodes': 131, 'min_impurity_decrease': 3.848050433562675e-09, 'min_samples_leaf': 5, 'ccp_alpha': 0.0014200729711111126, 'max_samples': 0.47425167313651273}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:08:49,079] Trial 158 finished with value: 0.8839712226836928 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 14, 'max_leaf_nodes': 88, 'min_impurity_decrease': 4.2536199679642495e-09, 'min_samples_leaf': 2, 'ccp_alpha': 0.0008182698394672754, 'max_samples': 0.47185186128562345}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:08:49,084] Trial 159 finished with value: 0.8847261238955118 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 14, 'max_leaf_nodes': 93, 'min_impurity_decrease': 2.7849776434579716e-08, 'min_samples_leaf': 2, 'cc

[I 2023-07-29 01:09:27,880] Trial 184 finished with value: 0.8851305890586004 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 12, 'max_leaf_nodes': 153, 'min_impurity_decrease': 1.567546580963382e-08, 'min_samples_leaf': 3, 'ccp_alpha': 0.0005024941284990409, 'max_samples': 0.5620107160896307}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:09:27,972] Trial 180 finished with value: 0.8849550303368704 and parameters: {'criterion': 'log_loss', 'max_depth': 9, 'max_features': 13, 'max_leaf_nodes': 150, 'min_impurity_decrease': 1.5734414442277777e-08, 'min_samples_leaf': 3, 'ccp_alpha': 0.0005332748982591627, 'max_samples': 0.5073570916864638}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:09:27,996] Trial 186 finished with value: 0.8848181874882325 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 13, 'max_leaf_nodes': 137, 'min_impurity_decrease': 1.2935740260144172e-08, 'min_samples_leaf': 3, 'ccp

[I 2023-07-29 01:10:06,646] Trial 207 finished with value: 0.8851723883922741 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 11, 'max_leaf_nodes': 164, 'min_impurity_decrease': 2.0984680314754283e-08, 'min_samples_leaf': 4, 'ccp_alpha': 0.00036790598606554044, 'max_samples': 0.5417880249207658}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:10:06,682] Trial 206 finished with value: 0.8848845975149477 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 11, 'max_leaf_nodes': 166, 'min_impurity_decrease': 5.5099213213290137e-08, 'min_samples_leaf': 4, 'ccp_alpha': 0.0004049075115701091, 'max_samples': 0.5238746757367101}. Best is trial 129 with value: 0.885858223679406.
[I 2023-07-29 01:10:06,771] Trial 208 finished with value: 0.8849971644966329 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 11, 'max_leaf_nodes': 164, 'min_impurity_decrease': 4.688736884920996e-08, 'min_samples_leaf': 4, 'c

[I 2023-07-29 01:10:36,600] Trial 227 finished with value: 0.8839218878951648 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 10, 'max_leaf_nodes': 147, 'min_impurity_decrease': 1.8956951572320025e-08, 'min_samples_leaf': 10, 'ccp_alpha': 0.00024317852060954176, 'max_samples': 0.5591510359692583}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:10:45,399] Trial 228 finished with value: 0.8850353115183767 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 10, 'max_leaf_nodes': 147, 'min_impurity_decrease': 2.0631661697770304e-08, 'min_samples_leaf': 5, 'ccp_alpha': 0.000234811823436681, 'max_samples': 0.5746062120440457}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:10:45,832] Trial 231 finished with value: 0.8851761814484657 and parameters: {'criterion': 'log_loss', 'max_depth': 16, 'max_features': 10, 'max_leaf_nodes': 136, 'min_impurity_decrease': 5.230429726936818e-07, 'min_samples_leaf': 3, 

[I 2023-07-29 01:11:08,917] Trial 249 finished with value: 0.8850314184780714 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 9, 'max_leaf_nodes': 136, 'min_impurity_decrease': 3.069610441072461e-07, 'min_samples_leaf': 3, 'ccp_alpha': 0.00016904546549486034, 'max_samples': 0.5139060306143073}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:11:13,762] Trial 250 finished with value: 0.885670213813845 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 9, 'max_leaf_nodes': 129, 'min_impurity_decrease': 5.203937293749383e-06, 'min_samples_leaf': 2, 'ccp_alpha': 0.0001280707575589542, 'max_samples': 0.5141040007089552}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:11:14,810] Trial 251 finished with value: 0.8847476430152654 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 9, 'max_leaf_nodes': 132, 'min_impurity_decrease': 5.843223963781953e-07, 'min_samples_leaf': 2, 'ccp_a

[I 2023-07-29 01:11:45,459] Trial 270 finished with value: 0.8839776088747904 and parameters: {'criterion': 'gini', 'max_depth': 11, 'max_features': 9, 'max_leaf_nodes': 114, 'min_impurity_decrease': 1.1830487619443309e-07, 'min_samples_leaf': 2, 'ccp_alpha': 6.692673452476163e-05, 'max_samples': 0.5050118326882354}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:11:47,765] Trial 272 finished with value: 0.8793878396571598 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 9, 'max_leaf_nodes': 112, 'min_impurity_decrease': 1.7287972350713112e-07, 'min_samples_leaf': 29, 'ccp_alpha': 6.820212410563348e-05, 'max_samples': 0.5031319491306496}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:11:47,915] Trial 273 finished with value: 0.879573159183558 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 9, 'max_leaf_nodes': 115, 'min_impurity_decrease': 8.033511095615153e-07, 'min_samples_leaf': 29, 'ccp_al

[I 2023-07-29 01:12:22,653] Trial 294 finished with value: 0.8832795783903334 and parameters: {'criterion': 'log_loss', 'max_depth': 9, 'max_features': 8, 'max_leaf_nodes': 128, 'min_impurity_decrease': 2.1781689599687845e-06, 'min_samples_leaf': 13, 'ccp_alpha': 0.00016846637632789447, 'max_samples': 0.5332843002912797}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:12:22,702] Trial 295 finished with value: 0.8850537549272706 and parameters: {'criterion': 'log_loss', 'max_depth': 9, 'max_features': 10, 'max_leaf_nodes': 127, 'min_impurity_decrease': 2.0723990912865127e-06, 'min_samples_leaf': 3, 'ccp_alpha': 0.00016883099369320988, 'max_samples': 0.5320526598937028}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:12:24,515] Trial 296 finished with value: 0.8796031586948551 and parameters: {'criterion': 'log_loss', 'max_depth': 9, 'max_features': 8, 'max_leaf_nodes': 125, 'min_impurity_decrease': 1.267094280094095e-06, 'min_samples_leaf': 23, 'c

[I 2023-07-29 01:13:00,324] Trial 315 finished with value: 0.8848763484383789 and parameters: {'criterion': 'log_loss', 'max_depth': 14, 'max_features': 8, 'max_leaf_nodes': 148, 'min_impurity_decrease': 3.5250922404817506e-06, 'min_samples_leaf': 2, 'ccp_alpha': 0.00010564101723634625, 'max_samples': 0.5544387061951694}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:13:00,617] Trial 317 finished with value: 0.8852510704752828 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 7, 'max_leaf_nodes': 146, 'min_impurity_decrease': 1.3412761151207904e-08, 'min_samples_leaf': 2, 'ccp_alpha': 9.72578877823008e-05, 'max_samples': 0.5517690130123483}. Best is trial 221 with value: 0.8859088287265645.
[I 2023-07-29 01:13:00,768] Trial 318 finished with value: 0.8858486620867475 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 7, 'max_leaf_nodes': 146, 'min_impurity_decrease': 2.3772037787651044e-07, 'min_samples_leaf': 2, 'cc

[I 2023-07-29 01:13:36,772] Trial 338 finished with value: 0.8842529647943754 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 5, 'max_leaf_nodes': 136, 'min_impurity_decrease': 6.362301679390778e-07, 'min_samples_leaf': 3, 'ccp_alpha': 5.098753420588599e-05, 'max_samples': 0.6987916689443013}. Best is trial 328 with value: 0.8860749577460977.
[I 2023-07-29 01:13:36,986] Trial 339 finished with value: 0.8853139479961672 and parameters: {'criterion': 'log_loss', 'max_depth': 15, 'max_features': 10, 'max_leaf_nodes': 135, 'min_impurity_decrease': 1.3297201445648013e-05, 'min_samples_leaf': 3, 'ccp_alpha': 3.658530997643262e-05, 'max_samples': 0.5695425109989055}. Best is trial 328 with value: 0.8860749577460977.
[I 2023-07-29 01:13:37,599] Trial 343 finished with value: 0.8840050224171025 and parameters: {'criterion': 'log_loss', 'max_depth': 15, 'max_features': 6, 'max_leaf_nodes': 159, 'min_impurity_decrease': 1.4775327309856427e-05, 'min_samples_leaf': 3, 'cc

[I 2023-07-29 01:14:14,213] Trial 360 finished with value: 0.8852229876347034 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 11, 'max_leaf_nodes': 297, 'min_impurity_decrease': 3.854814276727711e-08, 'min_samples_leaf': 2, 'ccp_alpha': 0.0005990084139834332, 'max_samples': 0.5444535162898186}. Best is trial 328 with value: 0.8860749577460977.
[I 2023-07-29 01:14:16,106] Trial 361 finished with value: 0.8851519168400861 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 11, 'max_leaf_nodes': 118, 'min_impurity_decrease': 4.8435914307866655e-08, 'min_samples_leaf': 2, 'ccp_alpha': 0.0006188702854747041, 'max_samples': 0.7150705496734362}. Best is trial 328 with value: 0.8860749577460977.
[I 2023-07-29 01:14:16,479] Trial 362 finished with value: 0.8851406603209898 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 11, 'max_leaf_nodes': 152, 'min_impurity_decrease': 4.0876411897275544e-08, 'min_samples_leaf': 2, '

[I 2023-07-29 01:14:50,583] Trial 383 finished with value: 0.8850366401600583 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 11, 'max_leaf_nodes': 180, 'min_impurity_decrease': 1.0019237933963095e-07, 'min_samples_leaf': 4, 'ccp_alpha': 0.0011134989303843979, 'max_samples': 0.7998797075097436}. Best is trial 373 with value: 0.8860990165845456.
[I 2023-07-29 01:14:54,444] Trial 384 finished with value: 0.882639767375193 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 11, 'max_leaf_nodes': 45, 'min_impurity_decrease': 1.1518963191875882e-07, 'min_samples_leaf': 4, 'ccp_alpha': 0.0010850822321188491, 'max_samples': 0.814447176913557}. Best is trial 373 with value: 0.8860990165845456.
[I 2023-07-29 01:14:56,797] Trial 387 finished with value: 0.8713134604294169 and parameters: {'criterion': 'log_loss', 'max_depth': 6, 'max_features': 11, 'max_leaf_nodes': 133, 'min_impurity_decrease': 0.003957425155902376, 'min_samples_leaf': 4, 'ccp_a

[I 2023-07-29 01:15:21,582] Trial 405 finished with value: 0.8854632223572176 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 10, 'max_leaf_nodes': 109, 'min_impurity_decrease': 3.3567753895721374e-07, 'min_samples_leaf': 3, 'ccp_alpha': 0.0004237621676185546, 'max_samples': 0.7801033011434569}. Best is trial 373 with value: 0.8860990165845456.
[I 2023-07-29 01:15:25,232] Trial 406 finished with value: 0.8855135704263889 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 10, 'max_leaf_nodes': 110, 'min_impurity_decrease': 6.309698554383345e-08, 'min_samples_leaf': 3, 'ccp_alpha': 8.941097474563899e-07, 'max_samples': 0.8265061356490151}. Best is trial 373 with value: 0.8860990165845456.
[I 2023-07-29 01:15:29,937] Trial 407 finished with value: 0.7860753955085751 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 12, 'max_leaf_nodes': 107, 'min_impurity_decrease': 0.0358824256442957, 'min_samples_leaf': 3, 'ccp_

[I 2023-07-29 01:16:01,479] Trial 426 finished with value: 0.8853165094092423 and parameters: {'criterion': 'log_loss', 'max_depth': 14, 'max_features': 10, 'max_leaf_nodes': 138, 'min_impurity_decrease': 2.8335026772321723e-08, 'min_samples_leaf': 5, 'ccp_alpha': 0.000673184030581831, 'max_samples': 0.7553842647686797}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:16:01,544] Trial 427 finished with value: 0.8849847485018127 and parameters: {'criterion': 'log_loss', 'max_depth': 14, 'max_features': 10, 'max_leaf_nodes': 97, 'min_impurity_decrease': 2.8147344892107956e-05, 'min_samples_leaf': 5, 'ccp_alpha': 0.0006546742887033856, 'max_samples': 0.7500022091427875}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:16:02,895] Trial 429 finished with value: 0.8846920905452 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 10, 'max_leaf_nodes': 102, 'min_impurity_decrease': 0.0010097399525512327, 'min_samples_leaf': 5, 'ccp_a

[I 2023-07-29 01:16:44,944] Trial 449 finished with value: 0.8846396135016567 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 13, 'max_leaf_nodes': 184, 'min_impurity_decrease': 0.00014408765747663508, 'min_samples_leaf': 4, 'ccp_alpha': 3.253902473970931e-05, 'max_samples': 0.8543810521270686}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:16:46,167] Trial 450 finished with value: 0.8854470096340116 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 13, 'max_leaf_nodes': 130, 'min_impurity_decrease': 2.1753857867723195e-05, 'min_samples_leaf': 4, 'ccp_alpha': 2.0190331231278864e-06, 'max_samples': 0.8508274403552955}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:16:46,557] Trial 451 finished with value: 0.8857066225082905 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 13, 'max_leaf_nodes': 130, 'min_impurity_decrease': 0.00013148726573331278, 'min_samples_leaf': 4,

[I 2023-07-29 01:17:29,020] Trial 472 finished with value: 0.8852923817368108 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 11, 'max_leaf_nodes': 193, 'min_impurity_decrease': 0.00011152731098918471, 'min_samples_leaf': 6, 'ccp_alpha': 1.4605873390669065e-06, 'max_samples': 0.8717860011888886}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:17:29,704] Trial 471 finished with value: 0.8848421152347848 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 11, 'max_leaf_nodes': 176, 'min_impurity_decrease': 0.0003461604734885361, 'min_samples_leaf': 6, 'ccp_alpha': 1.090155528188874e-06, 'max_samples': 0.8819037422195087}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:17:31,213] Trial 476 finished with value: 0.884960816090908 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 12, 'max_leaf_nodes': 194, 'min_impurity_decrease': 0.0003507208927464804, 'min_samples_leaf': 3, 'c

[I 2023-07-29 01:18:11,953] Trial 493 finished with value: 0.8854029743059999 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 10, 'max_leaf_nodes': 217, 'min_impurity_decrease': 7.956868151482073e-05, 'min_samples_leaf': 4, 'ccp_alpha': 1.969902373323784e-06, 'max_samples': 0.8531598632254851}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:18:12,146] Trial 494 finished with value: 0.8834128043210668 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 4, 'max_leaf_nodes': 162, 'min_impurity_decrease': 4.3671455443840596e-05, 'min_samples_leaf': 3, 'ccp_alpha': 1.6257237776963578e-06, 'max_samples': 0.8577296339797587}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:18:13,221] Trial 495 finished with value: 0.8843710252730307 and parameters: {'criterion': 'log_loss', 'max_depth': 28, 'max_features': 10, 'max_leaf_nodes': 168, 'min_impurity_decrease': 4.089827438819164e-05, 'min_samples_leaf': 3, 'c

[I 2023-07-29 01:18:44,803] Trial 515 finished with value: 0.8847957579092792 and parameters: {'criterion': 'log_loss', 'max_depth': 50, 'max_features': 10, 'max_leaf_nodes': 179, 'min_impurity_decrease': 1.8195773343728405e-05, 'min_samples_leaf': 5, 'ccp_alpha': 3.6031870496861573e-06, 'max_samples': 0.9807681764137722}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:18:51,862] Trial 516 finished with value: 0.8853337414386363 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 10, 'max_leaf_nodes': 183, 'min_impurity_decrease': 2.3023196773310468e-05, 'min_samples_leaf': 5, 'ccp_alpha': 7.173197998321821e-06, 'max_samples': 0.9731340223702152}. Best is trial 414 with value: 0.8861271794965077.
[I 2023-07-29 01:18:55,010] Trial 517 finished with value: 0.8843019254728041 and parameters: {'criterion': 'log_loss', 'max_depth': 8, 'max_features': 10, 'max_leaf_nodes': 230, 'min_impurity_decrease': 2.6624460094799727e-05, 'min_samples_leaf': 5, 

[I 2023-07-29 01:19:21,492] Trial 537 finished with value: 0.8862078671602917 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 10, 'max_leaf_nodes': 157, 'min_impurity_decrease': 5.825859189299612e-05, 'min_samples_leaf': 2, 'ccp_alpha': 4.932520399650189e-06, 'max_samples': 0.9762788525621701}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:19:24,732] Trial 538 finished with value: 0.8862246078779021 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 10, 'max_leaf_nodes': 159, 'min_impurity_decrease': 0.00045919750794625683, 'min_samples_leaf': 2, 'ccp_alpha': 4.808096233726262e-06, 'max_samples': 0.9780049659522325}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:19:30,899] Trial 539 finished with value: 0.8851393272115725 and parameters: {'criterion': 'log_loss', 'max_depth': 33, 'max_features': 10, 'max_leaf_nodes': 206, 'min_impurity_decrease': 0.0003793607906012372, 'min_samples_leaf': 2, 'c

[I 2023-07-29 01:20:02,135] Trial 560 finished with value: 0.8800937570533054 and parameters: {'criterion': 'log_loss', 'max_depth': 6, 'max_features': 11, 'max_leaf_nodes': 167, 'min_impurity_decrease': 5.599385534126904e-05, 'min_samples_leaf': 3, 'ccp_alpha': 1.8150716486765468e-07, 'max_samples': 0.9565344826498499}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:20:03,477] Trial 561 finished with value: 0.8814863363342357 and parameters: {'criterion': 'log_loss', 'max_depth': 9, 'max_features': 11, 'max_leaf_nodes': 169, 'min_impurity_decrease': 0.0016157831154333976, 'min_samples_leaf': 3, 'ccp_alpha': 1.117844834518654e-07, 'max_samples': 0.9460613986512889}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:20:06,718] Trial 562 finished with value: 0.8852920050616202 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 11, 'max_leaf_nodes': 166, 'min_impurity_decrease': 7.860877675307511e-05, 'min_samples_leaf': 3, 'ccp

[I 2023-07-29 01:20:42,929] Trial 582 finished with value: 0.8475066339335461 and parameters: {'criterion': 'log_loss', 'max_depth': 3, 'max_features': 10, 'max_leaf_nodes': 144, 'min_impurity_decrease': 4.0141922207094795e-05, 'min_samples_leaf': 2, 'ccp_alpha': 1.2719576810025294e-05, 'max_samples': 0.9173395584063523}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:20:43,017] Trial 584 finished with value: 0.8841941620692704 and parameters: {'criterion': 'log_loss', 'max_depth': 8, 'max_features': 10, 'max_leaf_nodes': 147, 'min_impurity_decrease': 1.1992858084056712e-07, 'min_samples_leaf': 2, 'ccp_alpha': 4.2132920107922e-06, 'max_samples': 0.6142671387111642}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:20:44,431] Trial 585 finished with value: 0.5 and parameters: {'criterion': 'log_loss', 'max_depth': 18, 'max_features': 10, 'max_leaf_nodes': 186, 'min_impurity_decrease': 3.576602235541203e-05, 'min_samples_leaf': 2, 'ccp_alpha': 0.3414

[I 2023-07-29 01:21:23,235] Trial 604 finished with value: 0.8846204185179815 and parameters: {'criterion': 'log_loss', 'max_depth': 12, 'max_features': 9, 'max_leaf_nodes': 160, 'min_impurity_decrease': 1.966635838838464e-08, 'min_samples_leaf': 4, 'ccp_alpha': 3.788822545943792e-07, 'max_samples': 0.8190931249016368}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:21:23,239] Trial 607 finished with value: 0.8845509853829278 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 9, 'max_leaf_nodes': 159, 'min_impurity_decrease': 9.074985555150033e-06, 'min_samples_leaf': 4, 'ccp_alpha': 0.0014047521546236273, 'max_samples': 0.8140938018775598}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:21:24,917] Trial 609 finished with value: 0.8849411648113658 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 12, 'max_leaf_nodes': 159, 'min_impurity_decrease': 1.908137061092409e-08, 'min_samples_leaf': 4, 'ccp_

[I 2023-07-29 01:22:06,713] Trial 629 finished with value: 0.8761496862223973 and parameters: {'criterion': 'log_loss', 'max_depth': 9, 'max_features': 10, 'max_leaf_nodes': 175, 'min_impurity_decrease': 0.0024622221751548815, 'min_samples_leaf': 3, 'ccp_alpha': 9.07144491913942e-06, 'max_samples': 0.9540445646437996}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:22:06,822] Trial 627 finished with value: 0.8857746567883374 and parameters: {'criterion': 'log_loss', 'max_depth': 13, 'max_features': 10, 'max_leaf_nodes': 176, 'min_impurity_decrease': 0.00019553763975697363, 'min_samples_leaf': 3, 'ccp_alpha': 1.4236999668695879e-06, 'max_samples': 0.9063460626016403}. Best is trial 526 with value: 0.8862751783084271.
[I 2023-07-29 01:22:06,895] Trial 630 finished with value: 0.885467242375593 and parameters: {'criterion': 'log_loss', 'max_depth': 9, 'max_features': 10, 'max_leaf_nodes': 136, 'min_impurity_decrease': 0.0004826244046377761, 'min_samples_leaf': 3, 'ccp_

[I 2023-07-29 01:22:48,732] Trial 650 finished with value: 0.885204885101642 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 11, 'max_leaf_nodes': 150, 'min_impurity_decrease': 1.493386146898684e-07, 'min_samples_leaf': 2, 'ccp_alpha': 5.562728979738141e-06, 'max_samples': 0.7893778319802116}. Best is trial 647 with value: 0.8863131526614118.
[I 2023-07-29 01:22:49,580] Trial 654 finished with value: 0.8853563236889213 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 10, 'max_leaf_nodes': 443, 'min_impurity_decrease': 1.7283399965606076e-07, 'min_samples_leaf': 2, 'ccp_alpha': 6.321958960527319e-06, 'max_samples': 0.793143560097482}. Best is trial 647 with value: 0.8863131526614118.
[I 2023-07-29 01:22:49,583] Trial 655 finished with value: 0.8643311197523277 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 1, 'max_leaf_nodes': 150, 'min_impurity_decrease': 2.1273173714743964e-07, 'min_samples_leaf': 2, 'ccp

[I 2023-07-29 01:23:29,578] Trial 672 finished with value: 0.8828197732725753 and parameters: {'criterion': 'log_loss', 'max_depth': 7, 'max_features': 10, 'max_leaf_nodes': 109, 'min_impurity_decrease': 4.4688332335661297e-07, 'min_samples_leaf': 3, 'ccp_alpha': 4.832163388054761e-07, 'max_samples': 0.9818104537736471}. Best is trial 660 with value: 0.8864565126932223.
[I 2023-07-29 01:23:31,106] Trial 673 finished with value: 0.8844191341203258 and parameters: {'criterion': 'log_loss', 'max_depth': 8, 'max_features': 10, 'max_leaf_nodes': 122, 'min_impurity_decrease': 2.3557039303205507e-07, 'min_samples_leaf': 3, 'ccp_alpha': 0.0003767009725667108, 'max_samples': 0.9827096871069315}. Best is trial 660 with value: 0.8864565126932223.
[I 2023-07-29 01:23:31,230] Trial 675 finished with value: 0.8843387239340852 and parameters: {'criterion': 'log_loss', 'max_depth': 8, 'max_features': 10, 'max_leaf_nodes': 110, 'min_impurity_decrease': 5.956524351158507e-07, 'min_samples_leaf': 3, 'ccp

[I 2023-07-29 01:24:06,981] Trial 695 finished with value: 0.8858435582023446 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 10, 'max_leaf_nodes': 137, 'min_impurity_decrease': 7.17244100152268e-08, 'min_samples_leaf': 2, 'ccp_alpha': 1.4090949076438867e-07, 'max_samples': 0.9307145048121876}. Best is trial 660 with value: 0.8864565126932223.
[I 2023-07-29 01:24:11,586] Trial 699 finished with value: 0.8542710174400403 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 10, 'max_leaf_nodes': 138, 'min_impurity_decrease': 0.010770457214148517, 'min_samples_leaf': 2, 'ccp_alpha': 0.001014492145925096, 'max_samples': 0.9977974815806222}. Best is trial 660 with value: 0.8864565126932223.
[I 2023-07-29 01:24:11,636] Trial 698 finished with value: 0.8862487201053381 and parameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 10, 'max_leaf_nodes': 137, 'min_impurity_decrease': 1.2296353174894056e-06, 'min_samples_leaf': 2, 'ccp

[I 2023-07-29 01:24:35,811] Trial 717 finished with value: 0.883195420632899 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 11, 'max_leaf_nodes': 203, 'min_impurity_decrease': 1.4314806849547522e-06, 'min_samples_leaf': 16, 'ccp_alpha': 3.3716571519253484e-07, 'max_samples': 0.6988768799402443}. Best is trial 660 with value: 0.8864565126932223.
[I 2023-07-29 01:24:36,208] Trial 718 finished with value: 0.8857327675584975 and parameters: {'criterion': 'log_loss', 'max_depth': 11, 'max_features': 11, 'max_leaf_nodes': 420, 'min_impurity_decrease': 0.0004298232646595405, 'min_samples_leaf': 3, 'ccp_alpha': 4.1959050266561425e-07, 'max_samples': 0.9677327607024072}. Best is trial 660 with value: 0.8864565126932223.


Save our study back to the file:

In [215]:
total_seconds.iloc[0, 0] = total_seconds.iloc[0, 0] + RUNNING_TIME
joblib.dump(study, "05_RF.pkl")
total_seconds.to_csv('05_total_seconds.csv')
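The cell above adds the planned `RUNNING_TIME` to the total, so the figure drifts if a session stops early or overruns. Below is a minimal sketch of accumulating the measured time instead. It assumes a one-cell CSV similar to `05_total_seconds.csv`; the helper name `add_elapsed_seconds`, the column name `seconds` and the demo file name are hypothetical and not part of this notebook.

```python
import tempfile
import time
from pathlib import Path

import pandas as pd


def add_elapsed_seconds(csv_path, elapsed_seconds):
    """Add elapsed_seconds to the running total kept in a one-cell CSV."""
    path = Path(csv_path)
    if path.exists():
        total = pd.read_csv(path, index_col=0).astype(float)
    else:
        # Assumed layout: one row and one column (the name is hypothetical)
        total = pd.DataFrame({'seconds': [0.0]})
    total.iloc[0, 0] = total.iloc[0, 0] + float(elapsed_seconds)
    total.to_csv(path)
    return float(total.iloc[0, 0])


# Demo in a temporary directory so that no real file is touched
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'total_seconds_demo.csv'

    start = time.monotonic()
    # In the notebook, the search itself would run here
    elapsed = time.monotonic() - start

    first_total = add_elapsed_seconds(demo_path, elapsed)
    # A second "session" of 20 minutes is added on top of the first one
    second_total = add_elapsed_seconds(demo_path, 1200)
```

`time.monotonic()` is used because it is not affected by system clock adjustments during a long search.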

In [222]:
# Plotting Optimization History
optimization_history_plot = vis.plot_optimization_history(study, error_bar=True)
optimization_history_plot.show()

In [217]:
# Plotting Parameter Importance
param_importance_plot = vis.plot_param_importances(study)
param_importance_plot.show()

In [218]:
# Plotting a Contour Plot
contour_plot = vis.plot_contour(study, params=["max_depth", "min_samples_leaf"])
contour_plot.show()

In [219]:
print('After current session: ')
print("Best trial:", study.best_trial.number)
print("Best average cross-validation ROC AUC:", study.best_trial.value)
print("Best hyperparameters:", study.best_params)
total_hours = round(total_seconds.iloc[0, 0] / 3600, 3)
print("Total running time (hours):", total_hours)

After current session: 
Best trial: 660
Best average cross-validation ROC AUC: 0.8864565126932223
Best hyperparameters: {'criterion': 'log_loss', 'max_depth': 10, 'max_features': 10, 'max_leaf_nodes': 122, 'min_impurity_decrease': 2.1506533498950537e-07, 'min_samples_leaf': 2, 'ccp_alpha': 0.0005324037333940486, 'max_samples': 0.9810947769872445}
Total running time (hours): 0.336


Now let's test our best model with a greater number of estimators, add its scores to the table with the comment "optuna_(number of hours)", and prepare a submission file:

In [220]:
%%time

model_for_tests = RandomForestClassifier(
    random_state=SEED,
    n_estimators=500,
    n_jobs=-1,
    **study.best_params,
)

print(model_for_tests)

submission = get_cv_scores(
    train, test, model_for_tests, scores_df,
    comment="optuna_{}".format(total_hours),
    prepare_submission=True,
)

scores_df

RandomForestClassifier(ccp_alpha=0.0005324037333940486, criterion='log_loss',
                       max_depth=10, max_features=10, max_leaf_nodes=122,
                       max_samples=0.9810947769872445,
                       min_impurity_decrease=2.1506533498950537e-07,
                       min_samples_leaf=2, n_estimators=500, n_jobs=-1,
                       random_state=123)
CPU times: total: 42 s
Wall time: 13 s


| | Changes: | Train ROC AUC | Cross-val ROC AUC | Train Accuracy | Cross-val Accuracy | Test accuracy |
|---|---|---|---|---|---|---|
| 0 | Unprocessed numeric features | 0.891046 | 0.847367 | 0.830047 | 0.790751 | 0.80056 |
| 1 | GroupSize | 0.89933 | 0.854434 | 0.828759 | 0.792477 | |
| 2 | FamilySize | 0.900522 | 0.854035 | 0.828989 | 0.791787 | |
| 3 | - GroupSize | 0.896124 | 0.851555 | 0.829334 | 0.792938 | |
| 4 | 1 + Deck_enc | 0.927033 | 0.877194 | 0.831174 | 0.791672 | 0.79682 |
| 5 | + HomePlanet | 0.930021 | 0.880737 | 0.834349 | 0.794893 | |
| 6 | + Destination | 0.930903 | 0.882469 | 0.835431 | 0.796503 | |
| 7 | + CryoSleep | 0.931807 | 0.882508 | 0.838951 | 0.799264 | |
| 8 | + VIP | 0.931491 | 0.882599 | 0.83879 | 0.799724 | |
| 9 | + Side | 0.93437 | 0.886205 | 0.844196 | 0.799379 | |


In [221]:
# FOR SUBMISSION

submission.to_csv('05_submission_13.csv', index=False)

scores_df.loc[13, 'Test accuracy'] = 0.79565

scores_df

| | Changes: | Train ROC AUC | Cross-val ROC AUC | Train Accuracy | Cross-val Accuracy | Test accuracy |
|---|---|---|---|---|---|---|
| 0 | Unprocessed numeric features | 0.891046 | 0.847367 | 0.830047 | 0.790751 | 0.80056 |
| 1 | GroupSize | 0.89933 | 0.854434 | 0.828759 | 0.792477 | |
| 2 | FamilySize | 0.900522 | 0.854035 | 0.828989 | 0.791787 | |
| 3 | - GroupSize | 0.896124 | 0.851555 | 0.829334 | 0.792938 | |
| 4 | 1 + Deck_enc | 0.927033 | 0.877194 | 0.831174 | 0.791672 | 0.79682 |
| 5 | + HomePlanet | 0.930021 | 0.880737 | 0.834349 | 0.794893 | |
| 6 | + Destination | 0.930903 | 0.882469 | 0.835431 | 0.796503 | |
| 7 | + CryoSleep | 0.931807 | 0.882508 | 0.838951 | 0.799264 | |
| 8 | + VIP | 0.931491 | 0.882599 | 0.83879 | 0.799724 | |
| 9 | + Side | 0.93437 | 0.886205 | 0.844196 | 0.799379 | |
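The submission cell writes the test accuracy to a hard-coded row index (13), which stops matching once the notebook is re-run and new rows are appended. Below is a small sketch of selecting the row by its label instead. It assumes that the comment passed to `get_cv_scores` ends up in the `Changes:` column, which this section does not confirm. The helper `set_test_accuracy` and the toy table are hypothetical; only the value 0.79565 comes from the cell above.

```python
import pandas as pd

# Toy stand-in for scores_df: labels and ROC AUC values are made up
scores = pd.DataFrame({
    'Changes:': ['baseline', 'optuna_0.336'],
    'Cross-val ROC AUC': [0.85, 0.88],
    'Test accuracy': [float('nan'), float('nan')],
})


def set_test_accuracy(df, label, accuracy):
    """Write accuracy into the single row whose 'Changes:' equals label."""
    mask = df['Changes:'] == label
    if mask.sum() != 1:
        # Refuse to guess when the label is missing or duplicated
        raise ValueError(f"expected exactly one row labelled {label!r}")
    df.loc[mask, 'Test accuracy'] = accuracy
    return df


set_test_accuracy(scores, 'optuna_0.336', 0.79565)
```

If the label is missing or appears more than once, the helper raises an error rather than silently updating the wrong row.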


After running this notebook several times, we reached the point where the cross-validation ROC AUC score had not improved for several hundred trials.
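This stopping rule can be checked with a few lines of code rather than by reading the log. The sketch below counts how many trials have finished since the last new best value, assuming we maximise the objective. The function name `trials_since_last_improvement` is hypothetical, and the sample values are a handful of scores copied from the log above, not the full history.

```python
def trials_since_last_improvement(values):
    """Count trials recorded after the most recent new best (maximisation)."""
    best = float('-inf')
    best_index = -1
    for i, value in enumerate(values):
        if value is None:
            # Trials without a value (e.g. failed ones) cannot be a new best
            continue
        if value > best:
            best, best_index = value, i
    if best_index < 0:
        return 0
    return len(values) - 1 - best_index


# A few scores taken from the log above, in the order they were reported
sample_values = [
    0.8862751783084271,  # trial 526
    0.8863131526614118,  # trial 647
    0.8864565126932223,  # trial 660
    0.8858435582023446,  # trial 695
    0.8862487201053381,  # trial 698
]

stale_trials = trials_since_last_improvement(sample_values)
```

With a real study, the input could be built as `[t.value for t in study.trials]`, and the search stopped once the count passes a threshold of our choice, for example a few hundred trials.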


So we can move on to the next part, which will be Model Ensembling: ['06_ensembling.ipynb'](06_ensembling.ipynb).