# Gradient Boosting Classifier (GBC) as model
*by Max*

In this notebook we fit our data with a GBC model, specifically the classifier from [XGBoost](https://xgboost.readthedocs.io/en/stable/index.html). We use both the raw band information and the calculated spectral indices. We start by loading the needed modules and the data. The folder structure differs depending on whether we are on Google Colab or in the local repository.

In [5]:
# set the directory of the data
# depending on colab or vscode environment
on_colabs = True
import sys
if on_colabs:
  from google.colab import drive
  drive.mount("/content/drive")
  DATA_DIR = "/content/drive/MyDrive/Radiant_Earth_Spot_Crop/data"
  sys.path.append("/content/drive/MyDrive/Radiant_Earth_Spot_Crop/src")
else:
  DATA_DIR = "../data"
  sys.path.append("../src/")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
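The manual `on_colabs` flag can also be set automatically. This is a minimal sketch. It assumes that the `google.colab` package can only be imported inside a Colab runtime. The helpers `running_on_colab` and `resolve_paths` are hypothetical and not part of our `src` folder:

```python
import sys
from typing import Tuple


def running_on_colab() -> bool:
    """Return True if google.colab can be imported (assumed to mean we run on Colab)."""
    try:
        import google.colab  # noqa: F401  (imported only to probe the environment)
    except ImportError:
        return False
    return True


def resolve_paths(on_colab: bool) -> Tuple[str, str]:
    """Return (DATA_DIR, SRC_DIR) for the Colab or the local repository layout."""
    if on_colab:
        base = "/content/drive/MyDrive/Radiant_Earth_Spot_Crop"
        return f"{base}/data", f"{base}/src"
    return "../data", "../src/"


DATA_DIR, SRC_DIR = resolve_paths(running_on_colab())
# make our own modules importable, without adding the path twice
if SRC_DIR not in sys.path:
    sys.path.append(SRC_DIR)
```

On Colab, `drive.mount` from the cell above still has to run before anything is read from `DATA_DIR`.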


We import the modules and set the plot style for the notebook.

In [80]:
# import the needed modules
import numpy as np
import pandas as pd

# import the machine learning modules
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier, DMatrix, cv
from sklearn.model_selection import GridSearchCV 
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
import hyperopt.pyll.stochastic
from sklearn.metrics import (
    accuracy_score, 
    f1_score, 
    log_loss, 
    confusion_matrix, 
    classification_report
    )

# ignore warnings
import warnings
warnings.filterwarnings("ignore")

# import plotting modules and set the style
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# set the whole plot style in one call: every further sns.set()/sns.set_theme()
# call would reset the palette and the font to the seaborn defaults
sns.set_theme(
    context="notebook", 
    style="darkgrid", 
    palette="crest", 
    font="helvetica",
    font_scale=0.5,
    rc={"figure.dpi": 300, "figure.figsize": (6, 3)}
    )
cmap = sns.color_palette("crest", 6) # six colors are created this way

# import own modules from the src folder
from train_test_function import train_test_split_fields
from eda_functions import (
    plot_confusion_matrix, 
    get_label_accuracies, 
    plot_label_accuracy, 
    plot_feature_importance
    )

# set a random seed
RSEED = 42
np.random.seed(RSEED)

In [3]:
# test the chosen color palette
sns.color_palette("crest", 6)

We load the data from the CSV file. 

In [9]:
# load the base data from the CSV files
df = pd.read_csv(f"{DATA_DIR}/Train_Dataset1.csv")
df

| Unnamed: 0 | PVR_10 | PVR_11 | PVR_5 | PVR_6 | PVR_7 | PVR_8 | PVR_9 | WET_10 | WET_11 | WET_5 | ... | B03_9 | B08_10 | B08_11 | B08_5 | B08_6 | B08_7 | B08_8 | B08_9 | field_id | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.005364 | -0.000850 | -0.033435 | -0.010824 | 0.001783 | 0.018957 | -0.008199 | 53.176735 | 22.464776 | -19.515831 | ... | 113.575857 | 151.401564 | 161.515013 | 145.355150 | 124.672382 | 101.565843 | 147.924192 | 133.651591 | 48430 | 1 |
| 1 | -0.039430 | -0.003874 | -0.052316 | -0.001140 | 0.012105 | 0.008404 | 0.018920 | 16.995187 | 16.736298 | 6.843541 | ... | 92.403915 | 133.392572 | 151.910135 | 123.901280 | 121.141322 | 132.623240 | 223.915735 | 155.827740 | 52348 | 1 |
| 2 | -0.005213 | 0.005378 | 0.034862 | 0.003047 | 0.038948 | -0.032969 | -0.019573 | 0.200949 | 20.883196 | 23.623624 | ... | 64.803644 | 124.744880 | 190.619060 | 117.808556 | 138.071800 | 98.675743 | 123.877833 | 91.205263 | 29170 | 1 |
| 3 | 0.008804 | -0.030144 | -0.024488 | -0.027446 | 0.007889 | 0.030013 | 0.008628 | 53.541541 | -6.709399 | 32.743858 | ... | 212.659087 | 190.719415 | 152.954487 | 112.698434 | 115.525959 | 141.739330 | 137.945593 | 224.600760 | 37336 | 1 |
| 4 | 0.000000 | -0.074008 | -0.044944 | -0.500000 | 0.019620 | 0.048255 | 0.009624 | 105.453601 | -24.609463 | 2.977214 | ... | 54.233334 | 255.000000 | 33.980000 | 79.146670 | 142.465555 | 181.005183 | 133.071853 | 111.056665 | 6768 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8032 | 0.029023 | 0.030749 | 0.001512 | 0.010392 | 0.030504 | 0.039467 | 0.006055 | 85.516015 | 4.673031 | 42.242381 | ... | 48.458332 | 234.237500 | 100.450000 | 193.722915 | 245.470840 | 159.085417 | 129.014580 | 76.429170 | 16533 | 9 |
| 8033 | 0.001205 | 0.012000 | -0.043830 | 0.003866 | 0.024319 | 0.000726 | -0.015411 | 36.403566 | 68.191348 | -44.094572 | ... | 133.241021 | 174.172380 | 218.299393 | 93.092575 | 136.297070 | 199.303855 | 128.249999 | 138.247256 | 110309 | 9 |
| 8034 | -0.035873 | 0.012926 | -0.023463 | -0.006519 | 0.005603 | -0.013781 | 0.008300 | 6.663884 | 16.823552 | -12.688685 | ... | 163.504360 | 172.507748 | 192.034880 | 112.656977 | 136.412790 | 172.677330 | 127.555813 | 165.161338 | 30488 | 9 |
| 8035 | 0.026071 | -0.033823 | 0.011961 | 0.000000 | 0.014866 | -0.000012 | -0.058351 | 83.229167 | -4.600129 | 70.175008 | ... | 48.584322 | 241.878860 | 154.655581 | 177.897860 | 254.969120 | 212.230400 | 176.916860 | 76.036817 | 102607 | 9 |


---
## Train-test split, validation split & resampling
After loading the modules and data, we split the data into a training set (70%) and a validation set (30%) with our own helper `train_test_split_fields`. We train and tune the model on this split before we apply it to the real test data.

In [10]:
# do the train-test-split
df_train, df_test = train_test_split_fields(
    df, train_size=0.7, random_state=RSEED
    )
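`train_test_split_fields` lives in our `src` folder and is not shown here. Its name suggests that it splits by `field_id`, so that all rows of one field end up on the same side of the split. The following numpy sketch illustrates only that idea and is not the helper's actual code; `split_by_group` is a hypothetical name. scikit-learn's `GroupShuffleSplit` implements the same kind of grouped split:

```python
import numpy as np


def split_by_group(groups, train_size=0.7, random_state=42):
    """Return boolean masks (train, val) so that every group lands on exactly one side."""
    groups = np.asarray(groups)
    rng = np.random.default_rng(random_state)
    # shuffle the unique group ids and assign the first share of them to training
    unique_groups = rng.permutation(np.unique(groups))
    n_train = int(round(train_size * len(unique_groups)))
    train_mask = np.isin(groups, unique_groups[:n_train])
    return train_mask, ~train_mask


# toy example: six rows belonging to four fields
field_ids = np.array([10, 10, 11, 12, 12, 13])
train_mask, val_mask = split_by_group(field_ids, train_size=0.5)
# usage idea with a DataFrame: df_train, df_val = df[train_mask], df[val_mask]
```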

We set the features (X), i.e. our bands and indices, and the target (y). For now we only do this for the training and validation data. We also shift the class labels from 1-9 to 0-8, because XGBClassifier expects the classes to be consecutive integers starting at 0.

In [11]:
# get X for the train and validation data
X_train = df_train.drop(columns=["label", "field_id"])
X_val = df_test.drop(columns=["label", "field_id"])

# get y for the train and validation data
y_train = df_train["label"]
y_train = y_train.astype(int)
y_val = df_test["label"]
y_val = y_val.astype(int)

# set the class labels from 0 to 8 
y_train = y_train-1
y_val = y_val-1
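Subtracting 1 works because the crop labels are the consecutive integers 1 to 9. Predictions then have to be shifted back by adding 1 before they are reported. The sketch below is a slightly more general approach that also handles gaps in the label set. It maps every label to its position among the sorted unique labels with `np.unique(..., return_inverse=True)`, as scikit-learn's `LabelEncoder` does. The toy labels are made up:

```python
import numpy as np

labels = np.array([1, 4, 9, 2, 9, 4])

# classes: sorted unique labels; encoded: index of each label within classes
classes, encoded = np.unique(labels, return_inverse=True)

# model predictions are indices into classes, so look the original labels up again
predicted = np.array([0, 2, 3, 1])
decoded = classes[predicted]
```

For our labels 1 to 9, `encoded` is exactly `labels - 1`.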

---
## Modelling with XGBoost
We start with an XGBClassifier with default hyperparameters. We only set the objective, the evaluation metric and the random seed, and we enable all CPU cores as well as GPU support.
- For the `objective` there are two multi-class options, `multi:softprob` and `multi:softmax`. They differ only in their output: `multi:softmax` returns the predicted class, whereas `multi:softprob` returns a probability for each class. For both, `num_class` must equal the number of classes in the target (9 here). The scikit-learn wrapper also derives this number from `y` during `fit`. In the XGBoost version used here, the wrapper trains multi-class models with `multi:softprob` internally. This is why the printed models further below show `objective='multi:softprob'` even where `multi:softmax` was passed. We also set `disable_default_eval_metric=1`, which switches off XGBoost's default evaluation metric.
- For the `random_state` we use our `RSEED` value.
- For the `eval_metric` we use `mlogloss`, the log loss (cross-entropy) for multiple classes. We chose it because the log loss is our main target along with the F1-score.
- `n_jobs=-1` uses all CPU cores.
- For GPU support we set `tree_method='gpu_hist'` and `gpu_id=0`. In XGBoost 2.0 and later these are deprecated in favor of `tree_method='hist'` together with `device='cuda'`.
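This small numpy sketch makes the difference between the two objectives concrete. It uses made-up raw scores (margins) for two samples and three classes. `multi:softprob` corresponds to applying the softmax function per row, and `multi:softmax` to keeping only the index of the largest probability. It illustrates the idea, not XGBoost's internal code:

```python
import numpy as np

# made-up raw scores (margins): 2 samples x 3 classes
margins = np.array([[2.0, 1.0, 0.1],
                    [0.5, 0.5, 3.0]])

# multi:softprob -> softmax over the classes, one probability per class
shifted = margins - margins.max(axis=1, keepdims=True)  # subtract row max for numerical stability
exp = np.exp(shifted)
proba = exp / exp.sum(axis=1, keepdims=True)

# multi:softmax -> only the most probable class per sample
classes = proba.argmax(axis=1)
```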

We initialize the model and train it.

In [12]:
# initialize an XGBClassifier with default hyperparameters
xgb = XGBClassifier(
    objective='multi:softprob', 
    num_class='9',
    eval_metric='mlogloss', 
    disable_default_eval_metric=1, 
    random_state=RSEED,
    n_jobs=-1, 
    tree_method='gpu_hist', 
    gpu_id=0
    )
xgb.fit(X_train, y_train)

XGBClassifier(disable_default_eval_metric=1, eval_metric='mlogloss', gpu_id=0,
              n_jobs=-1, num_class='9', objective='multi:softprob',
              random_state=42, tree_method='gpu_hist')

Then we predict on the training data (`X_train`) and on the validation data (`X_val`). 

In [13]:
# predict the class labels
y_pred_train = xgb.predict(X_train)
y_pred_val = xgb.predict(X_val)

# predict the probabilities for each class
y_proba_train = xgb.predict_proba(X_train)
y_proba_val = xgb.predict_proba(X_val)

For the evaluation we mainly focus on the macro-averaged `F1-score`. It balances precision and recall instead of optimizing only one of them. It also weights every crop class equally, so rare classes count as much as frequent ones. In addition, we look at the `cross-entropy (log loss)`, which was the target metric of the [Zindi challenge](https://zindi.africa/competitions/radiant-earth-spot-the-crop-challenge/data) that provided the data. The challenge asked for the probability of each crop type for every field. 
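This minimal numpy sketch computes both metrics on made-up labels to show what they measure. `macro_f1` should agree with `f1_score(..., average="macro")` when every class appears in the data. `multiclass_log_loss` should agree with `log_loss` for probability rows that already sum to 1. Both helpers are hypothetical re-implementations for illustration:

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of the per-class F1-scores (like average='macro')."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    p = np.clip(proba[np.arange(len(y_true)), y_true], eps, 1.0)
    return float(-np.mean(np.log(p)))


y_true = np.array([0, 0, 1, 2])
y_pred = np.array([0, 1, 1, 2])
proba = np.array([[0.8, 0.1, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.1, 0.8]])
f1 = macro_f1(y_true, y_pred, n_classes=3)   # 7/9, about 0.778
loss = multiclass_log_loss(y_true, proba)    # about 0.430
```

For reference, predicting the uniform distribution over the nine crop classes gives a log loss of ln(9), about 2.197.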

In [14]:
print("---" * 12)
print(f"Accuracy on train data: {round(accuracy_score(y_train, y_pred_train), 3)}")
print(f"Accuracy on test data: {round(accuracy_score(y_val, y_pred_val), 3)}")
print("---" * 12)
print(f'F1-score on train data: {round(f1_score(y_train, y_pred_train, average="macro"), 3)}')
print(f'F1-score on test data: {round(f1_score(y_val, y_pred_val, average="macro"), 3)}')
print("---" * 12)
print(f"Cross-entropy on train data: {round(log_loss(y_train, y_proba_train), 3)}")
print(f"Cross-entropy on test data: {round(log_loss(y_val, y_proba_val), 3)}")
print("---" * 12)

------------------------------------
Accuracy on train data: 0.606
Accuracy on test data: 0.425
------------------------------------
F1-score on train data: 0.596
F1-score on test data: 0.403
------------------------------------
Cross-entropy on train data: 1.311
Cross-entropy on test data: 1.608
------------------------------------


The model is a reasonable first attempt, but there is clear room for improvement. Accuracy and F1-score are about 0.6 on the training data but only about 0.4 on the validation data, so the model overfits. Above all, we need to push the cross-entropy (1.61 on the validation data) below 1.0.

---
## Hyperparameter tuning via Bayesian optimization
The next step is the tuning of the hyperparameters. We will try a new method, the Bayesian optimization via the [Hyperopt module](http://hyperopt.github.io/hyperopt/). Bayesian optimization is a sequential design strategy for global optimization of black-box functions that does not assume any functional forms. It is usually employed to optimize expensive-to-evaluate functions ([wikipedia](https://en.wikipedia.org/wiki/Bayesian_optimization)).

The first step in using `Hyperopt` is to define the `space of hyperparameters` to optimize. Hyperopt offers several functions for sampling a value for each hyperparameter. The ones used here are:
- `hp.choice`: picks one of the given options.
- `hp.uniform`: draws a float uniformly between a given start and end value.
- `hp.quniform`: draws a value between a given start and end value and rounds it to a multiple of the last argument `q`, i.e. `round(uniform(low, high) / q) * q`. With `q=1` the result is a whole number, but it is still returned as a float (e.g. `8.0`). It therefore has to be cast with `int()` before it is passed to XGBoost.

For each `hp` function, the first argument is the label of the hyperparameter, followed by the values the function samples from.
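This stdlib-only sketch mimics the three sampling rules with Python's `random` module. It illustrates the documented formulas, in particular `round(uniform(low, high) / q) * q` for `hp.quniform`, and is not Hyperopt's implementation. The helper names are hypothetical:

```python
import random


def sample_uniform(low, high, rng):
    """Like hp.uniform: a float drawn uniformly between low and high."""
    return rng.uniform(low, high)


def sample_quniform(low, high, q, rng):
    """Like hp.quniform: round(uniform(low, high) / q) * q, kept as a float."""
    return float(round(rng.uniform(low, high) / q) * q)


def sample_choice(options, rng):
    """Like hp.choice: one of the given options."""
    return rng.choice(options)


rng = random.Random(42)
toy_sample = {
    "max_depth": sample_quniform(3, 10, 1, rng),   # whole number, but a float such as 8.0
    "gamma": sample_uniform(0, 0.3, rng),
    "objective": sample_choice(("multi:softmax", "multi:softprob"), rng),
}
```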

In [20]:
# hp.uniform returns a float
# hp.quniform returns a multiple of its last argument (as a float),
# for example hp.quniform('max_depth', 3, 15, 1) gives a whole number between 3 and 15
space={
    'objective': hp.choice('objective', ('multi:softmax', 'multi:softprob')),
    'n_estimators': 150,
    'random_state': RSEED,
    'disable_default_eval_metric': 1,
    'gpu_id':0,
    'tree_method': 'gpu_hist',
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'min_child_weight' : hp.quniform('min_child_weight', 0, 3, 1),
    'gamma': hp.uniform ('gamma', 0, 0.3),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.9),
    'learning_rate': hp.uniform('learning_rate', 0, 1)
    }
print(hyperopt.pyll.stochastic.sample(space))

{'colsample_bytree': 0.5475438561700039, 'disable_default_eval_metric': 1, 'gamma': 0.2805190265883611, 'gpu_id': 0, 'learning_rate': 0.46825996269262615, 'max_depth': 8.0, 'min_child_weight': 2.0, 'n_estimators': 1500, 'objective': 'multi:softprob', 'random_state': 42, 'tree_method': 'gpu_hist'}


The next step defines our `objective function`. It must
- `initialize the model`, passing each value of the sampled hyperparameter dictionary to the matching parameter,
- `train the model` and `compute the evaluation metric`, and
- `return the score to be optimized` by Hyperopt. Since Hyperopt minimizes the returned `loss`, scores that should be maximized, such as the F1-score, are negated.
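The contract of such an objective function can be shown without XGBoost. In this hypothetical sketch, `toy_score` stands in for "train the model and compute the validation F1-score". The wrapper casts the `hp.quniform` floats to `int` and negates the score. The string `"ok"` is the value of Hyperopt's `STATUS_OK` constant:

```python
INT_PARAMS = ("max_depth", "min_child_weight", "n_estimators")


def cast_params(sample):
    """Cast the floats produced by hp.quniform to int where XGBoost expects integers."""
    return {k: int(v) if k in INT_PARAMS else v for k, v in sample.items()}


def make_objective(score_fn):
    """Wrap a 'higher is better' score (e.g. macro F1) into a Hyperopt-style objective."""
    def objective_fn(sample):
        params = cast_params(sample)
        score = score_fn(params)
        # Hyperopt minimizes 'loss', so the score is negated
        return {"loss": -score, "status": "ok"}
    return objective_fn


def toy_score(params):
    # hypothetical stand-in for the validation F1-score; best at max_depth == 6
    return 1.0 / (1 + abs(params["max_depth"] - 6))


toy_objective = make_objective(toy_score)
result = toy_objective({"max_depth": 6.0, "gamma": 0.1})
```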

In [21]:
def objective(space):
    clf = XGBClassifier(
        objective=space['objective'],
        n_estimators=space['n_estimators'],
        random_state=space['random_state'],
        disable_default_eval_metric=space['disable_default_eval_metric'],
        gpu_id=space['gpu_id'],
        tree_method=space['tree_method'],
        max_depth=int(space['max_depth']),
        min_child_weight=int(space['min_child_weight']),
        gamma=space['gamma'],
        colsample_bytree=space['colsample_bytree'],  # keep the float; int() would truncate 0.5-0.9 to 0
        learning_rate=space['learning_rate']
    )
    
    evaluation = [
                  ( X_train, y_train), 
                  ( X_val, y_val)
    ]
                
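    # eval_metric and early_stopping_rounds are fit() arguments in the XGBoost
    # version used here; XGBoost 2.0 and later expect them in XGBClassifier()
    clf.fit(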
        X_train, y_train,
        eval_set=evaluation, 
        eval_metric="mlogloss",
        early_stopping_rounds=10,
        verbose=False
    )
    
    y_pred_val = clf.predict(X_val)
    f1 = f1_score(y_val, y_pred_val, average="macro")
    print ("SCORE:", f1)
    return {'loss': -f1, 'status': STATUS_OK }

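Then we start the `optimization process` with Hyperopt's `fmin` function, which minimizes the objective function over the hyperparameter space. `tpe.suggest` selects the Tree-structured Parzen Estimator (TPE) algorithm to propose new candidates. The `Trials` object records the result of every evaluation. The number of evaluations is set via `max_evals`. 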
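Conceptually, `fmin` repeats three steps: propose a sample from the space, evaluate the objective, and record the result. At the end it returns the best sample. The sketch below spells out that loop with plain random search, a deliberately simpler strategy than TPE. Hyperopt itself offers random search as `rand.suggest`. The names `random_search_min`, `depth_space` and `depth_objective` are hypothetical:

```python
import random


def random_search_min(objective_fn, space_fn, max_evals, seed=42):
    """Minimize objective_fn over samples drawn by space_fn; return (best_sample, history).

    Plain random search: unlike tpe.suggest, each sample is drawn
    independently of the losses observed so far.
    """
    rng = random.Random(seed)
    history = []  # plays the role of a Trials object: one record per evaluation
    best_sample, best_loss = None, float("inf")
    for _ in range(max_evals):
        sample = space_fn(rng)
        loss = objective_fn(sample)["loss"]
        history.append({"sample": sample, "loss": loss})
        if loss < best_loss:
            best_sample, best_loss = sample, loss
    return best_sample, history


def depth_space(rng):
    # max_depth drawn like hp.quniform('max_depth', 3, 10, 1)
    return {"max_depth": float(round(rng.uniform(3, 10)))}


def depth_objective(sample):
    # hypothetical score that peaks at max_depth == 6, negated for minimization
    return {"loss": -1.0 / (1 + abs(sample["max_depth"] - 6)), "status": "ok"}


best, history = random_search_min(depth_objective, depth_space, max_evals=50)
```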

In [None]:
trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 50,
                        trials = trials)

Finally we can get the best parameters from the optimization process.

In [None]:
print("The best hyperparameters are : ","\n")
print(best_hyperparams)

The best hyperparameters are :  

{'colsample_bytree': 0.655349597562582, 'gamma': 0.09019240188259454, 'max_depth': 9.0, 'min_child_weight': 2.0, 'objective': 0}
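For `hp.choice` parameters, `fmin` reports the index of the chosen option. Here `'objective': 0` therefore refers to the first option, `multi:softmax`. Hyperopt's `space_eval(space, best_hyperparams)` maps such indices back to their values for a whole space. The sketch below does it by hand for the one choice parameter, using the values printed above (`best_raw` and `OBJECTIVE_OPTIONS` are hypothetical names):

```python
# options exactly as passed to hp.choice('objective', ...) in the search space
OBJECTIVE_OPTIONS = ("multi:softmax", "multi:softprob")

# copy of the printed best_hyperparams
best_raw = {"colsample_bytree": 0.655349597562582, "gamma": 0.09019240188259454,
            "max_depth": 9.0, "min_child_weight": 2.0, "objective": 0}

# replace the index by the option it points to (best_raw itself stays unchanged)
decoded = dict(best_raw, objective=OBJECTIVE_OPTIONS[best_raw["objective"]])
```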


---
## Train_Dataset1 - improvement via Bayesian optimization



In [32]:
# hp.uniform returns a float
# hp.quniform returns a multiple of its last argument (as a float),
# for example hp.quniform('max_depth', 3, 15, 1) gives a whole number between 3 and 15
space={
    'objective': 'multi:softmax',
    'n_estimators': hp.quniform('n_estimators', 300, 1500, 10),
    'random_state': RSEED,
    'disable_default_eval_metric': 1,
    'gpu_id':0,
    'tree_method': 'gpu_hist',
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1),
    'gamma': hp.uniform ('gamma', 1, 10),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.9),
    'learning_rate': hp.uniform('learning_rate', 0, 5),
    'reg_lambda': hp.uniform('reg_lambda', 0, 5),
    'subsample': hp.uniform('subsample', 0, 1)
    }
print(hyperopt.pyll.stochastic.sample(space))

{'colsample_bytree': 0.5127681889816192, 'disable_default_eval_metric': 1, 'gamma': 9.43139466852835, 'gpu_id': 0, 'learning_rate': 4.7326487395508385, 'max_depth': 8.0, 'min_child_weight': 4.0, 'n_estimators': 1220.0, 'objective': 'multi:softmax', 'random_state': 42, 'reg_lambda': 1.3329037958361605, 'subsample': 0.6840815750386858, 'tree_method': 'gpu_hist'}


In [29]:
def objective(space):
    clf = XGBClassifier(
        objective=space['objective'],
        n_estimators=int(space['n_estimators']),
        random_state=space['random_state'],
        disable_default_eval_metric=space['disable_default_eval_metric'],
        gpu_id=space['gpu_id'],
        tree_method=space['tree_method'],
        max_depth=int(space['max_depth']),
        min_child_weight=int(space['min_child_weight']),
        gamma=space['gamma'],
        colsample_bytree=space['colsample_bytree'],
        learning_rate=space['learning_rate'],
        reg_lambda=space['reg_lambda'],
        subsample=space['subsample'],
    )
    
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(
        X_train, y_train,
        eval_set=evaluation, 
        eval_metric="mlogloss",
        early_stopping_rounds=10,
        verbose=False
    )
    
    y_pred_val = clf.predict(X_val)
    f1 = f1_score(y_val, y_pred_val, average="macro")
    print ("SCORE:", f1)
    return {'loss': -f1, 'status': STATUS_OK }

In [33]:
trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 50,
                        trials = trials)

SCORE: 0.3330447342180414
SCORE: 0.27671285885109265
SCORE: 0.29796796379860596
SCORE: 0.28359678535141714
SCORE: 0.28553084036812604
SCORE: 0.2813203449023183
SCORE: 0.34714919451471893
SCORE: 0.31543201205117244
SCORE: 0.3112146485038943
SCORE: 0.2969286708870544
SCORE: 0.30369726094091626
SCORE: 0.2836977280558252
SCORE: 0.18296261995017105
SCORE: 0.30533899599898673
SCORE: 0.2678121932821719
SCORE: 0.2887507221730283
SCORE: 0.29644825716553574
SCORE: 0.3077048186122202
SCORE: 0.3177736498948291
SCORE: 0.3484087575096322
SCORE: 0.42801080310642486
SCORE: 0.46073627787107685
SCORE: 0.46364300492703947
SCORE: 0.431685749286946
SCORE: 0.43402813632866266
SCORE: 0.29504612913064915
SCORE: 0.39850246376338966
SCORE: 0.28321516596354107
SCORE: 0.46157540918629714
SCORE: 0.32494547882423613
SCORE: 0.3779749751506038
SCORE: 0.2699773849126581
SCORE: 0.3646546292037977
SCORE: 0.30650258120190926
SCORE: 0.2728533189889844
SCORE: 0.2802082128889906
SCORE: 0.28521480255561393
SCORE: 0.283849278

In [34]:
print("The best hyperparameters are : ","\n")
print(best_hyperparams)

The best hyperparameters are :  

{'colsample_bytree': 0.7080513299687701, 'gamma': 2.8969223387577334, 'learning_rate': 0.05344896807566896, 'max_depth': 10.0, 'min_child_weight': 4.0, 'n_estimators': 660.0, 'reg_lambda': 3.559068059042053, 'subsample': 0.8691468873631047}
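Copying the tuned values into the constructor by hand, as the next cell does, is error-prone. This sketch builds the keyword arguments directly from the tuned values instead, casting the `hp.quniform` floats to `int`. For self-containment it uses a literal copy of the printed values. In the notebook, `best_hyperparams` itself would take the place of `best_tuned`. The names `fixed`, `INT_PARAMS` and `xgb_kwargs` are hypothetical:

```python
# copy of the printed best_hyperparams (hp.quniform values arrive as floats)
best_tuned = {"colsample_bytree": 0.7080513299687701, "gamma": 2.8969223387577334,
              "learning_rate": 0.05344896807566896, "max_depth": 10.0,
              "min_child_weight": 4.0, "n_estimators": 660.0,
              "reg_lambda": 3.559068059042053, "subsample": 0.8691468873631047}

# settings that were fixed in the search space and therefore not returned by fmin
fixed = {"objective": "multi:softmax", "random_state": 42,
         "disable_default_eval_metric": 1, "gpu_id": 0, "tree_method": "gpu_hist"}

INT_PARAMS = {"max_depth", "min_child_weight", "n_estimators"}

xgb_kwargs = {**fixed, **{k: int(v) if k in INT_PARAMS else v for k, v in best_tuned.items()}}
# xgb = XGBClassifier(**xgb_kwargs)  would then replace the hand-written arguments
```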


In [35]:
# initialize an XGBClassifier with the tuned hyperparameters
xgb = XGBClassifier(
    objective='multi:softmax', 
    n_estimators=660,
    random_state=RSEED,
    disable_default_eval_metric=1,
    gpu_id=0,
    tree_method='gpu_hist',
    max_depth=10,
    min_child_weight=4,
    gamma=2.8969223387577334,
    colsample_bytree=0.7080513299687701,
    learning_rate=0.05344896807566896,
    reg_lambda=3.559068059042053,
    subsample=0.8691468873631047
    )
xgb.fit(X_train, y_train)

XGBClassifier(colsample_bytree=0.7080513299687701,
              disable_default_eval_metric=1, gamma=2.8969223387577334, gpu_id=0,
              learning_rate=0.05344896807566896, max_depth=10,
              min_child_weight=4, n_estimators=660, objective='multi:softprob',
              random_state=42, reg_lambda=3.559068059042053,
              subsample=0.8691468873631047, tree_method='gpu_hist')

In [37]:
# predict the class labels
y_pred_train = xgb.predict(X_train)
y_pred_val = xgb.predict(X_val)

# predict the probabilities for each class
y_proba_train = xgb.predict_proba(X_train)
y_proba_val = xgb.predict_proba(X_val)

print("---" * 12)
print(f"Accuracy on train data: {round(accuracy_score(y_train, y_pred_train), 3)}")
print(f"Accuracy on test data: {round(accuracy_score(y_val, y_pred_val), 3)}")
print("---" * 12)
print(f'F1-score on train data: {round(f1_score(y_train, y_pred_train, average="macro"), 3)}')
print(f'F1-score on test data: {round(f1_score(y_val, y_pred_val, average="macro"), 3)}')
print("---" * 12)
print(f"Cross-entropy on train data: {round(log_loss(y_train, y_proba_train), 3)}")
print(f"Cross-entropy on test data: {round(log_loss(y_val, y_proba_val), 3)}")
print("---" * 12)

------------------------------------
Accuracy on train data: 1.0
Accuracy on test data: 0.487
------------------------------------
F1-score on train data: 1.0
F1-score on test data: 0.475
------------------------------------
Cross-entropy on train data: 0.047
Cross-entropy on test data: 1.492
------------------------------------


---
## Train_Dataset2 - improvement via Bayesian optimization

In [38]:
# load the base data from the CSV files
df = pd.read_csv(f"{DATA_DIR}/Train_Dataset2.csv")

# do the train-test-split
df_train, df_test = train_test_split_fields(
    df, train_size=0.7, random_state=RSEED
    )

# get X for the train and validation data
X_train = df_train.drop(columns=["label", "field_id"])
X_val = df_test.drop(columns=["label", "field_id"])

# get y for the train and validation data
y_train = df_train["label"]
y_train = y_train.astype(int)
y_val = df_test["label"]
y_val = y_val.astype(int)

# set the class labels from 0 to 8 
y_train = y_train-1
y_val = y_val-1
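The data preparation above is identical for every dataset. This sketch wraps it in a small helper so that each section only needs the `read_csv` call, the `train_test_split_fields` call and one call of the helper. The name `get_xy` and the toy frames are hypothetical:

```python
import pandas as pd


def get_xy(df_train, df_val, drop=("label", "field_id")):
    """Split train/validation frames into features and 0-based integer targets."""
    X_train = df_train.drop(columns=list(drop))
    X_val = df_val.drop(columns=list(drop))
    # shift the labels 1..9 to 0..8, as XGBClassifier expects
    y_train = df_train["label"].astype(int) - 1
    y_val = df_val["label"].astype(int) - 1
    return X_train, X_val, y_train, y_val


# toy frames standing in for df_train / df_test
df_tr = pd.DataFrame({"B08_5": [0.1, 0.2], "field_id": [1, 2], "label": [1, 9]})
df_va = pd.DataFrame({"B08_5": [0.3], "field_id": [3], "label": [4]})
Xt, Xv, yt, yv = get_xy(df_tr, df_va)
```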

In [39]:
# hp.uniform returns a float
# hp.quniform returns a multiple of its last argument (as a float),
# for example hp.quniform('max_depth', 3, 15, 1) gives a whole number between 3 and 15
space={
    'objective': 'multi:softmax',
    'n_estimators': hp.quniform('n_estimators', 300, 1500, 10),
    'random_state': RSEED,
    'disable_default_eval_metric': 1,
    'gpu_id':0,
    'tree_method': 'gpu_hist',
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1),
    'gamma': hp.uniform ('gamma', 1, 10),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.9),
    'learning_rate': hp.uniform('learning_rate', 0, 5),
    'reg_lambda': hp.uniform('reg_lambda', 0, 5),
    'subsample': hp.uniform('subsample', 0, 1)
    }
print(hyperopt.pyll.stochastic.sample(space))

{'colsample_bytree': 0.8579954303249263, 'disable_default_eval_metric': 1, 'gamma': 7.285388496491766, 'gpu_id': 0, 'learning_rate': 3.6199582932212553, 'max_depth': 5.0, 'min_child_weight': 4.0, 'n_estimators': 790.0, 'objective': 'multi:softmax', 'random_state': 42, 'reg_lambda': 3.947581994890837, 'subsample': 0.1360486160266129, 'tree_method': 'gpu_hist'}


In [40]:
def objective(space):
    clf = XGBClassifier(
        objective=space['objective'],
        n_estimators=int(space['n_estimators']),
        random_state=space['random_state'],
        disable_default_eval_metric=space['disable_default_eval_metric'],
        gpu_id=space['gpu_id'],
        tree_method=space['tree_method'],
        max_depth=int(space['max_depth']),
        min_child_weight=int(space['min_child_weight']),
        gamma=space['gamma'],
        colsample_bytree=space['colsample_bytree'],
        learning_rate=space['learning_rate'],
        reg_lambda=space['reg_lambda'],
        subsample=space['subsample'],
    )
    
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(
        X_train, y_train,
        eval_set=evaluation, 
        eval_metric="mlogloss",
        early_stopping_rounds=10,
        verbose=False
    )
    
    y_pred_val = clf.predict(X_val)
    f1 = f1_score(y_val, y_pred_val, average="macro")
    print ("SCORE:", f1)
    return {'loss': -f1, 'status': STATUS_OK }

In [41]:
trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 50,
                        trials = trials)

SCORE: 0.3985208725146039
SCORE: 0.4165671859579858
SCORE: 0.30188985637083454
SCORE: 0.28909021770452814
SCORE: 0.284240181362489
SCORE: 0.2641404170754551
SCORE: 0.3612935773830682
SCORE: 0.4140793501378072
SCORE: 0.2685676007602218
SCORE: 0.30894383663137925
SCORE: 0.3001733669485265
SCORE: 0.2641875131494559
SCORE: 0.3336472178293498
SCORE: 0.26389312588953523
SCORE: 0.242765815509089
SCORE: 0.32219228201456346
SCORE: 0.2653172845073895
SCORE: 0.3384799966368607
SCORE: 0.2859317929846735
SCORE: 0.26957051559838113
SCORE: 0.40257607191520056
SCORE: 0.2824712350338115
SCORE: 0.33897209205926937
SCORE: 0.2951346430835813
SCORE: 0.2787906878778601
SCORE: 0.40093257555750955
SCORE: 0.41943775577899545
SCORE: 0.38370348177260416
SCORE: 0.37829310225903423
SCORE: 0.386566754216148
SCORE: 0.2898793492650764
SCORE: 0.2963058231501237
SCORE: 0.3176441482741039
SCORE: 0.30436567463730807
SCORE: 0.23528073952758038
SCORE: 0.2887338854858634
SCORE: 0.3910373422895107
SCORE: 0.4431383179220155

In [43]:
print("The best hyperparameters are : ","\n")
print(best_hyperparams)

The best hyperparameters are :  

{'colsample_bytree': 0.510393400793516, 'gamma': 7.6025294182804, 'learning_rate': 0.1943296251965943, 'max_depth': 7.0, 'min_child_weight': 7.0, 'n_estimators': 650.0, 'reg_lambda': 0.7751337747552518, 'subsample': 0.7600404286570743}


In [44]:
# initialize an XGBClassifier with the tuned hyperparameters
xgb = XGBClassifier(
    objective='multi:softmax', 
    n_estimators=650,
    random_state=RSEED,
    disable_default_eval_metric=1,
    gpu_id=0,
    tree_method='gpu_hist',
    max_depth=7,
    min_child_weight=7,
    gamma=7.6025294182804,
    colsample_bytree=0.510393400793516,
    learning_rate=0.1943296251965943,
    reg_lambda=0.7751337747552518,
    subsample=0.7600404286570743
    )
xgb.fit(X_train, y_train)

XGBClassifier(colsample_bytree=0.510393400793516, disable_default_eval_metric=1,
              gamma=7.6025294182804, gpu_id=0, learning_rate=0.1943296251965943,
              max_depth=7, min_child_weight=7, n_estimators=650,
              objective='multi:softprob', random_state=42,
              reg_lambda=0.7751337747552518, subsample=0.7600404286570743,
              tree_method='gpu_hist')

In [45]:
# predict the class labels
y_pred_train = xgb.predict(X_train)
y_pred_val = xgb.predict(X_val)

# predict the probabilities for each class
y_proba_train = xgb.predict_proba(X_train)
y_proba_val = xgb.predict_proba(X_val)

print("---" * 12)
print(f"Accuracy on train data: {round(accuracy_score(y_train, y_pred_train), 3)}")
print(f"Accuracy on test data: {round(accuracy_score(y_val, y_pred_val), 3)}")
print("---" * 12)
print(f'F1-score on train data: {round(f1_score(y_train, y_pred_train, average="macro"), 3)}')
print(f'F1-score on test data: {round(f1_score(y_val, y_pred_val, average="macro"), 3)}')
print("---" * 12)
print(f"Cross-entropy on train data: {round(log_loss(y_train, y_proba_train), 3)}")
print(f"Cross-entropy on test data: {round(log_loss(y_val, y_proba_val), 3)}")
print("---" * 12)

------------------------------------
Accuracy on train data: 1.0
Accuracy on test data: 0.524
------------------------------------
F1-score on train data: 1.0
F1-score on test data: 0.473
------------------------------------
Cross-entropy on train data: 0.02
Cross-entropy on test data: 1.506
------------------------------------


---
## Train_Dataset3 - improvement via Bayesian optimization

In [46]:
# load the base data from the CSV files
df = pd.read_csv(f"{DATA_DIR}/Train_Dataset3.csv")

# do the train-test-split
df_train, df_test = train_test_split_fields(
    df, train_size=0.7, random_state=RSEED
    )

# get X for the train and validation data
X_train = df_train.drop(columns=["label", "field_id"])
X_val = df_test.drop(columns=["label", "field_id"])

# get y for the train and validation data
y_train = df_train["label"]
y_train = y_train.astype(int)
y_val = df_test["label"]
y_val = y_val.astype(int)

# set the class labels from 0 to 8 
y_train = y_train-1
y_val = y_val-1

In [47]:
# hp.uniform returns a float
# hp.quniform returns a multiple of its last argument (as a float),
# for example hp.quniform('max_depth', 3, 15, 1) gives a whole number between 3 and 15
space={
    'objective': 'multi:softmax',
    'n_estimators': hp.quniform('n_estimators', 300, 1500, 10),
    'random_state': RSEED,
    'disable_default_eval_metric': 1,
    'gpu_id':0,
    'tree_method': 'gpu_hist',
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1),
    'gamma': hp.uniform ('gamma', 1, 10),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.9),
    'learning_rate': hp.uniform('learning_rate', 0, 5),
    'reg_lambda': hp.uniform('reg_lambda', 0, 5),
    'subsample': hp.uniform('subsample', 0, 1)
    }
print(hyperopt.pyll.stochastic.sample(space))

{'colsample_bytree': 0.8671532107563265, 'disable_default_eval_metric': 1, 'gamma': 2.471480196311035, 'gpu_id': 0, 'learning_rate': 3.1142609140461914, 'max_depth': 9.0, 'min_child_weight': 3.0, 'n_estimators': 330.0, 'objective': 'multi:softmax', 'random_state': 42, 'reg_lambda': 4.611418683846853, 'subsample': 0.6871262319229096, 'tree_method': 'gpu_hist'}


In [48]:
def objective(space):
    clf = XGBClassifier(
        objective=space['objective'],
        n_estimators=int(space['n_estimators']),
        random_state=space['random_state'],
        disable_default_eval_metric=space['disable_default_eval_metric'],
        gpu_id=space['gpu_id'],
        tree_method=space['tree_method'],
        max_depth=int(space['max_depth']),
        min_child_weight=int(space['min_child_weight']),
        gamma=space['gamma'],
        colsample_bytree=space['colsample_bytree'],
        learning_rate=space['learning_rate'],
        reg_lambda=space['reg_lambda'],
        subsample=space['subsample'],
    )
    
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(
        X_train, y_train,
        eval_set=evaluation, 
        eval_metric="mlogloss",
        early_stopping_rounds=10,
        verbose=False
    )
    
    y_pred_val = clf.predict(X_val)
    f1 = f1_score(y_val, y_pred_val, average="macro")
    print ("SCORE:", f1)
    return {'loss': -f1, 'status': STATUS_OK }

In [49]:
trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 50,
                        trials = trials)

SCORE: 0.30783852040654813
SCORE: 0.4650679028133131
SCORE: 0.293073177613521
SCORE: 0.3470514730332405
SCORE: 0.26870569708616393
SCORE: 0.32464679699020155
SCORE: 0.3340562229286096
SCORE: 0.4454822023542072
SCORE: 0.316643376074439
SCORE: 0.27604661646495315
SCORE: 0.32987167446350063
SCORE: 0.35061535726687876
SCORE: 0.2762013237015496
SCORE: 0.3355949320084463
SCORE: 0.2831311708840129
SCORE: 0.315106784010651
SCORE: 0.32150222904853654
SCORE: 0.3208961814834879
SCORE: 0.3125922072377684
SCORE: 0.40634845095620953
SCORE: 0.5187639142101939
SCORE: 0.5095107719653881
SCORE: 0.4571913728791007
SCORE: 0.307616913623962
SCORE: 0.39076401806544414
SCORE: 0.2893023495240249
SCORE: 0.48249833593333263
SCORE: 0.5014352683943626
SCORE: 0.32701493402638815
SCORE: 0.3957468449713779
SCORE: 0.2989066763970489
SCORE: 0.47598950384551686
SCORE: 0.2676371823911199
SCORE: 0.4598007449365526
SCORE: 0.3900092326841133
SCORE: 0.3211881286684886
SCORE: 0.48912270628383336
SCORE: 0.32057943716364684

In [50]:
print("The best hyperparameters are : ","\n")
print(best_hyperparams)

The best hyperparameters are :  

{'colsample_bytree': 0.7556160538037393, 'gamma': 7.162654022047084, 'learning_rate': 0.03670299557431567, 'max_depth': 8.0, 'min_child_weight': 10.0, 'n_estimators': 1460.0, 'reg_lambda': 3.111859332195362, 'subsample': 0.42625809589781993}


In [51]:
# initialize an XGBClassifier with the tuned hyperparameters
xgb = XGBClassifier(
    objective='multi:softmax', 
    n_estimators=1460,
    random_state=RSEED,
    disable_default_eval_metric=1,
    gpu_id=0,
    tree_method='gpu_hist',
    max_depth=8,
    min_child_weight=10,
    gamma=7.162654022047084,
    colsample_bytree=0.7556160538037393,
    learning_rate=0.03670299557431567,
    reg_lambda=3.111859332195362,
    subsample=0.42625809589781993
    )
xgb.fit(X_train, y_train)

XGBClassifier(colsample_bytree=0.7556160538037393,
              disable_default_eval_metric=1, gamma=7.162654022047084, gpu_id=0,
              learning_rate=0.03670299557431567, max_depth=8,
              min_child_weight=10, n_estimators=1460,
              objective='multi:softprob', random_state=42,
              reg_lambda=3.111859332195362, subsample=0.42625809589781993,
              tree_method='gpu_hist')

In [52]:
# predict the class labels
y_pred_train = xgb.predict(X_train)
y_pred_val = xgb.predict(X_val)

# predict the probabilities for each class
y_proba_train = xgb.predict_proba(X_train)
y_proba_val = xgb.predict_proba(X_val)

print("---" * 12)
print(f"Accuracy on train data: {round(accuracy_score(y_train, y_pred_train), 3)}")
print(f"Accuracy on test data: {round(accuracy_score(y_val, y_pred_val), 3)}")
print("---" * 12)
print(f'F1-score on train data: {round(f1_score(y_train, y_pred_train, average="macro"), 3)}')
print(f'F1-score on test data: {round(f1_score(y_val, y_pred_val, average="macro"), 3)}')
print("---" * 12)
print(f"Cross-entropy on train data: {round(log_loss(y_train, y_proba_train), 3)}")
print(f"Cross-entropy on test data: {round(log_loss(y_val, y_proba_val), 3)}")
print("---" * 12)

------------------------------------
Accuracy on train data: 1.0
Accuracy on test data: 0.532
------------------------------------
F1-score on train data: 1.0
F1-score on test data: 0.527
------------------------------------
Cross-entropy on train data: 0.181
Cross-entropy on test data: 1.309
------------------------------------
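The train scores of 1.0 against roughly 0.53 on the held-out fields point to strong overfitting. The F1 score reported here uses `average="macro"`: it computes one F1 score per class and takes their unweighted mean, so a rare crop type counts as much as a common one. The NumPy sketch below is written to match scikit-learn's macro F1 over the classes that occur in either `y_true` or `y_pred`. The function name is our own.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 = 2TP / (2TP + FP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
# per-class F1: class 0 -> 1.0, class 1 -> 2/3, class 2 -> 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.822
```

In this toy example the accuracy is 5/6 (about 0.833), which is higher than the macro F1 because the one error hurts both class 1 (a missed sample) and class 2 (a false positive).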


---
## Train_Dataset4 - improvement via Bayesian optimization

In [90]:
# load the base data from the CSV file
df = pd.read_csv(f"{DATA_DIR}/Train_Dataset4.csv")

# do the train-test-split
df_train, df_test = train_test_split_fields(
    df, train_size=0.7, random_state=RSEED
    )

# get X for the train and validation data
X_train = df_train.drop(columns=["label", "field_id"])
X_val = df_test.drop(columns=["label", "field_id"])

# get y for the train and validation data
y_train = df_train["label"]
y_train = y_train.astype(int)
y_val = df_test["label"]
y_val = y_val.astype(int)

# shift the class labels from 1-9 to 0-8, since XGBoost expects labels starting at 0
y_train = y_train-1
y_val = y_val-1
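Subtracting 1 works because the labels are the integers 1 to 9. XGBoost's multi-class objectives expect labels in the range 0 to num_class - 1. If a label were missing, or the codes were not consecutive, mapping the classes with `np.unique(..., return_inverse=True)` would still produce contiguous labels, with an effect equivalent to scikit-learn's `LabelEncoder`. Below is a sketch with a hypothetical helper name. It assumes every validation class also occurs in the training data.

```python
import numpy as np


def encode_labels(y_train, y_val):
    """Map the training classes to 0..k-1 and apply the same mapping to y_val.

    Assumes every class in y_val also occurs in y_train.
    """
    classes, train_codes = np.unique(np.asarray(y_train), return_inverse=True)
    # classes is sorted, so searchsorted finds each label's position in it
    val_codes = np.searchsorted(classes, np.asarray(y_val))
    return train_codes, val_codes, classes


train_codes, val_codes, classes = encode_labels([1, 3, 9, 3, 1], [9, 1])
print(train_codes, val_codes, classes)  # [0 1 2 1 0] [2 0] [1 3 9]
```

Predictions can be mapped back to the original crop labels with `classes[y_pred]`.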

In [54]:
# hp.uniform(label, low, high) samples a float uniformly between low and high
# hp.quniform(label, low, high, q) returns round(uniform(low, high) / q) * q,
# i.e. a float that is a multiple of q; for example (3, 15, 1) yields the
# whole numbers 3.0, 4.0, ..., 15.0, which must be cast to int before use
space = {
    'objective': 'multi:softmax',
    'n_estimators': hp.quniform('n_estimators', 300, 1500, 10),
    'random_state': RSEED,
    'disable_default_eval_metric': 1,
    'gpu_id': 0,
    'tree_method': 'gpu_hist',
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'min_child_weight': hp.quniform('min_child_weight', 1, 10, 1),
    'gamma': hp.uniform('gamma', 1, 10),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.9),
    'learning_rate': hp.uniform('learning_rate', 0, 5),
    'reg_lambda': hp.uniform('reg_lambda', 0, 5),
    'subsample': hp.uniform('subsample', 0, 1)
    }
print(hyperopt.pyll.stochastic.sample(space))

{'colsample_bytree': 0.5989815401912315, 'disable_default_eval_metric': 1, 'gamma': 5.914189814537089, 'gpu_id': 0, 'learning_rate': 1.7482159364409144, 'max_depth': 4.0, 'min_child_weight': 6.0, 'n_estimators': 780.0, 'objective': 'multi:softmax', 'random_state': 42, 'reg_lambda': 4.87018888781927, 'subsample': 0.9103542566853604, 'tree_method': 'gpu_hist'}
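As the sample above shows, the quantized parameters come back as floats (`'max_depth': 4.0`, `'n_estimators': 780.0`). This is why the objective casts them with `int(...)`. The hyperopt documentation describes `hp.quniform(label, low, high, q)` as `round(uniform(low, high) / q) * q`. The NumPy sketch below mimics that rule as an illustration. It is not hyperopt's own sampler.

```python
import numpy as np


def quniform_like(low, high, q, size, seed=0):
    """Mimic hp.quniform: round(uniform(low, high) / q) * q, returned as floats."""
    rng = np.random.default_rng(seed)
    return np.round(rng.uniform(low, high, size=size) / q) * q


n_estimators = quniform_like(300, 1500, 10, size=5)
print(n_estimators)              # floats that are multiples of 10
print(n_estimators.astype(int))  # cast before handing them to XGBoost
```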


In [55]:
def objective(space):
    clf = XGBClassifier(
        objective=space['objective'],
        n_estimators=int(space['n_estimators']),
        random_state=space['random_state'],
        disable_default_eval_metric=space['disable_default_eval_metric'],
        gpu_id=space['gpu_id'],
        tree_method=space['tree_method'],
        max_depth=int(space['max_depth']),
        min_child_weight=int(space['min_child_weight']),
        gamma=space['gamma'],
        colsample_bytree=space['colsample_bytree'],
        learning_rate=space['learning_rate'],
        reg_lambda=space['reg_lambda'],
        subsample=space['subsample'],
    )
    
    # XGBoost uses the last entry of eval_set (the validation data) for early stopping
    evaluation = [(X_train, y_train), (X_val, y_val)]
    
    clf.fit(
        X_train, y_train,
        eval_set=evaluation, 
        eval_metric="mlogloss",
        early_stopping_rounds=10,
        verbose=False
    )
    
    y_pred_val = clf.predict(X_val)
    f1 = f1_score(y_val, y_pred_val, average="macro")
    print("SCORE:", f1)
    # fmin minimizes the loss, so return the negative F1 score to maximize F1
    return {'loss': -f1, 'status': STATUS_OK}
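`fmin` always minimizes the `'loss'` entry of the returned dictionary. This is why the objective returns `-f1`: the lowest loss corresponds to the highest macro F1. The toy loop below illustrates that contract, with plain random search standing in for hyperopt's TPE algorithm. `random_search_min` and the toy objective are hypothetical. They only show how negating a score turns maximization into minimization. The `"ok"` status string mirrors the value of hyperopt's `STATUS_OK`.

```python
import random


def random_search_min(objective, sample_params, max_evals, seed=0):
    """Toy stand-in for fmin: random search that keeps the lowest 'loss'."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = sample_params(rng)
        loss = objective(params)["loss"]
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(params):
    # made-up "F1 score" that peaks at max_depth = 6
    f1 = 1.0 - abs(params["max_depth"] - 6) / 10
    return {"loss": -f1, "status": "ok"}  # negate: minimizing -f1 maximizes f1


best, loss = random_search_min(
    toy_objective, lambda rng: {"max_depth": rng.randint(3, 10)}, max_evals=50
)
print(best, "best F1:", -loss)
```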

In [56]:
trials = Trials()

best_hyperparams = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=50,
    trials=trials
    )

SCORE:
0.34643107397357753
SCORE:
0.5360176697209509
SCORE:
0.292151485418789
SCORE:
0.4050152697563985
SCORE:
0.27853661671079316
SCORE:
0.5119933037498012
SCORE:
0.33519567457168475
SCORE:
0.3399387627351633
SCORE:
0.3079591477481439
SCORE:
0.2980101212486797
SCORE:
0.32216390781981485
SCORE:
0.32920541599276965
SCORE:
0.31429296749503893
SCORE:
0.5010692251527626
SCORE:
0.3377100267977815
SCORE:
0.4042455226319631
SCORE:
0.3811679718939065
SCORE:
0.3137345568185034
SCORE:
0.2999450614739912
SCORE:
0.42292658916199843
SCORE:
0.5527071621801593
SCORE:
0.5495009039682119
SCORE:
0.5514897560042271
SCORE:
0.37660188890686197
SCORE:
0.2713540036339094
SCORE:
0.3595858320420521
SCORE:
0.5554324034135321
SCORE:
0.49323755227221083
SCORE:
0.3416298463264593
SCORE:
0.335464220705064
SCORE:
0.3788892676171516
SCORE:
0.5191410928120437
SCORE:
0.5128003639442578
SCORE:
0.34816191776771277
SCORE:
0.46180307037766355
SCORE:
0.49962439181755086
SCORE:
0.3511002325950956
SCORE:
0.27980433159205065

In [57]:
print("The best hyperparameters are : ","\n")
print(best_hyperparams)

The best hyperparameters are :  

{'colsample_bytree': 0.8987815246829209, 'gamma': 2.640992068400642, 'learning_rate': 0.052351134057586624, 'max_depth': 9.0, 'min_child_weight': 2.0, 'n_estimators': 970.0, 'reg_lambda': 3.3882363637518, 'subsample': 0.7294824918577887}


In [91]:
# initialize an XGBClassifier with the best hyperparameters found above
xgb = XGBClassifier(
    objective='multi:softmax', 
    n_estimators=970,
    random_state=RSEED,
    disable_default_eval_metric=1,
    gpu_id=0,
    tree_method='gpu_hist',
    max_depth=9,
    min_child_weight=2,
    gamma=2.640992068400642,
    colsample_bytree=0.8987815246829209,
    learning_rate=0.052351134057586624,
    reg_lambda=3.3882363637518,
    subsample=0.7294824918577887
    )
xgb.fit(X_train, y_train)

XGBClassifier(colsample_bytree=0.8987815246829209,
              disable_default_eval_metric=1, gamma=2.640992068400642, gpu_id=0,
              learning_rate=0.052351134057586624, max_depth=9,
              min_child_weight=2, n_estimators=970, objective='multi:softprob',
              random_state=42, reg_lambda=3.3882363637518,
              subsample=0.7294824918577887, tree_method='gpu_hist')
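The objective above scores every trial with `early_stopping_rounds=10`, so a trial may stop well before its sampled `n_estimators`. The final model in the previous cell, by contrast, is fit with all 970 trees and no `eval_set`. The patience rule behind early stopping can be sketched as follows. This is a simplified illustration, not XGBoost's exact implementation, and the function name is hypothetical.

```python
def early_stopping_round(val_losses, patience):
    """Return (best_round, stop_round) for a patience-based stopping rule.

    Training stops once `patience` rounds have passed without improving
    on the best validation loss seen so far.
    """
    best_round, best_loss = 0, float("inf")
    for rnd, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = rnd, loss
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, len(val_losses) - 1


# validation mlogloss per boosting round (made-up numbers)
print(early_stopping_round([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9], patience=3))  # (2, 5)
```

After a fit with early stopping, XGBoost's scikit-learn wrapper exposes the chosen round as `best_iteration`. Using `best_iteration + 1` trees in the final model would be one way to keep it consistent with the trial that was scored.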

In [92]:
# predict the class labels
y_pred_train = xgb.predict(X_train)
y_pred_val = xgb.predict(X_val)

# predict the probabilities for each class
y_proba_train = xgb.predict_proba(X_train)
y_proba_val = xgb.predict_proba(X_val)

print("---" * 12)
print(f"Accuracy on train data: {round(accuracy_score(y_train, y_pred_train), 3)}")
print(f"Accuracy on test data: {round(accuracy_score(y_val, y_pred_val), 3)}")
print("---" * 12)
print(f'F1-score on train data: {round(f1_score(y_train, y_pred_train, average="macro"), 3)}')
print(f'F1-score on test data: {round(f1_score(y_val, y_pred_val, average="macro"), 3)}')
print("---" * 12)
print(f"Cross-entropy on train data: {round(log_loss(y_train, y_proba_train), 3)}")
print(f"Cross-entropy on test data: {round(log_loss(y_val, y_proba_val), 3)}")
print("---" * 12)

------------------------------------
Accuracy on train data: 1.0
Accuracy on test data: 0.565
------------------------------------
F1-score on train data: 1.0
F1-score on test data: 0.562
------------------------------------
Cross-entropy on train data: 0.035
Cross-entropy on test data: 1.228
------------------------------------


---
## Train_Dataset3 on Test Data - improvement via Bayesian optimization

In [98]:
# load the base data from the CSV files
df_train = pd.read_csv(f"{DATA_DIR}/Train_Dataset3.csv")
df_test = pd.read_csv(f"{DATA_DIR}/Test_Dataset.csv")

# get X for the train and test data (the test data serves as the validation set here)
X_train = df_train.drop(columns=["label", "field_id"])
X_val = df_test.drop(columns=["label", "field_id"])

# get y for the train and test data
y_train = df_train["label"]
y_train = y_train.astype(int)
y_val = df_test["label"]
y_val = y_val.astype(int)

# shift the class labels from 1-9 to 0-8, since XGBoost expects labels starting at 0
y_train = y_train-1
y_val = y_val-1

In [95]:
# hp.uniform(label, low, high) samples a float uniformly between low and high
# hp.quniform(label, low, high, q) returns round(uniform(low, high) / q) * q,
# i.e. a float that is a multiple of q; for example (3, 15, 1) yields the
# whole numbers 3.0, 4.0, ..., 15.0, which must be cast to int before use
space = {
    'objective': 'multi:softmax',
    'n_estimators': hp.quniform('n_estimators', 300, 2000, 10),
    'random_state': RSEED,
    'disable_default_eval_metric': 1,
    'gpu_id': 0,
    'tree_method': 'gpu_hist',
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'min_child_weight': hp.quniform('min_child_weight', 0, 10, 1),
    'gamma': hp.uniform('gamma', 0, 10),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.9),
    'learning_rate': hp.uniform('learning_rate', 0.05, 5),
    'reg_lambda': hp.uniform('reg_lambda', 0, 5),
    'reg_alpha': hp.uniform('reg_alpha', 0, 5),
    'subsample': hp.uniform('subsample', 0, 1),
    'scale_pos_weight': 1
    }
print(hyperopt.pyll.stochastic.sample(space))

{'colsample_bytree': 0.6775849399231475, 'disable_default_eval_metric': 1, 'gamma': 0.49697921579185733, 'gpu_id': 0, 'learning_rate': 2.611585418376815, 'max_depth': 7.0, 'min_child_weight': 5.0, 'n_estimators': 1960.0, 'objective': 'multi:softmax', 'random_state': 42, 'reg_alpha': 1.8703402864833358, 'reg_lambda': 4.57938953369484, 'scale_pos_weight': 1, 'subsample': 0.0068336991126506375, 'tree_method': 'gpu_hist'}
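XGBoost documents `scale_pos_weight` as a way to balance positive and negative weights, i.e. for binary problems. Its default value is `1`, so this entry does not change the nine-class model. If class imbalance is a concern, `XGBClassifier.fit` accepts a per-row `sample_weight`. Below is a sketch of "balanced" weights, `n_samples / (n_classes * count_of_class)`, which is the formula scikit-learn uses for `compute_sample_weight("balanced")`. The helper name is hypothetical.

```python
import numpy as np


def balanced_sample_weight(y):
    """Per-row weight n_samples / (n_classes * count of the row's class)."""
    y = np.asarray(y)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    class_weight = len(y) / (len(classes) * counts)
    return class_weight[inverse]


y = [0, 0, 0, 1]
weights = balanced_sample_weight(y)
print(weights)  # class 0 rows get 2/3 each, the class 1 row gets 2.0
```

These weights could then be passed as `xgb.fit(X_train, y_train, sample_weight=balanced_sample_weight(y_train))`. Whether this improves the macro F1 here would have to be checked on the validation data.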


In [103]:
def objective(space):
    clf = XGBClassifier(
        objective=space['objective'],
        n_estimators=int(space['n_estimators']),
        random_state=space['random_state'],
        disable_default_eval_metric=space['disable_default_eval_metric'],
        gpu_id=space['gpu_id'],
        tree_method=space['tree_method'],
        max_depth=int(space['max_depth']),
        min_child_weight=int(space['min_child_weight']),
        gamma=space['gamma'],
        colsample_bytree=space['colsample_bytree'],
        learning_rate=space['learning_rate'],
        reg_lambda=space['reg_lambda'],
        reg_alpha=space['reg_alpha'],
        subsample=space['subsample'],
        scale_pos_weight=space['scale_pos_weight']
    )
    
    evaluation = [(X_train, y_train), (X_val, y_val)]
    
    clf.fit(
        X_train, y_train,
        eval_set=evaluation, 
        eval_metric="mlogloss",
        early_stopping_rounds=10,
        verbose=False
    )
    
    y_proba_val = clf.predict_proba(X_val)
    cross_entro = log_loss(y_val, y_proba_val)
    print("SCORE:", cross_entro)
    # log loss is already lower-is-better, so it is returned without negation
    return {'loss': cross_entro, 'status': STATUS_OK}
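Unlike the F1-based objective, this one returns the validation cross-entropy directly, because log loss is already lower-is-better. For class labels 0 to k-1, log loss is the mean negative log of the probability the model assigns to the true class. The NumPy sketch below implements that definition. The clipping avoids `log(0)`, but the epsilon value is an assumption and may differ from scikit-learn's.

```python
import numpy as np


def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean of -log p(true class); y_true holds column indices 0..k-1."""
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)
    proba = proba / proba.sum(axis=1, keepdims=True)  # keep each row summing to 1
    y_true = np.asarray(y_true)
    return float(-np.mean(np.log(proba[np.arange(len(y_true)), y_true])))


y_true = [0, 1]
proba = [[0.8, 0.2], [0.4, 0.6]]
print(round(multiclass_log_loss(y_true, proba), 4))  # 0.367
```

Each confidently wrong prediction contributes a large term, so log loss penalizes overconfidence that accuracy and F1 do not register.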

In [104]:
trials = Trials()

best_hyperparams = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=50,
    trials=trials
    )

SCORE:
1.1578421325857238
SCORE:
2.6522025289466558
SCORE:
2.122361275941684
SCORE:
1.3673499799864717
SCORE:
2.608584632778757
SCORE:
2.989292738391222
SCORE:
1.6694741971093336
SCORE:
2.3631067817824642
SCORE:
2.703266986526837
SCORE:
2.514105122596805
SCORE:
2.389772725876462
SCORE:
2.029575622242586
SCORE:
2.4036502573170764
SCORE:
2.093598382958912
SCORE:
3.2466177853586027
SCORE:
2.942461497758224
SCORE:
3.2389339874746343
SCORE:
1.3342373654342696
SCORE:
1.2362901904964068
SCORE:
1.0698207817539096
SCORE:
1.3910277226399894
SCORE:
1.2045662746075494
SCORE:
1.1019840084673516
SCORE:
1.9473524292697801
SCORE:
1.592393891273478
SCORE:
1.2560812975149873
SCORE:
1.7896765693599022
SCORE:
2.027268303037683
SCORE:
1.1611735158178584
SCORE:
1.4288960547769523
SCORE:
1.1666158155339645
SCORE:
1.0525662775290818
SCORE:
1.2417019841411423
SCORE:
1.7854699723036727
SCORE:
3.1569573249921623
SCORE:
1.527659825930947
SCORE:
1.1172209586177841
SCORE:
2.0731287476588225
SCORE:
1.331769448170479

In [105]:
print("The best hyperparameters are : ","\n")
print(best_hyperparams)

The best hyperparameters are :  

{'colsample_bytree': 0.8187178956154779, 'gamma': 8.20900749033383, 'learning_rate': 0.05546595192509067, 'max_depth': 9.0, 'min_child_weight': 5.0, 'n_estimators': 1350.0, 'reg_alpha': 1.5910514197684538, 'reg_lambda': 2.2355702263218005, 'subsample': 0.680482927094127}
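The tuned `reg_lambda`, `reg_alpha` and `gamma` all enter XGBoost's tree-building rule. XGBoost's "Introduction to Boosted Trees" tutorial gives the formulas, with G and H the sums of gradients and Hessians in a node. A leaf's weight is -G/(H + λ), and a split is only worthwhile if its gain minus γ is positive. We assume, as a simplification, that `reg_alpha` acts by soft-thresholding G by α. Below is a sketch of both formulas with made-up gradient statistics. The function names are our own.

```python
import math


def leaf_weight(G, H, reg_lambda, reg_alpha=0.0):
    """Leaf weight -T(G) / (H + lambda), where T soft-thresholds G by alpha."""
    g = math.copysign(max(abs(G) - reg_alpha, 0.0), G)
    return -g / (H + reg_lambda)


def split_gain(GL, HL, GR, HR, reg_lambda, gamma):
    """Gain of splitting a node into left/right children (alpha omitted)."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# made-up gradient statistics for one candidate split
for gamma in (2.64, 8.21):
    gain = split_gain(-4.0, 2.0, 4.0, 2.0, reg_lambda=2.0, gamma=gamma)
    print(f"gamma={gamma}: gain={gain:.2f}", "keep" if gain > 0 else "prune")
```

With these statistics, the larger γ found for `Train_Dataset3` prunes a split that the smaller γ found for `Train_Dataset4` would keep. Real gradient sums depend on the data, so this only illustrates the direction of the effect.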


In [106]:
# initialize an XGBClassifier with the best hyperparameters found above
xgb = XGBClassifier(
    objective='multi:softmax', 
    n_estimators=1350,
    random_state=RSEED,
    disable_default_eval_metric=1,
    gpu_id=0,
    tree_method='gpu_hist',
    max_depth=9,
    min_child_weight=5,
    gamma=8.20900749033383,
    colsample_bytree=0.8187178956154779,
    learning_rate=0.05546595192509067,
    reg_lambda=2.2355702263218005,
    reg_alpha=1.5910514197684538,
    subsample=0.680482927094127
    )
xgb.fit(X_train, y_train)

XGBClassifier(colsample_bytree=0.8187178956154779,
              disable_default_eval_metric=1, gamma=8.20900749033383, gpu_id=0,
              learning_rate=0.05546595192509067, max_depth=9,
              min_child_weight=5, n_estimators=1350, objective='multi:softprob',
              random_state=42, reg_alpha=1.5910514197684538,
              reg_lambda=2.2355702263218005, subsample=0.680482927094127,
              tree_method='gpu_hist')

In [107]:
# predict the class labels
y_pred_train = xgb.predict(X_train)
y_pred_val = xgb.predict(X_val)

# predict the probabilities for each class
y_proba_train = xgb.predict_proba(X_train)
y_proba_val = xgb.predict_proba(X_val)

print("---" * 12)
print(f"Accuracy on train data: {round(accuracy_score(y_train, y_pred_train), 3)}")
print(f"Accuracy on test data: {round(accuracy_score(y_val, y_pred_val), 3)}")
print("---" * 12)
print(f'F1-score on train data: {round(f1_score(y_train, y_pred_train, average="macro"), 3)}')
print(f'F1-score on test data: {round(f1_score(y_val, y_pred_val, average="macro"), 3)}')
print("---" * 12)
print(f"Cross-entropy on train data: {round(log_loss(y_train, y_proba_train), 3)}")
print(f"Cross-entropy on test data: {round(log_loss(y_val, y_proba_val), 3)}")
print("---" * 12)

------------------------------------
Accuracy on train data: 1.0
Accuracy on test data: 0.64
------------------------------------
F1-score on train data: 1.0
F1-score on test data: 0.579
------------------------------------
Cross-entropy on train data: 0.069
Cross-entropy on test data: 1.036
------------------------------------



------------------------------------

- Accuracy on train data: 1.0
- Accuracy on test data: 0.56

------------------------------------

- F1-score on train data: 1.0
- F1-score on test data: 0.557

------------------------------------

- Cross-entropy on train data: 0.05
- Cross-entropy on test data: 1.229

------------------------------------