
logloss issue... #90

Closed
caprone opened this issue Nov 1, 2018 · 10 comments

@caprone

caprone commented Nov 1, 2018

Hi HunterMcGushion!

When I use 'log_loss' as the metric in Environment, is there a way to tell the optimizer to predict probabilities? It seems not... (maybe through "model_extra_params"??).

For example, even if I set 'logloss' as the metric in xgb, all of the optimizer's predictions are probably binary values, so the logloss scores become totally bogus.

thanks

@HunterMcGushion
Owner

Hi, @caprone, thanks for opening this issue! Have you tried providing the do_predict_proba kwarg on initialization of :class:environment.Environment? If that doesn't resolve it, can you provide a minimal code example to recreate the problem?
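For example, a minimal sketch of passing that kwarg (the dataset and the other Environment arguments here are only illustrative placeholders, not a full working setup):

```python
import pandas as pd
from hyperparameter_hunter import Environment
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss

# Build a small toy dataset with a "target" column (placeholder data)
x, y = make_classification(n_samples=100, n_features=10, random_state=32)
train_df = pd.DataFrame(x, columns=[str(i) for i in range(x.shape[1])])
train_df["target"] = y

env = Environment(
    train_dataset=train_df,
    target_column="target",
    root_results_path="HyperparameterHunterAssets",
    metrics_map=dict(log_loss=lambda t, p: -log_loss(t, p)),
    do_predict_proba=True,  # ask models to predict probabilities via predict_proba
    cross_validation_type="KFold",
    cross_validation_params=dict(n_splits=5, shuffle=True, random_state=32),
)
```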

@caprone
Author

caprone commented Nov 2, 2018

Hi @HunterMcGushion, thanks for the answer.

Unfortunately this doesn't resolve the problem. In this toy script -- binary classification -- we compare the same algorithm (RF), but the scores come out at a completely different size/scale (the same issue occurs with other algorithms, like xgbclassifier, ...).

EDIT: maybe the issue is that we can't specify the positive class in Environment...?
```python
import os
os.environ['KERAS_BACKEND'] = 'tensorflow'

from hyperparameter_hunter import Environment, Integer, Real, ExtraTreesOptimization
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from pandas import DataFrame

x, y = make_classification(n_samples=100, n_features=10)
x_train1 = DataFrame(dict(x=x[:, 0], y=x[:, 1], target=y))
x_train1.info()

hunter_path = '../HyperparameterHunterAssets'


def execute():
    env = Environment(
        train_dataset=x_train1,
        target_column="target",
        root_results_path=hunter_path,
        # metrics_map=["log_loss"],
        do_predict_proba=True,
        metrics_map=dict(log_loss=lambda t, p: -log_loss(t, p)),
        cross_validation_type='KFold',
        cross_validation_params=dict(n_splits=5, shuffle=True, random_state=32),
        runs=1
    )

    optimizer = ExtraTreesOptimization(iterations=10, read_experiments=True, random_state=None)

    optimizer.set_experiment_guidelines(
        model_initializer=RandomForestClassifier,
        model_init_params=dict(
            n_estimators=10,
            n_jobs=6,
            max_depth=Integer(4, 6)
        ),
    )

    optimizer.go()

    print()
    print()
    print("start second randomForest model with TRUE logloss values predicted:")

    label = x_train1['target'].values
    x_train = x_train1.drop('target', axis=1)
    X_train, X_test, y_train, y_test = train_test_split(x_train, label, test_size=0.30, random_state=0)

    def randomForest():
        clf = RandomForestClassifier(max_depth=4, n_estimators=10)
        clf.fit(X_train, y_train)
        y_pred = clf.predict_proba(X_test)
        # here we select the "1" probability column as the positive class -- >>>> probably the issue here??
        y_pred = y_pred[:, 1]

        print("logloss score is: {}".format(log_loss(y_test, y_pred)))

    randomForest()


if __name__ == '__main__':
    execute()
```

@HunterMcGushion
Owner

Sorry for the delayed response, @caprone!

In the provided example, the comparison being made isn’t valid: HyperparameterHunter automatically performs KFold cross-validation with the given parameters to produce the log_loss score, while the classifier (clf) trained without HyperparameterHunter doesn’t go through anything like the fitting process specified in env. We can’t really expect the results of those two tests to be similar, much less identical, since HyperparameterHunter is told to evaluate with 5 KFold splits, whereas clf is given a train_test_split of 0.30. Additionally, the random seeds differ, which pushes the results even further apart.

I’ve attempted to make a closer comparison; however, there is a lot that HyperparameterHunter handles behind the scenes, so we can’t make a true comparison. That said, I’ve modified your script and come up with the following to replicate a little bit of HyperparameterHunter’s core functionality:

```python
from hyperparameter_hunter import Environment, CrossValidationExperiment

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold
from sklearn.datasets import make_classification

CV_PARAMS = dict(n_splits=3, shuffle=True, random_state=32)
MODEL_INIT_PARAMS = dict(n_estimators=10, max_depth=4, random_state=32)

INPUT, TARGET = make_classification(n_samples=100, n_features=10, n_classes=2, random_state=32)
TRAIN_DF = pd.DataFrame(data=INPUT, columns=range(INPUT.shape[1]))
TRAIN_DF["target"] = TARGET


def run_hyperparameter_hunter():
    env = Environment(
        train_dataset=TRAIN_DF.copy(),
        root_results_path="HyperparameterHunterAssets",
        do_predict_proba=False,
        metrics_map=dict(log_loss=lambda t, p: -log_loss(t, p)),
        cross_validation_type="KFold",
        cross_validation_params=CV_PARAMS,
    )

    experiment = CrossValidationExperiment(
        model_initializer=RandomForestClassifier,
        model_init_params=MODEL_INIT_PARAMS
    )

    return experiment


def run_normal(random_seeds):
    #################### Result Placeholders ####################
    oof_predictions = np.zeros_like(TARGET)
    # Use float dtype so probability values aren't truncated to integers
    oof_predictions_proba_0 = np.zeros_like(TARGET, dtype=float)
    oof_predictions_proba_1 = np.zeros_like(TARGET, dtype=float)
    oof_scores = []
    oof_scores_proba_0 = []
    oof_scores_proba_1 = []

    for fold, (train_index, validation_index) in enumerate(KFold(**CV_PARAMS).split(INPUT, TARGET)):
        np.random.seed(random_seeds[fold][0])

        #################### Split Data ####################
        train_input, validation_input = INPUT[train_index], INPUT[validation_index]
        train_target, validation_target = TARGET[train_index], TARGET[validation_index]

        #################### Fit Classifier ####################
        classifier = RandomForestClassifier(
            **dict(MODEL_INIT_PARAMS, **dict(random_state=random_seeds[fold][0]))
        )
        classifier.fit(train_input, train_target)

        #################### Make Predictions ####################
        validation_predictions = classifier.predict(validation_input)
        validation_predictions_proba = classifier.predict_proba(validation_input)

        #################### Calculate Score ####################
        validation_score = -log_loss(validation_target, validation_predictions)
        validation_score_proba_0 = -log_loss(validation_target, validation_predictions_proba[:, 0])
        validation_score_proba_1 = -log_loss(validation_target, validation_predictions_proba[:, 1])

        #################### Collect Results ####################
        oof_scores.append(validation_score)
        oof_scores_proba_0.append(validation_score_proba_0)
        oof_scores_proba_1.append(validation_score_proba_1)

        oof_predictions[validation_index] = validation_predictions
        oof_predictions_proba_0[validation_index] = validation_predictions_proba[:, 0]
        oof_predictions_proba_1[validation_index] = validation_predictions_proba[:, 1]

        print(" - F{}:     {}     {}     {}".format(
            fold, validation_score, validation_score_proba_0, validation_score_proba_1
        ))

    print("FINAL:     {}     {}     {}".format(
        np.average(oof_scores), np.average(oof_scores_proba_0), np.average(oof_scores_proba_1)
    ))


def execute():
    exp = run_hyperparameter_hunter()
    print("#" * 80)
    run_normal(exp.experiment_params["random_seeds"][0])


if __name__ == "__main__":
    execute()
```

If we run the above script as-is, with Environment.do_predict_proba=False, the HyperparameterHunter experiment produces validation negative log_loss scores of [-3.04754, -5.23322, -2.09326] over three folds, with a final score of -3.45390.

  • These scores match the scores produced by run_normal in its validation_score and oof_scores variables, which were calculated using classifier.predict (which reflects the do_predict_proba=False given earlier to Environment)

If, instead, we slightly modify the script and run it with Environment.do_predict_proba=True (the one-line change is sketched below), the HyperparameterHunter experiment produces validation negative log_loss scores of [-8.18999, -4.69224, -3.89328] over three folds, with a final score of -5.61782.

  • These scores match the scores produced by run_normal in its validation_score_proba_0 and oof_scores_proba_0 variables, which were calculated using classifier.predict_proba(…)[:, 0] (which reflects the do_predict_proba=True given earlier to Environment)
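For clarity, the modification mentioned above is just flipping do_predict_proba inside run_hyperparameter_hunter; a sketch of the changed Environment call (everything else in the script stays the same):

```python
# Inside run_hyperparameter_hunter(), only this argument changes:
env = Environment(
    train_dataset=TRAIN_DF.copy(),
    root_results_path="HyperparameterHunterAssets",
    do_predict_proba=True,  # was False in the first run
    metrics_map=dict(log_loss=lambda t, p: -log_loss(t, p)),
    cross_validation_type="KFold",
    cross_validation_params=CV_PARAMS,
)
```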

TL;DR:
So if I’m understanding your question correctly, there seem to be two problems:

  1. The original comparison in your first example script doesn’t evaluate the classifiers in the same way, so we can’t expect similar results, and
  2. Environment.do_predict_proba should be set according to however you would predict your values if you weren’t using HyperparameterHunter, as can be seen by the difference between the two script executions above

I hope this clears things up for you, but please let me know if you have any other questions, and thanks again for opening this issue!

@caprone
Author

caprone commented Nov 7, 2018

Hi @HunterMcGushion!
Your comparison is very helpful, THANKS!

@HunterMcGushion
"""Environment.do_predict_proba should be set according to however you would predict your values if you weren’t using HyperparameterHunter"""

Yes, so in the end the problem is that with "Environment.do_predict_proba"
it is not possible to set the "positive" class (for example 1 and not zero), so in binary classification
Hunter automatically computes the probability of the zero class (right?).

@HunterMcGushion
Owner

Glad to hear it!

You are correct. If a model's predictions are not one-dimensional, the default is to use the column at index 0. This takes place in hyperparameter_hunter.models.Model.predict:

```python
def predict(self, input_data):
    """Generate model predictions for `input_data`

    Parameters
    ----------
    input_data: Array-like
        Data containing the same number of features as were trained on, for which the model will
        predict output values"""
    if input_data is None:
        return None

    try:
        if self.do_predict_proba is True:
            prediction = self.model.predict_proba(input_data)
        else:
            prediction = self.model.predict(input_data)
    except Exception as _ex:
        raise _ex

    with suppress(IndexError):
        prediction = prediction[:, 0]

    return prediction
```

Specifically, I think you'll be interested in line 194 (the `prediction = prediction[:, 0]` inside the `suppress(IndexError)` block).

Do you think it would be helpful to be able to specify the column index selected when using do_predict_proba=True?

@caprone
Author

caprone commented Nov 8, 2018

Ok @HunterMcGushion, perfect!

Yes, I think it would be very helpful to be able to specify the column index in "proba_predictions", because algorithms often return the positive class's probability (usually 1 for binary classification) in the second column of the probability matrix.
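For anyone following along: the columns of predict_proba are ordered according to the classifier's classes_ attribute, which is why the second column is the positive class for 0/1 labels. A quick toy illustration (unrelated to the scripts above):

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data only -- columns of predict_proba follow the order of `classes_`
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit([[0], [1], [2], [3]], [0, 0, 1, 1])

print(clf.classes_)                      # [0 1] -- column order of predict_proba
print(clf.predict_proba([[2.5]]))        # [[P(class 0), P(class 1)]]
print(clf.predict_proba([[2.5]])[:, 1])  # probability of the positive class (1)
```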

thanks again!!

@HunterMcGushion
Owner

Great point! I'm thinking the easiest way to get this done would be to allow the do_predict_proba parameter to be an integer, as well as a boolean.

If it's a boolean and False, then the model's predict method is called (default). If it's a boolean and True, the model's predict_proba method is called and the default index of 0 is used to select the column.

The new part is that if do_predict_proba is an integer, it is interpreted as if it were True, but the integer passed is used as the column index. So in your example, you would pass do_predict_proba=1 to get the second column.
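A rough sketch of how that dispatch could look inside the predict method quoted earlier (an illustration of the idea only, not the actual implementation that ended up in the library):

```python
def predict(self, input_data):
    """Illustrative sketch of a bool-or-int `do_predict_proba` -- not library code"""
    if input_data is None:
        return None

    if self.do_predict_proba is False:
        # Default behavior: plain class predictions
        return self.model.predict(input_data)

    prediction = self.model.predict_proba(input_data)
    # `True` keeps the old default of column 0; an int selects that column instead
    column = 0 if self.do_predict_proba is True else self.do_predict_proba
    return prediction[:, column]
```

With that interpretation, passing do_predict_proba=1 would select the second predict_proba column, which is what you're after here.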

@HunterMcGushion
Owner

HunterMcGushion commented Nov 8, 2018

Although, there may be a bit of confusion since it is popular to pass truthy or falsey values, like 0 and 1, in place of actual booleans, which could produce unexpected results in this case.
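To illustrate with a hypothetical helper (not library code): someone passing 0 intending it as "False" would silently get predict_proba column 0 instead, unless booleans are distinguished from plain ints:

```python
def interpret_do_predict_proba(value):
    """Hypothetical helper showing why the bool check must come before the int check"""
    # `True` and `False` are also instances of `int` in Python, so check `bool` first
    if isinstance(value, bool):
        return ("predict_proba", 0) if value else ("predict", None)
    if isinstance(value, int):
        return ("predict_proba", value)
    raise TypeError("do_predict_proba must be a bool or an int")


print(interpret_do_predict_proba(False))  # ('predict', None)
print(interpret_do_predict_proba(1))      # ('predict_proba', 1)
print(interpret_do_predict_proba(0))      # ('predict_proba', 0) -- surprising if 0 was meant as False
```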

@caprone, could you check out #95 to see if it does what you need?

@caprone
Author

caprone commented Nov 9, 2018

Hi @HunterMcGushion!!
For me your solution is very effective, great!!

From the doc:
"""
do_predict_proba: Boolean, or int, default=False
* If False, :meth:.models.Model.fit will call :meth:models.Model.model.predict
* If True, it will call :meth:models.Model.model.predict_proba, and the values in the
first column (index 0) will be used as the actual prediction values
* If do_predict_proba is an int, :meth:.models.Model.fit will call
:meth:models.Model.model.predict_proba, as is the case when do_predict_proba is
True, but the int supplied as do_predict_proba declares the column index to use as
the actual prediction value
"""
I also agree that:
@HunterMcGushion
""" there may be a bit of confusion since it is popular to pass truthy or falsey values,
like 0 and 1, in place of actual booleans"""

especially if someone does not read the documentation or the comments;

One solution could be to make "model.do_predict_proba" boolean-only and add a new integer class attribute that specifies the column index of the positive class, e.g. self.model.do_pos_class=1;

however, I think your solution is very useful!!
thanks again!!

@HunterMcGushion
Owner

That's a great idea. For now, I'd prefer not to add too many more parameters, but if it looks like others are having problems, we should revisit your idea! Let me know if the problems persist after merging, and thanks again for opening the issue!
