
Replace balanced_accuracy with macro-averaged recall from sklearn #108

Closed
rhiever opened this issue Mar 8, 2016 · 18 comments

Comments

@rhiever
Contributor

rhiever commented Mar 8, 2016

From conversations with @amueller, we discovered that "balanced accuracy" (as we've called it) is also known as "macro-averaged recall", which is already implemented in sklearn. As such, we don't need our own custom implementation of balanced_accuracy in TPOT. Let's refactor TPOT to replace balanced_accuracy with recall_score.

The correct call is:

recall_score(y_test, predictions, average='macro')

where y_test is class and predictions is guess in our case.

Here's some code that compares the two and confirms that they're the same:

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer sklearn versions
from sklearn.metrics import recall_score
import numpy as np
import pandas as pd

digits = load_digits(10)
features, labels = digits['data'], digits['target']

X_train, X_test, y_train, y_test = train_test_split(features, labels, train_size=0.75, test_size=0.25)

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
clf.fit(X_train, y_train)

# Per-class recall (TP / (TP + FN)), averaged with equal weight per class.
def balanced_accuracy(result):
    all_classes = list(set(result['class'].values))
    all_class_accuracies = []
    for this_class in all_classes:
        this_class_accuracy = len(result[(result['guess'] == this_class) & (result['class'] == this_class)])\
            / float(len(result[result['class'] == this_class]))
        all_class_accuracies.append(this_class_accuracy)

    balanced_accuracy = np.mean(all_class_accuracies)

    return balanced_accuracy

predictions = clf.predict(X_test)

print('Macro-averaged recall:\t', recall_score(y_test, predictions, average='macro'))

data = pd.DataFrame({'class': y_test,
                     'guess': predictions})

print('Balanced accuracy:\t', balanced_accuracy(data))
rhiever changed the title from "Replaced balanced_accuracy with macro-averaged recall from sklearn" to "Replace balanced_accuracy with macro-averaged recall from sklearn" Mar 8, 2016
@rhiever
Contributor Author

rhiever commented Mar 8, 2016

Possibly not true after further validation. Closing this issue until we figure it out.

@rhiever rhiever closed this as completed Mar 8, 2016
@amueller

amueller commented May 2, 2016

What's the definition of balanced accuracy? Is it 1 - balanced error rate? Then this should be true.
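Assuming "balanced error rate" means the mean of the per-class error rates, the identity is immediate; a quick numeric check with made-up per-class recalls (an illustration, not code from this issue):

import numpy as np

per_class_recall = np.array([0.8, 0.5, 0.9])   # made-up per-class recalls
per_class_error = 1.0 - per_class_recall       # per-class error rates
ber = per_class_error.mean()                   # balanced error rate = mean per-class error

# 1 - BER equals the mean per-class recall, i.e. macro-averaged recall.
print(1.0 - ber)                  # 0.7333...
print(per_class_recall.mean())    # 0.7333... -- same number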

@rhiever
Contributor Author

rhiever commented Jun 9, 2016

I'm reopening this issue now that I'm unsure again. The primary difference seems to be that our implementation of balanced accuracy also takes into account TNR, whereas other implementations only take into account TPR (recall).

I don't quite understand the intuition behind not including TNR in the multiclass case. I understand that in the binary classification case, TPR for class 0 = TNR for class 1. In the multiclass case that becomes muddled: TPR for class 0 = TNR for all other classes.
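To make that concrete, here is a small sketch (made-up confusion matrices, not TPOT code) of one-vs-rest TPR and TNR computed from a confusion matrix. In the binary case the TNR of class 1 is exactly the TPR of class 0, but with three classes the one-vs-rest TNR pools the rows of several other classes, so the per-class identity breaks down:

import numpy as np

def tpr(cm, k):
    # Recall for class k: true k's predicted as k, over all true k's.
    return cm[k, k] / cm[k, :].sum()

def tnr(cm, k):
    # One-vs-rest specificity for class k: among samples whose true class
    # is not k, the fraction that were not predicted as k.
    mask = np.arange(cm.shape[0]) != k
    negatives = cm[mask, :]
    return negatives[:, mask].sum() / negatives.sum()

cm_binary = np.array([[8, 2],
                      [3, 7]])
print(tpr(cm_binary, 0), tnr(cm_binary, 1))   # 0.8 and 0.8 -- identical

cm_multi = np.array([[5, 1, 0],
                     [2, 6, 1],
                     [0, 2, 7]])
print(tpr(cm_multi, 0), tnr(cm_multi, 1))     # 0.833... vs 0.8 -- no longer equal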

@amueller

amueller commented Jun 9, 2016

In your formula, len(result[result['class'] == this_class]) is just np.sum(result['class'] == this_class), right?

So you compute for each class

TP / (TP + FN)

which is recall.

And then average over classes, right? That's what your code says and that's what Wikipedia suggests, I think (though only for the two-class case: https://en.wikipedia.org/wiki/Accuracy_and_precision).

Computing recall and averaging over all classes is macro average recall.
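For concreteness, a quick check of that equivalence with made-up labels (not data from this issue):

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 2])

# Per-class recall (TP / (TP + FN)), then an unweighted mean over classes.
per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]

print(np.mean(per_class))                             # 0.6388...
print(recall_score(y_true, y_pred, average='macro'))  # same value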

I'm not sure what you mean by not including TNR. Your definition, as in the code above, doesn't include it, right? Do you have a reference for that being the semantics of balanced accuracy in the multi-class case?

I don't think arguing about whether it is a good metric or whether it should be changed is a good idea if you use a name that already has particular semantics. In the multi-class case those semantics don't seem to be well established, but it would be good to know what other people mean by the term.

@rhiever rhiever closed this as completed Jun 17, 2016
@rhiever
Contributor Author

rhiever commented Jun 17, 2016

See scikit-learn/scikit-learn#6747 (comment) for a detailed discussion of balanced accuracy. I think there is consensus to add this metric to sklearn now.

@amueller

So the one you are using now is different from the one you posted above, right?
I wouldn't say there is consensus, but we can discuss it there.

@rhiever
Contributor Author

rhiever commented Jun 17, 2016

so the one you are using now is different from the one you posted above, right?

Yes, that's correct.

@kegl

kegl commented Jul 21, 2017

I went through this thread and the related sklearn thread, and it's not clear to me what the consensus is. Somebody asked me to use the balanced accuracy from here: scoring_program/libscores.py, line 187. Should I clean this up, or can I use recall_score(..., average='macro') from sklearn?

@amueller

@kegl I would have hoped you could tell us ;) There are multiple definitions of balanced accuracy: one of them is recall_score(..., average='macro') and another is something different; see scikit-learn/scikit-learn#8066.
It looks like https://github.com/ch-imad/AutoMl_Challenge/blob/2353ec0/Starting_kit/scoring_program/libscores.py#L187 implements recall_score(..., average='macro'); see @jnothman's comment. Whoever told you to use this metric should have given you a paper reference or used a more specific name ;)

@amueller

@kegl are you doing binary classification? If so, it's pretty clear and using the macro average should be fine. If it's multi-class, it's a bit less clear.

@kegl

kegl commented Jul 21, 2017

No, it's multiclass.

@weixuanfu
Contributor

weixuanfu commented Jul 21, 2017

@kegl you may try the balanced_accuracy in tpot.metrics

@kegl

kegl commented Jul 21, 2017

OK, thanks!

@amueller

@kegl the one in that toolkit is "adjusted for chance", though, and the one in TPOT is not. So that toolkit does macro-averaged recall, but adjusted for chance.
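For reference, "adjusted for chance" usually means rescaling so that a chance-level classifier scores 0 while a perfect one still scores 1 (this is also what sklearn's later balanced_accuracy_score(..., adjusted=True) does); whether the AutoML toolkit uses exactly this form is an assumption here:

# Hedged sketch of a chance adjustment for a macro-averaged recall score.
def adjust_for_chance(macro_recall, n_classes):
    chance = 1.0 / n_classes   # expected score of a random or constant classifier
    return (macro_recall - chance) / (1.0 - chance)

print(adjust_for_chance(0.64, n_classes=3))   # 0.64 raw -> 0.46 adjusted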

@amueller

Meanwhile, tpot.metrics does macro-averaged accuracy.

@kegl

kegl commented Aug 28, 2017

OK, so the TPOT version is exactly sklearn.metrics.recall_score(y_true, y_pred, average='macro'), and the AutoML score adjusts this by https://github.com/ch-imad/AutoMl_Challenge/blob/2353ec0/Starting_kit/scoring_program/libscores.py#L210, right?

@amueller

No, the TPOT version is something else entirely. It's macro-average accuracy, not macro-average recall.
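To spell out the distinction, here is a hedged sketch, assuming "macro-average accuracy" means averaging the per-class one-vs-rest accuracy, (TP + TN) / N, which is where the TNR discussed earlier enters. This is an illustration with made-up labels, not a copy of TPOT's actual implementation:

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 2])
classes = np.unique(y_true)

# Macro-averaged recall: per-class TP / (TP + FN), averaged over classes.
macro_recall = recall_score(y_true, y_pred, average='macro')

# Assumed "macro-averaged accuracy": per-class one-vs-rest accuracy
# (TP + TN) / N, averaged over classes -- true negatives count here.
macro_accuracy = np.mean([np.mean((y_true == c) == (y_pred == c)) for c in classes])

print(macro_recall, macro_accuracy)   # 0.6388... vs 0.7777... -- different metrics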

@kegl

kegl commented Aug 28, 2017

OK, got it, thanks.

kegl added a commit to paris-saclay-cds/ramp-workflow that referenced this issue Aug 28, 2017
Also adding label_names in `scores.classifier_base` so __call__ can use it without falling back to `y_true` or `y_pred`, which may not contain all the labels.