Replace balanced_accuracy with macro-averaged recall from sklearn #108
Possibly not true after further validation. Closing this issue until we figure it out.

What's the definition of balanced accuracy? Is it 1 - balanced error rate? Then this should be true.
I'm reopening this issue now that I'm unsure again. The primary difference seems to be that our implementation of balanced accuracy also takes into account TNR, whereas other implementations only take into account TPR (recall). I don't quite understand the intuition in not including TNR in the multiclass case. I understand that in the binary classification case, TPR for class 0 is just the TNR for class 1, so averaging per-class TPR already captures TNR there.
In your formula, you compute TP / (TP + FN) for each class (which is recall) and then average over classes, right? That's what your code says, and that's what Wikipedia suggests, I think, though only for the two-class case (https://en.wikipedia.org/wiki/Accuracy_and_precision). Computing recall per class and averaging over all classes is macro-averaged recall. I'm not sure what you mean by not including TNR; your definition, as in the code above, doesn't include it. Do you have a reference for that being the semantics of balanced accuracy in the multi-class case? I don't think arguing about whether it is a good metric or should be changed is a good idea if you use a name that already has particular semantics. In the multi-class case those semantics don't seem that established, but it would be good to know what other people mean.
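To make the definition under discussion concrete, here is a minimal from-scratch sketch of macro-averaged recall (illustrative function name and toy labels, not TPOT's actual code): per-class recall TP / (TP + FN), averaged over the classes present in `y_true`.

```python
def macro_recall(y_true, y_pred):
    """Macro-averaged recall: per-class TP / (TP + FN), averaged
    over the classes that appear in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
# per-class recalls: class 0 -> 2/3, class 1 -> 2/2, class 2 -> 0/1
print(macro_recall(y_true, y_pred))  # (2/3 + 1 + 0) / 3 = 5/9
```

Note that TNR (true negatives) never enters this computation, which is the point being made above.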
See scikit-learn/scikit-learn#6747 (comment) for a detailed discussion of balanced accuracy. I think there is consensus to add this metric to sklearn now.

So the one you are using now is different from the one you posted above, right?

Yes, that's correct.
I went through this thread and the related sklearn thread, and it's not clear to me what the consensus is. Somebody asked me to use balanced accuracy from here,
@kegl I would have hoped you could tell us ;) There are multiple definitions of balanced accuracy, and one of them is
@kegl are you doing binary classification? Then it's pretty clear, and using the macro average should be fine. If it's multi-class, it's a bit less clear.

No, it's multiclass.

@kegl you may try the

OK, thanks!
@kegl the one in that toolkit is "adjusted for chance" though, and the one in TPOT is not. So that toolkit does macro-averaged recall, but adjusted for chance, while tpot.metrics does macro-averaged accuracy.
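For reference, the usual chance adjustment rescales the score so that random guessing lands at 0 and a perfect classifier at 1; the chance level for macro-averaged recall on a balanced problem is 1 / n_classes. A sketch of that rescaling (the formula is the common one, assumed here rather than quoted from the toolkit in question):

```python
def adjust_for_chance(score, n_classes):
    """Rescale a macro-averaged recall so that random guessing -> 0
    and perfect prediction -> 1. Chance level is 1 / n_classes."""
    chance = 1.0 / n_classes
    return (score - chance) / (1.0 - chance)

print(adjust_for_chance(1 / 3, 3))  # 0.0  (chance level on 3 classes)
print(adjust_for_chance(0.5, 3))    # 0.25
print(adjust_for_chance(1.0, 3))    # 1.0
```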
OK, so the TPOT version is exactly

No, the TPOT version is something else entirely. It's macro-averaged accuracy, not macro-averaged recall.

OK, got it, thanks.
Also adding `label_names` in `scores.classifier_base` so that `__call__` can use it without falling back to `y_true` or `y_pred`, which may not contain all the labels.
From conversations with @amueller, we discovered that "balanced accuracy" (as we've called it) is also known as "macro-averaged recall" as implemented in sklearn. As such, we don't need our own custom implementation of `balanced_accuracy` in TPOT. Let's refactor TPOT to replace `balanced_accuracy` with `recall_score`. The correct call is:

where `y_test` is `class` and `predictions` is `guess` in our case. Here's some code that compares the two and confirms that they're the same:
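The comparison snippet itself did not survive in this thread, so here is a hedged reconstruction of what it likely showed: a hand-rolled average of per-class recalls checked against sklearn's `recall_score(..., average='macro')`. The names `y_test` and `predictions` stand in for the `class` and `guess` arrays mentioned above, and the random data is made up for illustration.

```python
import numpy as np
from sklearn.metrics import recall_score

rng = np.random.RandomState(0)
y_test = rng.randint(0, 3, size=200)        # stand-in for `class`
predictions = rng.randint(0, 3, size=200)   # stand-in for `guess`

# "Balanced accuracy" computed as the average of per-class recalls.
per_class_recalls = [
    np.mean(predictions[y_test == c] == c) for c in np.unique(y_test)
]
balanced_accuracy = np.mean(per_class_recalls)

# Macro-averaged recall straight from sklearn.
macro_recall = recall_score(y_test, predictions, average="macro")

print(balanced_accuracy, macro_recall)  # the two values match
```

The call referred to above is then simply `recall_score(y_test, predictions, average='macro')`.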