Accuracy scorer for BERT-based classifier #28
Comments
So if we return the F1, Precision, and Recall for each category, then we would have to strip off the "macro" labels, since "macro" refers to the averaging strategy used to get a "total score" for a multilabel model (have a look at the …). While we can return the individual F1, precision, and recall scores for each category, I think providing the macro-averaged version is enough (we can always add things later). As I said in #29, eventually we'd like to support multilabel, but for now, since the … I have a feeling I'm not being totally clear, so please feel free to ask more.
I suggest getting both micro- and macro-averaged values.
Hey guys, I quote an answer from StackExchange: "In a multi-class setting micro-averaged precision and recall are always the same," which means micro F1 = micro precision = micro recall, which I believe equals accuracy.
UPDATE: Here's one directly from sklearn's documentation: "Micro average (averaging the total true positives, false negatives and false positives) is only shown for multi-label or multi-class with a subset of classes, because it corresponds to accuracy otherwise."
We're not doing multilabel classification, so why should we add micro values?
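For anyone who wants to verify the quoted claim, here is a tiny self-contained check (toy labels, not project data) showing that sklearn's micro-averaged precision, recall, and F1 all reduce to plain accuracy for single-label multi-class predictions:

```python
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

y_true = [0, 1, 2, 2, 1, 0, 2]  # toy gold labels, purely illustrative
y_pred = [0, 2, 2, 2, 1, 0, 1]  # toy predictions

# average="micro" pools true positives / false positives / false negatives
# across classes before computing the metrics.
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
acc = accuracy_score(y_true, y_pred)

print(p, r, f1, acc)  # all four values are identical
```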
Also, I just noticed that what Alyssa actually implemented is not micro-averaged metrics, but metrics for each class separately. I definitely understand the confusion from Alyssa as someone who's not doing machine learning all the time. Is that what you had in mind @monajalal?
Another note to @lentil-soup: the code right now doesn't work (it has a reference to self in a staticmethod), and even if it did, it would break other parts of the codebase. I want to set up continuous integration to make sure that we don't introduce breaking changes to the codebase, and I'm reverting this commit until we decide that we want per-class metrics.
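For context, the self-in-a-staticmethod problem looks roughly like this (a hypothetical sketch with made-up names, not the actual commit):

```python
class Scorer:
    @staticmethod
    def macro_scores(y_true, y_pred):
        # A @staticmethod receives no `self`, so this fails at call time
        # with NameError: name 'self' is not defined.
        return self._compute(y_true, y_pred)

# Either drop @staticmethod and accept self, or keep the method static and
# pass everything it needs in as arguments.
```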
Also, thanks @lentil-soup for pointing out that I forgot to pass in the labels to …
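If the missing argument is sklearn's `labels` parameter (an assumption on my part, not confirmed by the thread), the point matters because classes that happen to be absent from a small validation batch would otherwise be dropped from the macro average:

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 1, 0]  # toy gold labels; class 2 never appears in this batch
y_pred = [0, 0, 1, 1]

# With labels=None, sklearn averages only over classes seen in y_true/y_pred,
# so class 2 would silently vanish from the macro average. Passing labels
# explicitly keeps every expected class in the denominator.
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], average="macro", zero_division=0
)
```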
Given a validation set and a fine-tuned BERT-based (binary) classifier, return the following:
{
"macro_f1": float,
"macro_precision": float,
"macro_recall": float,
"accuracy": float
}
May need to do this for each category if we're doing one binary classifier per category.
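A minimal sketch of what such a scorer could look like, assuming plain integer label arrays and sklearn metrics (an illustration, not necessarily the project's final implementation):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def score_classifier(y_true, y_pred):
    """Return macro-averaged F1/precision/recall plus accuracy as a dict."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "macro_f1": float(f1),
        "macro_precision": float(precision),
        "macro_recall": float(recall),
        "accuracy": float(accuracy_score(y_true, y_pred)),
    }

# Usage: given validation labels and the classifier's predicted labels,
# scores = score_classifier(val_labels, val_predictions)
# If we train one binary classifier per category, this would be called once
# per category on that category's validation split.
```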