Accuracy scorer for BERT-based classifier #28

Closed
asmithh opened this issue Jun 13, 2020 · 5 comments · Fixed by #139

asmithh (Collaborator) commented Jun 13, 2020

Given a validation set and a fine-tuned BERT-based (binary) classifier, return the following:
{
"macro_f1": float,
"macro_precision": float,
"macro_recall": float,
"accuracy": float
}

May need to do this for each category if we're doing one binary classifier per category.
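
A minimal sketch of what such a scorer could look like, assuming sklearn is available and that the predictions have already been collected into flat label arrays (the function name and arguments here are hypothetical):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def score_predictions(y_true, y_pred):
    # Macro-average the per-class precision/recall/F1, then add plain accuracy.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "macro_f1": float(f1),
        "macro_precision": float(precision),
        "macro_recall": float(recall),
        "accuracy": float(accuracy_score(y_true, y_pred)),
    }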

asmithh added the help wanted, supervised, and feature labels on Jun 13, 2020
dnaaun (Owner) commented Jun 13, 2020

So if we return the F1, precision, and recall for each category, then we would have to strip off the "macro" labels, as "macro" refers to the averaging strategy one uses to get a "total score" for a multilabel model (have a look at the average argument's doc in this sklearn function to calculate F1).

While we can return the individual F1, precision, and recall scores for each category, I think providing the macro-averaged version is enough (we can always add things later).

As I said in #29, eventually we'd like to support multilabel, but for now, since the run_glue.py script is doing multiclass, it's fine if you just get a multiclass classifier working. And in that case, it turns out that all of the above stats (macro_f1, macro_precision, macro_recall, accuracy) will have the same value. Just a heads up, since that surprised me the first time I ran into it.

I have a feeling I'm not being totally clear, so please feel free to ask more.
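
For concreteness, a small sketch (not part of the original comment) of the difference between per-class scores and the macro average, using sklearn's average argument:

from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Per-class F1: one score per label, no averaging.
per_class_f1 = f1_score(y_true, y_pred, average=None)

# Macro F1: the unweighted mean of the per-class scores, i.e. a single "total score".
macro_f1 = f1_score(y_true, y_pred, average="macro")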

monajalal (Collaborator) commented:

I suggest getting both micro- and macro-accuracy values.

dnaaun (Owner) commented Jul 4, 2020

Hey guys, I quote an answer from StackExchange: "In a multi-class setting micro-averaged precision and recall are always the same," which means micro F1 = micro precision = micro recall, which I believe is also equal to accuracy.

UPDATE: Here's one directly from Sklearn's documentation: "Micro average (averaging the total true positives, false negatives and false positives) is only shown for multi-label or multi-class with a subset of classes, because it corresponds to accuracy otherwise"

We're not doing multilabel classification, so why should we add micro values?
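
To illustrate that point (a quick sanity-check sketch, not from the original discussion): in a plain multiclass setting where every sample has exactly one label and all classes are included, the micro-averaged scores collapse to accuracy.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc = accuracy_score(y_true, y_pred)
micro_p = precision_score(y_true, y_pred, average="micro")
micro_r = recall_score(y_true, y_pred, average="micro")
micro_f1 = f1_score(y_true, y_pred, average="micro")

# All four print the same value (0.666...), matching the quoted documentation.
print(acc, micro_p, micro_r, micro_f1)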

dnaaun (Owner) commented Jul 4, 2020

Also, I just noticed that what Alyssa actually implemented is not micro-averaged metrics, but metrics for each class separately. I definitely understand Alyssa's confusion, as someone who isn't doing machine learning all the time.

Is that what you had in mind @monajalal ?

Another note to @lentil-soup is that the code right now doesn't work (it has a reference to self in a staticmethod), and even if it did, it would break other parts of the codebase.

I want to set up continuous integration to make sure that we don't introduce breaking changes to the codebase, and I'm reverting this commit until we decide that we want per-class metrics.
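
For readers who haven't hit this before, here is a hypothetical illustration (not the actual code) of the kind of bug described above, and one way to fix it:

class Scorer:
    labels = ["0", "1"]

    @staticmethod
    def broken_score(y_true, y_pred):
        # Broken: a staticmethod receives no `self`, so this raises NameError at call time.
        return self.labels

    @classmethod
    def fixed_score(cls, y_true, y_pred):
        # One fix: use a classmethod (or an instance method) to reach class-level state.
        return cls.labels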

dnaaun (Owner) commented Jul 4, 2020

Also, thanks @lentil-soup for pointing out that I forgot to pass in the labels to classification_report().
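
For reference, the labels argument in question looks like this in sklearn (a minimal sketch; the actual call site isn't shown in this thread):

from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

# Passing labels= makes the report cover every expected class (in a fixed order),
# even if a class never shows up in y_true or y_pred.
print(classification_report(y_true, y_pred, labels=[0, 1, 2], zero_division=0))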
