Update classifier accuracy/CM stats at times besides successful training #429

Open
StephenChan opened this issue Dec 6, 2021 · 0 comments

A source's classifier accuracy stats and confusion matrix data only refresh when a new classifier is trained and accepted, so they can become outdated. This was mentioned in issue #159 (this comment) and also in issue #150 ('Miscellaneous' point in this comment), but it should probably get an issue of its own.

Examples of situations where this leads to the data being outdated in some way:

  1. More images have been annotated since the last classifier acceptance. It's possible to reject a few classifiers in a row due to lack of improvement, so the current annotated-image count may get much higher than the classifier stats' image count.

  2. The labelset has been changed, and the previous classifiers have been wiped, but no new classifier has been trained yet. In this case, the stats will still refer to the latest accepted classifier among the ones that were wiped. If a label was removed from the labelset, this can lead to the following server error (as seen today for example):

    DoesNotExist at /source/<id>/backend/
    LocalLabel matching query does not exist.
    
  3. There's also the case where existing annotations have been changed (rather than new ones added) since the last classifier acceptance, e.g. if the source owner decides that a bunch of points were mislabeled as A and should be B instead. That said, in this situation retraining generally doesn't get triggered either, so updating the classifier stats isn't the only issue.
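The server error in situation 2 happens because the stats page looks up labels that no longer exist in the source's labelset. A minimal sketch of one way to guard against that, using plain Python with illustrative names (`cm_rows`, `filter_stale_labels` are not CoralNet's actual identifiers):

```python
# Hypothetical guard: drop confusion-matrix rows whose label was removed
# from the labelset after the classifier was accepted, instead of letting
# the lookup raise DoesNotExist.

def filter_stale_labels(cm_rows, current_labelset_ids):
    """Keep only confusion-matrix rows whose label still exists
    in the source's current labelset."""
    return [row for row in cm_rows if row["label_id"] in current_labelset_ids]

cm_rows = [
    {"label_id": 1, "counts": [10, 2]},
    {"label_id": 2, "counts": [1, 15]},  # label 2 was later removed
]
# Labelset now contains labels 1 and 3, so label 2's row is dropped.
print(filter_stale_labels(cm_rows, {1, 3}))
```

A filter like this would only hide the crash, of course; the underlying fix is still to refresh or invalidate the stats when the labelset changes.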

Regarding implementation, I recall that the accuracy stats and confusion matrix data are pulled from valresults files or something similar, which are currently only generated when a new classifier is trained. So we may have to rework how this data is stored and retrieved (related: issue #288).
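If the confusion matrix itself were stored somewhere the app can reach (e.g. on a database record rather than only in valresults files), the accuracy stat could be recomputed on demand. A sketch of that derivation, with no assumptions about CoralNet's actual storage:

```python
# Overall accuracy derived from a square confusion matrix cm, where
# cm[actual][predicted] is a count: accuracy = diagonal sum / total.

def accuracy_from_cm(cm):
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0

# 10 + 15 correct out of 28 points total:
print(accuracy_from_cm([[10, 2], [1, 15]]))
```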

It may be desirable to keep historical accuracy/CM stats (particularly accuracy) from the first time a classifier was trained, e.g. to offer different options for graphing classifier improvement over time. That would involve adding a new field to the Classifier model.
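The historical-graphing idea could be as simple as persisting the accuracy value on each classifier at training time. A sketch using a plain dataclass as a stand-in (the real Classifier is a Django model, and these field names are illustrative):

```python
# Hypothetical sketch: store accuracy on each classifier when it's
# trained, so an accuracy-over-time series can be built later.
from dataclasses import dataclass
from datetime import date

@dataclass
class Classifier:
    trained_on: date
    accuracy: float  # proposed new stored field

history = [
    Classifier(date(2021, 1, 5), 0.62),
    Classifier(date(2021, 6, 20), 0.71),
    Classifier(date(2021, 12, 6), 0.78),
]

# (date, accuracy) pairs ready to feed a plotting library:
series = [(c.trained_on.isoformat(), c.accuracy) for c in history]
print(series)
```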
