Replies: 1 comment
Our recommendation is to just try to produce the most accurate model you can (measured against your noisy given labels). Our research papers (unsurprisingly) show that cleanlab is generally able to better detect label errors with a more accurate model, even when accuracy is measured with respect to the given noisy labels:

- *Model-Agnostic Label Quality Scoring to Detect Real-World Label Errors*
- *Confident Learning: Estimating Uncertainty in Dataset Labels*

There is no hard threshold for what is "good enough", but your model should certainly beat a dummy predictor that either emits uniform-random predictions or always predicts the class that is most common overall in the dataset. With a better model, you will get better label error detection with cleanlab. You can then address these label errors and retrain the same model to get an even better version of it!
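A minimal sketch of this sanity check using scikit-learn (the dataset, model, and cross-validation settings here are illustrative choices, not a prescription): compare your model's cross-validated accuracy on the given labels against a most-frequent-class dummy baseline before handing predicted probabilities to cleanlab.

```python
from sklearn.datasets import load_digits
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # y plays the role of your (possibly noisy) given labels

# Baseline: always predict the overall most common class
dummy_acc = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5).mean()

# Candidate model, scored against the same given labels
model_acc = cross_val_score(LogisticRegression(max_iter=2000), X, y, cv=5).mean()

print(f"dummy accuracy: {dummy_acc:.3f}, model accuracy: {model_acc:.3f}")

# Once the model clearly beats the dummy baseline, obtain out-of-sample
# predicted probabilities and pass them to cleanlab, e.g.:
#   pred_probs = cross_val_predict(model, X, y, cv=5, method="predict_proba")
#   from cleanlab.filter import find_label_issues
#   issues = find_label_issues(labels=y, pred_probs=pred_probs)
```

Using out-of-sample probabilities (via cross-validation) rather than in-sample predictions matters here, since a model scored on data it was trained on will look artificially confident about its own label errors.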
Since we need to train a model before applying cleanlab to find label errors, what is the benchmark for saying that my model is good enough to apply cleanlab?