
Benchmarking methods for label error detection in token classification data

Code to reproduce results from the paper:

Detecting Label Errors in Token Classification Data
NeurIPS 2022 Workshop on Interactive Learning for Natural Language Processing (InterNLP)

This repository is intended only for scientific purposes. To find label errors in your own token classification data, you should instead use the implementation from the official cleanlab library.
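
For reference, here is a minimal sketch of that workflow, assuming a recent cleanlab release (>= 2.1) whose `token_classification` module exposes `find_label_issues` and `get_label_quality_scores`; the labels and predicted probabilities below are toy placeholders:

```python
import numpy as np
from cleanlab.token_classification.filter import find_label_issues
from cleanlab.token_classification.rank import get_label_quality_scores

# Toy example: one sentence of 3 tokens and 2 classes. `labels` holds the
# given class label of each token; `pred_probs` holds one
# (num_tokens x num_classes) array of model-predicted probabilities per sentence.
labels = [[0, 0, 1]]
pred_probs = [np.array([[0.9, 0.1],
                        [0.8, 0.2],
                        [0.3, 0.7]])]

# (sentence_index, token_index) pairs flagged as likely label errors.
issues = find_label_issues(labels, pred_probs)

# Per-sentence and per-token label quality scores (lower = more suspect).
sentence_scores, token_scores = get_label_quality_scores(labels, pred_probs)
print(issues, sentence_scores)
```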

Install Cleanlab Package


Install the Cleanlab version used for our experiments: `pip install ./cleanlab`

Download Datasets


CoNLL-2003:

Experiments


token-classification-benchmark.ipynb: We implement 11 different methods of aggregating the per-token label quality scores into an overall score per sentence, and evaluate each method's precision-recall curve and related label-error detection metrics. We consider the named entity recognition dataset CoNLL-2003, using CoNLL++ as the ground truth (a sketch of one such aggregation strategy follows).
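
For illustration, below is a minimal sketch of one possible aggregation strategy (scoring each sentence by its worst token) together with the precision-recall evaluation; the function and variable names here are hypothetical, not the notebook's actual API:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def sentence_scores_min(token_scores):
    """Aggregate per-token quality scores into one score per sentence (min)."""
    return np.array([scores.min() for scores in token_scores])

# token_scores: one array of token label-quality scores per sentence
# has_error: 1 if the sentence contains a label error (per CoNLL++), else 0
token_scores = [np.array([0.95, 0.90, 0.20]), np.array([0.99, 0.97])]
has_error = np.array([1, 0])

scores = sentence_scores_min(token_scores)
# A low quality score should flag an error, so rank sentences by the negated score.
precision, recall, _ = precision_recall_curve(has_error, -scores)
print("AUPRC:", auc(recall, precision))
```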

token-level.ipynb: We examine token-level label errors for the same dataset (rather than sentence-level errors), analyze how the label errors are distributed across classes, and evaluate different label quality scoring methods at the token level.
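
As one concrete example of a token-level scoring method, here is a minimal sketch of the standard self-confidence baseline (the predicted probability of each token's given label); the helper function is hypothetical, written only to illustrate the idea:

```python
import numpy as np

def token_self_confidence(labels, pred_probs):
    """Per-token quality score = predicted probability of the given label."""
    return [probs[np.arange(len(sent)), sent]
            for sent, probs in zip(map(np.array, labels), pred_probs)]

labels = [[0, 0, 1]]                    # given token labels for one sentence
pred_probs = [np.array([[0.9, 0.1],     # model-predicted class probabilities
                        [0.6, 0.4],     # for each of the 3 tokens
                        [0.3, 0.7]])]
print(token_self_confidence(labels, pred_probs))  # [array([0.9, 0.6, 0.7])]
```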