Discover, Explain, Improve: An Automatic Slice Detection Benchmark for Natural Language Processing

Abstract

Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors. Instead of relying on manual error analysis, research on slice detection models (SDMs), which automatically identify underperforming groups of datapoints, has attracted increasing attention in computer vision, both for understanding model behavior and for informing future model training and design. However, little research has been conducted on SDMs, or on quantitative evaluation of their effectiveness, for NLP tasks. Our paper fills this gap by proposing a benchmark named "Discover, Explain, Improve (DEIM)" for NLP classification tasks, along with a new SDM, Edisa. Edisa discovers coherent, underperforming groups of datapoints; DEIM then unites them under human-understandable concepts and provides comprehensive evaluation tasks with corresponding quantitative metrics. The evaluation in DEIM shows that Edisa can accurately select error-prone datapoints with informative semantic features that summarize error patterns. Detecting difficult datapoints directly boosts model performance without tuning any of the original model parameters, showing that the discovered slices are actionable for users.
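The abstract describes slice detection as automatically finding groups of datapoints on which a model underperforms. As a rough illustration of that idea only, the sketch below is a simple baseline and not Edisa: it assumes candidate groups are already given (for example, hypothetical cluster ids or metadata values) together with a per-datapoint flag saying whether the model's prediction was correct, and it flags groups whose error rate exceeds the overall error rate. The function name `find_underperforming_slices`, its `min_size` and `margin` parameters, and the example group labels are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Slice:
    group: str
    size: int
    error_rate: float


def find_underperforming_slices(groups, correct, min_size=2, margin=0.0):
    """Return candidate groups whose error rate exceeds the overall error rate.

    Hypothetical baseline, not Edisa.
    groups  -- group label per datapoint (assumed given, e.g. a cluster id).
    correct -- True if the model's prediction on that datapoint was right.
    """
    if len(groups) != len(correct):
        raise ValueError("groups and correct must have the same length")
    if not groups:
        return []

    overall_error = 1 - sum(correct) / len(correct)

    # Count datapoints and errors per candidate group.
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, ok in zip(groups, correct):
        totals[group] += 1
        if not ok:
            errors[group] += 1

    # Keep groups that are large enough and clearly worse than average.
    slices = [
        Slice(group, totals[group], errors[group] / totals[group])
        for group in totals
        if totals[group] >= min_size
        and errors[group] / totals[group] > overall_error + margin
    ]
    # Worst slices first; ties broken by larger size.
    slices.sort(key=lambda s: (-s.error_rate, -s.size))
    return slices


if __name__ == "__main__":
    # Made-up example labels and outcomes, for illustration only.
    groups = ["negation", "negation", "negation", "short", "short",
              "long", "long", "long"]
    correct = [False, False, True, True, True, True, True, False]
    for s in find_underperforming_slices(groups, correct):
        print(s)
```

In this made-up example the overall error rate is 3/8 = 0.375, so only the "negation" group (error rate 2/3) is flagged, while "long" (1/3) falls just below the threshold. Unlike Edisa as described in the abstract, this baseline does not itself discover coherent groups or attach human-understandable explanations to them.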

The code is currently being organized.

Repository: Wenyueh/DEIM