Annotation Curricula to Implicitly Train Non-Expert Annotators

Ji-Ung Lee*, Jan-Christoph Klie*, and Iryna Gurevych

UKP Lab, TU Darmstadt

* Both authors contributed equally.

Source code and user models from the experiments in our Computational Linguistics (CL) paper.

Abstract: Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations; especially in citizen science or crowd sourcing scenarios where domain expertise is not required and only annotation guidelines are provided. To alleviate these issues, we propose annotation curricula, a novel approach to implicitly train annotators. We gradually introduce annotators into the task by ordering instances that are annotated according to a learning curriculum. To do so, we first formalize annotation curricula for sentence- and paragraph-level annotation tasks, define an ordering strategy, and identify well-performing heuristics and interactively trained models on three existing English datasets. We then conduct a user study with 40 voluntary participants who are asked to identify the most fitting misconception for English tweets about the Covid-19 pandemic. Our results show that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving a high annotation quality. Annotation curricula thus can provide a novel way to improve data collection. To facilitate future research, we further share our code and data consisting of 2,400 annotations.
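As a rough illustration of the ordering idea described in the abstract, the sketch below sorts instances from easiest to hardest using a difficulty heuristic. It is not the code from this repository: the function names `token_count` and `order_by_curriculum` are hypothetical, and whitespace token count is only an assumed stand-in for whichever heuristic or model is actually used in the experiments.

```python
from typing import Callable, List


def token_count(text: str) -> int:
    """Assumed difficulty heuristic: number of whitespace-separated tokens."""
    return len(text.split())


def order_by_curriculum(
    instances: List[str],
    difficulty: Callable[[str], float] = token_count,
) -> List[str]:
    """Return the instances sorted from lowest to highest difficulty score.

    sorted() is stable, so instances with equal scores keep their input order.
    """
    return sorted(instances, key=difficulty)


if __name__ == "__main__":
    # Made-up example texts, not taken from the study data.
    texts = [
        "a somewhat longer example sentence to annotate",
        "short text",
        "a medium length example",
    ]
    for text in order_by_curriculum(texts):
        print(text)
```

Any other scoring function with the same signature can be passed as `difficulty`, so the ordering step stays independent of how difficulty is estimated.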

Contact
- Jan-Christoph Klie (klie@ukp.informatik.tu-darmstadt.de)
- Ji-Ung Lee (lee@ukp.informatik.tu-darmstadt.de)
- UKP Lab: http://www.ukp.tu-darmstadt.de/
- TU Darmstadt: http://www.tu-darmstadt.de/

Drop us a line or report an issue if something is broken (and shouldn't be) or if you have any questions.

For license information, please see the LICENSE and README files.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Project structure

experiments — Code for running our experiments from section 4 (Evaluation with Existing Datasets)
user_study — Code for running the user study from section 5 (Human Evaluation)

Setting up the experiments

pip install -r requirements.txt

Running the experiments

Please refer to the respective README.md files in the subfolders.

Data

The data collected in our study is available on tu-datalib under a CC BY 4.0 license.

Citing the paper

Please cite our paper as:

@article{10.1162/coli_a_00436,
    author = {Lee, Ji-Ung and Klie, Jan-Christoph and Gurevych, Iryna},
    title = "{Annotation Curricula to Implicitly Train Non-Expert Annotators}",
    journal = {Computational Linguistics},
    volume = {48},
    number = {2},
    pages = {343-373},
    year = {2022},
    month = {06},
    issn = {0891-2017},
    doi = {10.1162/coli_a_00436},
    url = {https://doi.org/10.1162/coli\_a\_00436},
    eprint = {https://direct.mit.edu/coli/article-pdf/48/2/343/2029108/coli\_a\_00436.pdf},
}
