CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

This repository contains the code implementation for the paper titled "CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation".

Abstract

Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot capability on many text-annotation tasks, comparable with or even exceeding human annotators. Such LLMs can serve as alternatives for manual annotation, due to lower costs and higher scalability. However, limited work has leveraged LLMs as a complementary annotator nor explored how annotation work is best allocated among humans and LLMs to achieve both quality and cost objectives. We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Under this framework, we utilize uncertainty to estimate LLMs' annotation capability. Our empirical study shows CoAnnotating to be an effective means to allocate work from results on different datasets.

Usage

This section explains how you can apply human--LLM collaboration for data annotation.

Prompt LLM several times using different types of prompts.
Detailed implementation in LLM_Inferencing.py
Postprocess the output and calculate entropy in responses.
Detailed implementation in Expertise_Estimation.py
Sort instances by entropy in ascending order and allocate top x% for LLM for annotation. Finetune a pretrained classifier using data that is created with human--LLM collaboration.
Detailed implementation in Work_Allocation.ipynb

Citation and Contact

If you find this repository helpful, please cite our paper.

@inproceedings{li2023coannotating,
    title={CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation},
    author={Minzhi Li and Taiwei Shi and Caleb Ziems and Min-Yen Kan and Nancy F. Chen and Zhengyuan Liu and Diyi Yang},
    booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
    year={2023},
}

Feel free to contact Minzhi at li.minzhi@u.nus.edu, if you have any questions about the paper.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
img		img
Confidence_based_allocation_agnews.ipynb		Confidence_based_allocation_agnews.ipynb
Entropy_based_allocation_agnews.ipynb		Entropy_based_allocation_agnews.ipynb
Expertise_Estimation.py		Expertise_Estimation.py
LLM_Inferencing.py		LLM_Inferencing.py
README.md		README.md
Random_allocation_agnews.ipynb		Random_allocation_agnews.ipynb
Work_Allocation.ipynb		Work_Allocation.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

img

img

Confidence_based_allocation_agnews.ipynb

Confidence_based_allocation_agnews.ipynb

Entropy_based_allocation_agnews.ipynb

Entropy_based_allocation_agnews.ipynb

Expertise_Estimation.py

Expertise_Estimation.py

LLM_Inferencing.py

LLM_Inferencing.py

README.md

README.md

Random_allocation_agnews.ipynb

Random_allocation_agnews.ipynb

Work_Allocation.ipynb

Work_Allocation.ipynb

Repository files navigation

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

Abstract

Usage

Citation and Contact

About

Releases

Packages

Contributors 2

Languages

SALT-NLP/CoAnnotating

Folders and files

Latest commit

History

Repository files navigation

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

Abstract

Usage

Citation and Contact

About

Resources

Stars

Watchers

Forks

Languages