This repository contains the code example from our paper "How to do human evaluation: A brief introduction to user studies in NLP".
The notebook analysis.ipynb contains the R code for the toy example presented in Section 8.6 of the paper.
It demonstrates how to apply a Friedman test followed by a Nemenyi post hoc test. The example is a fictional comparison of three chatbot systems based on users' trust ratings.
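For readers without an R setup, the following is a minimal Python sketch of the same two-step procedure. It is not the code in analysis.ipynb, and the ratings are hypothetical, not the data from the paper. It assumes a complete design (every participant rates all three bots), computes the tie-corrected Friedman chi-square statistic from within-participant ranks, and then compares mean ranks against the Nemenyi critical difference. The function names are illustrative, and the constant 2.343 (the studentized range quantile for three groups at alpha = 0.05, divided by the square root of 2) only applies to k = 3.

```python
import math
from itertools import combinations


def average_ranks(row):
    """Rank one participant's ratings (1 = lowest); tied values share their average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        # Sorted positions i..j would receive ranks i+1..j+1; assign their mean.
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(ratings):
    """Tie-corrected Friedman chi-square statistic.

    `ratings` is a list of blocks (one per participant), each holding one
    rating per system. Returns (statistic, degrees of freedom, mean ranks).
    """
    n, k = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(r[j] for r in rank_rows) for j in range(k)]
    q = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    # Tie correction: each group of t tied values within a block contributes t^3 - t.
    ties = sum(row.count(v) ** 3 - row.count(v) for row in ratings for v in set(row))
    correction = 1 - ties / (n * (k ** 3 - k))
    if correction == 0:
        raise ValueError("all ratings are tied within every block")
    return q / correction, k - 1, [s / n for s in rank_sums]


def nemenyi_critical_difference(k, n, q_alpha):
    """Minimum mean-rank difference for two systems to differ under the Nemenyi test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Hypothetical 1-5 trust ratings: one row per participant, columns = bots A, B, C.
systems = ["A", "B", "C"]
ratings = [
    [4, 2, 3],
    [5, 3, 3],
    [4, 1, 2],
    [3, 2, 4],
    [5, 2, 3],
    [4, 3, 2],
]

stat, df, mean_ranks = friedman(ratings)
# Chi-square upper tail; exp(-x / 2) is exact only for df = 2, i.e. k = 3 systems.
p_value = math.exp(-stat / 2)
print(f"Friedman chi2 = {stat:.3f}, df = {df}, p = {p_value:.4f}")

# Post hoc step: 2.343 is the tabulated Nemenyi value for k = 3 and alpha = 0.05.
cd = nemenyi_critical_difference(len(systems), len(ratings), q_alpha=2.343)
significant = [
    (a, b)
    for (i, a), (j, b) in combinations(enumerate(systems), 2)
    if abs(mean_ranks[i] - mean_ranks[j]) > cd
]
print("mean ranks:", dict(zip(systems, mean_ranks)))
print(f"critical difference = {cd:.3f}; differing pairs: {significant}")
```

The post hoc comparison is only meaningful if the Friedman test first rejects the null hypothesis that all systems receive the same rank distribution. For other numbers of systems, the critical value and the p-value computation have to be replaced, for example with a statistics library's chi-square distribution.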
This software is a research prototype, developed solely for and published as part of the publication cited below. It will not be maintained or monitored in any way.
This code is open-sourced under the MIT license. See the LICENSE file for details.
You can cite our paper using:
@article{schuff_vanderlyn_adel_vu_2023,
  title={How to do human evaluation: A brief introduction to user studies in NLP},
  DOI={10.1017/S1351324922000535},
  journal={Natural Language Engineering},
  publisher={Cambridge University Press},
  author={Schuff, Hendrik and Vanderlyn, Lindsey and Adel, Heike and Vu, Ngoc Thang},
  year={2023},
  pages={1--24}
}