Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems, NAACL'24 Findings
This repository contains the data used to analyze how different context sizes and types influence the consistency of human evaluation labels.