Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions, John Joon Young Chung+, N/A, arXiv'23
#714
Large language models (LLMs) can be used to generate text data for training and evaluating other models. However, creating high-quality datasets with LLMs can be challenging. In this work, we explore human-AI partnerships to facilitate high diversity and accuracy in LLM-based text data generation. We first examine two approaches to diversify text generation: 1) logit suppression, which minimizes the generation of languages that have already been frequently generated, and 2) temperature sampling, which flattens the token sampling probability. We found that diversification approaches can increase data diversity but often at the cost of data accuracy (i.e., text and labels being appropriate for the target domain). To address this issue, we examined two human interventions, 1) label replacement (LR), correcting misaligned labels, and 2) out-of-scope filtering (OOSF), removing instances that are out of the user's domain of interest or to which no considered label applies. With oracle studies, we found that LR increases the absolute accuracy of models trained with diversified datasets by 14.4%. Moreover, we found that some models trained with data generated with LR interventions outperformed LLM-based few-shot classification. In contrast, OOSF was not effective in increasing model accuracy, implying the need for future work in human-in-the-loop text data generation.
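The two diversification approaches in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration (not the paper's implementation): `suppress_alpha` and the per-token frequency counts are assumed names, and the suppression here simply subtracts a penalty proportional to how often each token has already been generated, while temperature rescales logits before the softmax.

```python
import numpy as np

def sample_token(logits, temperature=1.0, freq_counts=None, suppress_alpha=0.0):
    """Sample a token id from raw logits.

    temperature > 1 flattens the sampling distribution (more diverse output);
    suppress_alpha > 0 penalizes tokens in proportion to how often they have
    already been generated (a simple form of logit suppression).
    """
    logits = np.asarray(logits, dtype=np.float64)
    if freq_counts is not None and suppress_alpha > 0.0:
        # Penalize frequently generated tokens before sampling.
        logits = logits - suppress_alpha * np.asarray(freq_counts, dtype=np.float64)
    scaled = logits / temperature
    scaled -= scaled.max()                      # numerical stability for exp()
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(np.random.choice(len(probs), p=probs))
```

With a very low temperature the sampler collapses to the argmax token; raising the temperature or the suppression weight spreads probability mass over other tokens, which is the diversity/accuracy trade-off the paper studies.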