Factual consistency evaluation is often conducted using Natural Language Inference (NLI) models, yet these models exhibit limited success in evaluating summaries. Previous work improved such models with synthetic training data. However, the data is typically based on perturbed human-written summaries, which often differ in their characteristics from real model-generated summaries and have limited coverage of possible factual errors. Alternatively, large language models (LLMs) have recently shown promising results in directly evaluating generative tasks, but are too computationally expensive for practical use. Motivated by these limitations, we introduce TrueTeacher, a method for generating synthetic data by annotating diverse model-generated summaries using an LLM. Unlike prior work, TrueTeacher does not rely on human-written summaries, and is multilingual by nature. Experiments on the TRUE benchmark show that a student model trained using our data substantially outperforms both the state-of-the-art model with similar capacity and the LLM teacher. In a systematic study, we compare TrueTeacher to existing synthetic data generation methods and demonstrate its superiority and robustness to domain-shift. Using the mFACE dataset, we also show that our method generalizes to multilingual scenarios. Finally, we release a large-scale synthetic dataset with 1.4M examples generated using TrueTeacher.
AkihikoWatanabe changed the title to "TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models, Zorik Gekhman+, N/A, arXiv'23" on May 20, 2023