Asking and Answering Questions to Evaluate the Factual Consistency of Summaries, Wang, ACL'20 #1007

AkihikoWatanabe · 2023-08-20T13:49:27Z

https://aclanthology.org/2020.acl-main.450/

AkihikoWatanabe · 2023-08-20T13:50:05Z

Practical applications of abstractive summarization models are limited by frequent factual inconsistencies with respect to their input. Existing automatic evaluation metrics for summarization are largely insensitive to such errors. We propose QAGS (pronounced “kags”), an automatic evaluation protocol that is designed to identify factual inconsistencies in a generated summary. QAGS is based on the intuition that if we ask questions about a summary and its source, we will receive similar answers if the summary is factually consistent with the source. To evaluate QAGS, we collect human judgments of factual consistency on model-generated summaries for the CNN/DailyMail (Hermann et al., 2015) and XSUM (Narayan et al., 2018) summarization datasets. QAGS has substantially higher correlations with these judgments than other automatic evaluation metrics. Also, QAGS offers a natural form of interpretability: The answers and questions generated while computing QAGS indicate which tokens of a summary are inconsistent and why. We believe QAGS is a promising tool in automatically generating usable and factually consistent text. Code for QAGS will be available at https://github.com/W4ngatang/qags.

Translation (by gpt-3.5-turbo)

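Practical applications of abstractive summarization models are limited by frequent factual inconsistencies with respect to their input. Existing automatic evaluation metrics for summarization are largely insensitive to such errors. We propose QAGS (pronounced "kags"), an automatic evaluation protocol designed to identify factual inconsistencies in a generated summary. QAGS is based on the intuition that if we ask questions about a summary and its source, we will get similar answers when the summary is factually consistent with the source. To evaluate QAGS, we collected human judgments of the factual consistency of model-generated summaries on the CNN/DailyMail (Hermann et al., 2015) and XSUM (Narayan et al., 2018) summarization datasets. QAGS correlates much more highly with these judgments than other automatic evaluation metrics do. QAGS also offers a natural form of interpretability: the answers and questions generated while computing QAGS indicate which tokens of a summary are inconsistent and why. We believe QAGS is a promising tool for automatically generating usable and factually consistent text. Code for QAGS is available at https://github.com/W4ngatang/qags.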

Summary (by gpt-3.5-turbo)

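The paper proposes QAGS, an automatic evaluation protocol for identifying factual inconsistencies in summaries. QAGS evaluates the factual consistency of a summary by asking questions about the summary and its source and checking whether the answers agree. QAGS correlates more highly with human judgments than other automatic evaluation metrics and offers a natural form of interpretability. QAGS is a promising tool, and its code is available at https://github.com/W4ngatang/qags.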

AkihikoWatanabe · 2023-08-20T13:53:04Z

QAGS

AkihikoWatanabe · 2023-08-20T14:03:57Z

A method that generates questions from the generated summary. Precision-oriented.
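The flow described in the abstract (generate questions from the summary, answer them against both the summary and the source, then compare the answers) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: `qags_like_score`, `token_f1`, and the toy question generator and answerer are hypothetical names, the question generation and QA models are replaced by hand-written stand-ins, and comparing answers by token-level F1 and averaging over questions is an assumption made here.

```python
from collections import Counter
from typing import Callable, List


def token_f1(a: str, b: str) -> float:
    """Token-overlap F1 between two answer strings (assumed similarity measure)."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        # Two empty answers count as a match; one empty answer does not.
        return float(ta == tb)
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def qags_like_score(
    summary: str,
    source: str,
    generate_questions: Callable[[str], List[str]],
    answer: Callable[[str, str], str],
) -> float:
    """Average agreement between answers found in the summary and in the source.

    Questions come from the summary only, so the score checks whether what
    the summary says is supported by the source (precision-oriented).
    """
    questions = generate_questions(summary)
    if not questions:
        return 0.0
    scores = [
        token_f1(answer(q, summary), answer(q, source)) for q in questions
    ]
    return sum(scores) / len(scores)


# --- Toy stand-ins for the question generation and QA models (hypothetical) ---
SOURCE = "Alice won the match in Paris."
GOOD_SUMMARY = "Alice won in Paris."
BAD_SUMMARY = "Bob won in Paris."

TOY_ANSWERS = {
    ("Who won?", SOURCE): "Alice",
    ("Who won?", GOOD_SUMMARY): "Alice",
    ("Who won?", BAD_SUMMARY): "Bob",
    ("Where was the match?", SOURCE): "Paris",
    ("Where was the match?", GOOD_SUMMARY): "Paris",
    ("Where was the match?", BAD_SUMMARY): "Paris",
}


def toy_questions(summary: str) -> List[str]:
    # A real system would generate these from the summary with a model.
    return ["Who won?", "Where was the match?"]


def toy_answer(question: str, text: str) -> str:
    # A real system would run an extractive QA model over `text`.
    return TOY_ANSWERS[(question, text)]


good = qags_like_score(GOOD_SUMMARY, SOURCE, toy_questions, toy_answer)
bad = qags_like_score(BAD_SUMMARY, SOURCE, toy_questions, toy_answer)
```

In this toy setup the per-question scores also show where the disagreement comes from: for the inconsistent summary, "Who won?" yields different answers from the summary and the source, which is the kind of interpretability the abstract mentions.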

AkihikoWatanabe added DocumentSummarization Metrics NLP Evaluation Reference-free QA-based labels Aug 20, 2023

AkihikoWatanabe added Pocket translation_required labels Aug 20, 2023

AkihikoWatanabe mentioned this issue Aug 20, 2023

QuestEval: Summarization Asks for Fact-based Evaluation, Thomas Scialom+, N/A, EMNLP'21 #974 (Open)
