Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation, Deng+, EMNLP'21 #964

AkihikoWatanabe · 2023-08-13T12:32:50Z

https://aclanthology.org/2021.emnlp-main.599/

AkihikoWatanabe · 2023-08-13T12:33:33Z

Natural language generation (NLG) spans a broad range of tasks, each of which serves for specific objectives and desires different properties of generated text. The complexity makes automatic evaluation of NLG particularly challenging. Previous work has typically focused on a single task and developed individual evaluation metrics based on specific intuitions. In this paper, we propose a unifying perspective based on the nature of information change in NLG tasks, including compression (e.g., summarization), transduction (e.g., text rewriting), and creation (e.g., dialog). Information alignment between input, context, and output text plays a common central role in characterizing the generation. With automatic alignment prediction models, we develop a family of interpretable metrics that are suitable for evaluating key aspects of different NLG tasks, often without need of gold reference data. Experiments show the uniformly designed metrics achieve stronger or comparable correlations with human judgement compared to state-of-the-art metrics in each of diverse tasks, including text summarization, style transfer, and knowledge-grounded dialog.

Translation (by gpt-3.5-turbo)

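Natural language generation (NLG) covers a wide range of tasks, each serving a specific purpose and requiring different properties of the generated text. This complexity makes automatic evaluation of NLG especially difficult. Prior work has typically focused on a single task and developed separate evaluation metrics based on task-specific intuitions. This paper proposes a unified perspective based on the nature of information change in NLG tasks: compression (e.g., summarization), transduction (e.g., text rewriting), and creation (e.g., dialog). "Information alignment" among the input, context, and output text plays a common, central role in characterizing generation. Using automatic alignment prediction models, the authors develop a family of interpretable metrics suited to evaluating key aspects of different NLG tasks, often without requiring gold reference data. Experiments show that these uniformly designed metrics achieve stronger or comparable correlations with human judgment compared with state-of-the-art metrics across diverse tasks, including text summarization, style transfer, and knowledge-grounded dialog.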

Summary (by gpt-3.5-turbo)

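This work proposes a unified perspective on evaluating natural language generation (NLG) tasks that centers on information alignment. The authors develop a family of interpretable metrics built on information alignment and show experimentally that these metrics can evaluate a variety of NLG tasks, often without requiring gold reference data.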

AkihikoWatanabe · 2023-08-13T12:34:03Z

CTC
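The idea in the abstract, scoring generated text by how well its tokens align with the input, can be illustrated with a minimal sketch. This is not the paper's implementation: a toy exact-match alignment stands in for the learned alignment prediction models, and the function names `align` and `consistency_score`, the whitespace tokenization, and the mean aggregation are all assumptions made for illustration.

```python
from statistics import mean


def align(a_tokens, b_tokens):
    """Toy alignment (assumption): 1.0 if a token of `a` also occurs in `b`, else 0.0.

    A real system would use a learned alignment prediction model here.
    """
    b_vocab = set(b_tokens)
    return [1.0 if tok in b_vocab else 0.0 for tok in a_tokens]


def consistency_score(output_text, input_text):
    """Reference-free score: average alignment of output tokens to the input."""
    out_tokens = output_text.lower().split()
    in_tokens = input_text.lower().split()
    scores = align(out_tokens, in_tokens)
    # An empty output has nothing to align, so it scores 0.0 by convention here.
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    source = "the cat sat on the mat"
    print(consistency_score("the cat sat", source))
    print(consistency_score("the dog sat", source))
```

The sketch needs only the input and the output, which mirrors the reference-free setting the abstract mentions. Swapping the toy `align` for a model-based scorer would leave the aggregation step unchanged.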

AkihikoWatanabe added DocumentSummarization NLP Evaluation FactualConsistency LM-based labels Aug 13, 2023

AkihikoWatanabe added translation_required Metrics labels Aug 13, 2023
