InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation, Pierre Colombo+, N/A, AAAI'22 #973

AkihikoWatanabe · 2023-08-13T13:05:21Z

URL

https://arxiv.org/abs/2112.01589

Affiliations

Pierre Colombo, N/A
Chloe Clavel, N/A
Pablo Piantanida, N/A

Abstract

Assessing the quality of natural language generation systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU) have been introduced. However, such metrics usually rely on exact matches and thus, do not robustly handle synonyms. In this paper, we introduce InfoLM a family of untrained metrics that can be viewed as a string-based metric that addresses the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures allowing the adaptation of InfoLM to various evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvement and over $10$ points of correlation gains in many configurations on both summarization and data2text generation.
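The contrast the abstract draws between exact matching and information measures can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not spell out the computation, so the sketch assumes the metric compares two probability distributions over a vocabulary. The hand-written dictionaries stand in for what a masked language model might output, and the helper names (`unigram_precision`, `kl_divergence`) are hypothetical, as is the choice of KL divergence as the information measure.

```python
import math


def unigram_precision(candidate, reference):
    """Fraction of candidate tokens that appear verbatim in the reference."""
    cand = candidate.split()
    ref = set(reference.split())
    return sum(tok in ref for tok in cand) / len(cand)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions given as dicts over a shared vocabulary."""
    return sum(
        pv * math.log((pv + eps) / (q.get(tok, 0.0) + eps))
        for tok, pv in p.items()
        if pv > 0
    )


# Exact matching penalises the synonym "film" even though the meaning is unchanged.
exact = unigram_precision("the film was great", "the movie was great")

# Hypothetical distributions over a tiny vocabulary (made up for illustration,
# not produced by any real model).
ref_dist = {"movie": 0.5, "film": 0.4, "car": 0.1}
syn_dist = {"movie": 0.4, "film": 0.5, "car": 0.1}    # candidate that uses a synonym
off_dist = {"movie": 0.05, "film": 0.05, "car": 0.9}  # off-topic candidate

near = kl_divergence(ref_dist, syn_dist)
far = kl_divergence(ref_dist, off_dist)
print(exact, near, far)
```

Under these assumptions the synonym candidate stays close to the reference distribution while the off-topic candidate is far from it, a distinction the exact-match score cannot make.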

Translation (by gpt-3.5-turbo)

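Assessing the quality of natural language generation systems through human annotation is very expensive. In addition, human annotation campaigns are time-consuming and involve non-reusable human labour. In practice, researchers often rely on automatic evaluation metrics as a proxy for quality. Over the past decade, many string-based metrics (e.g., BLEU) have been introduced. However, such metrics usually rely on exact matches and therefore cannot robustly handle synonyms. This paper introduces InfoLM, a family of string-based evaluation metrics that addresses these flaws by using a pre-trained masked language model. The family also makes use of information measures, which allows InfoLM to be adapted to various evaluation criteria. Using direct assessment, we show that InfoLM achieves statistically significant improvements and correlation gains of more than 10 points on both summarization and data2text generation.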

Summary (by gpt-3.5-turbo)

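Evaluating the quality of natural language generation systems is expensive and commonly relies on human annotation, although automatic metrics are also used. This work introduces InfoLM, an evaluation metric built on a masked language model. The metric can handle synonyms and shows significant improvements on summarization and data2text generation.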

AkihikoWatanabe added the Pocket label Aug 13, 2023

AkihikoWatanabe changed the title ~~InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation, Colombo+, AAAI'22~~ InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation, Pierre Colombo+, N/A, arXiv'21 Aug 13, 2023

AkihikoWatanabe changed the title ~~InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation, Pierre Colombo+, N/A, arXiv'21~~ InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation, Pierre Colombo+, N/A, AAAI'22 Aug 13, 2023

AkihikoWatanabe added DocumentSummarization NaturalLanguageGeneration Metrics NLP Evaluation Reference-based labels Aug 13, 2023
