HOLMS: Alternative Summary Evaluation with Large Language Models, Mrabet+, COLING'20 #982

AkihikoWatanabe · 2023-08-13T13:54:37Z

https://aclanthology.org/2020.coling-main.498/

AkihikoWatanabe · 2023-08-13T13:54:56Z

Efficient document summarization requires evaluation measures that can not only rank a set of systems based on an average score, but also highlight which individual summary is better than another. However, despite the very active research on summarization approaches, few works have proposed new evaluation measures in the recent years. The standard measures relied upon for the development of summarization systems are most often ROUGE and BLEU which, despite being efficient in overall system ranking, remain lexical in nature and have a limited potential when it comes to training neural networks. In this paper, we present a new hybrid evaluation measure for summarization, called HOLMS, that combines both language models pre-trained on large corpora and lexical similarity measures. Through several experiments, we show that HOLMS outperforms ROUGE and BLEU substantially in its correlation with human judgments on several extractive summarization datasets for both linguistic quality and pyramid scores.
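The abstract describes HOLMS only at a high level: it combines pre-trained language models with lexical similarity measures. The sketch below is not the paper's formula. It assumes a simple linear interpolation, with a hypothetical weight `alpha`, between a ROUGE-1-style unigram F1 on whitespace tokens and the cosine similarity of two embedding vectors. In practice those vectors would come from some pre-trained LM; here they are passed in directly. All function names are hypothetical.

```python
from collections import Counter
import math


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style unigram-overlap F1 on lowercased whitespace tokens (lexical part)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (stand-in for the model-based part)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_score(candidate, reference, cand_vec, ref_vec, alpha=0.5):
    """Hypothetical hybrid: alpha * lexical score + (1 - alpha) * embedding similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    lexical = rouge1_f1(candidate, reference)
    semantic = cosine(cand_vec, ref_vec)
    return alpha * lexical + (1 - alpha) * semantic
```

For example, `hybrid_score("the cat sat on the mat", "the cat lay on the mat", [1.0, 0.0], [1.0, 0.0])` combines a unigram F1 of 5/6 with a cosine of 1.0, giving about 0.917. The interpolation keeps the two signals separately inspectable, which is one plausible reading of "hybrid"; the actual HOLMS combination may differ.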

Translation (by gpt-3.5-turbo)

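Efficient document summarization requires evaluation measures that not only rank a set of systems by their average score but also highlight whether an individual summary is better than another. However, despite very active research on summarization methods, few studies in recent years have proposed new evaluation measures. The standard measures relied upon in developing summarization systems are usually ROUGE and BLEU, which are effective for overall system ranking but are lexical in nature and have limited potential for training neural networks. In this paper, we propose HOLMS, a new hybrid evaluation measure that combines language models pre-trained on large corpora with lexical similarity measures. Through several experiments, we show that HOLMS substantially outperforms ROUGE and BLEU in correlation with human judgments on several extractive summarization datasets, for both linguistic quality and pyramid scores.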

Summary (by gpt-3.5-turbo)

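ROUGE and BLEU are commonly used to evaluate summarization methods, but they are lexical in nature and have limited potential for training neural networks. This study proposes HOLMS, a new evaluation measure that combines language models pre-trained on large corpora with lexical similarity measures. Experiments show that HOLMS substantially outperforms ROUGE and BLEU and correlates more highly with human judgments.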
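Claims like "higher correlation with human judgments" come from meta-evaluation: scoring each summary with the metric and correlating those scores with human ratings such as linguistic quality or pyramid scores. The abstract does not say which correlation coefficient is reported; the sketch below assumes Pearson's r, and any inputs you pass are placeholders, not data from the paper.

```python
import math


def pearson(metric_scores, human_scores) -> float:
    """Pearson correlation between per-summary metric scores and human ratings."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(metric_scores)
    mx = sum(metric_scores) / n
    my = sum(human_scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(metric_scores, human_scores))
    sx = math.sqrt(sum((x - mx) ** 2 for x in metric_scores))
    sy = math.sqrt(sum((y - my) ** 2 for y in human_scores))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

A metric whose scores rise and fall with the human ratings yields r close to 1; rank-based coefficients (Spearman, Kendall) are common alternatives when only the ordering matters.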

AkihikoWatanabe · 2023-08-13T13:58:20Z

Hybrid Lexical and MOdel-based evaluation of Summaries (HOLMS)

AkihikoWatanabe changed the title ~~HOLMS: Alternative Summary Evaluation with Large Language Models~~ HOLMS: Alternative Summary Evaluation with Large Language Models, Mrabet+, COLING'20 Aug 13, 2023

AkihikoWatanabe added DocumentSummarization Evaluation Metrics NLP translation_required Reference-based labels Aug 13, 2023
