# Study of automatic evaluation metrics applied to story generation in relation to human metrics

Project created by Clémence Millet and Vinciane Desbois, 2023

In this project, we studied the performance metrics of text generation algorithms. The notebook accompanying our article is available as project_nlp_similarity.ipynb. The paper is available on OpenReview: https://openreview.net/pdf?id=b-2xX-oOmUn

## Abstract

Automatic story generation is a complex branch of NLP whose evaluation techniques have been less studied than those for summarization or data-to-text generation. In this analysis, we focus on the relevance of existing automatic metrics, both traditional and more recent, for evaluating this type of task. Using a dataset annotated by human evaluators, we compare automatic metrics to human metrics, look for correlations between them, and assess how well automatic metrics predict some human metrics. Our results mainly show a high similarity between all automatic metrics and their difficulty in predicting human metrics, even when combined.
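
As an illustration of the correlation analysis described in the abstract, below is a minimal sketch (not taken from the project notebook) of how automatic metric scores can be compared against human ratings; the score values and the choice of Pearson/Spearman correlation are illustrative assumptions.

```python
# Minimal sketch of a metric-vs-human correlation analysis.
# The score values below are hypothetical placeholders; in the actual study
# they would come from automatic metrics (e.g. BLEU, BERTScore) computed on
# generated stories and from human annotations of those same stories.
from scipy.stats import pearsonr, spearmanr

# Automatic metric scores for a set of generated stories (hypothetical values)
automatic_scores = [0.31, 0.45, 0.28, 0.52, 0.40, 0.36]
# Human ratings of the same stories, e.g. on a 1-5 scale (hypothetical values)
human_ratings = [2.0, 3.5, 2.5, 4.0, 3.0, 2.5]

# Pearson measures linear correlation; Spearman measures rank correlation
pearson_r, pearson_p = pearsonr(automatic_scores, human_ratings)
spearman_r, spearman_p = spearmanr(automatic_scores, human_ratings)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3f})")
print(f"Spearman rho = {spearman_r:.3f} (p = {spearman_p:.3f})")
```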