This Jupyter Notebook calculates, in Python, the three most popular word embedding-based metrics for evaluating the answering performance of a generative conversational chatbot on dialogue text.
The three metrics implemented (see "EMBEDDING_METRICS_TEST_EXAMPLE") draw on the following references:
- A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics. Vasile Rus, Mihai Lintean. 2012. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL 2012.
- Bootstrapping Dialog Systems with Word Embeddings. G. Forgues, J. Pineau, J. Larcheveque, R. Tremblay. 2014. Workshop on Modern Machine Learning and Natural Language Processing, NIPS 2014.
- A Survey of Evaluation Metrics Used for NLG Systems. A. B. Sai, A. K. Mohankumar, M. M. Khapra. 2022. ACM Computing Surveys (CSUR), 55(2):1–39.
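
As a rough illustration of what word embedding-based metrics compute, here is a minimal sketch. It assumes the three metrics are Greedy Matching, Embedding Average, and Vector Extrema, which are the ones usually associated with the references above; the text here does not name them, so treat that as an assumption. The function names and the toy 3-dimensional vectors (`TOY_EMBEDDINGS`) are hypothetical and are not taken from the notebook. The sketch uses plain Python so it runs without dependencies; a real setup would more likely use NumPy and pretrained embeddings.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
TOY_EMBEDDINGS = {
    "hello": [0.9, 0.1, 0.0],
    "hi": [0.8, 0.2, 0.1],
    "there": [0.1, 0.9, 0.2],
    "friend": [0.2, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed(tokens, embeddings):
    """Look up word vectors, skipping out-of-vocabulary tokens."""
    return [embeddings[t] for t in tokens if t in embeddings]


def embedding_average(ref_tokens, hyp_tokens, embeddings):
    """Cosine similarity between the mean word vectors of both sentences."""
    ref, hyp = embed(ref_tokens, embeddings), embed(hyp_tokens, embeddings)
    if not ref or not hyp:
        return 0.0

    def mean(vecs):
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    return cosine(mean(ref), mean(hyp))


def vector_extrema(ref_tokens, hyp_tokens, embeddings):
    """Cosine similarity between per-dimension extreme-value vectors."""
    ref, hyp = embed(ref_tokens, embeddings), embed(hyp_tokens, embeddings)
    if not ref or not hyp:
        return 0.0

    def extrema(vecs):
        # Per dimension, keep the value with the largest absolute magnitude.
        out = []
        for col in zip(*vecs):
            hi, lo = max(col), min(col)
            out.append(hi if hi >= abs(lo) else lo)
        return out

    return cosine(extrema(ref), extrema(hyp))


def greedy_matching(ref_tokens, hyp_tokens, embeddings):
    """Average, over both directions, of each word's best cosine match."""
    ref, hyp = embed(ref_tokens, embeddings), embed(hyp_tokens, embeddings)
    if not ref or not hyp:
        return 0.0

    def one_way(source, target):
        return sum(max(cosine(s, t) for t in target) for s in source) / len(source)

    return (one_way(ref, hyp) + one_way(hyp, ref)) / 2.0


reference = ["hello", "there"]
response = ["hi", "friend"]
for metric in (greedy_matching, embedding_average, vector_extrema):
    print(metric.__name__, round(metric(reference, response, TOY_EMBEDDINGS), 4))
```

In this sketch, out-of-vocabulary tokens are dropped and a sentence with no known tokens scores 0.0; that is one possible convention, not necessarily the one the notebook uses.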