Microservice that handles automatic evaluation of neural chatbot models. It supports multiple automated evaluation methods, including embedding-based metrics.



Microservice of ChatEval to handle evaluation of neural chatbot models. Uses both word embeddings and Amazon Mechanical Turk to evaluate models.


The Evaluation microservice can be initialized by running source init.sh, which uses wget to download the pre-trained word embeddings (configurable with an environment variable named EMBEDDING_FILE) and starts the Flask server on port 8001.

To run the automatic evaluation, send a POST request to /auto with the parameters model_responses and baseline_responses, two lists of strings of equal length. The response is a JSON object whose keys are the evaluation metric names and whose values are the corresponding floats.
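
As an illustration of the request described above, here is a minimal client sketch using only the Python standard library. It assumes the service accepts a JSON request body and is reachable at http://localhost:8001; neither detail is confirmed here, so adjust the encoding (for example, to form data) if the server expects something else. The helper names build_auto_payload and post_auto are hypothetical and not part of ChatEval.

```python
import json
from urllib import request


def build_auto_payload(model_responses, baseline_responses):
    """Check that both lists have equal length and return the request body."""
    if len(model_responses) != len(baseline_responses):
        raise ValueError(
            "model_responses and baseline_responses must have equal length"
        )
    return {
        "model_responses": list(model_responses),
        "baseline_responses": list(baseline_responses),
    }


def post_auto(payload, base_url="http://localhost:8001"):
    """POST the payload to /auto and return the decoded JSON response.

    Assumption: the service reads a JSON body. The base URL is the
    default port mentioned above, not a confirmed deployment address.
    """
    req = request.Request(
        base_url + "/auto",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        # Expected shape: {"<metric name>": <float>, ...}
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, post_auto(build_auto_payload(model_list, baseline_list)) should return a dictionary mapping each metric name to a float. Validating the list lengths on the client side avoids a round trip for a request the service would reject anyway.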

(Optional) Docker Installation

ChatEval supports the use of Docker as both a development and deployment tool.

  1. Install Docker.
  2. Configure environment variables in the Dockerfile by adding a line of the form ENV <variable> <value> for each environment variable.
  3. Build the Docker image by running docker build -t evaluation . (this may take some time).
  4. Run Evaluation on port 8001 by running docker run -p 8001:8001 evaluation (the -p flag publishes the container's port 8001 on the host).
  5. Access the app at localhost:8001.