Automatic evaluation script for DSTC7 Task 2

Steps:

1. Make sure you 'git pull' the latest changes (from October 15, 2018), including the changes in ../../data_extraction.
2. cd to ../../data_extraction and run make. This creates the multi-reference file used by the metrics (../../data_extraction/test.refs).
3. Install the 3rd-party software as instructed below (METEOR and mteval-v14c.pl).
4. Run the following command, where [SUBMISSION] is the submission file you want to evaluate (in the same format as the one you submitted on Oct 8):

python dstc.py -c [SUBMISSION] --refs ../../data_extraction/test.refs

Important: the results printed by dstc.py might differ slightly from the official results if part of your test set failed to download.

What does it do?

(Based on this repo by Sean Xiang Gao)

evaluation: calculate automated NLP metrics (BLEU, NIST, METEOR, entropy, etc.)

from metrics import nlp_metrics
nist, bleu, meteor, entropy, diversity, avg_len = nlp_metrics(
    path_refs=["demo/ref0.txt", "demo/ref1.txt"],
    path_hyp="demo/hyp.txt")
	  
# nist = [1.8338, 2.0838, 2.1949, 2.1949]
# bleu = [0.4667, 0.441, 0.4017, 0.3224]
# meteor = 0.2832
# entropy = [2.5232, 2.4849, 2.1972, 1.7918]
# diversity = [0.8667, 1.000]
# avg_len = 5.0000
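
To make the entropy and diversity numbers easier to interpret, here is a minimal sketch of how such values are commonly computed. It assumes that entropy is the Shannon entropy (natural log) of the n-gram distribution over all hypotheses, and that diversity is the distinct-n ratio (unique n-grams divided by total n-grams). This is not the code in metrics.py; the helper names `ngrams` and `entropy_and_distinct` are hypothetical, and the sample sentences are toy data rather than the demo files.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy_and_distinct(hyps, n):
    """Return (entropy, distinct-n) over whitespace-tokenized hypotheses.

    Assumption: entropy uses the natural log; distinct-n is the number of
    unique n-grams divided by the total number of n-grams.
    """
    counts = Counter()
    for line in hyps:
        counts.update(ngrams(line.split(), n))
    total = float(sum(counts.values()))
    if total == 0:
        return 0.0, 0.0
    entropy = -sum(c / total * math.log(c / total) for c in counts.values())
    distinct = len(counts) / total
    return entropy, distinct


# Toy hypotheses (not the demo files): three lines of five tokens each.
hyps = ["a b c d e", "f g h i j", "k a l b m"]
for n in (1, 2):
    ent, dist = entropy_and_distinct(hyps, n)
    print(n, round(ent, 4), round(dist, 4))
```

Under these assumptions, a set of hypotheses in which every n-gram is unique has a distinct-n of 1.0 and an entropy equal to the log of the n-gram count.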

tokenization: clean a string and handle punctuation, contractions, URLs, mentions, tags, etc.

from tokenizers import clean_str
s = " I don't know:). how about this?https://github.com"
clean_str(s)

# i do n't know :) . how about this ? __url__
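
For intuition, the kind of normalization shown above can be approximated with a few regular expressions. The sketch below is a simplified stand-in, not the actual `clean_str` implementation: the function name `simple_clean` is hypothetical, and it only lowercases, masks URLs as `__url__`, splits "n't" contractions, and separates basic punctuation. Emoticons, mentions, and tags are deliberately left out.

```python
import re

# Assumption: anything starting with http:// or https:// up to the next
# whitespace counts as a URL.
URL_RE = re.compile(r"https?://\S+")


def simple_clean(text):
    """Simplified cleaning: lowercase, mask URLs, split n't, space out punctuation."""
    text = text.lower()
    # Mask URLs first so the dots inside them are not treated as punctuation.
    text = URL_RE.sub(" __url__ ", text)
    # Split negative contractions: "don't" becomes "do n't".
    text = re.sub(r"n't\b", " n't", text)
    # Put spaces around basic sentence punctuation.
    text = re.sub(r"([.?!,])", r" \1 ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


print(simple_clean(" I don't know. How about this?https://github.com"))
```

The order of the substitutions matters here: masking URLs before spacing out punctuation keeps a URL from being broken into pieces.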

Requirements

Works fine for both Python 2.7 and 3.6
Please download the following 3rd-party packages and save them in a new folder named 3rdparty:
- mteval-v14c.pl to compute NIST. You may need to install the following Perl modules (e.g., with cpan install): XML::Twig, Sort::Naturally, and String::Util.
- meteor-1.5 to compute METEOR. It requires Java.
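
Before running the metrics, it can help to check that these requirements are in place. The sketch below is one possible check, written for Python 3 (`shutil.which` is not available in Python 2.7). The function name `missing_requirements` is hypothetical, and the exact layout inside 3rdparty (a file named mteval-v14c.pl and an entry named meteor-1.5) is an assumption; adjust the names to match where you saved the packages.

```python
import os
import shutil


def missing_requirements(root="3rdparty"):
    """Return a list of requirements that could not be found."""
    missing = []
    # Assumed (hypothetical) layout inside the 3rdparty folder.
    for name in ("mteval-v14c.pl", "meteor-1.5"):
        path = os.path.join(root, name)
        if not os.path.exists(path):
            missing.append(path)
    # perl is needed to run mteval-v14c.pl; java is needed to run METEOR.
    for exe in ("perl", "java"):
        if shutil.which(exe) is None:
            missing.append(exe + " (not on PATH)")
    return missing


if __name__ == "__main__":
    for item in missing_requirements():
        print("missing:", item)
```

This check does not verify that the Perl modules listed above are installed.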
