
ComparEval - A toolkit for comparative evaluation of outputs of two or more NLP systems

Implementations of the evaluation methods described in:

Tuggener, Don (2017). A method for in-depth comparative evaluation: How (dis)similar are outputs of POS taggers, dependency parsers and coreference resolvers really? In: Proceedings of EACL 2017. (to appear)

TL;DR: The method compares two (or more) system outputs to a gold standard and quantifies and highlights the differences between the outputs. It can also compute upper bounds for combinations of system outputs.

Currently supported are: POS tagging, dependency parsing, and coreference resolution.
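
As a minimal illustration of the idea for POS tagging (this is only a sketch, not the toolkit's code; the function names and difference categories are made up): compare the outputs token by token against the gold tags, record which system is correct where the outputs diverge, and compute an oracle upper bound that counts a token as correct if at least one system tags it correctly.

from collections import Counter

def compare_to_gold(gold_tags, sys1_tags, sys2_tags):
    # Categorize each token: do the systems agree, and which one matches the gold tag?
    counts = Counter()
    for gold, t1, t2 in zip(gold_tags, sys1_tags, sys2_tags):
        if t1 == t2:
            counts['agree_correct' if t1 == gold else 'agree_wrong'] += 1
        elif t1 == gold:
            counts['only_sys1_correct'] += 1
        elif t2 == gold:
            counts['only_sys2_correct'] += 1
        else:
            counts['both_wrong_differently'] += 1
    return counts

def oracle_upper_bound(gold_tags, *system_tag_lists):
    # Accuracy of a perfect combiner that picks a correct system tag whenever one exists.
    correct = sum(
        any(tags[i] == gold for tags in system_tag_lists)
        for i, gold in enumerate(gold_tags)
    )
    return correct / len(gold_tags)

# Toy example: each token is tagged correctly by at least one system,
# so the oracle upper bound is 1.0 even though neither system is perfect.
gold = ['DT', 'NN', 'VBZ']
print(compare_to_gold(gold, ['DT', 'NN', 'VBD'], ['DT', 'JJ', 'VBZ']))
print(oracle_upper_bound(gold, ['DT', 'NN', 'VBD'], ['DT', 'JJ', 'VBZ']))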

Usage

Each supported task has two Python scripts that perform the evaluation. To compare, e.g., two POS tagger outputs, run:

python pos_tagging/compare_pos_taggers.py <gold_standard_file> <system_output1_file> <system_output2_file>

Accordingly, to calculate the upper bounds, run:

python pos_tagging/upper_bounds_pos_taggers.py <gold_standard_file> <system_output1_file> <system_output2_file> <system_output3_file> ...

The input format is CoNLL (i.e. one token per line). If the column indexes in the scripts don't match your CoNLL format, set the column indexes of the relevant features (e.g. the column index of the POS tags) at the beginning of the scripts. For coreference resolution, the coreference annotation is always expected in the last column.
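
For illustration, a CoNLL-style input for POS tagging might look like the following (the columns shown here are only an example; sentences are separated by blank lines, and the POS tag here sits in the fourth column, i.e. column index 3):

1   The      the     DT
2   cat      cat     NN
3   sleeps   sleep   VBZ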
