NER Test Suite

Prerequisites

Python 3.x

This test suite has 11 benchmarks. The tests take the NER output runs from the self-supervised NER link mentioned above. These output runs are two-column format files with term and entity type, with the only addition that a line may carry two predictions, each with subtypes, as opposed to just one prediction (which is the normal case).
The test sentences are generated from the two-column test.tsv file (containing term and ground truth prediction). While doing so, a specified sample of sentences is POS tagged (if this option is chosen in the config) to compare phrase spans in the test set with the phrase spans of a POS tagger. This checks whether the test set tagging is consistent with a POS tagger.
The evaluation of an NER output can be done in two ways: (1) standard single-prediction output only, by just taking the first prediction (-strict), or (2) picking whichever of the two predictions matches the ground truth. The performance numbers are reported separately. The evaluation also has an option to skip false positive reporting on sentences with just the OTHER tag (-ignore_others option; set to false by default). This is explained in the self-supervised NER post.
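As a rough illustration of how sentences could be rebuilt from a two-column file, here is a minimal sketch. It is not the code in extract_sentences.py. It assumes the columns are tab-separated and that a blank line marks a sentence boundary (a common CoNLL-style convention, not confirmed by this README). The function name `read_sentences` and the sample tags are hypothetical.

```python
def read_sentences(lines):
    """Group (term, tag) pairs into sentences.

    Assumptions (not confirmed by the README): columns are tab-separated
    and a blank line ends a sentence.
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            # Skip malformed lines that lack a tag column.
            continue
        current.append((parts[0], parts[1]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample data; real tag names depend on the benchmark.
sample = ["John\tPER\n", "lives\tOTHER\n", "\n", "Paris\tLOC\n"]
sentences = read_sentences(sample)
text = [" ".join(term for term, _ in s) for s in sentences]
```

With a real file, the same function could be called as `read_sentences(open("test.tsv", encoding="utf-8"))`.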
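The two scoring modes and the -ignore_others behaviour can be sketched as follows. This is one reading of the description above, not the logic in eval_results.py. The data layout (a list of predictions per term), the function names, and the PER/LOC tags are assumptions made for illustration; only the OTHER tag and the two option names come from this README.

```python
OTHER = "OTHER"  # tag name from the README; exact spelling in the data is assumed


def pick_prediction(gold, predictions, strict):
    """Choose the prediction to score: the first one in strict mode,
    otherwise whichever prediction matches the ground truth (if any)."""
    if strict or gold not in predictions:
        return predictions[0]
    return gold


def evaluate(sentences, strict=True, ignore_others=False):
    """Count correct terms and false positives.

    Each sentence is a list of (term, gold_tag, [prediction, ...]) tuples.
    A false positive here means the gold tag is OTHER but the scored
    prediction is not (an assumed definition).
    """
    correct = total = false_positives = 0
    for sentence in sentences:
        only_other = all(gold == OTHER for _, gold, _ in sentence)
        for _term, gold, predictions in sentence:
            total += 1
            if pick_prediction(gold, predictions, strict) == gold:
                correct += 1
            elif gold == OTHER and not (ignore_others and only_other):
                false_positives += 1
    return {"correct": correct, "total": total, "false_positives": false_positives}


# Hypothetical data: "Paris" has two predictions, the second is right.
data = [
    [("Paris", "LOC", ["PER", "LOC"]), ("is", OTHER, [OTHER])],
    [("nice", OTHER, ["LOC"])],
]
strict_scores = evaluate(data, strict=True)
relaxed_scores = evaluate(data, strict=False)
ignored_scores = evaluate(data, strict=True, ignore_others=True)
```

In this toy data, strict mode misses "Paris" because only the first prediction counts, while the relaxed mode credits it. With `ignore_others=True`, the false positive in the second sentence is not reported because that sentence contains only OTHER tags.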