This is a test set for evaluating self-supervised NER. The repository evaluates 11 preprocessed datasets spanning the biomedical domain as well as patient-privacy-related entities (person, location, organization).

ajitrajasekharan/ner_test

NER Test Suite

Test suite for self-supervised NER

Prerequisites

Python 3.x

Usage

  1. This test suite has 11 benchmarks. The tests take as input the NER output runs produced by the self-supervised NER link mentioned above. These output runs are two-column files (term and entity type), with one addition: a term may carry two predictions, each with subtypes, as opposed to the single prediction of the usual case.
  2. The test sentences are generated from the two-column test.tsv file (containing each term and its ground-truth label). During generation, a specified sample of sentences is POS tagged (if this option is chosen in the config) so that phrase spans in the test set can be compared with the phrase spans produced by a POS tagger. This checks whether the test-set tagging is consistent with a POS tagger.
  3. An NER output can be evaluated in two ways: (1) standard single-prediction evaluation, which takes only the first prediction (-strict), or (2) picking whichever of the two predictions matches the ground truth. The performance numbers for the two modes are reported separately. The evaluation also has an option to skip false-positive reporting on sentences that contain only the OTHER tag (-ignore_others; false by default). This is explained in the self-supervised NER post.
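As a rough illustration of step 2, the sketch below groups a two-column file into sentences and rebuilds each sentence's text. It assumes (this is not confirmed by the repository) that each line is `term<TAB>label` and that a blank line separates sentences, as in CoNLL-style files. The function names and the sample labels such as DRUG and DISEASE are hypothetical.

```python
from typing import List, Tuple

# A sentence is a list of (term, label) pairs.
Sentence = List[Tuple[str, str]]


def read_two_column(lines) -> List[Sentence]:
    """Group 'term<TAB>label' lines into sentences (hypothetical helper).

    Assumes a blank line separates consecutive sentences (CoNLL-style);
    the actual test.tsv layout may differ.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        term, label = line.split("\t", 1)
        current.append((term, label.strip()))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences


def sentence_text(sentence: Sentence) -> str:
    # Rebuild the raw sentence by joining terms with single spaces.
    return " ".join(term for term, _ in sentence)


sample = [
    "Aspirin\tDRUG",
    "reduces\tOTHER",
    "fever\tDISEASE",
    "",
    "John\tPERSON",
    "lives\tOTHER",
    "in\tOTHER",
    "Boston\tLOCATION",
]
sents = read_two_column(sample)
print(len(sents))               # 2
print(sentence_text(sents[1]))  # John lives in Boston
```

In practice the lines would come from `open("test.tsv")`; a list is used here so the example runs on its own.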
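The POS-tagger consistency check in step 2 can be pictured as comparing labeled entity spans with the phrase spans a tagger proposes. The sketch below is a simplified stand-in, not the suite's actual method: it treats a maximal run of identical non-OTHER labels as one entity span and reports the fraction of those spans that exactly match a tagger phrase span. The tagger spans are passed in directly (a hypothetical input), so no POS-tagging library is assumed.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def entity_spans(labels: List[str], other: str = "OTHER") -> List[Span]:
    """Maximal runs of identical non-OTHER labels, as token spans.

    Without a B-/I- scheme, two adjacent entities of the same type are
    merged into one span; this is a simplification.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels + [other]):  # sentinel closes the last run
        if start is not None and label != labels[start]:
            spans.append((start, i))
            start = None
        if start is None and label != other:
            start = i
    return spans


def span_agreement(gold: List[Span], tagger: List[Span]) -> float:
    """Fraction of gold entity spans that exactly match a tagger phrase span."""
    if not gold:
        return 1.0
    tagger_set = set(tagger)
    return sum(1 for s in gold if s in tagger_set) / len(gold)


labels = ["OTHER", "PERSON", "PERSON", "OTHER", "LOCATION"]
gold = entity_spans(labels)      # [(1, 3), (4, 5)]
tagger = [(1, 3), (3, 5)]        # hypothetical noun-phrase spans from a tagger
print(span_agreement(gold, tagger))  # 0.5
```

Exact-match agreement is the strictest choice; a looser variant could count overlapping spans instead.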
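The two evaluation modes in step 3 and the -ignore_others behavior can be sketched as follows. This is a minimal illustration under stated assumptions: scoring is token-level precision/recall/F1 over non-OTHER labels (the suite's actual metric may differ), and each token's predictions are already parsed into a list of one or two labels. The function names and parameters are hypothetical and are not the suite's command-line interface.

```python
from typing import List, Sequence, Tuple

# Each token: (gold_label, [first_prediction, optional_second_prediction])
Token = Tuple[str, Sequence[str]]


def choose(preds: Sequence[str], gold: str, strict: bool) -> str:
    # Strict: always use the first prediction, like a single-output model.
    # Lenient: use whichever of the predictions matches the gold label.
    if strict or gold not in preds:
        return preds[0]
    return gold


def evaluate(sentences: List[List[Token]], strict: bool = True,
             ignore_others: bool = False, other: str = "OTHER"):
    """Token-level precision, recall, F1 over non-OTHER labels (a sketch)."""
    tp = fp = fn = 0
    for sentence in sentences:
        only_other = all(gold == other for gold, _ in sentence)
        for gold, preds in sentence:
            pred = choose(preds, gold, strict)
            if pred == gold:
                if gold != other:
                    tp += 1
                continue
            # Predicted an entity label the gold does not have; optionally
            # skipped when the sentence's gold labels are all OTHER.
            if pred != other and not (ignore_others and only_other):
                fp += 1
            # Missed or mislabeled a gold entity.
            if gold != other:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


sentences = [
    [("PERSON", ["LOCATION", "PERSON"]), ("OTHER", ["OTHER"])],
    [("OTHER", ["PERSON"]), ("OTHER", ["OTHER"])],
]
print(evaluate(sentences, strict=True))                        # (0.0, 0.0, 0.0)
print(evaluate(sentences, strict=False, ignore_others=True))   # (1.0, 1.0, 1.0)
```

In this toy data the second prediction rescues the first sentence only in lenient mode, and the false positive in the all-OTHER second sentence disappears only when ignore_others is set, which is why the two modes are worth reporting separately.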
