Annie the annotator

Annotators for extracting epidemiologic information from text.


Annie provides the following classes for organizing annotations.

AnnoDoc - The document being annotated. The AnnoDoc links to the tiers of annotations applied to it.

AnnoTier - A group of AnnoSpans. Generally each annotator creates a new tier of annotations.

AnnoSpan - A span of text with an annotation applied to it.
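
The relationship between the three classes can be pictured with a minimal sketch. The class names below (DocSketch, TierSketch, SpanSketch) and all of their fields are hypothetical stand-ins chosen for illustration; they are not Annie's actual definitions, and the real classes may store offsets and labels differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpanSketch:
    """Stand-in for an AnnoSpan: a span of text with an annotation."""
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str  # the annotation applied to the span


@dataclass
class TierSketch:
    """Stand-in for an AnnoTier: a group of spans from one annotator."""
    spans: List[SpanSketch] = field(default_factory=list)


@dataclass
class DocSketch:
    """Stand-in for an AnnoDoc: the text plus the tiers applied to it."""
    text: str
    tiers: Dict[str, TierSketch] = field(default_factory=dict)

    def span_text(self, span: SpanSketch) -> str:
        """Return the slice of the document text covered by a span."""
        return self.text[span.start:span.end]


def make_span(doc: DocSketch, phrase: str, label: str) -> SpanSketch:
    """Build a span covering the first occurrence of phrase in the document."""
    start = doc.text.index(phrase)
    return SpanSketch(start, start + len(phrase), label)


doc = DocSketch("5 cases of measles were reported in Lima.")
# Each (hypothetical) annotator adds its own tier to the document.
doc.tiers["diseases"] = TierSketch([make_span(doc, "measles", "disease")])
doc.tiers["geonames"] = TierSketch([make_span(doc, "Lima", "location")])

for name, tier in doc.tiers.items():
    for span in tier.spans:
        print(name, span.label, repr(doc.span_text(span)))
```

Keeping each annotator's output in its own tier means later annotators can read earlier tiers without modifying them.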


Geoname Annotator

The geoname annotator uses the GeoNames dataset to resolve mentions of geonames. A classifier is used to disambiguate geonames and rule out false positives.

To use the geoname annotator run the following command to import data into an embedded sqlite3 database:

python -m annotator.sqlite_import_geonames

This annotator also requires installing the NLTK named entity extractor.

Resolved Keyword Annotator

The resolved keyword annotator uses synonyms from the Disease Ontology to resolve mentions of diseases to DOID URIs.

To use the resolved keyword annotator, run the following command to import the Disease Ontology data into an embedded sqlite3 database:

python -m annotator.sqlite_import_disease_ontology
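
As a rough illustration of synonym-based resolution against an embedded sqlite3 database, the sketch below builds a tiny in-memory table and looks up a mention in it. The table name, column names, the resolve_keyword helper, and the example.org URIs are all assumptions made for this example; the schema produced by the import command and the annotator's real matching logic are not shown here and may differ.

```python
import sqlite3

# In-memory database standing in for the embedded sqlite3 file.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per synonym, pointing at a concept URI.
conn.execute("CREATE TABLE synonyms (synonym TEXT, uri TEXT)")
conn.executemany(
    "INSERT INTO synonyms VALUES (?, ?)",
    [
        # Placeholder URIs, not real ontology identifiers.
        ("influenza", "http://example.org/disease/1"),
        ("flu", "http://example.org/disease/1"),
        ("measles", "http://example.org/disease/2"),
    ],
)


def resolve_keyword(connection, mention):
    """Return the URI for a mention, or None if no synonym matches."""
    row = connection.execute(
        "SELECT uri FROM synonyms WHERE synonym = ?",
        (mention.lower(),),
    ).fetchone()
    return row[0] if row else None


print(resolve_keyword(conn, "Flu"))
print(resolve_keyword(conn, "unknown illness"))
```

Because several synonyms map to one URI, different surface forms of the same disease resolve to the same identifier.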

Count Annotator

The count annotator identifies counts, particularly case counts. It extracts and parses each count's value, along with attributes such as whether the count refers to cases or deaths and whether the value is approximate.
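
The kind of output described above can be approximated with a small regular-expression sketch. This is not the count annotator's implementation: the pattern, the find_counts function, and the attribute names are hypothetical, and the sketch handles only digit-based values followed directly by "case(s)" or "death(s)".

```python
import re

# Optional approximation word, then a number, then the thing being counted.
COUNT_PATTERN = re.compile(
    r"(?:\b(?P<approx>about|approximately|around|nearly)\s+)?"
    r"(?P<value>\d[\d,]*)\s+"
    r"(?P<kind>cases?|deaths?)\b",
    re.IGNORECASE,
)


def find_counts(text):
    """Return a list of dicts describing each count found in text."""
    counts = []
    for match in COUNT_PATTERN.finditer(text):
        kind = match.group("kind").lower()
        counts.append({
            # Strip thousands separators before parsing the value.
            "value": int(match.group("value").replace(",", "")),
            "kind": "death" if kind.startswith("death") else "case",
            "approximate": match.group("approx") is not None,
        })
    return counts


for count in find_counts("About 1,200 cases and 35 deaths were reported."):
    print(count)
```

A regular expression like this misses spelled-out numbers and counts separated from their noun, which is one reason a dedicated annotator is more useful than ad hoc matching.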

JVM-NLP Annotator

The jvm_nl_annotator relies on a server from this project to create annotations using Stanford's NLP library:

The AnnoTiers it creates include tokens, sentences, part-of-speech tags and named entities.


