# Test the custom NER models

## Installing custom models

Download links for the `en_ner_rf_iXX_md` models are listed in the [README](https://github.com/chopeen/CORD-19/blob/master/README.md#base-model-en_core_sci_md).

Alternatively, you can install them by creating the conda environment `cord-19-test`:

```
$ cd test/
$ conda env create -f environment_test.yml
```
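
Once the environment is active, it can help to confirm that the models are visible to Python before running the cells below. The following is a minimal sketch, assuming the models are installed as regular Python packages importable under their model names (the usual layout for packaged spaCy models) and that the four iterations are named `en_ner_rf_i1_md` through `en_ner_rf_i4_md`. The helper name `check_models` is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Assumed package names for iterations 1 to 4 of the custom models.
MODEL_NAMES = [f"en_ner_rf_i{i}_md" for i in range(1, 5)]


def check_models(names=MODEL_NAMES):
    """Map each package name to True if it is importable in this environment."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, installed in check_models().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```

Any model reported as missing would make the corresponding `spacy.load` call in this notebook fail.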

## Helpers

In [1]:
def get_entity_options():
    ents = ['ENTITY', 'RISK_FACTOR']
    colors_dict = {
        'ENTITY': '#e0d8ce',
        'RISK_FACTOR': '#cca0db'
    }
    return {'ents': ents, 'colors': colors_dict}

## Dummy input test

In [2]:
text = """The following risk factors were is scope of the analysis:
 - age,
 - gender,
 - comorbidities (HBT, leukemia, CVD),
 - BP 140/90 or higher.
We identified older age and leukemia to be the major factors."""

## Run NER

See the [README](https://github.com/chopeen/CORD-19/blob/master/README.md#model-performance) for a discussion of model performance and the datasets used for iterations 1 to 4.

In [3]:
import spacy

from pprint import pprint
from spacy import displacy

In [4]:
def show_entities(model, text):    
    rf_ner = spacy.load(model)
    doc = rf_ner(text)
    
    displacy.render(doc, style="ent", jupyter=True, options=get_entity_options())
    
    # GitHub does not render the highlights, so additionally print a text-only version
    pprint([(ent.text, ent.label_) for ent in doc.ents])
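
`show_entities` loads the model on every call and renders with displaCy, which only makes sense inside Jupyter. For use in a plain script, a text-only variant that takes an already loaded pipeline may be more convenient. This is a sketch rather than part of the notebook: the name `extract_entities` is hypothetical, and it relies only on the `doc.ents`, `ent.text`, and `ent.label_` attributes already used above.

```python
def extract_entities(nlp, text):
    """Return (entity text, label) pairs found by an already loaded pipeline.

    `nlp` is any callable that returns a spaCy Doc, for example the result
    of spacy.load("en_ner_rf_i1_md").
    """
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


# Intended usage (requires spaCy and the model to be installed):
#   import spacy
#   nlp = spacy.load("en_ner_rf_i1_md")
#   print(extract_entities(nlp, text))
```

Passing the pipeline in, instead of a model name, means the model is loaded once even if many texts are processed.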

### en_ner_rf_i1_md

In [5]:
show_entities("en_ner_rf_i1_md", text)

[('age', 'RISK_FACTOR'),
 ('gender', 'RISK_FACTOR'),
 ('comorbidities', 'RISK_FACTOR'),
 ('HBT', 'RISK_FACTOR'),
 ('leukemia', 'RISK_FACTOR'),
 ('CVD', 'RISK_FACTOR'),
 ('older age', 'RISK_FACTOR'),
 ('leukemia', 'RISK_FACTOR')]


### en_ner_rf_i2_md

In [6]:
show_entities("en_ner_rf_i2_md", text)

[('age', 'RISK_FACTOR'),
 ('gender', 'RISK_FACTOR'),
 ('comorbidities', 'RISK_FACTOR'),
 ('HBT', 'RISK_FACTOR'),
 ('leukemia', 'RISK_FACTOR'),
 ('CVD', 'RISK_FACTOR'),
 ('older age', 'RISK_FACTOR')]


### en_ner_rf_i3_md

In [7]:
show_entities("en_ner_rf_i3_md", text)

[('age', 'RISK_FACTOR'),
 ('gender', 'RISK_FACTOR'),
 ('comorbidities', 'RISK_FACTOR'),
 ('HBT', 'RISK_FACTOR'),
 ('leukemia', 'RISK_FACTOR'),
 ('CVD', 'RISK_FACTOR'),
 ('older age', 'RISK_FACTOR'),
 ('leukemia to be the major factors', 'RISK_FACTOR')]


### en_ner_rf_i4_md

In [8]:
show_entities("en_ner_rf_i4_md", text)

[('comorbidities', 'RISK_FACTOR'),
 ('HBT', 'RISK_FACTOR'),
 ('leukemia', 'RISK_FACTOR'),
 ('CVD', 'RISK_FACTOR'),
 ('older age', 'RISK_FACTOR'),
 ('leukemia to be the major factors', 'RISK_FACTOR')]
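
The four outputs are easier to compare when diffed against a common baseline. The sketch below copies the entity texts from the printed lists above (every label is `RISK_FACTOR`, so labels are omitted) and reports what each later iteration misses or adds relative to `en_ner_rf_i1_md`. The helper `compare_to_baseline` is hypothetical, and the choice of iteration 1 as the baseline is only for illustration; it is not a gold standard.

```python
from collections import Counter

# Entity texts copied from the model outputs in this notebook.
results = {
    "en_ner_rf_i1_md": ["age", "gender", "comorbidities", "HBT", "leukemia",
                        "CVD", "older age", "leukemia"],
    "en_ner_rf_i2_md": ["age", "gender", "comorbidities", "HBT", "leukemia",
                        "CVD", "older age"],
    "en_ner_rf_i3_md": ["age", "gender", "comorbidities", "HBT", "leukemia",
                        "CVD", "older age", "leukemia to be the major factors"],
    "en_ner_rf_i4_md": ["comorbidities", "HBT", "leukemia", "CVD", "older age",
                        "leukemia to be the major factors"],
}


def compare_to_baseline(results, baseline):
    """For each other model, list entities missing from or added to the baseline.

    Counter is used instead of set so that repeated mentions
    (such as the second 'leukemia') are not collapsed.
    """
    base = Counter(results[baseline])
    report = {}
    for model, ents in results.items():
        if model == baseline:
            continue
        found = Counter(ents)
        report[model] = {
            "missing": sorted((base - found).elements()),
            "extra": sorted((found - base).elements()),
        }
    return report


for model, diff in compare_to_baseline(results, "en_ner_rf_i1_md").items():
    print(model, diff)
```

On this input, the diff shows that iteration 2 drops the second `leukemia` mention, iterations 3 and 4 replace it with the over-long span `leukemia to be the major factors`, and iteration 4 additionally misses `age` and `gender`.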
