In [None]:
import warnings
warnings.filterwarnings("ignore")

# 3. Document Classification

# Overview
Now we have the basic pieces in place to classify documents. Each document is assigned one of three labels:
- **'POS'**: There is at least one mention of COVID-19 with a positive status that is not uncertain and has no excluding modifiers
- **'UNK'**: There are no positive mentions of COVID-19, but at least one mention is uncertain or unasserted
- **'NEG'**: There are no positive, uncertain, or unasserted mentions (either no mentions at all, or only excluded ones)

Here is the relevant snippet of code from `cov_bsv.document_classifier.py`:

---
```python
excluded_ents = set()
positive_ents = set()
unasserted_ents = set()

for ent in doc.ents:
    if ent.label_ != "COVID-19":
        continue
    # If the entity is negated, experienced by someone else,
    # future/hypothetical, or marked as not relevant,
    # consider this entity to be 'excluded'
    if any(
        [
            ent._.is_negated,
            ent._.is_other_experiencer,
            ent._.is_future,
            ent._.is_not_relevant,
        ]
    ):
        excluded_ents.add(ent)
    # If it is 'positive' and not uncertain,
    # consider it to be a 'positive' ent
    elif ent._.is_positive and not ent._.is_uncertain:
        positive_ents.add(ent)
    # Otherwise the entity is not excluded and is either not marked
    # 'positive' or is marked 'uncertain': consider it 'unasserted'
    else:
        unasserted_ents.add(ent)

# If there is at least one 'positive' ent, return 'POS'
# If there are only unasserted/uncertain, return 'UNK'
# If there are no ents or only excluded ents, return 'NEG'
if positive_ents:
    doc_classification = "POS"
elif unasserted_ents:
    doc_classification = "UNK"
else:
    doc_classification = "NEG"
return doc_classification
```
---
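To make the decision rule easier to experiment with outside of the pipeline, here is a minimal, self-contained sketch of the same logic. It is not the library's implementation: `MockEnt` and `classify` are hypothetical names introduced for illustration, and the modifier flags are plain attributes rather than spaCy extension attributes under `ent._`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MockEnt:
    """Hypothetical stand-in for a spaCy entity carrying its modifier flags."""
    text: str
    label: str = "COVID-19"
    is_negated: bool = False
    is_other_experiencer: bool = False
    is_future: bool = False
    is_not_relevant: bool = False
    is_positive: bool = False
    is_uncertain: bool = False


def classify(ents):
    """Return 'POS', 'UNK', or 'NEG' using the same precedence as the snippet above."""
    has_positive = False
    has_unasserted = False
    for ent in ents:
        if ent.label != "COVID-19":
            continue
        # Excluded entities never contribute to 'POS' or 'UNK'
        if (ent.is_negated or ent.is_other_experiencer
                or ent.is_future or ent.is_not_relevant):
            continue
        if ent.is_positive and not ent.is_uncertain:
            has_positive = True
        else:
            has_unasserted = True
    if has_positive:
        return "POS"
    if has_unasserted:
        return "UNK"
    return "NEG"


# A single positive mention outweighs any number of excluded mentions
mixed = [
    MockEnt("novel coronavirus", is_negated=True),
    MockEnt("SARS-COV-2", is_positive=True),
]
print(classify(mixed))
```

Note that the order of the checks matters: an entity that is both positive and excluded (for example, a positive result experienced by a family member) is treated as excluded, and a positive but uncertain entity counts only as unasserted.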


In this notebook, we'll look at how this logic classifies some example texts.

In [14]:
import cov_bsv
from cov_bsv import visualize_doc

In [15]:
nlp = cov_bsv.load()

In [16]:
nlp.pipe_names

['tagger',
 'parser',
 'concept_tagger',
 'target_matcher',
 'sectionizer',
 'context',
 'postprocessor',
 'document_classifier']

In [17]:
texts = [
    "Patient presents to be tested for COVID-19.",
    "Suspicion for novel coronavirus",
    "His wife recently tested positive for novel coronavirus.",
    "SARS-COV-2 results came back positive.",
    "Diagnosis: SARS-COV-2",
    "Patient is a 76 year old man with COVID-19",
    "negative for novel coronavirus."
]

In [18]:
docs = list(nlp.pipe(texts))

In [19]:
for doc in docs:
    visualize_doc(doc)
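When working with more than a handful of texts, it can help to tally the document-level labels rather than read each visualization. The sketch below assumes the classifier stores its result on a custom extension attribute of each doc, here called `cov_classification`; that attribute name is an assumption and should be checked against the `document_classifier` component. `tally_classifications` is a hypothetical helper, and the stand-in objects exist only so the example runs without loading the pipeline.

```python
from collections import Counter
from types import SimpleNamespace


def tally_classifications(docs, attr="cov_classification"):
    """Count document labels read from `doc._.<attr>` (attribute name is an assumption)."""
    return Counter(getattr(doc._, attr) for doc in docs)


# Stand-in objects that mimic the `doc._.<attr>` access pattern of spaCy docs
fake_docs = [
    SimpleNamespace(_=SimpleNamespace(cov_classification=label))
    for label in ["POS", "NEG", "UNK", "POS"]
]

counts = tally_classifications(fake_docs)
for label in ("POS", "UNK", "NEG"):
    print(label, counts[label])
```

With the real pipeline, the same helper could be called as `tally_classifications(docs)` on the list produced by `nlp.pipe(texts)`, provided the attribute name matches.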

# Next Steps
We've now walked through each of the steps of our pipeline! These basic steps take care of a lot of the logic of our COVID-19 classification:

- Target concept extraction
- Attribute assertion
- Document inference

However, clinical data is messy, and there are clinical scenarios that we may want to handle differently. In the final notebook, we'll look at some mechanisms for handling special cases and implementing custom logic.

[04-fixing-errors.ipynb](04-fixing-errors.ipynb)