In [None]:
import warnings
warnings.filterwarnings("ignore")

# 2. Attribute Assertion

# Overview
In this notebook, we'll look at how to assert attributes that indicate whether a mention of COVID-19 is positive. Because of the overwhelming volume of documents mentioning COVID-19, we needed to narrow down the number of documents for clinical review using a very strict definition: a document must contain a mention of COVID-19 which is **clearly stated as being positive**. The exact definition has changed a bit over time: criteria that are too strict lead to false negatives, while criteria that are too loose result in false positives. As of publication, the system follows this logic:

An entity is marked as positive if:
- It is not modified by an excluding concept, such as negation or uncertainty
- It **either**:
    - Is modified by a positive modifier, such as **"positive for"** or **"diagnosed with"**
    - Occurs in a certain section, such as **"Diagnosis:"**
    - Is mentioned in conjunction with a related diagnosis, such as **"Pneumonia"** or **"Acute respiratory failure"**
    
This notebook will show examples of how these attributes are asserted.
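
The logic above can be sketched in plain Python. This is a minimal illustration, not the `cov_bsv` implementation: the `Mention` class and the `is_positive_mention` function are hypothetical names, and the three "evidence" flags stand in for information that the pipeline derives from modifiers, sections, and associated diagnoses.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical stand-in for an entity and the evidence found around it."""
    is_negated: bool = False
    is_uncertain: bool = False
    has_positive_modifier: bool = False     # e.g. "positive for", "diagnosed with"
    in_positive_section: bool = False       # e.g. "Diagnosis:"
    has_associated_diagnosis: bool = False  # e.g. "Pneumonia"


def is_positive_mention(m: Mention) -> bool:
    """Positive = at least one piece of positive evidence and no excluding concept."""
    excluded = m.is_negated or m.is_uncertain
    has_evidence = (
        m.has_positive_modifier
        or m.in_positive_section
        or m.has_associated_diagnosis
    )
    return has_evidence and not excluded


# A bare mention with no evidence is not positive; an excluded one never is.
print(is_positive_mention(Mention()))
print(is_positive_mention(Mention(has_positive_modifier=True)))
print(is_positive_mention(Mention(has_positive_modifier=True, is_negated=True)))
```

Only negation and uncertainty are treated as excluding concepts here because those are the two named above; other exclusions would be added to `excluded` in the same way.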

In [51]:
import cov_bsv
from cov_bsv import visualize_doc
from spacy import displacy

In [52]:
nlp = cov_bsv.load(enable=["tagger", "parser", "concept_tagger", "target_matcher"])

In [53]:
nlp.pipe_names

['tagger', 'parser', 'concept_tagger', 'target_matcher']

## Example Texts

In [54]:
texts = [
    "Patient presents to be tested for COVID-19.",
    "Suspicion for novel coronavirus",
    "His wife recently tested positive for novel coronavirus.",
    "SARS-COV-2 results came back positive.",
    "negative for COVID-19.",
    "Diagnoses:\n\n 1. SARS-COV-2",
    "Patient is a 76 year old man with COVID-19"
]

In [55]:
docs = list(nlp.pipe(texts))

In [56]:
for doc in docs:
    displacy.render(doc, style="ent")
    print("__"*20)

________________________________________


________________________________________


________________________________________


________________________________________


________________________________________


________________________________________


________________________________________


# ConText
## Contextual Modifiers
One of the most common use cases is that a target concept will be mentioned along with some statement indicating whether the disease is:
- Positive: **"SARS-COV-2 results came back *positive*"**
- Negative: **"*negative* for COVID-19."**
- Uncertain: **"*Suspicion for* novel coronavirus"**
- Experienced by someone other than the patient: **"*His wife* recently tested positive for novel coronavirus."**

One popular algorithm for detecting these modifiers is the 
[ConText](https://www.sciencedirect.com/science/article/pii/S1532046409000744) algorithm. You can find more complete examples and explanations of this algorithm in the [cycontext](https://github.com/medspacy/cycontext) package documentation. We'll give a few examples here.

ConText's algorithm is extremely simple, but effective. Once you have named entities identified in a sentence, you can run ConText to determine whether any of them are not affirmed for the patient at the time the note was written.
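
To make the idea concrete, here is a heavily simplified sketch of how a directional modifier can be linked to a target within one sentence. It is not how `medspacy` implements ConText: the `Rule` tuple, the `RULES` list, and `find_modifiers` are hypothetical, matching is done on lowercased strings rather than tokens, and features of the full algorithm such as termination terms are left out.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    literal: str    # phrase to look for (lowercase)
    category: str   # modifier category assigned when the phrase is found
    direction: str  # "FORWARD": modifies targets to its right; "BACKWARD": to its left


# Illustrative rules only; these are not the rules shipped with cov_bsv.
RULES = [
    Rule("negative for", "NEGATED_EXISTENCE", "FORWARD"),
    Rule("suspicion for", "UNCERTAIN", "FORWARD"),
    Rule("came back positive", "DEFINITE_POSITIVE_EXISTENCE", "BACKWARD"),
]


def find_modifiers(sentence: str, target: str) -> List[str]:
    """Return the categories of all rules whose scope covers the target."""
    text = sentence.lower()
    target_start = text.find(target.lower())
    if target_start == -1:
        return []
    target_end = target_start + len(target)

    categories = []
    for rule in RULES:
        mod_start = text.find(rule.literal)
        if mod_start == -1:
            continue
        mod_end = mod_start + len(rule.literal)
        # A FORWARD modifier must end before the target starts;
        # a BACKWARD modifier must start after the target ends.
        if rule.direction == "FORWARD" and mod_end <= target_start:
            categories.append(rule.category)
        elif rule.direction == "BACKWARD" and target_end <= mod_start:
            categories.append(rule.category)
    return categories


print(find_modifiers("negative for COVID-19.", "COVID-19"))
print(find_modifiers("SARS-COV-2 results came back positive.", "SARS-COV-2"))
```

The direction is what keeps a phrase like "negative for" from modifying a target that appears before it in the sentence.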

Let's look at some examples. First, we'll instantiate a blank instance of ConText and add it to our pipeline. 

In [57]:
from medspacy.context import ConTextComponent

In [58]:
context = ConTextComponent(nlp,
                           rules=None, # Don't load the default cycontext rules
                           add_attrs=cov_bsv.util.CONTEXT_MAPPING # Mapping of modifiers to attribute values
                          )

In [59]:
nlp.add_pipe(context)

We'll add the rules stored in `cov_bsv.knowledge_base.context_rules` (this is all done under the hood when the full pipeline is loaded with `cov_bsv.load()`). `context_rules` contains the rules that define which modifiers are extracted and how they behave:

In [60]:
from cov_bsv.knowledge_base import context_rules

In [61]:
context_rules[:5]

[ConTextRule(literal='Not Detected', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['not', 'non']}}, {'IS_SPACE': True, 'OP': '*'}, {'TEXT': '-', 'OP': '?'}, {'LOWER': {'REGEX': 'detecte?d'}}], direction='BACKWARD'),
 ConTextRule(literal=': negative', category='NEGATED_EXISTENCE', pattern=None, direction='BACKWARD'),
 ConTextRule(literal='not been detected', category='NEGATED_EXISTENCE', pattern=None, direction='BACKWARD'),
 ConTextRule(literal='none detected', category='NEGATED_EXISTENCE', pattern=None, direction='BACKWARD'),
 ConTextRule(literal='free from', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')]

Now, let's add our rules:

In [62]:
context.add(context_rules)

Now let's process our texts and see what modifiers are extracted. We can use additional visualizers from `medspacy.visualization` to show relationships between the targets and modifiers:

In [63]:
from medspacy.visualization import visualize_ent, visualize_dep

In [64]:
text = "Patient presents to be tested for COVID-19."
doc = nlp(text)

In [65]:
visualize_ent(doc)
visualize_dep(doc)

In [66]:
text =  "His wife recently tested positive for novel coronavirus."
doc = nlp(text)

In [67]:
visualize_ent(doc)
visualize_dep(doc)

In [68]:
text = "SARS-COV-2 results came back positive."
doc = nlp(text)

In [69]:
visualize_ent(doc)
visualize_dep(doc)

In [70]:
text = "Patient is a 76 year old man with COVID-19"
doc = nlp(text)

In [71]:
visualize_ent(doc)
visualize_dep(doc)

When a target concept is modified by a modifier, ConText sets certain attributes on the target. We defined these in the `add_attrs` argument, which maps modifier category names to attribute name/value pairs.

In [72]:
cov_bsv.util.CONTEXT_MAPPING

{'NEGATED_EXISTENCE': {'is_negated': True},
 'FUTURE/HYPOTHETICAL': {'is_future': True},
 'HISTORICAL': {'is_historical': True},
 'DEFINITE_POSITIVE_EXISTENCE': {'is_positive': True},
 'ADMISSION': {'is_positive': True},
 'NOT_RELEVANT': {'is_not_relevant': True},
 'UNCERTAIN': {'is_uncertain': True},
 'UNLIKELY': {'is_uncertain': True},
 'SCREENING': {'is_screening': True},
 'OTHER_EXPERIENCER': {'is_other_experiencer': True},
 'CONTACT': {'is_other_experiencer': True},
 'PATIENT_EXPERIENCER': {'is_other_experiencer': False, 'is_positive': True}}

The mapping above shows these rules. For example:
- If an entity is modified by **"NEGATED_EXISTENCE"**, `is_negated` is set to `True`
- If an entity is modified by **"DEFINITE_POSITIVE_EXISTENCE"**, `is_positive` is set to `True`

These can then be accessed through the `_` attribute of each ent. For example:
```python
ent._.is_negated
```
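
As a rough sketch of what this mapping does, the function below applies a subset of the `CONTEXT_MAPPING` shown above to a plain dictionary of attributes. The `DEFAULTS` dictionary and the `assert_attributes` function are hypothetical and only illustrate the idea; in the pipeline, the attributes are set on each entity by the ConText component.

```python
# Subset of the CONTEXT_MAPPING shown above, copied so the example is self-contained.
CONTEXT_MAPPING = {
    "NEGATED_EXISTENCE": {"is_negated": True},
    "DEFINITE_POSITIVE_EXISTENCE": {"is_positive": True},
    "UNCERTAIN": {"is_uncertain": True},
    "OTHER_EXPERIENCER": {"is_other_experiencer": True},
}

# Assumed default values for an entity with no modifiers.
DEFAULTS = {
    "is_negated": False,
    "is_positive": False,
    "is_uncertain": False,
    "is_other_experiencer": False,
}


def assert_attributes(modifier_categories):
    """Start from the defaults and apply the mapping for each modifier category."""
    attrs = dict(DEFAULTS)
    for category in modifier_categories:
        # Categories without an entry in the mapping leave the attributes unchanged.
        attrs.update(CONTEXT_MAPPING.get(category, {}))
    return attrs


# "His wife recently tested positive for novel coronavirus."
attrs = assert_attributes(["OTHER_EXPERIENCER", "DEFINITE_POSITIVE_EXISTENCE"])
print(attrs["is_positive"], attrs["is_other_experiencer"])
```

Note that a single entity can carry several attributes at once: a mention can be both positive and experienced by someone other than the patient.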

Let's go through our original list of texts, see what modifiers are extracted, and check a few attributes:

In [73]:
docs = list(nlp.pipe(texts))

In [74]:
for doc in docs:
    visualize_ent(doc)
    for ent in doc.ents:
        print("Uncertain:", ent._.is_uncertain)
        print("Negated:", ent._.is_negated)
        print("Positive:", ent._.is_positive)
        print("Experienced by someone else:", ent._.is_other_experiencer)
        print()

Uncertain: False
Negated: False
Positive: False
Experienced by someone else: False



Uncertain: True
Negated: False
Positive: False
Experienced by someone else: False



Uncertain: False
Negated: False
Positive: True
Experienced by someone else: True



Uncertain: False
Negated: False
Positive: True
Experienced by someone else: False



Uncertain: False
Negated: True
Positive: False
Experienced by someone else: False



Uncertain: False
Negated: False
Positive: False
Experienced by someone else: False



Uncertain: False
Negated: False
Positive: True
Experienced by someone else: False



# Section Detection
Clinical notes often follow a certain structure. One example of this is the [SOAP note](https://www.globalpremeds.com/blog/2015/01/02/understanding-soap-format-for-clinical-rounds/). Different parts of a note have different significance. For example, a condition listed in the **Past Medical History** or **Problem List** is likely a historical condition which may not be relevant to the current visit, whereas the **Assessment/Plan** will contain more up-to-date diagnoses.

Here, we'll add a section detection component and define rules for matching sections and setting attributes based on section titles, similar to what we did with ConText.

In [75]:
from medspacy.section_detection import Sectionizer

In [76]:
sectionizer = Sectionizer(nlp, rules=None, add_attrs=cov_bsv.util.SECTION_ATTRS)

In [77]:
nlp.add_pipe(sectionizer)

In [78]:
sectionizer.add(cov_bsv.knowledge_base.section_rules)

In [79]:
# By default, entities occurring in these sections will be considered positive
print(cov_bsv.util.SECTION_ATTRS)

{'diagnoses': {'is_positive': True}, 'observation_and_plan': {'is_positive': True}, 'past_medical_history': {'is_positive': True}, 'problem_list': {'is_positive': True}}
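
The effect of these section attributes can be sketched without the pipeline. The example below is only an illustration under simplifying assumptions: the `TITLE_PATTERNS` regular expressions and both functions are hypothetical (the real rules live in `cov_bsv.knowledge_base.section_rules`), and `SECTION_ATTRS` is reduced to a single entry copied from the output above.

```python
import re

# One entry copied from the SECTION_ATTRS output above.
SECTION_ATTRS = {"diagnoses": {"is_positive": True}}

# Hypothetical title patterns, matched at the start of a line.
TITLE_PATTERNS = {
    "labs_and_studies": re.compile(r"^[ \t]*labs?:", re.IGNORECASE | re.MULTILINE),
    "diagnoses": re.compile(r"^[ \t]*diagnos[ie]s:", re.IGNORECASE | re.MULTILINE),
}


def split_sections(text):
    """Return (category, section_text) pairs; each section runs to the next title."""
    titles = []
    for category, pattern in TITLE_PATTERNS.items():
        for match in pattern.finditer(text):
            titles.append((match.start(), category))
    titles.sort()

    sections = []
    for i, (start, category) in enumerate(titles):
        end = titles[i + 1][0] if i + 1 < len(titles) else len(text)
        sections.append((category, text[start:end]))
    # Any text before the first title is ignored in this sketch.
    return sections


def section_attributes(text, mention):
    """Return the section category of a mention and the attributes it implies."""
    for category, body in split_sections(text):
        if mention in body:
            return category, SECTION_ATTRS.get(category, {})
    return None, {}


note = "Labs:\n    SARS-COV-2\n\nDiagnoses:\n    1. Pneumonia\n    2. Novel Coronavirus 2019\n"
print(section_attributes(note, "SARS-COV-2"))
print(section_attributes(note, "Novel Coronavirus 2019"))
```

Sections with no entry in `SECTION_ATTRS`, such as `labs_and_studies`, are still detected but leave the entity's attributes unchanged.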


In [80]:
text = """Labs:
    SARS-COV-2
    
    
    Diagnoses:
        1. Pneumonia
        2. Novel Coronavirus 2019
    """

In [81]:
doc = nlp(text)

In [82]:
visualize_ent(doc)

We can access a doc's sections and their normalized titles:

In [83]:
for section in doc._.sections:
    print(section.category)
    print(section.title_span)
    print(section.section_span)
    print("__"*20)

labs_and_studies
Labs:
Labs:
    SARS-COV-2
    
    
    
________________________________________
diagnoses
Diagnoses:
Diagnoses:
        1. Pneumonia
        2. Novel Coronavirus 2019
    
________________________________________


As well as for each entity:

In [84]:
for ent in doc.ents:
    print(ent)
    print(ent._.section_category)
    print(ent._.is_positive)
    print("__"*20)

SARS-COV-2
labs_and_studies
False
________________________________________
Novel Coronavirus 2019
diagnoses
True
________________________________________


# Attributes with Target Concepts
Sometimes, you don't want to rely on ConText to identify relationships between targets and modifiers. ConText will miss these relationships if a sentence is split incorrectly, causing the two spans to be in different sentences. Or you may want to extract certain specific phrases and **explicitly define** them as being positive.

For example, if COVID-19 is mentioned with the name of an associated disease, such as **"pneumonia"**, this is more likely to be an actual diagnosis. Using the concept and target rules from the previous notebook, we can explicitly set `is_positive` to `True` for any span which contains an associated diagnosis next to a mention of COVID-19.

```python
TargetRule(
    literal="<ASSOCIATED_DIAGNOSIS> <COVID-19>",
    category="COVID-19",
    pattern=[
        {"_": {"concept_tag": "ASSOCIATED_DIAGNOSIS"}, "OP": "+"},
        {"_": {"concept_tag": "COVID-19"}, "OP": "+"},
    ],
    # Assign values to ent._.<attr_name>
    attributes={"is_positive": True},
)
```

**Note**: This was found to be one of the less precise rules and causes some false positives, but it is still useful for increasing sensitivity.
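
To illustrate what the pattern in this rule expresses, the sketch below matches a run of one or more tokens with one concept tag followed by a run of one or more tokens with another, and attaches fixed attributes to each match. It is a plain-Python approximation, not the spaCy matcher: `match_tag_sequence` is a hypothetical helper, the tokens and tags are made up, and it assumes consecutive tags in the pattern differ.

```python
def match_tag_sequence(tags, pattern):
    """Return (start, end) spans where each tag in `pattern` occurs 1+ times, in order."""
    if not pattern:
        return []
    spans = []
    start = 0
    while start < len(tags):
        pos = start
        matched = True
        for wanted in pattern:
            run_start = pos
            # Greedily consume a run of tokens carrying the wanted tag ("OP": "+").
            while pos < len(tags) and tags[pos] == wanted:
                pos += 1
            if pos == run_start:  # the wanted tag did not occur at this position
                matched = False
                break
        if matched:
            spans.append((start, pos))
            start = pos  # continue after the match so spans never overlap
        else:
            start += 1
    return spans


# Made-up tokens and concept tags for illustration.
tokens = ["Acute", "respiratory", "failure", "COVID-19", "noted"]
tags = ["ASSOCIATED_DIAGNOSIS"] * 3 + ["COVID-19", None]

entities = [
    {"text": " ".join(tokens[s:e]), "category": "COVID-19", "is_positive": True}
    for s, e in match_tag_sequence(tags, ["ASSOCIATED_DIAGNOSIS", "COVID-19"])
]
print(entities)
```

Because the attribute is attached by the rule itself, it does not depend on sentence splitting or on a separate modifier being found.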

In [85]:
doc = nlp("Developed COVID-19 pneumonia.")

In [86]:
visualize_ent(doc)

In [87]:
for ent in doc.ents:
    print(ent._.is_positive)

True


# Next Steps
Now that we know whether individual mentions of COVID-19 are positive or not, the next notebook shows how to roll these up to the document level to make a document-level prediction:

[03-document-classification.ipynb](03-document-classification.ipynb)