<html>
<table width="100%" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td valign="center" align="center" width="45%"><img src="../media/Univ-Utah.jpeg"><br>
</td>
    <td valign="center" align="center" width="75%">
<h1 align="center"><font size="+1">University of Utah<br>Population Health Sciences<br>Data Science Workshop</font></h1></td>
<td valign="center" align="center" width="45%"><img
src="../media/U_Health_stacked_png_red.png" alt="Utah Health
Logo" width="128" height="134"><br>
</td>
</tr>
</tbody>
</table>
<br>
</html>

In [1]:
import medspacy
from IPython.display import Image

In [2]:
from medspacy.visualization import visualize_dep, visualize_ent, MedspaCyVisualizerWidget

In [3]:
from helpers import *

In [4]:
import warnings
warnings.filterwarnings("ignore") 

# Attribute Detection
In the last notebook, we used a statistical NLP model to extract clinical events such as problems, treatments, and tests. However, just because a report mentions a clinical concept doesn't mean that a patient actually has that concept.

Another important task in clinical NLP is **attribute detection**, which involves looking for clues around a concept that indicate whether it is:
- Negated
- Historical
- Uncertain
- Experienced by someone other than the patient (such as family history)
- Hypothetical (something that could occur in the future)

There are two main ways to do this: **contextual analysis**, or identifying which **section** of a note the concept occurred in. We'll cover the former in this notebook and see some examples of section detection in the next notebook.

## `ConText`
### The ConText algorithm
One method for performing this analysis is the **ConText** algorithm, originally proposed by Harkema et al. in [ConText: An Algorithm for Determining Negation, Experiencer, and Temporal Status from Clinical Reports](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2757457/). ConText extends the earlier NegEx algorithm, which was designed to detect negation.

Along with medspaCy, there are several implementations of ConText and clinical NLP systems which use ConText, including:
- [cTAKES](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/)
- [Leo](https://department-of-veterans-affairs.github.io/Leo/index.html)

### How ConText works

ConText connects certain **modifiers**, such as **"no evidence of"** or **"is negative"**, with the target concepts we are extracting. 

---
There is **no evidence of** **_pneumonia_**

---
In this sentence, the **target** is **_pneumonia_**: this is the clinical concept we are trying to extract. The **modifier** is **no evidence of**: this shows that the concept is **negated**. 

ConText finds these targets and modifiers in text and builds **relationships** between them. Here is a visual representation of the graph ConText would create out of this sentence:

<img src="./media/negated_pneumonia.png" width="75%"></img>

Likewise, here is another example using family history:

---
There is **_diabetes_** **on her mother's side**

---

<img src="./media/family_history_diabetes.png" width="75%"></img>

In both of these examples, finding the clinical problem in the text is not enough: you also need to recognize that the concept is negated or in the patient's family, rather than the patient themself.
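To make the idea concrete, here is a toy sketch in plain Python (not medspaCy's implementation) that treats a sentence as a list of tokens, finds a modifier phrase and a target phrase by case-insensitive token matching, and links a forward-looking modifier to every target that follows it. The function names `find_phrase` and `link_forward` and the tuple format for relationships are our own, chosen only for illustration.

```python
def find_phrase(tokens, phrase):
    """Return (start, end) token spans where `phrase` occurs, ignoring case."""
    words = phrase.lower().split()
    lowered = [t.lower() for t in tokens]
    n = len(words)
    return [(i, i + n) for i in range(len(lowered) - n + 1) if lowered[i:i + n] == words]


def link_forward(tokens, modifier, category, target):
    """Link each match of a forward-looking `modifier` to each `target` match after it."""
    relationships = []
    for _, mod_end in find_phrase(tokens, modifier):
        for tgt_start, tgt_end in find_phrase(tokens, target):
            if tgt_start >= mod_end:  # the target comes after the modifier
                relationships.append((modifier, category, " ".join(tokens[tgt_start:tgt_end])))
    return relationships


tokens = "There is no evidence of pneumonia".split()
print(link_forward(tokens, "no evidence of", "NEGATED_EXISTENCE", "pneumonia"))
# [('no evidence of', 'NEGATED_EXISTENCE', 'pneumonia')]
```

The real algorithm also handles modifiers that look backward or in both directions, sentence boundaries, and terms that cut a modifier's scope short; those options come up again with `ConTextRule`.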

### ConText with medspaCy
The helper function below loads a medspaCy pipeline with some pre-defined target rules, along with the `ConText` component.

In [5]:
nlp = build_nlp_context()
nlp.pipe_names

['medspacy_pyrush', 'medspacy_target_matcher', 'medspacy_context']

We can access the ConText component in the pipeline:

In [6]:
context = nlp.get_pipe("medspacy_context")

In [7]:
context

<medspacy.context.context_component.ConTextComponent at 0x7f830af44730>

In [8]:
context.categories

{'FAMILY',
 'HISTORICAL',
 'HYPOTHETICAL',
 'NEGATED_EXISTENCE',
 'POSSIBLE_EXISTENCE'}

### A simple example
Let's go back to the example we saw in our last notebook. Our target matcher correctly identified **"pneumonia"** as a **problem**, but in this sentence it is explicitly negated:

---
There is **no evidence of** **_pneumonia_**

---

Using the ConText algorithm, we can now recognize that this concept is negated. Thanks to the modular nature of spaCy processing pipelines, we don't need to do anything differently; we just call `nlp` on the text:

In [9]:
doc = nlp("There is no evidence of pneumonia.")

Let's visualize what the ConText algorithm is doing. MedspaCy offers two functions for visualizing the algorithm. The first function, `visualize_ent`, visualizes the clinical concepts and modifiers in an NER-style visualization, as we saw in the last notebook:

In [10]:
from medspacy.visualization import visualize_dep, visualize_ent

In [11]:
visualize_ent(doc)

The second function, `visualize_dep`, visualizes the relationships between targets and modifiers in a dependency-style visualization. Here, we can see that the modifier **"no evidence of"** is applied to the target **"pneumonia"**.

In [12]:
visualize_dep(doc)

When an entity is negated by context, the negation is stored in the `ent._.is_negated` attribute. If `True`, then the concept is negated. By default, it will be `False`.

In [13]:
for ent in doc.ents:
    print(ent, ent._.is_negated)

pneumonia True


#### TODO
Each of the following sentences has an entity marked in **bold**. Which of them should have the attribute `ent._.is_negated == True`?
1. **Pneumonia** was not seen on the x-ray.
2. She was hospitalized for **pneumonia** in 2012.
3. **Pneumonia** was ruled out.
4. We will order a chest x-ray to rule out **pneumonia**.
5. While **pneumonia** is unlikely, it is still a possible diagnosis.


In [14]:
# RUN CELL TO SEE QUIZ
quiz_pneumonia_negated_select_multiple




## Other attributes
By default, medspaCy sets the following attributes on each entity. Each one is `False` unless the entity is modified by a modifier of the corresponding category:
- `ent._.is_negated`
- `ent._.is_historical`
- `ent._.is_uncertain`
- `ent._.is_family`
- `ent._.is_hypothetical`
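Each of these attributes corresponds to one of the modifier categories we saw in `context.categories`. The sketch below writes that correspondence out as a plain dictionary. The mapping reflects our understanding of medspaCy's defaults and should be treated as an assumption, and `attributes_from_categories` is a hypothetical helper, not part of medspaCy.

```python
# Assumed mapping from ConText modifier category to entity attribute,
# based on our understanding of medspaCy's default configuration.
CATEGORY_TO_ATTRIBUTE = {
    "NEGATED_EXISTENCE": "is_negated",
    "POSSIBLE_EXISTENCE": "is_uncertain",
    "HISTORICAL": "is_historical",
    "HYPOTHETICAL": "is_hypothetical",
    "FAMILY": "is_family",
}


def attributes_from_categories(categories):
    """Hypothetical helper: turn the categories of an entity's modifiers into attribute flags."""
    # Every attribute starts out False...
    attributes = {name: False for name in CATEGORY_TO_ATTRIBUTE.values()}
    # ...and is switched to True if a modifier of the matching category applies.
    for category in categories:
        if category in CATEGORY_TO_ATTRIBUTE:
            attributes[CATEGORY_TO_ATTRIBUTE[category]] = True
    return attributes


# In "She should stop taking warfarin if she develops a rash.", "rash" is HYPOTHETICAL
print(attributes_from_categories(["HYPOTHETICAL"]))
```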

Let's see some more examples:

### Historical

In [15]:
doc = nlp("Past medical history significant for nephrectomy.")

In [16]:
visualize_dep(doc)

In [17]:
for ent in doc.ents:
    print(ent, "Historical:", ent._.is_historical)

nephrectomy Historical: True


### Uncertainty

In [18]:
doc = nlp("The scan likely shows a pneumothorax.")

In [19]:
visualize_ent(doc)

In [20]:
visualize_dep(doc)

In [21]:
for ent in doc.ents:
    print(ent, "Uncertain:", ent._.is_uncertain)

pneumothorax Uncertain: True


### Family history

In [22]:
doc = nlp("Her mother had breast cancer.")

In [23]:
visualize_ent(doc)

In [24]:
visualize_dep(doc)

In [25]:
for ent in doc.ents:
    print(ent, "Family:", ent._.is_family)

breast cancer Family: True


### Hypothetical

In [26]:
doc = nlp("She should stop taking warfarin if she develops a rash.")

In [27]:
visualize_ent(doc)

In [28]:
visualize_dep(doc)

In [29]:
for ent in doc.ents:
    print(ent, "Hypothetical:", ent._.is_hypothetical)

warfarin Hypothetical: False
rash Hypothetical: True


#### TODO
For each of the examples below, choose whether the entity in bold should be negated, historical, uncertain, family experiencer, or hypothetical. If multiple attributes are true, choose all. If none of those are true, leave it blank.

In [30]:
# RUN CELL TO SEE QUIZ
quiz_context_attributes1

*Quiz:* He was previously **homeless**.



In [31]:
# RUN CELL TO SEE QUIZ
quiz_context_attributes2

*Quiz:* If you develop any **bleeding**, go to the ER right away.



In [32]:
# RUN CELL TO SEE QUIZ
quiz_context_attributes3

*Quiz:* Her father had a **heart attack** in 1996.



In [33]:
# RUN CELL TO SEE QUIZ
quiz_context_attributes4

*Quiz:* The patient presents with symptoms concerning for **Covid-19**.



In [34]:
# RUN CELL TO SEE QUIZ
quiz_context_attributes5

*Quiz:* He lives with his **two daughters**.



### Asserted entities
Sometimes we are interested in an entity only if all of the attributes above are `False`. We refer to these as **asserted** entities, since the text asserts that they exist and are current.

One way to check if an entity is asserted in medspaCy is to use the `ent._.any_context_attributes` flag, or to look at the dictionary `ent._.context_attributes`.
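Because `ent._.context_attributes` is an ordinary dictionary of booleans, the `any_context_attributes` flag behaves like asking whether any value in it is `True`. The sketch below reproduces that check in plain Python on the dictionary printed for the negated example; `active_attributes` and `is_asserted` are hypothetical helper names of ours, not medspaCy functions.

```python
def active_attributes(context_attributes):
    """Hypothetical helper: names of the attributes set to True, e.g. for debugging."""
    return sorted(name for name, value in context_attributes.items() if value)


def is_asserted(context_attributes):
    """An entity is asserted when none of its ConText attributes are True."""
    return not any(context_attributes.values())


negated = {'is_negated': True, 'is_historical': False, 'is_hypothetical': False,
           'is_family': False, 'is_uncertain': False}
print(active_attributes(negated), is_asserted(negated))
# ['is_negated'] False
```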

In [35]:
ent_neg = nlp("There is no evidence of pneumonia.").ents[0]

# This is True, so this entity is not asserted
print(ent_neg._.any_context_attributes)
print(ent_neg._.context_attributes)

True
{'is_negated': True, 'is_historical': False, 'is_hypothetical': False, 'is_family': False, 'is_uncertain': False}


In [36]:
ent_pos = nlp("Final diagnosis: pneumonia.").ents[0]

# This is False, so this entity is asserted
print(ent_pos._.any_context_attributes)
print(ent_pos._.context_attributes)

False
{'is_negated': False, 'is_historical': False, 'is_hypothetical': False, 'is_family': False, 'is_uncertain': False}


#### TODO: Document classification
After identifying attributes for individual entities, we will often want to make an inference about the whole **document**. For example, based on the entities found in a note, is the patient *positive* or *negative* for pneumonia? This is called **document classification**.

A simple schema for document classification is to say:
- **"POS"** if at least one entity is asserted
- **"NEG"** otherwise
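Stated as code, this schema is a single `any` over the document's entities. The sketch below works on plain Python data rather than spaCy `Doc` objects: each entity is a hypothetical `(label, context_attributes)` pair, and only entities with the label of interest are considered, so mentions of other concepts do not affect the result. The `"COVID"` label is made up for the example.

```python
def classify_document(entities, label):
    """Return "POS" if at least one entity with `label` is asserted, otherwise "NEG".

    `entities` is a list of (label, context_attributes) pairs, where
    context_attributes maps attribute names such as "is_negated" to booleans.
    """
    for ent_label, attributes in entities:
        # Asserted = none of the ConText attributes are set.
        if ent_label == label and not any(attributes.values()):
            return "POS"
    return "NEG"


# A family-history mention plus an asserted mention -> POS
entities = [
    ("COVID", {"is_family": True, "is_negated": False}),
    ("COVID", {"is_family": False, "is_negated": False}),
]
print(classify_document(entities, "COVID"))  # POS
```

A document with no matching entities at all falls through to **"NEG"**, which matches the "otherwise" case of the schema.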

The texts below each have a mention of Covid-19. Write a function `classify_covid` which returns **"POS"** if any of the mentions of Covid-19 are asserted and **"NEG"** otherwise.

In [39]:
texts = [
    # POS
    "The patient has Covid-19.", 
    
    # POS
    "Her husband recently came down with Covid. He is isolated and doing okay. She tested positive for SARS-COV-2 one week later.", 
    
    # NEG
    "If you test positive for SARS-COV-2, isolate according to CDC guidelines.", 
    
    # POS
    "She recently had a positive PCR for Covid-19.", 
    
    # NEG
    "She had symptoms which were concerning for Covid-19 but her Covid test was negative." 
]

In [38]:
for text in texts:
    visualize_ent(nlp(text))

In [44]:
def classify_covid(doc):
    # The document is positive if any entity is asserted (no ConText attributes set)
    for ent in doc.ents:
        if not ent._.any_context_attributes:
            return "POS"
    return "NEG"

In [45]:
# RUN CELL TO TEST FUNCTION
test_classify_covid.test(classify_covid)

That is correct!


In [46]:
doc = nlp("Her husband recently came down with Covid. He is isolated and doing okay. She tested positive for SARS-COV-2 one week later.")

In [52]:
visualize_dep(doc)

In [51]:
ent = doc.ents[1]
ent._.any_context_attributes

False

### Creating your own modifiers
We've been using the default knowledge base that comes with ConText. However, you may want to modify ConText's behavior, including adding brand-new modifiers.

Let's replace our **context** component with a blank instance, then add our own rules.

In [53]:
from medspacy.context import ConTextComponent, ConTextRule

In [54]:
nlp_blank_context = build_nlp_context(rules=False)

In [55]:
blank_context = nlp_blank_context.get_pipe("medspacy_context")
blank_context.rules # Empty list

[]

Let's go back to our first example. If we process it with our blank ConText, **"pneumonia"** won't be negated:

In [56]:
text = "There is no evidence of pneumonia."

In [57]:
doc = nlp_blank_context(text)
visualize_ent(doc)
print("is_negated:", doc.ents[0]._.is_negated)

is_negated: False


### `ConTextRule`
The modifier rules in ConText are defined with `ConTextRule`. A ConTextRule specifies what span of text to match as a modifier, how that modifier behaves, and the modifier's semantic category. It takes these main arguments:
- **`literal`**: The exact text to match
- **`category`**: The semantic category of the modifier, such as **"NEGATED_EXISTENCE"** or **"HISTORICAL"**
- **`direction`**: Which **direction** the modifier should look in the sentence. Look back at the two diagrams in the "How ConText works" section. In the first example, "There is **no evidence of** **_pneumonia_**", the modifier **"no evidence of"** comes before the target concept, so we say it looks **forward** in the sentence (to the right). In the other example, "There is **_diabetes_** **on her mother's side**", the modifier comes after the target, so we say it looks **backward** (to the left). This argument can take the following values:
    - **"BIDIRECTIONAL"** - This is the default and the modifier will apply to targets on both sides of the modifier
    - **"FORWARD"** - The modifier will modify any targets *after* the modifier
    - **"BACKWARD"** - The modifier will modify any targets *before* the modifier
    - **"TERMINATE"** - Stops the scope of other modifiers at this point, such as **"but"** in "No evidence of pneumonia or pleural opacities **but** he has PE.", where **"but"** keeps **"no evidence of"** from negating **"PE"**.
- **`pattern`**: An optional spaCy pattern to match, like we saw in the pattern-matching notebooks
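To see how the four direction values interact, here is a simplified scope calculation in plain Python. It is a sketch under assumptions, not medspaCy's code: it works on token indices within a single sentence, ignores any maximum-scope limits a real implementation may apply, and the `Span`, `Modifier`, `scope`, and `link` names are our own.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    start: int  # first token index (inclusive)
    end: int    # last token index (exclusive)


@dataclass
class Modifier(Span):
    category: str
    direction: str  # "FORWARD", "BACKWARD", "BIDIRECTIONAL" or "TERMINATE"


def scope(mod, modifiers, n_tokens):
    """Token range (start, end) that `mod` can reach, cut short by TERMINATE modifiers."""
    start, end = mod.start, mod.end
    if mod.direction in ("FORWARD", "BIDIRECTIONAL"):
        # Look right until the sentence ends or a terminating term appears.
        end = min([n_tokens] + [m.start for m in modifiers
                                if m.direction == "TERMINATE" and m.start >= mod.end])
    if mod.direction in ("BACKWARD", "BIDIRECTIONAL"):
        # Look left until the sentence starts or a terminating term appears.
        start = max([0] + [m.end for m in modifiers
                           if m.direction == "TERMINATE" and m.end <= mod.start])
    return start, end


def link(targets, modifiers, n_tokens):
    """Return (target text, category) pairs for every target inside a modifier's scope."""
    edges = []
    for mod in modifiers:
        if mod.direction == "TERMINATE":
            continue  # terminating terms only limit other modifiers' scopes
        lo, hi = scope(mod, modifiers, n_tokens)
        for t in targets:
            overlaps = t.start < mod.end and mod.start < t.end
            if lo <= t.start and t.end <= hi and not overlaps:
                edges.append((t.text, mod.category))
    return edges


tokens = "No evidence of pneumonia or pleural opacities but he has PE .".split()
targets = [Span("pneumonia", 3, 4), Span("pleural opacities", 5, 7), Span("PE", 10, 11)]
modifiers = [Modifier("No evidence of", 0, 3, "NEGATED_EXISTENCE", "FORWARD"),
             Modifier("but", 7, 8, "", "TERMINATE")]
print(link(targets, modifiers, len(tokens)))
# [('pneumonia', 'NEGATED_EXISTENCE'), ('pleural opacities', 'NEGATED_EXISTENCE')]
```

Running the same `link` with a BACKWARD "mother's side" modifier links "diabetes" to FAMILY, and a BIDIRECTIONAL "vs" between two diagnoses reaches both of them.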

Here are some examples of ConTextRules from the default pipeline we loaded earlier:

In [58]:
for rule in nlp.get_pipe("medspacy_context").rules[:5]:
    print(rule)
    print()

ConTextRule(literal='absence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')

ConTextRule(literal='adequate to rule out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': {'IN': ['him', 'her', 'them', 'patient', 'pt']}, 'OP': '?'}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD')

ConTextRule(literal='adequate to rule the patient out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': 'the'}, {'LOWER': {'IN': ['patient', 'pt']}}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD')

ConTextRule(literal='any other', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')

ConTextRule(literal='apart from', category='NEGATED_EXISTENCE', pattern=[{'LOWER': 'apart'}, {'LOWER': {'IN': ['for', 'from']}}], direction='TERMINA

Let's import the `ConTextRule` from medspaCy and go through some examples.

In [59]:
from medspacy.context import ConTextRule

### Example 1: Negation
Let's create a `ConTextRule` to negate **"pneumonia"** in our first example: "There is **no evidence** of **_pneumonia_**".

#### TODO
Create a `ConTextRule` with the following arguments:
- **"no evidence of"**: This will match the phrase in the text
- **"NEGATED_EXISTENCE"**: This is the semantic category
- **"FORWARD"**: The target concept comes *after* the modifier in the sentence

In [60]:
rule = ConTextRule("no evidence of", category="NEGATED_EXISTENCE", direction="FORWARD")

We then add a list of ConTextRules to our ConText component:

In [61]:
blank_context.add([rule])

In [62]:
blank_context.rules

[ConTextRule(literal='no evidence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')]

Now when we call `nlp` on our text, we can see that **"pneumonia"** is negated by the modifier.

In [63]:
doc = nlp_blank_context("There is no evidence of pneumonia.")

In [64]:
doc = nlp_blank_context(text)
visualize_ent(doc)
print("is_negated:", doc.ents[0]._.is_negated)

is_negated: True


In [65]:
visualize_dep(doc)

### Example 2: Family History

Now let's identify the **family** modifier in "There is diabetes on her mother's side."

#### TODO
Create a ConTextRule which will match **"mother's side"** and modify **"diabetes"**. It should have the category **"FAMILY"** and the direction **"BACKWARD"**.

In [66]:
doc = nlp_blank_context("There is diabetes on her mother's side.")
doc.ents

(diabetes,)

In [67]:
rule = ConTextRule("mother's side", category="FAMILY", direction="BACKWARD")

In [68]:
blank_context.add([rule])

In [69]:
doc = nlp_blank_context("There is diabetes on her mother's side.")
visualize_ent(doc)

In [70]:
visualize_dep(doc)

### Example 3: Uncertainty
In the phrase below, the physician is considering both **"pneumonia"** and **"bronchitis"** as possible diagnoses. The modifier **"vs"** sits between the two targets, so it should apply in both directions, not just **"forward"** or **"backward"**.

#### TODO
Create a ConTextRule which matches the modifier **"vs"** and modifies both targets. The category should be **"POSSIBLE_EXISTENCE"** and the direction should be **"BIDIRECTIONAL"**.

In [None]:
doc = nlp_blank_context("Pneumonia vs bronchitis")
doc.ents

In [71]:
rule = ConTextRule("vs", category="POSSIBLE_EXISTENCE", direction="BIDIRECTIONAL")

In [72]:
blank_context.add([rule])

In [73]:
doc = nlp_blank_context("Pneumonia vs bronchitis")
visualize_ent(doc)

In [74]:
visualize_dep(doc)

## Additional examples
Below are a number of additional texts. Process each of them with `nlp_blank_context`, identify which modifiers should be matched in each sentence, and create ConTextRules to connect the modifiers with the targets.

The **category** arguments in the ConTextRules can be:
- "NEGATED_EXISTENCE"
- "POSSIBLE_EXISTENCE"
- "HISTORICAL"
- "HYPOTHETICAL"
- "FAMILY"

And the **direction** arguments can be:
- "BIDIRECTIONAL"
- "FORWARD"
- "BACKWARD"
- "TERMINATE"

You may also need to add additional target rules to identify all of the entities.

In [None]:
# RUN CELL TO SEE HINT
hint_custom_context

In [None]:
from medspacy.target_matcher import TargetRule
target_rules = [

]
nlp_blank_context.get_pipe("medspacy_target_matcher").add(target_rules)

In [None]:
context_rules = [
    
]

In [None]:
blank_context.add(context_rules)

In [None]:
texts = [
    "His wife recently died from end stage renal disease.",
    "Whether this is pneumonia is unknown.",
    "Pneumonia vs. bronchitis",
    "Past medical history significant for afib, CHF, and CKD.",
    "Pt's grandfather had prostate cancer.",
    "Stop taking medications if any side effects occur.",
    "The respiratory panel returned negative for influenza.",
    
]

In [None]:
docs = list(nlp_blank_context.pipe(texts))

In [None]:
w = MedspaCyVisualizerWidget(docs)

In [None]:
for doc in docs:
    visualize_ent(doc)