# Student Name
### TODO
Edit this cell and add your name to the top of the page.

In [None]:
import spacy
from IPython.display import Image

# Overview
In the last notebook, we used a statistical NLP model to extract clinical events such as problems, treatments, and tests. However, just because a report mentions a clinical concept doesn't mean that a patient actually has that concept.

Another important task in clinical NLP is **contextual analysis**, which involves looking for contextual modifiers around a concept that indicate whether the concept is:
- Negated
- Historical
- Uncertain
- Experienced by someone other than the patient (such as family history)
- Hypothetical (something that could occur in the future)

## The ConText algorithm
One method for performing this analysis is the **ConText** algorithm, originally proposed in the paper [ConText: An Algorithm for Determining Negation, Experiencer, and Temporal Status from Clinical Reports](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2757457/) by Harkema et al. ConText is an extension of the closely related NegEx algorithm for negation detection.

There are several implementations of ConText and clinical NLP systems which use ConText, including:
- [cTAKES](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/)
- [Leo](https://department-of-veterans-affairs.github.io/Leo/index.html)
- [pyConText](https://github.com/chapmanbe/pyConTextNLP)

## How ConText works

ConText connects certain **modifiers**, such as **"no evidence of"** or **"is negative"**, with the target concepts we are extracting.

---
There is **no evidence of** **_pneumonia_**

---
In this sentence, the **target** is **_pneumonia_**: this is the clinical concept we are trying to extract. The **modifier** is **no evidence of**: this shows that the concept is **negated**. 

ConText finds these targets and modifiers in text and builds a **directed graph** between them, where the targets and modifiers are **nodes** and the edges between them show that the modifier applies to the target. Here is a visual representation of the graph ConText would create out of this sentence:


In [None]:
Image("./images/negated_pneumonia.png", width=600)

Likewise, here is another example using family history:

---
There is **_diabetes_** **on her mother's side**

---

In [None]:
Image("./images/family_history_diabetes.png", width=600)

In both of these examples, finding the clinical problem in the text is not enough: you also need to recognize that the concept is negated or that it applies to a member of the patient's family rather than to the patient.
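To make the graph idea concrete, here is a minimal plain-Python sketch of the two examples above, with each edge pointing from a modifier to the target it modifies. This is only an illustration: the edge-list layout and the `modifiers_by_target` name are assumptions made for this sketch and are not how any ConText implementation stores its graph.

```python
# Toy illustration only: a plain-Python edge list, not a ConText library's internal graph.
from collections import defaultdict

# Each edge points from a (modifier text, category) node to a target node.
edges = [
    (("no evidence of", "NEGATED_EXISTENCE"), "pneumonia"),
    (("on her mother's side", "FAMILY"), "diabetes"),
]

# Group the modifier categories by the target they point to.
modifiers_by_target = defaultdict(list)
for (modifier_text, category), target in edges:
    modifiers_by_target[target].append(category)

for target, categories in modifiers_by_target.items():
    print(target, "<-", categories)
```

Reading the graph from the target's side is what lets us decide later whether a concept such as "pneumonia" should be treated as negated.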

## medSpaCy
The implementation which we will use is from the **medSpaCy** project. **medSpaCy** is an ongoing project from a group of NLP developers at the Department of Veterans Affairs (including myself). The goal of medSpaCy is to create a set of clinical NLP tools implemented in Python and using the spaCy framework. Each module in medSpaCy will be developed as a spaCy component and can be used in the same way as other spaCy models and components.

The first package to be released is **cycontext**, which is a spaCy implementation of the ConText algorithm.

https://github.com/medspacy/cycontext

# Using cycontext
We will use cycontext by adding it to our NLP pipeline, just like we did with the **EntityRuler**. First, let's load our model, which contains the NER component trained to extract clinical concepts:

In [None]:
nlp = spacy.load("en_info_3700_i2b2_2012")

In [None]:
nlp.pipe_names

Now, we'll import the `ConTextComponent` class from `cycontext`. By default, this component is loaded with a built-in set of modifier rules.

In [None]:
from cycontext import ConTextComponent

In [None]:
context = ConTextComponent(nlp, rules="default");

In [None]:
context

Now we'll add this to our pipeline by using the `nlp.add_pipe()` method.

### TODO
Add `context` to the NLP pipeline through the `nlp.add_pipe()` method (**hint:** we did the same thing with the EntityRuler in the last notebook).

In [None]:
nlp.____(____)

Now, when we call `nlp(text)` on clinical text, the document will also go through the context algorithm:

In [None]:
nlp.pipe_names

## A simple example
Let's go back to the example we saw in our last notebook. Our NER model correctly identified **"pneumonia"** as a **problem**, but in this sentence it is explicitly negated:

---
There is **no evidence of** **_pneumonia_**

---

Using the ConText algorithm, we can now recognize that this concept is negated. Thanks to the modular nature of spaCy processing pipelines, we don't need to do anything different:

In [None]:
doc = nlp("There is no evidence of pneumonia.")

Let's visualize what the ConText algorithm is doing. cycontext offers two functions for visualizing the algorithm. The first function, `visualize_ent`, visualizes the clinical concepts and modifiers in an NER-style visualization (similar to how we looked at the extracted entities in the previous notebook):

In [None]:
from cycontext.viz import visualize_dep, visualize_ent

In [None]:
visualize_ent(doc)

The second function, `visualize_dep`, visualizes the relationships between targets and modifiers in a dependency-style visualization. Here, we can see that the modifier **"no evidence of"** is applied to the target **"pneumonia"**.

In [None]:
visualize_dep(doc)

When an entity is negated by cycontext, the negation is stored in the `ent._.is_negated` attribute. If `True`, then the concept is negated. By default, it will be `False`.

In [None]:
for ent in doc.ents:
    print(ent, ent._.is_negated)

## A slightly more complex example
The sentence below contains three problems. The first two, **"pneumonia"** and **"pleural opacities"**, are negated, but the third, **"PE"**, is not. Let's see if cycontext can tell the difference:

In [None]:
doc = nlp("No evidence of pneumonia or pleural opacities but he has PE.")

In [None]:
visualize_ent(doc)

In [None]:
visualize_dep(doc)

In [None]:
for ent in doc.ents:
    print(ent, "Negated:", ent._.is_negated)

cycontext applies the negation to **"pneumonia"** and **"pleural opacities"**, but stops there and does not apply it to **"PE"**.

### Discussion
How do you think cycontext does this?


## Other attributes
By default, cycontext will extract the following attributes, which are all False unless the entity is modified by a certain type of modifier:
- `ent._.is_negated`
- `ent._.is_historical`
- `ent._.is_uncertain`
- `ent._.is_family`
- `ent._.is_hypothetical`
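
Since these attributes are all booleans, it can be handy to collect the ones that are set into a single list. The helper below is a hypothetical convenience function (the names `CONTEXT_ATTRIBUTES` and `context_flags` are not part of cycontext). It only assumes an object that exposes the attributes listed above under `._`, so the demonstration uses a stand-in object rather than a real spaCy entity.

```python
from types import SimpleNamespace

# Attribute names taken from the list above.
CONTEXT_ATTRIBUTES = (
    "is_negated",
    "is_historical",
    "is_uncertain",
    "is_family",
    "is_hypothetical",
)


def context_flags(ent):
    """Return the names of the context attributes that are True for `ent`.

    Hypothetical helper: attributes missing from `ent._` are treated as False.
    """
    return [name for name in CONTEXT_ATTRIBUTES if getattr(ent._, name, False)]


# Stand-in for an entity, used only so this sketch runs without a model.
fake_ent = SimpleNamespace(
    _=SimpleNamespace(
        is_negated=True,
        is_historical=False,
        is_uncertain=False,
        is_family=False,
        is_hypothetical=False,
    )
)

print(context_flags(fake_ent))
```

With a processed document, the same helper could be called as `context_flags(ent)` inside a `for ent in doc.ents:` loop.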

Let's see some more examples:

### Historical

In [None]:
doc = nlp("Past medical history significant for nephrectomy.")

In [None]:
visualize_ent(doc)

In [None]:
visualize_dep(doc)

In [None]:
for ent in doc.ents:
    print(ent, "Historical:", ent._.is_historical)

### Uncertainty

In [None]:
doc = nlp("The scan likely shows a pneumothorax.")

In [None]:
visualize_ent(doc)

In [None]:
visualize_dep(doc)

In [None]:
for ent in doc.ents:
    print(ent, "Uncertain:", ent._.is_uncertain)

### Family history

In [None]:
doc = nlp("Her mother had breast cancer.")

In [None]:
visualize_ent(doc)

In [None]:
visualize_dep(doc)

In [None]:
for ent in doc.ents:
    print(ent, "Family:", ent._.is_family)

### Hypothetical

In [None]:
doc = nlp("She should stop taking warfarin if she develops a rash.")

In [None]:
visualize_ent(doc)

In [None]:
visualize_dep(doc)

In [None]:
for ent in doc.ents:
    print(ent, "Hypothetical:", ent._.is_hypothetical)

# Creating your own modifiers
We've been using the default knowledge base that comes with cycontext. However, you may want to modify or extend cycontext's behavior, including adding brand new concepts.

Let's replace our **context** component with a blank instance, then add our own rules.

In [None]:
blank_context = ConTextComponent(nlp, rules=None);

In [None]:
# Now replace our old context component with this
if "context" in nlp.pipe_names:
    nlp.remove_pipe("context")
nlp.add_pipe(blank_context)

Let's go back to our first example.

In [None]:
text = "There is no evidence of pneumonia."

## ConTextItem
The modifier rules in cycontext are controlled by `ConTextItem`. A ConTextItem defines what span of text to match as a modifier, how that modifier behaves, and the semantic category of the modifier. It takes these main arguments:
- **`literal`**: The exact text to match
- **`category`**: The semantic category of the modifier, such as **"NEGATED_EXISTENCE"** or **"HISTORICAL"**
- **`rule`**: Which **direction** the modifier should look in the sentence. Look back at the two images at the top of the notebook. In the first example, "There is **no evidence of** **_pneumonia_**", the modifier is **"no evidence of"** and it comes before the target concept. In that case, we say it moves **"forward"** in the sentence (to the right). In the other example, "There is **_diabetes_** **on her mother's side**", the modifier comes after the target and we say it moves **"backward"** in the sentence. This argument in cycontext can take the following values:
    - **"BIDIRECTIONAL"** - This is the default and the modifier will apply to targets on both sides of the modifier
    - **"FORWARD"** - The modifier will modify any targets *after* the modifier
    - **"BACKWARD"** - The modifier will modify any targets *before* the modifier
    - **"TERMINATE"** - Any modifiers will stop at this point, such as **"but"** in "No evidence of pneumonia or pleural opacities **but** he has PE."
- **`pattern`**: An optional spaCy pattern to match, like we saw in the pattern-matching notebooks
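
The four rule values can be illustrated with a small toy matcher written in plain Python. This is a sketch under simplifying assumptions, not cycontext's implementation: the functions `tokenize`, `find_spans`, and `apply_rules` are hypothetical, targets are supplied by hand instead of coming from an NER model, matching is exact and case-insensitive, and sentence boundaries are ignored.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_spans(tokens, phrase):
    """Return (start, end) token offsets of every occurrence of `phrase`."""
    words = tokenize(phrase)
    n = len(words)
    return [(i, i + n) for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def apply_rules(text, targets, rules):
    """Map each target to the set of categories whose modifiers reach it.

    `rules` is a list of (literal, category, rule) triples.
    """
    tokens = tokenize(text)
    # Token positions where a TERMINATE rule matched.
    terminators = [
        start
        for literal, _, rule in rules
        if rule == "TERMINATE"
        for start, _ in find_spans(tokens, literal)
    ]
    results = {target: set() for target in targets}
    for literal, category, rule in rules:
        if rule == "TERMINATE":
            continue
        for mod_start, mod_end in find_spans(tokens, literal):
            for target in targets:
                for t_start, t_end in find_spans(tokens, target):
                    after = t_start >= mod_end    # target follows the modifier
                    before = t_end <= mod_start   # target precedes the modifier
                    if after and rule in ("FORWARD", "BIDIRECTIONAL"):
                        blocked = any(mod_end <= t < t_start for t in terminators)
                    elif before and rule in ("BACKWARD", "BIDIRECTIONAL"):
                        blocked = any(t_end <= t < mod_start for t in terminators)
                    else:
                        continue
                    if not blocked:
                        results[target].add(category)
    return results


rules = [
    ("no evidence of", "NEGATED_EXISTENCE", "FORWARD"),
    ("on her mother's side", "FAMILY", "BACKWARD"),
    ("but", "TERMINATE", "TERMINATE"),  # the category is unused for terminators here
]

negation = apply_rules(
    "No evidence of pneumonia or pleural opacities but he has PE.",
    ["pneumonia", "pleural opacities", "PE"],
    rules,
)
family = apply_rules("There is diabetes on her mother's side.", ["diabetes"], rules)
print(negation)
print(family)
```

In this toy version, **"but"** blocks the forward-moving negation before it reaches "PE", while the backward-moving family modifier still reaches "diabetes".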

Let's import `ConTextItem` from cycontext. We can also view its docstring.

In [None]:
from cycontext import ConTextItem

## Example 1: Negation
Let's create a `ConTextItem` to negate **"pneumonia"** in our first example: "There is **no evidence of** **_pneumonia_**".

### TODO
Create a `ConTextItem` with the following arguments:
- **"no evidence of"**: This will match the phrase in the text
- **"NEGATED_EXISTENCE"**: This is the semantic category
- **"FORWARD"**: The target concept comes *after* the modifier in the sentence

In [None]:
item = ConTextItem(____, category=____, rule=____)

We then add a list of ConTextItems to our context object:

In [None]:
blank_context.add([item])

In [None]:
blank_context.item_data

Now when we call `nlp` on our text, we can see that **"pneumonia"** is negated by the modifier.

In [None]:
doc = nlp("There is no evidence of pneumonia.")

In [None]:
visualize_ent(doc)

In [None]:
visualize_dep(doc)

## Example 2: Family History

Now let's identify the **family** modifier in "There is diabetes on her mother's side."

### TODO
Create a ConTextItem which will match **"mother's side"** and modify **"diabetes"**. It should have the category **"FAMILY"** and the rule should be **"BACKWARD"**.

In [None]:
doc = nlp("There is diabetes on her mother's side.")
doc.ents

In [None]:
item = ConTextItem(____, category=____, rule=____)

In [None]:
blank_context.add([item])

In [None]:
doc = nlp("There is diabetes on her mother's side.")
visualize_ent(doc)

In [None]:
visualize_dep(doc)

## Example 3: Uncertainty
In the phrase below, the physician is considering both **"pneumonia"** and **"bronchitis"** as a diagnosis. In this case, the modifier should go in both directions, not just **"forward"** or **"backward"**.

### TODO
Create a ConTextItem which matches both targets. The category should be **"POSSIBLE_EXISTENCE"** and the rule should be **"BIDIRECTIONAL"**.

In [None]:
doc = nlp("Pneumonia vs bronchitis")
doc.ents

In [None]:
item = ConTextItem(____, category=____, rule=____)

In [None]:
blank_context.add([item])

In [None]:
doc = nlp("Pneumonia vs bronchitis")
visualize_ent(doc)

In [None]:
visualize_dep(doc)

# Additional examples
Below are a number of additional texts. Each sentence has a clinical problem which is being modified by a semantic modifier. Go through each of them and process it with the NLP pipeline. Identify which modifiers should be matched in the sentence and create ConTextItems to connect the modifiers with the targets.

The **category** arguments in the ConTextItems can be:
- "NEGATED_EXISTENCE"
- "POSSIBLE_EXISTENCE"
- "HISTORICAL"
- "HYPOTHETICAL"
- "FAMILY"

And the **rule** arguments can be:
- "BIDIRECTIONAL"
- "FORWARD"
- "BACKWARD"
- "TERMINATE"
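
Because a misspelled category or rule string is an easy mistake to make, it can help to check your planned arguments before building the items. The checker below is a hypothetical helper for this exercise (`check_rule`, `VALID_CATEGORIES`, and `VALID_RULES` are not part of cycontext). It only accepts the values listed above, which is stricter than needed if you later add brand new categories of your own.

```python
# Values copied from the two lists above; this restriction is specific to the exercise.
VALID_CATEGORIES = {
    "NEGATED_EXISTENCE",
    "POSSIBLE_EXISTENCE",
    "HISTORICAL",
    "HYPOTHETICAL",
    "FAMILY",
}
VALID_RULES = {"BIDIRECTIONAL", "FORWARD", "BACKWARD", "TERMINATE"}


def check_rule(literal, category, rule):
    """Return a list of problems found in a planned (literal, category, rule) triple."""
    problems = []
    if not literal.strip():
        problems.append("literal is empty")
    if category not in VALID_CATEGORIES:
        problems.append(f"unknown category: {category!r}")
    if rule not in VALID_RULES:
        problems.append(f"unknown rule: {rule!r}")
    return problems


# The negation example from earlier in the notebook passes the check.
print(check_rule("no evidence of", "NEGATED_EXISTENCE", "FORWARD"))
# A lowercase rule string is flagged.
print(check_rule("no evidence of", "NEGATED_EXISTENCE", "forward"))
```

An empty list means the triple uses only the values listed above.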

In [None]:
item_data = [
    ConTextItem(____, category=____, rule=____),
    # etc...
]

In [None]:
blank_context.add(item_data)

In [None]:
texts = [
    "His wife recently died from end stage renal disease.",
    "Whether this is pneumonia is unknown.",
    "Pneumonia vs. bronchitis",
    "Past medical history significant for afib, CHF, and CKD.",
    "Pt's grandfather had prostate cancer.",
    "Stop taking medications if any side effects occur.",
    "The respiratory panel returned negative for influenza.",
    
]

In [None]:
text = texts[0] # Change this number to go through each of the texts

In [None]:
doc = nlp(text)

In [None]:
visualize_ent(doc)

In [None]:
visualize_dep(doc)

# Next Steps
We now have a fairly comprehensive set of tools for processing clinical text:
1. A pre-trained statistical model which can detect clinical problems, treatments, and tests
2. A rule-based matcher which can extract additional entities which are not extracted by our NER
3. ConText to detect attributes such as negation, temporality, and uncertainty

In our next notebook, we'll put all of this together to analyze an **annotated dataset** and evaluate how well our system works on MIMIC data.


[05-clinical_information_extraction.ipynb](05-clinical_information_extraction.ipynb)

## Week 12 Attendance
Save this notebook as an HTML file and submit it on Canvas for credit for Week 11.