<html>
<table width="100%" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td valign="center" align="center" width="45%"><img src="../media/Univ-Utah.jpeg"><br>
</td>
    <td valign="center" align="center" width="75%">
<h1 align="center"><font size="+1">University of Utah<br>Population Health Sciences<br>Data Science Workshop</font></h1></td>
<td valign="center" align="center" width="45%"><img
src="../media/U_Health_stacked_png_red.png" alt="Utah Health
Logo" width="128" height="134"><br>
</td>
</tr>
</tbody>
</table>
<br>
</html>

In [None]:
from helpers import *
import medspacy
import pandas as pd

from medspacy.visualization import visualize_dep, visualize_ent, MedspaCyVisualizerWidget

In [None]:
import warnings
warnings.filterwarnings("ignore") 

# NLP Exercises
The last few notebooks stepped through the major parts of a clinical NLP system. In this notebook you'll build a complete system for identifying [pneumonia](https://en.wikipedia.org/wiki/Pneumonia) in radiology reports.

## 0. Load the data
The dataset in these examples is a set of MIMIC-II radiology reports. The annotations were created by University of Utah physician-scientist and pneumonia extraordinaire [Dr. Barbara Jones](https://healthcare.utah.edu/fad/mddetail.php?physicianID=u0102859&name=barbara-e-jones). As a baseline to compare our system against, we will use a system recently developed by her team for identifying misdiagnosis of pneumonia in clinical notes: [`medspacy_pna`](https://github.com/abchapman93/medspacy_pneumonia). This system was designed for VA and University of Utah data, so it might not perform as well on MIMIC data as reported in the paper. Let's see if we can beat its performance!

The data is split into two sets: the **training set** and **testing set**. We'll start by developing our system with the training set before doing a final evaluation on the testing set.

### 0.3 Read in the data
Run the code below to read in the training set. The resulting dataframe will have a column for:
- The document name
- The text
- The annotator's document classification (this is the **"truth"**)
- The baseline NLP system's document classification (this is the **"prediction"**)

We'll eventually add another column with our own predictions.

In [None]:
df = read_pneumonia_data("train")

In [None]:
df.head()

## 1. Document annotation
Before building an NLP system we need to define our concepts and annotate a corpus of notes to use as a reference standard. We already have an annotated corpus, so we'll review a few short examples, see how we would annotate them, and then look at the existing reference standard annotations.

### 1.1
For this task, we will define a **"POS"** note as: 

*A note which contains a positive **or** possible mention of a term referring to pneumonia.*

Consider the following terms to be pneumonia:

- Pneumonia
- Pna
- Opacity
- Infiltrate
- Consolidation

Review the following notes and annotate each as 1 for positive or 0 for negative.

In [None]:
# RUN CELL TO SEE QUIZ
quiz_pna_annotation1

In [None]:
# RUN CELL TO SEE QUIZ
quiz_pna_annotation2

In [None]:
# RUN CELL TO SEE QUIZ
quiz_pna_annotation3

In [None]:
# RUN CELL TO SEE QUIZ
quiz_pna_annotation4

### 1.2
The true annotations can be found in `df["document_classification"]`. Answer the questions below.

In [None]:
# RUN CELL TO SEE QUIZ
quiz_num_pna_notes

In [None]:
# RUN CELL TO SEE QUIZ
quiz_num_pos_pna_notes

## 2. Build your NLP system and process texts
Now that we have some idea about what our dataset contains, let's start building an NLP system and reviewing the output. First, build an empty NLP system. Then we'll process the notes in our dataset using our system as is (which doesn't have any rules). Go through the output and review the data. Find some examples of pneumonia that you should extract. Then go through and add rules for each of the following components as needed:

1. Add target concept rules to `target_matcher` to identify pneumonia in the text
2. Add ConText rules to `context` to improve attribute assertion
3. Optionally, add additional rules to `sectionizer` if the section logic is helpful for classifying the entities.
4. Build a document classifier which returns `0` or `1` for a doc. A simple version would just use the ConText attributes like `is_negated`, but a more complex version might also use information such as the section of the note.
5. Evaluate the system and review errors

After adding rules, reprocess your notes and review the output again. Since NLP is a computationally expensive procedure, you might want to work in batches of 10 or so before processing the whole corpus.

### 2.1: Load a model
Import `medspacy` and create an `nlp` model.

In [None]:
nlp = medspacy.load()

In [None]:
nlp.pipe_names

In [None]:
# RUN CELL TO TEST VALUE
test_load_nlp.test(nlp)

### 2.2: Add a Sectionizer
By default, medspaCy doesn't load a `Sectionizer` component, but we want to include section detection in our pipeline. Add a sectionizer to your pipeline.

In [None]:
nlp.add_pipe("medspacy_sectionizer")

In [None]:
test_load_nlp_add_sectionizer.test(nlp)

### 2.3: Process notes
The code below will process the notes in `df_train` with your NLP. If you want to use the whole dataset, set `df_train = df`. Otherwise, work in batches like `df_train = df.iloc[:10]`, `df_train = df.iloc[10:20]`, etc.
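
If you would rather step through the batches programmatically than edit the slice by hand, one option is to generate the slice bounds up front. This is only a sketch: `batch_bounds` is a hypothetical helper, not part of `helpers` or medspaCy, and the 25-row dataset size is invented.

```python
def batch_bounds(n_rows, batch_size=10):
    """Yield (start, stop) pairs that cover range(n_rows) in order."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


# With the notebook's DataFrame, each pair could drive df.iloc[start:stop].
# Here we only print the bounds for a hypothetical 25-row dataset.
for start, stop in batch_bounds(25):
    print(start, stop)
```
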

In [None]:
# df_train = df
df_train = df.iloc[:10]

In [None]:
docs = list(nlp.pipe(df_train["text"]))

In [None]:
w = MedspaCyVisualizerWidget(docs)

### 2.1 Concept extraction
Add rules to the `target_matcher` component to extract mentions of pneumonia.

In [None]:
from medspacy.target_matcher import TargetRule
target_rules = [

]
if len(target_rules) > 0:
    nlp.get_pipe("medspacy_target_matcher").add(target_rules)

### 2.2 ConText
Add any modifiers which were not captured with the default rule set.

In [None]:
from medspacy.context import ConTextRule
context_rules = [

]
if len(context_rules) > 0:
    nlp.get_pipe("medspacy_context").add(context_rules)

### 2.3 Sections
Add any section titles which were not detected and led to errors.

In [None]:
# Section rules are SectionRule objects, so import that class for the list below
from medspacy.section_detection import SectionRule
section_rules = [

]

if len(section_rules) > 0:
    nlp.get_pipe("medspacy_sectionizer").add(section_rules)

### 2.4: Document Classification
Write a function called `classify_pna` which takes a doc and returns a `1` if the document is positive for pneumonia and `0` if it is negative.
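
As a starting point, here is a minimal sketch of the "simple version" described earlier, which relies only on the ConText `is_negated` attribute. It makes several assumptions: the entity label `"PNEUMONIA"` is a placeholder for whatever category you give your target rules, `classify_from_entities` and `fake_ent` are made-up names, and the stand-in objects only mimic the two attributes the function reads (`label_` and `_.is_negated`) so that the logic can be tried without running a pipeline. Possible mentions count as positive under our definition, so uncertainty is deliberately not checked.

```python
from types import SimpleNamespace


def classify_from_entities(ents, target_label="PNEUMONIA"):
    """Return 1 if any entity with the target label is not negated, else 0."""
    for ent in ents:
        if ent.label_ == target_label and not ent._.is_negated:
            return 1
    return 0


def fake_ent(label, is_negated=False):
    """Build a stand-in object with the attributes the classifier reads."""
    return SimpleNamespace(label_=label, _=SimpleNamespace(is_negated=is_negated))


# A negated mention alone should give 0; any non-negated mention gives 1.
print(classify_from_entities([fake_ent("PNEUMONIA", is_negated=True)]))
print(classify_from_entities([fake_ent("PNEUMONIA", is_negated=True), fake_ent("PNEUMONIA")]))
```

With a processed medspaCy doc, the same logic would be applied to `doc.ents`. A more complete version might also look at the section an entity appears in.
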

In [None]:
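def classify_pna(doc):
    # TODO: return 1 if the doc is positive for pneumonia, 0 if it is negative
    raise NotImplementedError("Replace this line with your classification logic")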

In [None]:
pred_1 = classify_pna(nlp("There is no evidence of pna"))

In [None]:
# RUN CELL TO TEST VALUE
test_classify_pna_1.test(pred_1)

In [None]:
pred_2 = classify_pna(nlp("Impression: pneumonia"))

In [None]:
# RUN CELL TO TEST VALUE
test_classify_pna_2.test(pred_2)

### 2.5: Evaluate your system on training data
After reprocessing your texts and creating `docs` with an updated NLP, run the code below to get performance metrics for your system. The function `evaluate_system` will return a DataFrame with performance characteristics for your system as well as the baseline system.

Look at the results and ask the following questions:
- What sorts of mistakes does my system appear to be making?
- Is precision or recall higher? What does that mean in the context of the research question?
- How is it comparing to the baseline NLP?
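
For reference when reading the results, precision and recall can be computed by hand from the truth and prediction columns. The sketch below is not the implementation of `evaluate_system`, whose internals are not shown here; `precision_recall` is a hypothetical helper and the labels are invented.

```python
def precision_recall(truth, predictions):
    """Compute precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(truth, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Invented labels: 3 true positives, 2 false positives, 1 false negative.
truth = [1, 1, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 1, 1]
precision, recall = precision_recall(truth, predictions)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means the system flags notes that the annotator marked negative (false positives), while low recall means it misses notes that the annotator marked positive (false negatives).
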

In [None]:
# Add your predictions
df_train = add_document_classifications(df_train, docs, classify_pna)

In [None]:
df_train.head()

In [None]:
results_train = evaluate_system(df_train)
results_train

### 2.6: Error Analysis
Review examples of mistakes your NLP system made. We'll subset the dataframe to look at **false positives** and **false negatives**.

First, edit the code below so that we have two different DataFrames containing errors: `fps` for false positives and `fns` for false negatives.

In [None]:
__ = df_train.query("document_classification == 0 & nlp_document_classification == 1")

In [None]:
__ = df_train.query("document_classification == 1 & nlp_document_classification == 0")

In [None]:
fps

In [None]:
w_fps = MedspaCyVisualizerWidget(list(fps["doc"]))

In [None]:
w_fns = MedspaCyVisualizerWidget(list(fns["doc"]))

## 3. Final evaluation
Once you feel like you're ready, read in the testing data, run your NLP on it, and evaluate it. You should do this **one time** so that it is an honest evaluation of how your system will perform on new, unseen data. Once you see the final results, go through the steps we did above with the training data to understand our performance on the testing set and what sorts of errors happened. How did your final system do?

In [None]:
df_test = read_pneumonia_data("test")
docs_test = list(nlp.pipe(df_test["text"]))
df_test = add_document_classifications(df_test, docs_test, classify_pna)

In [None]:
(df_test["document_classification"] == df_test["baseline_document_classification"]).mean()

In [None]:
evaluate_system(df_test)