<html>
<table width="100%" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td valign="center" align="center" width="45%"><img src="../media/Univ-Utah.jpeg"><br>
</td>
    <td valign="center" align="center" width="75%">
<h1 align="center"><font size="+1">University of Utah<br>Population Health Sciences<br>Data Science Workshop</font></h1></td>
<td valign="center" align="center" width="45%"><img
src="../media/U_Health_stacked_png_red.png" alt="Utah Health
Logo" width="128" height="134"><br>
</td>
</tr>
</tbody>
</table>
<br>
</html>

In [None]:
import medspacy
from IPython.display import Image

In [None]:
from medspacy.visualization import visualize_dep, visualize_ent, MedspaCyVisualizerWidget

In [None]:
from helpers import *
import pandas as pd

In [None]:
import warnings
warnings.filterwarnings("ignore") 

In [None]:
conn = connect_to_mimic()

# Clinical notes in MIMIC

We can use MIMIC-II for education and research because the data is **deidentified**: references to patient identifiers (such as names, dates, and SSNs) have been removed. Deidentification is especially important for clinical notes, which refer to patients by name and discuss details of their lives and care. MIMIC-II contains a set of deidentified clinical notes. In this notebook, we'll query some of these notes and get a sense of what real clinical text looks like.

## `noteevents`

In MIMIC, clinical notes are stored in the `noteevents` table. Let's select the first 10 rows.

#### TODO
Query the first 10 rows of `noteevents` and save to `df`.

In [None]:
# ...
df = # ...

In [None]:
df.head()

In addition to the identifier columns like `subject_id` and `hadm_id` that we've been working with throughout this workshop, two important new columns are:
- `category`: The type of note 
- `text`: The raw text of the note

Let's explore what types of notes are stored in MIMIC.

#### TODO
Write and execute a SQL query to answer the quiz below.

In [None]:
# RUN CELL TO SEE QUIZ
quiz_note_categories

In [None]:
# ...

In this notebook, we'll focus on two note types: **discharge summaries** and **radiology reports**.

## Discharge summaries
A discharge summary is written at the end of a hospitalization and provides a detailed summary of the most important events of the hospitalization. Among other things, a discharge summary will typically contain:
- A brief history of the patient and what brought them to the hospital
- Results of labs, imaging, and other procedures
- A description of the course of care since being admitted
- Plans for future care and patient instructions

Let's take a look at the discharge summary for a particular hospitalization.

#### TODO
Edit the query below to pull the discharge summary for hospital admission with id `28766`.

In [None]:
query = """
SELECT text
FROM noteevents
WHERE ____ = 28766
    AND ____ = '____'
LIMIT 1
"""
disch_summ = pd.read_sql(query, conn)["text"].iloc[0]

In [None]:
print(disch_summ)

Read the note above. Does it look similar to the discharge summary we looked at earlier today? How is the note structured?

To help understand the contents of a discharge summary, we should better understand its structure.

### Clinical note sections
Clinical notes are typically broken up into sections, with each section containing information about a different aspect of the patient's course of care. Referring back to our list of what is typically in a discharge summary, here's how that information might be structured in the note:

1. History of Present Illness/Past Medical History/Family History
2. Pertinent Results/Procedures/Imaging
3. Hospital Course
4. Final Diagnosis/Discharge Medications/Discharge Instructions

Let's see how medspaCy handles sections and use that to help us read the note.

### `Sectionizer`
The `Sectionizer` component identifies section headers in the text and uses that to split up a note. The sectionizer isn't loaded by default, but we can add it to our pipeline using the `nlp.add_pipe` method.
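
To build intuition for what "identifying section headers and splitting the note" means, here is a minimal pure-Python sketch. It is not medspaCy's implementation: the rule list, the `split_sections` function, and the example note are all made up for illustration, and the real component works on spaCy tokens rather than raw strings.

```python
import re

# Hypothetical rules: (header literal, normalized category).
# These are illustrative only, not medspaCy's built-in rule set.
TOY_RULES = [
    ("History of Present Illness:", "history_of_present_illness"),
    ("Past Medical History:", "past_medical_history"),
    ("Hospital Course:", "hospital_course"),
]

def split_sections(text, rules=TOY_RULES):
    """Return a list of (category, title, body) tuples in document order."""
    # Find every occurrence of a header literal, ignoring case.
    hits = []
    for literal, category in rules:
        for m in re.finditer(re.escape(literal), text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), category))
    hits.sort()

    sections = []
    for i, (start, end, category) in enumerate(hits):
        # A section runs until the next header, or to the end of the text.
        next_start = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        title = text[start:end]
        body = text[end:next_start].strip()
        sections.append((category, title, body))
    return sections

# Made-up note text, not taken from MIMIC.
note = (
    "History of Present Illness: 65M with cough and fever.\n"
    "Past Medical History: pneumonia\n"
    "Hospital Course: treated with antibiotics."
)
sections = split_sections(note)
for category, title, body in sections:
    print(category, "|", title, "|", body)
```

Each tuple loosely corresponds to what a section object exposes: a normalized category, the header text, and the text that follows it.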

Let's load the NLP pipeline we used in the previous notebook and add a sectionizer.

In [None]:
nlp = build_nlp_context()
print(nlp.pipe_names)

In [None]:
sectionizer = nlp.add_pipe("medspacy_sectionizer")

Now when we process a doc we can see the section headers highlighted in gray:

In [None]:
doc = nlp(disch_summ)

In [None]:
visualize_ent(doc)

In [None]:
# RUN CELL TO SEE QUIZ
quiz_pna_in_disch_summ

We can iterate through the sections with `doc._.sections`. For each section, `title_span` is the section header (typically the name of the section followed by ":") and `category` is the normalized category of the section.

In [None]:
for section in doc._.sections:
    print(section.title_span, section.category)
    print()

`section_span` is the span of the doc that covers the entire section:

In [None]:
print(doc._.sections[5].section_span)

We can also see this information for the section where an entity occurred with `ent._.section_category`, `ent._.section_title`, and `ent._.section_body`:

In [None]:
ent = doc.ents[0]
print(ent)

In [None]:
ent._.section_category

In [None]:
ent._.section_body

### Section attributes
Certain sections are associated with attributes like being historical (e.g., **"Past Medical History"**) or experienced by a family member (e.g., **"Family History"**). When medspaCy finds an entity in these sections, it sets the appropriate attributes like `is_historical` or `is_family` to `True`:
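
As a rough mental model of this behavior, the sketch below maps a section category to the attribute it implies and flips that attribute on any entity found in the section. The `ToyEntity` class, the `SECTION_ATTRIBUTES` mapping, and the category names are assumptions made for this example; medspaCy's own logic and configuration may differ.

```python
from dataclasses import dataclass

# Hypothetical mapping from a section category to the attribute it implies.
SECTION_ATTRIBUTES = {
    "past_medical_history": "is_historical",
    "family_history": "is_family",
}

@dataclass
class ToyEntity:
    """Stand-in for an entity; not a spaCy or medspaCy class."""
    text: str
    section_category: str
    is_historical: bool = False
    is_family: bool = False

def apply_section_attributes(entity):
    """Set the attribute implied by the entity's section, if there is one."""
    attr = SECTION_ATTRIBUTES.get(entity.section_category)
    if attr is not None:
        setattr(entity, attr, True)
    return entity

pmh = apply_section_attributes(ToyEntity("pneumonia", "past_medical_history"))
fh = apply_section_attributes(ToyEntity("breast cancer", "family_history"))
hc = apply_section_attributes(ToyEntity("pneumonia", "hospital_course"))

for ent in (pmh, fh, hc):
    print(ent.text, ent.section_category, ent.is_historical, ent.is_family)
```

Entities in sections without an entry in the mapping keep their default attribute values. The cells that follow show the same effect in medspaCy itself.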

In [None]:
doc_pmh = nlp("Past Medical History: pneumonia")
ent_pmh = doc_pmh.ents[0]
visualize_ent(doc_pmh)

In [None]:
print(ent_pmh)
print("is_historical:", ent_pmh._.is_historical)

In [None]:
doc_fh = nlp("Family History: breast cancer")
ent_fh = doc_fh.ents[0]
visualize_ent(doc_fh)

In [None]:
print(ent_fh)
print("is_family:", ent_fh._.is_family)

### Adding sections
The structure of notes differs widely across institutions and clinical settings. For example, a discharge summary in the VA might be structured differently than one at the University of Utah, and any discharge summary will look very different from a chest imaging report. So it's important to customize section detection for a specific setting.

We can control section detection in medspaCy using the `SectionRule` class. This behaves just like `ContextRule` and `TargetRule`, and we add it to the `medspacy_sectionizer` component:

In [None]:
from medspacy.section_detection import SectionRule

In [None]:
# Section isn't recognized
text_procedures = "Important procedures: rij central line placement"
doc = nlp(text_procedures)
visualize_ent(doc)
print(doc._.section_categories)

In [None]:
# Add a rule to recognize this section
rule = SectionRule("Important procedures:", "procedures")
nlp.get_pipe("medspacy_sectionizer").add(rule)
doc = nlp(text_procedures)
visualize_ent(doc)
print(doc._.section_categories)

#### TODO
Update the rules below to match the **Social History** section headers in the texts below and assign them the category `social_history`. You could do this with one rule if you want to use some more advanced techniques, or do it with multiple rules.

In [None]:
social_hx_texts = [
    "Social Hx: homeless",
    "Social Factors: lives with two daughters."
]

In [None]:
rules = [
    
]

In [None]:
nlp.get_pipe("medspacy_sectionizer").add(rules)

In [None]:
for text in social_hx_texts:
    visualize_ent(nlp(text))

## Radiology Reports
The other type of note we'll look at in this class is **radiology reports**. These are narratives describing a radiologist's interpretation of an imaging procedure like a [chest x-ray (CXR)](https://www.hopkinsmedicine.org/health/treatment-tests-and-therapies/chest-xray#:~:text=What%20is%20a%20chest%20X,cause%20changes%20in%20your%20lungs.), [MRI](https://en.wikipedia.org/wiki/Magnetic_resonance_imaging) or [CT scan](https://en.wikipedia.org/wiki/CT_scan).

#### TODO
Edit the query below to query **all** radiology reports for hospital admission 28766.

In [None]:
query = """
SELECT text
FROM noteevents
WHERE hadm_id = 28766
    AND category = '____'
"""
rad_reports = pd.read_sql(query, conn)["text"]

In [None]:
rad_reports

In [None]:
# RUN CELL TO SEE QUIZ
quiz_n_rad_reports

Let's process all of the radiology reports from this hospitalization and review them. To process multiple docs with a medspaCy model, we can run:

```
docs = list(nlp.pipe(texts))
```

and a nice way to visualize multiple docs is the `MedspaCyVisualizerWidget` class.

In [None]:
docs = list(nlp.pipe(rad_reports))

In [None]:
w = MedspaCyVisualizerWidget(docs)
w

Here is a summary of some of the sections in a radiology report:
- **Indication** / **Reason for Exam**: What the patient is hospitalized for and why they're undergoing the procedure
- **Technique**: Technical details about the procedure
- **Findings**: An objective description of what the images show
- **Interpretation** / **Impression**: The radiologist's interpretation of what this means for the patient's diagnosis
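
The section list above can be turned into a small lookup that pulls one section out of a report. The following is a minimal sketch using only the standard library. The header synonyms, the `report_sections` function, and the report text are assumptions made for illustration (the report is invented, not taken from MIMIC), and real reports will need more header variants than this.

```python
import re

# Hypothetical header synonyms, following the section list above.
HEADER_TO_CATEGORY = {
    "indication": "indication",
    "reason for exam": "indication",
    "technique": "technique",
    "findings": "findings",
    "interpretation": "impression",
    "impression": "impression",
}

# Match any known header at the start of a line, followed by a colon.
HEADER_RE = re.compile(
    r"^(" + "|".join(re.escape(h) for h in HEADER_TO_CATEGORY) + r")\s*:",
    flags=re.IGNORECASE | re.MULTILINE,
)

def report_sections(text):
    """Map each normalized category to the text under its header."""
    matches = list(HEADER_RE.finditer(text))
    result = {}
    for i, m in enumerate(matches):
        # The body ends where the next header begins, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        category = HEADER_TO_CATEGORY[m.group(1).lower()]
        result[category] = text[m.end():end].strip()
    return result

# Made-up report text for illustration only.
toy_report = (
    "REASON FOR EXAM: Cough and fever.\n"
    "TECHNIQUE: PA and lateral chest radiographs.\n"
    "FINDINGS: Lungs are clear.\n"
    "IMPRESSION: No acute cardiopulmonary process.\n"
)
parsed = report_sections(toy_report)
print(parsed["impression"])
```

Normalizing "Reason for Exam" and "Indication" to a single category is the same idea as a section rule's category: different header wordings end up under one name.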

In [None]:
# RUN CELL TO SEE QUIZ
quiz_rad_interpretation