<html>
<table width="100%" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td valign="center" align="center" width="45%"><img src="../media/Univ-Utah.jpeg"><br>
</td>
    <td valign="center" align="center" width="75%">
<h1 align="center"><font size="+1">University of Utah<br>Population Health Sciences<br>Data Science Workshop</font></h1></td>
<td valign="center" align="center" width="45%"><img
src="../media/U_Health_stacked_png_red.png" alt="Utah Health
Logo" width="128" height="134"><br>
</td>
</tr>
</tbody>
</table>
<br>
</html>

In [1]:
import medspacy
from IPython.display import Image

In [2]:
from medspacy.visualization import visualize_dep, visualize_ent, MedspaCyVisualizerWidget

In [3]:
from helpers import *
import pandas as pd

In [4]:
import warnings
warnings.filterwarnings("ignore") 

In [5]:
conn = connect_to_mimic()

Enter password for MIMIC2 database········


# Clinical notes in MIMIC

The reason that we can use MIMIC-II for education and research is that the data is **deidentified**, meaning any references to patient identifiers (like names, dates, SSNs) have been removed. Deidentification is especially important for clinical notes which refer to patients by name and discuss details of their lives and care. MIMIC-II contains a set of clinical notes which have been deidentified. In this notebook, we'll query some of these notes and get a sense of what real clinical text looks like.

## `noteevents`

In MIMIC, clinical notes are stored in the `noteevents` table. Let's select the first 10 rows.

### TODO
Query the first 10 rows of `noteevents` and save to `df`.

In [6]:
query = """
SELECT *
FROM noteevents
LIMIT 10
"""
df = pd.read_sql(query, conn)

In [7]:
df.head()

| | subject_id | hadm_id | icustay_id | elemid | charttime | realtime | cgid | correction | cuid | category | title | text | exam_name | patient_info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | 28766 | | | 2644-01-17 00:00:00 | | | | | DISCHARGE_SUMMARY | | `\n \n \n \nAdmission Date: [**2644-1-17**] ...` | | |
| 1 | 56 | 28766 | | | 2644-01-17 00:00:00 | | | | | RADIOLOGY_REPORT | | `\n\n\n DATE: [**2644-1-17**] 10:53 AM\n ...` | | |
| 2 | 56 | 28766 | | | 2644-01-17 00:00:00 | | | | | RADIOLOGY_REPORT | | `\n\n\n DATE: [**2644-1-17**] 10:53 AM\n ...` | | |
| 3 | 56 | 28766 | | | 2644-01-17 00:00:00 | | | | | RADIOLOGY_REPORT | | `\n\n\n DATE: [**2644-1-17**] 10:43 AM\n ...` | | |
| 4 | 56 | 28766 | | | 2644-01-17 00:00:00 | | | | | RADIOLOGY_REPORT | | `\n\n\n DATE: [**2644-1-17**] 6:37 AM\n ...` | | |


In addition to the identifier columns like `subject_id` and `hadm_id` that we've been working with throughout this workshop, two important new columns are:
- `category`: The type of note 
- `text`: The raw text of the note

Let's explore what types of notes are stored in MIMIC.

#### TODO
Write and execute a SQL query to answer the quiz below.

In [8]:
# RUN CELL TO SEE QUIZ
quiz_note_categories

Quiz: Which of the following note types are stored in MIMIC?



In [9]:
query = """
SELECT DISTINCT category FROM noteevents;
"""
pd.read_sql(query, conn)

| | category |
|---|---|
| 0 | DISCHARGE_SUMMARY |
| 1 | MD Notes |
| 2 | Nursing/Other |
| 3 | RADIOLOGY_REPORT |

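`DISTINCT` lists the categories but not how many notes fall into each one. A `GROUP BY` query answers that. The sketch below runs the idea against a tiny in-memory SQLite table whose rows are hypothetical stand-ins for `noteevents` (only `hadm_id` and `category`, not real MIMIC data); with the real MIMIC connection you would pass the same SQL to `pd.read_sql`.

```python
import sqlite3

# Hypothetical stand-in for MIMIC's noteevents table: toy rows, not real data.
# A separate name (toy_conn) avoids overwriting the notebook's MIMIC `conn`.
toy_conn = sqlite3.connect(":memory:")
toy_conn.execute("CREATE TABLE noteevents (hadm_id INTEGER, category TEXT)")
toy_conn.executemany(
    "INSERT INTO noteevents VALUES (?, ?)",
    [
        (1, "DISCHARGE_SUMMARY"),
        (1, "RADIOLOGY_REPORT"),
        (1, "RADIOLOGY_REPORT"),
        (2, "Nursing/Other"),
    ],
)

# Count the notes in each category, most frequent first (ties broken alphabetically)
category_counts = toy_conn.execute(
    """
    SELECT category, COUNT(*) AS n_notes
    FROM noteevents
    GROUP BY category
    ORDER BY n_notes DESC, category
    """
).fetchall()

for category, n_notes in category_counts:
    print(category, n_notes)
```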

In this notebook, we'll focus on two note types: **discharge summaries** and **radiology reports**.

## Discharge summaries
A discharge summary is written at the end of a hospitalization and provides a detailed summary of the most important events of the hospitalization. Among other things, a discharge summary will typically contain:
- A brief history of the patient and what brought them to the hospital
- Results of labs, imaging, and other procedures
- A description of the course of care since being admitted
- Plans for future care and patient instructions

Let's take a look at the discharge summary for a particular hospitalization.

#### TODO
Edit the query below to pull the discharge summary for hospital admission with id `28766`.

In [10]:
query = """
SELECT text
FROM noteevents
WHERE hadm_id = 28766
    AND category = 'DISCHARGE_SUMMARY'
LIMIT 1
"""
disch_summ = pd.read_sql(query, conn)["text"].iloc[0]
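Hard-coding `28766` in the SQL string works here, but when the admission ID comes from a variable it is safer to pass it as a query parameter than to paste it into the string. The sketch below shows the idea with SQLite's named placeholders on a hypothetical toy table; the function name `get_discharge_summary` is illustrative. The placeholder syntax depends on the database driver (psycopg2, for example, uses `%(name)s`), so which style applies to the connection returned by `connect_to_mimic` is an assumption to check. `pd.read_sql` forwards such values through its `params` argument.

```python
import sqlite3

# Hypothetical toy table (hadm_id, category, text); not real MIMIC notes
toy_conn = sqlite3.connect(":memory:")
toy_conn.execute("CREATE TABLE noteevents (hadm_id INTEGER, category TEXT, text TEXT)")
toy_conn.executemany(
    "INSERT INTO noteevents VALUES (?, ?, ?)",
    [
        (100, "DISCHARGE_SUMMARY", "toy summary A"),
        (100, "RADIOLOGY_REPORT", "toy report"),
        (200, "DISCHARGE_SUMMARY", "toy summary B"),
    ],
)


def get_discharge_summary(connection, hadm_id):
    """Return the first discharge summary text for an admission, or None if there is none."""
    row = connection.execute(
        """
        SELECT text
        FROM noteevents
        WHERE hadm_id = :hadm_id
            AND category = 'DISCHARGE_SUMMARY'
        LIMIT 1
        """,
        {"hadm_id": hadm_id},  # the driver binds the value; it is never formatted into the SQL
    ).fetchone()
    return row[0] if row else None


print(get_discharge_summary(toy_conn, 200))
```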

In [11]:
print(disch_summ)


 
 
 
Admission Date:  [**2644-1-17**]              Discharge Date:   [**2644-1-23**]
 
Date of Birth:  [**2553-5-26**]             Sex:   F
 
Service: MEDICINE
 
Allergies: 
Patient recorded as having No Known Allergies to Drugs
 
Attending:[**First Name3 (LF) 2775**] 
Chief Complaint:
fall
 
Major Surgical or Invasive Procedure:
n/a

 
History of Present Illness:
NF admit seen and appreciated.  Briefly this is a [**Age over 90 **] yo f w/ h/o 
lung ca w/ metastasis to brain on XRT, s/p fall at NH on Sun 
eve. No LOC.  [**1-24**] lethargy, n/v,L arm weakness, sent to [**Hospital1 **] 
[**Location (un) 579**] where she was noted to havv 3x3cm R posterior parietal 
bleed.  Tx to NSICU, started on dilantin (load) and decadron.  
Deemed not a surgical candidate -> DNR/DNI.
Noted to be guiac pos, +hct drop, coffee grounds on NGL.  GI 
recommended transfusing to maintain hct.
 
Past Medical History:
Lung ca w/ mets to brain
high cholesterol
HTN
CAD
Lopresser 25mg tid
 
Social History:
live

Read the note above. Does it look similar to the discharge summary we looked at earlier today? How is the note structured?

To help understand the contents of a discharge summary, we should better understand its structure.

### Clinical note sections
Clinical notes are typically broken up into sections, each containing information about a different aspect of the patient's care. Referring back to our list of what is typically in a discharge summary, here's how that information might be organized into sections:

1. History of Present Illness/Past Medical History/Family History
2. Pertinent Results/Procedures/Imaging
3. Hospital Course
4. Final Diagnosis/Discharge Medications/Discharge Instructions

Let's see how medspaCy handles sections and use that to help us read the note.

### `Sectionizer`
The `Sectionizer` component identifies section headers in the text and uses them to split a note into sections. The sectionizer isn't loaded by default, but we can add it to our pipeline using the `nlp.add_pipe` method.
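Conceptually, a rule-based sectionizer does two things: it finds header strings in the text, and it treats everything from one header up to the next as that header's section. The pure-Python sketch below illustrates the idea with a hypothetical header-to-category dictionary and a regular expression. It is a simplified illustration, not medspaCy's implementation, which matches on spaCy tokens and supports token patterns, parent sections, and more.

```python
import re

# Hypothetical header -> category mapping, loosely modeled on the categories medspaCy prints
HEADER_CATEGORIES = {
    "chief complaint": "chief_complaint",
    "history of present illness": "history_of_present_illness",
    "past medical history": "past_medical_history",
    "social history": "social_history",
}

# A known header at the start of a line, followed by a colon (case-insensitive)
HEADER_RE = re.compile(
    r"^(" + "|".join(re.escape(h) for h in HEADER_CATEGORIES) + r"):",
    flags=re.IGNORECASE | re.MULTILINE,
)


def split_sections(text):
    """Return (category, title, body) tuples; text before the first header gets category None."""
    matches = list(HEADER_RE.finditer(text))
    sections = []
    first_start = matches[0].start() if matches else len(text)
    if first_start > 0:
        sections.append((None, "", text[:first_start]))
    for i, m in enumerate(matches):
        # The body runs until the next header, or to the end of the note
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        category = HEADER_CATEGORIES[m.group(1).lower()]
        sections.append((category, m.group(0), text[m.end():body_end]))
    return sections


note = "Service: MEDICINE\nChief Complaint:\nfall\nPast Medical History:\nHTN\nCAD\n"
for category, title, body in split_sections(note):
    print(category, repr(title), repr(body))
```

Because "Service:" is not in the toy mapping, it ends up in the uncategorized leading section, much like the `None` section at the top of the medspaCy output later in this notebook.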

Let's load the NLP pipeline we used in the previous notebook and add a sectionizer.

In [15]:
nlp = build_nlp_context()
print(nlp.pipe_names)

['medspacy_pyrush', 'medspacy_target_matcher', 'medspacy_context']


In [16]:
sectionizer = nlp.add_pipe("medspacy_sectionizer")

Now when we process a doc we can see the section headers highlighted in gray:

In [17]:
doc = nlp(disch_summ)

In [18]:
visualize_ent(doc)

In [19]:
# RUN CELL TO SEE QUIZ
quiz_pna_in_disch_summ

VBox(children=(HTML(value='<h4>TODO</h4>\nThe doc above should have an entity of pneumonia highlighted. What s…



We can iterate through the sections with `doc._.sections`. For each section, `title_span` is the section header (typically the name of the section followed by ":") and `category` is the normalized category of the section.

In [20]:
for section in doc._.sections:
    print(section.title_span, section.category)
    print()

 None

Service: other

Allergies: allergies

Allergies allergies

Chief Complaint: chief_complaint

History of Present Illness: history_of_present_illness

Past Medical History: past_medical_history

Social History: social_history

Physical Exam: physical_exam

Pertinent Results: labs_and_studies

MRI: imaging

Brief Hospital Course: hospital_course

Consult history_of_present_illness

Medications on Admission: medications

Discharge Medications: medications

Discharge Disposition: observation_and_plan

Facility: other

Discharge Diagnosis: observation_and_plan

Discharge Condition: observation_and_plan

Discharge Instructions: patient_instructions

Followup Instructions: patient_instructions

Signed electronically by: signature



`section_span` is the full span of the section, including both its header and its body:

In [21]:
print(doc._.sections[5].section_span)

History of Present Illness:
NF admit seen and appreciated.  Briefly this is a [**Age over 90 **] yo f w/ h/o 
lung ca w/ metastasis to brain on XRT, s/p fall at NH on Sun 
eve. No LOC.  [**1-24**] lethargy, n/v,L arm weakness, sent to [**Hospital1 **] 
[**Location (un) 579**] where she was noted to havv 3x3cm R posterior parietal 
bleed.  Tx to NSICU, started on dilantin (load) and decadron.  
Deemed not a surgical candidate -> DNR/DNI.
Noted to be guiac pos, +hct drop, coffee grounds on NGL.  GI 
recommended transfusing to maintain hct.
 



We can also see this information for the section where an entity occurred with `ent._.section_category`, `ent._.section_title`, and `ent._.section_body`:
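Finding the section an entity belongs to amounts to asking which section starts at or before the entity's first character. A minimal sketch of that lookup with the standard library's `bisect`, using hypothetical character offsets and categories rather than medspaCy's internal bookkeeping:

```python
from bisect import bisect_right

# Hypothetical sections, as parallel lists sorted by start character offset
section_starts = [0, 40, 120, 300]
section_categories = [None, "history_of_present_illness", "past_medical_history", "imaging"]


def section_category_at(char_offset):
    """Return the category of the last section starting at or before char_offset."""
    idx = bisect_right(section_starts, char_offset) - 1
    return section_categories[idx] if idx >= 0 else None


print(section_category_at(350))  # prints: imaging
```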

In [22]:
ent = doc.ents[0]
print(ent)

pneumonia


In [23]:
ent._.section_category

'imaging'

In [24]:
ent._.section_body

On diffusion-weighted images there is a small area of 
restricted diffusion along the falx within the left 
occipitotemporal lobe. It is also bright on FLAIR-weighted 
images and may represent a subacute infarct. Clinical 
correlation is recommended. On gradient echo images there is a 
large area of intraparenchymal hemorrhage within the right 
parietal lobe and left thalamus which following administration 
of gadolinium reveals ring-enhancing lesions. These are 
suspicious for hemorrhagic metastases given the patient's 
history. Additional ring-enhancing lesions throughout the supra- 
and infratentorial compartments are visualized. There is a 
moderate amount of peritumoral edema involving the right 
parietal lobe lesion in addition to a second right parietal 
lesion along the falx high in the vertex. The other areas of 
metastases reveal a minimal amount of peritumoral edema.
.
cxr: Cardiomegaly and mild CHF. Nasogastric tube as described 
above. Rounded opacity overlying the left hi

### Section attributes
Certain sections are associated with attributes like being historical (e.g., **"Past Medical History"**) or experienced by a family member (e.g., **"Family History"**). When medspaCy finds an entity in these sections, it sets the appropriate attributes like `is_historical` or `is_family` to `True`:
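One way to picture this mechanism is a lookup table from section category to attribute values, applied to every entity found in that section. The mapping, the `Entity` class, and `apply_section_attrs` below are hypothetical and for illustration only; medspaCy ships its own defaults and sets the values on spaCy extension attributes such as `ent._.is_historical`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from section category to the attributes it implies
SECTION_ATTRS = {
    "past_medical_history": {"is_historical": True},
    "family_history": {"is_family": True},
}


@dataclass
class Entity:
    """Minimal stand-in for a spaCy entity span with the attributes discussed above."""
    text: str
    section_category: Optional[str] = None
    is_historical: bool = False
    is_family: bool = False


def apply_section_attrs(ent):
    """Copy any attributes associated with the entity's section onto the entity."""
    for attr, value in SECTION_ATTRS.get(ent.section_category, {}).items():
        setattr(ent, attr, value)
    return ent


pmh = apply_section_attrs(Entity("pneumonia", "past_medical_history"))
fh = apply_section_attrs(Entity("breast cancer", "family_history"))
print(pmh.is_historical, fh.is_family)  # prints: True True
```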

In [25]:
doc_pmh = nlp("Past Medical History: pneumonia")
ent_pmh = doc_pmh.ents[0]
visualize_ent(doc_pmh)

In [26]:
print(ent_pmh)
print("is_historical:", ent_pmh._.is_historical)

pneumonia
is_historical: True


In [27]:
doc_fh = nlp("Family History: breast cancer")
ent_fh = doc_fh.ents[0]
visualize_ent(doc_fh)

In [28]:
print(ent_fh)
print("is_family:", ent_fh._.is_family)

breast cancer
is_family: True


### Adding sections
The structure of notes differs widely across institutions and clinical settings. For example, a discharge summary at the VA might be structured differently than one at the University of Utah, and any discharge summary will look very different from a chest imaging report. So it's important to customize section detection for a specific setting.

We can control section detection in medspaCy using the `SectionRule` class. It works just like `ContextRule` and `TargetRule`: we create rules and add them to the `medspacy_sectionizer` component:

In [29]:
from medspacy.section_detection import SectionRule

In [30]:
# Section isn't recognized
text_procedures = "Important procedures: rij central line placement"
doc = nlp(text_procedures)
visualize_ent(doc)
print(doc._.section_categories)

[None]


In [31]:
# Add a rule to recognize this section
rule = SectionRule("Important procedures:", "procedures")
nlp.get_pipe("medspacy_sectionizer").add(rule)
doc = nlp(text_procedures)
visualize_ent(doc)
print(doc._.section_categories)

['procedures']


#### TODO
Write rules that match the **Social History** headers in the texts below and assign them the category `social_history`. You can do this with a single rule that uses a token pattern (a more advanced technique) or with multiple simpler rules.

In [32]:
social_hx_texts = [
    "Social Hx: homeless",
    "Social Factors: lives with two daughters."
]

In [33]:
rules = [
    SectionRule("Social History", "social_history",
               pattern=[
                   {"LOWER": "social"},
                   {"LOWER": {"IN": ["hx", "factors", "history"]}},
                   {"LOWER": ":"}
               ])
]

In [34]:
nlp.get_pipe("medspacy_sectionizer").add(rules)

In [35]:
for text in social_hx_texts:
    visualize_ent(nlp(text))
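The token pattern in the rule above accepts "social", then one of "hx", "factors", or "history", then a colon, all case-insensitively. For comparison, roughly the same headers can be matched outside spaCy with a regular expression. This is only an approximation: it works on characters rather than spaCy tokens, so whitespace handling differs slightly.

```python
import re

# Approximate character-level equivalent of the token pattern:
# "social" + one of (hx | factors | history) + ":"  (case-insensitive)
SOCIAL_HX_RE = re.compile(r"\bsocial\s+(hx|factors|history)\s*:", re.IGNORECASE)

example_texts = [
    "Social Hx: homeless",
    "Social Factors: lives with two daughters.",
    "Social History: lives alone",
    "Past Medical History: HTN",
]
for t in example_texts:
    m = SOCIAL_HX_RE.search(t)
    print(repr(t), "->", m.group(0) if m else None)
```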

## Radiology Reports
The other type of note we'll look at in this class is **radiology reports**. These are narratives describing a radiologist's interpretation of an imaging procedure like a [chest x-ray (CXR)](https://www.hopkinsmedicine.org/health/treatment-tests-and-therapies/chest-xray#:~:text=What%20is%20a%20chest%20X,cause%20changes%20in%20your%20lungs.), [MRI](https://en.wikipedia.org/wiki/Magnetic_resonance_imaging) or [CT scan](https://en.wikipedia.org/wiki/CT_scan).

#### TODO
Edit the query below to query **all** radiology reports for hospital admission 28766.

In [41]:
query = """
SELECT text
FROM noteevents
WHERE hadm_id = 28766
    AND category = 'RADIOLOGY_REPORT'
"""
rad_reports = pd.read_sql(query, conn)["text"]

In [42]:
rad_reports

0    \n\n\n     DATE: [**2644-1-17**] 10:53 AM\n   ...
1    \n\n\n     DATE: [**2644-1-17**] 10:53 AM\n   ...
2    \n\n\n     DATE: [**2644-1-17**] 10:43 AM\n   ...
3    \n\n\n     DATE: [**2644-1-17**] 6:37 AM\n    ...
4    \n\n\n     DATE: [**2644-1-19**] 12:09 PM\n   ...
Name: text, dtype: object

In [38]:
# RUN CELL TO SEE QUIZ
quiz_n_rad_reports

Quiz: How many radiology reports did this hospitalization have?



Let's process all of the radiology reports from this hospitalization and review them. To process multiple docs with a medspaCy model, we can run:

```
docs = list(nlp.pipe(texts))
```

and a nice way to visualize multiple docs is the `MedspaCyVisualizerWidget` class.

In [43]:
docs = list(nlp.pipe(rad_reports))

In [44]:
w = MedspaCyVisualizerWidget(docs)
w


Here is a summary of some of the sections in a radiology report:
- **Indication** / **Reason for Exam**: What the patient is hospitalized for and why they're undergoing the procedure
- **Technique**: Technical details about the procedure
- **Findings**: An objective description of what the images show
- **Interpretation** / **Impression**: The radiologist's interpretation of what this means for the patient's diagnosis
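Because the impression is where the radiologist states the conclusion, a common first step is to pull out just that section. The minimal regex-based sketch below runs on a hypothetical toy report (not a real MIMIC note), and the header list and `get_section` helper are illustrative assumptions. Real reports vary in header spelling and layout, which is exactly why configurable section rules like medspaCy's are useful.

```python
import re

# Hypothetical toy report; not a real MIMIC note
report = """INDICATION: Fall, evaluate for fracture.
TECHNIQUE: Two views of the chest.
FINDINGS: Mild cardiomegaly. No focal consolidation.
IMPRESSION: Cardiomegaly without acute cardiopulmonary process.
"""

# Assumed set of headers, taken from the list of radiology report sections above
RAD_HEADERS = ("INDICATION", "REASON FOR EXAM", "TECHNIQUE",
               "FINDINGS", "INTERPRETATION", "IMPRESSION")


def get_section(text, header, all_headers=RAD_HEADERS):
    """Return the body under `header`, up to the next known header or the end of the text."""
    next_headers = "|".join(re.escape(h) for h in all_headers)
    # Lazily capture everything after "HEADER:" until a line starting with another header
    pattern = rf"^{re.escape(header)}:(.*?)(?=^(?:{next_headers}):|\Z)"
    m = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
    return m.group(1).strip() if m else None


print(get_section(report, "IMPRESSION"))
```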

In [None]:
# RUN CELL TO SEE QUIZ
quiz_rad_interpretation