In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os, sys
sys.path.append("..")

# medspaCy Pneumonia
## Overview
This repository implements a rule-based NLP pipeline for extracting diagnostic assertions of pneumonia from clinical text. The primary purpose of this system is to identify diagnostic discordance of pneumonia at different points in a hospitalization. As such, three different "flavors" of NLP are implemented in this repository, each designed for a specific type of clinical note and clinical definition:
- Emergency notes for identifying an initial diagnosis in the emergency department (ED)
- Radiology reports for identifying radiographic evidence of pneumonia from chest imaging (RAD)
- Discharge summaries for identifying a final diagnosis when a patient is discharged from the hospital (DC)

While much of the logic is common between the three models, they are customized in the following ways:
- Certain terms may be considered evidence of pneumonia in one note type but not another. For example, **"infectious process"** in a chest imaging report should be considered evidence of pneumonia, but the same phrase would be too vague to use in clinical notes.
- Section titles are customized for each note type
- The logic for classifying a document is specific to each note type

Additionally, to support generalizability, new rules can be added to customize the system to a new EHR. In our work, we first developed the system in Veterans Affairs (VA) and then customized it for University of Utah (UU).

This work is described in detail in a manuscript that is currently being prepared for submission (PI Dr. Barbara Jones). 

These notebooks will walk through some of the core logic of this system.

0. **Quickstart**
1. **Emergency department logic**
2. **Radiology report logic**
3. **Discharge summary logic**
4. **Customizing for a new dataset**

## Quickstart
The following examples show how to load a model for a specific clinical setting, process a note, and view the output. The `build_nlp` function takes a domain name, loads a set of rules, and returns a spaCy model.

In [3]:
from medspacy_pna import build_nlp
from medspacy_pna.display import create_html
from medspacy.visualization import visualize_ent, visualize_dep
from IPython.display import HTML

## Emergency

In [4]:
%%capture
nlp_emergency = build_nlp("emergency")

Each model contains the same general pipeline components:
- Standard spaCy components for tokenizing, POS tagging, and sentence parsing
- medspaCy components for extracting entities: `medspacy_concept_tagger` and `medspacy_target_matcher`
- The `medspacy_context` component for asserting attributes such as negation and uncertainty
- `medspacy_sectionizer` for identifying sections in the clinical note
- `medspacy_postprocessor`
- A custom component, unique to each note type, that classifies the document and assigns it one of three labels:
    - **Positive**
    - **Possible**
    - **Negative**
    
For most of these notebooks, **"Positive"** and **"Possible"** diagnoses won't be differentiated and will both be referred to as positive notes.

Definitions of the document classifications for each domain will be explained in subsequent notebooks.


In [5]:
nlp_emergency.pipe_names

['tok2vec',
 'tagger',
 'parser',
 'medspacy_concept_tagger',
 'medspacy_target_matcher',
 'medspacy_context',
 'medspacy_sectionizer',
 'medspacy_postprocessor',
 'pneumonia_emergencydocumentclassifier']

Let's process an example emergency note. Here the provider suspects pneumonia but expresses some uncertainty, so the note is classified as **Possible**.

In [6]:
note_emergency = """
History of Present Illness: Patient presents to ED with cough and fever.

Medical decision making: Differential diagnoses include pna and CHF. 
Will order a chest x-ray to r/o pneumonia.

Assessment/Plan:
Admit for suspected community-acquired pneumonia.
"""


In [8]:
doc_ed = nlp_emergency(note_emergency)

We can visualize the output using either medspaCy's `visualize_ent` function, which is useful for debugging and viewing details about the NLP output, or a custom `create_html` function which is slightly higher level and used for building a provider display.

In [9]:
print(doc_ed._.document_classification)
visualize_ent(doc_ed)

POSSIBLE




In [10]:
HTML(create_html(doc_ed, "emergency", document_classification=True))

## Radiology
Following our fictional example from earlier, after being admitted the patient receives a chest x-ray to confirm the provider's suspicion of pneumonia. The radiology report also has positive findings and is classified as **Positive**.

In [11]:
%%capture
nlp_rad = build_nlp("radiology")

In [12]:
text_rad = """
Indication: Evaluate for pneumonia.

Findings: Opacities noted in the left lower lobe. 
Possible infiltrates representing atelectasis vs pneumonia.

Impression: Most likely pneumonia.
"""

In [13]:
doc_rad = nlp_rad(text_rad)



In [14]:
HTML(create_html(doc_rad, "radiology", document_classification=True))

## Discharge
When the patient is eventually discharged, we see that the final diagnosis was not pneumonia. This is an example of **diagnostic discordance**, and identifying this change from a **Possible** to a **Negative** diagnosis is the primary purpose of this repository.

In [15]:
%%capture
nlp_dc = build_nlp("discharge")

In [16]:
text_dc = """
Hospital Course: Initially concern for pneumonia. 
After admissions findings were found to be more consistent with CHF and pneumonia was ruled out.

Final Diagnosis: CHF
"""


In [17]:
doc_dc = nlp_dc(text_dc)

In [18]:
HTML(create_html(doc_dc, "discharge", document_classification=True))
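
The discordance described above can be expressed as a comparison of two document labels. The following is a minimal sketch, not part of `medspacy_pna`: the helper names `is_positive` and `is_discordant` are hypothetical, and the label strings are assumed to follow the uppercase form printed earlier (`POSSIBLE`). As in the rest of these notebooks, **Positive** and **Possible** are collapsed into a single positive group.

```python
# Hypothetical helpers for illustration; they are not part of medspacy_pna.
# Label strings are assumed to match the uppercase form printed by
# doc._.document_classification earlier in this notebook ("POSSIBLE").
POSITIVE_LABELS = {"POSITIVE", "POSSIBLE"}


def is_positive(label):
    """Collapse Positive and Possible into a single positive group."""
    return label.upper() in POSITIVE_LABELS


def is_discordant(initial_label, final_label):
    """Return True when the initial and final diagnoses disagree."""
    return is_positive(initial_label) != is_positive(final_label)


# Emergency note classified as Possible, discharge summary as Negative
print(is_discordant("POSSIBLE", "NEGATIVE"))  # True
# Possible and Positive are treated as the same group
print(is_discordant("POSSIBLE", "POSITIVE"))  # False
```

With the documents processed above, the same check could be written as `is_discordant(doc_ed._.document_classification, doc_dc._.document_classification)`, assuming both attributes hold label strings of this form.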

## Summary of pipeline components
Each of the pipelines has a similar design. The rest of this notebook will quickly go over some of the main components in the pipeline. More details can be found in [medspaCy's repository](https://github.com/medspacy/medspacy).

### Entity extraction
First, entities are extracted using the `ConceptTagger` and `TargetMatcher` components.

In [19]:
texts = [
    "Pneumonia",
    "pna",
    "opacities",
    "infiltrate",
    "cap"
]
for text in texts:
    for ent in nlp_emergency(text).ents:
        print(ent, ent.label_)

Pneumonia PNEUMONIA
pna PNEUMONIA
opacities OPACITY
infiltrate INFILTRATE
cap PNEUMONIA


### ConText
Once entities have been identified in the text, attributes such as negation, temporality, and uncertainty are asserted using the ConText algorithm.

In [20]:
texts = [
    "Recent diagnosis of pneumonia",
    "No opacity or evidence of pneumonia",
    "Order CXR to r/o pna",
    "No signs of infiltrate but still suspect pneumonia"
]
for text in texts:
    doc = nlp_emergency(text)
    visualize_dep(doc)
    for ent in doc.ents:
        print(ent, "is_uncertain:", ent._.is_uncertain, 
              "  is_negated:", ent._.is_negated, 
              "  is_historical:", ent._.is_historical)



pneumonia is_uncertain: False   is_negated: False   is_historical: True




opacity is_uncertain: False   is_negated: True   is_historical: False
pneumonia is_uncertain: False   is_negated: True   is_historical: False




pna is_uncertain: True   is_negated: False   is_historical: False




infiltrate is_uncertain: False   is_negated: True   is_historical: False
pneumonia is_uncertain: True   is_negated: False   is_historical: False
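
To build intuition for the output above, here is a toy sketch of the idea behind ConText: a trigger phrase sets an attribute on every target that follows it in the sentence, until a terminating word ends its scope. This is a deliberate simplification and not medspaCy's implementation. The function `toy_context` and the three small word lists are hypothetical, and real rule sets are much larger and also handle backward scope and multi-word phrases.

```python
# Toy illustration only; the rules below are hypothetical and far smaller
# than the rule sets shipped with medspaCy.
FORWARD_TRIGGERS = {
    "no": "is_negated",
    "r/o": "is_uncertain",
    "suspect": "is_uncertain",
}
TERMINATORS = {"but"}
TARGETS = {"pneumonia", "pna", "opacity", "infiltrate"}


def toy_context(sentence):
    """Map each target term to the set of attributes active when it occurs."""
    results = {}
    active = set()
    for token in sentence.lower().split():
        if token in TERMINATORS:
            # A terminator ends the scope of all earlier triggers
            active.clear()
        elif token in FORWARD_TRIGGERS:
            active.add(FORWARD_TRIGGERS[token])
        elif token in TARGETS:
            results[token] = set(active)
    return results


print(toy_context("No signs of infiltrate but still suspect pneumonia"))
```

In the last example sentence, "no" negates "infiltrate", "but" ends that scope, and "suspect" then marks "pneumonia" as uncertain, which mirrors the attributes printed by the pipeline.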


### Section detection
The location of a mention of pneumonia is crucial for determining whether the provider believes the patient has pneumonia. Each of the pipelines looks for different sections when determining a classification.

In [21]:
for doc, domain in zip((doc_ed, doc_rad, doc_dc), ("Emergency:", "Radiology:", "Discharge:")):
    print(domain)
    visualize_ent(doc)
    for ent in doc.ents:
        print(ent, ent._.section_category)
    print()

Emergency:




pna medical_decision_making
pneumonia medical_decision_making
pneumonia observation_and_plan

Radiology:




pneumonia indication
Opacities impression
infiltrates impression
atelectasis impression
pneumonia impression
pneumonia impression

Discharge:




pneumonia hospital_course
pneumonia hospital_course
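
As a rough illustration of how header-based section detection can work, the sketch below finds section titles with regular expressions and assigns the text between two titles to the earlier one. It is a simplified stand-in, not the `medspacy_sectionizer` implementation. The function `toy_sections` and the pattern table are hypothetical; the category names are taken from the emergency output above.

```python
import re

# Hypothetical header patterns; the category names mirror those printed
# for the emergency note above.
SECTION_PATTERNS = {
    "history_of_present_illness": r"history of present illness:",
    "medical_decision_making": r"medical decision making:",
    "observation_and_plan": r"assessment/plan:",
}


def toy_sections(text):
    """Return (category, body) pairs in order of appearance."""
    hits = []
    for category, pattern in SECTION_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), category))
    hits.sort()

    sections = []
    for i, (_, end, category) in enumerate(hits):
        # A section body runs until the next header or the end of the note
        next_start = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        sections.append((category, text[end:next_start].strip()))
    return sections


note = """History of Present Illness: Cough and fever.
Assessment/Plan:
Admit for suspected pneumonia."""
for category, body in toy_sections(note):
    print(category, "->", body)
```

Any entity found inside a body would then inherit that section's category, which is the information exposed by `ent._.section_category` in the cell above.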



### Document classification
Finally, the pipeline will make a classification based on the entities and their attributes. The next few notebooks will walk through the document classification logic for each note type.