In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os, sys
sys.path.append("..")

# Discharge Summary
## Overview
When a patient first presents to the emergency department (ED), there is often considerable uncertainty in the diagnosis. By the time the patient is discharged, however, it should be clearer whether the patient had pneumonia.

The primary logic for discharge summaries is simple and is summarized in the diagram below. The sets of sections used for classification are defined in `medspacy_pna.document_classification.discharge_document_classifier`.
![Discharge classification](./Discharge.png)
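The core rule can be sketched in plain Python. This is a hypothetical simplification, not the `medspacy_pna` implementation: it assumes each pneumonia mention carries its section category plus a flag saying whether ConText excluded it (negated, historical, hypothetical, and so on), and that a document is positive if any non-excluded mention falls in a relevant section. The `Mention` class and `classify_discharge` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Mention:
    """Hypothetical stand-in for a pneumonia entity found by the pipeline."""
    text: str
    section_category: str
    excluded: bool  # True if ConText-style rules marked it negated, historical, etc.


def classify_discharge(mentions: Iterable[Mention], relevant_sections: Set[str]) -> str:
    """Return "POS" if any non-excluded mention is in a relevant section, else "NEG"."""
    for m in mentions:
        if m.section_category in relevant_sections and not m.excluded:
            return "POS"
    return "NEG"


# Section categories as printed from RELEVANT_SECTIONS further down
relevant = {"diagnoses", "hospital_course", "discharge_diagnoses"}
example = [Mention("Pneumonia", "discharge_diagnoses", excluded=False)]
print(classify_discharge(example, relevant))  # POS
```

A mention outside the relevant sections (for example in a past medical history section) never affects the result under this sketch, no matter how it is asserted.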

In [3]:
from medspacy_pna.document_classification.discharge_document_classifier import (
    RELEVANT_SECTIONS, TIER_1_SECTIONS, TIER_2_SECTIONS
)

The default schema will consider assertions of pneumonia from the **"Diagnosis"** or **"Hospital Course"** sections.

In [4]:
print(RELEVANT_SECTIONS)

{'diagnoses', 'hospital_course', 'discharge_diagnoses'}


However, the hospital course may also describe the initial diagnosis made in the ED. Although the NLP attempts to exclude initial diagnoses, an alternate classification schema that ignores the hospital course achieves higher precision (at the cost of lower recall).
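Conceptually, the two schemas differ only in which section categories they read. A minimal sketch, assuming the alternate "diagnoses" schema is simply the default set with `hospital_course` removed; this is an assumption about the library, and the imported `TIER_1_SECTIONS` and `TIER_2_SECTIONS` may encode a similar split (printing them is the way to check). The constant names below are hypothetical.

```python
# Section categories used by the default schema (copied from the printed RELEVANT_SECTIONS)
DEFAULT_SECTIONS = {"diagnoses", "hospital_course", "discharge_diagnoses"}

# Assumption: the "diagnoses" schema is the default minus the hospital course
DIAGNOSES_SECTIONS = DEFAULT_SECTIONS - {"hospital_course"}

print(sorted(DIAGNOSES_SECTIONS))  # ['diagnoses', 'discharge_diagnoses']
```

Under this assumption, a pneumonia mention found only in the hospital course can never make a document positive with the second set, which is the source of both the precision gain and the recall loss.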

In [5]:
from medspacy_pna import build_nlp
from medspacy_pna.display import create_html
from medspacy.visualization import visualize_ent
from IPython.display import HTML

In [6]:
%%capture
nlp_discharge = build_nlp("discharge")

In [7]:
nlp_discharge.pipe_names

['tok2vec',
 'tagger',
 'parser',
 'medspacy_concept_tagger',
 'medspacy_target_matcher',
 'medspacy_context',
 'medspacy_sectionizer',
 'medspacy_postprocessor',
 'pneumonia_dischargedocumentclassifier']

## Example 1: Positive

In [8]:
text = """
Final Diagnosis:
1. Pneumonia
2. Afib
"""
doc = nlp_discharge(text)

In [9]:
HTML(create_html(doc, "discharge", document_classification=True))

In [10]:
for ent in doc.ents:
    print(ent, ent.label_, ent._.section_category)

Pneumonia PNEUMONIA discharge_diagnoses


In [11]:
print(doc._.document_classification)

POS


## Example 2: Negative 

In [12]:
text = """
Hospital Course: Initially the diagnosis included pneumonia but chest imaging
ruled out pna.
"""
doc = nlp_discharge(text)

In [13]:
HTML(create_html(doc, "discharge", document_classification=True))

In [14]:
for ent in doc.ents:
    print(ent, ent.label_, ent._.section_category)

pneumonia PNEUMONIA hospital_course
pna PNEUMONIA hospital_course


In [15]:
print(doc._.document_classification)

NEG
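For this document to be NEG, both mentions must have been excluded: "pna" follows "ruled out", and "pneumonia" is described as an initial diagnosis. medspaCy's ConText component handles this kind of modifier logic with trigger rules whose scope can be cut off by termination terms such as "but". The toy function below illustrates only that scoping idea; the trigger and termination lists are made up for illustration and are not how `medspacy_pna` is configured.

```python
import re

# Hypothetical trigger lists for illustration; medspaCy's real ConText rules are far richer.
PRECEDING_EXCLUSION_TRIGGERS = ["ruled out", "initially", "no evidence of"]
TERMINATION_TERMS = ["but", "however"]


def is_excluded(sentence: str, target: str) -> bool:
    """Toy ConText-style check: is `target` preceded by an exclusion trigger
    whose scope is not cut off by a termination term?"""
    lowered = sentence.lower()
    match = re.search(rf"\b{re.escape(target.lower())}\b", lowered)
    if match is None:
        raise ValueError(f"{target!r} not found in sentence")
    # Only text before the target can hold a preceding trigger
    window = lowered[:match.start()]
    # A termination term ends a trigger's scope, so keep only what follows the last one
    for term in TERMINATION_TERMS:
        window = re.split(rf"\b{term}\b", window)[-1]
    return any(re.search(rf"\b{re.escape(t)}\b", window) for t in PRECEDING_EXCLUSION_TRIGGERS)


sentence = ("Hospital Course: Initially the diagnosis included pneumonia "
            "but chest imaging ruled out pna.")
print(is_excluded(sentence, "pneumonia"))  # True  ("initially" applies)
print(is_excluded(sentence, "pna"))        # True  ("ruled out" applies)
```

Note that "initially" does not reach "pna" here only by coincidence of the trigger list; in the sketch, "but" would also stop it, which mirrors how termination terms limit a modifier's scope in ConText.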


## Example 3: False positive in Hospital Course
In this example, the hospital course includes information about an earlier diagnosis that was later discarded. However, this will be classified as **"Positive"** by the NLP.

In [16]:
text = """
Hospital Course: Patient was admitted and treated for pneumonia. However, 
chest imaging was negative and patient was found to have CHF.

Final diagnosis: CHF
"""
doc = nlp_discharge(text)

In [17]:
HTML(create_html(doc, "discharge", document_classification=True))

In [18]:
for ent in doc.ents:
    print(ent, ent.label_, ent._.section_category)

pneumonia PNEUMONIA hospital_course


In [19]:
print(doc._.document_classification)

POS


This false positive can be avoided by using the **"diagnoses"** schema, which does not include the hospital course section:

In [20]:
clf = nlp_discharge.get_pipe("pneumonia_dischargedocumentclassifier")

In [21]:
doc2 = nlp_discharge(text, disable=["pneumonia_dischargedocumentclassifier"])
clf.classify_document(doc2, classification_schema="diagnoses")

'NEG'
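The same two steps (process the text without the classifier, then classify under an explicit schema) recur whenever a non-default schema is wanted. A small convenience wrapper, hypothetical and built only from the calls used in the cells above (`get_pipe`, calling the pipeline with `disable=`, and `classify_document` with `classification_schema=`), keeps them together:

```python
CLASSIFIER_PIPE = "pneumonia_dischargedocumentclassifier"


def classify_with_schema(nlp, text, schema, pipe_name=CLASSIFIER_PIPE):
    """Process `text` without the document classifier, then classify it
    under an explicit schema (e.g. "diagnoses")."""
    clf = nlp.get_pipe(pipe_name)
    # Skip the classifier during processing; it is called explicitly below
    doc = nlp(text, disable=[pipe_name])
    return clf.classify_document(doc, classification_schema=schema)


# Usage with the pipeline built above (not executed here):
# classify_with_schema(nlp_discharge, text, "diagnoses")
```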

## Example 4: Positive only in Hospital Course
However, ignoring the hospital course causes false negatives, as in the example below. We found the two schemas to perform about equally overall, one trading recall for precision and the other the reverse, so the choice depends on which metric matters more for your application. Alternatively, you can keep the more precise definition of pneumonia by ignoring the hospital course and then add diagnosis codes to try to improve recall.
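To see where your own data falls on this tradeoff, both schemas can be scored against manually labeled documents. A minimal sketch of the scoring; the four gold and predicted labels are made-up placeholders chosen to mimic one hospital-course false positive and one hospital-course-only positive, not results from this project.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[str], pred: Sequence[str],
                     positive: str = "POS") -> Tuple[float, float]:
    """Precision and recall of `pred` against `gold` for the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for four documents, for illustration only
gold = ["POS", "NEG", "NEG", "POS"]
default_pred = ["POS", "NEG", "POS", "POS"]    # hospital-course false positive on doc 3
diagnoses_pred = ["POS", "NEG", "NEG", "NEG"]  # hospital-course-only positive missed on doc 4

print(precision_recall(gold, default_pred))    # (0.6666666666666666, 1.0)
print(precision_recall(gold, diagnoses_pred))  # (1.0, 0.5)
```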

In [22]:
text = """
Hospital Course: Patient tested positive for Covid-19. 
He was admitted and treated for pneumonia.

Final diagnosis: Covid-19
"""
doc = nlp_discharge(text)

In [23]:
HTML(create_html(doc, "discharge", document_classification=True))

In [24]:
for ent in doc.ents:
    print(ent, ent.label_, ent._.section_category)

pneumonia PNEUMONIA hospital_course


In [25]:
doc2 = nlp_discharge(text, disable=["pneumonia_dischargedocumentclassifier"])
clf.classify_document(doc2, classification_schema="diagnoses")

'NEG'
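For a note like this one, where the "diagnoses" schema returns NEG even though the hospital course documents pneumonia, the diagnosis-code suggestion above could recover the case. A minimal sketch, assuming ICD-10 codes are available for each encounter and that the J12-J18 categories (the pneumonia categories in ICD-10's "Influenza and pneumonia" block) are an acceptable code list; the function and constant names are hypothetical, and a real project would need its own validated code set.

```python
from typing import Iterable

# Assumption: ICD-10 categories J12-J18 cover pneumonia; adjust to your own validated code set.
PNEUMONIA_ICD10_PREFIXES = ("J12", "J13", "J14", "J15", "J16", "J17", "J18")


def combined_label(nlp_label: str, icd10_codes: Iterable[str]) -> str:
    """Positive if the NLP says POS *or* any code falls in the pneumonia range."""
    if nlp_label == "POS":
        return "POS"
    for code in icd10_codes:
        # Normalize e.g. "j12.82" -> "J1282" before prefix matching
        normalized = code.strip().upper().replace(".", "")
        if normalized.startswith(PNEUMONIA_ICD10_PREFIXES):
            return "POS"
    return "NEG"


# Illustrative codes: U07.1 (COVID-19) and J12.82 (pneumonia due to COVID-19)
print(combined_label("NEG", ["U07.1", "J12.82"]))  # POS
print(combined_label("NEG", ["U07.1"]))            # NEG
```

Because the combination is an OR, it can only turn NEG into POS: recall can never drop relative to the NLP alone, while precision depends on how accurately the codes themselves are assigned.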