# SPOR NW 2025: Exploring the Potential of Electronic Medical Records

## Load some made-up discharge summaries

In [1]:
import pandas as pd

In [44]:
dsum = pd.read_csv("https://raw.githubusercontent.com/centre-for-health-informatics/SPORNW-2025-EMR-Workshop/refs/heads/master/synthetic_discharge_summaries.csv")
print(dsum.shape)
dsum.head(2)

(20, 4)


|   | title | summary | hypertension-status | Age |
|---|---|---|---|---|
| 0 | Pneumonia (Community-Acquired) | The patient is a 67-year-old female with a his... | 1 | 67 |
| 1 | Acute Cholecystitis | The patient is a 49-year-old male with a BMI o... | 0 | 49 |


# Data Overview

In [43]:
dsum['hypertension-status'].value_counts()

hypertension-status
0    15
1     5
Name: count, dtype: int64

In [34]:
(dsum['hypertension-status']==1).sum()/dsum.shape[0]

0.25
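
Because the labels are coded 0/1, the same proportion is also the column mean, and `value_counts(normalize=True)` reports the share of both classes at once. A minimal sketch on hypothetical toy labels with the same 5/15 split (the name `toy_status` is an assumption, not part of the workshop data):

```python
import pandas as pd

# Hypothetical toy labels: 5 positives and 15 negatives
toy_status = pd.Series([1] * 5 + [0] * 15, name="hypertension-status")

# Mean of a 0/1 column is the fraction of ones
prevalence = toy_status.mean()
print(prevalence)

# Share of each class in one call
shares = toy_status.value_counts(normalize=True)
print(shares)
```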

In [28]:
dsum['Age'].describe()

count    20.000000
mean     45.350000
std      18.204323
min      15.000000
25%      30.500000
50%      47.000000
75%      59.000000
max      72.000000
Name: Age, dtype: float64

In [29]:
dsum['summary'].apply(len).describe()

count      20.000000
mean      721.500000
std       208.127112
min       504.000000
25%       591.000000
50%       665.500000
75%       763.250000
max      1361.000000
Name: summary, dtype: float64

In [30]:
print(dsum['summary'].iloc[0])

The patient is a 67-year-old female with a history of hypertension and type 2 diabetes mellitus. She presented with a 3-day history of fever, productive cough, and shortness of breath. Her physical exam revealed crackles in the right lower lung field. Family history includes a mother with a history of lung cancer and a father who had hypertension and coronary artery disease.

The patient was diagnosed with community-acquired pneumonia (CAP) and treated with a course of intravenous ceftriaxone and azithromycin. Her fever resolved by day three, and she demonstrated significant clinical improvement. The patient was switched to oral antibiotics on day five and discharged after a 7-day course.

She was advised to monitor her blood glucose levels closely and avoid smoking, as it may worsen her lung condition. The patient was instructed to follow up with her primary care physician within one week to ensure continued improvement.


# Determining disease status with Natural Language Processing (NLP)

## Keyword Search

In [49]:
htn_1 = dsum['summary'].str.contains('hypertension')
print(htn_1.sum())
htn_1.head()

7


0     True
1    False
2    False
3    False
4     True
Name: summary, dtype: bool

## How well does this work?
- What metrics can we use to assess the performance of our methods?

### Accuracy
- The fraction of notes we label correctly: 1 means every label is right, 0 means every label is wrong

In [50]:
def acc(reference_labels, our_predictions):
    ll = reference_labels == our_predictions
    return ll.sum()/ll.shape[0]

acc(dsum['hypertension-status'],htn_1)

0.6

### PPV
- Positive predictive value: of the notes we flag as positive, what fraction truly are positive

In [51]:
def ppv(reference_labels, our_predictions):
    ll = (reference_labels == 1) & (our_predictions == 1)
    return ll.sum()/our_predictions.sum()

ppv(dsum['hypertension-status'],htn_1)

0.2857142857142857

### Sensitivity
- Of the notes that truly are positive, what fraction do we flag

In [52]:
def sen(reference_labels, our_predictions):
    ll = (reference_labels == 1) & (our_predictions == 1)
    return ll.sum()/reference_labels.sum()

sen(dsum['hypertension-status'],htn_1)

0.4

### NPV
- Negative predictive value: of the notes we do not flag, what fraction truly are negative

In [54]:
def npv(reference_labels, our_predictions):
    ll = (reference_labels == 0) & (our_predictions == 0)
    return ll.sum()/(~our_predictions).sum()

npv(dsum['hypertension-status'],htn_1)

0.7692307692307693

### Specificity
- Of the notes that truly are negative, what fraction do we leave unflagged

In [55]:
def spe(reference_labels, our_predictions):
    ll = (reference_labels == 0) & (our_predictions == 0)
    return ll.sum()/(reference_labels==0).sum()

spe(dsum['hypertension-status'],htn_1)

0.6666666666666666

### Put them all together in one performance function

In [57]:
def stats(reference_labels, our_predictions):
    print(
        f"accuracy = {acc(reference_labels, our_predictions):.2}",
        f"PPV = {ppv(reference_labels, our_predictions):.2}",
        f"Sensitivity = {sen(reference_labels, our_predictions):.2}",
        f"NPV = {npv(reference_labels, our_predictions):.2}",
        f"Specificity = {spe(reference_labels, our_predictions):.2}",
        sep='\n',
    )
stats(dsum['hypertension-status'],htn_1)

accuracy = 0.6
PPV = 0.29
Sensitivity = 0.4
NPV = 0.77
Specificity = 0.67
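
All five metrics are ratios of the four cells of a 2x2 confusion matrix. The sketch below counts the cells once and derives each metric from them. It runs on a small hypothetical set of labels rather than the workshop data, and the names `confusion_counts` and `metrics_from_counts` are assumptions made for illustration.

```python
import pandas as pd

def confusion_counts(reference_labels, our_predictions):
    """Count (TP, FP, FN, TN) for binary labels coded 0/1 or False/True."""
    ref = reference_labels.astype(bool)
    pred = our_predictions.astype(bool)
    n_tp = int((ref & pred).sum())     # truly positive and flagged
    n_fp = int((~ref & pred).sum())    # truly negative but flagged
    n_fn = int((ref & ~pred).sum())    # truly positive but missed
    n_tn = int((~ref & ~pred).sum())   # truly negative and not flagged
    return n_tp, n_fp, n_fn, n_tn

def metrics_from_counts(n_tp, n_fp, n_fn, n_tn):
    """Derive the five metrics from the four confusion-matrix cells."""
    return {
        "accuracy": (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn),
        "PPV": n_tp / (n_tp + n_fp),
        "Sensitivity": n_tp / (n_tp + n_fn),
        "NPV": n_tn / (n_tn + n_fn),
        "Specificity": n_tn / (n_tn + n_fp),
    }

# Hypothetical toy labels, not the workshop data
toy_ref = pd.Series([1, 1, 0, 0, 0, 1])
toy_pred = pd.Series([True, False, True, False, False, True])

counts = confusion_counts(toy_ref, toy_pred)
print(counts)
print(metrics_from_counts(*counts))
```

Plugging in the counts implied by the outputs above for `htn_1` (2 true positives, 5 false positives, 3 false negatives, 10 true negatives) reproduces the same five values. The sketch does not guard against empty denominators, for example PPV when nothing is flagged.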


## Where are we getting it wrong?

### False Positives

In [64]:
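def fp(reference_labels, our_predictions):
    # False positive: the reference says no hypertension, but we flagged the note
    ll = (reference_labels == 0) & (our_predictions == 1)
    n = ll.sum()
    print(n)
    if n > 0:
        # Print the first false-positive note so we can see why it matched
        print(dsum['summary'].loc[ll].iloc[0])

fp(dsum['hypertension-status'],htn_1)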

5
The patient is a 58-year-old diabetic male presenting with redness, warmth, and swelling over his left lower leg. No history of hypertension. He denied any trauma or recent infection in the area. Physical exam revealed localized cellulitis without signs of abscess formation. His blood glucose levels were elevated at the time of admission, reflecting poor diabetes control.

The patient was treated with IV antibiotics (vancomycin and piperacillin-tazobactam) for 48 hours, followed by a 7-day course of oral antibiotics (cephalexin). Blood glucose levels were closely monitored, and adjustments were made to his diabetic regimen during hospitalization.

The patient was discharged with instructions to keep the leg elevated, monitor for signs of worsening infection, and follow up with his primary care physician in one week.


### False Negatives

In [62]:
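def fn(reference_labels, our_predictions):
    # False negative: the reference says hypertension, but we did not flag the note
    ll = (reference_labels == 1) & (our_predictions == 0)
    n = ll.sum()
    print(n)
    if n > 0:
        # Print the first false-negative note so we can see what we missed
        print(dsum['summary'].loc[ll].iloc[0])

fn(dsum['hypertension-status'],htn_1)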

3
The patient is a 62-year-old male with a history of gout. Hypertension present for the past 5 years. He presented with severe pain and swelling in his right great toe. His uric acid level was elevated, and joint aspiration confirmed monosodium urate crystals.

The patient was treated with colchicine and NSAIDs for symptom management, and his uric acid levels were adjusted with allopurinol. He was discharged after 48 hours of treatment, with instructions to avoid purine-rich foods and alcohol.

He was advised to follow up with his rheumatologist in one month for further management.
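
`fp` and `fn` print only the first misclassified note. When reviewing errors it can help to pull out every disagreement at once. A sketch of one way to do that, where `misclassified` is a hypothetical helper and a four-row toy frame stands in for `dsum`:

```python
import pandas as pd

def misclassified(df, reference_col, predictions):
    """Return rows where prediction and reference disagree, labelled FP or FN."""
    ref = df[reference_col].astype(bool)
    pred = predictions.astype(bool)
    wrong_mask = ref != pred
    wrong = df.loc[wrong_mask].copy()
    # A wrong row with a positive prediction is a false positive, otherwise a false negative
    wrong["error"] = pred[wrong_mask].map({True: "FP", False: "FN"})
    return wrong

# Hypothetical toy notes, not the workshop file
toy = pd.DataFrame({
    "summary": [
        "History of hypertension.",
        "No history of hypertension.",
        "History of HTN.",
        "Healthy.",
    ],
    "hypertension-status": [1, 0, 1, 0],
})
toy_pred = toy["summary"].str.contains("hypertension")

errors = misclassified(toy, "hypertension-status", toy_pred)
print(errors[["summary", "error"]])
```

In the toy frame the plain keyword search flags the negated note and misses the abbreviation, the same two kinds of error seen in the notes printed above.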


## Regular Expressions
Quick-start reference: https://www.rexegg.com/regex-quickstart.php

### Can we clear up our False Negatives first?

In [59]:
htn_2 = dsum['summary'].str.contains("(?i)hypertension",regex=True)
fn(dsum['hypertension-status'],htn_2)

2
The patient is a 55-year-old male with intermittent asthma. He presented with wheezing, shortness of breath, and cough, which had worsened over the past 48 hours following a viral upper respiratory tract infection. History of HTN. Family history includes a mother with asthma and a brother with allergic rhinitis.

The patient was treated with nebulized albuterol and systemic corticosteroids. He showed rapid improvement and was discharged after 24 hours with instructions to use a prescribed inhaler (albuterol) and a 5-day course of prednisone.

The patient was advised to avoid triggers such as smoke and dust and to follow up with his pediatric pulmonologist within one week.


In [66]:
htn_3 = dsum['summary'].str.contains("(?i)(?:hypertension|htn)",regex=True)
fn(dsum['hypertension-status'],htn_3)

1
The patient is a 58-year-old female  presenting with severe headache, blurred vision, and chest discomfort, suggesting a hypertensive crisis. Her blood pressure was recorded at 220/130 mmHg upon arrival.

The patient was started on IV labetalol and was closely monitored in the ICU. Her blood pressure was controlled over 48 hours, and she was transitioned to oral antihypertensives. Upon discharge, she was instructed to monitor her blood pressure at home and follow up with her cardiologist in one week.


In [67]:
htn_4 = dsum['summary'].str.contains("(?i)(?:hypertension|htn|hypertensive)",regex=True)
fn(dsum['hypertension-status'],htn_4)

0
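
The two long alternatives share a stem, so the pattern can be written more compactly, and word boundaries (`\b`) stop `htn` from matching inside a longer token. A minimal sketch using Python's `re` module on hypothetical sentences (not taken from the workshop file); the name `HTN_PATTERN` is an assumption:

```python
import re

# hypertens(?:ion|ive) covers both word forms; \b requires whole-word matches
HTN_PATTERN = re.compile(r"\b(?:hypertens(?:ion|ive)|htn)\b", flags=re.IGNORECASE)

examples = [
    "History of HTN.",
    "Hypertension present for the past 5 years.",
    "Findings suggest a hypertensive crisis.",
    "Transitioned to oral antihypertensives.",
    "No cardiac history.",
]

results = [bool(HTN_PATTERN.search(text)) for text in examples]
for text, hit in zip(examples, results):
    print(hit, text)
```

Word boundaries are a trade-off: the stricter pattern no longer matches "antihypertensives", which the pattern used for `htn_4` does match, so it may lower sensitivity on real notes. The pattern string can also be passed to `str.contains` with `regex=True`; the `(?:...)` groups are non-capturing, which avoids the warning pandas emits for patterns that contain capture groups.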


### How do the False Positives look now after those changes?

In [68]:
fp(dsum['hypertension-status'],htn_4)

6
The patient is a 49-year-old male with a BMI of 35 and a history of gallstones, and struck by lightning 5 years ago. He presented with a 48-hour history of severe right upper quadrant pain, nausea, and vomiting. Ultrasound confirmed acute cholecystitis with no signs of gallbladder perforation or abscess formation. The patient has a known family history of gallbladder disease in his mother, who underwent cholecystectomy. His father had type 2 diabetes, and he is currently overweight with elevated blood sugar.

The patient underwent a laparoscopic cholecystectomy within 24 hours of presentation. Intraoperative findings included an inflamed but intact gallbladder with no complications. His post-operative recovery was uncomplicated, and he was started on clear liquids within 6 hours.

Discharge instructions included a low-fat diet for the next few weeks and recommendations for weight management. He is prescribed a 7-day course of oral antibiotics (ciprofloxacin) and pain management with 

## Negations
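
The first false positive shown earlier ("No history of hypertension.") is a negated mention. One simple approach, sketched here as an assumption rather than as the workshop's own solution, is to treat a mention as negated when a negation cue appears shortly before it in the same sentence. The cue list, the 40-character window, and the function name `has_affirmed_mention` are all illustrative choices.

```python
import re

MENTION = re.compile(r"(?:hypertension|hypertensive|htn)", re.IGNORECASE)
# Hypothetical cue list; real clinical text needs a much longer one
NEGATION_CUE = re.compile(
    r"\b(?:no|denies|denied|without|negative for)\b[^.]{0,40}$",
    re.IGNORECASE,
)

def has_affirmed_mention(text):
    """True if at least one mention has no negation cue before it in its sentence."""
    for match in MENTION.finditer(text):
        # Text from the start of the current sentence up to the mention
        sentence_start = text.rfind(".", 0, match.start()) + 1
        preceding = text[sentence_start:match.start()]
        if not NEGATION_CUE.search(preceding):
            return True
    return False

# Hypothetical sentences modelled on the notes above
print(has_affirmed_mention("He is a diabetic male. No history of hypertension."))
print(has_affirmed_mention("History of HTN."))
print(has_affirmed_mention("No fever. Hypertension present for the past 5 years."))
```

Splitting sentences on "." is crude (a decimal such as a lab value would break it), and the sketch does not distinguish the patient's own history from family history.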

## Uncertainty

## Extracting Values

## Family history

## LLM