# PSDL: Real ICU Data Demo (MIMIC-IV Demo)

## Patient Scenario Definition Language with 100 Real ICU Patients

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Chesterguan/PSDL/blob/main/notebooks/PSDL_Colab_MIMIC_Demo.ipynb)

---

**What is PSDL?**

> *What SQL became for data queries, PSDL aims to become for clinical logic.*

This notebook demonstrates PSDL using **real de-identified ICU data** from the MIMIC-IV Demo dataset:

- **100 real ICU patients** from Beth Israel Deaconess Medical Center
- **No credentials required** - freely available demo subset
- **Real clinical outcomes** - actual AKI, sepsis diagnoses (ICD codes)

---

## Step 1: Setup Environment (~1 minute)

In [None]:
# Install PSDL from PyPI
!pip install -q psdl-lang pandas numpy matplotlib requests

print("✓ Environment ready!")

In [None]:
# Core imports
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
import requests
import gzip
import io
import os

# PSDL imports - all from the installed package
from psdl import PSDLParser, PSDLEvaluator, InMemoryBackend
from psdl.examples import get_scenario, get_scenario_yaml

print("✓ Imports successful!")

## Step 2: Download MIMIC-IV Demo Data

The MIMIC-IV Demo is a publicly available subset containing 100 ICU patients.

**Source**: [PhysioNet MIMIC-IV Demo](https://physionet.org/content/mimic-iv-demo/2.2/)

In [None]:
# MIMIC-IV Demo data URLs (PhysioNet)
BASE_URL = "https://physionet.org/files/mimic-iv-demo/2.2"

# Files we need
FILES = {
    'labevents': f"{BASE_URL}/hosp/labevents.csv.gz",
    'patients': f"{BASE_URL}/hosp/patients.csv.gz",
    'admissions': f"{BASE_URL}/hosp/admissions.csv.gz",
    'd_labitems': f"{BASE_URL}/hosp/d_labitems.csv.gz",
    'diagnoses': f"{BASE_URL}/hosp/diagnoses_icd.csv.gz",
}

def download_file(url: str, name: str) -> pd.DataFrame:
    """Download a gzipped CSV file and load it into a DataFrame."""
    print(f"Downloading {name}...", end=" ")
    response = requests.get(url, timeout=60)
    if response.status_code != 200:
        # Fail fast: returning None here would only surface later as a confusing TypeError
        raise RuntimeError(
            f"Could not download {name} (status {response.status_code}). "
            "Check that https://physionet.org/content/mimic-iv-demo/2.2/ is reachable."
        )

    with gzip.GzipFile(fileobj=io.BytesIO(response.content)) as f:
        df = pd.read_csv(f)
    print(f"{len(df):,} rows")
    return df

# Download data
print("=== Downloading MIMIC-IV Demo Data ===\n")
labevents = download_file(FILES['labevents'], 'labevents')
patients = download_file(FILES['patients'], 'patients')
admissions = download_file(FILES['admissions'], 'admissions')
d_labitems = download_file(FILES['d_labitems'], 'd_labitems')
diagnoses = download_file(FILES['diagnoses'], 'diagnoses')

print(f"\nDownloaded data for {patients['subject_id'].nunique()} patients")
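
The decompression step can be exercised offline before hitting the network. The sketch below uses only the standard library and a tiny made-up CSV (not MIMIC data); `read_gzipped_csv` is a hypothetical helper name, not part of PSDL or pandas.

```python
import csv
import gzip
import io

def read_gzipped_csv(payload: bytes) -> list:
    """Decompress gzipped CSV bytes and return the rows as a list of dicts."""
    with gzip.GzipFile(fileobj=io.BytesIO(payload)) as f:
        text = io.TextIOWrapper(f, encoding="utf-8", newline="")
        return list(csv.DictReader(text))

# Build a tiny gzipped CSV in memory (made-up values, for illustration only)
raw = "subject_id,itemid,valuenum\n1,50912,1.2\n1,50912,1.9\n"
payload = gzip.compress(raw.encode("utf-8"))

rows = read_gzipped_csv(payload)
print(len(rows), rows[0]["itemid"])
```

Note that `csv.DictReader` returns every field as a string, whereas `pd.read_csv` infers numeric types.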

In [None]:
# Map lab item IDs to signal names
# These are the standard MIMIC-IV lab item IDs
LAB_MAPPINGS = {
    # Renal
    50912: 'Cr',       # Creatinine
    51006: 'BUN',      # Blood Urea Nitrogen
    
    # Electrolytes
    50971: 'K',        # Potassium
    50983: 'Na',       # Sodium
    
    # Hematology
    51222: 'Hgb',      # Hemoglobin
    51265: 'Plt',      # Platelet Count
    51301: 'WBC',      # White Blood Cells
    
    # Metabolic
    50813: 'Lact',     # Lactate
    50820: 'pH',       # pH
    50931: 'Glucose',  # Glucose
    50882: 'HCO3',     # Bicarbonate
}

# Filter to relevant labs only
relevant_labs = labevents[labevents['itemid'].isin(LAB_MAPPINGS.keys())].copy()
relevant_labs['signal'] = relevant_labs['itemid'].map(LAB_MAPPINGS)
relevant_labs['charttime'] = pd.to_datetime(relevant_labs['charttime'])

print(f"Filtered to {len(relevant_labs):,} relevant lab measurements")
print(f"\nLab distribution:")
print(relevant_labs['signal'].value_counts().head(10))

## Step 3: Extract Ground Truth Labels

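MIMIC-IV records diagnoses as ICD-9 or ICD-10 codes (distinguished by the `icd_version` column). We'll use ICD-10 codes to identify patients with:
- **AKI** (Acute Kidney Injury) - ICD-10 codes N17.x
- **Sepsis** - ICD-10 codes A41.x, R65.2x

Any admissions coded in ICD-9 are not matched by these codes, so the labels may undercount both conditions.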
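
Exact-match code lists can miss subcodes (for example, A41.89 is stored as `A4189`), so one alternative is to match on prefixes and check the code version explicitly. The following is a minimal standard-library sketch of that idea, not the notebook's method; `has_condition` and the example rows are hypothetical.

```python
from typing import Iterable

# Prefixes follow the ranges named above; codes are stored without the decimal point.
AKI_PREFIXES = ("N17",)
SEPSIS_PREFIXES = ("A41", "R652")

def has_condition(rows: Iterable[tuple], prefixes: tuple) -> bool:
    """True if any (icd_code, icd_version) row is an ICD-10 code starting with a prefix."""
    for code, version in rows:
        if int(version) == 10 and str(code).strip().upper().startswith(prefixes):
            return True
    return False

# Hypothetical rows for one patient: (icd_code, icd_version)
example_rows = [("5849", 9), ("A4189", 10), ("I10", 10)]

# 5849 is an ICD-9 code (acute kidney failure, unspecified), so N17 matching skips it
print(has_condition(example_rows, AKI_PREFIXES))
# A4189 falls under A41.x
print(has_condition(example_rows, SEPSIS_PREFIXES))
```

Applied to a DataFrame, the same idea would filter on `icd_version == 10` and use `str.startswith` on `icd_code`.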

In [None]:
# ICD-10 codes for conditions
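# MIMIC-IV stores ICD codes without the decimal point (N17.9 -> 'N179')
AKI_CODES = ['N170', 'N171', 'N172', 'N178', 'N179']  # Acute kidney failure (N17.x)
# A41.0 and A41.8 are only coded via their subcodes (A41.01/A41.02, A41.81/A41.89)
SEPSIS_CODES = ['A4101', 'A4102', 'A411', 'A412', 'A413', 'A414', 'A4150', 'A4151', 'A4152',
                'A4153', 'A4159', 'A4181', 'A4189', 'A419', 'R6520', 'R6521']  # Sepsis (A41.x, R65.2x)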

# Find patients with each condition
aki_patients = set(diagnoses[diagnoses['icd_code'].isin(AKI_CODES)]['subject_id'].unique())
sepsis_patients = set(diagnoses[diagnoses['icd_code'].isin(SEPSIS_CODES)]['subject_id'].unique())

print(f"=== Ground Truth from ICD Codes ===")
print(f"\nPatients with AKI diagnosis: {len(aki_patients)}")
print(f"Patients with Sepsis diagnosis: {len(sepsis_patients)}")
print(f"Total unique patients: {patients['subject_id'].nunique()}")

# Create ground truth dictionary
ground_truth = {}
for pid in patients['subject_id'].unique():
    conditions = []
    if pid in aki_patients:
        conditions.append('aki')
    if pid in sepsis_patients:
        conditions.append('sepsis')
    ground_truth[pid] = conditions if conditions else ['normal']

## Step 4: Load Data into PSDL Backend

In [None]:
# Create in-memory backend
backend = InMemoryBackend()

# Load lab data into backend
loaded = 0
for _, row in relevant_labs.iterrows():
    if pd.notna(row['valuenum']):
        backend.add_observation(
            patient_id=str(row['subject_id']),
            signal_name=row['signal'],
            value=float(row['valuenum']),
            timestamp=row['charttime'].to_pydatetime()
        )
        loaded += 1

print(f"Loaded {loaded:,} observations into PSDL backend")
print(f"Patients: {len(set(relevant_labs['subject_id']))}")
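
Conceptually, loading turns flat lab rows into one time-ordered series per patient and signal. The sketch below illustrates that shape with the standard library only. It is an assumption about what a time-series store needs, not a description of how `InMemoryBackend` is implemented; `build_index` and the sample observations are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def build_index(observations):
    """Group (patient_id, signal, timestamp, value) tuples into time-sorted series."""
    index = defaultdict(list)
    for patient_id, signal, ts, value in observations:
        if value is None:  # skip missing values, as the loading loop above does
            continue
        index[(patient_id, signal)].append((ts, float(value)))
    for series in index.values():
        series.sort()  # chronological order; ties fall back to value order
    return dict(index)

# Made-up observations, deliberately out of order
obs = [
    ("p1", "Cr", datetime(2024, 1, 2, 6), 1.4),
    ("p1", "Cr", datetime(2024, 1, 1, 6), 0.9),
    ("p1", "K", datetime(2024, 1, 1, 6), None),
    ("p2", "Cr", datetime(2024, 1, 1, 9), 1.1),
]
index = build_index(obs)
print(index[("p1", "Cr")])
```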

## Step 5: Load PSDL Scenarios

In [None]:
# Load scenarios using built-in examples
scenarios = {
    'AKI': get_scenario('aki_detection'),
    'Lactic Acidosis': get_scenario('lactic_acidosis'),
}

print("Loaded PSDL Scenarios:")
for name, scenario in scenarios.items():
    print(f"\n{name}:")
    print(f"  Signals: {list(scenario.signals.keys())}")
    print(f"  Logic Rules: {list(scenario.logic.keys())[:5]}...")

In [None]:
# View AKI scenario details
print("=== AKI Detection Logic ===")
print("\nTrends (temporal computations):")
for name, trend in list(scenarios['AKI'].trends.items())[:5]:
    print(f"  {name}: {trend.raw_expr}")

print("\nLogic Rules (alerts):")
for name, logic in scenarios['AKI'].logic.items():
    print(f"  {name} [{logic.severity}]: {logic.expr}")

## Step 6: Run PSDL Evaluation

In [None]:
# Get the latest timestamp for each patient (evaluation time)
patient_eval_times = relevant_labs.groupby('subject_id')['charttime'].max().to_dict()

# Evaluate all patients
results = []

for patient_id in patients['subject_id'].unique():
    pid_str = str(patient_id)
    
    # Skip if no lab data
    if patient_id not in patient_eval_times:
        continue
    
    eval_time = patient_eval_times[patient_id].to_pydatetime()
    patient_results = {
        'patient_id': patient_id,
        'ground_truth': ground_truth.get(patient_id, ['unknown']),
    }
    
    for scenario_name, scenario in scenarios.items():
        evaluator = PSDLEvaluator(scenario, backend)
        result = evaluator.evaluate_patient(pid_str, eval_time)
        
        patient_results[f'{scenario_name}_triggered'] = result.is_triggered
        patient_results[f'{scenario_name}_rules'] = result.triggered_logic if result.is_triggered else []
    
    results.append(patient_results)

df = pd.DataFrame(results)
print(f"Evaluated {len(df)} patients")

In [None]:
# Compare PSDL detection to ICD diagnoses
print("=== PSDL vs ICD Diagnosis Comparison ===\n")

# AKI Detection Performance
df['has_aki'] = df['ground_truth'].apply(lambda x: 'aki' in x)

aki_tp = df[(df['has_aki']) & (df['AKI_triggered'])].shape[0]
aki_fn = df[(df['has_aki']) & (~df['AKI_triggered'])].shape[0]
aki_fp = df[(~df['has_aki']) & (df['AKI_triggered'])].shape[0]
aki_tn = df[(~df['has_aki']) & (~df['AKI_triggered'])].shape[0]

aki_sensitivity = aki_tp / (aki_tp + aki_fn) if (aki_tp + aki_fn) > 0 else 0
aki_specificity = aki_tn / (aki_tn + aki_fp) if (aki_tn + aki_fp) > 0 else 0

print(f"AKI Detection:")
print(f"  ICD-diagnosed AKI patients: {df['has_aki'].sum()}")
print(f"  PSDL detected: {df['AKI_triggered'].sum()}")
print(f"  True Positives: {aki_tp}")
print(f"  False Negatives: {aki_fn}")
print(f"  Sensitivity: {aki_sensitivity:.1%}")
print(f"  Specificity: {aki_specificity:.1%}")

# Lactic Acidosis (Sepsis marker)
df['has_sepsis'] = df['ground_truth'].apply(lambda x: 'sepsis' in x)

sepsis_tp = df[(df['has_sepsis']) & (df['Lactic Acidosis_triggered'])].shape[0]
sepsis_fn = df[(df['has_sepsis']) & (~df['Lactic Acidosis_triggered'])].shape[0]

print(f"\nLactic Acidosis (Sepsis marker):")
print(f"  ICD-diagnosed Sepsis patients: {df['has_sepsis'].sum()}")
print(f"  PSDL Lactic Acidosis detected: {df['Lactic Acidosis_triggered'].sum()}")
print(f"  Overlap (Sepsis + Lactic Acidosis): {sepsis_tp}")
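
With only about 100 patients, point estimates of sensitivity and specificity are noisy, so it can help to report an interval alongside them. The sketch below computes the usual confusion-matrix ratios and a Wilson score interval with the standard library. The counts are made up for illustration and are not results from this notebook, and both function names are hypothetical.

```python
import math

def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV; None when a denominator is zero."""
    def ratio(num, den):
        return num / den if den > 0 else None
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Made-up counts for illustration only
m = confusion_metrics(tp=8, fn=2, fp=5, tn=85)
low, high = wilson_interval(8, 10)
print(f"sensitivity={m['sensitivity']:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

To apply it here, pass `aki_tp`, `aki_fn`, `aki_fp`, and `aki_tn` to `confusion_metrics`, and call `wilson_interval(aki_tp, aki_tp + aki_fn)` for the sensitivity interval.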

In [None]:
# Visualize Results
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: AKI Detection Confusion Matrix
matrix_aki = np.array([[aki_tp, aki_fn], [aki_fp, aki_tn]])
im1 = axes[0].imshow(matrix_aki, cmap='RdYlGn', aspect='auto')
axes[0].set_xticks([0, 1])
axes[0].set_yticks([0, 1])
axes[0].set_xticklabels(['PSDL+', 'PSDL-'])
axes[0].set_yticklabels(['ICD AKI+', 'ICD AKI-'])
for i in range(2):
    for j in range(2):
        axes[0].text(j, i, matrix_aki[i, j], ha='center', va='center', fontsize=16, fontweight='bold')
axes[0].set_title(f'AKI Detection\nSens: {aki_sensitivity:.0%}, Spec: {aki_specificity:.0%}', fontweight='bold')

# Right: Venn-like summary
categories = ['ICD Only', 'Both', 'PSDL Only', 'Neither']
values = [aki_fn, aki_tp, aki_fp, aki_tn]
colors = ['#e74c3c', '#27ae60', '#f39c12', '#95a5a6']

bars = axes[1].bar(categories, values, color=colors, edgecolor='#2c3e50')
axes[1].set_ylabel('Number of Patients')
axes[1].set_title('PSDL vs ICD Diagnosis Agreement', fontweight='bold')
for bar, val in zip(bars, values):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, str(val), 
                 ha='center', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.suptitle('PSDL Validation on Real MIMIC-IV Demo Data', fontsize=14, fontweight='bold', y=1.02)
plt.show()

## Step 7: Detailed Patient Analysis

In [None]:
# Show patients where PSDL detected AKI
print("=== Patients with PSDL AKI Detection ===\n")

aki_detected = df[df['AKI_triggered']]
for _, row in aki_detected.head(10).iterrows():
    icd_status = "ICD+" if row['has_aki'] else "ICD-"
    rules = ', '.join(row['AKI_rules'][:3]) if row['AKI_rules'] else 'None'
    print(f"Patient {row['patient_id']}: {icd_status}")
    print(f"  PSDL Rules: {rules}")
    print()

In [None]:
# Visualize a specific patient's creatinine trajectory
def plot_patient_trajectory(patient_id: int):
    """Plot lab trajectory for a specific patient."""
    patient_labs = relevant_labs[relevant_labs['subject_id'] == patient_id]
    cr_labs = patient_labs[patient_labs['signal'] == 'Cr'].sort_values('charttime')
    
    if len(cr_labs) == 0:
        print(f"No creatinine data for patient {patient_id}")
        return
    
    fig, ax = plt.subplots(figsize=(10, 4))
    
    ax.plot(cr_labs['charttime'], cr_labs['valuenum'], 'o-', color='#3498db', linewidth=2, markersize=6)
    ax.axhline(y=4.0, color='red', linestyle='--', alpha=0.7, label='KDIGO Stage 3 (Cr >= 4.0)')
    ax.axhline(y=1.5, color='orange', linestyle='--', alpha=0.5, label='Elevated')
    
    gt = ground_truth.get(patient_id, ['unknown'])
    psdl_result = df[df['patient_id'] == patient_id]['AKI_triggered'].values
    # The patient may be absent from df (no lab data), so check before indexing
    detected = len(psdl_result) > 0 and bool(psdl_result[0])
    psdl_status = "DETECTED" if detected else "not detected"
    
    ax.set_title(f'Patient {patient_id} - Creatinine Trajectory\n'
                 f'ICD: {gt}, PSDL: {psdl_status}', fontweight='bold')
    ax.set_xlabel('Time')
    ax.set_ylabel('Creatinine (mg/dL)')
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

# Plot a patient with AKI (sorted so the choice is reproducible across runs)
if len(aki_patients) > 0:
    sample_aki_patient = sorted(aki_patients)[0]
    plot_patient_trajectory(sample_aki_patient)

## Key Insights

### What This Demo Shows

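1. **Real Data Works**: PSDL processes actual ICU lab data from MIMIC-IV end to end
2. **ICD Alignment**: The sensitivity and specificity above quantify how closely PSDL detection agrees with ICD diagnosis codes
3. **Earlier Detection**: PSDL may detect abnormalities before formal diagnosis; this demo evaluates each patient only at their last lab timestamp, so it does not measure lead time
4. **Portable Logic**: The same PSDL scenarios are intended to run on other data sources such as OMOP or FHIR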

### Why PSDL + ICD May Differ

- **Timing**: ICD codes are assigned at discharge; PSDL detects in real-time
- **Criteria**: PSDL uses KDIGO criteria; ICD codes reflect the diagnoses documented during the stay, which may rest on different criteria
- **Scope**: PSDL detects lab patterns; ICD may include clinical judgment
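
To make the timing and criteria points concrete, here is a minimal sketch of the two creatinine-based KDIGO AKI criteria (a rise of at least 0.3 mg/dL within 48 hours, or a rise to at least 1.5 times baseline within 7 days) and of finding the first time either is met. It is not the `aki_detection` scenario shipped with PSDL. It omits the urine-output criterion and assumes the baseline is the lowest value in the look-back window. The function names and the series are hypothetical.

```python
from datetime import datetime, timedelta

def kdigo_creatinine_flags(series, now):
    """series: list of (timestamp, creatinine in mg/dL). Returns criteria met at `now`."""
    past = sorted((t, v) for t, v in series if t <= now)
    if not past:
        return []
    current = past[-1][1]
    flags = []
    last_48h = [v for t, v in past if now - t <= timedelta(hours=48)]
    # round() guards against float noise such as 1.2 - 0.9 = 0.29999999999999993
    if round(current - min(last_48h), 2) >= 0.3:
        flags.append("rise_0.3_in_48h")
    last_7d = [v for t, v in past if now - t <= timedelta(days=7)]
    if current >= 1.5 * min(last_7d):  # assumption: baseline = lowest value in 7 days
        flags.append("ratio_1.5_in_7d")
    return flags

def first_detection(series):
    """Earliest observation time at which any criterion is met, else None."""
    for t, _ in sorted(series):
        if kdigo_creatinine_flags(series, t):
            return t
    return None

# Made-up creatinine trajectory for illustration only
base = datetime(2024, 1, 1, 8)
series = [
    (base, 0.9),
    (base + timedelta(hours=24), 1.0),
    (base + timedelta(hours=40), 1.3),
    (base + timedelta(hours=60), 1.6),
]
print(first_detection(series))
print(kdigo_creatinine_flags(series, base + timedelta(hours=60)))
```

In this made-up series the absolute-rise criterion fires at the third measurement, before the ratio criterion does. A diagnosis code assigned at discharge carries no such timestamp.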

---

**Next Steps**:
- Try with your own OMOP/FHIR data
- Customize detection thresholds
- Build real-time alerting pipelines

**Learn More**: [PSDL GitHub](https://github.com/Chesterguan/PSDL)