# PSDL: Try It in 2 Minutes!

## Patient Scenario Definition Language - Interactive Demo

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Chesterguan/PSDL/blob/main/notebooks/PSDL_Colab_Synthea.ipynb)

---

**What is PSDL?**

> *What SQL became for data queries, PSDL aims to become for clinical logic.*

PSDL is an open, vendor-neutral standard for expressing clinical detection scenarios. This notebook demonstrates:

1. **Zero Setup** - Everything runs in your browser
2. **Synthetic Patients** - Synthea-style data generated directly in the notebook
3. **Real Clinical Scenarios** - AKI, hyperkalemia, and sepsis (via lactic acidosis) detection
4. **Instant Results** - See PSDL in action in under 2 minutes

---

## Step 1: Install PSDL (30 seconds)

```python
# Install PSDL from PyPI
!pip install -q psdl-lang

print("✓ PSDL installed!")
```
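You may want to confirm which release of `psdl-lang` pip installed, for example when reporting an issue. The standard library's `importlib.metadata` can report it. `installed_version` below is a small helper introduced here; it is not part of PSDL.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("psdl-lang version:", installed_version("psdl-lang"))
```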

```python
# Imports
from datetime import datetime, timedelta
import random
import pandas as pd
import numpy as np

# PSDL imports - all from the installed package
from psdl import PSDLParser, PSDLEvaluator, InMemoryBackend
from psdl.examples import get_scenario, list_scenarios, get_scenario_yaml

print("✓ PSDL Ready!")
print(f"\nAvailable built-in scenarios: {list_scenarios()}")
```

## Step 2: Generate Synthetic Patient Data

We'll create simple synthetic ICU lab trajectories (one measurement per hour for 24 hours) for four groups:
- **Normal patients** - stable labs
- **AKI patients** - rising creatinine
- **Hyperkalemia patients** - rising potassium
- **Sepsis patients** - rising lactate and WBC

```python
def generate_patient_data(patient_id: str, condition: str, base_time: datetime) -> dict:
    """
    Generate synthetic hourly lab data for one patient over the 24 hours ending at base_time.

    Conditions:
    - 'normal': Stable labs within normal ranges
    - 'aki': Rising creatinine (acute kidney injury)
    - 'hyperkalemia': Rising potassium
    - 'sepsis': Rising lactate and WBC
    """
    if condition not in ('normal', 'aki', 'hyperkalemia', 'sepsis'):
        raise ValueError(f"Unknown condition: {condition!r}")

    data = {'patient_id': patient_id, 'condition': condition, 'observations': []}

    # Generate 24 hours of data (hourly measurements); the last one is at base_time
    for hour in range(24):
        timestamp = base_time - timedelta(hours=23 - hour)
        progress = hour / 23  # 0.0 at the first measurement, 1.0 at the last

        if condition == 'normal':
            cr = random.gauss(1.0, 0.1)       # Normal creatinine: ~1.0
            k = random.gauss(4.0, 0.2)        # Normal potassium: ~4.0
            lactate = random.gauss(1.0, 0.2)  # Normal lactate: ~1.0
            wbc = random.gauss(8.0, 1.0)      # Normal WBC: ~8.0

        elif condition == 'aki':
            # Creatinine rises from ~1.0 to ~4.5 over 24h
            cr = 1.0 + progress * 3.5 + random.gauss(0, 0.1)
            k = random.gauss(4.5, 0.3)  # Slightly elevated K (common in AKI)
            lactate = random.gauss(1.5, 0.3)
            wbc = random.gauss(10.0, 1.5)

        elif condition == 'hyperkalemia':
            cr = random.gauss(1.5, 0.2)
            # Potassium rises from ~4.5 to ~7.0
            k = 4.5 + progress * 2.5 + random.gauss(0, 0.1)
            lactate = random.gauss(1.2, 0.2)
            wbc = random.gauss(9.0, 1.0)

        else:  # 'sepsis'
            cr = random.gauss(1.8, 0.3)  # Mild AKI common in sepsis
            k = random.gauss(4.2, 0.3)
            # Lactate rises from ~1.5 to ~5.0
            lactate = 1.5 + progress * 3.5 + random.gauss(0, 0.2)
            # WBC rises from ~10 to ~18
            wbc = 10.0 + progress * 8.0 + random.gauss(0, 1.0)

        # Add observations, clamped at a minimum value so noise never produces implausible labs
        data['observations'].extend([
            {'signal': 'Cr', 'value': max(0.3, cr), 'timestamp': timestamp},
            {'signal': 'K', 'value': max(2.5, k), 'timestamp': timestamp},
            {'signal': 'Lact', 'value': max(0.5, lactate), 'timestamp': timestamp},
            {'signal': 'WBC', 'value': max(1.0, wbc), 'timestamp': timestamp},
        ])

    return data


# Generate cohort: 20 patients, 5 per condition
random.seed(42)  # Reproducibility
base_time = datetime(2024, 1, 15, 12, 0, 0)

conditions = ['normal', 'aki', 'hyperkalemia', 'sepsis']
patients = []
ground_truth = {}  # Known label for each patient

# IDs run P001-P005 (normal), P006-P010 (AKI), P011-P015 (hyperkalemia), P016-P020 (sepsis)
for c_idx, condition in enumerate(conditions):
    for j in range(5):
        patient_id = f'P{c_idx * 5 + j + 1:03d}'
        patients.append(generate_patient_data(patient_id, condition, base_time))
        ground_truth[patient_id] = condition

print(f"Generated {len(patients)} synthetic patients:")
print("  - 5 Normal (stable labs)")
print("  - 5 AKI (rising creatinine)")
print("  - 5 Hyperkalemia (rising potassium)")
print("  - 5 Sepsis (rising lactate + WBC)")
print(f"\nTotal observations: {sum(len(p['observations']) for p in patients):,}")
```
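Before plotting, you can sanity-check that each group follows its intended trajectory. The pandas sketch below does this with two helpers introduced here, `observations_to_frame` and `first_last_summary`; neither is part of PSDL. It flattens the nested dicts into one row per observation, then averages each patient's first and last value by condition and signal.

```python
from typing import Dict, List

import pandas as pd


def observations_to_frame(patients: List[Dict]) -> pd.DataFrame:
    """Flatten nested patient dicts into one row per observation."""
    rows = [
        {"patient_id": p["patient_id"], "condition": p["condition"], **obs}
        for p in patients
        for obs in p["observations"]
    ]
    return pd.DataFrame(rows)


def first_last_summary(obs_df: pd.DataFrame) -> pd.DataFrame:
    """Average each patient's first and last value, grouped by condition and signal."""
    # groupby keeps this time order within each group, so first/last are earliest/latest
    ordered = obs_df.sort_values("timestamp")
    per_patient = ordered.groupby(["condition", "signal", "patient_id"])["value"].agg(["first", "last"])
    return per_patient.groupby(level=["condition", "signal"]).mean().round(2)
```

In the notebook, `print(first_last_summary(observations_to_frame(patients)))` should show creatinine in the AKI group climbing from roughly 1 toward roughly 4.5, while the normal group stays near 1.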

```python
# Visualize sample patient trajectories
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Sample one patient from each condition
sample_patients = {
    'Normal': patients[0],
    'AKI': patients[5],
    'Hyperkalemia': patients[10],
    'Sepsis': patients[15]
}

signal_map = {'Normal': 'Cr', 'AKI': 'Cr', 'Hyperkalemia': 'K', 'Sepsis': 'Lact'}
# Reference lines for the plots only; the PSDL scenarios apply their own criteria
thresholds = {'Cr': 4.0, 'K': 6.0, 'Lact': 4.0}
colors = {'Normal': '#27ae60', 'AKI': '#e74c3c', 'Hyperkalemia': '#9b59b6', 'Sepsis': '#e67e22'}

for ax, (name, patient) in zip(axes.flat, sample_patients.items()):
    signal = signal_map[name]
    obs = [o for o in patient['observations'] if o['signal'] == signal]
    hours = list(range(len(obs)))
    values = [o['value'] for o in obs]

    ax.plot(hours, values, color=colors[name], linewidth=2, marker='o', markersize=3)
    ax.axhline(y=thresholds[signal], color='red', linestyle='--', alpha=0.5, label=f'Threshold ({signal})')
    ax.set_title(f'{name} Patient ({patient["patient_id"]})', fontweight='bold')
    ax.set_xlabel('Hours since first measurement')
    ax.set_ylabel(f'{signal} Value')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.suptitle('Synthetic Patient Lab Trajectories', fontsize=14, fontweight='bold', y=1.02)
plt.show()
```
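The dashed lines are display aids, not the scenarios' criteria. To find when a sample patient first reaches one of these plotted reference values, you can scan the observations with `first_crossing`, a small helper introduced here (it is not a PSDL function).

```python
from datetime import datetime
from typing import Dict, List, Optional


def first_crossing(observations: List[Dict], signal: str, threshold: float) -> Optional[datetime]:
    """Return the earliest timestamp at which `signal` is >= `threshold`, or None if it never is."""
    hits = [o["timestamp"] for o in observations
            if o["signal"] == signal and o["value"] >= threshold]
    return min(hits) if hits else None


# Usage in the notebook (relies on sample_patients, signal_map, thresholds from the cell above):
# for name, patient in sample_patients.items():
#     sig = signal_map[name]
#     print(name, first_crossing(patient['observations'], sig, thresholds[sig]))
```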

## Step 3: Load PSDL Scenarios

PSDL scenarios are defined in YAML - human-readable, version-controllable, and portable across systems.

```python
# View the AKI detection scenario (built into the package)
print(get_scenario_yaml("aki_detection")[:2000])
```

```python
# Load scenarios using built-in examples
scenarios = {
    'AKI': get_scenario('aki_detection'),
    'Hyperkalemia': get_scenario('hyperkalemia_detection'),
    'Lactic Acidosis': get_scenario('lactic_acidosis'),
}

print("Loaded PSDL Scenarios:")
for name, scenario in scenarios.items():
    print(f"\n{name}:")
    print(f"  Signals: {list(scenario.signals.keys())}")
    print(f"  Trends: {len(scenario.trends)}")
    print(f"  Logic Rules: {len(scenario.logic)}")
```

## Step 4: Run PSDL Evaluation

Now let's evaluate all patients against our scenarios and compare with ground truth!

```python
# Create in-memory backend with our synthetic data
backend = InMemoryBackend()

# Load patient data into backend
for patient in patients:
    for obs in patient['observations']:
        backend.add_observation(
            patient_id=patient['patient_id'],
            signal_name=obs['signal'],
            value=obs['value'],
            timestamp=obs['timestamp']
        )

print(f"Loaded {backend.observation_count()} observations into memory backend")
```
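`InMemoryBackend` hides how observations are stored and queried. For illustration only, the toy class below keeps each patient's series sorted by time. It answers the question a temporal evaluator needs: which values of a signal fall in a window ending at the evaluation time? The class is hypothetical and is not PSDL's implementation.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta
from typing import DefaultDict, List, Tuple

Point = Tuple[datetime, float]  # (timestamp, value)


class ToyBackend:
    """Hypothetical store: patient -> signal -> list of (timestamp, value) kept in time order.

    Not psdl's InMemoryBackend; it only illustrates the windowed lookup a
    temporal evaluator needs from whatever data source sits behind it.
    """

    def __init__(self) -> None:
        self._data: DefaultDict[str, DefaultDict[str, List[Point]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add_observation(self, patient_id: str, signal_name: str,
                        value: float, timestamp: datetime) -> None:
        # insort keeps each series sorted by timestamp regardless of arrival order
        bisect.insort(self._data[patient_id][signal_name], (timestamp, value))

    def fetch(self, patient_id: str, signal_name: str,
              end: datetime, window: timedelta) -> List[Point]:
        """Return points with end - window <= timestamp <= end, oldest first."""
        series = self._data.get(patient_id, {}).get(signal_name, [])
        # (t,) sorts before every (t, value), so bisect_left finds the first point at time t
        lo = bisect.bisect_left(series, (end - window,))
        # (end, inf) sorts after every (end, value), so bisect_right keeps all points at `end`
        hi = bisect.bisect_right(series, (end, float("inf")))
        return series[lo:hi]
```

The evaluator only needs windowed lookups like `fetch`. Any store that can answer them, such as a database query or a FHIR client, could in principle stand in without changing the scenario.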

```python
# Evaluate all patients against all scenarios
results = []

for patient in patients:
    patient_id = patient['patient_id']
    patient_results = {'patient_id': patient_id, 'ground_truth': ground_truth[patient_id]}

    for scenario_name, scenario in scenarios.items():
        evaluator = PSDLEvaluator(scenario, backend)
        result = evaluator.evaluate_patient(patient_id, base_time)

        patient_results[f'{scenario_name}_triggered'] = result.is_triggered
        patient_results[f'{scenario_name}_rules'] = result.triggered_logic if result.is_triggered else []

    results.append(patient_results)

# Convert to DataFrame
df = pd.DataFrame(results)
print("=== PSDL Evaluation Results ===\n")
print(df[['patient_id', 'ground_truth', 'AKI_triggered', 'Hyperkalemia_triggered', 'Lactic Acidosis_triggered']].to_string(index=False))
```

```python
# Calculate detection metrics
print("=== Detection Performance ===\n")

# AKI Detection
aki_patients = df[df['ground_truth'] == 'aki']
aki_detected = aki_patients['AKI_triggered'].sum()
print("AKI Detection:")
print(f"  True Positives: {aki_detected}/{len(aki_patients)} ({aki_detected / len(aki_patients):.0%})")
print(f"  False Positives: {df[(df['ground_truth'] != 'aki') & (df['AKI_triggered'])].shape[0]}")

# Hyperkalemia Detection
hk_patients = df[df['ground_truth'] == 'hyperkalemia']
hk_detected = hk_patients['Hyperkalemia_triggered'].sum()
print("\nHyperkalemia Detection:")
print(f"  True Positives: {hk_detected}/{len(hk_patients)} ({hk_detected / len(hk_patients):.0%})")
print(f"  False Positives: {df[(df['ground_truth'] != 'hyperkalemia') & (df['Hyperkalemia_triggered'])].shape[0]}")

# Lactic Acidosis (Sepsis marker)
sepsis_patients = df[df['ground_truth'] == 'sepsis']
sepsis_detected = sepsis_patients['Lactic Acidosis_triggered'].sum()
print("\nLactic Acidosis Detection (Sepsis marker):")
print(f"  True Positives: {sepsis_detected}/{len(sepsis_patients)} ({sepsis_detected / len(sepsis_patients):.0%})")
print(f"  False Positives: {df[(df['ground_truth'] != 'sepsis') & (df['Lactic Acidosis_triggered'])].shape[0]}")
```
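The cell above repeats the same arithmetic three times. One option is to refactor it into a single helper, `detection_metrics` (a name introduced here). The helper also reports positive predictive value (PPV), which matters when a scenario fires on patients from other groups.

```python
import pandas as pd


def detection_metrics(df: pd.DataFrame, triggered_col: str, condition: str) -> dict:
    """Confusion counts plus sensitivity, specificity and PPV for one scenario vs. one label."""
    actual = df["ground_truth"] == condition
    flagged = df[triggered_col].astype(bool)
    tp = int((actual & flagged).sum())
    fn = int((actual & ~flagged).sum())
    fp = int((~actual & flagged).sum())
    tn = int((~actual & ~flagged).sum())
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        # NaN when the denominator is empty, e.g. a label with no patients
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "PPV": tp / (tp + fp) if tp + fp else float("nan"),
    }
```

For example, `pd.DataFrame({name: detection_metrics(df, col, cond) for name, (col, cond) in {'AKI': ('AKI_triggered', 'aki'), 'Hyperkalemia': ('Hyperkalemia_triggered', 'hyperkalemia'), 'Lactic Acidosis': ('Lactic Acidosis_triggered', 'sepsis')}.items()}).T` gives one row per scenario.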

```python
# Visualize results as per-scenario confusion matrices
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

scenario_mapping = {
    'AKI': ('AKI_triggered', 'aki'),
    'Hyperkalemia': ('Hyperkalemia_triggered', 'hyperkalemia'),
    'Lactic Acidosis': ('Lactic Acidosis_triggered', 'sepsis')
}

for ax, (scenario_name, (col, condition)) in zip(axes, scenario_mapping.items()):
    # Calculate confusion matrix
    tp = df[(df['ground_truth'] == condition) & (df[col])].shape[0]
    fn = df[(df['ground_truth'] == condition) & (~df[col])].shape[0]
    fp = df[(df['ground_truth'] != condition) & (df[col])].shape[0]
    tn = df[(df['ground_truth'] != condition) & (~df[col])].shape[0]

    matrix = np.array([[tp, fn], [fp, tn]])
    vmax = max(5, matrix.max())

    # Single-hue colormap: color encodes the count only, not good vs. bad outcomes
    ax.imshow(matrix, cmap='Blues', aspect='auto', vmin=0, vmax=vmax)

    # Labels
    ax.set_xticks([0, 1])
    ax.set_yticks([0, 1])
    ax.set_xticklabels(['Triggered', 'Not triggered'])
    ax.set_yticklabels(['Actual +', 'Actual -'])

    # Add counts, using white text on the darker cells
    for i in range(2):
        for j in range(2):
            ax.text(j, i, matrix[i, j], ha='center', va='center',
                    fontsize=16, fontweight='bold',
                    color='white' if matrix[i, j] > vmax / 2 else 'black')

    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
    ax.set_title(f'{scenario_name}\nSens: {sensitivity:.0%}, Spec: {specificity:.0%}', fontweight='bold')

plt.suptitle('PSDL Detection Performance on Synthetic Data', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
```

## Step 5: Explore PSDL Logic Details

Let's look at exactly which rules triggered for specific patients.

```python
# Show detailed results for AKI patients
print("=== AKI Patient Details ===\n")

for _, row in df[df['ground_truth'] == 'aki'].iterrows():
    status = "DETECTED" if row['AKI_triggered'] else "MISSED"
    # Show at most the first three triggered rules
    rules = row['AKI_rules'][:3] if row['AKI_rules'] else ['None']
    print(f"{row['patient_id']}: {status}")
    print(f"  Triggered rules: {', '.join(map(str, rules))}")
    print()
```

```python
# Interactive: Evaluate a specific patient
def evaluate_patient_detail(patient_id: str):
    """Show detailed evaluation for a single patient."""
    print(f"\n=== Detailed Evaluation: {patient_id} ===\n")
    print(f"Ground Truth: {ground_truth[patient_id]}\n")

    for scenario_name, scenario in scenarios.items():
        evaluator = PSDLEvaluator(scenario, backend)
        result = evaluator.evaluate_patient(patient_id, base_time)

        status = "TRIGGERED" if result.is_triggered else "not triggered"
        print(f"{scenario_name}: {status}")

        if result.is_triggered:
            print(f"  Rules: {result.triggered_logic}")
            if hasattr(result, 'trend_values'):
                print(f"  Trend values: {result.trend_values}")
        print()

# Try it!
evaluate_patient_detail('P006')  # An AKI patient
evaluate_patient_detail('P011')  # A hyperkalemia patient
```

## Key Takeaways

### What You Just Saw

1. **Declarative Scenarios**: Clinical logic defined in YAML, not code
2. **Temporal Operators**: `delta()`, `slope()`, `last()` for trend detection
3. **Portable Logic**: Same scenario works with any data backend
4. **Labeled Evaluation**: Sensitivity and specificity checked against the known labels of the synthetic cohort
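To build intuition for the temporal operators, here is a plain-Python sketch of what `last()`, `delta()`, and `slope()` might compute over a series of `(timestamp, value)` pairs. The semantics are assumptions:

- `delta` is the newest minus the oldest value in the window.
- `slope` is an ordinary least-squares fit, in units per hour.
- Window bounds are inclusive.

PSDL's actual definitions, units, and handling of sparse data may differ, so check the specification before relying on these.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

TimedValues = List[Tuple[datetime, float]]  # (timestamp, value) pairs


def _in_window(series: TimedValues, end: datetime, window: timedelta) -> TimedValues:
    """Points with end - window <= timestamp <= end, oldest first."""
    return sorted((t, v) for t, v in series if end - window <= t <= end)


def last(series: TimedValues, end: datetime) -> Optional[float]:
    """Most recent value at or before `end` (assumed semantics)."""
    past = [(t, v) for t, v in series if t <= end]
    return max(past, key=lambda tv: tv[0])[1] if past else None


def delta(series: TimedValues, end: datetime, window: timedelta) -> Optional[float]:
    """Newest minus oldest value inside the window (assumed semantics)."""
    pts = _in_window(series, end, window)
    return pts[-1][1] - pts[0][1] if len(pts) >= 2 else None


def slope(series: TimedValues, end: datetime, window: timedelta) -> Optional[float]:
    """Least-squares slope inside the window, in value units per hour (assumed semantics)."""
    pts = _in_window(series, end, window)
    if len(pts) < 2:
        return None
    xs = [(t - pts[0][0]).total_seconds() / 3600 for t, _ in pts]
    ys = [v for _, v in pts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return None  # all points share one timestamp
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx
```

Applied to the notebook data, `cr = [(o['timestamp'], o['value']) for o in patients[5]['observations'] if o['signal'] == 'Cr']` followed by `slope(cr, base_time, timedelta(hours=6))` gives the recent creatinine trend for patient P006.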

### Next Steps

- **Real Data**: Try with MIMIC-IV or your own OMOP/FHIR data
- **Custom Scenarios**: Write your own detection logic
- **Integration**: Connect to your EHR or data warehouse
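When writing a custom scenario, it can help to prototype the rule in plain Python first. You can then compare it with the PSDL version on the same synthetic cohort. The example below encodes the serum-creatinine part of the KDIGO AKI definition: a rise of at least 0.3 mg/dL within 48 hours, or a value at least 1.5 times baseline within 7 days. Three caveats apply:

- Treating the lowest value in the prior 7 days as baseline is an assumption.
- Urine-output criteria are omitted.
- Nothing here claims to match the built-in `aki_detection` scenario.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

CreatinineSeries = List[Tuple[datetime, float]]  # (timestamp, creatinine in mg/dL)


def kdigo_creatinine_aki(series: CreatinineSeries, now: datetime) -> bool:
    """Sketch of the KDIGO serum-creatinine AKI criteria, evaluated at time `now`.

    - rise >= 0.3 mg/dL relative to any value in the prior 48 h, or
    - latest value >= 1.5x the lowest value in the prior 7 days
      (using the 7-day minimum as "baseline" is an assumption).
    Urine-output criteria are not modeled.
    """
    past = sorted((t, v) for t, v in series if t <= now)  # ignore future values
    if len(past) < 2:
        return False
    t_cur, cr_cur = past[-1]
    earlier = past[:-1]
    in_48h = [v for t, v in earlier if t_cur - t <= timedelta(hours=48)]
    in_7d = [v for t, v in earlier if t_cur - t <= timedelta(days=7)]
    if in_48h and cr_cur - min(in_48h) >= 0.3:
        return True
    return bool(in_7d) and cr_cur >= 1.5 * min(in_7d)
```

To try it on a synthetic AKI patient, build `cr = [(o['timestamp'], o['value']) for o in patients[5]['observations'] if o['signal'] == 'Cr']` and call `kdigo_creatinine_aki(cr, base_time)`.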

---

**Learn More**: [PSDL GitHub](https://github.com/Chesterguan/PSDL) | [Documentation](https://github.com/Chesterguan/PSDL/tree/main/docs)