# CareMap: EHR Companion

## One Model. Three Modules. Maximum Impact on Both Sides of Healthcare.

---

**[Kaggle MedGemma Impact Challenge](https://www.kaggle.com/competitions/med-gemma-impact-challenge)**

CareMap demonstrates how a single clinical AI model (MedGemma) can enhance healthcare on **both sides**:

| Side | Who | Module | MedGemma Mode |
|------|-----|--------|---------------|
| **Patient Side** | Caregivers, Ayahs (home healthcare workers in India), Families | Fridge Sheet Generator | Text Reasoning |
| **Provider Side** | Radiologists | X-ray Triage Queue | **Multimodal** |
| **Provider Side** | Lab/Clinical Staff | HL7 Message Triage | Text Reasoning |

---

## The Problem: Healthcare Information Overload

### On the Patient Side

Family caregivers receive overwhelming amounts of medical information:
- Dense clinical notes with medical jargon
- Multiple medication instructions with complex interactions
- Lab results with unexplained abbreviations
- Follow-up tasks scattered across documents

**In India specifically**, the challenge is compounded by:
- **Rotating Ayahs** (home healthcare workers) who change frequently
- **Language barriers** - Ayahs often cannot read English
- **No EHR infrastructure** - patients carry paper records in bags
- **Overwhelmed doctors** - seeing 125-175 patients per day

### On the Provider Side

Healthcare workers are drowning in data:
- **Radiologists**: roughly 1 per 100,000 patients in India, 72-hour report delays
- **Lab Staff**: 1,000+ HL7 messages per day, critical values buried in routine results
- **No prioritization** - life-threatening findings wait in queue behind normal studies

---

## User Research: Voices from the Field

We interviewed **4 healthcare professionals and caregivers** to understand the real challenges. Their insights directly shaped how MedGemma powers CareMap.

---

### The Need for Simple, Plain-Language Explanations

**Dr. Vinodhini Sriram** (Family Medicine, US)

> "For the caregivers, I think it's mainly the medications. They are not going to care about the kidney function... **All they need is that first page which is the medication 8:00 a.m., 12 p.m., whatever it is.**"

> "Absolutely. If the source of this data is reliable, **this is very good.**"

*How MedGemma helps:* Transforms complex sig codes like "PO BID AC" into plain language: "Take by mouth, twice daily, before meals."

---

### The Family Caregiver's Burden

**Sunayana Mann** (Daughter/Family Caregiver)

> "If I had to drive her to the hospital, she cannot tell me all the medications and I might not be able to grab the whole bag. **So easiest would be... I peel the fridge magnet.**"

> "**Task oriented for them** like I don't want them to think. I want to give them instructions and I want them to follow it like nothing extra."

> "**I'm not a medical professional.** I don't know all these things... I need someone to tell me what questions to ask."

*How MedGemma helps:* Generates "Ask the Doctor" prompts so families know what questions matter for their specific situation.

---

### The Reality of Healthcare in India

**Dr. Manini Moudgal** (Pediatrician, Bangalore, India)

> "Given the volumes that we deal with over here, **EHR is a bloody headache**... We usually see **125-175 patients a day.**"

> "The parents don't understand what I'm writing. They just want to know - **when do I give this medicine, how much, and what should I watch for?**"

*How MedGemma helps:* Extracts the essential "what, when, and watch for" from clinical notes, eliminating jargon.

---

### Two Audiences, Two Levels of Detail

**Dr. Gaurav Mishra** (Child, Adolescent and Adult Psychiatrist)

> "**The pages for the ayah need to be even simpler**... for kids with autism sometimes we would use image based chart."

> "I would have **both levels available**... deep dive for family, simple for ayah."

> "The ayah changes every week. **How do you hand off care?** You can't expect verbal instructions to work."

*How MedGemma helps:* Generates separate pages for Ayahs (visual, task-focused) and Family (context, explanations, connections).

---

### From Common Themes to CareMap Design Principles

| User Research Insight | CareMap Response |
|----------------------|------------------|
| "All they need is medication times" | Page 1: Medication Schedule with time/food badges |
| "I need someone to tell me what to ask" | MedGemma generates "Ask the Doctor" prompts |
| "Two levels - deep dive and simple" | Separate pages for Ayah vs Family audiences |
| "Ayah changes every week" | Printable poster reduces reliance on verbal handoffs |
| "Don't omit medications" | All medications shown, nothing hidden |
| "Emergency contacts" | Page 5: Who to call when something goes wrong |

---

## The Hypothesis

**A single clinical AI model can serve BOTH sides of healthcare:**

```
                    ┌─────────────────────────────────┐
                    │       MedGemma 1.5 4B-IT        │
                    │   Clinical Foundation Model     │
                    │   (Text + Multimodal)           │
                    └─────────────────────────────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                    │
              ▼                    ▼                    ▼
    ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
    │  Module 1       │  │  Module 2       │  │  Module 3       │
    │  Fridge Sheets  │  │  X-ray Triage   │  │  HL7 Triage     │
    │  (Patient Side) │  │  (Multimodal)   │  │  (Provider Side)│
    └─────────────────┘  └─────────────────┘  └─────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
    Ayahs & Families     Radiologists        Lab Staff
```

**Why MedGemma?**

MedGemma is specifically trained on clinical text including **Electronic Health Records (EHR)**, making it ideal for:
- Understanding medication sig codes (e.g., "PO BID AC" → "by mouth, twice daily, before meals")
- Interpreting lab result patterns and clinical significance
- Analyzing chest X-rays for findings and urgency
- Translating medical jargon while preserving safety-critical information
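To make the sig-code example concrete, here is a minimal rule-based sketch of the same expansion. It only illustrates the target output and says nothing about how MedGemma works internally; the `SIG_TERMS` table and the `expand_sig` helper are hypothetical names and cover just a handful of common abbreviations.

```python
# Hypothetical lookup table covering a few common sig abbreviations.
SIG_TERMS = {
    "PO": "by mouth",
    "BID": "twice daily",
    "TID": "three times daily",
    "QID": "four times daily",
    "AC": "before meals",
    "PC": "after meals",
    "HS": "at bedtime",
    "PRN": "as needed",
}


def expand_sig(sig: str) -> str:
    """Expand each known abbreviation; keep unknown tokens verbatim so nothing is silently dropped."""
    parts = [SIG_TERMS.get(token.upper(), token) for token in sig.split()]
    return ", ".join(parts)


print(expand_sig("PO BID AC"))  # by mouth, twice daily, before meals
```

A fixed table like this breaks down on free-text instructions such as "Take as directed based on INR results", which is where a language model is more useful.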

### Fridge Sheet Pipeline: From EHR to Printed Page

The fridge sheet module expands into four independent interpreters that can run in parallel, followed by an optional translation pass:

```
    ┌──────────────────────┐
    │  EHR Patient JSON    │
    │  (medications, labs,  │
    │   care gaps, imaging) │
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │   MedGemma 1.5 4B-IT │
    │   (Text Reasoning)   │
    └──────────┬───────────┘
               │
    ┌──────────┼──────────┬──────────┐
    ▼          ▼          ▼          ▼
┌────────┐┌────────┐┌────────┐┌────────┐
│  Meds  ││  Labs  ││  Care  ││Imaging │  <- 4 independent interpreters
│Interp. ││Interp. ││  Gaps  ││Interp. │     (parallelizable)
│        ││        ││Interp. ││        │
└───┬────┘└───┬────┘└───┬────┘└───┬────┘
    │         │         │         │
    └─────────┴────┬────┴─────────┘
                   ▼
    ┌──────────────────────┐
    │  5-Page Fridge Sheet │
    │  (Printable HTML)    │
    │  8.5x11" per page    │
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │     NLLB-200         │  <- Optional translation pass
    │  (Meta, 200 langs)   │
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │  Bengali / Hindi     │
    │  Fridge Sheet        │
    │  (med names stay     │
    │   in English)        │
    └──────────────────────┘
```

**Key design choices:**
- Each interpreter is a standalone function with no shared state, trivially parallelizable
- Safety validation runs on every output (no forbidden terms, no raw values, no jargon)
- Translation preserves medication names untranslated (safety-critical)
- The final deliverable is a **printed page** that requires zero technology once printed
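The safety-validation step in the list above can be sketched as a small post-generation check. This is a minimal illustration under stated assumptions: the term lists, the `RAW_VALUE` pattern, and the `validate_output` name are all hypothetical, and the project's actual rules are not shown in this notebook. The raw-value check is meant for lab explanations; the medication page deliberately keeps dosage amounts.

```python
import re

# Assumed, illustrative lists; not the project's real rule set.
FORBIDDEN_TERMS = {"cancer"}
JARGON_TERMS = {"bid", "prn", "inr", "egfr"}
# A number followed by a unit, e.g. "180 mg/dL" or "7.2%".
RAW_VALUE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg/dl|mmol/l|mg|mcg|%)", re.IGNORECASE)


def validate_output(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the text passes."""
    problems = []
    words = set(re.findall(r"[a-z]+", text.lower()))
    for term in sorted(FORBIDDEN_TERMS & words):
        problems.append(f"forbidden term: {term}")
    for term in sorted(JARGON_TERMS & words):
        problems.append(f"jargon: {term}")
    if RAW_VALUE.search(text):
        problems.append("raw numeric value")
    return problems


print(validate_output("Your blood sugar is a little high."))  # []
print(validate_output("Glucose was 180 mg/dL"))               # ['raw numeric value']
```

Returning a list of violations rather than a boolean lets the caller decide whether to regenerate, fall back to a template, or flag the item for clinician review.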

---

## Deployment Pathways & Future Work

### US Integration: Reducing Administrative Burden

The fridge sheet can become a standard discharge/checkout handout, printed alongside after-visit summaries that patients already receive. Rather than adding work, CareMap *reduces* the administrative burden on clinical staff: instead of manually writing plain-language instructions for each patient, the EHR data feeds directly into MedGemma, which generates the fridge sheet automatically. The clinician reviews and prints - no manual translation required.

Radiology and HL7 triage modules integrate into existing EHR systems via standard HL7/FHIR interfaces with minimal workflow changes: incoming messages route through CareMap's priority classifier before reaching the existing review queue.

### India & Low-Resource Settings

Where paper records dominate, MedGemma's multimodal capability enables a natural next step: caregivers photograph medication strips and upload outpatient lab reports. CareMap already handles structured data end-to-end; the gap is only the intake layer (image-to-structured-data), which MedGemma 1.5's vision capabilities are designed for. This would allow families in settings like Amma's, where prescriptions arrive as handwritten notes and lab reports as printed PDFs, to generate a fridge sheet from a phone camera.

### Performance & Scalability

**Fridge Sheets:** The wait is already built into the visit: patients already wait 15-30 minutes at checkout. CareMap's four interpreters (medication, lab, care gap, imaging) are independent functions with no shared state, so they can run in parallel. On a single T4, sequential generation takes ~10 minutes; with 4-way parallelism this drops to ~3 minutes, well within existing checkout wait times. Translation is a follow-up pass after the pages are generated.
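The fan-out described above can be sketched with the standard library's `ThreadPoolExecutor`. The interpreters here are hypothetical stubs that sleep instead of calling the model, so the snippet runs without a GPU. Whether real inference gets a comparable speedup depends on the serving setup: a single in-process model on one GPU may serialize requests, so treat this as a sketch of the control flow only.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def make_stub(name: str, seconds: float):
    """Build a stand-in interpreter that sleeps instead of calling the model."""
    def interpret(patient: dict) -> dict:
        time.sleep(seconds)  # placeholder for model inference time
        return {"section": name, "items": len(patient.get(name, []))}
    return interpret


# Hypothetical stand-ins for the four interpreters.
INTERPRETERS = {
    "medications": make_stub("medications", 0.2),
    "results": make_stub("results", 0.2),
    "care_gaps": make_stub("care_gaps", 0.2),
    "imaging": make_stub("imaging", 0.2),
}


def run_all(patient: dict) -> dict:
    """Run every interpreter concurrently and collect results by section name."""
    with ThreadPoolExecutor(max_workers=len(INTERPRETERS)) as pool:
        futures = {name: pool.submit(fn, patient) for name, fn in INTERPRETERS.items()}
        return {name: future.result() for name, future in futures.items()}


patient = {"medications": ["a", "b"], "results": ["x"], "care_gaps": [], "imaging": []}
start = time.perf_counter()
sections = run_all(patient)
elapsed = time.perf_counter() - start
print(sections["medications"], f"{elapsed:.2f}s")
```

Because each stub receives the patient record and returns its own dictionary, no locking is needed; the only synchronization point is collecting the futures.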

**Radiology & HL7 Triage:** Each incoming HL7 message or X-ray image is an independent event with no FIFO ordering required. This enables horizontal scaling via event-driven architecture: messages arrive, fan out to available GPU workers (map), and results merge into the prioritized queue (reduce). A hospital processing 1,000+ HL7 messages/day can distribute across workers without coordination overhead.

**Modular Architecture:** The codebase is designed so the foundation model can be swapped without large-scale refactoring. All MedGemma interactions flow through a single `MedGemmaClient` class; replacing the model requires changing only the `model_id` parameter. Similarly, the physician-auditable rule engine (CSV) is decoupled from model code. A healthcare institution can customize rules based on their patient population: evaluate historical records, identify the most prevalent diagnoses treated at that facility, and build targeted rules for those conditions first rather than attempting to cover the entire ICD diagnosis code tree.
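As an illustration of keeping rules in a physician-editable CSV, the sketch below loads a finding-to-priority table and picks the most urgent match. The column names, the `load_rules` and `classify` helpers, and the inline CSV are assumptions made for this example; the project's actual rule file format is not shown in this notebook.

```python
import csv
import io

# Hypothetical rule file; column names are assumptions for illustration.
RULES_CSV = """finding,priority
pneumothorax,STAT
pulmonary edema,STAT
consolidation,SOON
effusion,SOON
cardiomegaly,ROUTINE
"""

PRIORITY_ORDER = {"STAT": 0, "SOON": 1, "ROUTINE": 2}


def load_rules(csv_text: str) -> dict[str, str]:
    """Parse the CSV into a lowercase finding -> priority mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["finding"].strip().lower(): row["priority"].strip().upper() for row in reader}


def classify(findings: list[str], rules: dict[str, str], default: str = "ROUTINE") -> str:
    """Return the most urgent priority among matched findings, or the default if none match."""
    matched = [rules[f.lower()] for f in findings if f.lower() in rules]
    return min(matched, key=PRIORITY_ORDER.__getitem__, default=default)


rules = load_rules(RULES_CSV)
print(classify(["Cardiomegaly", "Effusion"], rules))  # SOON
```

A site that prefers to over-triage unrecognized findings can pass a more cautious `default`; the rule table itself can be edited without touching model code.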

### Future Work: Scaling the Model

All evaluation in this notebook used `medgemma-1.5-4b-it` on a single T4 GPU (16GB VRAM). The framework is model-size agnostic: scaling to `medgemma-27b` requires only a larger GPU, not architectural changes. We expect the 27B model to improve radiology finding detection and reduce the current deliberate over-triage rate, and to produce richer, more nuanced medication explanations. Due to time and compute constraints, the 27B model was not evaluated in this submission; characterizing those gains is a priority for future work.

---

## Environment Setup

In [None]:
# Install dependencies for Kaggle environment
!pip install -q "transformers>=4.50.0" "accelerate>=0.27.0" "huggingface-hub>=0.34.0" \
    "pydantic>=2.6.0" sentencepiece tqdm lxml textstat


In [None]:
import json
import os
import sys
from pathlib import Path
from typing import Dict, Any, List
from IPython.display import display, Markdown, HTML, IFrame
from tqdm import tqdm

# --- Kaggle-specific setup ---
# Authenticate with HuggingFace for gated model access
from kaggle_secrets import UserSecretsClient
secrets = UserSecretsClient()
hf_token = secrets.get_secret("HUGGINGFACE_TOKEN")
os.environ["HF_TOKEN"] = hf_token

# Auto-detect dataset mount path (Kaggle uses different paths depending on how dataset is attached)
DATASET_NAME = "caremap-medgemma"
_candidates = [
    Path(f"/kaggle/input/{DATASET_NAME}"),                    # standard mount
    Path(f"/kaggle/input/d/codekunoichi/{DATASET_NAME}"),     # user-prefixed mount
]
DATASET_DIR = next((p for p in _candidates if p.exists()), None)
if DATASET_DIR is None:
    # Fallback: scan /kaggle/input for any directory containing src/caremap
    for init_file in Path("/kaggle/input").rglob("src/caremap/__init__.py"):
        # <root>/src/caremap/__init__.py -> parents[2] is <root>
        DATASET_DIR = init_file.parents[2]
        break
assert DATASET_DIR is not None, (
    f"Dataset '{DATASET_NAME}' not found. Check that it is attached to this notebook."
)

project_root = DATASET_DIR
sys.path.insert(0, str(DATASET_DIR / "src"))

print(f"Project root: {project_root}")
print(f"Python version: {sys.version}")
print(f"HF auth: {'OK' if hf_token else 'MISSING'}")

In [None]:
# Load MedGemma
from caremap.llm_client import MedGemmaClient

print("Loading MedGemma 1.5 4B-IT...")
print("(This may take 1-2 minutes on first run)")

client = MedGemmaClient()

print(f"\nModel: {client.model_id}")
print(f"Device: {client.device}")
print("Ready!")

---

## Module 1: Patient Portal - Fridge Sheet Generator

### Meet the Patients

CareMap ships with two realistic patient profiles:

| Patient | Age | Key Conditions | Medications | Profile |
|---------|-----|----------------|-------------|---------|
| **Amma** | 70s F | Alzheimer's, Diabetes, Anemia, Hypertension, Hyperlipidemia | 10 meds | `canonical_amma.json` |
| **Dadu** | 80s M | Diabetes (CKD 3b), CHF, Atrial Fibrillation | 8 meds | `golden_patient_complex.json` |

**Amma is selected by default** - she represents the author's mother, whose story inspired CareMap (see writeup). Change `SELECTED_PATIENT` in the next cell to switch to Dadu.

The care team for either patient includes rotating Ayahs (home healthcare workers), remote family members, and an elderly spouse - exactly the scenario CareMap was designed for.

CareMap generates **5 printable pages** tailored to each audience:

| Page | Audience | Purpose |
|------|----------|---------|
| 1. Medications | Ayah/Helper | Daily schedule with time/food badges |
| 2. Labs | Family Caregiver | Test results in plain language |
| 3. Care Actions | Family Caregiver | Today/Week/Later task buckets |
| 4. Imaging | Family Caregiver | X-ray findings explained |
| 5. Connections | Both | How meds, labs, and actions connect |

In [None]:
# === PATIENT SELECTION ===
# Change to "dadu" to generate fridge sheets for Dadu instead.
SELECTED_PATIENT = "amma"

PATIENT_FILES = {
    "dadu": "golden_patient_complex.json",   # 80s M - Diabetes, CHF, AFib, 8 meds
    "amma": "canonical_amma.json",           # 70s F - Alzheimer's, Diabetes, Anemia, 10 meds
}

print(f"Selected patient: {SELECTED_PATIENT.upper()}")
print(f"Data file: {PATIENT_FILES[SELECTED_PATIENT]}")

In [None]:
# Load selected patient data
patient_file = PATIENT_FILES[SELECTED_PATIENT]
golden_file = project_root / 'examples' / patient_file

with open(golden_file) as f:
    patient_data = json.load(f)

print(f"Patient: {patient_data['patient']['nickname']}")
print(f"Age: {patient_data['patient']['age_range']}")
print(f"Conditions: {', '.join(patient_data['patient']['conditions_display'])}")
print(f"\nMedications: {len(patient_data['medications'])}")
print(f"Lab Results: {len(patient_data['results'])}")
print(f"Care Gaps: {len(patient_data['care_gaps'])}")

### 1.1 Medication Interpretation

MedGemma transforms complex medication instructions into plain language:

**Input (from EHR):**
```
Warfarin - Take as directed based on INR results
Clinician Notes: Target INR 2.0-3.0 for AFib. Weekly INR checks required.
Interaction Notes: Avoid NSAIDs. Keep vitamin K intake consistent.
```

**Output (for Ayah):**
- **What it does:** Helps prevent dangerous blood clots
- **When:** Evening, same time each day
- **Watch for:** Unusual bleeding, bruising

In [None]:
from caremap.medication_interpretation import interpret_medication_v3_grounded

print("Interpreting medications with MedGemma...")
print("=" * 60)

# Interpret first 3 medications as examples
for med in tqdm(patient_data['medications'][:3], desc="Medications"):
    result, _ = interpret_medication_v3_grounded(
        client=client,
        medication_name=med['medication_name'],
        sig_text=med['sig_text'],
        clinician_notes=med.get('clinician_notes', ''),
        interaction_notes=med.get('interaction_notes', ''),
    )
    
    print(f"\n{'='*60}")
    print(f"MEDICATION: {med['medication_name']} ({med['timing']})")
    print(f"{'='*60}")
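    if 'raw_response' not in result:
        print(f"What this does: {result.get('what_this_does', 'N/A')}")
        print(f"Watch out for: {result.get('watch_out_for', 'N/A')}")
    else:
        # Assumed: 'raw_response' is set when the output could not be parsed into fields
        print("No structured fields returned; inspect result['raw_response'].")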

### 1.2 Lab Interpretation

MedGemma explains lab results without numeric values or medical jargon:

**Design Principle:** Caregivers don't need to know "INR = 3.5". They need to know:
- What the test checks
- Whether the result is concerning
- What question to ask the doctor

In [None]:
from caremap.lab_interpretation import interpret_lab

print("Interpreting lab results with MedGemma...")
print("=" * 60)

for lab in tqdm(patient_data['results'][:3], desc="Labs"):
    result = interpret_lab(
        client=client,
        test_name=lab['test_name'],
        meaning_category=lab['meaning_category'],
        source_note=lab.get('source_note', ''),
    )
    
    print(f"\n{'='*60}")
    print(f"TEST: {lab['test_name']} ({lab['flag']})")
    print(f"{'='*60}")
    print(f"What was checked: {result.get('what_was_checked', 'N/A')}")
    print(f"What it means: {result.get('what_it_means', 'N/A')}")
    print(f"Ask the doctor: {result.get('what_to_ask_doctor', 'N/A')}")

### 1.3 Care Gap Interpretation

MedGemma converts clinical care gaps into actionable tasks:

**Input:** "INR recheck needed - last result was high"

**Output:**
- **Action:** Call clinic to schedule INR blood draw
- **Why:** Blood thinner level needs monitoring
- **Urgency:** This Week
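The Today / This Week / Later bucketing can be sketched as a simple grouping step. The field names `item_text` and `time_bucket` match the keys used in the next cell; the sample records, the `group_by_bucket` helper, and the assumption that bucket values are spelled exactly "Today", "This Week", and "Later" are illustrative.

```python
from collections import defaultdict

# Bucket labels in display order (assumed spelling).
BUCKET_ORDER = ["Today", "This Week", "Later"]


def group_by_bucket(care_gaps: list[dict]) -> dict[str, list[str]]:
    """Group care-gap descriptions under each time bucket, in display order."""
    grouped = defaultdict(list)
    for gap in care_gaps:
        grouped[gap["time_bucket"]].append(gap["item_text"])
    # Known buckets first, in display order; unexpected labels are kept at the end.
    extras = [b for b in grouped if b not in BUCKET_ORDER]
    return {b: grouped[b] for b in BUCKET_ORDER + extras if b in grouped}


# Made-up sample records for illustration only.
sample_gaps = [
    {"item_text": "Eye exam overdue", "time_bucket": "Later"},
    {"item_text": "INR recheck needed", "time_bucket": "This Week"},
]
print(group_by_bucket(sample_gaps))
```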

In [None]:
from caremap.caregap_interpretation import interpret_caregap

print("Interpreting care gaps with MedGemma...")
print("=" * 60)

for gap in tqdm(patient_data['care_gaps'][:3], desc="Care Gaps"):
    result = interpret_caregap(
        client=client,
        item_text=gap['item_text'],
        next_step=gap['next_step'],
        time_bucket=gap['time_bucket'],
    )
    
    print(f"\n{'='*60}")
    print(f"GAP: {gap['item_text']} [{gap['time_bucket']}]")
    print(f"{'='*60}")
    print(f"Time bucket: {result.get('time_bucket', 'N/A')}")
    print(f"Action: {result.get('action_item', 'N/A')}")
    print(f"Next step: {result.get('next_step', 'N/A')}")

### 1.4 Imaging Interpretation

Imaging reports contain findings that caregivers need explained in plain language. MedGemma can explain findings while connecting them to the patient's medications.

**Safety Constraint:** CareMap never uses the word "cancer" - that's for the oncologist.

In [None]:
from caremap.imaging_interpretation import interpret_imaging_report

# Load a CT scenario for a heart failure patient
ct_file = project_root / 'examples' / 'golden_imaging_ct.json'
with open(ct_file) as f:
    ct_data = json.load(f)

ct_scenario = ct_data['ct_scenarios'][0]  # CHF patient with shortness of breath

print(f"Study: {ct_scenario['study_type']}")
print(f"Indication: {ct_scenario['clinical_indication']}")
print(f"\nRadiology Report (raw):")
print(f"{ct_scenario['radiology_report']['impression']}")

print("\nInterpreting with MedGemma...")

imaging_result = interpret_imaging_report(
    client=client,
    study_type=ct_scenario['study_type'],
    report_text=ct_scenario['radiology_report']['impression'],
    flag=ct_scenario['flag'],
)

print(f"\n{'='*60}")
print(f"IMAGING INTERPRETATION (Plain Language)")
print(f"{'='*60}")
print(f"What was done: {imaging_result.get('what_was_done', 'N/A')}")
print(f"Key finding: {imaging_result.get('key_finding', 'N/A')}")
print(f"Ask the doctor: {imaging_result.get('what_to_ask_doctor', 'N/A')}")

### 1.5 Connections: The Big Picture

CareMap helps caregivers understand **why** things are connected:

```
Warfarin (blood thinner) ──→ INR Test ──→ Weekly Monitoring
         │                      │
         └── Avoid NSAIDs ──────┘
```

This is especially valuable when explaining care to new Ayahs.

### 1.6 Generate Complete Fridge Sheets (5 Pages)

Now we generate all 5 printable HTML pages:

In [None]:
from caremap.fridge_sheet_html import (
    generate_medications_page,
    generate_labs_page,
    generate_gaps_page,
    generate_imaging_page,
    generate_connections_page,
    PatientInfo,
)

# Create patient info
patient_info = PatientInfo(
    nickname=patient_data['patient']['nickname'],
    age_range=patient_data['patient']['age_range'],
    conditions=patient_data['patient']['conditions_display']
)

# Output directory (writable on Kaggle)
html_output_dir = Path('/kaggle/working/fridge_sheets')
html_output_dir.mkdir(parents=True, exist_ok=True)

print(f"Generating 5 fridge sheet pages for {patient_info.nickname}...")
print(f"Output: {html_output_dir}")


In [None]:
# Page 1: Medications (for Ayah)
print("\n[1/5] Generating MEDICATIONS page (for Ayah)...")

meds_html = generate_medications_page(
    patient=patient_info,
    medications=patient_data['medications'],
    client=client,
    page_num=1,
    total_pages=5,
    progress_callback=lambda c, t, m: print(f"      [{c}/{t}] {m}")
)

med_file = html_output_dir / '1_medications.html'
with open(med_file, 'w') as f:
    f.write(meds_html)
print(f"      Saved: {med_file.name}")

In [None]:
# Page 2: Labs (for Family)
print("\n[2/5] Generating LABS page (for Family)...")

labs_html = generate_labs_page(
    patient=patient_info,
    results=patient_data['results'],
    client=client,
    page_num=2,
    total_pages=5,
    progress_callback=lambda c, t, m: print(f"      [{c}/{t}] {m}")
)

labs_file = html_output_dir / '2_labs.html'
with open(labs_file, 'w') as f:
    f.write(labs_html)
print(f"      Saved: {labs_file.name}")

In [None]:
# Page 3: Care Gaps (for Family)
print("\n[3/5] Generating CARE GAPS page (for Family)...")

gaps_html = generate_gaps_page(
    patient=patient_info,
    care_gaps=patient_data['care_gaps'],
    client=client,
    page_num=3,
    total_pages=5,
    progress_callback=lambda c, t, m: print(f"      [{c}/{t}] {m}")
)

gaps_file = html_output_dir / '3_care_gaps.html'
with open(gaps_file, 'w') as f:
    f.write(gaps_html)
print(f"      Saved: {gaps_file.name}")

In [None]:
# Page 4: Imaging (with X-ray from NIH dataset)
print("\n[4/5] Generating IMAGING page with chest X-ray...")

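xray_path = project_root / 'data' / 'nih_chest_xray' / 'demo_images' / 'stat' / '00000032_001.png'
if xray_path.exists():
    print(f"      Using X-ray: {xray_path.name}")
else:
    print(f"      X-ray not found, generating page without an image: {xray_path}")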

imaging_html = generate_imaging_page(
    patient=patient_info,
    image_path=str(xray_path) if xray_path.exists() else None,
    client=client,
    page_num=4,
    total_pages=5,
    progress_callback=lambda c, t, m: print(f"      [{c}/{t}] {m}")
)

imaging_file = html_output_dir / '4_imaging.html'
with open(imaging_file, 'w') as f:
    f.write(imaging_html)
print(f"      Saved: {imaging_file.name}")

In [None]:
# Page 5: Connections (for Both)
print("\n[5/5] Generating CONNECTIONS page (for Both)...")

connections_html = generate_connections_page(
    patient=patient_info,
    medications=patient_data['medications'],
    results=patient_data['results'],
    care_gaps=patient_data['care_gaps'],
    contacts=patient_data['contacts'],
    page_num=5,
    total_pages=5,
    progress_callback=lambda c, t, m: print(f"      [{c}/{t}] {m}")
)

connections_file = html_output_dir / '5_connections.html'
with open(connections_file, 'w') as f:
    f.write(connections_html)
print(f"      Saved: {connections_file.name}")

print("\n" + "=" * 60)
print("All 5 fridge sheet pages generated!")
print("=" * 60)

In [None]:
# Preview Page 1: Medications (for Ayah)
print("Preview: Page 1 - Medications (for Ayah)")
display(HTML(med_file.read_text()))

In [None]:
# Preview Page 2: Labs (for Family)
print("Preview: Page 2 - Labs (for Family)")
display(HTML(labs_file.read_text()))

In [None]:
# Preview Page 3: Care Gaps (for Family)
print("Preview: Page 3 - Care Gaps (for Family)")
display(HTML(gaps_file.read_text()))

In [None]:
# Preview Page 4: Imaging (for Family)
print("Preview: Page 4 - Imaging (for Family)")
display(HTML(imaging_file.read_text()))

In [None]:
# Preview Page 5: Connections (for Both)
print("Preview: Page 5 - Connections (for Both)")
display(HTML(connections_file.read_text()))

### 1.7 Multilingual Support: Bengali Fridge Sheet

For Ayahs who cannot read English, CareMap can translate the fridge sheet using Meta's NLLB-200 model. Bengali is particularly relevant for Amma's caregivers in Kolkata:

In [None]:
from caremap.translation import NLLBTranslator

print("Loading NLLB-200 translator...")
translator = NLLBTranslator()
print("Translator ready!")

# Example: Translate medication instructions to Bengali
sample_text = "This medicine helps control your heart rate. Take it every day at the same time. Do NOT stop taking it suddenly."

print(f"\nOriginal (English):")
print(f"  {sample_text}")

bengali = translator.translate_to(sample_text, "ben_Beng")
print(f"\nBengali (for Ayah):")
print(f"  {bengali}")

### 1.8 Bengali Fridge Sheets: Full Page Translation

Now we translate the **complete fridge sheet HTML pages** to Bengali using NLLB-200. This demonstrates the real-world value: an Ayah receives the **same professionally formatted fridge sheet**, but with all instructions in her language.

**What gets translated:**
- Page headings and labels
- MedGemma-generated explanations (why it matters, watch for)
- Action items and next steps
- Time bucket labels (Today, This Week, Later)

**What stays in English (safety-critical):**
- Medication names (Metformin, Warfarin, etc.)
- Dosage amounts (500mg, etc.)
- MedGemma badge
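One common way to keep safety-critical terms untranslated is to mask them with placeholders before translation and restore them afterwards. The sketch below shows that idea with a stand-in translator so it runs offline. It is an assumption about a possible approach, not a description of how `translate_fridge_sheet_html` works, and `protect_terms`, `restore_terms`, and `fake_translate` are hypothetical names. Real translation models can occasionally alter placeholders, so a production version would need to verify that every placeholder survives.

```python
import re


def protect_terms(text: str, terms: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each protected term with a numbered placeholder."""
    mapping = {}
    for i, term in enumerate(terms):
        placeholder = f"[[{i}]]"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text: str, mapping: dict[str, str]) -> str:
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def fake_translate(text: str) -> str:
    """Stand-in for a real translator so the sketch runs offline."""
    return text.upper()


masked, mapping = protect_terms("Give Metformin 500mg after breakfast.", ["Metformin", "500mg"])
translated = restore_terms(fake_translate(masked), mapping)
print(translated)  # GIVE Metformin 500mg AFTER BREAKFAST.
```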

In [None]:
from caremap.html_translator import translate_fridge_sheet_html
import threading, time

print("Translating Medication Schedule to Bengali (\u09ac\u09be\u0982\u09b2\u09be)...")
print("=" * 60)

translate_result = {}
translate_error = [None]

def run_med_translation():
    try:
        translate_result['html'] = translate_fridge_sheet_html(
            html_content=meds_html,
            translator=translator,
            target_lang="ben_Beng",
            progress_callback=lambda c, t, m: print(f"  [{c}/{t}] {m}")
        )
    except Exception as e:
        translate_error[0] = e

thread = threading.Thread(target=run_med_translation)
thread.start()

with tqdm(desc="Medication translation (NLLB-200)", unit="s",
          bar_format="{desc}: {elapsed} elapsed |{bar}|") as pbar:
    while thread.is_alive():
        time.sleep(1)
        pbar.update(1)

thread.join()
if translate_error[0]:
    raise translate_error[0]

bengali_meds_html = translate_result['html']

bengali_med_file = html_output_dir / '1_medications_bn.html'
with open(bengali_med_file, 'w', encoding='utf-8') as f:
    f.write(bengali_meds_html)

print(f"\nSaved: {bengali_med_file.name}")
print("\nPreview: Bengali Medication Schedule (for Ayah)")
display(HTML(bengali_meds_html))

In [None]:
print("Translating Care Actions to Bengali (\u09ac\u09be\u0982\u09b2\u09be)...")
print("=" * 60)

translate_result = {}
translate_error = [None]

def run_gaps_translation():
    try:
        translate_result['html'] = translate_fridge_sheet_html(
            html_content=gaps_html,
            translator=translator,
            target_lang="ben_Beng",
            progress_callback=lambda c, t, m: print(f"  [{c}/{t}] {m}")
        )
    except Exception as e:
        translate_error[0] = e

thread = threading.Thread(target=run_gaps_translation)
thread.start()

with tqdm(desc="Care gaps translation (NLLB-200)", unit="s",
          bar_format="{desc}: {elapsed} elapsed |{bar}|") as pbar:
    while thread.is_alive():
        time.sleep(1)
        pbar.update(1)

thread.join()
if translate_error[0]:
    raise translate_error[0]

bengali_gaps_html = translate_result['html']

bengali_gaps_file = html_output_dir / '3_care_gaps_bn.html'
with open(bengali_gaps_file, 'w', encoding='utf-8') as f:
    f.write(bengali_gaps_html)

print(f"\nSaved: {bengali_gaps_file.name}")
print("\nPreview: Bengali Care Actions (for Family)")
display(HTML(bengali_gaps_html))

---

## Module 2: Radiology Triage (Multimodal)

### The Problem

| Metric | India | Impact |
|--------|-------|--------|
| Radiologist-to-patient ratio | 1:100,000 | Severe shortage |
| Average X-ray report delay | 72 hours | Critical findings missed |
| Daily imaging volume | 500+ studies/radiologist | Burnout, errors |

### The Solution: MedGemma Multimodal Triage

MedGemma analyzes chest X-rays and assigns priority:

| Priority | Review Time | Examples |
|----------|-------------|----------|
| STAT | Intervene now | Pulmonary edema, pneumothorax |
| SOON | < 1 hour | Consolidation, effusion, pneumonia |
| ROUTINE | < 24 hours | Cardiomegaly, nodule, emphysema |

*Priority rules reviewed and validated by Dr. Vinodhini Sriram (Family Medicine).*

**Key Principle:** AI prioritizes the worklist. Radiologists review ALL images.
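The principle can be sketched as a priority queue that reorders studies but never drops one. The `Worklist` class and the sample image IDs are hypothetical; treating an unrecognized priority label as most urgent is an assumption chosen to err toward over-triage.

```python
import heapq
from itertools import count

PRIORITY_RANK = {"STAT": 0, "SOON": 1, "ROUTINE": 2}


class Worklist:
    """Orders studies by priority, then arrival order; nothing is ever dropped."""

    def __init__(self):
        self._heap = []
        self._arrival = count()

    def add(self, image_id: str, priority: str) -> None:
        # Unknown labels are treated as most urgent (assumption: fail toward over-triage).
        rank = PRIORITY_RANK.get(priority, 0)
        heapq.heappush(self._heap, (rank, next(self._arrival), image_id))

    def pop(self) -> str:
        """Return the next study a radiologist should review."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


worklist = Worklist()
for image_id, priority in [("a.png", "ROUTINE"), ("b.png", "STAT"), ("c.png", "SOON"), ("d.png", "STAT")]:
    worklist.add(image_id, priority)

order = [worklist.pop() for _ in range(len(worklist))]
print(order)  # ['b.png', 'd.png', 'c.png', 'a.png']
```

The arrival counter acts as a tie-breaker, so two STAT studies are still reviewed in the order they arrived.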

In [None]:
# Free text-only client AND translator to make GPU room for multimodal
import gc, torch

# Delete text-only MedGemma client
del client
# Delete NLLB translator (loaded in section 1.7, ~1.2GB on GPU)
try:
    del translator
except NameError:
    pass
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    free_gb = torch.cuda.mem_get_info()[0] / 1024**3
    print(f"GPU memory free after cleanup: {free_gb:.1f} GiB")

# Load MedGemma with multimodal support
print("Loading MedGemma with multimodal support...")
multimodal_client = MedGemmaClient(enable_multimodal=True)
client = multimodal_client  # Reuse for any remaining text cells
print(f"Device: {multimodal_client.device}")
print("Ready for image analysis!")

In [None]:
# Load sample X-ray from NIH dataset
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt

demo_images_dir = project_root / 'data' / 'nih_chest_xray' / 'demo_images'
manifest_path = project_root / 'data' / 'nih_chest_xray' / 'sample_manifest.csv'
manifest = pd.read_csv(manifest_path)

# Display a STAT case
stat_image = manifest[manifest['priority'] == 'STAT'].iloc[0]
image_path = demo_images_dir / 'stat' / stat_image['image_id']

print(f"STAT Case: {stat_image['image_id']}")
print(f"Patient: {stat_image['patient_age']} year old {stat_image['patient_gender']}")
print(f"Ground Truth: {stat_image['findings']}")

img = Image.open(image_path)
plt.figure(figsize=(8, 8))
plt.imshow(img, cmap='gray')
plt.title(f"Chest X-ray: {stat_image['image_id']}")
plt.axis('off')
plt.show()

In [None]:
import threading
import time
from caremap.radiology_triage import analyze_xray

print("Analyzing X-ray with MedGemma multimodal...")
print("(Single LLM call - roughly 1-2 minutes in local MPS testing; timing on Kaggle GPUs may differ)")

xray_result = {}
xray_error = [None]

def run_xray_analysis():
    try:
        xray_result.update(analyze_xray(
            client=multimodal_client,
            image_path=str(image_path),
            patient_age=int(stat_image['patient_age']),
            patient_gender=stat_image['patient_gender']
        ))
    except Exception as e:
        xray_error[0] = e

thread = threading.Thread(target=run_xray_analysis)
thread.start()

with tqdm(desc="X-ray Triage (MedGemma inference)", unit="s", bar_format="{desc}: {elapsed} elapsed |{bar}|") as pbar:
    while thread.is_alive():
        time.sleep(1)
        pbar.update(1)

thread.join()
if xray_error[0]:
    raise xray_error[0]

result = xray_result
priority_emoji = {'STAT': '\U0001F534', 'SOON': '\U0001F7E1', 'ROUTINE': '\U0001F7E2'}
priority = result.get('priority', 'STAT')

print(f"\n{priority_emoji.get(priority, '')} MedGemma Triage Result: {priority}")
print(f"Confidence: {result.get('confidence', 0):.0%}")
print(f"\nFindings:")
for finding in result.get('findings', []):
    print(f"  - {finding}")
print(f"\nGround Truth: {stat_image['findings']}")

---

## Module 3: HL7 ORU Message Triage (Text)

### The Problem

Clinical staff receive **1,000+ HL7 messages per day**. Critical values (for example K+ > 6.5 mmol/L or an elevated troponin) can be buried among routine results.

### The Solution: MedGemma Text Reasoning

MedGemma triages incoming lab results and clinical messages by urgency.
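
For readers unfamiliar with HL7 v2, the observations in an ORU message travel in pipe-delimited OBX segments. The sketch below shows one way such segments could be turned into the `test_name` / `value` / `units` / `reference_range` dictionaries used in the demo cell. The sample message and the `parse_obx_segments` helper are illustrative assumptions, not part of `caremap.hl7_triage`, and the parser ignores escape sequences and repeated fields.

```python
def parse_obx_segments(raw_message):
    """Extract observations from the OBX segments of an HL7 v2 message."""
    observations = []
    # HL7 v2 separates segments with carriage returns; accept newlines too.
    for segment in raw_message.replace("\r", "\n").split("\n"):
        fields = segment.strip().split("|")
        if fields[0] != "OBX":
            continue
        # Pad so that short segments do not raise IndexError.
        fields += [""] * (9 - len(fields))
        # OBX-3 is code^text^coding-system; the text component is the name.
        identifier = fields[3].split("^")
        test_name = identifier[1] if len(identifier) > 1 else identifier[0]
        observations.append({
            "test_name": test_name,
            "value": fields[5],            # OBX-5 observation value
            "units": fields[6],            # OBX-6 units
            "reference_range": fields[7],  # OBX-7 reference range
            "abnormal_flag": fields[8],    # OBX-8 abnormal flags
        })
    return observations


# Hypothetical message fragment, for illustration only.
sample = (
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202501010830||ORU^R01|MSG0001|P|2.5\r"
    "PID|1||12345||DOE^JANE\r"
    "OBX|1|NM|2823-3^Potassium^LN||6.8|mmol/L|3.5-5.1|HH|||F\r"
)
for obs in parse_obx_segments(sample):
    print(obs)
```

Keeping the abnormal flag alongside the value gives the downstream triage prompt both the number and the lab's own interpretation of it.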

In [None]:
from caremap.hl7_triage import load_sample_messages, triage_oru_message

messages = load_sample_messages()
stat_msg = [m for m in messages if m['expected_priority'] == 'STAT'][0]

print(f"Sample STAT Message: {stat_msg['message_id']}")
print(f"Patient: {stat_msg['patient']['age']}yo {stat_msg['patient']['gender']}")
print(f"Context: {stat_msg['clinical_context']}")
print(f"\nObservations:")
for obs in stat_msg['observations']:
    print(f"  - {obs['test_name']}: {obs['value']} {obs.get('units', '')} (ref: {obs['reference_range']})")

In [None]:
import threading
import time

print("\nTriaging with MedGemma...")
print("(Single LLM call - typically takes 1-2 minutes on MPS)")

triage_result = {}
triage_error = [None]

def run_triage():
    try:
        triage_result.update(triage_oru_message(client, stat_msg))
    except Exception as e:
        triage_error[0] = e

thread = threading.Thread(target=run_triage)
thread.start()

with tqdm(desc="HL7 Triage (MedGemma inference)", unit="s", bar_format="{desc}: {elapsed} elapsed |{bar}|") as pbar:
    while thread.is_alive():
        time.sleep(1)
        pbar.update(1)

thread.join()
if triage_error[0]:
    raise triage_error[0]

result = triage_result
priority = result.get('priority', 'STAT')
print(f"\n{priority_emoji.get(priority, '')} Triage Result: {priority}")
print(f"Reason: {result.get('priority_reason', 'N/A')}")
print(f"Recommended Action: {result.get('recommended_action', 'N/A')}")

---

## Quantitative Evaluation

The demos above show *what* CareMap produces. This section measures *how well* it performs using systematic batch evaluation of four targets: the two triage modules, plus the medication and lab interpretation components that feed the fridge sheet.

| Module | N | What We Measure | Ground Truth Source |
|--------|---|-----------------|---------------------|
| Radiology Triage | 25 images | Priority accuracy (STAT/SOON/ROUTINE) | `sample_manifest.csv` expert labels |
| HL7 Triage | 20 messages | Priority accuracy (STAT/SOON/ROUTINE) | `expected_priority` in each message |
| Medication Interp. | All patient meds | JSON parse + schema + safety check | `SafetyValidator` rule engine |
| Lab Interp. | 8 scenarios | Schema + forbidden terms + question mark | `golden_labs.json` specification |

### Design Insight: Divide and Conquer for Radiology Triage

Our first approach gave MedGemma **both** tasks - detect findings **and** assign priority. That single-model approach missed every STAT case. Decoupling detection from prioritization recovered all three:

| Metric | MedGemma Alone | MedGemma + Rule Engine | Improvement |
|--------|---------------|----------------------|-------------|
| Overall Accuracy | 42% | **50%** | +8pp |
| STAT Recall | **0%** (0/3) | **100%** (3/3) | 0 to 3 critical cases caught |
| ROUTINE Recall | 33% | Improved | Deliberate over-triage by design |

**Why the hybrid approach wins:** MedGemma 1.5 is excellent at *detecting what's in the image* (edema, effusion, cardiomegaly) but poor at *mapping findings to clinical urgency*. On its own it classified every case as SOON or ROUTINE, missing all three critical cases. The rule engine instead maps the detected findings to clinically validated priorities.

**The fix: play to each system's strengths.**

| Task | Best At | System |
|------|---------|--------|
| Visual finding detection | Pattern recognition in pixels | MedGemma (multimodal) |
| Priority assignment | Clinical judgment, protocols | Rule engine (physician-authored CSV) |

The architecture decouples detection from prioritization:

```
MedGemma (multimodal) -> findings list (what the model sees)
         |
Rule Engine (CSV)     -> priority from matched findings
         |
Final Priority = max(model_priority, rule_priority)
```

**Key design principles:**

1. **Rules only escalate, never downgrade.** If MedGemma says STAT and rules say SOON, the patient stays STAT. Safety-first: we never reduce urgency.
2. **"No Finding" forces ROUTINE.** The one exception - if MedGemma sees nothing abnormal, we override any erroneous SOON/STAT prediction down to ROUTINE.
3. **Physician-auditable.** All rules live in a single CSV (`radiology_priority_rules.csv`) with columns: `finding_pattern`, `min_priority`, `rule_name`, `clinical_rationale`. A clinician can review and adjust without touching code. *Rules validated by Dr. Vinodhini Sriram.*
4. **Transparent overrides.** Every triage result records the original model priority, the final priority, and which rules fired - enabling the diagnostic table below.

This is a general pattern for clinical AI: **use the model for perception, use rules for decision-making.** The model output becomes evidence; the rules encode clinical protocols that a human expert can audit and update.
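
The escalation logic described above can be sketched in a few lines. This is a minimal illustration, not the `caremap.radiology_triage` implementation: the `resolve_priority` and `load_rules` helpers, the substring matching, and the two example rules are assumptions made for this example. Only the CSV column names come from the description above.

```python
import csv
import io

# Priority order, lowest to highest urgency.
PRIORITY_RANK = {"ROUTINE": 0, "SOON": 1, "STAT": 2}

# Hypothetical rules for illustration; the real rules live in
# radiology_priority_rules.csv and are authored by a physician.
EXAMPLE_RULES_CSV = """finding_pattern,min_priority,rule_name,clinical_rationale
pneumothorax,STAT,pneumothorax_stat,Example rationale (illustrative only)
effusion,SOON,effusion_soon,Example rationale (illustrative only)
"""


def load_rules(csv_text):
    """Parse rule rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def resolve_priority(model_priority, findings, rules):
    """Return (final_priority, matched_rule_names).

    Rules can only raise urgency. The single exception: when the model
    explicitly reports "No Finding", the result is forced to ROUTINE.
    """
    normalized = [f.strip().lower() for f in findings]
    # Assumption: an empty findings list is NOT treated as "No Finding".
    if normalized and all(f == "no finding" for f in normalized):
        return "ROUTINE", []

    final = model_priority
    matched = []
    for rule in rules:
        pattern = rule["finding_pattern"].lower()
        # Assumption: simple case-insensitive substring match.
        if any(pattern in f for f in normalized):
            matched.append(rule["rule_name"])
            if PRIORITY_RANK[rule["min_priority"]] > PRIORITY_RANK[final]:
                final = rule["min_priority"]
    return final, matched


rules = load_rules(EXAMPLE_RULES_CSV)
print(resolve_priority("ROUTINE", ["Pneumothorax"], rules))
print(resolve_priority("STAT", ["Pleural effusion"], rules))
print(resolve_priority("SOON", ["No Finding"], rules))
```

The first call escalates to STAT, the second stays STAT even though the matched rule only asks for SOON, and the third is forced down to ROUTINE. Returning the matched rule names alongside the priority is what makes each override traceable.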

In [None]:
### Cell B: Radiology Batch Triage - Confusion Matrix
import threading, time
import pandas as pd
from caremap.radiology_triage import triage_batch

print("Radiology Batch Triage: 25 NIH chest X-rays")
print("=" * 60)

manifest_path = project_root / 'data' / 'nih_chest_xray' / 'sample_manifest.csv'
demo_images_dir = project_root / 'data' / 'nih_chest_xray' / 'demo_images'

rad_results = []
rad_error = [None]

def run_rad_batch():
    try:
        rad_results.extend(triage_batch(
            client=multimodal_client,
            manifest_path=str(manifest_path),
            images_base_dir=str(demo_images_dir),
            progress_callback=lambda c, t, img: print(f"  [{c}/{t}] {img}")
        ))
    except Exception as e:
        rad_error[0] = e

thread = threading.Thread(target=run_rad_batch)
thread.start()

with tqdm(desc="Radiology batch triage (MedGemma multimodal)", unit="s",
          bar_format="{desc}: {elapsed} elapsed |{bar}|") as pbar:
    while thread.is_alive():
        time.sleep(1)
        pbar.update(1)

thread.join()
if rad_error[0]:
    raise rad_error[0]

# Build confusion matrix
manifest_df = pd.read_csv(manifest_path)
gt_lookup = dict(zip(manifest_df['image_id'], manifest_df['priority'].str.upper()))

rad_rows = []
for r in rad_results:
    rad_rows.append({
        'image_id': r.image_id,
        'predicted': r.priority,
        'ground_truth': gt_lookup.get(r.image_id, 'UNKNOWN'),
        'confidence': r.confidence,
    })

rad_df = pd.DataFrame(rad_rows)
labels = ['STAT', 'SOON', 'ROUTINE']
confusion = pd.crosstab(
    rad_df['ground_truth'], rad_df['predicted'],
    rownames=['Ground Truth'], colnames=['Predicted'],
    dropna=False,
)
# Ensure all labels are present and ordered; reindex fills missing rows/columns with 0
confusion = confusion.reindex(index=labels, columns=labels, fill_value=0)

rad_accuracy = (rad_df['predicted'] == rad_df['ground_truth']).mean()
stat_mask = rad_df['ground_truth'] == 'STAT'
rad_stat_recall = (rad_df.loc[stat_mask, 'predicted'] == 'STAT').mean() if stat_mask.any() else 0.0

print(f"\nConfusion Matrix (N={len(rad_df)}):")
display(confusion)
print(f"\nOverall Accuracy: {rad_accuracy:.0%}")
print(f"STAT Recall:      {rad_stat_recall:.0%}  (most clinically important)")
print(f"Avg Confidence:   {rad_df['confidence'].mean():.0%}")

In [None]:
### Cell B2: Rule Override Diagnostic - Which rules fired on misclassified cases?
print("Rule Override Diagnostic: Misclassified Cases")
print("=" * 60)

diag_rows = []
for r in rad_results:
    gt = gt_lookup.get(r.image_id, 'UNKNOWN')
    diag_rows.append({
        'image_id': r.image_id,
        'ground_truth': gt,
        'model_priority': r.model_priority or r.priority,
        'final_priority': r.priority,
        'matched_rules': ', '.join(r.matched_rules) if r.matched_rules else '(none)',
        'findings': ', '.join(r.findings[:3]) if r.findings else '(none)',
        'correct': r.priority == gt,
    })

diag_df = pd.DataFrame(diag_rows)

# Show misclassified cases
misclassified = diag_df[~diag_df['correct']].copy()
print(f"\nMisclassified: {len(misclassified)}/{len(diag_df)} cases\n")

if len(misclassified) > 0:
    display_cols = ['image_id', 'ground_truth', 'model_priority', 'final_priority', 'matched_rules', 'findings']
    styled = misclassified[display_cols].style.apply(
        lambda row: [
            '',
            '',
            'background-color: #fff3cd' if row['model_priority'] != row['ground_truth'] else '',
            'background-color: #f8d7da' if row['final_priority'] != row['ground_truth'] else '',
            'background-color: #f8d7da' if row['model_priority'] != row['final_priority'] else '',
            '',
        ], axis=1
    ).set_caption("Misclassified Cases - Rule Override Diagnostic")
    display(styled)

    # Summary: which rules appear most often among the misclassified cases?
    from collections import Counter
    rule_counts = Counter()
    for _, row in misclassified.iterrows():
        if row['matched_rules'] != '(none)':
            for rule in row['matched_rules'].split(', '):
                rule_counts[rule] += 1

    if rule_counts:
        print("\nRules matched most often in misclassified cases:")
        for rule, count in rule_counts.most_common():
            print(f"  {rule}: {count} cases")
else:
    print("No misclassified cases!")

In [None]:
### Cell C: HL7 Batch Triage - Confusion Matrix
import threading, time
from caremap.hl7_triage import triage_batch as hl7_triage_batch

print("HL7 Batch Triage: 20 ORU messages")
print("=" * 60)

hl7_results = []
hl7_error = [None]

def run_hl7_batch():
    try:
        hl7_results.extend(hl7_triage_batch(
            client=client,
            messages=messages,
            progress_callback=lambda c, t, mid: print(f"  [{c}/{t}] {mid}")
        ))
    except Exception as e:
        hl7_error[0] = e

thread = threading.Thread(target=run_hl7_batch)
thread.start()

with tqdm(desc="HL7 batch triage (MedGemma text)", unit="s",
          bar_format="{desc}: {elapsed} elapsed |{bar}|") as pbar:
    while thread.is_alive():
        time.sleep(1)
        pbar.update(1)

thread.join()
if hl7_error[0]:
    raise hl7_error[0]

# Build confusion matrix
hl7_rows = []
for r in hl7_results:
    hl7_rows.append({
        'message_id': r.message_id,
        'predicted': r.priority,
        'ground_truth': r.ground_truth_priority or 'UNKNOWN',
        'confidence': r.confidence,
    })

hl7_df = pd.DataFrame(hl7_rows)
labels = ['STAT', 'SOON', 'ROUTINE']
hl7_confusion = pd.crosstab(
    hl7_df['ground_truth'], hl7_df['predicted'],
    rownames=['Ground Truth'], colnames=['Predicted'],
    dropna=False,
)
# Ensure all labels are present and ordered; reindex fills missing rows/columns with 0
hl7_confusion = hl7_confusion.reindex(index=labels, columns=labels, fill_value=0)

hl7_accuracy = (hl7_df['predicted'] == hl7_df['ground_truth']).mean()
hl7_stat_mask = hl7_df['ground_truth'] == 'STAT'
hl7_stat_recall = (hl7_df.loc[hl7_stat_mask, 'predicted'] == 'STAT').mean() if hl7_stat_mask.any() else 0.0

print(f"\nConfusion Matrix (N={len(hl7_df)}):")
display(hl7_confusion)
print(f"\nOverall Accuracy: {hl7_accuracy:.0%}")
print(f"STAT Recall:      {hl7_stat_recall:.0%}  (most clinically important)")
print(f"Avg Confidence:   {hl7_df['confidence'].mean():.0%}")

In [None]:
### Cell D: Medication Batch Evaluation - Safety Checklist
from caremap.medication_interpretation import interpret_medication_v3_grounded, MED_V3_OUT_KEYS
from caremap.safety_validator import SafetyValidator

n_meds = len(patient_data['medications'])
print(f"Medication Batch Evaluation: all {n_meds} medications")
print("=" * 60)

validator = SafetyValidator(strict_mode=False)
med_eval_rows = []

for med in tqdm(patient_data['medications'], desc="Medications"):
    name = med['medication_name']

    # JSON parse check
    json_ok = True
    schema_ok = True
    safety_ok = True
    try:
        result, raw = interpret_medication_v3_grounded(
            client=client,
            medication_name=name,
            sig_text=med['sig_text'],
            clinician_notes=med.get('clinician_notes', ''),
            interaction_notes=med.get('interaction_notes', ''),
        )
        if 'raw_response' in result:
            json_ok = False
            schema_ok = False
            safety_ok = False
        else:
            # Schema completeness
            schema_ok = all(k in result for k in MED_V3_OUT_KEYS)
            # Safety check
            vr = validator.validate_medication_output(
                input_data=med,
                output_data=result,
            )
            safety_ok = vr.is_safe
    except Exception:
        json_ok = False
        schema_ok = False
        safety_ok = False

    med_eval_rows.append({
        'Medication': name,
        'JSON Parse': json_ok,
        'Schema Complete': schema_ok,
        'Safety Check': safety_ok,
    })

med_eval_df = pd.DataFrame(med_eval_rows)

# Map booleans to PASS/FAIL labels for the color-coded display
def format_bool(val):
    return 'PASS' if val else 'FAIL'

display_df = med_eval_df.copy()
for col in ['JSON Parse', 'Schema Complete', 'Safety Check']:
    display_df[col] = display_df[col].map(format_bool)

styled = display_df.style.applymap(
    lambda v: 'background-color: #d4edda; color: #155724' if v == 'PASS'
    else ('background-color: #f8d7da; color: #721c24' if v == 'FAIL' else ''),
    subset=['JSON Parse', 'Schema Complete', 'Safety Check']
)
display(styled)

med_json_rate = med_eval_df['JSON Parse'].mean()
med_schema_rate = med_eval_df['Schema Complete'].mean()
med_safety_rate = med_eval_df['Safety Check'].mean()
print(f"\nJSON Parse Rate:      {med_json_rate:.0%}")
print(f"Schema Complete Rate: {med_schema_rate:.0%}")
print(f"Safety Pass Rate:     {med_safety_rate:.0%}")

In [None]:
### Cell E: Lab Batch Evaluation - Golden Scenarios
import json
from caremap.lab_interpretation import interpret_lab, LAB_OUT_KEYS

print("Lab Batch Evaluation: 8 golden scenarios from golden_labs.json")
print("=" * 60)

golden_labs_path = project_root / 'examples' / 'golden_labs.json'
with open(golden_labs_path) as f:
    golden_labs = json.load(f)

lab_eval_rows = []

for scenario in tqdm(golden_labs['lab_scenarios'], desc="Lab scenarios"):
    sid = scenario['scenario_id']
    test_name = scenario['test_name']
    meaning = scenario['result_context']['meaning_category']
    forbidden = scenario.get('_forbidden_in_output', [])

    schema_ok = True
    no_forbidden = True
    has_question = True

    try:
        result = interpret_lab(
            client=client,
            test_name=test_name,
            meaning_category=meaning,
            source_note=scenario.get('clinical_significance', ''),
        )
        # Schema check
        schema_ok = all(k in result for k in LAB_OUT_KEYS)

        # Forbidden terms check
        output_text = ' '.join(str(v) for v in result.values())
        found_forbidden = [t for t in forbidden if t.lower() in output_text.lower()]
        no_forbidden = len(found_forbidden) == 0

        # Question mark check
        ask_doc = result.get('what_to_ask_doctor', '')
        has_question = '?' in ask_doc

    except Exception:
        schema_ok = False
        no_forbidden = False
        has_question = False

    lab_eval_rows.append({
        'Scenario': f"{sid}: {test_name}",
        'Schema': schema_ok,
        'No Forbidden': no_forbidden,
        'Has Question': has_question,
    })

lab_eval_df = pd.DataFrame(lab_eval_rows)

display_df = lab_eval_df.copy()
for col in ['Schema', 'No Forbidden', 'Has Question']:
    display_df[col] = display_df[col].map(lambda v: 'PASS' if v else 'FAIL')

styled = display_df.style.applymap(
    lambda v: 'background-color: #d4edda; color: #155724' if v == 'PASS'
    else ('background-color: #f8d7da; color: #721c24' if v == 'FAIL' else ''),
    subset=['Schema', 'No Forbidden', 'Has Question']
)
display(styled)

lab_schema_rate = lab_eval_df['Schema'].mean()
lab_forbidden_rate = lab_eval_df['No Forbidden'].mean()
lab_question_rate = lab_eval_df['Has Question'].mean()
print(f"\nSchema Rate:          {lab_schema_rate:.0%}")
print(f"No Forbidden Rate:    {lab_forbidden_rate:.0%}")
print(f"Has Question Rate:    {lab_question_rate:.0%}")

In [None]:
### Cell F: Combined Evaluation Summary - Scorecard
print("CareMap Evaluation Scorecard")
print("=" * 60)

summary_rows = [
    {
        'Module': 'Radiology Triage',
        'N': len(rad_df),
        'Primary Metric': 'Accuracy',
        'Score': f"{rad_accuracy:.0%}",
        'STAT Recall': f"{rad_stat_recall:.0%}",
        'Details': f"Multimodal, {len(rad_df)} chest X-rays",
    },
    {
        'Module': 'HL7 Triage',
        'N': len(hl7_df),
        'Primary Metric': 'Accuracy',
        'Score': f"{hl7_accuracy:.0%}",
        'STAT Recall': f"{hl7_stat_recall:.0%}",
        'Details': f"Text, {len(hl7_df)} ORU messages",
    },
    {
        'Module': 'Medication Interp.',
        'N': len(med_eval_df),
        'Primary Metric': 'Safety Pass',
        'Score': f"{med_safety_rate:.0%}",
        'STAT Recall': 'N/A',
        'Details': f"JSON {med_json_rate:.0%} | Schema {med_schema_rate:.0%} | Safety {med_safety_rate:.0%}",
    },
    {
        'Module': 'Lab Interp.',
        'N': len(lab_eval_df),
        'Primary Metric': 'No Forbidden',
        'Score': f"{lab_forbidden_rate:.0%}",
        'STAT Recall': 'N/A',
        'Details': f"Schema {lab_schema_rate:.0%} | Forbidden {lab_forbidden_rate:.0%} | Question {lab_question_rate:.0%}",
    },
]

summary_df = pd.DataFrame(summary_rows)

styled_summary = summary_df.style.set_properties(
    **{'text-align': 'center'},
    subset=['N', 'Score', 'STAT Recall']
).set_properties(
    **{'text-align': 'left'},
    subset=['Module', 'Primary Metric', 'Details']
).set_caption("CareMap Quantitative Evaluation Summary")

display(styled_summary)

total_calls = len(rad_df) + len(hl7_df) + len(med_eval_df) + len(lab_eval_df)
print(f"\nTotal MedGemma calls: {total_calls}")
print(f"Triage modules:      STAT recall is the most clinically important metric")
print(f"Fridge sheet modules: Safety pass rate checks that known jargon patterns stay out of caregiver-facing text")

### Evaluation Takeaways

**What the metrics mean clinically:**

- **STAT Recall** is the single most important metric for triage modules. A missed STAT case (false negative) means a critical patient waits in the routine queue, potentially for 72 hours. CareMap is designed to *over-triage* rather than *under-triage*: defaulting to STAT when uncertain accepts extra urgent reviews in exchange for a lower risk of missing a critical finding.

- **Safety Pass Rate** for medication interpretation validates that MedGemma's plain-language output does not contain forbidden diagnosis terms (e.g., "cancer"), medical jargon, or specific numeric values that caregivers should not act on.

- **No Forbidden Terms** for lab interpretation ensures that MedGemma follows the golden specification: no raw lab values (e.g., "8.4%", "42 mL/min"), no medical abbreviations (e.g., "CKD", "eGFR"), and no clinical jargon that a 6th-grade reader would not understand.
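
As a rough illustration of the kind of rule such checks can apply, the sketch below flags raw numeric measurements and listed abbreviations in a plain-language string. It is not the `SafetyValidator` or the golden-specification check used in the evaluation cells; the `find_violations` helper, the unit list, and the regular expression are assumptions chosen for this example.

```python
import re

# Assumed pattern: a number followed by a percent sign or a common lab unit.
MEASUREMENT_RE = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:%|mg/dL|mL/min|mmol/L|g/dL)",
    re.IGNORECASE,
)


def find_violations(text, forbidden_terms):
    """Return raw measurements and forbidden terms found in text."""
    measurements = MEASUREMENT_RE.findall(text)
    terms = [
        term for term in forbidden_terms
        # Whole-word match so that "CKD" does not fire inside longer words.
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)
    ]
    return {"measurements": measurements, "forbidden_terms": terms}


print(find_violations("Your eGFR is 42 mL/min.", ["CKD", "eGFR"]))
print(find_violations(
    "Your kidneys are working a little slower than normal.",
    ["CKD", "eGFR"],
))
```

The whole-word match is a deliberate difference from a plain substring test: it avoids false alarms when a short abbreviation happens to appear inside an ordinary word, at the cost of missing inflected forms.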

**Limitations and honest caveats:**

1. **Non-deterministic LLM** - MedGemma outputs vary between runs, hardware, and dtype. These scores represent a single evaluation pass.
2. **Curated datasets** - The 25 X-rays and 20 HL7 messages are hand-selected to cover the priority distribution. Real-world performance on thousands of studies may differ.
3. **Safety validation is rule-based** - The `SafetyValidator` catches known patterns (forbidden terms, jargon, measurements) but cannot detect all forms of misleading output.
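4. **No human expert review of model outputs** - The radiology priority rules were reviewed by a physician, but the model outputs scored here were checked only by automated metrics. These metrics are a necessary first step; clinical validation with radiologists and physicians is the next milestone.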
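
To put the small sample sizes in perspective, a confidence interval can be attached to any of the proportions reported above. The sketch below uses the Wilson score interval, which is an addition for illustration and not part of the CareMap evaluation; `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# STAT recall of 3/3 from the radiology comparison table.
low, high = wilson_interval(3, 3)
print(f"3/3 -> 95% interval: {low:.0%} to {high:.0%}")
```

With only three STAT cases, a perfect observed recall is still compatible with a true recall as low as roughly 44%, which is why a larger labeled set is needed before drawing conclusions.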

---

## Impact Summary

### One Model, Maximum Impact

| Module | Beneficiary | Impact |
|--------|-------------|--------|
| **Fridge Sheets** | Caregivers & Ayahs | Reduced confusion, fewer medication errors |
| **Radiology Triage** | Radiologists | Critical findings prioritized, faster diagnosis |
| **HL7 Triage** | Lab/Clinical Staff | Life-threatening values surfaced immediately |

### Why This Matters for India

- **Caregivers**: Rotating Ayahs (home healthcare workers) can follow clear instructions without verbal handoffs
- **Radiologists**: Scarce specialists can focus on urgent cases first
- **Patients**: Better outcomes through timely intervention

### The Poster Effect: From Recall to Reference

> "The ayah changes every week. **How do you hand off care?**" - Dr. Gaurav Mishra

Printed fridge sheets fundamentally change how care is communicated:

| Without CareMap | With CareMap |
|-----------------|--------------|
| Outgoing ayah tries to remember all instructions | Outgoing ayah points to the poster on the fridge |
| New ayah relies on verbal handoff (incomplete, error-prone) | New ayah reads the poster and asks clarifying questions |
| Family must re-explain everything each week | Family walks new ayah through the printed pages once |
| Critical warnings get forgotten over time | Warnings stay visible: "Do NOT take with ibuprofen" |

**The goal is not perfect recall; it's reliable reference.** When the poster is on the fridge, the next Ayah doesn't need to memorize anything. They just need to read, follow, and ask when something doesn't make sense.

---

## Model Attribution

```bibtex
@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}
```

---

**Interactive Demo:** [CareMap on HuggingFace Spaces](https://huggingface.co/spaces/rgiri2025/caremap-medgemma)

*CareMap - Making healthcare information accessible and actionable.*

---

## Acknowledgements

**Family.** My father, whose insights shaped the concept. My mother, whose encouragement to keep moving forward lives on in this work.

**User Research Partners.** Four professionals who generously shared their time and clinical expertise:
- [Dr. Vinodhini Sriram](https://www.pihhealth.org/find-a-doctor/physician-profile-advanced/vinodhini-sriram/), Family Medicine
- [Dr. Gaurav Mishra](https://www.linkedin.com/in/gaurav-mishra-md-mba-dfapa-99213a5/), Child, Adolescent and Adult Psychiatrist
- Dr. Manini Moudgal, Pediatrician, Mysore/Bangalore
- [Sunayana Mann](https://www.linkedin.com/in/sunayana-mann/), Digital Health Product Leader & Family Caregiver

**LLM Council.** Claude (Anthropic), ChatGPT (OpenAI), and Gemini (Google) for serving as thought partners, devil's advocates, and deep research aids across published datasets like NIH Chest X-rays.

**Claude Code.** For orchestrating the implementation, navigating dtype discrepancies between Apple MPS and Kaggle T4, managing complex multi-module codebases, and making a solo developer brave enough to attempt this competition.

**Kaggle & Google.** For hosting the MedGemma Impact Challenge and providing the opportunity to learn this model's capabilities by building with it.