# Discharge Navigator — MedGemma Impact Challenge

**One notebook. One model. Fully offline clinical note extraction.**

Discharge Navigator takes raw clinical notes and extracts structured, evidence-grounded discharge packets — diagnoses with ICD-10 codes, medications with dosing, follow-ups with urgency, red flags, missing information, and a patient-friendly summary. Every claim is backed by exact text spans from the source note.

| | |
|---|---|
| **Model** | MedGemma 4B IT — Google's HAI-DEF open-weight medical model |
| **What it does** | Single inference call → 7 structured fields, each evidence-grounded |
| **Edge deployment** | 2.5 GB quantized (Q4_K_M), CPU-only, zero internet |
| **Trust mechanism** | Evidence spans verified as exact substrings of the source note |
| **Safety** | Schema validation gate rejects malformed output. Clinician reviews everything. |

### How this notebook works

This notebook auto-detects your environment and picks the best available backend:

| Environment | What happens |
|---|---|
| **Kaggle + T4 GPU** | Loads `google/medgemma-4b-it` via HuggingFace Transformers (bf16) → live extraction |
| **Local + Ollama** | Uses `williamljx/medgemma-4b-it-Q4_K_M-GGUF` via Ollama → live extraction, CPU-only |
| **No GPU, no Ollama** | Loads pre-computed evidence pack (46 extractions) → full demo, no model needed |

**Just press Run All.**

## Step 0 — Setup

Clones the [GitHub repo](https://github.com/LegenDairy93/discharge-navigator) and installs dependencies. All source code lives in `src/`: this notebook is the entry point, and the repo is the engine.

**Key source files:**
- `src/navigator.py` — inference pipeline with retry strategy and dual prompt variants
- `src/schemas.py` — Pydantic v2 schemas with type coercion validators
- `src/grounding.py` — exact substring verification of evidence spans
- `src/prompts.py` — Prompt Variant A (contract) and Variant B (strict fallback)
- `src/hf_backend.py` — HuggingFace Transformers backend for GPU inference
- `src/demo_app.py` — full Gradio demo (Evidence Explorer, Performance Dashboard, Edge Cases, How It Works)
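
The retry strategy with two prompt variants can be pictured roughly as follows. This is a minimal sketch and not the code in `src/navigator.py`: the prompt texts, `REQUIRED_KEYS`, `try_parse`, `extract_with_retry`, and the `call_model` callable are all hypothetical names chosen for illustration, and a simple key check stands in for the real Pydantic validation.

```python
import json
from typing import Callable, Optional

# Hypothetical prompt templates; the real ones live in src/prompts.py.
PROMPT_A = "Extract a discharge packet as JSON.\n\nNOTE:\n{note}"
PROMPT_B = "Return ONLY a JSON object, no prose.\n\nNOTE:\n{note}"

# Assumed top-level keys, used here as a stand-in for full schema validation.
REQUIRED_KEYS = {"diagnoses", "medications", "followups"}


def try_parse(raw: str) -> Optional[dict]:
    """Return the parsed packet, or None if the output is not usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def extract_with_retry(note: str, call_model: Callable[[str], str]) -> Optional[dict]:
    """Try Variant A first, then fall back to the stricter Variant B."""
    for template in (PROMPT_A, PROMPT_B):
        packet = try_parse(call_model(template.format(note=note)))
        if packet is not None:
            return packet
    return None  # both variants failed; the caller should surface this


# Usage with a fake model that only produces JSON under the strict prompt.
def fake_model(prompt: str) -> str:
    if prompt.startswith("Return ONLY"):
        return '{"diagnoses": [], "medications": [], "followups": []}'
    return "Sure! Here is the packet..."


result = extract_with_retry("Patient admitted with chest pain.", fake_model)
print(result)
```

Returning `None` instead of a partially parsed packet keeps the failure visible, which matches the idea of a validation gate that rejects malformed output.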

In [None]:
import subprocess, sys, os
from pathlib import Path

# --- Clone the repo if not already present ---
REPO_URL = 'https://github.com/LegenDairy93/discharge-navigator.git'
REPO_DIR = Path('discharge-navigator')

if not REPO_DIR.exists():
    print('Cloning repository...')
    subprocess.check_call(['git', 'clone', '--depth', '1', REPO_URL, str(REPO_DIR)])
    print('Clone complete.')
else:
    print(f'Repo already exists at {REPO_DIR}')

# Repo root IS the project root
PROJECT_DIR = REPO_DIR

# Verify project structure
assert (PROJECT_DIR / 'src' / 'demo_app.py').exists(), 'Missing src/demo_app.py'
assert (PROJECT_DIR / 'eval' / 'results' / 'metrics_summary.json').exists(), 'Missing eval results'
print(f'Project root: {PROJECT_DIR.resolve()}')

# --- Install dependencies ---
deps = ['requests', 'pydantic', 'pandas', 'gradio', 'matplotlib']
for pkg in deps:
    try:
        __import__(pkg)
    except ImportError:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', pkg])
        print(f'Installed {pkg}')
    else:
        print(f'{pkg} OK')

# Add project to Python path
project_path = str(PROJECT_DIR.resolve())
if project_path not in sys.path:
    sys.path.insert(0, project_path)

print('\nAll dependencies ready.')

## Step 1 — Detect Environment & Load Model

The notebook figures out what hardware is available and loads MedGemma accordingly:

1. **Kaggle T4** → `google/medgemma-4b-it` at bfloat16 via HuggingFace Transformers (~8 GB VRAM)
2. **Local CPU with Ollama** → `williamljx/medgemma-4b-it-Q4_K_M-GGUF` (2.5 GB, fully offline)
3. **Neither available** → pre-computed evidence pack with 46 extraction results baked in

Both live backends trace to the same HAI-DEF open-weight model on Hugging Face. Same prompts, same schema validation, same grounding logic — the only difference is precision (bf16 vs Q4_K_M quantization).

In [None]:
import json, time

BACKEND = None  # 'hf', 'ollama', or None (evidence mode)
hf_model = None
hf_tokenizer = None
ollama_model = None

is_kaggle = os.path.exists('/kaggle')

# --- Try HuggingFace (Kaggle T4 or any CUDA GPU) ---
try:
    import torch
    if torch.cuda.is_available():
        print(f'GPU detected: {torch.cuda.get_device_name(0)}')
        print(f'VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')

        # Authenticate with HuggingFace
        try:
            if is_kaggle:
                from kaggle_secrets import UserSecretsClient
                hf_token = UserSecretsClient().get_secret('HF_TOKEN')
            else:
                hf_token = os.environ.get('HF_TOKEN')

            if hf_token:
                from huggingface_hub import login
                login(token=hf_token, add_to_git_credential=False)
                print('HuggingFace authenticated.')

                from src.hf_backend import load_medgemma
                hf_model, hf_tokenizer = load_medgemma()
                BACKEND = 'hf'
                print('\nBackend: HuggingFace Transformers (GPU)')
            else:
                print('No HF_TOKEN found. Set it as a Kaggle Secret or env var.')
        except Exception as e:
            print(f'HuggingFace loading failed: {e}')
    else:
        print('No CUDA GPU detected.')
except ImportError:
    print('PyTorch not installed (expected on a local machine without the ML stack).')

# --- Fallback: Try Ollama (local CPU) ---
if BACKEND is None:
    try:
        from src.navigator import check_ollama, select_model
        models = check_ollama()
        if models:
            ollama_model = select_model(models)
            BACKEND = 'ollama'
            print(f'\nBackend: Ollama (CPU) \u2014 model: {ollama_model}')
    except Exception as e:
        # Report why Ollama was skipped instead of failing silently
        print(f'Ollama not available: {e}')

# --- Final fallback: Evidence mode ---
if BACKEND is None:
    print('\nBackend: EVIDENCE MODE (pre-computed results)')
    print('For live inference, either:')
    print('  - Enable GPU + add HF_TOKEN secret (Kaggle)')
    print('  - Start Ollama with MedGemma model (local)')

## Step 2 — Smoke Test (Live Extraction)

This runs the **full pipeline end-to-end** on a single clinical note to prove the model is loaded and the pipeline works:

`Clinical Note → MedGemma 4B → Evidence Grounding → Schema Validation`

The output shows extracted diagnoses, medications, follow-ups, and red flags — each with a grounding ratio indicating how many evidence spans were verified as exact substrings of the source note.

If no model is available, the notebook displays a pre-computed extraction (note_002 — Elevated Cardiac Enzymes case) to demonstrate what the pipeline produces.
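
The grounding ratio described above can be sketched in a few lines. This is an illustrative stand-in and not the implementation in `src/grounding.py`: the function names `is_grounded` and `grounded_ratio` are hypothetical, and treating an empty span list as 0.0 is an assumption made here.

```python
def is_grounded(span: str, note: str) -> bool:
    """A span counts as grounded only if it appears verbatim in the note."""
    return bool(span) and span in note


def grounded_ratio(spans: list[str], note: str) -> float:
    """Fraction of evidence spans that are exact substrings of the note."""
    if not spans:
        return 0.0  # assumption: no cited evidence means nothing is verified
    return sum(is_grounded(s, note) for s in spans) / len(spans)


# Made-up note and spans, for illustration only.
note = "Patient started on aspirin 81 mg daily. Troponin was elevated."
spans = ["aspirin 81 mg daily", "Troponin was elevated", "aspirin 325 mg"]

ratio = grounded_ratio(spans, note)
print(f"{ratio:.0%}")  # 2 of 3 spans match, so this prints 67%
```

The match is case-sensitive and whitespace-sensitive, so a paraphrased or re-cased span counts as ungrounded. That strictness is what lets a reviewer jump from a claim to the exact source text.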

In [None]:
from src.grounding import grounding_report

if BACKEND in ('hf', 'ollama'):
    golden_path = PROJECT_DIR / 'data' / 'golden_note.txt'
    test_note = golden_path.read_text(encoding='utf-8')
    print(f'Smoke test note: {len(test_note)} chars')
    print('Running extraction...\n')

    t0 = time.time()

    if BACKEND == 'hf':
        from src.navigator import generate_packet_hf
        packet, raw = generate_packet_hf(
            test_note, model=hf_model, tokenizer=hf_tokenizer, return_raw=True
        )
    else:
        from src.navigator import generate_packet
        packet, raw = generate_packet(
            test_note, model=ollama_model, return_raw=True
        )

    elapsed = time.time() - t0

    if packet is None:
        print(f'SMOKE TEST FAILED \u2014 could not parse output in {elapsed:.1f}s')
        print(f'Raw output preview:\n{(raw or "")[:500]}')
    else:
        d = packet.model_dump()
        report = grounding_report(packet, test_note)
        print(f'SMOKE TEST PASSED in {elapsed:.1f}s')
        print(f'  Diagnoses:   {len(d["diagnoses"]):>2}  (grounded: {report["diagnoses_grounded_ratio"]:.0%})')
        print(f'  Medications: {len(d["medications"]):>2}  (grounded: {report["meds_grounded_ratio"]:.0%})')
        print(f'  Follow-ups:  {len(d["followups"]):>2}')
        print(f'  Red flags:   {len(d["red_flags"]):>2}')
        print(f'  Missing info:{len(d["missing_info"]):>2}')
        print(f'  Overall grounded: {report["overall_grounded_ratio"]:.0%}')
else:
    print('Skipping live smoke test (no model available).\n')
    print('Pre-computed sample (note_002):')
    sample_path = PROJECT_DIR / 'eval' / 'results' / 'samples' / 'note_002.json'
    if sample_path.exists():
        sample = json.loads(sample_path.read_text(encoding='utf-8'))
        print(f'  Diagnoses:   {len(sample.get("diagnoses", []))}')
        print(f'  Medications: {len(sample.get("medications", []))}')
        print(f'  Follow-ups:  {len(sample.get("followups", []))}')
    else:
        print(f'  Sample file not found: {sample_path}')
    print('\nFull eval: 46/50 parsed, 34s median, 94% diagnosis grounding.')

## Step 3 — 50-Note Evaluation Results

Full evaluation across 50 MTSamples clinical notes (CC0 license). These results were generated offline and are included in the repo at `eval/results/` for reproducibility.

**What was measured:**
- **Parse rate** — did MedGemma produce valid, schema-compliant JSON?
- **Latency** — how long per note on CPU (no GPU)?
- **Grounding accuracy** — what percentage of cited evidence spans are verified exact substrings of the source note?

**Success criteria:** ≥80% parse rate, ≥80% diagnosis grounding, median latency <120s. All three passed.
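
As a rough illustration of how the three criteria could be checked, here is a minimal sketch. The function name `check_success`, the result keys, and the toy numbers are all made up for this example; they are not the repo's evaluation code or its results.

```python
import statistics


def check_success(parsed_flags, latencies_s, dx_grounded):
    """Evaluate the three thresholds stated above on per-note results."""
    parse_rate = sum(parsed_flags) / len(parsed_flags)
    median_latency = statistics.median(latencies_s)
    dx_mean = statistics.mean(dx_grounded)
    return {
        "parse_rate_ok": parse_rate >= 0.80,         # at least 80% valid JSON
        "dx_grounding_ok": dx_mean >= 0.80,          # at least 80% grounded
        "median_latency_ok": median_latency < 120,   # under 120 seconds
    }


# Toy data: 10 notes, 9 parsed; latency and grounding for the parsed notes only.
parsed = [True] * 9 + [False]
latencies = [28, 30, 31, 33, 34, 36, 40, 45, 90]
dx_ratios = [1.0, 1.0, 0.9, 1.0, 0.8, 1.0, 1.0, 0.75, 1.0]

criteria = check_success(parsed, latencies, dx_ratios)
for name, passed in criteria.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Using the median instead of the mean keeps one slow note from dominating the latency criterion.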

**A note on the two backends:** When a GPU is available, the smoke test above (Step 2) runs MedGemma at bfloat16 via HuggingFace Transformers; on a local machine with Ollama it runs the Q4_K_M build instead. The evaluation results below were all generated with the model quantized to Q4_K_M (2.5 GB) via Ollama on CPU, which is the edge deployment target. Prompts, schema validation, and grounding logic are identical across backends, so minor metric differences reflect quantization trade-offs, not pipeline changes.

In [None]:
results_dir = PROJECT_DIR / 'eval' / 'results'

with open(results_dir / 'metrics_summary.json', encoding='utf-8') as f:
    summary = json.load(f)

print('=' * 60)
print('  DISCHARGE NAVIGATOR \u2014 EVALUATION SUMMARY')
print('=' * 60)
print(f'  Model:      {summary["model"]}')
print(f'  Quant:      {summary["quantization"]}')
print(f'  Inference:  {summary["inference"]}')
print(f'  Dataset:    {summary["dataset"]}')
print(f'  Notes:      {summary["total_notes"]}')
print('-' * 60)
print(f'  Parse rate:           {summary["json_valid_rate"]:.0%}  ({summary["json_valid_count"]}/{summary["total_notes"]})')
print(f'  Median latency:       {summary["median_latency_s"]:.0f}s')
print(f'  P95 latency:          {summary["p95_latency_s"]:.0f}s')
print(f'  Dx grounded (mean):   {summary["diagnoses_grounded_mean"]:.0%}')
print(f'  Meds grounded (mean): {summary["medications_grounded_mean"]:.0%}')
print(f'  Overall grounded:     {summary["overall_grounded_mean"]:.0%}')
print('-' * 60)
for k, v in summary['success_criteria'].items():
    status = 'PASS' if v else 'FAIL'
    print(f'  {k}: {status}')
print('=' * 60)

In [None]:
from IPython.display import Image, display

hist_path = results_dir / 'latency_histogram.png'
if hist_path.exists():
    display(Image(filename=str(hist_path), width=800))
else:
    print('Histogram not found.')

## Step 4 — Interactive Demo

Launches the full Gradio application with four tabs:

| Tab | What it shows |
|-----|---------------|
| **Evidence Explorer** | Load any note → see extractions with highlighted evidence spans. Filter by diagnoses, medications, or follow-ups. View patient summary, red flags, and missing information with severity tiers. |
| **Performance Dashboard** | Quantitative metrics with pass/fail badges from the 50-note evaluation. |
| **Edge Cases** | The 4 notes that failed — what went wrong and why it's safe (schema gate caught everything). |
| **How It Works** | Glossary, pipeline walkthrough, and confidence/grounding explainer for clinicians. |

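The demo works in all three modes: live inference produces fresh extractions, and evidence mode uses the pre-computed pack. The launch call passes `share=True`, so Gradio requests a public share link, which is how the demo is reached on Kaggle. Set `share=False` for a fully offline local run.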

**The product is not the extraction — it's the verification interface.**

In [None]:
from src.demo_app import build_app, LAUNCH_KWARGS

app = build_app()

app.launch(
    share=True,
    server_name='0.0.0.0',
    server_port=7860,
    **LAUNCH_KWARGS,
)