An agentic clinical reasoning system that deploys MedGemma 1.5 4B across 6 distinct roles within a dynamic 13-tool pipeline for contextual radiology reporting. Built for the Kaggle MedGemma Impact Challenge 2026.
Live Demo: https://contextrad.irad.app
Radiology diagnosis is never made from images alone. Every clinical interpretation depends on context — patient history, prior imaging, laboratory results, pathology findings, and the trajectory of disease over time. Yet most medical AI tools analyze images in isolation.
ContextRAD bridges this gap by orchestrating MedGemma as a multi-role reasoning engine integrated with clinical speech recognition, document OCR, GradCAM attention visualization, and human-in-the-loop editing.
```
User Inputs          Agentic Pipeline (13 tools)            Outputs
─────────────        ──────────────────────────             ─────────
Patient EHR   ──→  1. init_patient                    ┌─ Individual study reports
Documents     ──→  2. extract_documents (OCR)         ├─ GradCAM attention heatmaps
MedASR Audio  ──→  3. transcribe_audio (MedASR)       ├─ Attention difference maps
T0 Image      ──→  4. analyze_t0 (MedGemma)           ├─ Pathology report
T1 Image      ──→  5. analyze_t1 (MedGemma)           ├─ Structured EHR data
Pathology     ──→  6. localize_t0 (GradCAM + MedGemma)├─ Synthesized comparison
                   7. localize_t1 (GradCAM + MedGemma)├─ Integrated final report
                   8. analyze_pathology (MedGemma)    ├─ Quality gate assessment
                   9. compute_comparison (GradCAM attention diff) └─ Full audit trail
                  10. parse_ehr (MedGemma)
                  11. synthesize_report (MedGemma)
                  12. overall_analysis (MedGemma)
                  13. review_and_validate (quality gate)
```
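The tool sequence above can be sketched as a sequential orchestrator that threads a shared state dict through each step and records an audit trail. This is a minimal illustration, not the project's `agent.py`; the `ToolCall`/`Pipeline` names and the two demo tools are assumptions.

```python
from dataclasses import dataclass, field
import time

@dataclass
class ToolCall:
    """One audit-trail entry: which tool ran, why, and how long it took."""
    name: str
    reasoning: str
    seconds: float

@dataclass
class Pipeline:
    tools: list                          # ordered (name, reasoning, fn) triples
    trail: list = field(default_factory=list)

    def run(self, state: dict) -> dict:
        # Execute tools in order; each reads and extends the shared state.
        for name, reasoning, fn in self.tools:
            start = time.perf_counter()
            state = fn(state)
            self.trail.append(ToolCall(name, reasoning, time.perf_counter() - start))
        return state

# Hypothetical two-step demo standing in for the real 13-tool sequence.
pipe = Pipeline(tools=[
    ("init_patient", "create patient record", lambda s: {**s, "patient": "P001"}),
    ("analyze_t0", "baseline study report", lambda s: {**s, "t0_report": "FINDINGS: ..."}),
])
result = pipe.run({})
```

Keeping the trail alongside the state is what makes the "full audit trail" output cheap: every report section can point back to the tool call that produced it.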
| Role | Description | Tool # |
|---|---|---|
| Radiology image analysis | Independent structured FINDINGS/IMPRESSION reports per study | 4, 5 |
| Finding localization | Anatomical location extraction with GradCAM attention heatmaps | 6, 7 |
| Histopathology analysis | Pathology slide reports leveraging SigLIP's pre-training | 8 |
| EHR parsing | Structured JSON extraction from free-text clinical narratives | 10 |
| Comparison synthesis | Text integration of independent reports with attention difference data | 11 |
| Overall analysis | Two-stage comprehensive integration of all clinical sources | 12 |
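Running one model in six roles mostly comes down to role-specific prompting. A minimal sketch of that dispatch (the template texts and `build_prompt` helper are assumptions; the real system prompts are not shown in this README):

```python
# Hypothetical role-to-prompt templates mirroring the six roles above.
ROLE_PROMPTS = {
    "image_analysis": "Report structured FINDINGS and IMPRESSION for this {modality} study.",
    "localization": "Name the anatomical location of: {finding}.",
    "pathology": "Describe this histopathology slide.",
    "ehr_parsing": "Extract structured JSON from this clinical narrative:\n{text}",
    "comparison": "Integrate the T0 and T1 reports with the attention-difference summary.",
    "overall": "Synthesize all clinical sources into one integrated report.",
}

def build_prompt(role: str, **fields) -> str:
    """Select the role's template and fill in its placeholders."""
    return ROLE_PROMPTS[role].format(**fields)
```

The same MedGemma weights serve every role; only the prompt (and, for image roles, the attached pixels) changes between tool calls.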
- MedASR (HAI-DEF) — Clinical speech recognition for hands-free dictation
- OCR — Document text extraction via pdfplumber + pytesseract
- GradCAM — Attention heatmaps from SigLIP's vision encoder (Q/K projections, last 4 layers, 16 heads)
- Attention Difference Maps — Red (increased attention) vs blue (decreased) between studies
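The red/blue difference map reduces to a signed, normalized subtraction of the two studies' attention maps. A sketch of that computation (function name and normalization choice are assumptions, not the project's `compare.py`):

```python
import numpy as np

def attention_difference(att_t0: np.ndarray, att_t1: np.ndarray) -> np.ndarray:
    """Signed attention change between studies, scaled to [-1, 1].

    Positive values (rendered red) mark increased attention at T1;
    negative values (rendered blue) mark decreased attention.
    """
    diff = att_t1.astype(float) - att_t0.astype(float)
    peak = np.abs(diff).max()
    return diff / peak if peak > 0 else diff

t0 = np.array([[0.1, 0.9], [0.4, 0.4]])
t1 = np.array([[0.5, 0.1], [0.4, 0.4]])
d = attention_difference(t0, t1)  # red at [0,0], blue at [0,1], neutral elsewhere
```

Because this step is pure arithmetic over GradCAM outputs, the comparison itself is deterministic even though the per-study analyses are generative.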
```bash
cd radiology-twin
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Authenticate with Hugging Face (MedGemma requires access approval)
huggingface-cli login

# Run the application
streamlit run app.py
```

Three quantization modes via BitsAndBytes fit resource-constrained environments:
- Full (bfloat16) — Maximum quality
- INT8 quantized — Balanced speed and quality
- INT4 quantized — Suitable for consumer GPUs
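The three modes map onto standard BitsAndBytes loading options. A minimal sketch of that mapping as plain kwargs (mode names and the helper are assumptions; in practice these would be wrapped in transformers' `BitsAndBytesConfig` before `from_pretrained`):

```python
def quantization_kwargs(mode: str) -> dict:
    """Map a UI quantization mode to model-loading options (illustrative only)."""
    if mode == "full":
        # bfloat16 weights: maximum quality, highest VRAM use.
        return {"torch_dtype": "bfloat16"}
    if mode == "int8":
        # 8-bit weights: balanced speed and quality.
        return {"load_in_8bit": True}
    if mode == "int4":
        # 4-bit weights with bf16 compute: fits consumer GPUs.
        return {"load_in_4bit": True, "bnb_4bit_compute_dtype": "bfloat16"}
    raise ValueError(f"unknown quantization mode: {mode}")
```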
All processing runs locally with zero data transmission.
```
radiology-twin/
├── app.py                   # Streamlit UI (6-step wizard)
├── src/
│   ├── medgemma.py          # MedGemma inference engine (6 roles)
│   ├── agent.py             # 13-tool agentic pipeline orchestrator
│   ├── gradcam.py           # GradCAM attention heatmap extraction (SigLIP)
│   ├── compare.py           # Attention-based longitudinal comparison
│   ├── ocr.py               # Clinical document OCR (pdfplumber + Tesseract)
│   ├── imaging.py           # DICOM/NIfTI/PNG loader and preprocessor
│   ├── registration.py      # 3D volume registration (SimpleITK)
│   └── transcription.py     # MedASR audio transcription
├── media/                   # Logo assets
├── .streamlit/config.toml   # Streamlit configuration
├── requirements.txt         # Python dependencies
├── WRITEUP.md               # Competition writeup
└── README.md
```
- Radiology Images: PNG, JPEG, DICOM (.dcm), NIfTI (.nii/.nii.gz)
- Pathology Slides: PNG, JPEG, TIFF, BMP histopathology images
- Audio: Real-time browser recording transcribed by MedASR, or typed text
- Clinical Documents: PDF (text or scanned), PNG/JPEG/TIFF/BMP with automatic OCR
- EHR: Free-text clinical history (typed or dictated)
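Ingestion starts by routing each upload to the right loader by extension. A sketch of that routing (the `LOADERS` table and loader names are assumptions; the real dispatch lives across `imaging.py` and `ocr.py`):

```python
# Hypothetical extension-to-loader routing mirroring the supported formats.
LOADERS = {
    (".dcm",): "dicom",
    (".nii", ".nii.gz"): "nifti",
    (".png", ".jpg", ".jpeg", ".tiff", ".bmp"): "raster",
    (".pdf",): "pdf_ocr",
}

def route(filename: str) -> str:
    """Return the loader key for a filename, or raise on unsupported types."""
    name = filename.lower()
    for extensions, loader in LOADERS.items():
        if any(name.endswith(ext) for ext in extensions):
            return loader
    raise ValueError(f"unsupported file type: {filename}")
```

Matching on `endswith` rather than `Path.suffix` keeps compound extensions like `.nii.gz` in a single rule.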
- Extraction-based prompts with 50–80 token limits per field
- Independent study analysis — MedGemma never sees two images simultaneously
- Deterministic comparison via GradCAM attention difference maps
- Multi-stage post-processing (disclaimer removal, chain-of-thought leakage filtering, repetition detection)
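The post-processing stage can be sketched as a text filter that strips boilerplate disclaimer lines and collapses consecutive repeated lines. The regex patterns and function below are illustrative assumptions, not the project's actual filter list:

```python
import re

# Hypothetical disclaimer patterns; the real list is not shown in this README.
DISCLAIMER = re.compile(r"(?im)^\s*(as an ai\b|this is not medical advice\b).*$\n?")

def postprocess(text: str) -> str:
    """Strip disclaimer lines and collapse runs of identical output lines."""
    text = DISCLAIMER.sub("", text)
    out = []
    for line in text.splitlines():
        # Keep a line unless it repeats the previous non-blank line verbatim.
        if not out or line.strip() != out[-1].strip() or not line.strip():
            out.append(line)
    return "\n".join(out).strip()

raw = "FINDINGS: nodule\nFINDINGS: nodule\nAs an AI, I cannot diagnose.\nIMPRESSION: stable"
clean = postprocess(raw)  # "FINDINGS: nodule\nIMPRESSION: stable"
```

Repetition collapse matters for small quantized models, which are prone to looping; deduplicating at the line level catches the common failure mode without touching legitimate repeated anatomy terms inside a sentence.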
- Every AI-generated section has inline edit buttons
- Quality gate always flags `requires_human_review=True`
- Full audit trail of every tool call with reasoning and timing
- Every analysis is flagged for human clinical review
- Confidence scoring with quality gate
- No diagnostic decisions are made without clinician oversight
- Fully local — no external API calls, no data leaves the device
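The safeguards above hinge on a quality gate that cannot be talked out of review. A minimal sketch of that invariant (the `QualityGate` class and its field names are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityGate:
    """Final assessment attached to every generated report."""
    confidence: float
    notes: str = ""

    @property
    def requires_human_review(self) -> bool:
        # Deliberately unconditional: no confidence score, however high,
        # lets an AI-generated report bypass clinician review.
        return True

gate = QualityGate(confidence=0.62, notes="low contrast on T1 study")
```

Making the flag a constant property rather than a threshold comparison encodes the safety posture in the type itself: downstream UI code cannot accidentally render an "auto-approved" state.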
This project was built for the Kaggle MedGemma Impact Challenge 2026.