ContextRAD — Agentic Radiology Reporting Powered by MedGemma 1.5

An agentic clinical reasoning system that deploys MedGemma 1.5 4B across 6 distinct roles within a dynamic 13-tool pipeline for contextual radiology reporting. Built for the Kaggle MedGemma Impact Challenge 2026.

Live Demo: https://contextrad.irad.app

Why ContextRAD

Radiology diagnosis is never made from images alone. Every clinical interpretation depends on context — patient history, prior imaging, laboratory results, pathology findings, and the trajectory of disease over time. Yet most medical AI tools analyze images in isolation.

ContextRAD bridges this gap by orchestrating MedGemma as a multi-role reasoning engine integrated with clinical speech recognition, document OCR, GradCAM attention visualization, and human-in-the-loop editing.

Architecture

```
User Inputs                 Agentic Pipeline (13 tools)               Outputs
─────────────              ──────────────────────────                ─────────
Patient EHR  ──→  1. init_patient                                   ┌─ Individual study reports
Documents    ──→  2. extract_documents (OCR)                        ├─ GradCAM attention heatmaps
MedASR Audio ──→  3. transcribe_audio (MedASR)                      ├─ Attention difference maps
T0 Image     ──→  4. analyze_t0 (MedGemma)                          ├─ Pathology report
T1 Image     ──→  5. analyze_t1 (MedGemma)                          ├─ Structured EHR data
Pathology    ──→  6. localize_t0 (GradCAM + MedGemma)               ├─ Synthesized comparison
                  7. localize_t1 (GradCAM + MedGemma)               ├─ Integrated final report
                  8. analyze_pathology (MedGemma)                   ├─ Quality gate assessment
                  9. compute_comparison (GradCAM attention diff)    └─ Full audit trail
                  10. parse_ehr (MedGemma)
                  11. synthesize_report (MedGemma)
                  12. overall_analysis (MedGemma)
                  13. review_and_validate (quality gate)
```
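The sequential flow above can be sketched as an ordered tool registry driven by a small orchestrator; the `run_pipeline` helper, its shared-state convention, and the audit-entry shape below are illustrative, not the project's actual `agent.py` API:

```python
import time

# Ordered tool registry mirroring the 13-step pipeline in the diagram.
TOOLS = [
    "init_patient", "extract_documents", "transcribe_audio",
    "analyze_t0", "analyze_t1", "localize_t0", "localize_t1",
    "analyze_pathology", "compute_comparison", "parse_ehr",
    "synthesize_report", "overall_analysis", "review_and_validate",
]

def run_pipeline(state, registry):
    """Run each tool in order, recording one audit-trail entry per call."""
    audit = []
    for name in TOOLS:
        start = time.perf_counter()
        state = registry[name](state)  # each tool takes and returns the shared state dict
        audit.append({"tool": name,
                      "elapsed_s": time.perf_counter() - start})
    return state, audit

# Stub registry: every tool just marks that it ran.
registry = {name: (lambda s, n=name: {**s, n: "done"}) for name in TOOLS}
final_state, audit = run_pipeline({}, registry)
```

The audit list doubles as the "full audit trail" output: one entry per tool call with its timing, so a reviewer can replay what ran and in what order.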

MedGemma Roles

| Role | Description | Tool # |
| --- | --- | --- |
| Radiology image analysis | Independent structured FINDINGS/IMPRESSION reports per study | 4, 5 |
| Finding localization | Anatomical location extraction with GradCAM attention heatmaps | 6, 7 |
| Histopathology analysis | Pathology slide reports leveraging SigLIP's pre-training | 8 |
| EHR parsing | Structured JSON extraction from free-text clinical narratives | 10 |
| Comparison synthesis | Text integration of independent reports with attention difference data | 11 |
| Overall analysis | Two-stage comprehensive integration of all clinical sources | 12 |
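One plausible way to drive a single model across the six roles is a role-to-instruction table prepended to each call's payload; the prompt strings and the `build_prompt` helper below are illustrative placeholders, not the project's actual prompts:

```python
# Hypothetical role -> instruction mapping for one shared MedGemma instance.
ROLE_PROMPTS = {
    "image_analysis": "Report structured FINDINGS and IMPRESSION for this study.",
    "localization": "Name the anatomical location of each finding.",
    "pathology": "Describe this histopathology slide.",
    "ehr_parsing": "Extract the clinical history as structured JSON.",
    "comparison": "Integrate the two study reports with the attention-difference data.",
    "overall": "Integrate all clinical sources into one final report.",
}

def build_prompt(role, payload):
    """Prepend the role instruction to the per-call payload."""
    if role not in ROLE_PROMPTS:
        raise KeyError(f"unknown role: {role}")
    return f"{ROLE_PROMPTS[role]}\n\n{payload}"

prompt = build_prompt("ehr_parsing", "62F, smoker, prior LUL nodule.")
```

Keeping the role instructions in one table makes it easy to audit exactly what each of the six roles is asked to do.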

Additional Tools

  • MedASR (HAI-DEF) — Clinical speech recognition for hands-free dictation
  • OCR — Document text extraction via pdfplumber + pytesseract
  • GradCAM — Attention heatmaps from SigLIP's vision encoder (Q/K projections, last 4 layers, 16 heads)
  • Attention Difference Maps — Red (increased attention) vs blue (decreased) between studies
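The red/blue difference map reduces to a per-pixel signed difference between the two studies' attention maps; this toy sketch uses nested lists in place of the project's actual array code, with an assumed `eps` dead-zone for "unchanged":

```python
def attention_diff(t0, t1):
    """Signed per-pixel attention change between two equally sized maps."""
    return [[b - a for a, b in zip(row0, row1)]
            for row0, row1 in zip(t0, t1)]

def colorize(diff, eps=1e-6):
    """Map increases to 'red', decreases to 'blue', ~zero to 'none'."""
    return [["red" if d > eps else "blue" if d < -eps else "none"
             for d in row] for row in diff]

t0 = [[0.1, 0.5], [0.3, 0.3]]  # baseline attention map
t1 = [[0.4, 0.2], [0.3, 0.9]]  # follow-up attention map
colors = colorize(attention_diff(t0, t1))
# colors == [["red", "blue"], ["none", "red"]]
```

Because the comparison is pure arithmetic on the attention maps, it is deterministic: the same two studies always produce the same difference map.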

Setup

```bash
cd radiology-twin

python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt

# Authenticate with Hugging Face (MedGemma requires access approval)
huggingface-cli login

# Run the application
streamlit run app.py
```

Edge AI Deployment

Three quantization modes via BitsAndBytes for resource-constrained environments:

  • Full (bfloat16) — Maximum quality
  • INT8 quantized — Balanced speed and quality
  • INT4 quantized — Suitable for consumer GPUs

All processing runs locally with zero data transmission.
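The three modes might map onto Hugging Face model-loading keyword arguments roughly as below; the kwarg names follow the public `transformers`/`bitsandbytes` integration, but the `quant_kwargs` helper itself is an illustrative sketch, not the project's loader:

```python
def quant_kwargs(mode):
    """Illustrative model-loading kwargs for each quantization mode."""
    if mode == "full":   # bfloat16, maximum quality
        return {"torch_dtype": "bfloat16"}
    if mode == "int8":   # BitsAndBytes 8-bit, balanced speed/quality
        return {"load_in_8bit": True}
    if mode == "int4":   # BitsAndBytes 4-bit, fits consumer GPUs
        return {"load_in_4bit": True}
    raise ValueError(f"unknown mode: {mode}")

kwargs = quant_kwargs("int4")
```

Newer `transformers` versions prefer passing a `BitsAndBytesConfig` via `quantization_config` rather than the bare boolean flags, so treat the exact kwargs as version-dependent.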

Project Structure

```
radiology-twin/
├── app.py                   # Streamlit UI (6-step wizard)
├── src/
│   ├── medgemma.py          # MedGemma inference engine (6 roles)
│   ├── agent.py             # 13-tool agentic pipeline orchestrator
│   ├── gradcam.py           # GradCAM attention heatmap extraction (SigLIP)
│   ├── compare.py           # Attention-based longitudinal comparison
│   ├── ocr.py               # Clinical document OCR (pdfplumber + Tesseract)
│   ├── imaging.py           # DICOM/NIfTI/PNG loader and preprocessor
│   ├── registration.py      # 3D volume registration (SimpleITK)
│   └── transcription.py     # MedASR audio transcription
├── media/                   # Logo assets
├── .streamlit/config.toml   # Streamlit configuration
├── requirements.txt         # Python dependencies
├── WRITEUP.md               # Competition writeup
└── README.md
```

Supported Inputs

  • Radiology Images: PNG, JPEG, DICOM (.dcm), NIfTI (.nii/.nii.gz)
  • Pathology Slides: PNG, JPEG, TIFF, BMP histopathology images
  • Audio: Real-time browser recording via MedASR, or typed text
  • Clinical Documents: PDF (text or scanned), PNG/JPEG/TIFF/BMP with automatic OCR
  • EHR: Free-text clinical history (typed or dictated)
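Routing an uploaded file to the right loader could start with a suffix check like the sketch below; the categories follow the format lists above, and `classify_input` is a hypothetical helper, not the project's actual dispatcher (note that PNG/JPEG is ambiguous between radiology, pathology, and scanned documents, so a real router would also need user intent):

```python
from pathlib import Path

def classify_input(path):
    """Very rough modality guess from the file suffix alone (illustrative)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".gz" and path.lower().endswith(".nii.gz"):
        return "radiology"                       # compressed NIfTI volume
    if suffix in {".dcm", ".nii"}:
        return "radiology"                       # DICOM or NIfTI
    if suffix == ".pdf":
        return "document"                        # text or scanned PDF -> OCR
    if suffix in {".png", ".jpg", ".jpeg", ".tiff", ".bmp"}:
        return "image"                           # ambiguous: slide, scan, or document
    return "unknown"
```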

Anti-Hallucination Design

  • Extraction-based prompts with 50–80 token limits per field
  • Independent study analysis — MedGemma never sees two images simultaneously
  • Deterministic comparison via GradCAM attention difference maps
  • Multi-stage post-processing (disclaimer removal, chain-of-thought leakage filtering, repetition detection)
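The post-processing stage can be sketched as a chain of small text filters; the regex pattern and the repetition threshold below are illustrative stand-ins, not the project's actual rules:

```python
import re

# Illustrative pattern for common model-disclaimer openers.
DISCLAIMER = re.compile(r"(?i)^\s*(as an ai\b|i am not a (doctor|physician)\b)")

def strip_disclaimers(text):
    """Drop lines that open with a disclaimer phrase (illustrative)."""
    return "\n".join(line for line in text.splitlines()
                     if not DISCLAIMER.match(line))

def has_repetition(text, n=4):
    """Flag any n-word phrase that appears more than twice (degenerate loops)."""
    words = text.lower().split()
    seen = {}
    for i in range(len(words) - n + 1):
        key = tuple(words[i:i + n])
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > 2:
            return True
    return False

clean = strip_disclaimers("FINDINGS: clear lungs.\nAs an AI, I cannot diagnose.")
```

A repetition hit or a stripped-to-empty report would then feed the quality gate rather than being silently passed through.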

Human-in-the-Loop

  • Every AI-generated section has inline edit buttons
  • Quality gate always flags `requires_human_review=True`
  • Full audit trail of every tool call with reasoning and timing

Safety

  • Every analysis is flagged for human clinical review
  • Confidence scoring with quality gate
  • No diagnostic decisions are made without clinician oversight
  • Fully local — no external API calls, no data leaves the device

License

This project was built for the Kaggle MedGemma Impact Challenge 2026.
