MedVLM: Chest X-Ray Report Generation

MedVLM is a research vision-language model for generating radiology-style reports from frontal chest X-rays. The project was rebuilt from the original training notebook into a GitHub-ready package with reusable training, evaluation, and demo scripts.

This is not a clinical tool and must not be used for diagnosis.

Author: Aaryan Kakad

Highlights

Hybrid visual encoder: ResNet-50 feature extractor plus compact transformer encoder.
Causal transformer decoder with cross-attention over image tokens.
GPT-2 BPE tokenizer with explicit BOS/EOS/PAD report tokens.
Patient-level UID split to avoid train/validation leakage.
Optional pathology-weighted focal loss and balanced sampler for rare findings.
Attention-map utilities and notebook-derived result summaries.
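A patient-level split assigns every image that shares a UID to the same side of the split, so no patient appears in both training and validation. Below is a minimal stdlib sketch. It assumes each row is a dict with a `uid` key; the field name and the helper `split_by_uid` are hypothetical and are not the package's API.

```python
import random
from collections import defaultdict


def split_by_uid(rows, val_fraction=0.1, seed=42):
    """Split rows so that all rows sharing a "uid" land on the same side.

    Note: val_fraction is a fraction of UIDs, not of rows.
    Returns (train_rows, val_rows).
    """
    # Group rows by patient/study UID (hypothetical "uid" key).
    groups = defaultdict(list)
    for row in rows:
        groups[row["uid"]].append(row)

    # Shuffle UIDs, not rows, with a fixed seed for reproducibility.
    uids = sorted(groups)
    random.Random(seed).shuffle(uids)

    # Hold out roughly val_fraction of the UIDs (at least one).
    n_val = max(1, round(len(uids) * val_fraction))
    val_uids = set(uids[:n_val])

    train = [r for uid in uids if uid not in val_uids for r in groups[uid]]
    val = [r for uid in uids if uid in val_uids for r in groups[uid]]
    return train, val


# Toy data: 20 UIDs with two images each; both images must stay together.
rows = [{"uid": u, "image": f"{u}_{i}.png"} for u in range(20) for i in range(2)]
train, val = split_by_uid(rows, val_fraction=0.1)
train_uids = {r["uid"] for r in train}
val_uids = {r["uid"] for r in val}
print(len(train), len(val), train_uids & val_uids)
```

A row-level random split would usually put one image of a patient in training and another in validation. The model could then "recognize" a patient's report instead of reading the image.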

For a full intuition-first walkthrough of the model internals, read architecture.md.
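The decoder highlights (a causal decoder with explicit BOS/EOS/PAD tokens) can be illustrated without the real tokenizer. This is a minimal sketch that assumes reports are already tokenized into integer ids. The special-token ids and helper names are hypothetical; the package itself uses GPT-2 BPE ids, which this example does not reproduce. The boolean masks follow the convention of PyTorch's `nn.MultiheadAttention`, where `True` means "do not attend".

```python
BOS, EOS, PAD = 1, 2, 0  # hypothetical special-token ids


def make_decoder_batch(token_lists, max_len):
    """Build teacher-forcing inputs/targets padded to max_len.

    Each report becomes [BOS] + tokens + [EOS], truncated to max_len + 1 ids
    (long reports lose their EOS), then shifted by one position:
    input = all but the last id, target = all but the first id.
    """
    inputs, targets, pad_masks = [], [], []
    for tokens in token_lists:
        seq = ([BOS] + list(tokens) + [EOS])[: max_len + 1]
        inp, tgt = seq[:-1], seq[1:]
        pad = max_len - len(inp)
        inputs.append(inp + [PAD] * pad)
        # PAD targets would normally be ignored by the loss.
        targets.append(tgt + [PAD] * pad)
        pad_masks.append([False] * len(inp) + [True] * pad)  # True = padding
    return inputs, targets, pad_masks


def causal_mask(n):
    """mask[i][j] is True when position i must not attend to a later position j."""
    return [[j > i for j in range(n)] for i in range(n)]


inputs, targets, pad_masks = make_decoder_batch([[5, 6, 7], [8]], max_len=5)
print(inputs)   # decoder inputs, starting with BOS
print(targets)  # next-token targets, ending with EOS before padding
print(causal_mask(3))
```

During training, cross-attention lets every decoder position look at all image tokens. The causal mask only restricts self-attention over the report tokens.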

Notebook Results

These numbers are copied from the saved notebook output and should be read as exploratory validation checks, not clinical benchmark claims.

Item	Notebook output
Dataset rows	3,818 frontal IU X-Ray rows
Final hybrid split	3,435 train samples / 383 validation samples
Model size	74.36M parameters
Best logged validation loss	1.2631 at step 1926
20-sample baseline evaluation	27.5 predicted words vs 37.9 ground-truth words
Baseline term checks	cardiomegaly 0/2, effusion 8/17, pneumonia 0/2
Improved sampled checks	effusion 24/28, pneumothorax 11/22, pneumonia 1/2, opacity 1/5
Cardiomegaly follow-up check	7/10 cardiac-term detections on sampled cardiomegaly cases

The old notebook used a few presentation figures with optimistic labels. This repo keeps non-patient summary figures for provenance, but the code in src/medvlm/evaluation.py now computes sample-aligned term precision/recall counts.
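"Sample-aligned" means each prediction is compared only with the reference report for the same image. This is a minimal sketch of per-term TP/FP/FN counting under that rule, using whole-word, case-insensitive matching. The function names are hypothetical, and the sketch is not a copy of src/medvlm/evaluation.py. This naive matching also ignores negation, so "no effusion" still counts as a mention.

```python
import re


def term_counts(predictions, references, terms):
    """Per-term TP/FP/FN counts over sample-aligned (prediction, reference) pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be sample-aligned")
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    counts = {t: {"tp": 0, "fp": 0, "fn": 0} for t in terms}
    for pred, ref in zip(predictions, references):
        for term, pattern in patterns.items():
            in_pred = pattern.search(pred) is not None
            in_ref = pattern.search(ref) is not None
            if in_pred and in_ref:
                counts[term]["tp"] += 1
            elif in_pred:
                counts[term]["fp"] += 1
            elif in_ref:
                counts[term]["fn"] += 1
    return counts


def precision_recall(c):
    """Return (precision, recall); None when a denominator is zero."""
    p_den = c["tp"] + c["fp"]
    r_den = c["tp"] + c["fn"]
    return (c["tp"] / p_den if p_den else None,
            c["tp"] / r_den if r_den else None)


preds = ["Mild cardiomegaly. Small effusion.", "Clear lungs.", "Left effusion."]
refs = ["Cardiomegaly with effusion.", "Right pneumothorax.", "No acute disease."]
counts = term_counts(preds, refs, ["cardiomegaly", "effusion", "pneumothorax"])
print(counts["effusion"], precision_recall(counts["effusion"]))
```

Unlike a plain "term appears somewhere in the outputs" tally, this scheme charges a false positive whenever a finding is predicted for an image whose reference does not mention it.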

Repository Layout

.
|-- assets/results/              # Non-patient summary figures from the notebook
|-- docs/                        # Results notes and model card
|-- notebooks/                   # Original exploratory notebook
|-- scripts/                     # Train, evaluate, and Gradio demo entrypoints
|-- src/medvlm/                  # Reusable package code
|-- architecture.md              # End-to-end architecture explanation
`-- tests/                       # Lightweight smoke tests

Setup

Clone the repository:

git clone https://github.com/AKMessi/medvlm.git
cd medvlm

Use Python 3.10 to 3.12. PyTorch wheels are not always published for the newest Python releases, so avoid Python 3.13+ for this project.

python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,app]"

Download the IU X-Ray data from Kaggle and point --data-root to the folder containing indiana_projections.csv, indiana_reports.csv, and images/images_normalized/.

kaggle datasets download -d raddar/chest-xrays-indiana-university -p data/chest-xrays-indiana-university --unzip
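Before training, it can save time to confirm that the folder passed to --data-root has the expected layout. A minimal sketch follows. The helper `missing_data_files` is hypothetical and not part of the package; the required entries are the ones listed above.

```python
import tempfile
from pathlib import Path

# Entries that must sit directly under --data-root.
REQUIRED = (
    "indiana_projections.csv",
    "indiana_reports.csv",
    "images/images_normalized",
)


def missing_data_files(data_root):
    """Return the required entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


# Real use: missing_data_files("data/chest-xrays-indiana-university")
# Demo on a temporary directory that starts empty and is then populated.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_data_files(tmp)
    (Path(tmp) / "indiana_projections.csv").touch()
    (Path(tmp) / "indiana_reports.csv").touch()
    (Path(tmp) / "images" / "images_normalized").mkdir(parents=True)
    after = missing_data_files(tmp)

print(before)
print(after)
```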

Training

Baseline hybrid run:

python scripts/train.py \
  --data-root data/chest-xrays-indiana-university \
  --encoder-type hybrid \
  --epochs 10 \
  --batch-size 4 \
  --grad-accum-steps 4
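Gradient accumulation adds up gradients over several mini-batches before each optimizer update. With the baseline flags, the effective batch size is 4 × 4 = 16. The small arithmetic sketch below applies those flags to the notebook's 3,435 training samples. The helper name and its rounding choices (keep the last partial batch and step on a trailing partial accumulation window) are assumptions, not taken from scripts/train.py.

```python
import math


def schedule(num_train, batch_size, grad_accum_steps, epochs):
    """Effective batch size and optimizer-step counts under the stated rounding."""
    batches_per_epoch = math.ceil(num_train / batch_size)
    optimizer_steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return {
        "effective_batch": batch_size * grad_accum_steps,
        "batches_per_epoch": batches_per_epoch,
        "optimizer_steps_per_epoch": optimizer_steps_per_epoch,
        "total_optimizer_steps": optimizer_steps_per_epoch * epochs,
    }


plan = schedule(3435, batch_size=4, grad_accum_steps=4, epochs=10)
print(plan)
```

Accumulation keeps the per-step memory footprint of a batch of 4 while giving gradient estimates closer to those of a batch of 16.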

Rare-finding focused run:

python scripts/train.py \
  --data-root data/chest-xrays-indiana-university \
  --encoder-type hybrid \
  --loss focal \
  --balanced-sampler \
  --epochs 5
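The two flags in the rare-finding run address class imbalance in different ways. Focal loss scales each token's cross-entropy by (1 - p)^γ, which shrinks the loss on tokens the model already predicts well. A balanced sampler draws rare-finding samples more often. Below is a minimal plain-Python sketch of both ideas. The per-token `weight` argument and the one-label-per-sample scheme are assumptions, because this README does not describe how src/medvlm defines pathology weights or sampler labels. In PyTorch, weights like these would typically be passed to `torch.utils.data.WeightedRandomSampler`.

```python
import math
import random
from collections import Counter


def focal_token_loss(p_target, gamma=2.0, weight=1.0):
    """Focal loss for one target token: -weight * (1 - p)^gamma * log(p).

    p_target is the model's probability for the correct next token;
    weight stands in for a hypothetical per-token pathology weight.
    """
    return -weight * (1.0 - p_target) ** gamma * math.log(p_target)


def inverse_frequency_weights(labels):
    """Per-sample weights so each label is drawn equally often in expectation."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# gamma = 0 recovers plain cross-entropy; gamma = 2 shrinks the easy token's loss 100x.
ce_easy = focal_token_loss(0.9, gamma=0.0)
fl_easy = focal_token_loss(0.9, gamma=2.0)
fl_hard = focal_token_loss(0.2, gamma=2.0)

# 8 normal studies vs 2 with a rare finding: sample both classes about equally.
labels = ["normal"] * 8 + ["effusion"] * 2
weights = inverse_frequency_weights(labels)
draws = random.Random(0).choices(labels, weights=weights, k=10_000)
share = Counter(draws)["effusion"] / len(draws)
print(round(ce_easy, 4), round(fl_easy, 6), round(fl_hard, 4), share)
```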

Checkpoints are written to outputs/ and are ignored by git.

Evaluation

python scripts/evaluate.py \
  --checkpoint outputs/best_model_hybrid.pt \
  --data-root data/chest-xrays-indiana-university \
  --max-samples 50 \
  --output-csv outputs/validation_predictions.csv
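The prediction CSV can be used for quick sanity checks such as the word-count comparison in the Notebook Results table. A minimal sketch follows. The column names `prediction` and `ground_truth` are hypothetical, so check the header that scripts/evaluate.py actually writes. Words are counted by splitting on whitespace.

```python
import csv
import io
from statistics import mean


def mean_word_counts(csv_file, pred_col="prediction", ref_col="ground_truth"):
    """Mean whitespace-token counts for predicted and reference reports."""
    rows = list(csv.DictReader(csv_file))
    pred = mean(len(r[pred_col].split()) for r in rows)
    ref = mean(len(r[ref_col].split()) for r in rows)
    return pred, ref


# Real use: open("outputs/validation_predictions.csv", newline="")
demo = io.StringIO(
    "prediction,ground_truth\n"
    "No acute disease.,Heart size normal. No acute disease.\n"
    "Small left effusion.,Small left pleural effusion. No pneumothorax.\n"
)
pred_words, ref_words = mean_word_counts(demo)
print(pred_words, ref_words)
```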

Demo

python scripts/gradio_app.py --checkpoint outputs/best_model_hybrid.pt

Result Gallery

More extracted figures are listed in docs/GALLERY.md and stored in assets/results/.

License

Dataset files, radiology reports, X-ray images, and patient-derived qualitative figures are not covered by the Apache-2.0 code license. The IU/Open-i chest X-ray data is distributed under CC BY-NC-ND 4.0, so download and use it under its original terms. See THIRD_PARTY_NOTICES.md.

Notes

Model weights are not committed. Use GitHub Releases, Hugging Face Hub, or another artifact store for trained checkpoints.
The original notebook is preserved at notebooks/medvlm_training_original.ipynb with outputs stripped for safe redistribution.
