
Codebase for Pragmatic Radiology Report Generation

We expose the following functionality discussed in the paper:

  • Report cleaning with large language models
  • Training an image classifier for detecting positive findings
  • Finetuning LLaMA on predicted conditions and indications
  • Generating reports with Pragmatic-LLaMA
  • Evaluating generated reports

This code mainly targets MIMIC-CXR [1], but it can be adapted to other chest X-ray datasets with small changes, which we discuss in each section below. Make sure to obtain the required data use license before working with MIMIC-CXR.

Installing virtual environments

Two virtual environments need to be installed: one for the project in general (requirements.txt) and one specifically for evaluation (eval_requirements.txt). A separate evaluation environment is needed because RadGraph introduces version conflicts.
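A minimal setup sketch using conda; the environment names and Python version here are illustrative choices, not requirements of this repo:

conda create -n llm_radiology python=3.9
conda activate llm_radiology
pip install -r requirements.txt

conda create -n llm_radiology_eval python=3.9
conda activate llm_radiology_eval
pip install -r eval_requirements.txt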

Obtaining the relevant data and models

The input to our model is a tuple of (image, indication). Please follow the relevant dataset instructions to obtain both, in particular the indication section. For MIMIC-CXR, you can use create_section_files.py to extract the indication, findings, and impression sections.
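For example (the flags and paths here are illustrative; check the script's --help for its exact interface):

python create_section_files.py --reports_path /path/to/mimic-cxr-reports --output_path /path/to/sections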

Our code uses CheXbert [2] to label reports and RadGraph [3] for evaluation. Please download the two model checkpoints here and put them under "./models/".
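After downloading, the layout should look roughly as follows; the checkpoint names below are placeholders, so match whatever the download actually provides:

models/
├── <CheXbert checkpoint>
└── <RadGraph checkpoint>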

Training an image classifier for detecting positive findings

Please refer to the instructions under /image_model/ for how to train an image classifier to detect positive findings according to the CheXbert labels.

Report cleaning with large language models

deepspeed --num_gpus=<insert> report_cleaning.py --chexbert_path <insert> --dataset_path <insert> --output_dir <insert>

This cleaning method works best on one sentence at a time. The process of splitting reports into sentences and keeping track of their report IDs can be dataset-specific, so we leave that implementation to the user; a sketch of one possible approach follows.
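As a starting point, here is a minimal sentence-splitting sketch that preserves report IDs. The file names, the study_id and report column names, and the naive period-based splitting are all illustrative assumptions:

import pandas as pd

# Hypothetical input: one report per row, with an ID column (assumed names).
df = pd.read_csv("reports.csv")  # columns: study_id, report

rows = []
for _, row in df.iterrows():
    # Naive split on periods; a proper sentence tokenizer (e.g. nltk's
    # sent_tokenize) is more robust for clinical text.
    sentences = (s.strip() for s in str(row["report"]).split("."))
    for i, sentence in enumerate(s for s in sentences if s):
        rows.append({"study_id": row["study_id"],
                     "sentence_id": i,
                     "sentence": sentence})

pd.DataFrame(rows).to_csv("report_sentences.csv", index=False)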

Finetuning LLaMA on predicted conditions and indications

We first format the indication and ground-truth CheXbert labels into a JSON file that can be used to finetune LLaMA.

python format_llama_input.py --indication_path <insert> --impression_path <insert> --outpath <insert>

Follow Alpaca's finetuning instructions to finetune a LLaMA model to generate radiology reports, passing the path to the JSON file above as --data_path.
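Alpaca's --data_path expects a JSON list of instruction/input/output records. A sketch of what one record might look like; the instruction wording, field contents, and label formatting are assumptions for illustration, not the actual output of format_llama_input.py:

[
  {
    "instruction": "Write the findings of a chest X-ray report given the indication and the detected conditions.",
    "input": "Indication: shortness of breath. Positive findings: Edema, Cardiomegaly.",
    "output": "Mild pulmonary edema. The cardiac silhouette is enlarged. ..."
  }
]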

Generating reports with Pragmatic-LLaMA

Provide the path to your finetuned Pragmatic-LLaMA model, the path to the indications, and the path to the directory containing the vision model and tuned classification thresholds, and specify an output path for the predicted vision labels. Caching these labels saves time on subsequent runs over the same images, since the classifier does not have to be re-run.

python pragmatic_llama_inference.py --llama_path <insert> --indication_path <insert> --vision_path <insert> --image_path <insert> --vision_out_path <insert> --outpath <insert>
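A concrete invocation with illustrative paths (all of them hypothetical):

python pragmatic_llama_inference.py \
    --llama_path ./models/pragmatic-llama \
    --indication_path ./data/indications.csv \
    --vision_path ./models/vision \
    --image_path ./data/mimic-cxr/images \
    --vision_out_path ./outputs/vision_labels.csv \
    --outpath ./outputs/generated_reports.csv

On later runs over the same images, pointing --vision_out_path at the same file reuses the saved labels instead of re-running the classifier.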

Evaluating generated reports

Once you have generated reports using Pragmatic-LLaMA or any other model, they can be evaluated with the command below. Note that --out_path should be a CSV file: if it is "filename.csv", the per-report scores are saved there, and a second file "filename_avg.csv" stores the metric scores averaged over all evaluated reports. We gratefully borrow much of the evaluation code from Yu, Endo, and Krishnan et al. [4]

python evaluate.py --gt_path <insert> --gen_path <insert> --out_path <insert>
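Both output files are ordinary CSVs and can be inspected directly. A small sketch; the file names are illustrative, and the metric column names depend on what evaluate.py computes:

import pandas as pd

per_report = pd.read_csv("results/scores.csv")    # one row of metric scores per report
averages = pd.read_csv("results/scores_avg.csv")  # scores averaged over all reports

print(per_report.head())
print(averages)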

[1] Johnson, Alistair, Tom Pollard, Roger Mark, Seth Berkowitz, and Steven Horng. "MIMIC-CXR Database" (version 2.0.0). PhysioNet (2019). https://doi.org/10.13026/C2JT1Q.

[2] Smit, Akshay, Saahil Jain, Pranav Rajpurkar, Anuj Pareek, Andrew Y. Ng, and Matthew P. Lungren. "CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT." arXiv preprint arXiv:2004.09167 (2020).

[3] Jain, Saahil, Ashwin Agrawal, Adriel Saporta, Steven QH Truong, Du Nguyen Duong, Tan Bui, Pierre Chambon et al. "RadGraph: Extracting clinical entities and relations from radiology reports." arXiv preprint arXiv:2106.14463 (2021).

[4] Yu, Feiyang, Mark Endo, Rayan Krishnan, Ian Pan, Andy Tsai, Eduardo Pontes Reis, Eduardo Kaiser Ururahy Nunes Fonseca et al. "Evaluating progress in automatic chest x-ray radiology report generation." Patterns 4, no. 9 (2023).
