Official implementation of DE-CaGI
Causal Gradient Intervention · Debiased Learning · Evidence-Grounded Medical Visual Question Answering

Overview

DE-CaGI is the official implementation of:

Causal Gradient Intervention for Debiased and Evidence-Grounded Medical Visual Question Answering

Medical Visual Question Answering (Med-VQA) integrates medical image understanding with clinical language reasoning to automatically answer natural-language questions grounded on medical images. While recent deep learning methods have achieved strong performance on standard Med-VQA benchmarks, they often struggle to provide query-consistent and verifiable visual evidence. Under scarce evidence-level supervision and biases induced by imbalanced data distributions, models can over-rely on shortcut signals from language priors and visual priors, replacing critical visual evidence with prior-driven cues and consequently distorting both predicted answers and evidence.

DE-CaGI is a causal gradient intervention framework that achieves debiased learning and evidence grounding at the optimization level. DE-CaGI constructs auxiliary branches to characterize shortcut learning, explicitly estimates bias gradients driven by language and visual priors, and suppresses shortcut-related gradient components when updating the backbone representation module. Building on the debiased updates, DE-CaGI further introduces visual evidence gradients induced by multitask evidence supervision and imposes evidence-consistency constraints on the backbone update direction, guiding the model toward representations aligned with annotated evidence while reducing shortcut effects. Experiments on VQA-RAD and SLAKE demonstrate stable improvements on both open-ended and closed-ended questions, competitive overall accuracy, and better qualitative evidence alignment. The overall architecture of the proposed method is depicted in the figure below.

Figure: Overall architecture of DE-CaGI.
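
To make the idea of suppressing shortcut-related gradient components more concrete, here is a minimal sketch. It is not the repository's implementation. It assumes the intervention can be approximated by projecting the main-task gradient away from an estimated bias gradient when the two agree, and then removing any component that opposes an evidence gradient. The names `suppress_bias` and `align_with_evidence` and the projection rule itself are illustrative assumptions; the actual DE-CaGI update rule is defined by the paper and the code in this repository.

```python
# Illustrative sketch only: projection-based gradient suppression is an
# assumption made for this example, not the rule used by DE-CaGI.
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def suppress_bias(g_main: Vector, g_bias: Vector, eps: float = 1e-12) -> Vector:
    """Remove the component of g_main that points along g_bias.

    If the main gradient agrees with the bias gradient (positive inner
    product), the shared component is projected out; otherwise g_main is
    returned unchanged.
    """
    overlap = dot(g_main, g_bias)
    norm_sq = dot(g_bias, g_bias)
    if overlap <= 0.0 or norm_sq < eps:
        return list(g_main)
    scale = overlap / norm_sq
    return [g - scale * b for g, b in zip(g_main, g_bias)]


def align_with_evidence(g: Vector, g_evid: Vector, eps: float = 1e-12) -> Vector:
    """Make g non-conflicting with the evidence gradient g_evid.

    If g opposes g_evid (negative inner product), the conflicting component
    is projected out so the update no longer works against the evidence task.
    """
    overlap = dot(g, g_evid)
    norm_sq = dot(g_evid, g_evid)
    if overlap >= 0.0 or norm_sq < eps:
        return list(g)
    scale = overlap / norm_sq
    return [x - scale * e for x, e in zip(g, g_evid)]


if __name__ == "__main__":
    g_main = [1.0, 1.0]      # answer-loss gradient w.r.t. backbone parameters
    g_language = [1.0, 0.0]  # gradient from a question-only (shortcut) branch
    g_evidence = [0.0, 1.0]  # gradient from an evidence-supervision task

    debiased = suppress_bias(g_main, g_language)
    update = align_with_evidence(debiased, g_evidence)
    print(debiased, update)
```

In this toy case the component of the main gradient shared with the language-prior gradient is removed, and the remaining direction already agrees with the evidence gradient, so the second step leaves it unchanged.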

The source code is publicly available at:

https://github.com/cloneiq/DE-CaGI

Key Features

Optimization-level debiasing: estimate bias gradients driven by language/visual priors and suppress shortcut-related components during backbone updates.
Evidence-grounded updates: introduce visual evidence gradients induced by multitask evidence supervision (e.g., detection / segmentation / other evidence tasks) and constrain update directions for evidence consistency.
Plug-and-play: works with common Med-VQA pipelines (image encoder + text encoder + fusion / decoder).
Benchmarks: validated on VQA-RAD and SLAKE, with the highest open-ended and overall accuracy among the methods compared in Results, competitive closed-ended accuracy, and better qualitative evidence alignment.

Quick Start

Clone the Repository

git clone https://github.com/cloneiq/DE-CaGI.git
cd DE-CaGI

Install Requirements

pip install -r requirements.txt

Prepare Datasets

Prepare the datasets according to the instructions in Data Preparation.

Train and Test

Run training and testing scripts as described in Train & Test.

Project Structure

DE-CaGI/
├── checkpoints/
├── dataset/
│   ├── slake/
│   │   ├── imgs/
│   │   ├── train.json
│   │   ├── valid.json
│   │   └── test.json
│   └── rad/
│       ├── ....
│       └── ....
├── main.py
├── train.py
└── test.py
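
Before training, it can help to confirm that the files listed in the tree above are in place. The helper below is a small sketch that checks only the SLAKE entries shown there (the contents of `rad/` are elided in the tree, so they are not checked). `missing_slake_files` is a hypothetical name and is not part of the repository.

```python
# Hypothetical helper: checks the SLAKE layout shown in the project tree.
from pathlib import Path
from typing import List

# Paths relative to the repository root, copied from the project tree.
EXPECTED_SLAKE = [
    "dataset/slake/imgs",
    "dataset/slake/train.json",
    "dataset/slake/valid.json",
    "dataset/slake/test.json",
]


def missing_slake_files(repo_root: str = ".") -> List[str]:
    """Return the expected SLAKE paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_SLAKE if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_slake_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("SLAKE layout looks complete.")
```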

Data Preparation

Datasets

Please download the following datasets and place the files under the dataset/ directory.

Dataset	Description	Download
SLAKE	An English-Chinese bilingual Med-VQA benchmark containing 642 radiology images, including CT, MRI, and X-ray images, and 14,028 question-answer pairs, plus pixel-level masks and a medical knowledge graph.	SLAKE
VQA-RAD	A clinician-curated dataset built from MedPix, providing 315 radiology images and 3,515 question-answer pairs for visual question answering.	VQA-RAD
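
Because SLAKE is bilingual, a first look at a split often starts with filtering by language and counting question types. The sketch below assumes each split is a JSON list of records with `q_lang` and `answer_type` fields, which is how the public SLAKE release is commonly structured; please verify the field names against your downloaded copy. `summarize_split` is a hypothetical helper, not a function from this repository.

```python
# Sketch under assumptions: the field names "q_lang" and "answer_type" should
# be verified against your downloaded SLAKE files.
import json
import os
from collections import Counter
from typing import Dict


def summarize_split(path: str, lang: str = "en") -> Dict[str, int]:
    """Count questions per answer type for one language in a SLAKE split."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    counts = Counter(
        str(r.get("answer_type", "UNKNOWN")).upper()
        for r in records
        if r.get("q_lang") == lang
    )
    return dict(counts)


if __name__ == "__main__":
    split = "dataset/slake/train.json"
    if os.path.exists(split):
        print(summarize_split(split))
    else:
        print(f"{split} not found; download SLAKE first.")
```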

Train & Test

# Train
python main.py
# Test
python test.py

Results

Results on VQA-RAD and SLAKE

Methods	Venue	VQA-RAD Open	VQA-RAD Closed	VQA-RAD Overall	SLAKE Open	SLAKE Closed	SLAKE Overall
MEVF-BAN	MICCAI’19	40.33	73.90	59.20	75.19	71.49	77.66
MEVF-SAN	MICCAI’19	39.57	72.92	58.09	74.57	77.88	75.87
M3AE	MICCAI’22	63.10	83.31	75.40	79.83	86.30	82.37
PubMedCLIP	EACL’23	60.10	80.00	72.10	78.40	82.50	80.10
VG-CALF	Neurocomputing’25	67.00±0.47	85.50±0.38	76.10±0.96	81.40±0.24	83.80±0.43	83.30±0.13
UnICLAM	MedIA’25	59.80	82.60	73.20	81.10	85.70	83.10
CKRA	TMI’25	67.43±0.98	85.83±0.55	78.84±0.44	81.20±0.23	89.82±0.26	84.37±0.16
DeBCF	MICCAI’23	58.60±1.10	80.90±0.80	71.60±1.00	80.80±0.90	84.90±0.70	82.60±0.90
Tri-VQA	BIBM’24	60.34	82.72	73.84	81.55	85.58	83.13
MedCFVQA	VLM4Bio (at ACL'24)	-	-	56.30	-	-	85.11
CCIS-MVQA	TMI’24	68.78±0.23	79.24±0.16	75.06	80.12±0.11	86.72±0.07	84.08
DE-CaGI	Ours	69.94±0.19	84.93±0.21	78.98±0.08	84.48±0.14	88.72±0.17	86.21±0.11
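
The Open, Closed, and Overall columns are accuracies over open-ended questions, closed-ended questions, and all questions, so Overall is a count-weighted combination of the other two rather than their simple mean. The sketch below shows one way to compute the three numbers. Exact-match scoring after lowercasing, the tuple layout, and the name `split_accuracy` are assumptions made for illustration; the repository's evaluation code may differ.

```python
# Sketch under assumptions: exact-match scoring and the tuple layout below
# are illustrative, not the repository's evaluation code.
from typing import Dict, Iterable, Tuple

# (prediction, ground truth, answer type), with answer type "OPEN" or "CLOSED"
Sample = Tuple[str, str, str]


def split_accuracy(samples: Iterable[Sample]) -> Dict[str, float]:
    """Return open, closed, and overall accuracy in percent."""
    correct = {"OPEN": 0, "CLOSED": 0}
    total = {"OPEN": 0, "CLOSED": 0}
    for pred, truth, answer_type in samples:
        key = answer_type.upper()  # raises KeyError for unexpected types
        total[key] += 1
        if pred.strip().lower() == truth.strip().lower():
            correct[key] += 1

    def pct(hit: int, n: int) -> float:
        return 100.0 * hit / n if n else 0.0

    return {
        "open": pct(correct["OPEN"], total["OPEN"]),
        "closed": pct(correct["CLOSED"], total["CLOSED"]),
        "overall": pct(sum(correct.values()), sum(total.values())),
    }


if __name__ == "__main__":
    demo = [
        ("pneumonia", "Pneumonia", "OPEN"),
        ("left lung", "right lung", "OPEN"),
        ("yes", "yes", "CLOSED"),
        ("no", "no", "CLOSED"),
        ("yes", "no", "CLOSED"),
    ]
    print(split_accuracy(demo))
```

In the demo, one of two open-ended and two of three closed-ended answers match, so the overall accuracy is three out of five, which lies between the two per-type values.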

Future Work

Scale DE-CaGI to larger and more open-ended Med-VQA benchmarks across institutions, modalities, and answer spaces.
Expand weak and pseudo evidence supervision (e.g., self-training, report-derived cues) to reduce reliance on scarce annotations.
Refine the intervention with finer gradient decomposition and adaptive suppression/injection across layers, tokens, and regions.
Strengthen cross-modal evidence consistency with tighter region–text alignment objectives for more verifiable evidence grounding.

Contributing

We welcome pull requests and issues. You can contribute by:

Reporting bugs or reproduction issues.
Improving documentation and usage instructions.
Adding support for additional Med-VQA datasets.
Extending DE-CaGI to new evidence supervision settings.
Benchmarking DE-CaGI with future SOTA Med-VQA methods.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

First Author: Bing Liu, Kunming University of Science and Technology, Kunming, Yunnan, China. Email: LB_violet2023@outlook.com

Corresponding Author: Lijun Liu, Associate Professor (Ph.D.), Kunming University of Science and Technology, Kunming, Yunnan, China. Email: cloneiq@kust.edu.cn

Maintained for debiased, evidence-grounded, and reliable Medical Visual Question Answering research.
