Official code for Concept-Guided Noisy Negative Suppression for Zero-Shot Classification and Grounding of Chest X-Ray Findings. MICCAI 2026 early acceptance (top 9%).
Create the conda environment from the exported file:
conda env create -f environment.yml
conda activate conns
If you prefer to start from an existing environment, install the pip packages with:
pip install -r requirements.txt
The exported environment is named conns. flash-attn is included because the Rad-DINO encoder path uses it.
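Before running anything, it can help to confirm that the active interpreter sees the key packages. The following is a minimal sketch and not part of the repository. It assumes the import names torch and flash_attn (the module installed by the flash-attn package); the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: "torch" and "flash_attn" (installed by flash-attn).
    missing = missing_modules(["torch", "flash_attn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

If flash_attn is reported missing, the conns environment is probably not the one currently active.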
Place model folders and checkpoints under the repository root:
external/rad-dino-maira-2/
external/BiomedVLP-CXR-BERT-specialized/
trained_models/conns.pth
Download the external encoders from Hugging Face. The NLI model is loaded online from cross-encoder/nli-deberta-v3-small.
huggingface-cli download microsoft/rad-dino-maira-2 --local-dir external/rad-dino-maira-2
huggingface-cli download microsoft/BiomedVLP-CXR-BERT-specialized --local-dir external/BiomedVLP-CXR-BERT-specialized
Download the CoNNS checkpoint from Google Drive and save it as trained_models/conns.pth.
Place ChestX-Det10 under data/ChestX-Det10/. Download it from:
https://github.com/Deepwise-AILab/ChestX-Det10-Dataset
Expected ChestX-Det10 files:
data/ChestX-Det10/test.json
data/ChestX-Det10/test_imgs/
For local debugging, data/ChestX-Det10 can be a symlink to an existing raw copy. The released scripts read the dataset only through data/ChestX-Det10/....
Place the other evaluation datasets at the default paths used by the scripts:
data/MS-CXR/MS_CXR_Local_Alignment_v1.1.0.csv
data/MS-CXR/preprocess/test.json
data/raw_dataset/MIMIC-CXR-JPG/files/
data/NIH-CXR/CARZero/test_list.txt
data/NIH-CXR/CARZero/chestxray14_test_text.json
data/NIH-CXR/images/
data/CheXpert/test_labels.csv
data/CheXpert/
data/Open-I/CARZero/custom.csv
data/Open-I/CARZero/openi_multi_label_text.json
data/Open-I/images/images_normalized/
data/PadChest-GR/master_table.csv
data/PadChest-GR/grounded_reports_20240819.json
data/PadChest-GR/PadChest_GR_8bit_short896/
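A quick way to confirm the layout before launching evaluation is to check each default path in the list above. This is a minimal sketch and not a script shipped with the repository. The helper name find_missing is hypothetical, and the sketch treats entries ending in "/" as directories and everything else as files.

```python
from pathlib import Path

# Default paths copied from the list above; entries ending in "/" are directories.
EXPECTED_PATHS = [
    "data/MS-CXR/MS_CXR_Local_Alignment_v1.1.0.csv",
    "data/MS-CXR/preprocess/test.json",
    "data/raw_dataset/MIMIC-CXR-JPG/files/",
    "data/NIH-CXR/CARZero/test_list.txt",
    "data/NIH-CXR/CARZero/chestxray14_test_text.json",
    "data/NIH-CXR/images/",
    "data/CheXpert/test_labels.csv",
    "data/CheXpert/",
    "data/Open-I/CARZero/custom.csv",
    "data/Open-I/CARZero/openi_multi_label_text.json",
    "data/Open-I/images/images_normalized/",
    "data/PadChest-GR/master_table.csv",
    "data/PadChest-GR/grounded_reports_20240819.json",
    "data/PadChest-GR/PadChest_GR_8bit_short896/",
]


def find_missing(root, relative_paths):
    """Return the entries of relative_paths that do not exist under root."""
    root = Path(root)
    missing = []
    for rel in relative_paths:
        path = root / rel
        # Symlinked directories count as present because is_dir follows links.
        present = path.is_dir() if rel.endswith("/") else path.is_file()
        if not present:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    for rel in find_missing(".", EXPECTED_PATHS):
        print("missing:", rel)
```

The sketch only checks that each path exists; it does not validate file contents.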
Refer to CARZero for data preparation of every dataset except PadChest-GR, which is available from BIMCV.
Generate the PadChest-GR evaluation helper files:
python3 data_preparation/evaluation/prepare_padchest_gr.py
To reproduce the reported results, use the provided checkpoint at trained_models/conns.pth and run the evaluation directly as shown below.
Run all released tasks:
bash evaluate.sh
Use PYTHON=/path/to/env/bin/python if the active shell is not already using the conns environment.
Run one task:
bash evaluate.sh classification_chexpert
Download MIMIC-CXR-JPG v2.1.0 from:
https://physionet.org/content/mimic-cxr-jpg/2.1.0/
Keep the official structure under data/raw_dataset/MIMIC-CXR-JPG/, including files/p10 ... files/p19, mimic-cxr-2.0.0-metadata.csv.gz, and mimic-cxr-2.0.0-split.csv.gz.
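To sanity-check the download, one option is to count the rows per split in mimic-cxr-2.0.0-split.csv.gz. The sketch below is illustrative and not part of the repository. It assumes the file has a header row with a column named split; the helper name count_splits is hypothetical.

```python
import csv
import gzip
from collections import Counter


def count_splits(split_csv_gz, column="split"):
    """Count rows per value of `column` in a gzip-compressed CSV file."""
    counts = Counter()
    # Text mode with newline="" lets the csv module handle line endings.
    with gzip.open(split_csv_gz, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return dict(counts)
```

From the repository root, count_splits("data/raw_dataset/MIMIC-CXR-JPG/mimic-cxr-2.0.0-split.csv.gz") returns a dictionary that maps each split name to its row count. A KeyError means the assumed column name does not match the file.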
The repository keeps CoNNS training metadata under:
data/conns_training/concepts.json
data/conns_training/mimic_conns_training.csv
data/conns_training/reports_extract_concepts/
data/conns_training/yes_expressions/
data/conns_training/no_expressions/
Create the MIMIC CoNNS training CSV:
python3 data_preparation/training/create_mimic_conns_training.py
Run entity extraction with an OpenAI-compatible local LLM server:
python3 data_preparation/training/extract_entities.py \
--input-dir data/raw_dataset/MIMIC-CXR-JPG/reports \
--output-dir data/conns_training/reports_extract_concepts \
--base-url http://localhost:8000/v1 \
--model ./Llama-3.3-70B-Instruct-NVFP4 \
--workers 16 \
--skip-existing
Then verify JSON and build expression statistics:
python3 data_preparation/training/verify_extracted_json.py
python3 data_preparation/training/build_expression_stats.py
The extracted JSON must use the current prompt schema with evidential_segment and characteristics; older JSON files that contain only analysis must be regenerated.
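To spot files that still use the older schema, a rough check can search each extracted file for the relevant key names. This is a minimal sketch and does not reproduce verify_extracted_json.py. The exact JSON layout is not specified here, so the sketch searches keys recursively at any depth and assumes the outputs use a .json extension. The helper names collect_keys, classify_extraction, and find_stale_files are hypothetical.

```python
import json
from pathlib import Path

# Keys named by the current prompt schema.
REQUIRED_KEYS = {"evidential_segment", "characteristics"}


def collect_keys(node):
    """Recursively gather every dictionary key found in a parsed JSON value."""
    keys = set()
    if isinstance(node, dict):
        for key, value in node.items():
            keys.add(key)
            keys |= collect_keys(value)
    elif isinstance(node, list):
        for item in node:
            keys |= collect_keys(item)
    return keys


def classify_extraction(data):
    """Return 'current', 'legacy', or 'unknown' for one parsed JSON document."""
    keys = collect_keys(data)
    if REQUIRED_KEYS <= keys:
        return "current"
    if "analysis" in keys:
        return "legacy"
    return "unknown"


def find_stale_files(directory):
    """List (path, status) pairs for JSON files that do not look current."""
    stale = []
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            stale.append((path, "invalid"))
            continue
        status = classify_extraction(data)
        if status != "current":
            stale.append((path, status))
    return stale
```

Files reported as legacy are candidates for regeneration. Files reported as unknown, for example reports with no extracted entities, need manual inspection because the sketch cannot tell which schema produced them.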
Run from the repository root:
bash train.sh
Use PYTHON=/path/to/env/bin/python TORCHRUN=/path/to/env/bin/torchrun if the active shell is not already using the conns environment.
Some code is borrowed from two excellent projects: RadZero and CARZero.
