- [2025.04.04] Our paper was selected as a Highlight paper at CVPR 2025.
- [2025.02.29] Our paper was accepted to CVPR 2025.
- [2025.02.08] We released the official code of VPS, a new interpretation mechanism.
- [2024.09.30] We began investigating the potential of interpretability in object detection.
The interpretation method itself relies only on relatively common packages; the main requirement is PyTorch.
To explain Grounding DINO, first install its dependencies: https://github.com/IDEA-Research/GroundingDINO.
For explaining Florence-2, please install its dependencies: https://huggingface.co/microsoft/Florence-2-large-ft
For explaining traditional detectors, please install MMDetection v3.3: https://github.com/open-mmlab/mmdetection/
In addition, follow datasets/readme.md and ckpt/readme.md to organize the datasets and download the weights of the relevant detectors.
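Before running any explanation script, it can help to confirm which of the optional dependencies above are importable. The following is a minimal sketch, not part of the VPS code: the import names (`torch`, `groundingdino`, `transformers`, `mmdet`) are assumptions based on how those projects are usually imported, and the helper names `missing_modules` and `report` are hypothetical.

```python
from importlib.util import find_spec

# Assumed import names for each explanation target; verify them against the
# linked installation guides, since this mapping is not taken from the VPS repository.
REQUIRED_MODULES = {
    "core": ["torch"],
    "Grounding DINO": ["groundingdino"],
    "Florence-2": ["transformers"],
    "MMDetection": ["mmdet"],
}


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def report(required=REQUIRED_MODULES):
    """Map each target to the list of its modules that are not installed."""
    return {target: missing_modules(mods) for target, mods in required.items()}


# Print one status line per target, for example "core: ok".
for target, missing in report().items():
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{target}: {status}")
```

Only the targets you plan to explain need to report `ok`; for example, MMDetection is needed only for traditional detectors.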
You can try the interpretation method on a single image directly in the Jupyter notebooks:
- Grounding DINO Interpretation (Detection): tutorial
- Florence-2 Interpretation (Detection): tutorial
- Florence-2 Interpretation (Visual Grounding): tutorial
We provide some results of our approach on interpreting object detection models.
Note: The tank picture is from the Internet.
Prepare the datasets by following datasets/readme.md.
Download the benchmark files from https://huggingface.co/datasets/RuoyuChen/VPS_benchmark and put them into ./datasets.
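The download can also be scripted. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper name `download_benchmark` is hypothetical, and whether the files belong directly in ./datasets or in a subfolder should be checked against datasets/readme.md.

```python
from pathlib import Path

REPO_ID = "RuoyuChen/VPS_benchmark"  # dataset repository named in this README
TARGET_DIR = Path("./datasets")      # destination named in this README


def download_benchmark(target_dir=TARGET_DIR):
    """Fetch the benchmark files into `target_dir` and return the local path."""
    # Imported lazily so that this module loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the URL is under huggingface.co/datasets/
        local_dir=str(target_dir),
    )
```

Calling `download_benchmark()` requires network access. Nothing is downloaded until the function is called, so the definitions above are safe to import.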
Run (more instructions are in the folder ./scripts):
./script/groundingdino_coco_correct.sh

Visualization:

python -m visualization.visualize_ours \
--explanation-dir submodular_results/grounding-dino-coco-correctly/slico-1.0-1.0-division-number-100 \
--Datasets datasets/coco/val2017

Evaluation faithfulness:

python -m evals.eval_AUC_faithfulness \
--explanation-dir submodular_results/grounding-dino-coco-correctly/slico-1.0-1.0-division-number-100

Evaluation location:

python -m evals.eval_energy_pg \
--Datasets datasets/coco/val2017 \
--explanation-dir submodular_results/grounding-dino-coco-correctly/slico-1.0-1.0-division-number-100

SMDL-Attribution: SOTA attribution method based on submodular subset selection
Grounding DINO: an open-set object detector.
Florence-2: a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
MMDetection V3.3: an open source object detection toolbox based on PyTorch.
@article{chen2024interpreting,
title={Interpreting Object-level Foundation Models via Visual Precision Search},
author={Chen, Ruoyu and Liang, Siyuan and Li, Jingzhi and Liu, Shiming and Li, Maosen and Huang, Zheng and Zhang, Hua and Cao, Xiaochun},
journal={arXiv preprint arXiv:2411.16198},
year={2024}
}



