[Paper] [Code] [Video] [DeepREAL Lab]
This repository holds the PyTorch implementation of Beyond Accuracy: Ensuring Correct Predictions with Correct Rationales by Tang Li, Mengmeng Ma, and Xi Peng. If you find our paper and code useful in your research, please consider citing:
@inproceedings{li2024beyond,
title={Beyond Accuracy: Ensuring Correct Predictions with Correct Rationales},
author={Li, Tang and Ma, Mengmeng and Peng, Xi},
booktitle={Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS)},
year={2024}
}
Can we trust Large Foundation Models (LFMs) in their predictions? Our findings say NO! There are many examples of unsafe predictions:
To address these issues, we propose Double-Correct Predictions (DCP). Please refer to our paper for method details.
- Fine-tuned on ImageNet: DCP-ViT-B/32
This repository reproduces our results on the ImageNet, CIFAR-10/100, CUB, Caltech101, OxfordPets, Food101, SUN397, and Stanford Cars datasets; please download these datasets as needed. Our code is built on Python 3 and PyTorch v2.0.1 on Ubuntu 18.04. Please install all required packages by running:
pip install -r requirements.txt
Our structured rationales capture the major attributes and their sub-attributes that lead to the recognition of objects. Our dataset offers over 4,000 unique rationales covering all 1,000 categories from ImageNet. The dataset is in .JSON format:
./DCP/Rationale Dataset/rationale_imagenet.json
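A minimal sketch of loading and inspecting the rationale file with the standard `json` module. The nested schema shown here (category → major attribute → sub-attributes) is an illustrative assumption; check the released `rationale_imagenet.json` for the exact structure.

```python
import json

# Hypothetical mini-example mirroring the kind of nesting the rationale
# dataset may use (the real schema may differ): each category maps to
# its major attributes, and each attribute to a list of sub-attributes.
sample = {
    "goldfish": {
        "body": ["orange coloration", "rounded shape"],
        "fins": ["fan-like tail", "transparent edges"],
    }
}
with open("rationale_sample.json", "w") as f:
    json.dump(sample, f, indent=2)

# Load and walk the file, as you would with the released dataset.
with open("rationale_sample.json") as f:
    rationales = json.load(f)

for category, attributes in rationales.items():
    for attr, sub_attrs in attributes.items():
        print(f"{category} -> {attr}: {', '.join(sub_attrs)}")
```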
To curate customized rationale datasets, you will need to add your OpenAI API token and run the following notebook. Note that the notebook showcases our best prompt for this task; you can switch to any category list or modify the prompts as needed.
./DCP/generate_graph.ipynb
OpenAI may update their API library; please modify the code accordingly if needed.
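For orientation, the query loop can be sketched as below. The prompt wording, category list, and model name are illustrative assumptions; the repository's best-performing prompts live in `generate_graph.ipynb`. The live API call is commented out so the sketch runs without a key.

```python
# Illustrative sketch of querying an LLM for structured rationales.
# build_rationale_prompt is a hypothetical helper, not part of the repo.

def build_rationale_prompt(category: str) -> str:
    """Assemble an example prompt asking for attributes and sub-attributes."""
    return (
        f"List the major visual attributes of a {category}, and for each "
        "attribute list its sub-attributes, as a JSON object."
    )

categories = ["goldfish", "tabby cat"]  # replace with your own category list
prompts = [build_rationale_prompt(c) for c in categories]
print(prompts[0])

# With OPENAI_API_KEY set in the environment, each prompt could be sent
# via the official client (commented out to avoid a live API call):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o",  # any available chat model
#     messages=[{"role": "user", "content": prompts[0]}],
# )
# print(response.choices[0].message.content)
```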
Before pretraining, please replace the paths in load.py with your own dataset paths and run:
sh run_cross_recon.sh
Note that we parse the ontology graphs in the rationale dataset into visual concepts in ./DCP/descriptors/my_imagenet.json.
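A sketch of how an ontology graph of rationales can be flattened into plain visual-concept strings, similar in spirit to what the repository stores in ./DCP/descriptors/my_imagenet.json. The input/output formats and the "attribute with sub-attribute" phrasing here are assumptions for illustration.

```python
# Flatten {category: {attribute: [sub-attributes]}} into
# {category: [descriptor strings]} usable as text-side concepts.

def flatten_rationales(graph: dict) -> dict:
    """Turn a nested rationale graph into flat per-category descriptor lists."""
    descriptors = {}
    for category, attributes in graph.items():
        concepts = []
        for attr, sub_attrs in attributes.items():
            for sub in sub_attrs:
                concepts.append(f"{attr} with {sub}")
        descriptors[category] = concepts
    return descriptors

graph = {"goldfish": {"fins": ["fan-like tail"], "body": ["orange coloration"]}}
print(flatten_rationales(graph))
# -> {'goldfish': ['fins with fan-like tail', 'body with orange coloration']}
```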
We provide example code for reproducing zero-shot prediction accuracy and rationale disentanglability:

To evaluate the zero-shot prediction accuracy, please run:
./DCP/evaluation.ipynb
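The core of CLIP-style zero-shot prediction can be illustrated with a toy NumPy sketch: normalize image and per-class text embeddings, then predict the class with the highest cosine similarity. The random features below are stand-ins; the notebook uses the actual model's embeddings.

```python
import numpy as np

# Toy zero-shot classification via cosine similarity. Embeddings are
# random stand-ins for real image/text features.
rng = np.random.default_rng(0)
num_classes, dim = 10, 512
text_feats = rng.normal(size=(num_classes, dim))
image_feat = text_feats[3] + 0.1 * rng.normal(size=dim)  # image near class 3

# L2-normalize so dot products equal cosine similarities.
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)
image_feat /= np.linalg.norm(image_feat)

logits = text_feats @ image_feat
pred = int(np.argmax(logits))
print(pred)  # -> 3, the class whose text embedding is closest
```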
To evaluate rationale disentanglability, please run:
./DCP/disentanglability.ipynb
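As a rough intuition only: one simple proxy for how disentangled a set of rationale embeddings is, is their mean pairwise cosine similarity (lower means the rationales occupy more separated directions). This toy function is NOT the paper's metric; see disentanglability.ipynb for the actual definition.

```python
import numpy as np

def mean_offdiag_cos(feats: np.ndarray) -> float:
    """Mean off-diagonal cosine similarity between row embeddings."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    n = len(feats)
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))

rng = np.random.default_rng(0)
entangled = np.ones((5, 64)) + 0.01 * rng.normal(size=(5, 64))  # nearly identical rows
disentangled = rng.normal(size=(5, 64))                          # random directions

# Nearly identical embeddings score close to 1; separated ones near 0.
print(mean_offdiag_cos(entangled) > mean_offdiag_cos(disentangled))  # -> True
```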
Part of our code is borrowed from the following repositories.
- Visual Classification via Description from Large Language Models
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
- Interpreting CLIP's Image Representation via Text-Based Decomposition
We thank the authors for releasing their code. Please also consider citing their works.
