C-VQA: Counterfactual Reasoning VQA Dataset

This repository contains the code and data for the paper "What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models".

Dataset

The dataset lives in the C-VQA directory; the questions are stored in .csv files.
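
To take a quick look at the questions, the following minimal sketch loads each .csv file with pandas and prints its shape and column names. It assumes only that the files sit directly under the C-VQA directory and makes no assumptions about the column schema.

import glob

import pandas as pd

# Each .csv file under C-VQA holds a set of questions.
for path in glob.glob("C-VQA/*.csv"):
    df = pd.read_csv(path)
    print(path, df.shape)           # number of questions and fields
    print(df.columns.tolist())      # inspect the actual column names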

Download Images

After cloning the repository, run:

pip install gdown
bash download_images.sh

Scripts

The scripts directory contains all scripts needed to run the models evaluated in the paper.

Before running a script, install the corresponding model and download its weights, then place the script in the model's root directory.

Change PATH_TO_IMAGES in the scripts to the actual image directory.

For the ViperGPT scripts that use different code generators, also change PATH_TO_MODEL to the actual model directory.

For example, to run BLIP-2 on C-VQA, run this command in the root directory of LAVIS:

python run_eval_lavis.py --model-name blip2_t5 --model-type pretrain_flant5xxl --query PATH_TO_CSV_FILE

You can find more commands in scripts/README.

After you get the results, run format_response.py to convert raw responses into formatted responses (a single number or a single yes/no). Then run calc_acc.py to compute accuracy over the formatted responses. Remember to fill in the file names in both scripts.
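
For reference, here is a rough sketch of the kind of post-processing these two steps perform. It is not the repository's implementation: the file handling is omitted, and the answer-extraction rules shown are assumptions for illustration.

import re

def format_response(raw: str) -> str:
    # Reduce a raw model response to a single yes/no or a single number.
    text = raw.strip().lower()
    if re.search(r"\byes\b", text):
        return "yes"
    if re.search(r"\bno\b", text):
        return "no"
    match = re.search(r"-?\d+(?:\.\d+)?", text)  # first number in the text
    return match.group(0) if match else ""

def accuracy(predictions, ground_truth):
    # Fraction of formatted responses that exactly match the ground truth.
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)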

Download Code Generator Models

Change YOUR_HUGGINGFACE_TOKEN in download_model.py to your Hugging Face token, then run:

pip install huggingface_hub
python download_model.py

You can add more code generators by appending entries to repo_ids and local_dirs in download_model.py.
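
The sketch below shows how such an entry-driven downloader can be written with huggingface_hub; the repo id and local directory are examples, and the actual download_model.py may differ in detail.

from huggingface_hub import snapshot_download

# Illustrative entries; extend these parallel lists to add more code generators.
repo_ids = ["codellama/CodeLlama-7b-hf"]
local_dirs = ["./models/CodeLlama-7b-hf"]

token = "YOUR_HUGGINGFACE_TOKEN"  # replace with your Hugging Face token

for repo_id, local_dir in zip(repo_ids, local_dirs):
    # Download every file of the model repository into the local directory.
    snapshot_download(repo_id=repo_id, local_dir=local_dir, token=token)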

Citation

If this code is useful for your research, please consider citing our work.

@InProceedings{zhang2023cvqa,
    author    = {Zhang, Letian and Zhai, Xiaotong and Zhao, Zhongkai and Wen, Xin and Zhao, Bingchen},
    title     = {What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    year      = {2023}
}