By Leonard Salewski, A. Sophia Koepke, Hendrik Lensch and Zeynep Akata. Published in the Springer LNAI volume xxAI - Beyond explainable Artificial Intelligence and also presented at the CVPR 2022 Workshop on Explainable AI for Computer Vision (XAI4CV). A preprint is available on arXiv.
This repository is the official implementation of CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations. It contains code to generate the CLEVR-X dataset and a PyTorch dataset implementation.
Below is an example from the CLEVR dataset extended with CLEVR-X's natural language explanation:
Question: There is a purple metallic ball; what number of cyan objects are right of it?
Answer: 1
Explanation: There is a cyan cylinder which is on the right side of the purple metallic ball.
This repository contains instructions for downloading the CLEVR-X dataset, generating it from scratch, and recreating the train/validation split.
The generated CLEVR-X dataset is available here: CLEVR-X dataset (~1.21 GB).
The download includes two JSON files, which contain the explanations for all CLEVR train and CLEVR validation questions (`CLEVR_train_explanations_v0.7.10.json` and `CLEVR_val_explanations_v0.7.10.json`, respectively).
The general layout of the JSON files follows the original CLEVR JSON files. The `info` key contains general information, whereas the `questions` key contains the dataset itself. The latter is a list of dictionaries, where each dictionary is one sample of the CLEVR-X dataset.
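To make this layout concrete, here is a minimal Python sketch for loading and inspecting the file. The path is a placeholder, and only the `info`, `questions`, and `image_index` fields mentioned in this README are assumed:

```python
import json

# Load the training explanations file (adjust the path to where you stored it).
with open("CLEVR_train_explanations_v0.7.10.json") as f:
    data = json.load(f)

print(data["info"])          # general information about the dataset
samples = data["questions"]  # list of dicts, one dict per CLEVR-X sample
print(len(samples))
print(samples[0].keys())     # fields of a single sample, e.g. "image_index"
```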
Furthermore, we provide two python pickle files at the same link. These contain lists of the image indices of the CLEVR-X train and CLEVR-X validation subsets (which are both part of the CLEVR train subset).
Note that we do not provide the images of the CLEVR dataset; they can be downloaded from the original CLEVR project page.
As stated above, the two python pickle files (`train_images_ids_v0.7.10-recut.pkl` and `dev_images_ids_v0.7.10-recut.pkl`) contain the image indices of all CLEVR-X train explanations and all CLEVR-X validation explanations.
To obtain the train samples, iterate through the samples in `CLEVR_train_explanations_v0.7.10.json` and use those samples whose `image_index` is in the list contained in `train_images_ids_v0.7.10-recut.pkl`.
To obtain the validation samples, iterate through the samples in `CLEVR_train_explanations_v0.7.10.json` and use those samples whose `image_index` is in the list contained in `dev_images_ids_v0.7.10-recut.pkl`.
All samples from the CLEVR validation subset (`CLEVR_val_explanations_v0.7.10.json`) are used for the CLEVR-X test subset.
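Putting this together, the following Python sketch recreates all three subsets. It assumes the JSON and pickle files are in the current working directory and relies only on the `questions` and `image_index` fields described above:

```python
import json
import pickle

def load_subset(explanations_file, image_ids_file):
    """Return all samples whose image_index appears in the pickle file."""
    with open(explanations_file) as f:
        samples = json.load(f)["questions"]
    with open(image_ids_file, "rb") as f:
        image_ids = set(pickle.load(f))  # set for fast membership tests
    return [s for s in samples if s["image_index"] in image_ids]

# CLEVR-X train and validation are both carved out of the CLEVR train questions.
train = load_subset("CLEVR_train_explanations_v0.7.10.json",
                    "train_images_ids_v0.7.10-recut.pkl")
dev = load_subset("CLEVR_train_explanations_v0.7.10.json",
                  "dev_images_ids_v0.7.10-recut.pkl")

# All CLEVR validation samples form the CLEVR-X test subset.
with open("CLEVR_val_explanations_v0.7.10.json") as f:
    test = json.load(f)["questions"]
```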
The following sections explain how to generate the CLEVR-X dataset.
The required libraries for generating the CLEVR-X dataset can be found in the environment.yaml file. To create an environment and install the requirements, use conda:
conda env create --file environment.yaml
Activate it with:
conda activate clevr_explanations
As CLEVR-X uses the same questions and images as CLEVR, it is necessary to download the CLEVR dataset. Follow the instructions on the CLEVR dataset website to download the original dataset (images, scene graphs and questions & answers).
The extracted files should be located in a folder called `CLEVR_v1.0`, which is referred to as `$CLEVR_ROOT` below.
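For example, assuming a bash-like shell (the path is a placeholder for wherever you extracted the dataset):
export CLEVR_ROOT=/path/to/CLEVR_v1.0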
For further instructions and information about the original CLEVR code, it may also be helpful to refer to the CLEVR GitHub repository.
First, change into the `question_generation` directory:
cd question_generation
To generate explanations for the CLEVR training subset run this command:
python generate_explanations.py \
--input_scene_file $CLEVR_ROOT/scenes/CLEVR_train_scenes.json \
--input_questions_file $CLEVR_ROOT/questions/CLEVR_train_questions.json \
--output_explanations_file $CLEVR_ROOT/questions/CLEVR_train_explanations_v0.7.13.json \
--seed "43" \
--metadata_file ./metadata.json
This generation takes about 6 hours on an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz.
Note that setting the `--log_to_dataframe` flag to `true` may increase the generation time significantly, but it allows dumping (parts of) the dataset as an HTML table.
First, change into the `question_generation` directory:
cd question_generation
To generate explanations for the CLEVR validation subset run this command:
python generate_explanations.py \
--input_scene_file $CLEVR_ROOT/scenes/CLEVR_val_scenes.json \
--input_questions_file $CLEVR_ROOT/questions/CLEVR_val_questions.json \
--output_explanations_file $CLEVR_ROOT/questions/CLEVR_val_explanations_v0.7.13.json \
--seed "43" \
--metadata_file ./metadata.json
This generation takes less than 1 hour on an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz.
Note that setting the `--log_to_dataframe` flag to `true` may increase the generation time significantly, but it allows dumping (parts of) the dataset as an HTML table.
Both commands use the `--input_scene_file`, `--input_questions_file`, and `--metadata_file` provided by the original CLEVR dataset. You can use any name for the `--output_explanations_file` argument, but the dataloader expects it in the format `CLEVR_<split>_explanations_<version>.json`.
Note that the original CLEVR test set does not have publicly accessible scene graphs and functional programs. Thus, we use the CLEVR validation set as the CLEVR-X test subset. The following code generates a new split of the CLEVR training set into the CLEVR-X training and validation subsets:
cd question_generation
python dev_split.py --root $CLEVR_ROOT
As each image comes with ten questions, the split is performed across images instead of individual dataset samples. The code stores the image indices of each split in two separate python pickle files (named `train_images_ids_v0.7.10-recut.pkl` and `dev_images_ids_v0.7.10-recut.pkl`). We have published our files alongside the dataset download and recommend using those indices.
Different baselines and VQA-X models achieve the following performance on CLEVR-X:
| Model name | Accuracy | BLEU | METEOR | ROUGE-L | CIDEr |
|---|---|---|---|---|---|
| Random Words | 3.6% | 0.0 | 8.4 | 11.4 | 5.9 |
| Random Explanations | 3.6% | 10.9 | 16.6 | 35.3 | 30.4 |
| PJ-X | 80.3% | 78.8 | 52.5 | 85.8 | 566.8 |
| FM | 63.0% | 87.4 | 58.9 | 93.4 | 639.8 |
For more information on the baselines and models, check the respective publications and our CLEVR-X publication itself.
For information on the license, please look at the `LICENSE` file.
If you use CLEVR-X in any of your works, please use the following BibTeX entry to cite it:
@inproceedings{salewski2022clevrx,
  title     = {CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations},
  author    = {Leonard Salewski and A. Sophia Koepke and Hendrik P. A. Lensch and Zeynep Akata},
  booktitle = {xxAI - Beyond explainable Artificial Intelligence},
  pages     = {85--104},
  year      = {2022},
  publisher = {Springer}
}
You can also find our work on Google Scholar and Semantic Scholar.