We show some results on synthetic data from CLEVR. These results use a significantly smaller model that can be trained on a single GPU, allowing for easier reproducibility and idea validation.
| Overall Accuracy | Count | Exist | Compare Number | Query Attribute | Compare Attribute | Url | Size |
|---|---|---|---|---|---|---|---|
| 99.7 | 99.3 | 99.9 | 99.4 | 99.9 | 99.9 | model | 426MB |
The training is in two stages: first we pre-train the detection task alone on a subset of CLEVR that we call CLEVR-Medium, which contains only questions that reference each object in the scene at most once. We then fine-tune on the full dataset, with the additional question-answering loss.
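To make the schedule concrete, here is a minimal sketch of how such a two-stage objective composes. The names below (`pred_boxes`, `pred_answer`, `qa_coef`) are illustrative assumptions, not identifiers from this repository, and the real model uses full DETR-style matching losses rather than a plain L1 term:

```python
import torch.nn.functional as F

# Illustrative two-stage loss. All names here are hypothetical,
# not this repository's actual identifiers.
def training_loss(outputs, targets, finetune=False, qa_coef=1.0):
    # Stage 1 (pre-training on CLEVR-Medium): detection losses only.
    loss = F.l1_loss(outputs["pred_boxes"], targets["boxes"])
    if finetune:
        # Stage 2 (fine-tuning on full CLEVR): add the question-answering loss.
        loss = loss + qa_coef * F.cross_entropy(outputs["pred_answer"], targets["answer"])
    return loss
```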
The config for this dataset can be found in configs/clevr.json and is also shown below:
```json
{
    "combine_datasets": ["clevr"],
    "combine_datasets_val": ["clevr"],
    "clevr_img_path": "",
    "clevr_ann_path": "/path/to/clevr/clevr_annotations/full",
    "split_qa_heads": 1,
    "do_qa": 1,
    "clevr_variant": "normal"
}
```
- Download the original CLEVR images from the CLEVR webpage and update `clevr_img_path` to point to the folder containing the images.
- Download our pre-processed annotations, converted to COCO format (Pre-processed annotations), and update `clevr_ann_path` to point to the folder with the pre-processed annotations. The scripts to reproduce these annotations are located in scripts/clevr; a quick sanity check of the download is sketched below.
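Since the annotations follow the standard COCO layout, you can verify the download with a few lines of Python. The file name `train.json` is an assumption; adapt it to the files actually contained in the folder:

```python
import json

# Optional sanity check of the pre-processed annotations (COCO format).
# The file name "train.json" is an assumption; adjust to the downloaded files.
with open("/path/to/clevr/clevr_annotations/full/train.json") as f:
    coco = json.load(f)

# COCO-format files expose lists of images, box annotations, and categories.
print(len(coco["images"]), len(coco["annotations"]), len(coco["categories"]))
```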
You can run an evaluation of our pre-trained model on the val set as follows:
```bash
python main.py --batch_size 64 --dataset_config configs/clevr.json --num_queries 25 --text_encoder_type distilroberta-base --backbone resnet18 --resume https://zenodo.org/record/4721981/files/clevr_checkpoint.pth --eval
```
Alternatively, you can dump the model's predictions on the test (or val) set. For that, you'll need the questions from the CLEVR webpage.
```bash
python scripts/eval_clevr.py --batch_size 64 --dataset_config configs/clevr.json --resume https://zenodo.org/record/4721981/files/clevr_checkpoint.pth --clevr_eval_path /path/to/CLEVR_v1.0/questions/ --split test
```
To train from scratch, the first step is the detection pre-training on CLEVR-Medium. The config for this step can be found in configs/clevr_pretrain.json; adjust `clevr_img_path` and `clevr_ann_path` in it according to your own paths.
The training command for this step is the following (change the output dir if needed):
```bash
mkdir step1
python main.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir step1 --epochs 30 --lr_drop 20
```
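If the run is interrupted, DETR-derived training scripts typically accept a local checkpoint through `--resume`; assuming this codebase follows the same convention (an assumption, since only URL checkpoints are shown above), you can continue from the last saved state:

```bash
# Assumption: --resume also accepts a local checkpoint path, as in DETR-derived codebases.
python main.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir step1 --epochs 30 --lr_drop 20 --resume step1/checkpoint.pth
```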
For the second step (fine-tuning on the full dataset with the question-answering loss), first adjust `clevr_ann_path` and `clevr_img_path` in configs/clevr.json as in step 1.
The training command for this step is the following (change the output dir if needed):
```bash
mkdir step2
python main.py --dataset_config configs/clevr.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir step2 --load step1/checkpoint.pth --epochs 30 --lr_drop 20
```
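Once fine-tuning is done, you can evaluate your own checkpoint with the same evaluation command as above, pointing `--resume` at the local file (assuming, as before, that `--resume` accepts local paths):

```bash
python main.py --batch_size 64 --dataset_config configs/clevr.json --num_queries 25 --text_encoder_type distilroberta-base --backbone resnet18 --resume step2/checkpoint.pth --eval
```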
| Before Fine-tuning | After Fine-tuning | Url | Size |
|---|---|---|---|
| 59.9 | 81.7 | model | 426MB |
The CLEVR-Humans dataset tests the generalization capabilities of the model to free-form, human-generated questions. We evaluate the performance of the model trained on regular CLEVR, both before and after fine-tuning on CLEVR-Humans.
The config for this dataset can be found in configs/clevr_humans.json and is also shown below:
```json
{
    "combine_datasets": ["clevr_question"],
    "combine_datasets_val": ["clevr_question"],
    "clevr_img_path": "",
    "clevr_ann_path": "/path/to/CLEVR-Humans/",
    "split_qa_heads": 1,
    "clevr_variant": "humans",
    "no_detection": 1,
    "do_qa": 1
}
```
The images are the same as in regular CLEVR, but you need to download the annotations from the CLEVR-Humans webpage. Edit `clevr_ann_path` accordingly.
You can run an evaluation of our pre-trained model on the val set as follows:
```bash
python main.py --batch_size 64 --dataset_config configs/clevr_humans.json --num_queries 25 --text_encoder_type distilroberta-base --backbone resnet18 --resume https://zenodo.org/record/4721981/files/clevr_humans_checkpoint.pth --eval
```
Alternatively, you can dump the model's predictions on the test set:
```bash
python scripts/eval_clevr.py --batch_size 64 --dataset_config configs/clevr_humans.json --resume https://zenodo.org/record/4721981/files/clevr_humans_checkpoint.pth --clevr_eval_path /path/to/CLEVR-Humans/ --split test
```
Assuming your CLEVR model is in step2/checkpoint.pth, you can fine-tune on CLEVR-Humans as follows:
```bash
mkdir humans
python main.py --dataset_config configs/clevr_humans.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir humans --load step2/checkpoint.pth --epochs 60 --lr_drop 40
```
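To check the result, the evaluation command from above works with your local checkpoint (again assuming `--resume` accepts a local path):

```bash
python main.py --batch_size 64 --dataset_config configs/clevr_humans.json --num_queries 25 --text_encoder_type distilroberta-base --backbone resnet18 --resume humans/checkpoint.pth --eval
```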
| TestA | TestB | Url | Size |
|---|---|---|---|
| 99.8 | 76.7 | model | 426MB |
The CLEVR-CoGenT dataset tests the compositional generalization capabilities of the model.
The config for this dataset can be found in configs/clevr_cogent.json and is also shown below:
```json
{
    "combine_datasets": ["clevr"],
    "combine_datasets_val": ["clevr"],
    "clevr_img_path": "",
    "clevr_ann_path": "/path/to/clevr/clevr_annotations/cogent_full",
    "split_qa_heads": 1,
    "do_qa": 1,
    "clevr_variant": "cogent"
}
```
Download the original CLEVR-CoGenT images from the CLEVR webpage and update `clevr_img_path` to point to the folder containing the images.
You can run an evaluation of our pre-trained model on the valA set as follows:
```bash
python main.py --batch_size 64 --dataset_config configs/clevr_cogent.json --num_queries 25 --text_encoder_type distilroberta-base --backbone resnet18 --resume https://zenodo.org/record/4721981/files/clevr_cogent_checkpoint.pth --eval
```
Alternatively, you can dump the model's predictions on the test sets. For that, you'll need the questions from the CLEVR webpage.
```bash
python scripts/eval_clevr.py --batch_size 64 --dataset_config configs/clevr_cogent.json --resume https://zenodo.org/record/4721981/files/clevr_cogent_checkpoint.pth --clevr_eval_path /path/to/CLEVR_CoGenT_v1.0/questions/ --split testA
```
Replace `testA` with `testB` to get the predictions on the other test set.
The training is similar to regular CLEVR. Follow the instructions there, using configs/clevr_cogent_pretrain.json for step 1 and configs/clevr_cogent.json for step 2, as sketched below.
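Concretely, the two steps would look as follows. This is a sketch obtained by swapping the CoGenT configs into the commands from the CLEVR section; the output directory names are arbitrary:

```bash
mkdir cogent_step1
python main.py --dataset_config configs/clevr_cogent_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir cogent_step1 --epochs 30 --lr_drop 20

mkdir cogent_step2
python main.py --dataset_config configs/clevr_cogent.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir cogent_step2 --load cogent_step1/checkpoint.pth --epochs 30 --lr_drop 20
```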