Separating Skills and Concepts for Novel VQA

This repository contains the PyTorch code for the CVPR 2021 paper: Separating Skills and Concepts for Novel Visual Question Answering.

Overview

[Overview figure of the approach]

Citation

If you find this repository useful in your research, please consider citing:

@inproceedings{whitehead2021skillconcept,
  author = {Whitehead, Spencer and Wu, Hui and Ji, Heng and Feris, Rogerio and Saenko, Kate},
  title = {Separating Skills and Concepts for Novel Visual Question Answering},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = {5632--5641},
  year = {2021}
}

Setup

Requirements

  • Python >= 3.6
  • PyTorch >= 1.6.0 with CUDA
  • SpaCy >= 2.3.2, with the en_vectors_web_lg model downloaded/installed to obtain the GloVe vectors
  • PyYAML
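
Assuming PyTorch with CUDA is installed per the official instructions for your system, the remaining dependencies can be installed roughly as follows (the spaCy < 3 pin is our assumption, since en_vectors_web_lg is a spaCy 2.x model):

python -m pip install "spacy>=2.3.2,<3" pyyaml
python -m spacy download en_vectors_web_lg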

Data Download and Organization

To set up the visual features, question files, and annotation files, please refer to the 'Setup' portion of the MCAN repository (under Prerequisites). Follow that procedure exactly until your datasets directory has the structure shown in their repository.

Concept and Reference Set Preprocessing

Scripts for running the concept discovery and reference set preprocessing yourself will be added to this repository. For the time being, we provide preprocessed files that contain the concepts, skill labels (if applicable), and reference sets for each question.

Decompress the zip file and place the JSON files in the datasets/vqa directory:

|-- datasets
    |-- coco_extract
    |   |-- ...
    |-- vqa
    |   |-- train2014_scr_questions.json
    |   |-- train2014_scr_annotations.json
    |   |-- val2014_sc_questions.json
    |   |-- val2014_sc_annotations.json
    |   |-- ...
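
For example, assuming the downloaded archive is named scr_data.zip (a hypothetical name) and contains the JSON files at its top level:

unzip scr_data.zip
mv train2014_scr_questions.json train2014_scr_annotations.json \
   val2014_sc_questions.json val2014_sc_annotations.json datasets/vqa/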

Training

The base command to run training is:

python run.py --RUN train ...

Some pertinent arguments to add are listed below; a full example command follows the list:

  • --VERSION: the model/experiment name that will be used to save the outputs.

  • --MODEL={'small', 'large'}: whether to use the small or large transformer model (see cfgs/small_model.yml or cfgs/large_model.yml for details). The paper uses small, which is the default.

  • --USE_GROUNDING=True: whether to utilize our concept grounding loss. Default is True.

  • --CONCEPT: specifies which concepts should not have any labeled data appear in training (e.g., --CONCEPT vehicle,car,bus,train). When used in combination with --SKILL, the composition of the specified skill and concept(s) has its labeled data removed from training. For example, with --SKILL count and --CONCEPT car,bus,train, the labels for compositions of counting with car, bus, and train are not used.

  • --SKILL_CONT_LOSS=True: whether to utilize our skill matching loss. Note: USE_GROUNDING must be True in order to use the skill matching loss in the current implementation.

  • --SKILL: specifies the skill in the skill-concept composition(s) that should not have any labeled data appear in training (e.g., --SKILL count).
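
Putting these together, a training run that holds out the composition of counting with car, bus, and train, with both losses enabled, might look like the following (the VERSION name here is arbitrary):

python run.py --RUN train --VERSION count_vehicles --MODEL=small \
    --USE_GROUNDING=True --SKILL_CONT_LOSS=True \
    --SKILL count --CONCEPT car,bus,train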

During training, the latest model checkpoints are saved to ckpts/ckpt_<VERSION>/last_epoch.pkl and the training logs are saved to results/log/log_run_<VERSION>.txt. Validation predictions after every epoch are saved in the results/cache/ directory. Additionally, accuracies on novel compositions (or novel concepts) are evaluated after each epoch.

Evaluating Novel Compositions/Concepts

Although performance on novel compositions/concepts is evaluated after every epoch, it can also be evaluated separately.

Given a file containing the model predictions on the val2014 data (in the VQA v2 evaluation format), run the following to get results on the novel compositions/concepts:

python run.py --RUN valNovel --RESULT_EVAL_FILE <PREDICTION_FILE> --CONCEPT <LIST_OF_CONCEPTS> --SKILL <SKILL>

where --CONCEPT and --SKILL should match the compositions/concepts held out during training (i.e., the exact same arguments). If both --CONCEPT and --SKILL are supplied, then that novel skill-concept composition is evaluated. If only --CONCEPT is supplied, then that novel concept is evaluated.
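
For example, to evaluate the held-out counting compositions from the training run sketched above (the prediction file path is illustrative; use the path of the file your run actually produced):

python run.py --RUN valNovel --RESULT_EVAL_FILE results/cache/result_run_count_vehicles.json \
    --CONCEPT car,bus,train --SKILL count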

To obtain a file with model predictions, run:

python run.py --RUN val --CKPT_PATH <PATH_TO_MODEL_CKPT>
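
For example, using the last checkpoint written during training (see the Training section for the checkpoint path; the VERSION name continues the example above):

python run.py --RUN val --CKPT_PATH ckpts/ckpt_count_vehicles/last_epoch.pkl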

Acknowledgements

This repository is adapted from the MCAN repository. We thank the authors for providing their code.

License

This repository contains code under two licenses: MIT (see LICENSE) and Apache-2.0 (see MCAN_LICENSE).
