This is the implementation of the approaches described in the paper:
Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks and Aida Nematzadeh. Measuring Progress in Fine-grained Vision-and-Language Understanding. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Jul 2023.
We provide the code for reproducing our results on open source models.
models/ contains the source code for our evaluated models.
We added the SVO.py, VALSE.py, VSR.py and Winoground.py scripts to each model's source code to evaluate them on our set of fine-grained tasks.
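For Winoground, evaluation follows the text, image and group scores defined by the original Winoground benchmark. The sketch below illustrates what a Winoground.py-style script computes; it is not the repo's exact code. Here s_cX_iY denotes the model's matching score for caption X paired with image Y, and the function names are hypothetical.

# Illustrative sketch of the Winoground metrics; names are hypothetical.
def text_correct(s_c0_i0, s_c0_i1, s_c1_i0, s_c1_i1):
    # Each image must score higher with its own caption than with the other one.
    return s_c0_i0 > s_c1_i0 and s_c1_i1 > s_c0_i1

def image_correct(s_c0_i0, s_c0_i1, s_c1_i0, s_c1_i1):
    # Each caption must score higher with its own image than with the other one.
    return s_c0_i0 > s_c0_i1 and s_c1_i1 > s_c1_i0

def group_correct(*scores):
    # Group credit requires both the text and image conditions to hold.
    return text_correct(*scores) and image_correct(*scores)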
We provide all of our evaluation scripts in experiments/.
For ALBEF, BLIP, PEVL and X-VLM, task configuration files are stored in configs_ours/ folders.
If you do not have the Winoground data, run the following first:
from datasets import load_dataset
auth_token = "" # FIXME: Set your HuggingFace authentication token.
test_dset = load_dataset('facebook/winoground', use_auth_token=auth_token)['test']
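If your setup reads images from disk, you can additionally cache the Winoground images locally. The snippet below is a minimal sketch under assumptions: the output directory is hypothetical, and the image_0/image_1/id field names follow the HuggingFace dataset, which returns images as PIL objects.

import os

out_dir = "data/winoground/images"  # hypothetical path; adjust to your BASE_DIR
os.makedirs(out_dir, exist_ok=True)
for example in test_dset:
    for key in ("image_0", "image_1"):
        # Save each PIL image under a name derived from the example id.
        example[key].convert("RGB").save(os.path.join(out_dir, f"{example['id']}_{key}.png"))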
The following shows the structure of our working directory.
Ours is set to BASE_DIR="/workdir" in the scripts shared in experiments/. Update it according to your setup.
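As a minimal sketch of how paths resolve under this layout (variable names here are illustrative, assuming the directory tree below):

from pathlib import Path

BASE_DIR = Path("/workdir")  # update to match your setup
vsr_test_file = BASE_DIR / "data" / "vsr" / "annotations" / "test.jsonl"
albef_checkpoint = BASE_DIR / "checkpoints" / "ALBEF" / "ALBEF.pth"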
checkpoints/
| ALBEF/
| | ALBEF_4M.pth
| | ALBEF.pth
| BLIP/
| | model_base.pth
| | model_base_14M.pth
| | model_base_capfilt_large.pth
| | model_large.pth
| ClipCap/
| | clipcap_cc_weights.pt
| | clipcap_coco_weights.pt
| PEVL/
| | grounding.pth
| | pevl_pretrain.pth
| | vrd.pth
| X-VLM/
| | 16m_base_model_state_step_199999.th
| | 4m_base_model_state_step_199999.th
| backbones/
| | huggingface/bert-base-uncased/
| | | config.json
| | | pytorch_model.bin
| | | tokenizer_config.json
| | | vocab.txt
| | hub/
| | | swin_base_patch4_window7_224_22k.pth
data/
| svo_probes/
| | annotations/test.jsonl
| | images/
| VALSE/data/
| | images/
| | actant-swap.json
| | ...
| vsr/
| | annotations/
| | | dev.jsonl
| | | test.jsonl
| | images/
envs/
| albef/
| blip/
| lavis/
| x-vlm/
fine-grained-evals/
For reference, data/ provides the text files used in our evaluation.
Note that the VSR dataset has been updated since we ran our experiments.
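Each line of the VSR jsonl files is one example. A minimal sketch for inspecting them, assuming the caption and label field names of the VSR release we evaluated on:

import json

with open("data/vsr/annotations/test.jsonl") as f:
    vsr_examples = [json.loads(line) for line in f]

# Each example pairs an image with a spatial caption and a binary
# true/false label for whether the caption describes the image.
print(vsr_examples[0]["caption"], vsr_examples[0]["label"])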
envs/ contains venv environments based on each model's requirements.txt file. NB: you might need to pip install datasets.
This work is licensed under the MIT license. See LICENSE for details.
Third-party software and data sets are subject to their respective licenses.
If you find our code/data/models or ideas useful in your research, please consider citing the paper:
@inproceedings{bugliarello-etal-2023-measuring,
title = "Measuring Progress in Fine-grained Vision-and-Language Understanding",
author = "Bugliarello, Emanuele and
Sartran, Laurent and
Agrawal, Aishwarya and
Hendricks, Lisa Anne and
Nematzadeh, Aida",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2305.07558",
}