Evaluation Code of Img2Prompt

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

This repository contains the public evaluation code for the Img2Prompt-VQA paper.

Demo

We include an interactive demo Colab notebook to show the Img2Prompt-VQA inference workflow (a minimal sketch of the pipeline follows this list):

  1. Image-question matching: compute the relevancy score of the image patches with respect to the question, and discard generated noisy captions with low relevancy scores.
  2. Image captioning: generate question-guided captions based on the relevancy scores.
  3. Question generation: generate questions based on the synthetic answers and captions.
  4. Large language model: feed the resulting prompt to a frozen pre-trained large language model, e.g. OPT or GPT-3.
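
The sketch below outlines these four stages end to end. It is illustrative only: every function name in it is a hypothetical placeholder, not the actual API of this repository.

```python
# Minimal sketch of the Img2Prompt-VQA inference workflow described above.
# All identifiers here are hypothetical placeholders, not this repo's API.
from typing import Dict, List


def match_image_question(image, question: str) -> List[float]:
    """Step 1: score image patches against the question, then use the
    scores to filter out noisy captions with low relevancy."""
    ...


def caption_image(image, patch_scores: List[float]) -> List[str]:
    """Step 2: generate question-guided captions, sampling from the
    image regions with the highest relevancy scores."""
    ...


def generate_qa_exemplars(captions: List[str]) -> List[Dict[str, str]]:
    """Step 3: extract synthetic answers from the captions and generate
    a question for each one, producing in-context QA exemplars."""
    ...


def answer_with_frozen_llm(question: str, captions: List[str],
                           exemplars: List[Dict[str, str]]) -> str:
    """Step 4: concatenate the captions and synthetic QA pairs with the
    real question into one textual prompt, then query a frozen LLM
    (e.g. OPT or GPT-3) with no VQA-specific fine-tuning."""
    ...
```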

Zero-Shot Evaluation

| Model | End-to-End Training? | VQAv2 val | VQAv2 test | OK-VQA test | AOK-VQA val | AOK-VQA test |
| --- | --- | --- | --- | --- | --- | --- |
| Frozen-7B | ✓ | 29.5 | - | 5.9 | - | - |
| Flamingo-9B | ✓ | - | 51.8 | 44.7 | - | - |
| Flamingo-80B | ✓ | - | 56.3 | 50.6 | - | - |
| Img2Prompt-VQA-OPT13B | ✗ | 57.1 | 57.3 | 39.9 | 33.3 | 33.0 |
| Img2Prompt-VQA-OPT30B | ✗ | 59.5 | 60.4 | 41.8 | 36.9 | 36.0 |
| Img2Prompt-VQA-OPT66B | ✗ | 59.9 | 60.3 | 43.2 | 38.7 | 38.2 |
| Img2Prompt-VQA-OPT175B | ✗ | 60.6 | 61.9 | 45.6 | 42.9 | 40.7 |

To reproduce the evaluation results of Img2Prompt-VQA with different LLMs, follow these steps:

First, download the generated caption and question files from this link and put them in the caption_question_files folder. For example, to reproduce the OK-VQA results, download 'okvqa_question.json', 'okvqa_caption.json', and 'okvqa_ans_to_cap_dict.json'.

Then download the COCO 2014 val annotation file from this link, and put it in the annotation_new folder.
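
Before running the evaluation, it can help to confirm the files landed in the right place. The short check below is a sketch, not part of the repository; it only assumes the folder and file names listed above and makes no assumption about each file's internal schema.

```python
# Sanity-check sketch (not part of this repo): confirm the downloaded
# OK-VQA support files and the annotation folder are in place.
import json
import os

okvqa_files = [
    "caption_question_files/okvqa_question.json",
    "caption_question_files/okvqa_caption.json",
    "caption_question_files/okvqa_ans_to_cap_dict.json",
]

for path in okvqa_files:
    if not os.path.exists(path):
        print(f"missing: {path}")
        continue
    with open(path) as f:
        data = json.load(f)  # only checks that the file parses as JSON
    print(f"{path}: OK ({len(data)} top-level entries)")

# The COCO 2014 val annotation file should sit under annotation_new/.
print("annotation_new exists:", os.path.isdir("annotation_new"))
```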

Then run the shell script in the VL_captioning folder to reproduce the results, e.g.

$ ./run_okvqa.sh

Citation

If you find this code useful for your research, please consider citing:

@article{guo2022images,
  title={From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models},
  author={Guo, Jiaxian and Li, Junnan and Li, Dongxu and Tiong, Anthony Meng Huat and Li, Boyang and Tao, Dacheng and Hoi, Steven CH},
  journal={arXiv preprint arXiv:2212.10846},
  year={2022}
}
