Q&A Prompts-ECCV'24

Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge. This is the official implementation of the [Paper], accepted at ECCV'24.

Install

  1. Clone this repository and navigate to the QA-Prompts folder:
git clone https://github.com/WHB139426/QA-Prompts.git
cd QA-Prompts
  2. Install the required packages:
conda create -n qaprompts python=3.9.16
conda activate qaprompts
pip install -r requirements.txt
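
Optionally, verify the environment before moving on. The snippet below is a minimal sketch (not part of the repository) that checks whether PyTorch can see your GPUs, since the training command further below assumes CUDA devices are available:

import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
print("visible gpus:", torch.cuda.device_count())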

Datasets

We provide the annotations of [A-OKVQA] in ./annotations. You can also download the annotation files directly from [🤗HF].

The images can be downloaded from [COCO2017], and the data should be organized as follows (a small loading sketch is given after the tree):

├── coco2017
│   ├── train2017
│   ├── val2017
│   └── test2017
├── QA-Prompts
│   ├── annotations
│   │   ├── aokvqa_v1p0_train.json
│   │   ├── sub_qa.json
│   │   └── ...
│   ├── datasets
│   ├── models
│   └── ...
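
As a sanity check for the layout above, here is a minimal sketch that loads one training annotation and resolves its image path. It assumes you run it from the QA-Prompts folder and that the annotation fields follow the public A-OKVQA release (image_id, question); adjust the paths if your layout differs:

import json
import os

# Paths follow the directory tree shown above.
ANNOTATION_FILE = "annotations/aokvqa_v1p0_train.json"
COCO_TRAIN_DIR = "../coco2017/train2017"

with open(ANNOTATION_FILE, "r") as f:
    samples = json.load(f)

sample = samples[0]
# COCO2017 file names are the image id zero-padded to 12 digits.
image_path = os.path.join(COCO_TRAIN_DIR, f"{sample['image_id']:012d}.jpg")
print(sample["question"])
print(image_path, "exists:", os.path.exists(image_path))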

Pretrained Weights of InstructBLIP

You can prepare the pretrained weights of InstructBLIP-Vicuna-7B according to [InstructBLIP].

Since we have restructured the model code, we recommend downloading the pretrained weights of EVA-CLIP, Vicuna-7b-v1.1, and the QFormer directly from [🤗HF]. The pretrained weights should be organized as follows:

├── QA-Prompts
│   └── experiments
│       ├── eva_vit_g.pth
│       ├── qformer_vicuna.pth
│       ├── query_tokens_vicuna.pth
│       ├── vicuna-7b
│       └── llm_proj_vicuna.pth
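
Before training, you can check that the expected weight files are in place. The following is a minimal sketch rather than a repository script; the file names are taken from the tree above:

import os

EXPERIMENTS_DIR = "experiments"
EXPECTED = [
    "eva_vit_g.pth",           # EVA-CLIP vision encoder
    "qformer_vicuna.pth",      # QFormer weights
    "query_tokens_vicuna.pth",
    "llm_proj_vicuna.pth",     # projection into the LLM embedding space
    "vicuna-7b",               # directory with the Vicuna-7b-v1.1 checkpoint
]

missing = [name for name in EXPECTED
           if not os.path.exists(os.path.join(EXPERIMENTS_DIR, name))]
print("missing:", missing or "none")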

Training

We recommend GPUs with more than 24 GB of memory. Otherwise, you may need to extract the vision features in advance so that EVA-CLIP does not have to be kept in GPU memory during training, which avoids OOM errors; a feature-caching sketch is given after the launch command below.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=1111 finetune_ans.py
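
If the memory recommendation above cannot be met, the idea is to run the frozen vision encoder once per image offline and train on the cached features. The repository does not ship a dedicated script for this, so the sketch below only illustrates the idea; encoder, preprocess, and the paths are placeholders, not project APIs:

import os
import torch
from PIL import Image

@torch.no_grad()
def cache_vision_features(encoder, preprocess, image_dir, out_dir, device="cuda"):
    """Run a frozen vision encoder once per image and save the features.

    encoder and preprocess are placeholders for whatever vision tower you use
    (e.g. EVA-CLIP); they are not functions provided by this repository.
    """
    os.makedirs(out_dir, exist_ok=True)
    encoder.eval().to(device)
    for name in sorted(os.listdir(image_dir)):
        if not name.endswith(".jpg"):
            continue
        image = preprocess(Image.open(os.path.join(image_dir, name)).convert("RGB"))
        feats = encoder(image.unsqueeze(0).to(device))  # e.g. [1, num_patches, dim]
        torch.save(feats.squeeze(0).cpu(),
                   os.path.join(out_dir, name.replace(".jpg", ".pt")))

A dataset class would then load the saved .pt tensor instead of the raw image, so the vision tower never has to sit on the GPU during training.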
