VLLaVO: Mitigating Visual Gap through LLMs

This is the official code of VLLaVO.

Overview of VLLaVO

First, we extract descriptions of images with VLMs (CLIP and BLIP); then we finetune an LLM (LLaMA) on these descriptions. The finetuned LLM can then be used for classification.
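
As a rough sketch of the description-extraction idea (not the repository's descriptions_extractor.py), the step might look like the following. The checkpoints Salesforce/blip-image-captioning-base and openai/clip-vit-base-patch32 and the caption-plus-tags output format are assumptions:

```python
# Sketch only: caption an image with BLIP and rank candidate tags with CLIP.
# Checkpoints and output format are assumptions, not the repository's code.
import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPProcessor, CLIPModel)

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

def describe(image_path, candidate_tags):
    image = Image.open(image_path).convert("RGB")

    # BLIP: free-form caption of the image.
    inputs = blip_proc(images=image, return_tensors="pt")
    caption = blip_proc.decode(blip.generate(**inputs, max_new_tokens=30)[0],
                               skip_special_tokens=True)

    # CLIP: rank candidate tags by image-text similarity.
    inputs = clip_proc(text=candidate_tags, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]
    top = [candidate_tags[i] for i in probs.topk(3).indices.tolist()]

    return f"Caption: {caption}. Tags: {', '.join(top)}."
```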

Prepare models

All models used in this work are listed below:

CLIP (for extracting image descriptions)
BLIP (for extracting image descriptions)
LLaMA-2 (llama2, the LLM that is finetuned)
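
The checkpoints can be fetched ahead of time; this is a minimal sketch assuming the Hugging Face Hub repo IDs below, which may differ from the exact variants used in the paper:

```python
# Sketch: fetch the assumed Hugging Face checkpoints ahead of time.
# Repo IDs are assumptions; LLaMA-2 is gated and additionally requires
# accepting Meta's license on the Hub and logging in (huggingface-cli login).
from huggingface_hub import snapshot_download

for repo_id in [
    "openai/clip-vit-base-patch32",           # CLIP (assumed variant)
    "Salesforce/blip-image-captioning-base",  # BLIP (assumed variant)
    "meta-llama/Llama-2-7b-hf",               # LLaMA-2 (assumed variant)
]:
    snapshot_download(repo_id=repo_id)
```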

Prepare dataset

Directly download the existing dataset

We provide the dataset used in our paper at the following link: dataset link

Description Extraction

If you want to extract the descriptions and construct the dataset yourself, the following command can be used.

CUDA_VISIBLE_DEVICES=1 python descriptions_extractor.py -s dataset/office_home/image_list/Product.txt --save_path ../datasets/Office_home --base_path dataset/office_home/
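
Here, -s points at an image-list file, --base_path is the image root, and --save_path is the output directory. For illustration only, an instruction-style record built from an extracted description might look like the sketch below; the field names and prompt template are assumptions, not the repository's confirmed format:

```python
# Illustration only: an instruction-tuning record built from an extracted
# description. Field names and the prompt template are assumptions.
import json

record = {
    "instruction": "Classify the image based on its textual description.",
    "input": "Caption: a black office chair on a white background. "
             "Tags: chair, furniture, office.",
    "output": "Chair",
}
with open("office_home_product.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```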

Finetune

See DG_llama.sh in ./script/bash_command for finetuning the llama2 model.
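
As a minimal sketch of what such a finetuning run could look like (the use of LoRA via the peft library, the hyperparameters, and the dataset path are all assumptions; DG_llama.sh is the authoritative recipe):

```python
# Sketch: finetune LLaMA-2 on description records with LoRA (peft).
# Hyperparameters, dataset path, and LoRA itself are assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

def to_text(ex):
    # Flatten one record into a single training string.
    return {"text": f"{ex['instruction']}\n{ex['input']}\n{ex['output']}"}

ds = (load_dataset("json", data_files="office_home_product.jsonl")["train"]
      .map(to_text)
      .map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
           remove_columns=["instruction", "input", "output", "text"]))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=4,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```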

Evaluate

See classification_llama.sh for evaluating the finetuned llama2 model.
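
Conceptually, evaluation prompts the finetuned LLM with an image's description and reads the predicted class from the generation. A minimal sketch, assuming the adapter directory, prompt template, and greedy decoding (none of which are confirmed by the repository):

```python
# Sketch: classify via the finetuned model by generating from a
# description prompt and matching the output to a class name.
# Adapter path, prompt template, and decoding settings are assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(model_id), "out")  # assumed LoRA dir
model.eval()

def classify(description, class_names):
    prompt = f"Classify the image based on its textual description.\n{description}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
    # Fall back to the first class whose name appears in the generation.
    return next((c for c in class_names if c.lower() in text.lower()),
                class_names[0])
```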
