LaViP: Language-Grounded Visual Prompting

This repository contains the official implementation of our AAAI 2024 paper.

Abstract

We introduce a language-grounded visual prompting method to adapt the visual encoder of vision-language models for downstream tasks. By capitalizing on language integration, we devise a parameter-efficient strategy to adjust the input of the visual encoder, eliminating the need to modify or add to the model's parameters. Due to this design choice, our algorithm can operate even in black-box scenarios, showcasing adaptability in situations where access to the model's parameters is constrained. We will empirically demonstrate that, compared to prior art, grounding visual prompts with language enhances both the accuracy and speed of adaptation. Moreover, our algorithm excels in base-to-novel class generalization, overcoming limitations of visual prompting and exhibiting the capacity to generalize beyond seen classes. We thoroughly assess and evaluate our method across a variety of image recognition datasets, such as EuroSAT, UCF101, DTD, and CLEVR, spanning different learning situations, including few-shot learning, base-to-novel class generalization, and transfer learning.


Research Highlights

  • Language-Grounded, Input-Dependent Dynamic Visual Prompting: To the best of our knowledge, this is the first work to explore language-grounded, input-dependent visual prompting without relying on an external model. To this end, we devise low-rank learnable vectors. Grounding the prompts in language both improves classification accuracy and yields roughly 3x faster convergence (an illustrative sketch follows this list).
  • New Algorithm for Extending Visual Prompting Beyond Seen Classes: We propose a Kronecker-based encoding fusion method that generalizes visual prompting to classes not seen during training.
  • Support for Visual Prompting in Gradient-Free Environments: Equipped with language-grounded, input-aware prompts, LaViP can be used in gradient-free settings (i.e., where access to model parameters and backpropagation is forbidden) with more than 15x faster convergence.
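
The snippet below is a minimal, illustrative sketch of the idea behind the first two highlights, not the official implementation: a padding-style visual prompt is produced as the product of a low-rank factor derived from the class-name text embeddings (language grounding) and a factor predicted from the image embedding (input dependence), and a Kronecker product composes encodings so prompts can be built for unseen classes. All names, dimensions, and hyperparameters are placeholder assumptions.

import torch
import torch.nn as nn

class LowRankPromptGenerator(nn.Module):
    """Sketch: visual prompt P = A @ B, with A derived from language and B from the image."""

    def __init__(self, text_dim=512, img_dim=512, rank=4, img_size=224, pad=28):
        super().__init__()
        self.img_size, self.rank = img_size, rank
        m, n = 3 * img_size, img_size
        # Left factor A (m x r) comes from the pooled class-name text embedding.
        self.text_to_A = nn.Linear(text_dim, m * rank)
        # Right factor B (r x n) is predicted from the image embedding.
        self.img_to_B = nn.Linear(img_dim, rank * n)
        # Only a frame of width `pad` around the image is perturbed.
        mask = torch.ones(1, 3, img_size, img_size)
        mask[:, :, pad:img_size - pad, pad:img_size - pad] = 0
        self.register_buffer("mask", mask)

    def forward(self, img_emb, text_emb):
        # img_emb: (B, img_dim) image features; text_emb: (text_dim,) pooled text features.
        b = img_emb.shape[0]
        A = self.text_to_A(text_emb).view(3 * self.img_size, self.rank)   # (m, r)
        B = self.img_to_B(img_emb).view(b, self.rank, self.img_size)      # (B, r, n)
        prompt = torch.einsum("mr,brn->bmn", A, B).view(b, 3, self.img_size, self.img_size)
        return prompt * self.mask  # added to the normalized input image

def kron_fuse(factor_base, factor_novel):
    """Kronecker-based fusion of two small encoding factors; the result's shape is the
    elementwise product of the input shapes, letting prompts be composed for classes
    not seen during training (sketch of the idea only)."""
    return torch.kron(factor_base, factor_novel)

Only the two small projection layers are learned; the vision-language backbone stays frozen, which is what makes the approach parameter-efficient and compatible with black-box adaptation.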


Experiments

  • Main performance (Tables 1, 2, and 3 of the paper)

    • 12 transfer learning benchmarks - [Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, EuroSAT, Resisc45, CLEVR, UCF101]

Setup

  • Run the following commands to create the environment.
    • Note that we slightly modified Dassl.pytorch into mm_dassl for flexible experiments.
# Clone this repo
git clone https://github.com/NilakshanKunananthaseelan/LaViP.git
cd LaViP

# Create a conda environment
conda create -y -n lavip python=3.9.16

# Activate the environment
conda activate lavip

# Install torch and torchvision
# Please refer to https://pytorch.org/ if you need a different cuda version
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia


# Install dependencies
cd mm_dassl
pip install -r requirements.txt
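
Optionally, verify the environment with a quick check that the pinned builds were installed and that the CUDA runtime is visible (the expected versions come from the install commands above):

# Optional sanity check for the environment created above
import torch
import torchvision

print(torch.__version__)          # expected: 1.13.1
print(torchvision.__version__)    # expected: 0.14.1
print(torch.cuda.is_available())  # should be True if the CUDA 11.7 build sees a GPU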

Data preparation


Run

Transfer learning benchmarks

  • Move to the LaViP/scripts/method_name directory

  • On the target dataset, run the commands with the dataset-specific configurations shown below:

# for few-shot learning {1:dataset,2:epoch,3:init_lr}
sh fsl_train.sh dtd 300 1.0

# for base-to-novel generalization {1:dataset, 2:epoch, 3:init_lr}
sh base2new_train.sh dtd 300 1.0

# for transfer learning {1:dataset, 2:epoch, 3:init_lr}
sh train_full.sh dtd 300 1.0

# for gradient free learning {1:dataset, 2:epoch, 3:moms, 4:gamma, 5:spsa_c}
sh bb_train.sh dtd 300 0.9 0.2 0.005  
# for VP (white-box), specify {1:dataset, 2:epoch, 3:lr}
sh tl_bench.sh svhn 1000 40.0
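
For reference, the gradient-free setting typically relies on an SPSA-style estimator that approximates the gradient from two forward passes. Mapping the script arguments moms, gamma, and spsa_c to momentum, perturbation decay, and initial perturbation scale is our assumption about bb_train.sh; the helper below is an illustrative sketch rather than LaViP's exact update.

import torch

def spsa_step(params, loss_fn, step, lr, moms=0.9, gamma=0.2, spsa_c=0.005, m_buf=None):
    """One SPSA update: estimate the gradient of loss_fn at `params`
    from only two (black-box) loss evaluations, then apply momentum."""
    c_t = spsa_c / (step + 1) ** gamma                                     # decaying perturbation size
    delta = (torch.randint(0, 2, params.shape) * 2 - 1).to(params.dtype)   # Rademacher +-1 directions
    loss_plus = loss_fn(params + c_t * delta)
    loss_minus = loss_fn(params - c_t * delta)
    grad_est = (loss_plus - loss_minus) / (2 * c_t) * delta                # two-point gradient estimate
    m_buf = grad_est if m_buf is None else moms * m_buf + grad_est         # momentum accumulator
    return params - lr * m_buf, m_buf

Because only loss values are queried, no access to the encoder's weights or backpropagation graph is needed, which is what the black-box (bb_train.sh) setting assumes.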


Acknowledgements

Our overall experimental pipeline is based on the CoOp and CoCoOp repositories. For baseline construction, we borrowed and adapted code from the repositories of VP, MaPLe, BAR, and BlackVIP. We appreciate the authors (Zhou et al., Bahng et al., Khattak et al., Tsai et al., Oh et al.) for sharing their code.
