Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models

This repository contains the code for the paper "Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models". We assess the robustness of 11 widely used adaptation methods across 4 vision-language datasets under multimodal corruptions. Concretely, we introduce 7 benchmark datasets, including 96 visual and 87 textual corruptions, to investigate the robustness of different adaptation methods, the impact of the number of available adaptation examples, and the influence of trainable parameter size during adaptation.

Installation

# clone the repository
git clone git@github.com:adarobustness/adaptation_robustness.git
cd adaptation_robustness
# create conda environment (optional)
conda create -n adaptation_robustness python=3.8
conda activate adaptation_robustness
# install requirements
pip install -r requirements.txt
# install language evaluation
pip install git+https://github.com/bckim92/language-evaluation.git
python -c "import language_evaluation; language_evaluation.download('coco')"
# Download T5/BART backbone checkpoint
python download_backbones.py

Code Structure

├── VL-T5
│   ├── original_scripts
│   │   ├── image_clip_bart # adapt CLIP-BART+adaptation methods
│   │   ├── image_clip_bart_corr_hp # inference CLIP-BART+adaptation methods with different hyperparameters
│   │   ├── image_clip_bart_corr_subset # adapt CLIP-BART+adaptation methods with different subsets of adaptation data
│   │   ├── image_clip_bart_corr_test # inference CLIP-BART+adaptation methods
│   │   ├── image_clip_bart_hp # adapt CLIP-BART+adaptation methods with different hyperparameters
│   │   ├── image_clip_t5 # adapt CLIP-T5+adaptation methods
│   │   ├── image_clip_t5_corr_hp # inference CLIP-T5+adaptation methods with different hyperparameters
│   │   ├── image_clip_t5_corr_subset # adapt CLIP-T5+adaptation methods with different subsets of adaptation data
│   │   ├── image_clip_t5_corr_test # inference CLIP-T5+adaptation methods
│   │   └── image_clip_t5_hp # adapt CLIP-T5+adaptation methods with different hyperparameters
│   ├── scripts # scripts for deploying experiments more conveniently
│   └── src
│       ├── clip # CLIP model
│       ├── adapters # Adapter, Compacter, Hyperformer
│       ├── lora # LoRA adaptation method 
│       ├── prompt # prompting adaptation methods
│       ├── my_transformers # T5/BART model
│       ├── caption*.py # captioning tasks
│       ├── gqa*.py # GQA tasks
│       ├── nlvr*.py # NLVR tasks
│       ├── vqa*.py # VQA tasks 
│       ├── multitask.py # entry point for the multitask setting
├── download_backbones.py # download T5/BART backbone checkpoint
├── feature_extraction # extract image features

Data & Pre-trained Models Preparation

Default Directory Structure

This repo assumes the following directory structure for data and pre-trained models:

├── datasets # datasets, extracted features, and corrupted datasets
│   ├── COCO
│   │   ├── images 
│   │   ├── clip_features
│   ├── VG
│   │   ├── images 
│   │   ├── clip_features
│   ├── GQA
│   │   ├── images 
│   │   ├── clip_features
│   ├── nlvr
│   │   ├── images 
│   │   ├── clip_features
│   ├── vqa
│   ├── lxmert
├── VL-T5
│   ├── snap # adaptation checkpoints and output logs
│   │   ├── VLBart_multitask # pre-trained CLIP-BART+adaptation model
│   │   └── VLT5_multitask # pre-trained CLIP-T5+adaptation model
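
Before running any experiment, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the list `EXPECTED_DIRS` are hypothetical, and the paths are copied from the tree above on the assumption that they are relative to the repository root.

```python
from pathlib import Path

# Relative paths copied from the directory tree above (assumed to be
# relative to the repository root).
EXPECTED_DIRS = [
    "datasets/COCO/images",
    "datasets/COCO/clip_features",
    "datasets/VG/images",
    "datasets/VG/clip_features",
    "datasets/GQA/images",
    "datasets/GQA/clip_features",
    "datasets/nlvr/images",
    "datasets/nlvr/clip_features",
    "datasets/vqa",
    "datasets/lxmert",
    "VL-T5/snap",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected directories are present.")
```

Run it from the repository root. It only checks that the directories exist, not that their contents are complete.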

Download Datasets & Pre-trained Models

Download the processed CLIP features from this link and put the extracted files under the datasets directory.
Download VQA, NLVR^2, and GQA and organize the data following the default directory structure.
Download the pre-trained CLIP-BART+adaptation model from this link and put the extracted files under the VL-T5/snap directory.

Dataset Corruption

Install corruption
Run python corruption.py ... following the instructions from corruption to generate the corrupted datasets.
Put the corrupted datasets under the datasets directory, following the default directory structure.
Alternatively, download the corrupted data directly here.

Feature Extraction on Corrupted Datasets

cd feature_extraction
# CORRUPTION_METHOD_NAME: corruption method name, e.g., 'gaussian_noise'
# SEVERITY: corruption severity, e.g., '1', '2', '3', '4', '5'
# GPU_ID: GPU ID, e.g., '0'
# DATASET_NAME: dataset name, e.g., 'coco', 'vg', 'gqa', 'nlvr', 'vqa'
# PREFIX: prefix of the output file, e.g., 'coco', 'vg', 'gqa', 'nlvr', 'vqa'
bash extract_clip_features.sh \
  ${CORRUPTION_METHOD_NAME} \
  ${SEVERITY} \
  ${GPU_ID} \
  ${DATASET_NAME} \
  ${PREFIX}
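
The command above handles one corruption method at one severity. To cover all five severity levels, a small driver can generate one invocation per (method, severity) pair. This is a hedged sketch rather than part of the repository: `build_commands` and `run_all` are hypothetical helpers, the argument order follows the call shown above, and 'gaussian_noise' is the only corruption method name taken from this README.

```python
import subprocess


def build_commands(methods, severities, gpu_id, dataset_name, prefix):
    """Build one extract_clip_features.sh invocation per (method, severity)."""
    return [
        ["bash", "extract_clip_features.sh",
         method, str(severity), str(gpu_id), dataset_name, prefix]
        for method in methods
        for severity in severities
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Assumes the current working directory is feature_extraction.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Severities 1 to 5 on GPU 0 for the COCO dataset, as in the comments above.
    commands = build_commands(["gaussian_noise"], range(1, 6), 0, "coco", "coco")
    run_all(commands, dry_run=True)
```

With `dry_run=True` the script only prints the commands, so the sweep can be inspected before launching it; pass `dry_run=False` from inside feature_extraction to execute them.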