This repo is the official implementation of our paper "Hierarchical Side-Tuning for Vision Transformers" (arXiv).
Weifeng Lin, Ziheng Wu, Wentao Yang, Mingxin Huang, Jun Huang and Lianwen Jin
- Clone this repo:
git clone https://github.com/AFeng-x/HST.git
cd HST
- Create a conda virtual environment and activate it:
conda create -n HST python=3.8 -y
conda activate HST
- Install PyTorch:
pip3 install torch==1.10.1 torchvision==0.11.2 torchaudio --index-url https://download.pytorch.org/whl/cu113
- Install other requirements:
pip install -r requirements.txt
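After installation, a quick sanity check can confirm that the key packages are importable. This is a minimal sketch; the exact package list is an assumption based on the steps above, not taken from `requirements.txt`:

```python
import importlib.util

def check_install(packages=("torch", "torchvision", "timm")):
    """Return a dict mapping each package name to whether it is importable.

    Package names are an assumption based on the install steps; adjust
    to match the actual requirements.txt of this repo.
    """
    status = {name: importlib.util.find_spec(name) is not None
              for name in packages}
    for name, ok in sorted(status.items()):
        print(f"{name}: {'OK' if ok else 'MISSING'}")
    return status
```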
- FGVC & VTAB-1K
You can follow VPT to download them.
- VTAB-1K
Original download link: vtab dataset.
For convenience, you can follow SSF to download the pre-extracted VTAB-1K dataset.
The license can be found in vtab dataset.
- CIFAR-100
wget https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
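After downloading, it is worth verifying the archive before extracting it. The sketch below uses only the standard library; the expected MD5 is the checksum torchvision publishes for `cifar-100-python.tar.gz`, included here as an assumption to be checked against an official source:

```python
import hashlib

# MD5 that torchvision publishes for cifar-100-python.tar.gz
# (assumption -- verify against an official source before relying on it).
CIFAR100_MD5 = "eb9058c3a382ffc7106e4002c42a8d85"

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_cifar100(path):
    """Return True if the archive matches the expected checksum."""
    return md5sum(path) == CIFAR100_MD5
```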
- For pre-trained ViT-B/16 or ViT-L/16 models on ImageNet-21K, the model weights will be downloaded automatically. You can also download them manually from ViT.
- For the MAE pre-trained ViT-B/16, please download the weights from MAE.
To fine-tune a pre-trained ViT model via HST, please refer to the scripts.
Examples:
bash train_scripts/vit/cifar_100/train_hsn_img21k.sh
You can directly transfer our models to mmdetection and mmsegmentation.
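In practice, transferring the backbone usually means pointing an mmdetection-style config at it. The fragment below is purely illustrative: the backbone `type` name and its arguments are assumptions, not the actual names registered in this repo:

```python
# Hypothetical mmdetection-style config fragment (illustrative only;
# the backbone type name and all arguments below are assumptions).
model = dict(
    backbone=dict(
        type='ViT-HST',  # assumed name under which the backbone is registered
        pretrained='path/to/vit_b16_in21k.pth',  # placeholder checkpoint path
        frozen=True,  # assumed flag: keep the ViT trunk frozen, tune the side network
    ),
)
```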
If you find this project useful for your research and applications, please kindly cite our paper:
@article{lin2023hierarchical,
title={Hierarchical side-tuning for vision transformers},
author={Lin, Weifeng and Wu, Ziheng and Yang, Wentao and Huang, Mingxin and Huang, Jun and Jin, Lianwen},
journal={arXiv preprint arXiv:2310.05393},
year={2023}
}
- timm: the codebase we built upon.
- VPT and SSF: the codebases we referred to.
- vtab GitHub repo: VTAB-1K