Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

Introduction

This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on Video Swin Transformer.

Results

Kinetics 400

Method	$\alpha$	t $\times$ h $\times$ w	GFLOPs	FPS	Acc@1	Acc@5	config
Swin-L	-	8 $\times$ 12 $\times$ 12	2107	1.10	84.7	96.6	config
Swin-L + Ours	10	8 $\times$ 6 $\times$ 6	1662	1.66	84.0	96.3	config

Kinetics 600

Method	$\alpha$	t $\times$ h $\times$ w	GFLOPs	FPS	Acc@1	Acc@5	config
Swin-L	-	8 $\times$ 12 $\times$ 12	2107	1.10	86.1	97.3	config
Swin-L + Ours	10	8 $\times$ 6 $\times$ 6	1824	1.53	85.6	97.1	config
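The trade-off in the two tables above can be made explicit with a little arithmetic. The sketch below copies the numbers from the tables and derives the relative GFLOPs saving, the FPS speedup, and the Acc@1 drop. It assumes the t $\times$ h $\times$ w column can be read as a token grid whose product gives a token count; the names `RESULTS` and `summarize` are illustrative and not part of the repository.

```python
from math import prod

# Numbers copied from the Results tables: (t x h x w grid, GFLOPs, FPS, Acc@1).
RESULTS = {
    "Kinetics 400": {
        "baseline": ((8, 12, 12), 2107, 1.10, 84.7),
        "ours": ((8, 6, 6), 1662, 1.66, 84.0),
    },
    "Kinetics 600": {
        "baseline": ((8, 12, 12), 2107, 1.10, 86.1),
        "ours": ((8, 6, 6), 1824, 1.53, 85.6),
    },
}


def summarize(baseline, ours):
    """Compare one 'Swin-L + Ours' row against its Swin-L baseline row."""
    grid_b, gflops_b, fps_b, acc_b = baseline
    grid_o, gflops_o, fps_o, acc_o = ours
    return {
        # Assumption: the product of the grid is the number of tokens.
        "token_ratio": prod(grid_b) / prod(grid_o),
        "gflops_saving_pct": 100.0 * (gflops_b - gflops_o) / gflops_b,
        "fps_speedup": fps_o / fps_b,
        "acc1_drop": acc_b - acc_o,
    }


if __name__ == "__main__":
    for dataset, rows in RESULTS.items():
        stats = summarize(rows["baseline"], rows["ours"])
        print(dataset, {name: round(value, 2) for name, value in stats.items()})
```

Under these assumptions, the Kinetics 400 row works out to roughly 21% fewer GFLOPs and about a 1.5x FPS speedup for a 0.7-point drop in Acc@1; the Kinetics 600 row gives roughly 13% fewer GFLOPs and about a 1.4x speedup for a 0.5-point drop.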

Usage

Installation

Please refer to install.md for installation.

We also provide Dockerfiles for CUDA 10.1 (image url) and CUDA 11.0 (image url) for convenience.

Data Preparation

Please refer to data_preparation.md for a general overview of data preparation. The supported datasets are listed in supported_datasets.md.

We also share our Kinetics-400 annotation files, k400_val and k400_train, to make comparisons easier.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --eval top_k_accuracy

# multi-gpu testing
bash tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT_FILE> <GPU_NUM> --eval top_k_accuracy
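
When evaluating several checkpoints, it can help to assemble the two commands above programmatically. The following is a minimal sketch, not part of the repository: `build_test_command` is a hypothetical helper that only reproduces the single-GPU and multi-GPU command lines shown above, and the config and checkpoint paths in the usage example are placeholders.

```python
import subprocess


def build_test_command(config_file, checkpoint_file, gpu_num=1):
    """Return the argv list for the single-GPU or multi-GPU test command."""
    if gpu_num < 1:
        raise ValueError("gpu_num must be at least 1")
    if gpu_num == 1:
        # Mirrors: python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --eval top_k_accuracy
        return ["python", "tools/test.py", config_file, checkpoint_file,
                "--eval", "top_k_accuracy"]
    # Mirrors: bash tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT_FILE> <GPU_NUM> --eval top_k_accuracy
    return ["bash", "tools/dist_test.sh", config_file, checkpoint_file,
            str(gpu_num), "--eval", "top_k_accuracy"]


def run_test(config_file, checkpoint_file, gpu_num=1):
    """Run the evaluation from the repository root and raise if it fails."""
    subprocess.run(build_test_command(config_file, checkpoint_file, gpu_num),
                   check=True)
```

For example, `run_test("path/to/config.py", "path/to/checkpoint.pth", gpu_num=8)` would launch the distributed script with eight GPUs, assuming it is called from the repository root and both placeholder paths are replaced with real files.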

Citation

If you find this project useful in your research, please consider citing:

@article{liang2022expediting,
	author    = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
	title     = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
	journal   = {arXiv preprint arXiv:2210.01035},
	year      = {2022},
}

@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}
