Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

Introduction

This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on OneFormer.

Results

We implement our method on the Swin backbone, so the GFLOPs and FPS reported below are those of the backbone.

ADE20K

Method	Backbone	$\alpha$	h $\times$ w	GFLOPs	FPS	PQ	AP	mIoU
OneFormer	Swin-L	-	12 $\times$ 12	1206	3.52	51.3	37.7	56.9
OneFormer + Ours	Swin-L	8	10 $\times$ 10	1029	4.02	51.1	36.8	57.0
OneFormer + Ours	Swin-L	10	8 $\times$ 8	898	4.58	50.7	36.7	56.5
OneFormer + Ours	Swin-L	8	8 $\times$ 8	846	4.85	50.5	36.4	55.9
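
To make the trade-off in the table easier to read, the backbone cost of each row can be expressed relative to the OneFormer baseline. The snippet below is a small illustrative sketch, not part of this repository: the numbers are copied from the table above, and the helper name `relative_cost` is hypothetical.

```python
# Backbone GFLOPs and FPS copied from the ADE20K table above.
BASELINE = {"gflops": 1206, "fps": 3.52}
OURS = [
    {"alpha": 8, "hw": "10x10", "gflops": 1029, "fps": 4.02},
    {"alpha": 10, "hw": "8x8", "gflops": 898, "fps": 4.58},
    {"alpha": 8, "hw": "8x8", "gflops": 846, "fps": 4.85},
]


def relative_cost(row, baseline=BASELINE):
    """Return (fraction of GFLOPs saved, FPS speedup factor) vs. the baseline."""
    saved = 1.0 - row["gflops"] / baseline["gflops"]
    speedup = row["fps"] / baseline["fps"]
    return saved, speedup


if __name__ == "__main__":
    for row in OURS:
        saved, speedup = relative_cost(row)
        print(
            f"alpha={row['alpha']}, h x w={row['hw']}: "
            f"{saved:.1%} fewer GFLOPs, {speedup:.2f}x FPS"
        )
```

Under these numbers the three configurations save roughly 15% to 30% of the backbone GFLOPs. PQ, AP, and mIoU are not part of this calculation and should be read from the table directly.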

Installation Instructions

We use Python 3.8, PyTorch 1.10.1 (CUDA 11.3 build).
We use Detectron2-v0.6.
For complete installation instructions, please see INSTALL.md.
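
As a quick sanity check after installation, the installed package versions can be compared against the ones listed above. This is a minimal sketch using only the standard library. It assumes the packages are registered under the distribution names `torch` and `detectron2`, which may differ depending on how they were installed, and the helper names are hypothetical rather than part of this repository.

```python
from importlib import metadata

# Versions stated in this README. The distribution names are an assumption
# about how the packages were installed (for example via pip).
EXPECTED = {"torch": "1.10.1", "detectron2": "0.6"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to (installed version or None, matches expected?)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Local build tags such as "+cu113" are ignored for the comparison.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: found {found}, expected {EXPECTED[name]} [{status}]")
```

A mismatch does not necessarily mean the code will fail; it only flags that the environment differs from the one used here.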

Dataset Preparation

We experiment on the ADE20K benchmark. You can also try our method on other benchmarks such as Cityscapes and COCO 2017.
Please see Preparing Datasets for OneFormer for complete instructions for preparing the datasets.

Evaluation Instructions

Download the pretrained model from OneFormer.
Pass the task token through MODEL.TEST.TASK. The value of <task> must be one of panoptic, semantic, or instance.

python train_net.py --dist-url 'tcp://127.0.0.1:50164' \
    --num-gpus 8 \
    --config-file configs/ade20k/swin/oneformer_hourglass_swin_large_bs16_160k_1280x1280.yaml \
    --eval-only MODEL.IS_TRAIN False MODEL.WEIGHTS <path-to-checkpoint> \
    MODEL.TEST.TASK <task>
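
To evaluate all three tasks in turn, the command above can be assembled programmatically. The following is a minimal sketch, not a script shipped with this repository: `build_eval_command` is a hypothetical helper, and the flags and config path are copied verbatim from the command above.

```python
import shlex

# Task tokens accepted by MODEL.TEST.TASK, as listed above.
VALID_TASKS = ("panoptic", "semantic", "instance")
CONFIG = (
    "configs/ade20k/swin/"
    "oneformer_hourglass_swin_large_bs16_160k_1280x1280.yaml"
)


def build_eval_command(task, weights, num_gpus=8,
                       dist_url="tcp://127.0.0.1:50164"):
    """Return the evaluation command as an argument list for one task."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return [
        "python", "train_net.py",
        "--dist-url", dist_url,
        "--num-gpus", str(num_gpus),
        "--config-file", CONFIG,
        "--eval-only",
        "MODEL.IS_TRAIN", "False",
        "MODEL.WEIGHTS", weights,
        "MODEL.TEST.TASK", task,
    ]


if __name__ == "__main__":
    # Print one shell-quoted command per task; replace the placeholder path
    # with the downloaded checkpoint before running them.
    for task in VALID_TASKS:
        print(shlex.join(build_eval_command(task, "<path-to-checkpoint>")))
```

The argument list can be passed to `subprocess.run` directly, which avoids shell quoting issues with the checkpoint path.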

License

This project is released under the Apache 2.0 license.

Acknowledgement

This repo is built on OneFormer. We thank the authors for their great work.

Citation

If you find this project useful in your research, please consider citing:

@article{liang2022expediting,
	author    = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
	title     = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
	journal   = {arXiv preprint arXiv:2210.01035},
	year      = {2022},
}

@inproceedings{jain2023oneformer,
	title     = {{OneFormer: One Transformer to Rule Universal Image Segmentation}},
	author    = {Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
	booktitle = {CVPR},
	year      = {2023},
}
