# Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

## Introduction

This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on Mask2Former.

*(Figure: framework overview)*
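
Our method reuses the pretrained weights without any fine-tuning and inserts two non-parametric layers into the backbone: a token clustering layer that shrinks the $H \times W$ token grid to a much smaller $h \times w$ set of cluster tokens before the expensive transformer blocks, and a token reconstruction layer that restores full resolution for the dense-prediction head. The snippet below is a minimal, illustrative PyTorch sketch of this idea only; the function names and the soft-k-means initialization are ours and do not mirror the repository's actual implementation.

```python
import torch
import torch.nn.functional as F


def cluster_tokens(x, hw_in, hw_out, n_iters=5, tau=0.01):
    """Reduce an (H*W) token grid to an (h*w) set of cluster tokens.

    x: (B, H*W, C) tokens; hw_in = (H, W); hw_out = (h, w).
    Returns clusters (B, h*w, C) and soft assignments (B, H*W, h*w).
    """
    B, N, C = x.shape
    H, W = hw_in
    h, w = hw_out
    # Initialize cluster centers by average-pooling the token grid.
    grid = x.transpose(1, 2).reshape(B, C, H, W)
    centers = F.adaptive_avg_pool2d(grid, (h, w)).flatten(2).transpose(1, 2)
    for _ in range(n_iters):
        # Soft k-means step: assign every token to every center by distance.
        logits = -torch.cdist(x, centers) / tau          # (B, N, h*w)
        assign = logits.softmax(dim=-1)
        # Update centers as the assignment-weighted means of the tokens.
        centers = (assign.transpose(1, 2) @ x) / (assign.sum(1).unsqueeze(-1) + 1e-6)
    return centers, assign


def reconstruct_tokens(clusters, assign):
    """Map processed cluster tokens back to full resolution for the head."""
    return assign @ clusters                             # (B, H*W, C)


if __name__ == "__main__":
    x = torch.randn(2, 12 * 12, 96)                      # toy 12x12 token grid
    clusters, assign = cluster_tokens(x, (12, 12), (8, 8))
    # ... run the remaining (expensive) transformer layers on `clusters` ...
    x_full = reconstruct_tokens(clusters, assign)
    print(clusters.shape, x_full.shape)                  # (2, 64, 96) (2, 144, 96)
```

Because both layers are parameter-free, the surrounding transformer blocks run unchanged on fewer tokens, which is where the GFLOPs and FPS gains in the tables below come from.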

## Results

Here we apply our method to the Swin backbone, so the reported GFLOPs and FPS are measured on the backbone. In the tables, $\alpha$ and $h \times w$ are the hyper-parameters of the inserted token clustering layer: $\alpha$ controls where in the backbone it is inserted, and $h \times w$ is the size of the clustered token grid (baseline rows keep the full $12 \times 12$ resolution).

### COCO

#### Panoptic Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | PQ | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.3 | 57.8 | config |
| Mask2Former + Ours | Swin-L | 10 | $8 \times 8$ | 663 | 5.9 | 56.8 | config |

#### Instance Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | mask AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.3 | 50.1 | config |
| Mask2Former + Ours | Swin-L | 12 | $8 \times 8$ | 705 | 5.4 | 49.1 | config |

### ADE20K

#### Panoptic Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | PQ | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.4 | 48.0 | config |
| Mask2Former + Ours | Swin-L | 14 | $8 \times 8$ | 769 | 5.3 | 47.6 | config |

#### Semantic Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | mIoU | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.3 | 55.8 | config |
| Mask2Former + Ours | Swin-L | 8 | $8 \times 8$ | 620 | 6.2 | 55.5 | config |

### Video Instance Segmentation

#### YouTubeVIS 2019

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 8957 | 0.51 | 60.4 | config |
| Mask2Former + Ours | Swin-L | 14 | $8 \times 8$ | 7631 | 0.60 | 60.2 | config |

#### YouTubeVIS 2021

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 7159 | 0.63 | 52.6 | config |
| Mask2Former + Ours | Swin-L | 14 | $8 \times 8$ | 6253 | 0.72 | 51.8 | config |

## Installation

See installation instructions.

## Getting Started

See Preparing Datasets for Mask2Former.

See Getting Started with Mask2Former.
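
Mask2Former builds on detectron2, so evaluation typically follows the standard detectron2 pattern shown below; the config and checkpoint paths are placeholders, not files guaranteed by this repo.

```bash
# Illustrative detectron2-style evaluation; replace both paths with real ones.
python train_net.py \
  --config-file path/to/config.yaml \
  --eval-only MODEL.WEIGHTS path/to/checkpoint.pth
```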

## Acknowledgement

This repo is built on top of Mask2Former. We thank the authors for their great work.

## Citation

If you find this project useful in your research, please consider citing:

@article{liang2022expediting,
  author  = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
  title   = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
  journal = {arXiv preprint arXiv:2210.01035},
  year    = {2022},
}

@article{cheng2021mask2former,
  author  = {Cheng, Bowen and Misra, Ishan and Schwing, Alexander G. and Kirillov, Alexander and Girdhar, Rohit},
  title   = {Masked-attention Mask Transformer for Universal Image Segmentation},
  journal = {arXiv preprint arXiv:2112.01527},
  year    = {2021},
}
