# Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

## Introduction

This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on Mask2Former.

*(Figure: framework overview)*
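
Our method reuses the pretrained weights without any fine-tuning and inserts two non-parametric layers into the backbone: a token clustering layer that shrinks the $H \times W$ token grid to a much smaller $h \times w$ set of cluster tokens before the expensive transformer blocks, and a token reconstruction layer that restores full resolution for the dense-prediction head. The snippet below is a minimal, illustrative PyTorch sketch of this idea only; the function names and the soft-k-means initialization are ours and do not mirror the repository's actual implementation.

```python
import torch
import torch.nn.functional as F


def cluster_tokens(x, hw_in, hw_out, n_iters=5, tau=0.01):
    """Reduce an (H*W) token grid to an (h*w) set of cluster tokens.

    x: (B, H*W, C) tokens; hw_in = (H, W); hw_out = (h, w).
    Returns clusters (B, h*w, C) and soft assignments (B, H*W, h*w).
    """
    B, N, C = x.shape
    H, W = hw_in
    h, w = hw_out
    # Initialize cluster centers by average-pooling the token grid.
    grid = x.transpose(1, 2).reshape(B, C, H, W)
    centers = F.adaptive_avg_pool2d(grid, (h, w)).flatten(2).transpose(1, 2)
    for _ in range(n_iters):
        # Soft k-means step: assign every token to every center by distance.
        logits = -torch.cdist(x, centers) / tau          # (B, N, h*w)
        assign = logits.softmax(dim=-1)
        # Update centers as the assignment-weighted means of the tokens.
        centers = (assign.transpose(1, 2) @ x) / (assign.sum(1).unsqueeze(-1) + 1e-6)
    return centers, assign


def reconstruct_tokens(clusters, assign):
    """Map processed cluster tokens back to full resolution for the head."""
    return assign @ clusters                             # (B, H*W, C)


if __name__ == "__main__":
    x = torch.randn(2, 12 * 12, 96)                      # toy 12x12 token grid
    clusters, assign = cluster_tokens(x, (12, 12), (8, 8))
    # ... run the remaining (expensive) transformer layers on `clusters` ...
    x_full = reconstruct_tokens(clusters, assign)
    print(clusters.shape, x_full.shape)                  # (2, 64, 96) (2, 144, 96)
```

Because both layers are parameter-free, the surrounding transformer blocks run unchanged on fewer tokens, which is where the GFLOPs and FPS gains in the tables below come from.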

## Results

Here we apply our method to the Swin backbone, so the reported GFLOPs and FPS are measured on the backbone. In the tables, $\alpha$ and $h \times w$ are the hyper-parameters of the inserted token clustering layer: $\alpha$ controls where in the backbone it is inserted, and $h \times w$ is the size of the clustered token grid (baseline rows keep the full $12 \times 12$ resolution).

### COCO

#### Panoptic Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | PQ | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.3 | 57.8 | config |
| Mask2Former + Ours | Swin-L | 10 | $8 \times 8$ | 663 | 5.9 | 56.8 | config |

#### Instance Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | mask AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.3 | 50.1 | config |
| Mask2Former + Ours | Swin-L | 12 | $8 \times 8$ | 705 | 5.4 | 49.1 | config |

### ADE20K

#### Panoptic Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | PQ | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.4 | 48.0 | config |
| Mask2Former + Ours | Swin-L | 14 | $8 \times 8$ | 769 | 5.3 | 47.6 | config |

#### Semantic Segmentation

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | mIoU | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 937 | 4.3 | 55.8 | config |
| Mask2Former + Ours | Swin-L | 8 | $8 \times 8$ | 620 | 6.2 | 55.5 | config |

### Video Instance Segmentation

#### YouTubeVIS 2019

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 8957 | 0.51 | 60.4 | config |
| Mask2Former + Ours | Swin-L | 14 | $8 \times 8$ | 7631 | 0.60 | 60.2 | config |

#### YouTubeVIS 2021

| Method | Backbone | $\alpha$ | $h \times w$ | GFLOPs | FPS | AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | $12 \times 12$ | 7159 | 0.63 | 52.6 | config |
| Mask2Former + Ours | Swin-L | 14 | $8 \times 8$ | 6253 | 0.72 | 51.8 | config |

## Installation

See installation instructions.

## Getting Started

See Preparing Datasets for Mask2Former.

See Getting Started with Mask2Former.
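
Mask2Former builds on detectron2, so evaluation typically follows the standard detectron2 pattern shown below; the config and checkpoint paths are placeholders, not files guaranteed by this repo.

```bash
# Illustrative detectron2-style evaluation; replace both paths with real ones.
python train_net.py \
  --config-file path/to/config.yaml \
  --eval-only MODEL.WEIGHTS path/to/checkpoint.pth
```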

## Acknowledgement

This repo is built on top of Mask2Former. We thank the authors for their great work.

## Citation

If you find this project useful in your research, please consider citing:

@article{liang2022expediting,
  author  = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
  title   = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
  journal = {arXiv preprint arXiv:2210.01035},
  year    = {2022},
}

@article{cheng2021mask2former,
  author  = {Cheng, Bowen and Misra, Ishan and Schwing, Alexander G. and Kirillov, Alexander and Girdhar, Rohit},
  title   = {Masked-attention Mask Transformer for Universal Image Segmentation},
  journal = {arXiv preprint arXiv:2112.01527},
  year    = {2021},
}
