This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on Mask2Former.
Here we implement our method on the Swin backbone, so the GFLOPs and FPS reported below are measured on the backbone only.
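The core idea of the paper is to insert a token clustering layer partway through the backbone, run the remaining (most expensive) blocks on a much smaller set of cluster tokens, and then use a token reconstruction layer to recover full-resolution features for the dense-prediction head. Below is a minimal PyTorch sketch of that idea under simplifying assumptions (square token grid, cosine-similarity soft k-means, hypothetical function names, parameters `grid_h`/`grid_w`/`n_iters`/`tau` chosen for illustration); it is illustrative only, not the implementation used in this repo.

```python
import torch
import torch.nn.functional as F

def cluster_tokens(x, grid_h, grid_w, n_iters=5, tau=0.05):
    """Soft-k-means token clustering (illustrative sketch).

    x: (B, N, C) tokens from an intermediate backbone stage; for brevity the
    sketch assumes N forms a square H x W grid.
    """
    B, N, C = x.shape
    H = W = int(N ** 0.5)
    # Initialize cluster centers by average-pooling the token grid.
    grid = x.transpose(1, 2).reshape(B, C, H, W)
    centers = F.adaptive_avg_pool2d(grid, (grid_h, grid_w)).flatten(2).transpose(1, 2)
    for _ in range(n_iters):
        # Softly assign every token to every center by cosine similarity,
        sim = F.normalize(x, dim=-1) @ F.normalize(centers, dim=-1).transpose(1, 2)
        assign = (sim / tau).softmax(dim=-1)   # (B, N, M), M = grid_h * grid_w
        # then update each center as the assignment-weighted token average.
        centers = (assign.transpose(1, 2) @ x) / (assign.sum(1).unsqueeze(-1) + 1e-6)
    return centers                             # (B, M, C)

def reconstruct_tokens(x_before, centers_before, centers_after, tau=0.05):
    """Token reconstruction (illustrative sketch): each original token attends
    to the cluster centers and copies back their processed features."""
    sim = F.normalize(x_before, dim=-1) @ F.normalize(centers_before, dim=-1).transpose(1, 2)
    return (sim / tau).softmax(dim=-1) @ centers_after   # (B, N, C)

# Toy usage: 1024 tokens -> 112 cluster tokens -> back to 1024.
x = torch.randn(2, 32 * 32, 192)
centers = cluster_tokens(x, grid_h=14, grid_w=8)
processed = centers  # stand-in for running the remaining transformer blocks
x_full = reconstruct_tokens(x, centers, processed)       # (2, 1024, 192)
```

Because the reconstruction only needs similarities between saved features and centers, the heavy middle blocks see `grid_h * grid_w` tokens instead of `H * W`, which is where the GFLOPs savings in the tables below come from.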
**Panoptic segmentation on COCO**

| Method | Backbone | h |  | GFLOPs | FPS | PQ | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | 12 | 937 | 4.3 | 57.8 | config |
| Mask2Former + Ours | Swin-L | 10 | 8 | 663 | 5.9 | 56.8 | config |
**Instance segmentation on COCO**

| Method | Backbone | h |  | GFLOPs | FPS | mask AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | 12 | 937 | 4.3 | 50.1 | config |
| Mask2Former + Ours | Swin-L | 12 | 8 | 705 | 5.4 | 49.1 | config |
**Panoptic segmentation on ADE20K**

| Method | Backbone | h |  | GFLOPs | FPS | PQ | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | 12 | 937 | 4.4 | 48.0 | config |
| Mask2Former + Ours | Swin-L | 14 | 8 | 769 | 5.3 | 47.6 | config |
**Semantic segmentation on ADE20K**

| Method | Backbone | h |  | GFLOPs | FPS | mIoU | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | 12 | 937 | 4.3 | 55.8 | config |
| Mask2Former + Ours | Swin-L | 8 | 8 | 620 | 6.2 | 55.5 | config |
**Video instance segmentation on YouTubeVIS-2019**

| Method | Backbone | h |  | GFLOPs | FPS | AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | 12 | 8957 | 0.51 | 60.4 | config |
| Mask2Former + Ours | Swin-L | 14 | 8 | 7631 | 0.60 | 60.2 | config |
**Video instance segmentation on YouTubeVIS-2021**

| Method | Backbone | h |  | GFLOPs | FPS | AP | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-L | - | 12 | 7159 | 0.63 | 52.6 | config |
| Mask2Former + Ours | Swin-L | 14 | 8 | 6253 | 0.72 | 51.8 | config |
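The backbone-only GFLOPs and FPS above come from the repo's own measurements; a rough way to produce comparable numbers for any PyTorch backbone is sketched below, using fvcore (which detectron2 already depends on) for FLOP counting. The input resolution, warm-up count, and run count here are placeholder assumptions, not the paper's exact protocol.

```python
import time
import torch
from fvcore.nn import FlopCountAnalysis

@torch.no_grad()
def benchmark_backbone(backbone, input_shape=(1, 3, 1024, 1024),
                       n_warmup=10, n_runs=50):
    """Measure GFLOPs and FPS of a backbone on random input (rough sketch)."""
    backbone = backbone.cuda().eval()
    x = torch.randn(*input_shape, device="cuda")
    # Static FLOP count for a single forward pass.
    gflops = FlopCountAnalysis(backbone, x).total() / 1e9
    for _ in range(n_warmup):          # warm up CUDA kernels before timing
        backbone(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        backbone(x)
    torch.cuda.synchronize()           # wait for all queued kernels to finish
    fps = n_runs / (time.perf_counter() - start)
    return gflops, fps
```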
- Installation: see installation instructions.
- Data preparation: see Preparing Datasets for Mask2Former.
- Training and evaluation: see Getting Started with Mask2Former.
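For a quick sanity check after installation, single-image inference can be run through detectron2's `DefaultPredictor`. The snippet below is a minimal sketch, not the repo's official entry point: the config and checkpoint paths are placeholders, and `add_maskformer2_config` is the config helper exported by the `mask2former` package.

```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config
from mask2former import add_maskformer2_config

cfg = get_cfg()
add_deeplab_config(cfg)      # Mask2Former configs build on the DeepLab defaults
add_maskformer2_config(cfg)  # register Mask2Former-specific config keys
cfg.merge_from_file("path/to/config.yaml")    # placeholder: a config from the tables above
cfg.MODEL.WEIGHTS = "path/to/checkpoint.pkl"  # placeholder: a trained checkpoint

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("demo.jpg"))
# Panoptic models return a (segmentation, segments_info) pair.
panoptic_seg, segments_info = outputs["panoptic_seg"]
```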
This repo is built on top of Mask2Former. We thank the authors for their great work.
If you find this project useful in your research, please consider citing:
@article{liang2022expediting,
  author  = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
  title   = {Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning},
  journal = {arXiv preprint arXiv:2210.01035},
  year    = {2022},
}
@article{cheng2021mask2former,
  title   = {Masked-attention Mask Transformer for Universal Image Segmentation},
  author  = {Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  journal = {arXiv preprint arXiv:2112.01527},
  year    = {2021},
}