Mask2Former Model Zoo and Baselines

Introduction

This file documents a collection of models reported in our paper. All numbers were obtained on Big Basin servers with 8 NVIDIA V100 GPUs and NVLink, except for the Swin-L models, which were trained with 16 NVIDIA V100 GPUs.

How to Read the Tables

  • The "Name" column contains a link to the config file. Running train_net.py --num-gpus 8 with this config file will reproduce the model. The exception is the Swin-L models, which are trained with 16 NVIDIA V100 GPUs using distributed training on two nodes.
  • The "model id" column is provided for ease of reference. So that you can check the integrity of a downloaded file, every model on this page contains its md5 prefix in its file name.
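
The integrity check mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not an official tool. It assumes the md5 prefix is the hexadecimal token after the last underscore of the file name (for example, a hypothetical `model_final_<prefix>.pkl`). The helper names `md5_prefix_from_name`, `file_md5`, and `matches_md5_prefix` are made up for illustration.

```python
import hashlib
import re
from pathlib import Path


def md5_prefix_from_name(path):
    """Return the hex token after the last underscore of the file stem, or None.

    Assumption: the md5 prefix is stored as `<anything>_<hex>.<ext>`.
    """
    match = re.search(r"_([0-9a-f]+)$", Path(path).stem)
    return match.group(1) if match else None


def file_md5(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_prefix(path):
    """Return True if the file's md5 digest starts with the prefix in its name."""
    prefix = md5_prefix_from_name(path)
    if prefix is None:
        raise ValueError(f"no md5 prefix found in file name: {path}")
    return file_md5(path).startswith(prefix)
```

Calling `matches_md5_prefix` on a downloaded checkpoint returns `False` when the file is truncated or corrupted. Because only a short prefix is compared, a `True` result is a sanity check rather than a cryptographic guarantee.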

Detectron2 ImageNet Pretrained Models

It's common to initialize from backbone models pre-trained on ImageNet classification tasks. The following backbone models are available:

Note: the pretrained models listed below are available in Detectron2 but are not used in our paper.

Third-party ImageNet Pretrained Models

Our paper also uses ImageNet pretrained models that are not part of Detectron2. Please refer to tools for how to obtain those pretrained models.

License

All models available for download through this document are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

COCO Model Zoo

Panoptic Segmentation

| Name | Backbone | epochs | PQ | AP | mIoU | model id | download |
|---|---|---|---|---|---|---|---|
| Mask2Former | R50 | 50 | 51.9 | 41.7 | 61.7 | 47430278_4 | model |
| Mask2Former | R101 | 50 | 52.6 | 42.6 | 62.4 | 47992113_1 | model |
| Mask2Former | Swin-T | 50 | 53.2 | 43.3 | 63.2 | 48558700_1 | model |
| Mask2Former | Swin-S | 50 | 54.6 | 44.7 | 64.2 | 48558700_3 | model |
| Mask2Former | Swin-B | 50 | 55.1 | 45.2 | 65.1 | 48558700_5 | model |
| Mask2Former | Swin-B (IN21k) | 50 | 56.4 | 46.3 | 67.1 | 48558700_7 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 100 | 57.8 | 48.6 | 67.4 | 47429163_0 | model |

Instance Segmentation

| Name | Backbone | epochs | AP | Boundary AP | model id | download |
|---|---|---|---|---|---|---|
| Mask2Former | R50 | 50 | 43.7 | 30.6 | 47430277_2 | model |
| Mask2Former | R101 | 50 | 44.2 | 31.1 | 47992113_0 | model |
| Mask2Former | Swin-T | 50 | 45.0 | 31.8 | 48558700_0 | model |
| Mask2Former | Swin-S | 50 | 46.3 | 32.9 | 48558700_2 | model |
| Mask2Former | Swin-B | 50 | 46.7 | 33.2 | 48558700_4 | model |
| Mask2Former | Swin-B (IN21k) | 50 | 48.1 | 34.4 | 48558700_6 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 100 | 50.1 | 36.2 | 48235555 | model |

Cityscapes Model Zoo

Panoptic Segmentation

| Name | Backbone | iterations | PQ | AP | mIoU | model id | download |
|---|---|---|---|---|---|---|---|
| Mask2Former | R50 | 90k | 62.1 | 37.3 | 77.5 | 48267400_0 | model |
| Mask2Former | R101 | 90k | 62.4 | 37.7 | 78.6 | 48267400_11 | model |
| Mask2Former | Swin-T | 90k | 63.9 | 39.1 | 80.5 | 48333144_2 | model |
| Mask2Former | Swin-S | 90k | 64.8 | 40.7 | 81.8 | 48381916 | model |
| Mask2Former | Swin-B (IN21k) | 90k | 66.1 | 42.8 | 82.7 | 48333157_2 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 90k | 66.6 | 43.6 | 82.9 | 48318254_2 | model |

Instance Segmentation

| Name | Backbone | iterations | AP | AP50 | model id | download |
|---|---|---|---|---|---|---|
| Mask2Former | R50 | 90k | 37.4 | 61.9 | 48267400_8 | model |
| Mask2Former | R101 | 90k | 38.5 | 63.9 | 48267400_16 | model |
| Mask2Former | Swin-T | 90k | 39.7 | 66.9 | 48333144_4 | model |
| Mask2Former | Swin-S | 90k | 41.8 | 70.4 | 48333149_4 | model |
| Mask2Former | Swin-B (IN21k) | 90k | 42.0 | 68.8 | 48333157_4 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 90k | 43.7 | 71.4 | 49111004_2 | model |

Semantic Segmentation

| Name | Backbone | iterations | mIoU | mIoU (ms+flip) | model id | download |
|---|---|---|---|---|---|---|
| Mask2Former | R50 | 90k | 79.4 | 82.2 | 48267400_4 | model |
| Mask2Former | R101 | 90k | 80.1 | 81.9 | 48267400_13 | model |
| Mask2Former | Swin-T | 90k | 82.1 | 83.0 | 48333144_3 | model |
| Mask2Former | Swin-S | 90k | 82.6 | 83.6 | 48333149_3 | model |
| Mask2Former | Swin-B (IN21k) | 90k | 83.3 | 84.5 | 48333157_3 | model |
| Mask2Former | Swin-L (IN21k) | 90k | 83.3 | 84.3 | 48318254_5 | model |

ADE20K Model Zoo

Panoptic Segmentation

| Name | Backbone | iterations | PQ | AP | mIoU | model id | download |
|---|---|---|---|---|---|---|---|
| Mask2Former | R50 | 160k | 39.7 | 26.5 | 46.1 | 48243028_0 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 160k | 48.1 | 34.2 | 54.5 | 48267279 | model |

Instance Segmentation

| Name | Backbone | iterations | AP | model id | download |
|---|---|---|---|---|---|
| Mask2Former | R50 | 160k | 26.4 | 47429167_7 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 160k | 34.9 | 49040271_0 | model |

Semantic Segmentation

| Name | Backbone | iterations | mIoU | mIoU (ms+flip) | model id | download |
|---|---|---|---|---|---|---|
| Mask2Former | R50 | 160k | 47.2 | 49.2 | 47429167_5 | model |
| Mask2Former | R101 | 160k | 47.8 | 50.1 | 48243040_0 | model |
| Mask2Former | Swin-T | 160k | 47.7 | 49.6 | 48333144_5 | model |
| Mask2Former | Swin-S | 160k | 51.3 | 52.4 | 48333149_5 | model |
| Mask2Former | Swin-B | 160k | 52.4 | 53.7 | 48333153_5 | model |
| Mask2Former | Swin-B (IN21k) | 160k | 53.9 | 55.1 | 48333157_5 | model |
| Mask2Former | Swin-L (IN21k) | 160k | 56.1 | 57.3 | 48004474_0 | model |

Mapillary Vistas Model Zoo

Panoptic Segmentation

| Name | Backbone | iterations | PQ | mIoU | model id | download |
|---|---|---|---|---|---|---|
| Mask2Former | R50 | 300k | 36.3 | 50.7 | 49392417_0 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 300k | 45.5 | 60.8 | 48267065_4 | model |

Semantic Segmentation

| Name | Backbone | iterations | mIoU | mIoU (ms+flip) | model id | download |
|---|---|---|---|---|---|---|
| Mask2Former | R50 | 300k | 57.4 | 59.0 | 49189528_1 | model |
| Mask2Former | Swin-L (IN21k) | 300k | 63.2 | 64.7 | 49189528_0 | model |

Video Instance Segmentation

YouTubeVIS 2019

| Name | Backbone | iterations | AP | model id | download |
|---|---|---|---|---|---|
| Mask2Former | R50 | 6k | 46.4 | 51130652_3 | model |
| Mask2Former | R101 | 6k | 49.2 | 50897581_1 | model |
| Mask2Former | Swin-T | 6k | 51.5 | 50897611_3 | model |
| Mask2Former | Swin-S | 6k | 54.3 | 50897661_2 | model |
| Mask2Former | Swin-B (IN21k) | 6k | 59.5 | 50897733_2 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 6k | 60.4 | 50908813_0 | model |

YouTubeVIS 2021

| Name | Backbone | iterations | AP | model id | download |
|---|---|---|---|---|---|
| Mask2Former | R50 | 8k | 40.6 | 51130652_7 | model |
| Mask2Former | R101 | 8k | 42.4 | 50897581_8 | model |
| Mask2Former | Swin-T | 8k | 45.9 | 50897611_7 | model |
| Mask2Former | Swin-S | 8k | 48.6 | 50897661_7 | model |
| Mask2Former | Swin-B (IN21k) | 8k | 52.0 | 50897733_9 | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 8k | 52.6 | 50908813_6 | model |