Semantic Segmentation

Easy-to-use, customizable SOTA semantic segmentation models in PyTorch, with support for many datasets.

Features

- Applicable to the following tasks:
  - Scene Parsing
  - Human Parsing
  - Face Parsing
- 20+ Datasets
- 10+ SOTA Backbones
- 10+ SOTA Semantic Segmentation Models
- PyTorch, ONNX, TFLite and OpenVINO Inference

Model Zoo

Supported Backbones:

ResNet (CVPR 2016)
ResNetD (ArXiv 2018)
MobileNetV2 (CVPR 2018)
MobileNetV3 (ICCV 2019)
PVTv2 (ArXiv 2021)
ResT (ArXiv 2021)
MicroNet (ICCV 2021)

Supported Heads/Methods:

FCN (CVPR 2015)
UPerNet (ECCV 2018)
BiSeNetv1 (ECCV 2018)
FPN (CVPR 2019)
SFNet (ECCV 2020)
SegFormer (ArXiv 2021)
FaPN (ICCV 2021)
CondNet (IEEE SPL 2021)

Supported Standalone Models:

FCHarDNet (ICCV 2019)
BiSeNetv2 (IJCV 2021)
DDRNet (ArXiv 2021)

Supported Modules:

PPM (CVPR 2017)
PSA (ArXiv 2021)

ADE20K-val (Scene Parsing)

Method	Backbone	mIoU (%)	Params (M)	GFLOPs (512x512)	Weights
SegFormer	MiT-B1	43.1	14	16	pt
SegFormer	MiT-B2	47.5	28	62	pt
SegFormer	MiT-B3	50.0	47	79	pt

CityScapes-val (Scene Parsing)

Method	Backbone	mIoU (%)	Params (M)	GFLOPs	Img Size	Weights
SegFormer	MiT-B0	78.1	4	126	1024x1024	N/A
SegFormer	MiT-B1	80.0	14	244	1024x1024	N/A
FaPN	ResNet-50	80.0	33	-	512x1024	N/A
SFNet	ResNetD-18	79.0	13	-	1024x1024	N/A
FCHarDNet	HarDNet-70	77.7	4	35	1024x1024	pt
DDRNet	DDRNet-23slim	77.8	6	36	1024x2048	pt

HELEN-val (Face Parsing)

Method	Backbone	mIoU (%)	Params (M)	GFLOPs (512x512)	FPS (GTX1660ti)	Weights
BiSeNetv1	MobileNetV2-1.0	58.22	5	5	160	pt
BiSeNetv1	ResNet-18	58.50	14	13	263	pt
BiSeNetv2	-	58.58	18	15	195	pt
FCHarDNet	HarDNet-70	59.38	4	4	130	pt
DDRNet	DDRNet-23slim	61.11	6	5	180	pt|tflite(fp32)|tflite(fp16)|tflite(int8)
SegFormer	MiT-B0	59.31	4	8	75	pt
SFNet	ResNetD-18	61.00	14	31	56	pt

Backbones

Model	Variants	ImageNet-1k Top-1 Acc (%)	Params (M)	GFLOPs	Weights
MicroNet	M1|M2|M3	51.4|59.4|62.5	1|2|3	6M|12M|21M	download
MobileNetV2	1.0	71.9	3	300M	download
MobileNetV3	S|L	67.7|74.0	3|5	56M|219M	S|L

ResNet	18|50|101	69.8|76.1|77.4	12|25|44	2|4|8	download
ResNetD	18|50|101	-	12|25|44	2|4|8	download
MiT	B1|B2|B3	-	14|25|45	2|4|8	download
PVTv2	B1|B2|B4	78.7|82.0|83.6	14|25|63	2|4|10	download
ResT	S|B|L	79.6|81.6|83.6	14|30|52	2|4|8	download

Note: Backbone weights for HarDNet-70 and DDRNet-23slim are provided as separate downloads.

Supported Datasets

Dataset	Type	Categories	Train Images	Val Images	Test Images	Image Size (HxW)
COCO-Stuff	General Scene Parsing	171	118,000	5,000	20,000	-
ADE20K	General Scene Parsing	150	20,210	2,000	3,352	-
PASCALContext	General Scene Parsing	59	4,996	5,104	9,637	-

SUN RGB-D	Indoor Scene Parsing	37	2,666	2,619	5,050 (+labels)	-

Mapillary Vistas	Street Scene Parsing	65	18,000	2,000	5,000	1080x1920
CityScapes	Street Scene Parsing	19	2,975	500	1,525 (+labels)	1024x2048
CamVid	Street Scene Parsing	11	367	101	233 (+labels)	720x960

MHPv2	Multi-Human Parsing	59	15,403	5,000	5,000	-
MHPv1	Multi-Human Parsing	19	3,000	1,000	980 (+labels)	-
LIP	Multi-Human Parsing	20	30,462	10,000	-	-
CCIHP	Multi-Human Parsing	22	28,280	5,000	5,000	-
CIHP	Multi-Human Parsing	20	28,280	5,000	5,000	-
ATR	Single-Human Parsing	18	16,000	700	1,000 (+labels)	-

HELEN	Face Parsing	11	2,000	230	100 (+labels)	-
LaPa	Face Parsing	11	18,176	2,000	2,000 (+labels)	-
iBugMask	Face Parsing	11	21,866	-	1,000 (+labels)	-
CelebAMaskHQ	Face Parsing	19	24,183	2,993	2,824 (+labels)	512x512
FaceSynthetics	Face Parsing (Synthetic)	19	100,000	1,000	100 (+labels)	512x512

SUIM	Underwater Imagery	8	1,525	-	110 (+labels)	-

Check DATASETS.md to find more segmentation datasets.

Dataset Structure

Datasets should have the following structure:

data
|__ ADEChallenge
    |__ ADEChallengeData2016
        |__ images
            |__ training
            |__ validation
        |__ annotations
            |__ training
            |__ validation

|__ CityScapes
    |__ leftImg8bit
        |__ train
        |__ val
        |__ test
    |__ gtFine
        |__ train
        |__ val
        |__ test

|__ CamVid
    |__ train
    |__ val
    |__ test
    |__ train_labels
    |__ val_labels
    |__ test_labels
    
|__ VOCdevkit
    |__ VOC2010
        |__ JPEGImages
        |__ SegmentationClassContext
        |__ ImageSets
            |__ SegmentationContext
                |__ train.txt
                |__ val.txt
    
|__ COCO
    |__ images
        |__ train2017
        |__ val2017
    |__ labels
        |__ train2017
        |__ val2017

|__ MHPv1
    |__ images
    |__ annotations
    |__ train_list.txt
    |__ test_list.txt

|__ MHPv2
    |__ train
        |__ images
        |__ parsing_annos
    |__ val
        |__ images
        |__ parsing_annos

|__ LIP
    |__ LIP
        |__ TrainVal_images
            |__ train_images
            |__ val_images
        |__ TrainVal_parsing_annotations
            |__ train_segmentations
            |__ val_segmentations

    |__ CIHP/CCIHP
        |__ instance-level_human_parsing
            |__ Training
                |__ Images
                |__ Category_ids
            |__ Validation
                |__ Images
                |__ Category_ids

    |__ ATR
        |__ humanparsing
            |__ JPEGImages
            |__ SegmentationClassAug

|__ SUIM
    |__ train_val
        |__ images
        |__ masks
    |__ TEST
        |__ images
        |__ masks

|__ SunRGBD
    |__ SUNRGBD
        |__ kv1/kv2/realsense/xtion
    |__ SUNRGBDtoolbox
        |__ traintestSUNRGBD
            |__ allsplit.mat

|__ Mapillary
    |__ training
        |__ images
        |__ labels
    |__ validation
        |__ images
        |__ labels

|__ SmithCVPR2013_dataset_resized (HELEN)
    |__ images
    |__ labels
    |__ exemplars.txt
    |__ testing.txt
    |__ tuning.txt

|__ CelebAMask-HQ
    |__ CelebA-HQ-img
    |__ CelebAMask-HQ-mask-anno
    |__ CelebA-HQ-to-CelebA-mapping.txt

|__ LaPa
    |__ train
        |__ images
        |__ labels
    |__ val
        |__ images
        |__ labels
    |__ test
        |__ images
        |__ labels

|__ ibugmask_release
    |__ train
    |__ test

|__ FaceSynthetics
    |__ dataset_100000
    |__ dataset_1000
    |__ dataset_100

Note: For PASCALContext, download the annotations separately and put them in VOC2010.

Note: For CelebAMask-HQ, run the preprocessing script:

$ python3 scripts/preprocess_celebamaskhq.py --root <DATASET-ROOT-DIR>
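
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch for the ADE20K entry only; `missing_dirs` and `ADE20K_DIRS` are illustrative names and are not part of this repository.

```python
from pathlib import Path

# Sub-directories expected under data/ADEChallenge, copied from the tree above.
ADE20K_DIRS = [
    "ADEChallengeData2016/images/training",
    "ADEChallengeData2016/images/validation",
    "ADEChallengeData2016/annotations/training",
    "ADEChallengeData2016/annotations/validation",
]


def missing_dirs(root, expected):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("data/ADEChallenge", ADE20K_DIRS)
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("ADE20K layout looks complete.")
```

The same helper can be reused for the other datasets by passing a different list of relative paths taken from the tree.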

Augmentations

Check out the aug_test.ipynb notebook to test the augmentation effects.

Pixel-level Transforms:

ColorJitter (Brightness, Contrast, Saturation, Hue)
Gamma, Sharpness, AutoContrast, Equalize, Posterize
GaussianBlur, Grayscale

Spatial-level Transforms:

Affine, RandomRotation
HorizontalFlip, VerticalFlip
CenterCrop, RandomCrop
Pad, ResizePad, Resize
RandomResizedCrop
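
In segmentation, a spatial-level transform has to be applied identically to the image and its label mask, whereas a pixel-level transform changes only the image. The sketch below illustrates the spatial case with a horizontal flip on plain nested lists; `joint_hflip` is a hypothetical helper written for this illustration and is not the repository's implementation.

```python
import random


def joint_hflip(image, mask, p=0.5, rng=random):
    """Flip image and mask together with probability p.

    image and mask are lists of rows (H x W). Both receive the same
    decision, so every pixel keeps its label after the transform.
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask


image = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 0, 1],
        [2, 2, 1]]

# p=1.0 forces the flip so the effect is visible.
flipped_image, flipped_mask = joint_hflip(image, mask, p=1.0)
print(flipped_image)  # [[30, 20, 10], [60, 50, 40]]
print(flipped_mask)   # [[1, 0, 0], [1, 2, 2]]
```

Drawing the random decision once and reusing it for both inputs is the important part; flipping the image and mask with independent random draws would misalign the labels.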

Usage

Requirements

python >= 3.6
torch >= 1.8.1
torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.
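
The minimum versions above can be checked programmatically. This is a minimal sketch using only the standard library; `version_tuple` and `meets_minimum` are hypothetical helpers, and the parsing is deliberately simple (it keeps the leading digits of each dotted component and ignores local suffixes such as `+cu111`).

```python
import re
import sys


def version_tuple(version):
    """Turn '1.8.1+cu111' into (1, 8, 1), ignoring local or pre-release suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


print(sys.version_info >= (3, 6))             # Python requirement
print(meets_minimum("1.8.1+cu111", "1.8.1"))  # True
print(meets_minimum("0.9.0", "0.9.1"))        # False
```

With PyTorch installed, the same check could be run as `meets_minimum(torch.__version__, "1.8.1")`.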

Configuration

Create a configuration file in configs. A sample configuration for the ADE20K dataset is available at configs/ade20k.yaml. Edit the fields as needed. This configuration file is required by the training, evaluation, and inference scripts.

Training

To train with a single GPU:

$ python tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To train with multiple GPUs, set the DDP field in the config file to true and run as follows:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml
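
With `--use_env`, the launcher passes each worker's position through environment variables such as `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` instead of a `--local_rank` command-line argument. A minimal sketch of reading them follows; `read_ddp_env` is a hypothetical helper, and how tools/train.py actually consumes these values is not shown here.

```python
import os


def read_ddp_env(env=os.environ):
    """Read the process layout that the launcher exports as environment variables.

    Falls back to a single-process layout when the variables are absent,
    for example when the script is started with plain `python`.
    """
    return {
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


# Simulated environment of the second of two processes on one node.
layout = read_ddp_env({"LOCAL_RANK": "1", "RANK": "1", "WORLD_SIZE": "2"})
print(layout)  # {'local_rank': 1, 'rank': 1, 'world_size': 2}
```

The local rank is typically used to pick the GPU for the current process, while the world size tells each process how many workers share the data.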

Evaluation

Make sure to set MODEL_PATH in the configuration file to your trained model directory.

$ python tools/val.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To evaluate with multi-scale and flip, set the ENABLE field under MSF to true and run the same command as above.
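
The tables in the model zoo report mIoU. As a reference for what that metric measures, here is a minimal pure-Python sketch that builds a confusion matrix and averages the per-class intersection over union. `mean_iou` is an illustrative function, not the repository's metric code, and the `ignore_label=255` default is an assumption based on a common convention.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Compute per-class IoU and their mean from flat label sequences."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:  # assumed marker for unlabeled pixels
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        intersection = confusion[c][c]
        true_count = sum(confusion[c])
        pred_count = sum(row[c] for row in confusion)
        union = true_count + pred_count - intersection
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection / union)
    miou = sum(ious) / len(ious) if ious else 0.0
    return ious, miou


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
ious, miou = mean_iou(pred, target, num_classes=3)
print(round(miou * 100, 2))  # 72.22
```

For a whole validation set, the confusion matrix is normally accumulated over all images before the IoU values are computed, rather than averaging per-image scores.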

Inference

To run inference, edit the following parameters in the config file:

- Change MODEL >> NAME and VARIANT to your desired pretrained model.
- Change DATASET >> NAME to the dataset name matching the pretrained model.
- Set TEST >> MODEL_PATH to the pretrained weights of the model to test.
- Change TEST >> FILE to the image file or image folder path you want to test.
- Testing results will be saved in SAVE_DIR.

## example using ade20k pretrained models
$ python tools/infer.py --cfg configs/ade20k.yaml
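
The fields named in the list above can be sanity-checked before running the script. The sketch below represents the configuration as a nested Python dict and reports fields that are missing or empty. `missing_fields` is a hypothetical helper, all values are placeholders, and placing SAVE_DIR at the top level is an assumption; the exact spelling of model and variant names expected by the repository may differ.

```python
# Fields named in the list above, written as paths into the nested config.
REQUIRED_FIELDS = [
    ("MODEL", "NAME"),
    ("MODEL", "VARIANT"),
    ("DATASET", "NAME"),
    ("TEST", "MODEL_PATH"),
    ("TEST", "FILE"),
    ("SAVE_DIR",),  # assumed to be a top-level field
]


def missing_fields(cfg, required=REQUIRED_FIELDS):
    """Return the dotted names of required fields that are absent or empty."""
    missing = []
    for path in required:
        node = cfg
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if node in (None, ""):
            missing.append(".".join(path))
    return missing


# Placeholder values for illustration only.
cfg = {
    "MODEL": {"NAME": "SegFormer", "VARIANT": "MiT-B1"},
    "DATASET": {"NAME": "ADE20K"},
    "TEST": {"MODEL_PATH": "", "FILE": "path/to/images"},
    "SAVE_DIR": "output",
}
print(missing_fields(cfg))  # ['TEST.MODEL_PATH']
```

If PyYAML is installed, the dict could be obtained from the YAML file with `yaml.safe_load` instead of being written by hand.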

Convert to other Frameworks (ONNX, CoreML, OpenVINO, TFLite)

To convert to ONNX and CoreML, run:

$ python tools/export.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To convert to OpenVINO and TFLite, see torch_optimize.

Inference (ONNX, OpenVINO, TFLite)

## ONNX Inference
$ python scripts/onnx_infer.py --model <ONNX_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## OpenVINO Inference
$ python scripts/openvino_infer.py --model <OpenVINO_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## TFLite Inference
$ python scripts/tflite_infer.py --model <TFLite_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

Citations

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

@misc{xiao2018unified,
  title={Unified Perceptual Parsing for Scene Understanding}, 
  author={Tete Xiao and Yingcheng Liu and Bolei Zhou and Yuning Jiang and Jian Sun},
  year={2018},
  eprint={1807.10221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{hong2021deep,
  title={Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes},
  author={Hong, Yuanduo and Pan, Huihui and Sun, Weichao and Jia, Yisong},
  journal={arXiv preprint arXiv:2101.06085},
  year={2021}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction}, 
  author={Shihua Huang and Zhichao Lu and Ran Cheng and Cheng He},
  year={2021},
  eprint={2108.07058},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={arXiv preprint arXiv:2107.00782},
  year={2021}
}

@misc{chao2019hardnet,
  title={HarDNet: A Low Memory Traffic Network}, 
  author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
  year={2019},
  eprint={1909.00948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03308}
}

@ARTICLE{Yucondnet21,
  author={Yu, Changqian and Shao, Yuanjie and Gao, Changxin and Sang, Nong},
  journal={IEEE Signal Processing Letters}, 
  title={CondNet: Conditional Classifier for Scene Segmentation}, 
  year={2021},
  volume={28},
  number={},
  pages={758-762},
  doi={10.1109/LSP.2021.3070472}
}
