
SelfieSeg

Selfie/Portrait Segmentation Models

Dependencies

  • TensorFlow (>=1.14.0), Python 3
  • Keras (>=2.2.4), Kito, SciPy, Dlib
  • OpenCV (>=3.4), PIL, Matplotlib
  • PyTorch (>=1.9.0)
  • MediaPipe

Dataset Links

  1. Portseg_128
  2. Portrait_256
  3. PFCN
  4. AISegment
  5. Baidu_Aug
  6. Supervisely
  7. Pascal_Person
  8. Supervisely Portrait

Also check out the dataset: UCF Selfie

Mobile-Unet Architecture

SelfieSegMNV2 and SelfieSegMNV3 use an upsampling block built from transpose convolutions with a stride of 2 in the decoder.

Additionally, both models use dropout regularization to prevent overfitting, which also helps the network learn more robust features during training.
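A minimal Keras sketch of such an upsampling block (an illustration of the idea, not the repository's exact code; the filter count and dropout rate below are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

def upsample_block(x, filters, dropout_rate=0.3):
    """Double the spatial resolution with a stride-2 transpose convolution."""
    x = layers.Conv2DTranspose(filters, kernel_size=3, strides=2,
                               padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # Dropout regularization, as described above (rate is an assumption).
    x = layers.Dropout(dropout_rate)(x)
    return x
```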

Selfie Segmentation code

1. SelfieSegMNV2.py

Here the inputs and outputs are images of size 128x128. The backbone is MobileNetV2 with a depth multiplier of 0.5 as the encoder (feature extractor).

2. SelfieSegMNV3.py

Here the inputs and outputs are images of size 224x224. The backbone is MobileNetV3 with a depth multiplier of 0.5 as the encoder (feature extractor).
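As a rough illustration of how such encoders can be instantiated (assuming tf.keras.applications is used for the backbone; the repository may build it differently):

```python
import tensorflow as tf

# MobileNetV2 encoder for SelfieSegMNV2: 128x128 input, depth multiplier 0.5.
encoder = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3),
    alpha=0.5,           # depth multiplier
    include_top=False,   # keep only the feature extractor
    weights="imagenet",
)

# For SelfieSegMNV3, the analogous call (available in TF >= 2.4) would be
# tf.keras.applications.MobileNetV3Large with input_shape=(224, 224, 3).
```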

3. SelfieSegPN.py (PortraitNet)

The decoder module consists of refined residual blocks with depthwise convolutions and upsampling blocks with transpose convolutions, and it merges skip connections by elementwise addition instead of feature concatenation (see the sketch after the notes below). The encoder is MobileNetV2, and unlike the other models it takes a four-channel input in order to leverage temporal consistency; as a result, the output video segmentation appears more stable than that of the other models. It was also observed that the depthwise convolutions and elementwise additions in the decoder greatly improve the model's speed.

  • Dataset: Portrait-mix (PFCN+Baidu+Supervisely)
  • Size: 224x224
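A minimal PyTorch sketch of the decoder ideas above: a refined residual block built from a depthwise-separable convolution, merged with the encoder skip connection by elementwise addition (an illustration of the idea, not PortraitNet's exact block):

```python
import torch
import torch.nn as nn

class RefinedResidualBlock(nn.Module):
    """Residual refinement with a depthwise-separable convolution."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            # Depthwise convolution (one filter per channel).
            nn.Conv2d(channels, channels, 3, padding=1,
                      groups=channels, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # Pointwise convolution to mix channels.
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, skip):
        # Elementwise addition of decoder and encoder features
        # (instead of concatenation).
        x = x + skip
        return self.relu(x + self.body(x))
```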

4. SelfieSegSN.py (SINet: Extreme Lightweight Portrait Segmentation)

SINet is a lightweight portrait segmentation DNN architecture for mobile devices. The model contains around 86.9K parameters and runs at 100 FPS on an iPhone (input size 224), while keeping accuracy within a 1% margin of the state-of-the-art portrait segmentation method. The architecture contains two new modules for fast and accurate segmentation: an information blocking decoder and spatial squeeze modules.

  1. Information Blocking Decoder: It measures the confidence in a low-resolution feature map and blocks the influence of high-resolution feature maps at highly confident pixels. This prevents noisy information from corrupting already-certain areas and lets the model focus on regions with high uncertainty (see the sketch after this list).

  2. Spatial Squeeze Modules: The S2 module is an efficient multi-path network for feature extraction. Existing multi-path structures handle long-range dependencies of various sizes by managing multiple receptive fields, but this increases latency in real implementations because such structures are ill-suited to kernel launching and synchronization. To mitigate this, the S2 module squeezes the spatial resolution of each feature map by average pooling, which the authors show is more effective than adopting multiple receptive fields.
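A minimal PyTorch sketch of the information-blocking idea (names and the gating scheme are illustrative, not SINet's exact code):

```python
import torch
import torch.nn.functional as F

def information_blocking(low_res_logits, high_res_feat):
    """Gate high-res features by the uncertainty of the low-res prediction."""
    # Per-pixel confidence of the low-resolution prediction (max class prob).
    confidence = F.softmax(low_res_logits, dim=1).max(dim=1, keepdim=True).values
    gate = 1.0 - confidence  # large where the model is uncertain
    # Upsample the gate to the high-resolution feature map's size...
    gate = F.interpolate(gate, size=high_res_feat.shape[2:],
                         mode="bilinear", align_corners=False)
    # ...and block high-resolution information at confident pixels.
    return high_res_feat * gate
```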

Besides the aforementioned features, the SINet architecture uses depthwise separable convolutions and PReLU activations in the encoder modules. It also uses Squeeze-and-Excitation (SE) blocks, which adaptively recalibrate channel-wise feature responses by explicitly modelling interdependencies between channels, to improve accuracy. For training, the authors used cross-entropy loss with additional boundary refinement. In general SINet is faster and smaller than most portrait segmentation models, though in accuracy it falls behind the PortraitNet model by a small margin. The model appears to be faster than MobileNetV3 on iOS, but on Android it is likely to make only a marginal difference (due to the optimized TFLite swish operator).
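The SE block mentioned above is a standard component; a common PyTorch formulation (the reduction ratio here is the usual default, not necessarily SINet's):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: recalibrate channels with learned gates."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Squeeze: global average pool to one descriptor per channel.
        w = x.mean(dim=(2, 3))
        # Excite: per-channel gates in (0, 1) rescale the feature maps.
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w
```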

We trained the SINet model on the AISegment + Baidu portrait dataset with an input size of 320 and a cross-entropy loss function for 600 epochs, achieving an mIoU of 97.5%. The combined dataset consists of around 80K images (train + val) after data augmentation. The final trained model has a size of 480kB and 86.91K parameters.

5. SelfieSegDLV3.py

DeepLabV3 model with a ResNet-101 backbone (https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101/)

6. SelfieSegFCN.py

Fully Convolutional Network (FCN) model with a ResNet-101 backbone (https://pytorch.org/hub/pytorch_vision_fcn_resnet101/)
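Both PyTorch Hub models can be loaded as shown on the linked pages; for example:

```python
import torch

# Pinned to the tag used in the official hub examples.
deeplab = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101',
                         pretrained=True)
fcn = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet101',
                     pretrained=True)
deeplab.eval()
fcn.eval()

# Both return a dict; the segmentation logits live under 'out'.
x = torch.rand(1, 3, 224, 224)  # in practice, an ImageNet-normalized RGB batch
with torch.no_grad():
    out = deeplab(x)['out']     # shape: (1, 21, 224, 224)
```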

7. SelfieSegMP.py

MediaPipe Selfie Segmentation segments prominent humans in the scene. It can run in real time on both smartphones and laptops. The intended use cases include selfie effects and video conferencing, where the person is close (< 2 m) to the camera. (https://google.github.io/mediapipe/solutions/selfie_segmentation.html)
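A minimal usage sketch with the MediaPipe Python API (the input file name is a placeholder):

```python
import cv2
import mediapipe as mp

mp_selfie = mp.solutions.selfie_segmentation

with mp_selfie.SelfieSegmentation(model_selection=0) as seg:
    frame = cv2.imread("selfie.jpg")  # placeholder path; BGR image
    results = seg.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    mask = results.segmentation_mask > 0.5  # boolean person mask
```

Here model_selection=0 selects the general model and 1 selects the landscape model optimized for video.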

License

This project is licensed under the terms of the MIT license.

Versioning

Version 1.0

Acknowledgments
