
SelfieSeg

Selfie/Portrait Segmentation Models

Dependencies

  • TensorFlow (>=1.14.0), Python 3
  • Keras (>=2.2.4), Kito, SciPy, Dlib
  • OpenCV (>=3.4), PIL, Matplotlib
  • PyTorch (>=1.9.0)
  • MediaPipe

Dataset Links

  1. Portseg_128
  2. Portrait_256
  3. PFCN
  4. AISegment
  5. Baidu_Aug
  6. Supervisely
  7. Pascal_Person
  8. Supervisely Portrait

Also check out the dataset: UCF Selfie

Mobile-Unet Architecture

SelfieSegMNV2 and SelfieSegMNV3 use an upsampling block built from transpose convolutions with a stride of 2 in the decoder.

Additionally, both models use dropout regularization to prevent overfitting, which also helps the network learn more robust features during training.
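A minimal Keras sketch of such an upsampling block (an illustration of the idea, not the repository's exact code; the filter count and dropout rate below are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

def upsample_block(x, filters, dropout_rate=0.3):
    """Double the spatial resolution with a stride-2 transpose convolution."""
    x = layers.Conv2DTranspose(filters, kernel_size=3, strides=2,
                               padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # Dropout regularization, as described above (rate is an assumption).
    x = layers.Dropout(dropout_rate)(x)
    return x
```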

Selfie Segmentation code

1. SelfieSegMNV2.py

Here the inputs and outputs are images of size 128x128. The backbone is MobileNetV2 with a depth multiplier of 0.5 as the encoder (feature extractor).

2. SelfieSegMNV3.py

Here the inputs and outputs are images of size 224x224. The backbone is MobileNetV3 with a depth multiplier of 0.5 as the encoder (feature extractor).
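As a rough illustration of how such encoders can be instantiated (assuming tf.keras.applications is used for the backbone; the repository may build it differently):

```python
import tensorflow as tf

# MobileNetV2 encoder for SelfieSegMNV2: 128x128 input, depth multiplier 0.5.
encoder = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3),
    alpha=0.5,           # depth multiplier
    include_top=False,   # keep only the feature extractor
    weights="imagenet",
)

# For SelfieSegMNV3, the analogous call (available in TF >= 2.4) would be
# tf.keras.applications.MobileNetV3Large with input_shape=(224, 224, 3).
```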

3. SelfieSegPN.py (PortraitNet)

The decoder module consists of refined residual blocks with depthwise convolutions and upsampling blocks with transpose convolutions, and it merges skip connections by elementwise addition instead of feature concatenation (see the sketch after the notes below). The encoder is MobileNetV2, and unlike the other models it takes a four-channel input in order to leverage temporal consistency; as a result, the output video segmentation appears more stable than that of the other models. It was also observed that the depthwise convolutions and elementwise additions in the decoder greatly improve the model's speed.

  • Dataset: Portrait-mix (PFCN+Baidu+Supervisely)
  • Size: 224x224
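A minimal PyTorch sketch of the decoder ideas above: a refined residual block built from a depthwise-separable convolution, merged with the encoder skip connection by elementwise addition (an illustration of the idea, not PortraitNet's exact block):

```python
import torch
import torch.nn as nn

class RefinedResidualBlock(nn.Module):
    """Residual refinement with a depthwise-separable convolution."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            # Depthwise convolution (one filter per channel).
            nn.Conv2d(channels, channels, 3, padding=1,
                      groups=channels, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # Pointwise convolution to mix channels.
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, skip):
        # Elementwise addition of decoder and encoder features
        # (instead of concatenation).
        x = x + skip
        return self.relu(x + self.body(x))
```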

4. SelfieSegSN.py (SINet: Extreme Lightweight Portrait Segmentation)

SINet is a lightweight portrait segmentation DNN architecture for mobile devices. The model contains around 86.9K parameters and runs at 100 FPS on an iPhone (input size 224), while keeping accuracy within a 1% margin of the state-of-the-art portrait segmentation method. The architecture contains two new modules for fast and accurate segmentation: an information blocking decoder and spatial squeeze modules.

  1. Information Blocking Decoder: It measures the confidence in a low-resolution feature map and blocks the influence of high-resolution feature maps at highly confident pixels. This prevents noisy information from corrupting already-certain areas and lets the model focus on regions with high uncertainty (see the sketch after this list).

  2. Spatial Squeeze Modules: The S2 module is an efficient multi-path network for feature extraction. Existing multi-path structures handle long-range dependencies of various sizes by managing multiple receptive fields, but this increases latency in real implementations because such structures are ill-suited to kernel launching and synchronization. To mitigate this, the S2 module squeezes the spatial resolution of each feature map by average pooling, which the authors show is more effective than adopting multiple receptive fields.
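A minimal PyTorch sketch of the information-blocking idea (names and the gating scheme are illustrative, not SINet's exact code):

```python
import torch
import torch.nn.functional as F

def information_blocking(low_res_logits, high_res_feat):
    """Gate high-res features by the uncertainty of the low-res prediction."""
    # Per-pixel confidence of the low-resolution prediction (max class prob).
    confidence = F.softmax(low_res_logits, dim=1).max(dim=1, keepdim=True).values
    gate = 1.0 - confidence  # large where the model is uncertain
    # Upsample the gate to the high-resolution feature map's size...
    gate = F.interpolate(gate, size=high_res_feat.shape[2:],
                         mode="bilinear", align_corners=False)
    # ...and block high-resolution information at confident pixels.
    return high_res_feat * gate
```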

Besides the aforementioned features, the SINet architecture uses depthwise separable convolutions and PReLU activations in the encoder modules. It also uses Squeeze-and-Excitation (SE) blocks, which adaptively recalibrate channel-wise feature responses by explicitly modelling interdependencies between channels, to improve accuracy. For training, the authors used cross-entropy loss with additional boundary refinement. In general SINet is faster and smaller than most portrait segmentation models, though in accuracy it falls behind the PortraitNet model by a small margin. The model appears to be faster than MobileNetV3 on iOS, but on Android it is likely to make only a marginal difference (due to the optimized TFLite swish operator).
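The SE block mentioned above is a standard component; a common PyTorch formulation (the reduction ratio here is the usual default, not necessarily SINet's):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: recalibrate channels with learned gates."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Squeeze: global average pool to one descriptor per channel.
        w = x.mean(dim=(2, 3))
        # Excite: per-channel gates in (0, 1) rescale the feature maps.
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w
```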

We trained the SINet model on the AISegment + Baidu portrait dataset with an input size of 320 and a cross-entropy loss function for 600 epochs, achieving an mIoU of 97.5%. The combined dataset consists of around 80K images (train + val) after data augmentation. The final trained model has a size of 480kB and 86.91K parameters.

5. SelfieSegDLV3.py

DeepLabV3 model with a ResNet-101 backbone (https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101/)

6. SelfieSegFCN.py

Fully Convolutional Network (FCN) model with a ResNet-101 backbone (https://pytorch.org/hub/pytorch_vision_fcn_resnet101/)
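Both PyTorch Hub models can be loaded as shown on the linked pages; for example:

```python
import torch

# Pinned to the tag used in the official hub examples.
deeplab = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101',
                         pretrained=True)
fcn = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet101',
                     pretrained=True)
deeplab.eval()
fcn.eval()

# Both return a dict; the segmentation logits live under 'out'.
x = torch.rand(1, 3, 224, 224)  # in practice, an ImageNet-normalized RGB batch
with torch.no_grad():
    out = deeplab(x)['out']     # shape: (1, 21, 224, 224)
```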

7. SelfieSegMP.py

MediaPipe Selfie Segmentation segments prominent humans in the scene. It can run in real time on both smartphones and laptops. The intended use cases include selfie effects and video conferencing, where the person is close (< 2 m) to the camera. (https://google.github.io/mediapipe/solutions/selfie_segmentation.html)
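A minimal usage sketch with the MediaPipe Python API (the input file name is a placeholder):

```python
import cv2
import mediapipe as mp

mp_selfie = mp.solutions.selfie_segmentation

with mp_selfie.SelfieSegmentation(model_selection=0) as seg:
    frame = cv2.imread("selfie.jpg")  # placeholder path; BGR image
    results = seg.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    mask = results.segmentation_mask > 0.5  # boolean person mask
```

Here model_selection=0 selects the general model and 1 selects the landscape model optimized for video.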

License

This project is licensed under the terms of the MIT license.

Versioning

Version 1.0

Acknowledgments
