Selfie/Portrait Segmentation Models
- TensorFlow (>=1.14.0), Python 3
- Keras (>=2.2.4), Kito, SciPy, Dlib
- OpenCV (>=3.4), PIL, Matplotlib
- PyTorch (>=1.9.0)
- MediaPipe
Also check out the dataset: UCF Selfie
SelfieSegMNV2 and SelfieSegMNV3 use an upsampling block built from transposed convolutions with a stride of 2 in the decoder.
Additionally, they use dropout regularization to prevent overfitting, which also helps the network learn more robust features during training.
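The stride-2 transposed convolution that drives the upsampling can be sketched in plain NumPy for a single channel (the actual models use framework layers; this is only meant to show why a stride of 2 doubles the spatial size):

```python
import numpy as np

def transpose_conv2d_s2(x, k):
    """Stride-2 transposed convolution for a single channel, no padding.
    Each input pixel 'stamps' the kernel onto the output at twice its
    coordinates, so a 2x2 kernel exactly doubles the spatial size."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * 2 + kh, (w - 1) * 2 + kw))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + kh, 2 * j:2 * j + kw] += x[i, j] * k
    return out

# A 4x4 feature map upsampled with a 2x2 kernel becomes 8x8.
up = transpose_conv2d_s2(np.ones((4, 4)), np.ones((2, 2)))
```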
1. SelfieSegMNV2.py
Here the inputs and outputs are images of size 128x128. The encoder (feature extractor) is a MobileNetV2 backbone with depth multiplier 0.5.
2. SelfieSegMNV3.py
Here the inputs and outputs are images of size 224x224. The encoder (feature extractor) is a MobileNetV3 backbone with depth multiplier 0.5.
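A depth multiplier of 0.5 halves the channel width of every layer. The MobileNet reference implementations additionally round the scaled width to a multiple of 8; a sketch of that standard rounding rule (the usual `_make_divisible` helper, not code from this repo) is:

```python
def scale_channels(base_channels, alpha=0.5, divisor=8):
    """Scale a layer's channel count by the depth multiplier `alpha`,
    rounding to the nearest multiple of `divisor` but never dropping
    below 90% of the scaled value (MobileNet-style rounding)."""
    v = base_channels * alpha
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

# With alpha=0.5, a 32-channel stem shrinks to 16 channels.
stem = scale_channels(32)
```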
3. SelfieSegPN.py (PortraitNet)
The decoder module consists of refined residual blocks with depthwise convolutions and upsampling blocks with transposed convolutions. It also uses elementwise addition instead of feature concatenation in the decoder. The encoder is MobileNetV2, and unlike the other models it takes a four-channel input to leverage temporal consistency. As a result, the output video segmentation appears more stable than that of the other models. It was also observed that the depthwise convolutions and elementwise addition in the decoder greatly improve the model's speed.
- Dataset: Portrait-mix (PFCN+Baidu+Supervisely)
- Size: 224x224
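The speed benefit of elementwise addition over concatenation comes from the channel count staying fixed, so the convolution that follows the fusion does roughly half the work. A minimal NumPy shape sketch (illustrative tensor names, not repo code):

```python
import numpy as np

# Encoder skip feature and upsampled decoder feature, both (C, H, W).
enc_feat = np.zeros((32, 28, 28))
dec_feat = np.zeros((32, 28, 28))

fused_add = enc_feat + dec_feat                            # stays (32, 28, 28)
fused_cat = np.concatenate([enc_feat, dec_feat], axis=0)   # grows to (64, 28, 28)

# A conv placed after the fusion sees half as many input channels with
# addition, roughly halving that layer's multiply-adds.
```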
4. SelfieSegSN.py (SINet: Extreme Lightweight Portrait Segmentation)
SINet is a lightweight portrait segmentation DNN architecture for mobile devices. The model, which contains around 86.9K parameters, runs at 100 FPS on an iPhone (input size 224) while keeping accuracy within a 1% margin of the state-of-the-art portrait segmentation method. The architecture contains two new modules for fast and accurate segmentation: an information blocking decoder and spatial squeeze modules.
- Information Blocking Decoder: It measures the confidence in a low-resolution feature map and blocks the influence of high-resolution feature maps at highly confident pixels. This prevents noisy information from ruining already certain areas and lets the model focus on regions with high uncertainty.
- Spatial Squeeze Modules: The S2 module is an efficient multipath network for feature extraction. Existing multipath structures handle long-range dependencies of various sizes by managing multiple receptive fields; however, this increases latency in real implementations, because such structures are ill-suited to kernel launching and synchronization. To mitigate this, they squeeze the spatial resolution of each feature map with average pooling and show that this is more effective than adopting multiple receptive fields.
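The information-blocking gate can be sketched in NumPy for the two-class (foreground/background) case; in the real decoder this is applied to resized feature maps inside the network:

```python
import numpy as np

def information_blocking(low_pred, high_feat):
    """Gate high-resolution features by low-resolution confidence.
    low_pred:  (2, H, W) softmax output of the low-resolution branch.
    high_feat: (C, H, W) high-resolution features, already resized to (H, W).
    Confidence is the max class probability; scaling by (1 - confidence)
    blocks high-res information where the low-res prediction is certain."""
    conf = low_pred.max(axis=0)           # (H, W) confidence map
    return high_feat * (1.0 - conf)

# A fully confident pixel passes nothing; an ambiguous one passes half.
low = np.array([[[1.0, 0.5]], [[0.0, 0.5]]])   # (2, 1, 2) softmax map
gated = information_blocking(low, np.ones((3, 1, 2)))
```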
Besides the aforementioned features, the SINet architecture uses depthwise separable convolutions and PReLU activations in the encoder modules. It also uses Squeeze-and-Excitation (SE) blocks, which adaptively recalibrate channel-wise feature responses by explicitly modelling interdependencies between channels, to improve accuracy. For training, the authors used a cross-entropy loss with additional boundary refinement. In general SINet is faster and smaller than most portrait segmentation models, but in accuracy it falls behind the PortraitNet model by a small margin. The model appears to be faster than MobileNetV3 on iOS; on Android the difference is likely only marginal (due to the optimized TFLite swish operator).
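Channel recalibration in an SE block is just a squeeze (global average pool), a small two-layer bottleneck, and a per-channel sigmoid gate. A NumPy sketch with illustrative weight shapes (not the repo's implementation):

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """SE recalibration for x of shape (C, H, W).
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights,
    where r is the reduction ratio. Returns x scaled per channel."""
    z = x.mean(axis=(1, 2))                   # squeeze: (C,)
    s = np.maximum(w1 @ z, 0.0)               # excite: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # FC + sigmoid gate, (C,)
    return x * s[:, None, None]               # channel-wise rescale

# With all-zero weights every gate is sigmoid(0) = 0.5, halving each channel.
out = squeeze_excite(np.ones((8, 4, 4)), np.zeros((2, 8)), np.zeros((8, 2)))
```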
We trained the SINet model on the AISegment + Baidu portrait dataset with input size 320 and a cross-entropy loss for 600 epochs, achieving an mIOU of 97.5%. The combined dataset consists of around 80K images (train + val) after data augmentation. The final trained model has a size of 480 kB and 86.91K parameters.
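For reference, the mIOU metric reported above is the mean over classes of intersection-over-union between the predicted and ground-truth masks; a minimal NumPy version for integer label maps:

```python
import numpy as np

def mean_iou(pred, gt, n_classes=2):
    """Mean intersection-over-union over classes present in pred or gt.
    pred, gt: integer label maps of the same shape."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Identical masks score a perfect 1.0.
score = mean_iou(np.array([[0, 1], [1, 1]]), np.array([[0, 1], [1, 1]]))
```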
5. SelfieSegDLV3.py
A DeepLabV3 model with a ResNet-101 backbone (https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101/)
6. SelfieSegFCN.py
A Fully Convolutional Network (FCN) model with a ResNet-101 backbone (https://pytorch.org/hub/pytorch_vision_fcn_resnet101/)
7. SelfieSegMP.py
MediaPipe Selfie Segmentation segments the prominent humans in the scene. It can run in real-time on both smartphones and laptops. The intended use cases include selfie effects and video conferencing, where the person is close (< 2m) to the camera. (https://google.github.io/mediapipe/solutions/selfie_segmentation.html)
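A typical use of the returned segmentation mask is compositing the person over a new background. A minimal NumPy sketch of that step (MediaPipe returns the mask as a float confidence map; the threshold value here is an illustrative choice):

```python
import numpy as np

def replace_background(frame, background, mask, thresh=0.5):
    """Keep `frame` pixels where the person mask exceeds `thresh`,
    otherwise show `background`. frame/background: (H, W, 3) images,
    mask: (H, W) float confidence map in [0, 1]."""
    condition = (mask > thresh)[..., None]     # broadcast over RGB channels
    return np.where(condition, frame, background)

# The confident pixel keeps the frame; the other shows the background.
frame = np.full((1, 2, 3), 255)
bg = np.zeros((1, 2, 3))
out = replace_background(frame, bg, np.array([[0.9, 0.1]]))
```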
This project is licensed under the terms of the MIT license.
Version 1.0
- https://github.com/anilsathyan7/Portrait-Segmentation
- https://www.tensorflow.org/model_optimization
- https://www.tensorflow.org/lite/performance/gpu_advanced
- https://github.com/cainxx/image-segmenter-ios
- https://github.com/gallifilo/final-year-project
- https://github.com/dong-x16/PortraitNet
- https://github.com/ZHKKKe/MODNet
- https://github.com/clovaai/ext_portrait_segmentation
- https://github.com/tantara/JejuNet
- https://github.com/lizhengwei1992/mobile_phone_human_matting
- https://github.com/dailystudio/ml/tree/master/deeplab
- https://github.com/PINTO0309/TensorflowLite-UNet
- https://github.com/xiaochus/MobileNetV3
- https://github.com/yulu/GLtext
- https://github.com/berak/opencv_smallfry/blob/master/java_dnn
- https://github.com/HasnainRaz/SemSegPipeline
- https://github.com/onnx/tensorflow-onnx
- https://github.com/onnx/keras-onnx
- https://machinethink.net/blog/mobilenet-v2/
- On-Device Neural Net Inference with Mobile GPUs
- AI Benchmark: All About Deep Learning on Smartphones in 2019
- Searching for MobileNetV3
- Google AI Blog: MobilenetV3
- Youtube Stories: Mobile Real-time Video Segmentation
- Facebook SparkAR: Background Segmentation
- Learning to Predict Depth on the Pixel 3 Phones
- uDepth: Real-time 3D Depth Sensing on the Pixel 4
- iOS Video Depth Maps Tutorial
- Huawei: Portrait Segmentation
- Deeplab Image Segmentation
- Tensorflow - Image segmentation
- Official Tflite Segmentation Demo
- Tensorflowjs - Tutorials
- Hyperconnect - Tips for fast portrait segmentation
- Prismal Labs: Real-time Portrait Segmentation on Smartphones
- Keras Documentation
- Boundary-Aware Network for Fast and High-Accuracy Portrait Segmentation
- Fast Deep Matting for Portrait Animation on Mobile Phone
- Adjust Local Brightness for Image Augmentation
- Pyimagesearch - Super fast color transfer between images
- OpenCV with Python Blueprints
- Pysource - Background Subtraction
- Learn OpenCV - Seamless Cloning using OpenCV
- Deep Image Harmonization
- Tfjs Examples - Webcam Transfer Learning
- Opencv Samples: DNN-Classification
- Deep Learning In OpenCV
- BodyPix - Person Segmentation in the Browser
- High-Resolution Network for Photorealistic Style Transfer
- Tflite Benchmark Tool
- TensorFlow Lite Android Support Library
- TensorFlow Lite Hexagon delegate
- Tensorflow lite gpu delegate inference using opengl and SSBO in android
- Udacity: Intel Edge AI Fundamentals Course
- Udacity: Introduction to TensorFlow Lite
- Android: Hair Segmentation with GPU
- Image Effects for Android using OpenCV: Image Blending
- Converting Bitmap to ByteBuffer (float) in Tensorflow-lite Android
- Real-time Hair Segmentation and Recoloring on Mobile GPUs
- PortraitNet: Real-time portrait segmentation network for mobile device
- ONNX2Keras Converter
- Google: Coral AI
- Hacking Google Coral Edge TPU
- Peter Warden's Blog: How to Quantize Neural Networks with TensorFlow
- Tensorflow: Post Training Quantization
- Qualcomm Hexagon 685 DSP is a Boon for Machine Learning
- How Qualcomm Brought Tremendous Improvements in AI Performance to the Snapdragon 865
- TF-TRT 2.0 Workflow With A SavedModel
- NVIDIA-AI-IOT: Deepstream_Python_Applications
- Awesome Tflite: Models, Samples, Tutorials, Tools & Learning Resources.
- Google: Machine Learning Bootcamp for Mobile Developers
- Machinethink: New mobile neural network architectures
- Deeplab Tflite Tfhub
- MediaPipe with Custom tflite Model
- Google Mediapipe Github