
# 5. Convolutional Neural Networks part 2

Limitations of traditional CNNs:

Work mostly on centered images
Handle only a single object per image
Not enough for many real-world vision tasks:
- Localisation
- Object Detection
- Semantic segmentation
- Instance segmentation

Localisation

Single object per image
Predict coordinates of a bounding box (x,y,w,h)
Evaluate via IoU
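
IoU (Intersection over Union) is the area of overlap between the predicted and ground-truth boxes divided by the area of their union. A minimal sketch, assuming boxes are given as (x, y, w, h) with (x, y) the top-left corner (the notes do not say which corner convention is used); the function name `iou` is illustrative:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner; w and h are non-negative.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice.
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0
```

An IoU of 1 means a perfect match and 0 means no overlap; a prediction is typically counted as correct when its IoU with the ground truth exceeds a chosen threshold.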

Object detection

We don't know the number of objects in the image.

Object proposal: find regions of interest (RoIs) in the image
Object classification: classify the objects in these regions

Two main families:

A grid in the image where each cell is a proposal (SSD, YOLO)
Region proposal (SPP, MultiBox, Faster R-CNN)
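
For the grid family, the proposals are fixed in advance: the image is split into cells and each cell acts as one proposal. A rough sketch of that idea, assuming a square S x S grid and (x, y, w, h) boxes with a top-left origin; `grid_cells` is a hypothetical helper, not taken from SSD or YOLO code:

```python
def grid_cells(image_w, image_h, s):
    """Split an image into an s x s grid and return each cell as (x, y, w, h).

    Assumption: cells are listed row by row, starting at the top-left corner.
    """
    cell_w = image_w / s
    cell_h = image_h / s
    return [
        (col * cell_w, row * cell_h, cell_w, cell_h)
        for row in range(s)
        for col in range(s)
    ]
```

For example, `grid_cells(448, 448, 7)` yields 49 cells of 64 x 64 pixels. Real detectors predict box offsets and class scores per cell rather than using the raw cell as the final box.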

YOLO

SSD: single-shot detector

Box proposals

Instead of having a predefined set of box proposals, find them in the image:

Selective search: from pixels (not learnt)
Faster R-CNN: region proposal network (RPN)

Crop-and-resize operator (RoI-Pooling)
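
The crop-and-resize idea can be sketched as follows: cut the region out of the feature map, divide it into a fixed number of bins, and keep the maximum of each bin so that every region yields an output of the same size. This is a simplified illustration, assuming a single-channel feature map stored as a list of lists and a RoI already expressed in integer feature-map coordinates; `roi_max_pool` is a hypothetical name:

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x, y, w, h) of a 2-D map to out_h x out_w.

    Assumptions: integer coordinates, w >= 1, h >= 1, RoI inside the map.
    """
    x, y, w, h = roi
    pooled = []
    for i in range(out_h):
        # Row range covered by bin i; every bin spans at least one row.
        r0 = y + (i * h) // out_h
        r1 = max(r0 + 1, y + ((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            # Column range covered by bin j; at least one column.
            c0 = x + (j * w) // out_w
            c1 = max(c0 + 1, x + ((j + 1) * w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Whatever the size of the input region, the output is always `out_h` x `out_w`, which is what lets a fixed-size classifier head run on arbitrary proposals.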

Fast R-CNN

Faster R-CNN

State of the art

Comparison between different algorithms:
Mask R-CNN, Light-Head R-CNN for best accuracy
YOLO, SSD, Light-Head R-CNN for fast inference

Segmentation

Mask R-CNN

State of the art

Mask R-CNN and other architectures
Focal loss, Feature Pyramid Networks, etc.
RetinaNet
MegDet
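
As an illustration of the focal loss listed above, here is a minimal sketch of its binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The defaults gamma = 2 and alpha = 0.25 are the values commonly quoted for RetinaNet and are an assumption here, since the notes give no values; the function name is illustrative:

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, with 0 < p < 1.
    y: ground-truth label, 1 (object) or 0 (background).
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 this reduces to alpha-weighted cross-entropy; larger gamma values focus training on hard examples, which matters when most candidate boxes are easy background.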