#5. Convolutional Neural Networks part 2
Limitations of traditional CNNs:
- Work mostly on centered images
- Assume only a single object per image
- Not enough for many real-world vision tasks:
  - Localisation
  - Object detection
  - Semantic segmentation
  - Instance segmentation
Localisation:
- Single object per image
- Predict the coordinates of a bounding box (x, y, w, h)
- Evaluate via Intersection over Union (IoU)
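
IoU for two boxes in the (x, y, w, h) format can be sketched as follows. The notes do not say whether (x, y) is the box center or a corner; this sketch assumes it is the top-left corner, and the function name `iou` is only illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x, y, w, h). ASSUMPTION: (x, y) is the top-left corner.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlap rectangle (0 if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice.
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap on a 1x1 square, so `iou((0, 0, 2, 2), (1, 1, 2, 2))` is 1/7. A prediction is typically counted as correct when its IoU with the ground-truth box exceeds a chosen threshold.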
Object detection: we don't know the number of objects in the image in advance.
- Object proposal: find regions of interest (RoIs) in the image
- Object classification: classify the object in each of these regions
Two main families:
- A grid in the image where each cell is a proposal (SSD, YOLO)
- Region proposal (SPP, MultiBox, Faster R-CNN)
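
The first family can be illustrated with a minimal sketch that turns a regular grid into box proposals, one per cell. This is a deliberate simplification: detectors such as SSD and YOLO additionally predict offsets, sizes, and class scores per cell (and often use several anchor shapes), which is not shown here. The function name `grid_proposals` is hypothetical.

```python
def grid_proposals(img_w, img_h, rows, cols):
    """Return one (x, y, w, h) proposal per grid cell, in row-major order.

    ASSUMPTION: (x, y) is the top-left corner of the cell.
    """
    cell_w = img_w / cols
    cell_h = img_h / rows
    return [
        (c * cell_w, r * cell_h, cell_w, cell_h)
        for r in range(rows)
        for c in range(cols)
    ]
```

The number of proposals is fixed by the grid (`rows * cols`) and does not depend on the image content, which is what distinguishes this family from learned or search-based region proposals.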
Instead of using a predefined set of box proposals, find them in the image:
- Selective search: from pixels (not learnt)
- Faster R-CNN: region proposal network (RPN)
Crop-and-resize operator (RoI-Pooling)
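
A minimal sketch of the crop-and-resize idea behind RoI pooling: crop a region from a feature map, split it into a fixed grid of bins, and take the maximum in each bin, so that regions of any size yield an output of the same shape. It assumes a single-channel feature map stored as a list of lists and integer RoI coordinates `(x0, y0, x1, y1)` with exclusive ends; real implementations work on multi-channel tensors and handle coordinate scaling, which is omitted here. The name `roi_max_pool` is illustrative.

```python
def ceil_div(n, d):
    """Integer ceiling division for positive d."""
    return -(-n // d)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region `roi` of a 2D feature map into an out_h x out_w grid.

    ASSUMPTION: roi = (x0, y0, x1, y1) in feature-map cells, ends exclusive,
    with x1 > x0 and y1 > y0.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of bin i; floor/ceil keeps every bin non-empty.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(
                feature_map[r][c]
                for r in range(r_start, r_end)
                for c in range(c_start, c_end)
            ))
        pooled.append(row)
    return pooled
```

When the region size is not a multiple of the output size, neighbouring bins overlap by one cell in this sketch; other binning conventions exist.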
Comparison between different algorithms:
- Mask R-CNN, Light-Head R-CNN for best accuracy
- YOLO, SSD, Light-Head R-CNN for fast inference
Mask R-CNN and other architectures:
- Focal loss, Feature Pyramid Networks, etc.
- RetinaNet
- MegDet
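
As an illustration of the focal loss mentioned above, here is a minimal sketch of its binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single prediction. The defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values, used here as an assumption; the clipping constant `eps` is an implementation detail added only to avoid `log(0)`.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, in [0, 1].
    y: ground-truth label, 1 (positive) or 0 (negative).
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)

    # (1 - p_t)^gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger `gamma` shrinks the contribution of easy examples, which matters in detection because background regions vastly outnumber objects.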