# Object Detection for Autonomous Driving

In [8]:
import os
import warnings
warnings.filterwarnings('ignore')
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import tensorflow as tf
from keras import backend as K
from keras.models import load_model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners

## 1 Problem

The dataset is provided by [drive.ai](https://www.drive.ai/). The images were gathered from cameras mounted on the front of cars. We want to use the YOLO algorithm to recognize objects in the images. Each recognized object is marked with a bounding box. In this notebook, I did the following:
- F
- max
- 

### Definition of a box
$b_x$ and $b_y$ define the center of the box, and $b_h$ and $b_w$ define its height and width. With 80 categories to recognize, the class of an object can be represented in either of two ways:
- $i)$ a label $c$ as an integer from 1 to 80, so a box takes 6 elements $(p_c, b_x, b_y, b_h, b_w, c)$, where $p_c$ is the confidence that the box contains an object
- $ii)$ a one-hot vector with a 1 in the $c$-th position and 0s elsewhere, so a box takes 85 elements
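
To make the two representations concrete, here is a minimal NumPy sketch. The helper names `encode_box_compact` and `encode_box_one_hot` are hypothetical and exist only for this illustration; the numeric values are made up.

```python
import numpy as np

NUM_CLASSES = 80  # number of categories, as in the text

def encode_box_compact(p_c, b_x, b_y, b_h, b_w, c):
    """Option i): the class is stored as a single integer in 1..NUM_CLASSES."""
    return np.array([p_c, b_x, b_y, b_h, b_w, c], dtype=np.float32)

def encode_box_one_hot(p_c, b_x, b_y, b_h, b_w, c):
    """Option ii): the class is expanded into a one-hot vector of length NUM_CLASSES."""
    one_hot = np.zeros(NUM_CLASSES, dtype=np.float32)
    one_hot[c - 1] = 1.0  # classes are numbered from 1, array indices from 0
    head = np.array([p_c, b_x, b_y, b_h, b_w], dtype=np.float32)
    return np.concatenate([head, one_hot])

compact = encode_box_compact(1.0, 0.5, 0.4, 0.3, 0.2, c=3)
expanded = encode_box_one_hot(1.0, 0.5, 0.4, 0.3, 0.2, c=3)
print(compact.shape, expanded.shape)  # (6,) (85,)
```

The one-hot form is longer, but it lines up with a network that outputs one probability per class.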

<img src="nb_images/box_label.png" style="width:500px;height:250;">

## 2 YOLO

YOLO ("you only look once") requires only one forward propagation pass through the network to make predictions. Thus it "only looks once" at the image.

### 2.1 Model details

- The **input** is a batch of m images, a tensor of shape (m, 608, 608, 3).
- The **output** is a list of bounding boxes with their recognized classes, a tensor of shape (m, 19, 19, 5, 85). Each image is divided into a 19x19 grid of cells, and each cell predicts five boxes. Each bounding box is represented by 6 numbers $(p_c, b_x, b_y, b_h, b_w, c)$ as explained above. If $c$ is expanded into an 80-dimensional vector, each bounding box is represented by 85 numbers.

If the center (midpoint) of an object falls into a grid cell, that grid cell is responsible for detecting the object. Because each cell predicts 5 boxes, a cell can detect at most 5 objects centered inside it.
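
As a rough illustration of how an object is assigned to a cell, the sketch below maps a box center to a grid index. It assumes that $b_x$ and $b_y$ are normalized to the range [0, 1) relative to the image width and height; that normalization and the helper name `responsible_cell` are assumptions made for this example, not something the notebook specifies.

```python
GRID_SIZE = 19  # the image is divided into 19 x 19 cells

def responsible_cell(b_x, b_y, grid_size=GRID_SIZE):
    """Return (row, col) of the grid cell that contains the box center.

    Assumes b_x and b_y are normalized to [0, 1) relative to image width/height.
    """
    col = min(int(b_x * grid_size), grid_size - 1)  # clamp in case b_x == 1.0
    row = min(int(b_y * grid_size), grid_size - 1)  # clamp in case b_y == 1.0
    return row, col

print(responsible_cell(0.5, 0.5))    # (9, 9)
print(responsible_cell(0.03, 0.98))  # (18, 0)
```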

YOLO architecture: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).
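
The encoding can be pictured as a 5-dimensional array that is sliced along its last axis. The sketch below uses random numbers in place of a real network output, and the layout of the last axis (confidence first, then the four box values, then the 80 class probabilities) is an assumption taken from the ordering used in the text; the actual model may arrange these values differently.

```python
import numpy as np

m = 2  # hypothetical batch size
encoding = np.random.rand(m, 19, 19, 5, 85).astype(np.float32)

# Assumed layout of the last axis: (p_c, b_x, b_y, b_h, b_w, c_1, ..., c_80)
box_confidence = encoding[..., 0:1]  # p_c
boxes = encoding[..., 1:5]           # b_x, b_y, b_h, b_w
box_class_probs = encoding[..., 5:]  # c_1 ... c_80

print(box_confidence.shape)   # (2, 19, 19, 5, 1)
print(boxes.shape)            # (2, 19, 19, 5, 4)
print(box_class_probs.shape)  # (2, 19, 19, 5, 80)

boxes_per_image = 19 * 19 * 5
print(boxes_per_image)        # 1805
```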

<img src="nb_images/architecture.png" style="width:700px;height:400;">

### 2.2 Filtering boxes with class scores

Each cell gives 5 boxes, so the model predicts 19x19x5 = 1805 boxes from a single look at the image. Most of these must be discarded, which is done in two steps:
- First, keep only boxes with a high class score (the model is more confident that it has detected an object)
- Second, keep only one box when several overlapping boxes detect the same object
<img src="nb_images/anchor_map.png" style="width:200px;height:200;">

**yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold)** filters boxes by thresholding their class scores:

Step 1:
The score of every class is computed as $p_c$ * ($c_1$, $c_2$, ..., $c_{79}$, $c_{80}$)
- "box_confidence" is $p_c$, a tensor of shape (19, 19, 5, 1)
- "box_class_probs" is ($c_1$, $c_2$, ..., $c_{79}$, $c_{80}$), a tensor of shape (19, 19, 5, 80)
- "boxes" contains the coordinates $(b_x, b_y, b_h, b_w)$ of all the boxes, a tensor of shape (19, 19, 5, 4)
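
The product in Step 1 relies on broadcasting: the trailing dimension of size 1 in "box_confidence" is stretched across the 80 classes. A small NumPy sketch with random stand-in values shows the resulting shape; it is only an analogue of the tensor operation, not the notebook's own code.

```python
import numpy as np

rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))    # stand-in for p_c
box_class_probs = rng.random((19, 19, 5, 80))  # stand-in for (c_1, ..., c_80)

# Broadcasting multiplies each box's p_c with all 80 of its class probabilities.
box_scores = box_confidence * box_class_probs
print(box_scores.shape)  # (19, 19, 5, 80)
```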

Step 2:
For every box, find the index and value of the class with the max score. The index is saved as "box_classes" and the value is saved as "box_class_scores". Then create a filtering mask by comparing "box_class_scores" against "threshold".
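
A toy example may help here. The sketch below uses a single cell with two boxes and only three classes (instead of 19x19 cells, five boxes and 80 classes) so the numbers can be checked by hand; the score values are invented for illustration.

```python
import numpy as np

# One grid cell, two boxes, three classes: shape (1, 1, 2, 3)
box_scores = np.array([[[[0.1, 0.7, 0.2],
                         [0.3, 0.2, 0.4]]]])
threshold = 0.6

box_classes = np.argmax(box_scores, axis=-1)     # index of the best class per box
box_class_scores = np.max(box_scores, axis=-1)   # score of that best class
filtering_mask = box_class_scores >= threshold   # True for boxes to keep

print(box_classes)       # [[[1 2]]]
print(box_class_scores)  # [[[0.7 0.4]]]
print(filtering_mask)    # [[[ True False]]]
```

Only the first box survives: its best class (index 1) scores 0.7, while the second box's best score of 0.4 falls below the threshold.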

Step 3:
Apply the filtering mask to keep only the boxes with scores at or above the threshold.
- "scores" -- tensor of shape (number_selected_boxes,), containing the class probability score for selected boxes
- "boxes" -- tensor of shape (number_selected_boxes, 4), containing $(b_x, b_y, b_h, b_w)$ coordinates of selected boxes
- "classes" -- tensor of shape (number_selected_boxes,), containing the index of the class detected by the selected boxes
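
Applying the mask flattens the grid dimensions into a single "selected boxes" axis. The sketch below uses NumPy boolean indexing as an analogue of `tf.boolean_mask`, with a tiny made-up grid of shape (1, 2, 2) so the result is easy to verify.

```python
import numpy as np

box_class_scores = np.array([[[0.7, 0.4],
                              [0.9, 0.1]]])   # shape (1, 2, 2)
box_classes = np.array([[[1, 2],
                         [0, 5]]])            # shape (1, 2, 2)
boxes = np.arange(16, dtype=np.float32).reshape(1, 2, 2, 4)  # shape (1, 2, 2, 4)

filtering_mask = box_class_scores >= 0.6

# Boolean indexing keeps entries where the mask is True and collapses the
# masked dimensions into one axis.
scores = box_class_scores[filtering_mask]
selected_boxes = boxes[filtering_mask]
classes = box_classes[filtering_mask]

print(scores.shape, selected_boxes.shape, classes.shape)  # (2,) (2, 4) (2,)
print(selected_boxes)
# [[ 0.  1.  2.  3.]
#  [ 8.  9. 10. 11.]]
```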

In [7]:
def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.6):
    # Step 1: Compute box scores.
    box_scores = box_confidence * box_class_probs
    
    # Step 2: find the index and value of class with max score.
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1, keepdims=False)
    # Create a filtering mask by comparing "box_class_scores" against "threshold". The mask has the
    # same shape as box_class_scores and is True for the boxes to keep.
    filtering_mask = box_class_scores >= threshold
    
    # Step 3: Apply the mask to scores, boxes and classes, select box with score higher than threshold
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    
    return scores, boxes, classes