## Object Detection On Command
---

### Overview

This project explores the YOLO algorithm with a ResNet50 backbone using transfer learning, a method that takes a pre-trained neural network and adapts it to a different task and dataset. The goal is to design a model that is compatible with real-time video processing. The PASCAL VOC (Visual Object Classes) 2012 dataset is used to train the model, and the model will be tested on real-time video data collected from webcams.


### The Road Ahead

We break the notebook into separate steps.  Feel free to use the links below to navigate the notebook.

* [Step 1](#step1): Understand PASCAL VOC 2012 Data
* [Step 2](#step2): Model Architecture
* [Step 3](#step3): Training
* [Step 4](#step4): -
* [Step 5](#step5): -
* [Step 6](#step6): -
* [Step 7](#step7): -
* [Step 8](#step8): -
---
<a id='step1'></a>
## Step 1: Understand PASCAL VOC 2012 Data

### Combining annotation files into one CSV file

In the PASCAL VOC 2012 dataset, each .jpg image has a corresponding .xml file that holds its annotations.
Because this one-file-per-image layout is cumbersome to work with, we combine all the .xml files into a single CSV file using pandas.
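
As a rough illustration of that flattening step, the sketch below turns one VOC-style annotation into one row per object and serializes the rows as CSV. The sample XML, the column names, and the helper names `annotation_to_rows` and `rows_to_csv` are assumptions made for this example, not the project's actual code. It uses only the standard library so that it runs without extra dependencies; with pandas, the same rows could be written with `pd.DataFrame(rows).to_csv(...)`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up annotation that follows the PASCAL VOC XML layout (not a real file).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""

# Hypothetical column layout for the combined CSV.
FIELDS = ["filename", "width", "height", "name", "xmin", "ymin", "xmax", "ymax"]


def annotation_to_rows(xml_text):
    """Return one dict per annotated object in a single VOC-style XML string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        row = {
            "filename": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
            "name": obj.findtext("name"),
        }
        # int(float(...)) tolerates coordinates written as decimals.
        for key in ("xmin", "ymin", "xmax", "ymax"):
            row[key] = int(float(box.findtext(key)))
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = annotation_to_rows(SAMPLE_XML)
print(rows_to_csv(rows))
```

To cover the whole dataset, the same function would be applied to every file in the `Annotations` folder and the resulting rows concatenated before writing.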

In [54]:
import matplotlib.pyplot as plt
import numpy as np
import os, cv2
%matplotlib inline
os.getcwd()


'/home/ubuntu/VOCdevkit'

### Define 20 classes for PASCAL VOC 2012 dataset

There are 20 class labels in the PASCAL VOC 2012 dataset.

In [2]:
LABELS = ['aeroplane',  'bicycle', 'bird',  'boat',      'bottle', 
          'bus',        'car',      'cat',  'chair',     'cow',
          'diningtable','dog',    'horse',  'motorbike', 'person',
          'pottedplant','sheep',  'sofa',   'train',   'tvmonitor']

### Define Anchor Boxes - YOLO Algorithm

Anchor boxes are a central concept in the YOLO algorithm. In [this GitHub issue](https://github.com/pjreddie/darknet/issues/568), the user “vkmenon” defines anchor boxes as follows: “… most bounding boxes have certain height-width ratios. So instead of directly predicting a bounding box, YOLOv2 (and v3) predict off-sets from a predetermined set of boxes with particular height-width ratios - those predetermined set of boxes are the anchor boxes.”
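
To make the idea of "offsets from an anchor" concrete, here is a minimal sketch of the box parameterization described in the YOLOv2 paper: the predicted center is a sigmoid offset inside the responsible grid cell, and the predicted size rescales the anchor exponentially. The function name `decode_box` and the example values are illustrative assumptions; the loss code in `backend` may organize this differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, cell_col, cell_row, anchor_w, anchor_h):
    """Turn raw network outputs into a box, all in grid-cell units.

    (cell_col, cell_row) is the top-left corner of the responsible grid cell;
    (anchor_w, anchor_h) is the size of the anchor box being adjusted.
    """
    center_x = cell_col + sigmoid(t_x)   # offset stays inside the cell
    center_y = cell_row + sigmoid(t_y)
    width = anchor_w * math.exp(t_w)     # anchor scaled by a positive factor
    height = anchor_h * math.exp(t_h)
    return center_x, center_y, width, height


# Zero offsets give a box centered in the cell with exactly the anchor's size.
# The anchor size (1.0, 2.0) is an illustrative value.
print(decode_box(0.0, 0.0, 0.0, 0.0, cell_col=6, cell_row=6,
                 anchor_w=1.0, anchor_h=2.0))
```

Because the network only has to learn small corrections to a reasonable starting shape, well-chosen anchors make the regression problem easier.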

These predetermined boxes can be computed with K-means clustering. The number and sizes of the anchor boxes are chosen to maximize the IoU (Intersection over Union) between the anchors and the ground-truth bounding boxes. For more details, please refer to [this page](https://fairyonice.github.io/Part_1_Object_Detection_with_Yolo_for_VOC_2014_data_anchor_box_clustering.html) by Yumi, whose tutorial forms the basis of this project.
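
The clustering step can be sketched roughly as follows. This is a simplified, pure-Python version that assumes boxes are given as (width, height) pairs, measures similarity by IoU with both boxes aligned at the same corner, and initializes the centroids from the first `k` boxes; the helper names and the toy data are made up for illustration, and the referenced tutorial's implementation may differ in its details.

```python
def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at the same corner."""
    intersection = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - intersection
    return intersection / union


def kmeans_anchors(boxes, k, n_iter=100):
    """Cluster (width, height) pairs, assigning each box to the centroid
    with the highest IoU. Returns a list of k (width, height) centroids."""
    centroids = [boxes[i] for i in range(k)]  # simplistic initialization
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            clusters[best].append(box)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids


def mean_iou(boxes, centroids):
    """Average IoU between each box and its best-matching centroid."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)


# Toy data: three tall, narrow boxes and three large, roughly square boxes.
toy_boxes = [(1.0, 2.0), (5.0, 5.0), (1.2, 1.8), (0.8, 2.2), (5.5, 4.5), (4.5, 5.5)]
toy_anchors = kmeans_anchors(toy_boxes, k=2)
print(toy_anchors, mean_iou(toy_boxes, toy_anchors))
```

Running the clustering for several values of `k` and comparing the mean IoU is one way to pick the number of anchors: more anchors raise the mean IoU but also enlarge the output tensor.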

The anchor boxes below were selected by the author of [the blog](https://fairyonice.github.io/Part_1_Object_Detection_with_Yolo_for_VOC_2014_data_anchor_box_clustering.html).

In [3]:
ANCHORS = np.array([1.07709888,  1.78171903,  # anchor box 1, width , height
                    2.71054693,  5.12469308,  # anchor box 2, width,  height
                   10.47181473, 10.09646365,  # anchor box 3, width,  height
                    5.48531347,  8.11011331]) # anchor box 4, width,  height

### Read in images and annotations



In [4]:
train_image_folder = "../VOCdevkit/VOC2012/JPEGImages/"
train_annot_folder = "../VOCdevkit/VOC2012/Annotations/"

np.random.seed(1)
from backend import parse_annotation
train_image, seen_train_labels = parse_annotation(train_annot_folder,
                                                  train_image_folder, 
                                                  labels=LABELS)

print("N train = {}".format(len(train_image)))

Using TensorFlow backend.


N train = 17125


### Generate Batches

In [27]:
from backend import SimpleBatchGenerator

BATCH_SIZE        = 200
IMAGE_H, IMAGE_W  = 416, 416
GRID_H,  GRID_W   = 13 , 13
TRUE_BOX_BUFFER   = 50
BOX               = int(len(ANCHORS)/2)
CLASS             = len(LABELS)


generator_config = {
    'IMAGE_H'         : IMAGE_H, 
    'IMAGE_W'         : IMAGE_W,
    'GRID_H'          : GRID_H,  
    'GRID_W'          : GRID_W,
    'LABELS'          : LABELS,
    'ANCHORS'         : ANCHORS,
    'BATCH_SIZE'      : BATCH_SIZE,
    'TRUE_BOX_BUFFER' : TRUE_BOX_BUFFER,
}


def normalize(image):
    # Scale pixel values from [0, 255] to [0, 1]
    return image / 255.


train_batch_generator = SimpleBatchGenerator(train_image, generator_config,
                                             norm=normalize, shuffle=True)



---
<a id='step2'></a>
## Step 2: Model Architecture


### Base Model: ResNet50

In [6]:
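from keras.applications import ResNet50
from keras.layers import Input

input_image = Input(shape=(IMAGE_H, IMAGE_W, 3), name="input_image")
# Extra input that carries the ground-truth boxes to the loss function
true_boxes = Input(shape=(1, 1, 1, TRUE_BOX_BUFFER, 4), name="input_hack")

# ResNet50 pre-trained on ImageNet, without its classification head
base_model = ResNet50(include_top=False, weights='imagenet',
                      input_shape=(IMAGE_H, IMAGE_W, 3))
base_model.trainable = False  # freeze the pre-trained weights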

base_model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 416, 416, 3)   0                                            
____________________________________________________________________________________________________
conv1 (Conv2D)                   (None, 208, 208, 64)  9472        input_1[0][0]                    
____________________________________________________________________________________________________
bn_conv1 (BatchNormalization)    (None, 208, 208, 64)  256         conv1[0][0]                      
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 208, 208, 64)  0           bn_conv1[0][0]                   
___________________________________________________________________________________________

### Classifier Model

In [52]:
from keras.models import Sequential, Model
from keras.layers import Reshape, Activation, Conv2D, Input, MaxPooling2D, BatchNormalization, Flatten, Dense, Lambda, Dropout
from backend import ConvBatchLReLu


clf_model = Sequential()

clf_model.add(Dense(4 + 1 + CLASS, input_shape=(GRID_H, GRID_W, BOX, 4 + 1 + CLASS), kernel_initializer="uniform", activation="relu", name="dense_1"))
clf_model.add(Dense(4 + 1 + CLASS, activation="relu", kernel_initializer="uniform", name = "dense_2"))
#clf_model.add(Dense(GRID_H*GRID_W*(BOX * (4 + 1 + CLASS)), activation='linear',name = "dense_3"))

clf_model.add(Reshape((GRID_H, GRID_W, BOX, 4 + 1 + CLASS),name="final_output"))
clf_model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 13, 13, 4, 25)     650       
_________________________________________________________________
dense_2 (Dense)              (None, 13, 13, 4, 25)     650       
_________________________________________________________________
final_output (Reshape)       (None, 13, 13, 4, 25)     0         
Total params: 1,300
Trainable params: 1,300
Non-trainable params: 0
_________________________________________________________________


  


In [56]:
#Code by George Seif: https://towardsdatascience.com/transfer-learning-for-image-classification-using-keras-c47ccf09c8c8

def build_finetune_model(base_model, dropout, fc_layers, num_classes):
    for layer in base_model.layers:
        layer.trainable = False

    x = base_model.output
    x = Flatten()(x)
    for fc in fc_layers:
        # New FC layer, random init
        x = Dense(fc, activation='relu')(x) 
        x = Dropout(dropout)(x)

    # New softmax layer
    predictions = Dense(num_classes, activation='softmax')(x) 
    
    finetune_model = Model(inputs=base_model.input, outputs=predictions)

    return finetune_model


FC_LAYERS = [1024, 1024]
dropout = 0.5

finetune_model = build_finetune_model(base_model, 
                                      dropout=dropout, 
                                      fc_layers=FC_LAYERS, 
                                      num_classes=4+1+CLASS)
finetune_model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 416, 416, 3)   0                                            
____________________________________________________________________________________________________
conv1 (Conv2D)                   (None, 208, 208, 64)  9472        input_1[0][0]                    
____________________________________________________________________________________________________
bn_conv1 (BatchNormalization)    (None, 208, 208, 64)  256         conv1[0][0]                      
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 208, 208, 64)  0           bn_conv1[0][0]                   
___________________________________________________________________________________________

### Initialize Weights

In [59]:
from backend import initialize_weight
layer = finetune_model.layers[-1]

initialize_weight(layer,sd=1/(GRID_H*GRID_W))

### Load pre-trained YOLOv2 weights

In [63]:
#from backend import set_pretrained_weight
#path_to_weight = "./yolov2.weights"
#nb_conv        = 22
#model          = set_pretrained_weight(finetune_model,nb_conv, path_to_weight)
#layer          = model.layers[-1] # the last convolutional layer
#initialize_weight(layer,sd=1/(GRID_H*GRID_W))

---
<a id='step3'></a>
## Step 3: Training

### Loss Function


In [60]:
from backend import custom_loss_core 
GRID_W             = 13
GRID_H             = 13
BATCH_SIZE         = 34
LAMBDA_NO_OBJECT = 1.0
LAMBDA_OBJECT    = 5.0
LAMBDA_COORD     = 1.0
LAMBDA_CLASS     = 1.0
    
def custom_loss(y_true, y_pred):
    return(custom_loss_core(
                     y_true,
                     y_pred,
                     true_boxes,
                     GRID_W,
                     GRID_H,
                     BATCH_SIZE,
                     ANCHORS,
                     LAMBDA_COORD,
                     LAMBDA_CLASS,
                     LAMBDA_NO_OBJECT, 
                     LAMBDA_OBJECT))

### Training

In [61]:
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.optimizers import SGD, Adam, RMSprop

dir_log = "logs/"
try:
    os.makedirs(dir_log)
except OSError:
    # Typically raised because the directory already exists
    pass


BATCH_SIZE   = 32
generator_config['BATCH_SIZE'] = BATCH_SIZE

early_stop = EarlyStopping(monitor='loss', 
                           min_delta=0.001, 
                           patience=3, 
                           mode='min', 
                           verbose=1)

checkpoint = ModelCheckpoint('weights_yolo_on_voc2012.h5', 
                             monitor='loss', 
                             verbose=1, 
                             save_best_only=True, 
                             mode='min', 
                             period=1)


optimizer = Adam(lr=0.5e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
#optimizer = SGD(lr=1e-4, decay=0.0005, momentum=0.9)
#optimizer = RMSprop(lr=1e-4, rho=0.9, epsilon=1e-08, decay=0.0)

finetune_model.compile(loss=custom_loss, optimizer=optimizer)

In [62]:
finetune_model.fit_generator(generator        = train_batch_generator,
                             steps_per_epoch  = len(train_batch_generator),
                             epochs           = 50,
                             verbose          = 1,
                             #validation_data  = valid_batch,
                             #validation_steps = len(valid_batch),
                             callbacks        = [early_stop, checkpoint],
                             max_queue_size   = 3)

Epoch 1/50


ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 2 arrays: [array([[[[ 0.49411765,  0.45882353,  0.55294118],
         [ 0.40784314,  0.39607843,  0.49019608],
         [ 0.3254902 ,  0.34117647,  0.43921569],
         ..., 
         [ 0.81176471,  0.81568627...