<h1>Object detection using YOLO</h1>

<p>In this tutorial we will implement object detection using a pretrained YOLO model.</p>

<h3>Step 1: Download pretrained YOLO model and weights</h3>

<b>Get Weights</b><br/>
<code>wget http://pjreddie.com/media/files/yolo.weights</code>

<b>Get Model Configuration</b><br/>
<code>wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolo.cfg</code>

<h2>Convert YOLO configuration to JSON</h2>

<p>To simplify building the model from the cfg file, we will first convert the configuration into a list of Python dictionaries, one per section.</p>

<p>Let's explore the configuration file a little and look at what each section means. There are 6 types of sections in the configuration file:</p>
<ul>
    <li>
        <b>net:</b> Contains hyperparameters and the input shape of model.
    </li>
    <li><b>convolutional:</b> Simple convolutional layer.</li>
    <li><b>maxpool:</b> Simple maxpool layer.</li>
    <li>
        <b>route:</b> Creates a shortcut by concatenating the outputs of earlier layers (similar to an Inception module).
    </li>
    <li>
        <b>reorg:</b> Moves spatial blocks into the channel dimension (space-to-depth). For example, a single-channel 2x2 block is rearranged into a 1x1x4 block.
    </li>
    <li><b>region:</b> Contains hyperparameters for box filtering and non-max suppression.</li>
</ul>
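<p>The reorg rearrangement described above can be illustrated with a minimal pure-Python sketch. The function name <code>space_to_depth</code> and the nested-list image layout (height x width x channels) are assumptions made for this illustration; the model itself relies on the TensorFlow operation used later in this tutorial.</p>

```python
def space_to_depth(image, block_size=2):
    """Rearrange an H x W x C nested list into (H/b) x (W/b) x (b*b*C)."""
    height, width = len(image), len(image[0])
    assert height % block_size == 0 and width % block_size == 0
    output = []
    for row in range(0, height, block_size):
        out_row = []
        for col in range(0, width, block_size):
            depth = []
            # Walk the block row by row, collecting every channel value.
            for dy in range(block_size):
                for dx in range(block_size):
                    depth.extend(image[row + dy][col + dx])
            out_row.append(depth)
        output.append(out_row)
    return output


# A single-channel 2x2 block, i.e. shape (2, 2, 1).
block = [[[1], [2]],
         [[3], [4]]]
result = space_to_depth(block)
print(result)  # [[[1, 2, 3, 4]]], i.e. shape (1, 1, 4)
```

<p>The spatial size halves in each direction while the channel count grows by a factor of four, so no values are lost.</p>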

In [1]:
config_filepath = "./yolo.cfg"

BYTE_SIZE = 4

In [2]:
KEYS = ["net", "convolutional", "maxpool", "route", "reorg", "region"]

def convert_config_file_to_json(path):
    """Parse a Darknet cfg file into a list of dicts, one per section."""
    data = []
    block = {}

    def is_comment(line):
        return line.startswith("#")

    # The context manager closes the file even if parsing fails.
    with open(path) as config_file:
        for line in config_file:
            line = line.strip()
            if line and not is_comment(line):
                section = line.strip("[]")
                if section in KEYS:
                    if block:
                        data.append(block)
                    block = {"layer": section}
                else:
                    # Split on the first "=" only, so values may contain "=".
                    key, val = line.split("=", 1)
                    block[key.strip()] = val.strip()

    if block:
        data.append(block)

    return data
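
<p>To see what the parsed structure looks like, here is a small self-contained sketch that applies the same idea to an in-memory snippet. The helper <code>parse_cfg_text</code> and the sample text are hypothetical and exist only for this illustration; they are not part of the notebook's pipeline.</p>

```python
def parse_cfg_text(text):
    """Parse Darknet-style cfg text into a list of dicts, one per section."""
    blocks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # A section header starts a new block.
            blocks.append({"layer": line[1:-1]})
        else:
            key, value = line.split("=", 1)
            blocks[-1][key.strip()] = value.strip()
    return blocks


# Hypothetical sample in the style of a Darknet cfg file.
sample = """
[net]
height=608
width=608
channels=3

# first layer
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
"""

blocks = parse_cfg_text(sample)
print(blocks[0])
print(blocks[1]["filters"], type(blocks[1]["filters"]).__name__)
```

<p>Every value stays a string after parsing, which is why the model-building code converts fields with <code>int(...)</code> or <code>float(...)</code> before using them.</p>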
        

<h2>Let's build the YOLO model in Keras</h2>

In [3]:
import numpy as np
import tensorflow as tf
from keras.layers import Activation
from keras.layers import BatchNormalization
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Lambda
from keras.layers.advanced_activations import LeakyReLU
from keras.regularizers import l2


def convolutional_block(X_IN, info, parameters):
    """Build a Conv2D layer, optionally followed by batch norm and LeakyReLU."""
    weights = get_conv_weights(info, X_IN)
    batch_normalize = info.get('batch_normalize')
    filters = int(info.get('filters'))
    size = int(info.get('size'))
    stride = int(info.get('stride'))
    X = Conv2D(
        filters,
        (size, size),
        kernel_regularizer=l2(float(parameters.get('decay'))),
        # With batch norm, the bias is loaded as the normalization's beta instead.
        use_bias=not batch_normalize,
        strides=(stride, stride),
        weights=weights[0],
        padding='same' if int(info.get('pad')) == 1 else 'valid'
    )(X_IN)

    if batch_normalize:
        X = BatchNormalization(weights=weights[1])(X)

    # Any activation other than 'leaky' (e.g. 'linear') leaves X unchanged.
    if info.get('activation') == 'leaky':
        X = LeakyReLU(alpha=0.1)(X)
    return X

def maxpool_block(X_IN, info):
    size = int(info.get('size'))
    stride = int(info.get('stride'))
    return MaxPooling2D(
        pool_size=(size, size),
        strides=(stride, stride),
        padding='same'
    )(X_IN)


def space_to_depth_x2(x):
    """Thin wrapper for Tensorflow space_to_depth with block_size=2."""
    # Import currently required to make Lambda work.
    # See: https://github.com/fchollet/keras/issues/5088#issuecomment-273851273
    import tensorflow as tf
    return tf.space_to_depth(x, block_size=2)

def reorg_block(X_IN, info):
    return Lambda(space_to_depth_x2)(X_IN)


def get_conv_weights(layer, prev_layer):
    size = int(layer["size"])
    filters = int(layer["filters"])
    channels = prev_layer.shape[-1]
    weights_shape = (size, size, int(channels), filters)
    # Darknet stores kernels as (filters, channels, height, width).
    darknet_w_shape = (filters, int(channels), size, size)

    # One bias term per filter.
    conv_bias = np.ndarray(
        shape=(filters, ), 
        dtype='float32', 
        buffer=weights_file.read(BYTE_SIZE*filters))

    bn_weight_list = None
    
    if layer.get('batch_normalize'):
        # Three values per filter: scale (gamma), running mean, running variance.
        bn_weights = np.ndarray(
            shape=(3, filters),
            dtype='float32',
            buffer=weights_file.read(BYTE_SIZE*3*filters)
        )
        bn_weight_list = [
            bn_weights[0],  # scale gamma
            conv_bias,  # shift beta
            bn_weights[1],  # running mean
            bn_weights[2]  # running var
        ]

    conv_weights = np.ndarray(
        shape=darknet_w_shape,
        dtype='float32',
        buffer=weights_file.read(BYTE_SIZE*np.prod(darknet_w_shape))
    )
    
    conv_weights = np.transpose(conv_weights, [2, 3, 1, 0])
    
    conv_weights = [conv_weights] if layer.get('batch_normalize') else [conv_weights, conv_bias]
    
    return (conv_weights, bn_weight_list)
    

Using TensorFlow backend.
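
<p>As a sanity check on the read order in <code>get_conv_weights</code>, the sketch below counts how many float32 values one convolutional layer consumes from the weights file, and shows how the transpose axes map a Darknet kernel shape to the Keras one. The helper names <code>conv_layer_value_counts</code> and <code>transpose_shape</code> are hypothetical; the layer sizes are those of the first convolution (32 filters of size 3 over 3 input channels).</p>

```python
BYTE_SIZE = 4  # bytes per float32 value


def conv_layer_value_counts(filters, in_channels, size, batch_normalize):
    """Count the values read for one layer, in the order they are read."""
    counts = {"bias": filters}
    if batch_normalize:
        # Scale, running mean and running variance: one of each per filter.
        counts["batch_norm"] = 3 * filters
    counts["kernel"] = filters * in_channels * size * size
    return counts


def transpose_shape(shape, axes):
    """Return the shape that np.transpose(array, axes) would produce."""
    return tuple(shape[axis] for axis in axes)


counts = conv_layer_value_counts(filters=32, in_channels=3, size=3,
                                 batch_normalize=True)
total_values = sum(counts.values())
print(counts)        # {'bias': 32, 'batch_norm': 96, 'kernel': 864}
print(total_values)  # 992 values
print(total_values * BYTE_SIZE)  # 3968 bytes

# (filters, channels, height, width) -> (height, width, channels, filters)
print(transpose_shape((64, 32, 3, 3), (2, 3, 1, 0)))  # (3, 3, 32, 64)
```

<p>The 864 kernel values match the parameter count that Keras reports for the first <code>Conv2D</code> layer, and the 32 bias values plus 96 batch-norm values account for the 128 parameters of the first <code>BatchNormalization</code> layer.</p>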


In [4]:
import numpy as np

from keras.layers import Input
from keras.layers.merge import concatenate

weights_filepath = "./yolo.weights"
weights_file = open(weights_filepath, 'rb')

layers_data = convert_config_file_to_json(config_filepath)
parameters = layers_data[0]

image_height = int(parameters['height'])
image_width = int(parameters['width'])
channels = int(parameters["channels"])

all_layers = [Input(shape=(image_height, image_width, channels))]

weights_header = np.ndarray(shape=(4, ), dtype='int32', buffer=weights_file.read(4*BYTE_SIZE))

for layer_info in layers_data:
    prev_layer = all_layers[-1]
    if layer_info["layer"] == "convolutional":
        layer = convolutional_block(prev_layer, layer_info, parameters)
        all_layers.append(layer)
    elif layer_info["layer"] == "maxpool":
        layer = maxpool_block(prev_layer, layer_info)
        all_layers.append(layer)
    elif layer_info["layer"] == "route":
        ids = [int(i) for i in layer_info['layers'].split(',')]
        concat_layers = [all_layers[i] for i in ids]
        if len(concat_layers) > 1:
            all_layers.append(concatenate(concat_layers))
        else:
            all_layers.append(concat_layers[0])
    elif layer_info["layer"] == "reorg":
        layer = reorg_block(prev_layer, layer_info)
        all_layers.append(layer)
        

remaining_weights = len(weights_file.read()) / BYTE_SIZE
weights_file.close()

assert remaining_weights == 0, "There are remaining weights."
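
<p>The route handling in the loop above relies on Python's negative indexing: an entry such as <code>-4</code> refers to the fourth most recent tensor in <code>all_layers</code>. A minimal sketch, using strings as hypothetical stand-ins for the Keras tensors and a hypothetical helper <code>resolve_route</code>:</p>

```python
def resolve_route(all_layers, layers_field):
    """Return the earlier layers named by a route section's 'layers' value."""
    ids = [int(i) for i in layers_field.split(",")]
    return [all_layers[i] for i in ids]


# Strings stand in for tensors; position 0 is the model input.
all_layers = ["input", "conv_1", "pool_1", "conv_2", "conv_3"]

print(resolve_route(all_layers, "-1,-4"))  # ['conv_3', 'conv_1']
print(resolve_route(all_layers, "-2"))     # ['conv_2']
```

<p>This sketch, like the loop above, assumes the indices are relative (negative). Because position 0 of the list holds the input tensor, a non-negative index counted from the first layer of the cfg file would likely need an offset of one.</p>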

In [5]:
from keras.models import Model

model = Model(inputs=all_layers[0], outputs=all_layers[-1])
model.summary()  # summary() prints the table itself and returns None

model.save("yolo.h5")

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 608, 608, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 608, 608, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 608, 608, 32) 128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 608, 608, 32) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_poolin