# Object Detection: Bounding Box Regression with Keras, TensorFlow, and Deep Learning

This notebook builds a custom deep learning model that performs object detection via bounding box regression with Keras and TensorFlow.

## Basic R-CNN Object Detectors

These detectors rely on the concept of **region proposal** generators.

These region proposal algorithms (e.g., Selective Search) examine an input image and identify where a potential object _could_ be. They don't yet know whether an object exists at a given location, only that the area of the image looks interesting and warrants further inspection.

In the classic implementation, these region proposals were used to extract output features from a pre-trained CNN, and those features were then fed into an SVM for final classification. The location of the region proposal was treated as the bounding box, while the SVM produced the class label for that region.

Essentially, the original R-CNN architecture didn't _learn_ how to detect bounding boxes; it was not end-to-end trainable. Bounding box regression is what lets a network learn the box coordinates directly.

## What is Bounding Box Regression?

`Image Classification`:
- present an input image to the CNN
- perform a forward pass through the CNN
- output a vector with _N_ elements, where _N_ is the total number of class labels
- select the class label with the largest probability as our final predicted class label
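The last two steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's model: the class names and the raw output values are hypothetical, and `predict_label` is a helper name introduced here.

```python
import numpy as np

def predict_label(logits, class_names):
    """Turn raw network outputs into a class label via softmax + argmax."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return class_names[int(np.argmax(probs))], probs

# Hypothetical raw outputs for N = 3 classes
class_names = ["airplane", "face", "motorcycle"]
logits = np.array([2.0, 0.5, -1.0])

label, probs = predict_label(logits, class_names)
print(label, probs.round(3))
```

The output vector has one probability per class, and the predicted label is simply the index of the largest entry.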

Unfortunately, that type of model doesn't translate to object detection. It would be impossible for us to construct a class label for every possible combination of xy-coordinate bounding boxes in an input image.
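To get a feel for the scale of the problem, the sketch below counts the axis-aligned boxes with integer corners in a 640x480 image (the image size that appears later in this notebook). The helper `count_boxes` is introduced here for illustration only.

```python
from math import comb

def count_boxes(width, height):
    """Count axis-aligned boxes with integer corners and non-zero area."""
    # A box is fixed by two distinct x-coordinates and two distinct y-coordinates.
    return comb(width, 2) * comb(height, 2)

n_boxes = count_boxes(640, 480)
print(f"{n_boxes:,}")
```

The count is on the order of 10^10, far too many to treat each box as its own class label.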

Instead, we need to rely on a different type of machine learning called _regression_. Unlike classification, which produces a label, regression enables us to predict continuous values.

Typically, we scale the output range of values to [0, 1] during training and then scale the outputs back after prediction (if needed).
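A minimal sketch of that scaling, assuming boxes are stored as `(start_x, start_y, end_x, end_y)` in pixels. The helper names and the example box are hypothetical.

```python
import numpy as np

def normalize_box(box, width, height):
    """Scale (start_x, start_y, end_x, end_y) pixel coordinates to [0, 1]."""
    scale = np.array([width, height, width, height], dtype="float32")
    return np.asarray(box, dtype="float32") / scale

def denormalize_box(box, width, height):
    """Map [0, 1] coordinates back to integer pixel coordinates."""
    scale = np.array([width, height, width, height], dtype="float32")
    return np.rint(np.asarray(box) * scale).astype(int)

# Hypothetical box in a 640x480 image
target = normalize_box((64, 120, 576, 360), width=640, height=480)
restored = denormalize_box(target, width=640, height=480)
print(target, restored)
```

Dividing x-coordinates by the image width and y-coordinates by the image height keeps the targets independent of the image resolution.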

`Bounding Box Regression`:
- at the head of the network, place a fully-connected layer with four neurons corresponding to the top-left and bottom-right xy-coordinates
- apply a sigmoid activation function to that four-neuron layer so that the outputs fall in the range [0, 1]
- train the model using a loss function such as MSE or MAE on training data that consists of:
    1. the input images
    2. the bounding box of the object in the image
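The recipe above can be sketched without a deep learning framework. The NumPy code below is a stand-in for illustration, not the notebook's Keras model: the feature vectors, weights, and target boxes are hypothetical. In Keras, the same head would typically be a `Dense(4, activation="sigmoid")` layer trained with `loss="mse"`.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def box_head(features, weights, bias):
    """Fully-connected layer with four sigmoid outputs, one per coordinate."""
    return sigmoid(features @ weights + bias)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def mae(pred, target):
    return float(np.mean(np.abs(pred - target)))

# Hypothetical batch: 2 images, each summarized by a 32-dim feature vector
features = rng.normal(size=(2, 32))
weights = rng.normal(scale=0.1, size=(32, 4))
bias = np.zeros(4)

# Ground-truth boxes, already scaled to [0, 1]
targets = np.array([[0.10, 0.25, 0.90, 0.75],
                    [0.20, 0.30, 0.80, 0.70]])

preds = box_head(features, weights, bias)
print(preds.shape, mse(preds, targets), mae(preds, targets))
```

Because the sigmoid bounds every output to (0, 1), the predictions live in the same range as the scaled targets, which is what makes MSE or MAE a sensible loss here.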

After training, we can present an input image to our bounding box regressor network. The network performs a forward pass and predicts the bounding box coordinates of the object.

### Build Dynamic Paths in the Configuration File, `config.py`

In [14]:
import os

BASE_PATH = '../data/caltech-101/'
# os.path.join handles the trailing slash in BASE_PATH without doubling separators
IMAGES_PATH = os.path.join(BASE_PATH, '101_ObjectCategories', 'airplanes')
ANNOTS_PATH = os.path.join(BASE_PATH, 'Annotations', 'Airplanes_Side_2')

In [15]:
from scipy.io import loadmat

annots = loadmat(os.path.join(ANNOTS_PATH, 'annotation_0001.mat'))

In [16]:
annots

{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: Tue Dec 14 11:03:29 2004',
 '__version__': '1.0',
 '__globals__': [],
 'box_coord': array([[ 30, 137,  49, 349]], dtype=uint16),
 'obj_contour': array([[  8.54082661,  11.87852823,   1.86542339,   1.56199597,
          31.60131048,  27.65675403,  23.71219758,  18.85735887,
          18.85735887,  31.60131048,  47.68296371,  51.32409274,
          59.51663306,  60.1234879 ,  56.78578629,  78.02570565,
          91.07308468, 178.46018145, 179.97731855, 222.15372984,
         225.79485887, 239.75252016, 265.84727823, 298.92086694,
         300.13457661, 298.3140121 , 265.54385081, 264.63356855,
         270.39868952, 268.88155242, 265.84727823, 264.02671371,
         260.08215726, 255.83417339, 257.6547379 , 261.90272177,
         261.90272177, 160.25453629, 160.25453629, 156.00655242,
         155.39969758, 149.33114919, 142.04889113, 139.31804435,
         139.92489919, 143.26260081, 136.28377016, 128.09122984,
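
The `box_coord` entry is the part of this annotation that a bounding box regressor needs. A minimal sketch of unpacking it follows, using a stand-in dict with the values printed above rather than reading the file. The coordinate order `(y_min, y_max, x_min, x_max)` is an assumption here and should be verified by drawing the box on the image.

```python
import numpy as np

# Stand-in for the dict returned by loadmat, using the values printed above
annots = {"box_coord": np.array([[30, 137, 49, 349]], dtype=np.uint16)}

# ASSUMPTION: the order is (y_min, y_max, x_min, x_max); verify by drawing the box
y_min, y_max, x_min, x_max = (int(v) for v in annots["box_coord"][0])

# Reorder to the (start_x, start_y, end_x, end_y) layout used by the regressor
box = (x_min, y_min, x_max, y_max)
box_width, box_height = x_max - x_min, y_max - y_min
print(box, box_width, box_height)
```

Converting to Python `int` first avoids unsigned `uint16` arithmetic, which can wrap around when a subtraction would go negative.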
         

In [4]:
from modules import config, utils
import cv2

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='darkgrid')
%matplotlib inline

In [6]:
cv2.imread(f'{config.BASE_PATH}/thumb_01_01_0001.jpg').shape

(480, 640, 3)

In [4]:
# Map the configured colorspace to an OpenCV read flag
read_flags = {
    'grayscale': cv2.IMREAD_GRAYSCALE,
    'alpha': cv2.IMREAD_UNCHANGED,
}

# Any other colorspace falls back to cv2.imread's default, 3-channel BGR
flag = read_flags.get(config.COLORSPACE, cv2.IMREAD_COLOR)

images = [cv2.imread(location, flag) for location in utils.get_image_locs()]

In [12]:
image_numpy = np.array(images)
names = ['image', 'height', 'width']
index = pd.MultiIndex.from_product(
    [range(dim) for dim in image_numpy.shape],
    names=names)

In [8]:
names = ['image', 'height', 'width']
img_numpy = np.array(images)

# iterate through shapes and assign names
index = pd.MultiIndex.from_product(
    [range(dim) for dim in img_numpy.shape],
    names=names)

# use multiindexing to configure the dataframe
img_df = pd.DataFrame(
    { names[0] : img_numpy.flatten() },
    index=index)

img_df = img_df['image']
img_df = img_df.unstack(level='width').sort_index()

ValueError: Must pass 2-d input. shape=(5999, 480, 640, 3)
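
pandas raises this `ValueError` when a `DataFrame` is built directly from an array with more than two dimensions. The stacked images here are 4-D, `(5999, 480, 640, 3)`: image, height, width, and color channel, so the three entries in `names` also leave the channel axis unnamed. One possible way around it, sketched on a small synthetic array, is to flatten the data into a `Series` with one index level per axis and then unstack. The `channel` level name, the `pixel` series name, and the array contents are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the stacked images: (image, height, width, channel)
img_numpy = np.arange(2 * 3 * 4 * 3).reshape(2, 3, 4, 3)

names = ["image", "height", "width", "channel"]  # one name per array axis
index = pd.MultiIndex.from_product(
    [range(dim) for dim in img_numpy.shape], names=names)

# A Series accepts the flattened 1-D data; C-order ravel matches from_product order
img_series = pd.Series(img_numpy.ravel(), index=index, name="pixel")

# Move the width axis into columns, giving a 2-D frame
img_df = img_series.unstack(level="width").sort_index()
print(img_df.shape)
```

At the full size of roughly 5.5 billion pixel values, a long-format index like this would likely not fit in memory, so the approach is probably only practical on a subset of the images.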