# Object Detection: Bounding Box Regression with Keras, TensorFlow, and Deep Learning

A custom deep learning model that performs object detection via bounding box regression with Keras and TensorFlow.

## Basic R-CNN Object Detectors

These detectors rely on the concept of **region proposal** generators.

These region proposal algorithms (e.g., Selective Search) examine an input image and identify where a potential object _could_ be. They don't yet know whether an object exists in a given location, only that the area of the image looks interesting and warrants further inspection.

In the classic implementation, these region proposals were used to extract output features from a pre-trained CNN, which were then fed into an SVM for final classification. In this implementation, the location from the region proposal was treated as the bounding box, while the SVM produced the class label for the bounding box region.

Essentially, the original R-CNN architecture didn't _learn_ how to detect bounding boxes; it was not end-to-end trainable. The concept that closes this gap is bounding box regression.

## What is Bounding Box Regression?

`Image Classification`:
- present an input image to the CNN
- perform a forward pass through the CNN
- output a vector with _N_ elements, where _N_ is the total number of class labels
- select the class label with the largest probability as our final predicted class label
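The classification steps above can be sketched in a few lines of NumPy. This is only an illustration: the class names and raw scores are hypothetical stand-ins for the output of a real CNN's forward pass.

```python
import numpy as np

def softmax(logits):
    """Convert raw network scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical class names and raw scores, standing in for a real CNN's output.
class_labels = ["cat", "dog", "airplane"]
logits = np.array([1.2, 3.4, 0.3])

probs = softmax(logits)                          # vector with N = 3 elements
predicted = class_labels[int(np.argmax(probs))]  # label with the largest probability
print(predicted, probs)
```

The output vector always has exactly _N_ entries, which is why this setup cannot enumerate every possible bounding box.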

Unfortunately, that type of model doesn't translate to object detection. It would be impossible for us to construct a class label for every possible combination of xy-coordinate bounding boxes in an input image.

Instead, we need to rely on a different type of machine learning called _regression_. Unlike classification, which produces a label, regression enables us to predict continuous values.

Typically, we scale the output range of values to [0, 1] during training and then scale the outputs back after prediction (if needed).
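As a minimal sketch of that scaling, assuming boxes are stored as `(start_x, start_y, end_x, end_y)` in pixels, the helpers below divide by the image size before training and multiply back after prediction. The function names and the 640x480 image are hypothetical.

```python
import numpy as np

def normalize_box(box, width, height):
    """Scale pixel coordinates (start_x, start_y, end_x, end_y) into [0, 1]."""
    start_x, start_y, end_x, end_y = box
    return np.array(
        [start_x / width, start_y / height, end_x / width, end_y / height],
        dtype="float32",
    )

def denormalize_box(box, width, height):
    """Map [0, 1] coordinates back to integer pixel coordinates."""
    scale = np.array([width, height, width, height], dtype="float32")
    return np.rint(np.asarray(box) * scale).astype(int)

# Hypothetical 640x480 image with one annotated box.
target = normalize_box((64, 120, 320, 360), width=640, height=480)
restored = denormalize_box(target, width=640, height=480)
print(target, restored)
```

Dividing x-values by the width and y-values by the height keeps every target in [0, 1] regardless of the image's resolution.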

`Bounding Box Regression`:
- at the head of the network, place a fully-connected layer with four neurons corresponding to the top-left and bottom-right xy-coordinates
- given that four-neuron layer, implement a sigmoid activation function such that the outputs are returned in the range of [0, 1]
- train the model using a loss function such as MSE or MAE on training data that consists of:
    1. the input images
    2. the bounding box of the object in the image
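To make the head and the loss concrete, here is a NumPy-only sketch of a four-output fully-connected layer with a sigmoid, evaluated with MSE and MAE. The feature size, random weights, and target boxes are all made up for illustration; in Keras the same head would typically be a `Dense(4, activation="sigmoid")` layer trained with `loss="mse"`.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def box_head(features, weights, bias):
    """Fully-connected layer with four outputs followed by a sigmoid."""
    return sigmoid(features @ weights + bias)

def mse(y_true, y_pred):
    """Mean squared error over all coordinates."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error over all coordinates."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical shapes: 2 images, 8 features each coming out of the CNN body.
features = rng.normal(size=(2, 8))
weights = rng.normal(scale=0.1, size=(8, 4))
bias = np.zeros(4)

pred = box_head(features, weights, bias)  # shape (2, 4), values in (0, 1)
target = np.array([[0.10, 0.25, 0.50, 0.75],
                   [0.30, 0.20, 0.90, 0.80]])  # normalized ground-truth boxes
print("MSE:", mse(target, pred), "MAE:", mae(target, pred))
```

Training would adjust `weights` and `bias` (and the CNN body) to drive the chosen loss toward zero; this sketch only evaluates the loss once on untrained weights.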

After training, we can present an input image to our bounding box regressor network. The network performs a forward pass and predicts the bounding box coordinates of the object.

## `OpenCV`

```python
import cv2
```

### Basic Operations

- `cv2.imread()` reads an image from disk as a NumPy array (channels in BGR order)
- `cv2.resize()` resizes the image to a given output size or scale factor

```python
# print all flags for colour conversions
flags = [i for i in dir(cv2) if i.startswith('COLOR_')]
print(flags)
```

['COLOR_BAYER_BG2BGR', 'COLOR_BAYER_BG2BGRA', 'COLOR_BAYER_BG2BGR_EA', 'COLOR_BAYER_BG2BGR_VNG', 'COLOR_BAYER_BG2GRAY', 'COLOR_BAYER_BG2RGB', 'COLOR_BAYER_BG2RGBA', 'COLOR_BAYER_BG2RGB_EA', 'COLOR_BAYER_BG2RGB_VNG', 'COLOR_BAYER_BGGR2BGR', 'COLOR_BAYER_BGGR2BGRA', 'COLOR_BAYER_BGGR2BGR_EA', 'COLOR_BAYER_BGGR2BGR_VNG', 'COLOR_BAYER_BGGR2GRAY', 'COLOR_BAYER_BGGR2RGB', 'COLOR_BAYER_BGGR2RGBA', 'COLOR_BAYER_BGGR2RGB_EA', 'COLOR_BAYER_BGGR2RGB_VNG', 'COLOR_BAYER_GB2BGR', 'COLOR_BAYER_GB2BGRA', 'COLOR_BAYER_GB2BGR_EA', 'COLOR_BAYER_GB2BGR_VNG', 'COLOR_BAYER_GB2GRAY', 'COLOR_BAYER_GB2RGB', 'COLOR_BAYER_GB2RGBA', 'COLOR_BAYER_GB2RGB_EA', 'COLOR_BAYER_GB2RGB_VNG', 'COLOR_BAYER_GBRG2BGR', 'COLOR_BAYER_GBRG2BGRA', 'COLOR_BAYER_GBRG2BGR_EA', 'COLOR_BAYER_GBRG2BGR_VNG', 'COLOR_BAYER_GBRG2GRAY', 'COLOR_BAYER_GBRG2RGB', 'COLOR_BAYER_GBRG2RGBA', 'COLOR_BAYER_GBRG2RGB_EA', 'COLOR_BAYER_GBRG2RGB_VNG', 'COLOR_BAYER_GR2BGR', 'COLOR_BAYER_GR2BGRA', 'COLOR_BAYER_GR2BGR_EA', 'COLOR_BAYER_GR2BGR_VNG', 'COLOR_

```python
# print all flags for resizing methods
flags = [i for i in dir(cv2) if i.startswith('INTER_')]
print(flags)
```

['INTER_AREA', 'INTER_BITS', 'INTER_BITS2', 'INTER_CUBIC', 'INTER_LANCZOS4', 'INTER_LINEAR', 'INTER_LINEAR_EXACT', 'INTER_MAX', 'INTER_NEAREST', 'INTER_NEAREST_EXACT', 'INTER_TAB_SIZE', 'INTER_TAB_SIZE2']


#### Transformations

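- `cv2.warpAffine(img, M, (cols, rows))`
    - `rows`, `cols` come from `img.shape[:2]` (_height_, _width_); the third argument is the output size given as (_width_, _height_)
    - `M` is a 2x3 transformation matrix (e.g., `np.float32([ [1,0,100], [0,1,50] ])` shifts the image 100 pixels right and 50 pixels down)
    - `M` as `cv2.getRotationMatrix2D( ( (cols-1)/2.0, (rows-1)/2.0 ), 90, 1)` rotates the image by 90 degrees around its center
    - `M` as `cv2.getAffineTransform(pts1, pts2)` where `pts1` and `pts2` are arrays of three xy-points, before and after the transform
- `cv2.warpPerspective(img, M, (cols, rows))`
    - `M` as `cv2.getPerspectiveTransform(pts1, pts2)` where `pts1` and `pts2` are arrays of four xy-points, before and after the transform; the result is a 3x3 matrix, so it cannot be passed to `cv2.warpAffine`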
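To see what a 2x3 matrix `M` does, the sketch below applies it to individual xy-points with plain NumPy, without calling OpenCV. `apply_affine` and `rotation_matrix` are hypothetical helpers; `rotation_matrix` follows the formula given in the OpenCV documentation for `getRotationMatrix2D`, and the 201x101 image size is an assumption.

```python
import numpy as np

def apply_affine(M, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of xy-points."""
    points = np.asarray(points, dtype=np.float64)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # (N, 3)
    return homogeneous @ np.asarray(M, dtype=np.float64).T        # (N, 2)

def rotation_matrix(center, angle_deg, scale):
    """Build a 2x3 rotation matrix using the formula documented for getRotationMatrix2D."""
    cx, cy = center
    alpha = scale * np.cos(np.radians(angle_deg))
    beta = scale * np.sin(np.radians(angle_deg))
    return np.array([[alpha, beta, (1 - alpha) * cx - beta * cy],
                     [-beta, alpha, beta * cx + (1 - alpha) * cy]])

# Translation: every point moves 100 pixels in x and 50 pixels in y.
translate = np.float32([[1, 0, 100], [0, 1, 50]])
moved = apply_affine(translate, [[0, 0], [10, 20]])

# Rotation by 90 degrees around the center of a hypothetical 201x101 image.
rows, cols = 101, 201
center = ((cols - 1) / 2.0, (rows - 1) / 2.0)
rotate = rotation_matrix(center, 90, 1)
fixed = apply_affine(rotate, [center])  # the center should map onto itself
print(moved, fixed)
```

`cv2.warpAffine` uses the same matrix to decide where each pixel lands, then interpolates with one of the `INTER_*` methods.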

### Saving Images

These snippets assume Pillow and NumPy are imported (`from PIL import Image` and `import numpy as np`).

Open palette image and remove pointless alpha channel \
`im = Image.open('image.png').convert('P')`

Extract palette and save as CSV \
`np.array(im.getpalette()).tofile('palette.csv',sep=',')`

Save palette indices as single channel PGM image that OpenCV can read \
`na = np.array(im)` \
`Image.fromarray(na).save('indices.pgm')`

To rebuild the palette image later, first load indices \
`im = Image.open('indices.pgm')`

Now load palette \
`palette = np.fromfile('palette.csv',sep=',').astype(np.uint8)`

Put palette back into image \
`im.putpalette(palette)`

Save \
`im.save('result.png')`

## Next Steps...