<h3> Object Detection </h3>

<b>1. Using a Pretrained Image Classifier to Detect Objects with Keras and OpenCV </b>

Here we take a Convolutional Neural Network trained for image classification (a pre-trained ResNet-50) and combine it with `image pyramids`, `sliding windows`, and `non-maxima suppression` to build a basic object detector. In other words, we combine traditional computer vision object detection techniques with deep learning.


<b> In Image Classification:</b> Input: Image --> Output: Class Label <br>We present the input image to our neural network and obtain a single class label, together with the probability associated with that prediction. This class label characterizes the contents (the most dominant and visible contents) of the image.<br>

<b> Object Detection:</b> Along with outputting class labels (i.e. which objects are present in the image), an object detector also outputs where in the image those objects are, as bounding box coordinates.<br>

More specifically, it outputs 3 values: <br>
1. A list of bounding boxes, or the (x, y)-coordinates for each object in an image
2. The class label associated with each of the bounding boxes
3. The probability/confidence score associated with each bounding box and class label
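To make these three outputs concrete, here is a minimal sketch of how a single detection could be represented in Python. The `Detection` class and the `summarize` helper are hypothetical names used only for illustration; the notebook itself stores detections as (box, probability) tuples grouped by label in a dictionary.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical container for one detection (illustrative names only)
@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # (startX, startY, endX, endY) in pixels
    label: str                      # predicted class label
    prob: float                     # confidence score in [0, 1]


def summarize(detections: List[Detection]) -> List[str]:
    """Format each detection as 'label prob @ box' for quick inspection."""
    return [f"{d.label} {d.prob:.2f} @ {d.box}" for d in detections]


# A made-up detection, only to show the shape of the output
dets = [Detection((10, 20, 110, 140), "raccoon", 0.97)]
print(summarize(dets))  # ['raccoon 0.97 @ (10, 20, 110, 140)']
```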



<b> 1.1 Importing necessary modules </b>

In [7]:
# Keras / TensorFlow: pre-trained classifier and its preprocessing helpers
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications import imagenet_utils
# imutils: image resizing and non-maxima suppression helpers
from imutils.object_detection import non_max_suppression
import imutils
import numpy as np
import time
import cv2
import os

<b> 1.2 How can a deep learning image classifier be converted into an object detector? </b>

We use elements of traditional computer vision algorithms to convert our CNN image classifier into an object detector.<br>

<b>1.2.1</b> The first element we use is <b>`Image Pyramids`</b>: <br>

* An “image pyramid” is a multi-scale representation of an image:


<img src="Image_pyramid.png" style ="height: 40%;width: 40%;"/>

At the bottom of the pyramid is the image at its original size.
At each subsequent layer, the image is downsampled and optionally smoothed.
The image is progressively subsampled until a stopping criterion is met (typically, when a minimum size has been reached), at which point no further subsampling is done.
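As a rough illustration of the stopping criterion, the sketch below computes only the layer sizes such a pyramid would produce. It assumes each layer divides the width by `scale`, keeps the aspect ratio, and stops once either side falls below `min_size`, which mirrors the shrink-and-stop rule of the notebook's `image_pyramid` helper. The function name `pyramid_sizes` is hypothetical.

```python
def pyramid_sizes(height, width, scale=1.5, min_size=(224, 224)):
    """Return (height, width) of every pyramid layer, largest first.

    min_size is (min_width, min_height). The original size is always
    included; each further layer shrinks the width by `scale` while
    keeping the aspect ratio, until a layer would be too small.
    """
    sizes = [(height, width)]
    while True:
        new_w = int(width / scale)
        new_h = int(height * new_w / float(width))  # preserve aspect ratio
        if new_h < min_size[1] or new_w < min_size[0]:
            break  # stopping criterion: layer smaller than the minimum size
        height, width = new_h, new_w
        sizes.append((height, width))
    return sizes


print(pyramid_sizes(450, 600))  # [(450, 600), (300, 400)]
```

For a 600×450 image with `scale=1.5` and a 224×224 minimum, only two layers survive; the next layer would be just 199 pixels tall.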


<b>1.2.2</b> The second element we use is <b>`Sliding Windows`</b>: <br>

* A sliding window is a fixed-size rectangle that slides from left-to-right and top-to-bottom within an image:



<img src="sliding_window.gif" style ="height: 40%;width: 40%;">

At each stop of the window, we:
1. Extract the image region within the sliding window
2. Pass that region to an image classifier
3. Obtain predictions (class label and probability scores)
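The number of stops depends on the image size, the window size, and the step. A minimal sketch, assuming the window must fit entirely inside the image, so that a side of length `L` yields `(L - window) // step + 1` offsets when `L >= window`; the helper names are hypothetical.

```python
def window_positions(length, window, step):
    """Offsets along one axis at which a window of size `window` fits entirely."""
    return list(range(0, length - window + 1, step))


def count_windows(height, width, ws=(224, 224), step=16):
    """Number of window stops on a height x width image; ws = (w, h)."""
    xs = window_positions(width, ws[0], step)
    ys = window_positions(height, ws[1], step)
    return len(xs) * len(ys)


print(window_positions(256, 224, 16))  # [0, 16, 32]
print(count_windows(450, 600))         # 360
```

With the 224×224 window and step of 16 used later in this notebook, a 600×450 layer produces 24 × 15 = 360 windows, each requiring its own forward pass through the classifier.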

<div style ="color:green">Image pyramids and sliding windows help us localize objects at different locations and at multiple scales of the input image.</div>

<b>1.2.3</b> The third element we use is <b>`Non-Maxima Suppression`</b>: <br>

Object detectors generally output multiple, overlapping bounding boxes around each object in an image.<br>
This happens because, as the sliding window approaches an object, the classifier outputs larger and larger probabilities for that object's class, so several neighbouring windows all exceed the confidence threshold.<br>

When there is only one object of a particular class, these multiple bounding boxes are redundant.<br>
The solution is to apply non-maxima suppression (NMS), which keeps the highest-confidence box and removes weaker bounding boxes that overlap it significantly.
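A minimal, pure-Python sketch of greedy NMS follows. It measures overlap with intersection-over-union (IoU), which is one common choice; it is not the `imutils` `non_max_suppression` function used later in this notebook, which may compute overlap differently and uses a default `overlapThresh` of 0.3. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and `iou` and `nms` are hypothetical names.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    by more than iou_threshold, then repeat with the remaining boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return [boxes[i] for i in keep]


boxes = [(10, 10, 110, 110), (15, 15, 115, 115), (200, 200, 300, 300)]
scores = [0.90, 0.95, 0.80]
print(nms(boxes, scores))  # [(15, 15, 115, 115), (200, 200, 300, 300)]
```

The two heavily overlapping boxes (IoU ≈ 0.82) collapse to the one scored 0.95, while the distant box survives.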

<img src="nms.jpg" style ="height: 50%;width: 50%;">

Below are the utility functions for implementing image pyramids and sliding windows, plus a helper for loading images from a folder.

In [9]:
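def image_pyramid(image, scale=1.5, minSize=(224, 224)):
    # yield the original image first
    yield image
    # keep downscaling until the image is smaller than minSize (width, height)
    while True:
        w = int(image.shape[1] / scale)
        image = imutils.resize(image, width=w)  # keeps the aspect ratio
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break
        yield image

def sliding_window(image, step, ws):
    # ws = (width, height) of the window; the "+ 1" includes the last
    # position at which the window still fits inside the image
    for y in range(0, image.shape[0] - ws[1] + 1, step):
        for x in range(0, image.shape[1] - ws[0] + 1, step):
            yield (x, y, image[y:y + ws[1], x:x + ws[0]])

def load_images(folder):
    # sort the file names so list indices (e.g. image_list[2]) are
    # reproducible; os.listdir returns entries in arbitrary order
    images = []
    for filename in sorted(os.listdir(folder)):
        img = cv2.imread(os.path.join(folder, filename))
        if img is not None:  # skip files OpenCV cannot read as images
            images.append(img)
    return images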

<b> 1.3 The steps we follow in the object detection algorithm:</b>

1. We take the input image.
2. We construct an image pyramid for the input image.
3. For each scale of the image pyramid, we run a sliding window:
    1. At each stop of the sliding window, we extract the image inside the window (the ROI).
    2. We pass the ROI through our CNN, originally trained for image classification.
    3. We examine the probability of the CNN's top class label and, if it meets a minimum confidence, we record:
        1. the class label, and
        2. the location of the sliding window.
4. We apply non-maxima suppression to the bounding boxes, separately for each class.
5. We return the detected objects (bounding box, class label, and probability).

<b>1.3.1</b> First we define some constants needed by our object detection algorithm.

In [244]:
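width = 600              # currently unused: the input image is not resized below
scale = 1.5              # factor by which each pyramid layer shrinks
step = 16                # sliding-window stride in pixels
roi_size = (224, 224)    # sliding-window size (width, height)
INPUT_SIZE = (224, 224)  # input size expected by ResNet-50
threshold_prob = 0.90    # minimum class probability to keep a detection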
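These constants also drive memory use: every ROI is resized to `INPUT_SIZE` and stored as `float32` before prediction, so the batch grows linearly with the number of windows. A back-of-the-envelope sketch, assuming 3 colour channels and 4 bytes per value; `roi_batch_bytes` is a hypothetical helper.

```python
def roi_batch_bytes(num_rois, input_size=(224, 224), channels=3, bytes_per_value=4):
    """Bytes needed to hold num_rois preprocessed ROIs in one float32 array."""
    w, h = input_size
    return num_rois * w * h * channels * bytes_per_value


print(roi_batch_bytes(1))            # 602112
print(roi_batch_bytes(360) / 2**20)  # 206.71875 (MiB)
```

Halving `step` roughly quadruples the number of windows, and therefore the memory needed for the ROI batch.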

<b>1.3.2 </b> We load the ResNet-50 classification CNN (with ImageNet weights) and the input images from a local directory.

In [10]:
model = ResNet50(weights="imagenet", include_top=True)

input_folder = './raccoon_dataset-master/images/'
image_list = load_images(input_folder)

(H, W) = image_list[2].shape[:2]  # we use the image at index 2 (the third image) of our dataset

<b>1.3.3 </b> We initialize our image pyramid generator, loop over the scaled images it produces, run the sliding window over each one, and preprocess every window.<br> We store the regions of interest (ROIs) produced by the pyramid + sliding window in the `roi_window` list, and the (x, y)-coordinates of each ROI in the original image in the `loc_in_image` list.

In [246]:
pyramid = image_pyramid(image_list[2], scale=scale, minSize=roi_size)
# initialize two lists, one to hold the ROIs generated from the image
# pyramid and sliding window, and another list used to store the
# (x, y)-coordinates of where the ROI was in the original image
roi_window = []
loc_in_image = []

for image in pyramid:
    # ratio between the original width and this layer's width; a separate
    # name is used so the global pyramid `scale` is not overwritten
    r = W / float(image.shape[1])
    for (x, y, roi) in sliding_window(image, step, roi_size):
        # map the window back to original-image coordinates
        x = int(x * r)
        y = int(y * r)
        w = int(roi_size[0] * r)
        h = int(roi_size[1] * r)

        # OpenCV loads images as BGR, while ResNet's preprocess_input expects RGB
        roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
        roi = cv2.resize(roi, INPUT_SIZE)
        roi = img_to_array(roi)
        roi = preprocess_input(roi)

        roi_window.append(roi)
        loc_in_image.append((x, y, x + w, y + h))

roi_window = np.array(roi_window, dtype="float32")
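The coordinate mapping inside this loop can be checked in isolation. The sketch below repeats the same arithmetic: a window found at `(x, y)` on a layer of width `layer_width` is scaled by `orig_width / layer_width` to obtain its box in the original image. The function `to_original_box` is a hypothetical name used for illustration.

```python
def to_original_box(x, y, roi_size, orig_width, layer_width):
    """Map a window found at (x, y) on a pyramid layer back to original pixels.

    roi_size is the (width, height) of the window on the layer.
    """
    r = orig_width / float(layer_width)  # >= 1 for downscaled layers
    w, h = roi_size
    x0, y0 = int(x * r), int(y * r)
    return (x0, y0, x0 + int(w * r), y0 + int(h * r))


print(to_original_box(16, 32, (224, 224), 600, 400))  # (24, 48, 360, 384)
```

A 224×224 window on a layer that is 400 pixels wide, taken from a 600-pixel-wide image, therefore covers a 336×336 region of the original.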

<b>1.3.4 </b> Feeding the ROIs into our pre-trained ResNet-50 image classifier.<br>

After feeding the ROIs to the model, we decode the predictions, keeping only the top prediction for each ROI.

In [None]:
start = time.time()
predictions = model.predict(roi_window)
end = time.time()
print("Time taken: {:.2f} s".format(end - start))

# keep only the top-1 (ImageNet ID, class label, probability) tuple per ROI
predictions = imagenet_utils.decode_predictions(predictions, top=1)

<b>1.3.5 </b> We loop over the predictions, extract the ImageNet ID, class label, and probability of each one, and check whether the minimum confidence has been met <br>(i.e. whether the probability for the ROI is at least the threshold probability defined in the constants earlier).<br>

Then we update the labels dictionary (`labels`), which maps each class label to a list of (bounding box, probability) tuples.

In [None]:
labels = {}

for (i, p) in enumerate(predictions):
    (imagenetID, label, probs) = p[0]
    # filter out weaker detections by keeping only ROIs whose predicted
    # probability is at least the threshold probability
    if probs >= threshold_prob:
        box = loc_in_image[i]
        labels_list = labels.get(label, [])
        labels_list.append((box, probs))
        labels[label] = labels_list

<b> 1.3.6 Non-Maxima Suppression </b> <br>

Since multiple overlapping bounding boxes are detected for each class label, we use non-maxima suppression (NMS), which keeps the bounding boxes with the highest confidence among overlapping ones.

Before that, we visualise all bounding box predictions (before NMS) on the original image.

In [None]:
for label in labels.keys():
    clone = image_list[2].copy()
    for (box, prob) in labels[label]:
        (startX, startY, endX, endY) = box
        cv2.rectangle(clone, (startX, startY), (endX, endY),(0, 255, 0), 2)
    cv2.imshow("Before", clone)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    cv2.waitKey(1)
    clone = image_list[2].copy()

We apply NMS and then visualise the final bounding box predictions in the original image.

In [None]:
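for label in labels.keys():
    # start from a clean copy of the image for every label
    clone = image_list[2].copy()
    boxes = np.array([p[0] for p in labels[label]])
    prob = np.array([p[1] for p in labels[label]])
    # keep only the highest-confidence boxes among overlapping ones
    boxes = non_max_suppression(boxes, prob)
    print(boxes)
    for (startX, startY, endX, endY) in boxes:
        # cast NumPy integers to plain ints for OpenCV's drawing functions
        startX, startY, endX, endY = int(startX), int(startY), int(endX), int(endY)
        cv2.rectangle(clone, (startX, startY), (endX, endY), (0, 255, 0), 2)
        # draw the label above the box, or just inside it if too close to the top
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.putText(clone, label, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
    cv2.imshow("After", clone)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    cv2.waitKey(1)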