#### Implementation of R-CNN architecture using PyTorch

R-CNN (Region-based Convolutional Neural Network) is a pioneering architecture for object detection that combines region proposals, generated by Selective Search rather than by a learned region proposal network, with a CNN for classification and bounding box regression. It was one of the first deep learning methods to achieve high accuracy in object detection tasks.

The R-CNN pipeline consists of the following steps:
1. Run Selective Search on the image to generate region proposals.
2. Compute the IoU (Intersection over Union) of each proposed region with the ground-truth boxes and label each region accordingly.
3. Fine-tune a pretrained CNN (transfer learning) on the proposed regions and their labels.
4. Pass the proposed regions of a new image through the trained classifier to get the final predictions.
5. Apply Non-Maximum Suppression (NMS) to remove overlapping bounding boxes.
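
Step 5 can be illustrated with a minimal greedy NMS sketch in plain Python. It assumes boxes in corner format `(x1, y1, x2, y2)` and one confidence score per box; the names `nms`, `_iou`, and the `iou_threshold` default of 0.5 are illustrative choices, not taken from the original pipeline. IoU itself is explained in the IoU section further down.

```python
def _iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # discard every remaining box that overlaps the kept box too much
        order = [i for i in order if _iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# made-up example: the first two boxes overlap heavily, the third is separate
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In a full detector, NMS is usually applied per class, and a library routine such as `torchvision.ops.nms` would typically replace this loop.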

##### Selective Search
Selective Search is an algorithm used to generate region proposals in R-CNN. It combines the strengths of both exhaustive search and segmentation-based methods to propose regions that are likely to contain objects. The algorithm works by:
1. **Over-segmentation**: The image is over-segmented into small regions using a graph-based segmentation algorithm.
2. **Merging Regions**: The algorithm merges regions based on color, texture, size, and shape compatibility to form larger regions.
3. **Hierarchical Merging**: The merging process is hierarchical, allowing the algorithm to generate regions at different scales.
4. **Region Proposals**: The final output is a set of region proposals that are likely to contain objects, which are then fed into the CNN for further processing.

```python
import cv2

path = "/Users/hinsun/Workspace/ComputerScience/DeepLearning/assets/running.jpg"

# speed up OpenCV using optimized code paths and multiple threads
cv2.setUseOptimized(True)
cv2.setNumThreads(4)

# read image and fail early if it cannot be loaded
im = cv2.imread(path)
if im is None:
    raise FileNotFoundError(f"Could not read image: {path}")

# resize image to a height of 600 px, keeping the aspect ratio
newHeight = 600
newWidth = int(im.shape[1] * newHeight / im.shape[0])
im = cv2.resize(im, (newWidth, newHeight))

# create Selective Search Segmentation Object using default parameters
# (cv2.ximgproc is part of the opencv-contrib-python package)
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

# set input image on which we will run segmentation
ss.setBaseImage(im)
# choose one mode only, since the last switch call overrides earlier ones:
# "quality" yields more proposals but is slower; for speed, call
# ss.switchToSelectiveSearchFast() instead
ss.switchToSelectiveSearchQuality()

# run selective search segmentation on input image;
# each proposal is a rectangle (x, y, w, h)
rects = ss.process()
print(f"Total Number of Region Proposals: {len(rects)}")

# number of region proposals to show
numShowRects = 100
# increment to increase/decrease total number
# of region proposals to be shown
increment = 50

while True:
    # create a copy of original image
    imOut = im.copy()
    # draw the first numShowRects region proposals
    for x, y, w, h in rects[:numShowRects]:
        cv2.rectangle(imOut, (x, y), (x + w, y + h), (0, 255, 0), 1, cv2.LINE_AA)

    # show output
    cv2.imshow("Output", imOut)

    # record key press
    k = cv2.waitKey(0) & 0xFF

    # m is pressed: show more rectangles
    if k == ord("m"):
        numShowRects += increment
    # l is pressed: show fewer rectangles
    elif k == ord("l") and numShowRects > increment:
        numShowRects -= increment
    # q is pressed: quit
    elif k == ord("q"):
        break

# close image show window
cv2.destroyAllWindows()
```

Total Number of Region Proposals: 8830
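
Selective Search returns each proposal as `(x, y, w, h)`, whereas IoU and NMS are often written for corner coordinates `(x1, y1, x2, y2)`. A small conversion helper could look like the sketch below; the name `xywh_to_xyxy` and the sample rectangles are hypothetical and only serve as illustration.

```python
def xywh_to_xyxy(rect):
    """Convert (x, y, w, h) to corner format (x1, y1, x2, y2)."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)


# made-up proposals standing in for the output of ss.process()
rects_demo = [(10, 20, 100, 50), (0, 0, 30, 40)]
boxes_demo = [xywh_to_xyxy(r) for r in rects_demo]
print(boxes_demo)  # [(10, 20, 110, 70), (0, 0, 30, 40)]
```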


##### IoU (Intersection over Union)
Intersection over Union (IoU) is a metric used to evaluate the accuracy of an object detection model. It measures the overlap between the predicted bounding box and the ground truth bounding box. The IoU is calculated as follows:

IoU = Intersection / Union

Where:
- Intersection: The area of the intersection between the predicted bounding box and the ground truth bounding box.
- Union: The total area covered by either the predicted bounding box or the ground truth bounding box (the sum of the two areas minus the intersection).
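
As a worked example, the definition above can be sketched in plain Python. The sketch assumes boxes in corner format `(x1, y1, x2, y2)`; the function name `iou` and the sample boxes are illustrative, not part of the original notebook.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # width and height are clamped at 0 when the boxes do not overlap
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


pred = (50, 50, 150, 150)   # hypothetical predicted box
gt = (100, 100, 200, 200)   # hypothetical ground truth box
print(round(iou(pred, gt), 4))  # 0.1429
```

Here the intersection is 50 × 50 = 2500 and the union is 10000 + 10000 − 2500 = 17500, so IoU = 2500 / 17500 ≈ 0.143.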

IoU ranges from 0 to 1, where 0 indicates no overlap and 1 indicates perfect overlap. Each proposed region is labeled according to its IoU with the ground truth box:
- IoU ≥ 0.5: positive label
- IoU < 0.3: negative label (background)
- 0.3 ≤ IoU < 0.5: ignored (no label assigned)

Note: The ground truth boxes are provided by the dataset.
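
The labeling rule can be sketched as a small function. It assumes the IoU of a proposal with its best-matching ground truth box has already been computed; the name `assign_label`, the parameter names, and the encoding (1 = positive, 0 = background, `None` = ignored) are illustrative choices.

```python
def assign_label(iou_value, pos_threshold=0.5, neg_threshold=0.3):
    """Map an IoU value to 1 (positive), 0 (background), or None (ignored)."""
    if iou_value >= pos_threshold:
        return 1
    if iou_value < neg_threshold:
        return 0
    return None  # ambiguous overlap: leave the proposal out of training


# made-up IoU values for three proposals
for value in (0.72, 0.41, 0.10):
    print(value, assign_label(value))
```

In a multi-class setting, a positive proposal would take the class of the ground truth box it overlaps instead of the generic label 1.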