What is Object Detection?

Unlike image classification (which labels an image as a whole), object detection identifies what objects are in an image and where they are.

Goal: Detect and classify multiple objects in an image.
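
To make the contrast with classification concrete, here is a minimal sketch of what each task returns. The `Detection` class, its field names, and the sample values are illustrative assumptions, not the output format of any particular library.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # predicted class name
    score: float  # confidence in [0, 1]
    box: tuple    # (x, y, w, h) in pixels; (x, y) is the top-left corner

# Image classification: one label for the whole image.
classification_result = "street scene"

# Object detection: one entry per object, each with a class and a location.
detection_result = [
    Detection("person", 0.91, (34, 50, 60, 140)),
    Detection("car", 0.88, (120, 80, 200, 110)),
]

for det in detection_result:
    x, y, w, h = det.box
    print(f"{det.label} ({det.score:.2f}) at x={x}, y={y}, size={w}x{h}")
```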

Common Use Cases:

Self-driving cars (pedestrian detection)

Face detection (often the first step of face recognition)

Medical imaging (tumor detection)

Selective Search (Region Proposal Method)

Selective Search is an algorithm that generates region proposals where objects might exist.

It groups similar pixels to form region proposals.

RCNN and Fast RCNN use it as their region proposal stage.

How Selective Search Works

Start with small regions (superpixels).

Merge similar regions based on color, texture, size, and shape.

Generate object proposals (bounding boxes around possible objects).
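
The merging steps above can be sketched on toy data. This is a simplified illustration, not the OpenCV implementation: it compares regions by mean color only and lets any two regions merge, whereas the real algorithm also uses texture, size, and shape and only merges neighbouring regions. The region values and helper names are made up for the example.

```python
from itertools import combinations

# Toy starting regions: a mean color (R, G, B) and a bounding box (x1, y1, x2, y2).
regions = [
    {"color": (200, 30, 30), "box": (0, 0, 10, 10)},
    {"color": (190, 40, 35), "box": (10, 0, 20, 10)},
    {"color": (20, 20, 220), "box": (0, 10, 10, 20)},
    {"color": (25, 30, 210), "box": (10, 10, 20, 20)},
]

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def color_distance(a, b):
    """Sum of absolute channel differences; smaller means more similar."""
    return sum(abs(p - q) for p, q in zip(a["color"], b["color"]))

def merge(a, b):
    """Union of the two boxes, with a mean color weighted by box area."""
    wa, wb = area(a["box"]), area(b["box"])
    color = tuple((p * wa + q * wb) / (wa + wb)
                  for p, q in zip(a["color"], b["color"]))
    box = (min(a["box"][0], b["box"][0]), min(a["box"][1], b["box"][1]),
           max(a["box"][2], b["box"][2]), max(a["box"][3], b["box"][3]))
    return {"color": color, "box": box}

# Every region that ever exists contributes one proposal box.
proposals = [r["box"] for r in regions]
while len(regions) > 1:
    # Pick the most similar pair and replace it with the merged region.
    a, b = min(combinations(regions, 2), key=lambda pair: color_distance(*pair))
    regions = [r for r in regions if r is not a and r is not b] + [merge(a, b)]
    proposals.append(regions[-1]["box"])

print(proposals)
```

Because a box is recorded at every merge, the proposals cover several scales, from the initial small regions up to the whole image.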

🔹 Pros: Needs no training, so it works even when labeled data is scarce.

🔹 Cons: Slow; generates many regions (~2000 per image).

Implementing Selective Search (Python)

In [None]:
import cv2
import matplotlib.pyplot as plt

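# Load image (cv2.imread returns None if the file cannot be read)
image = cv2.imread('example.jpg')
if image is None:
    raise FileNotFoundError("Could not read 'example.jpg'")

# cv2.ximgproc is part of the opencv-contrib-python package
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()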

# Set the image for Selective Search
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()  # Fast mode

# Get region proposals
rects = ss.process()
print(f"Total region proposals: {len(rects)}")

# Draw first 50 proposals
for i, (x, y, w, h) in enumerate(rects[:50]):
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()
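
Since Selective Search returns a large number of boxes, a common follow-up is to discard very small ones and cap the total before passing them on. The sketch below assumes proposals in the same `(x, y, w, h)` format as `rects` above; `filter_proposals` and its thresholds are hypothetical choices for illustration.

```python
def filter_proposals(rects, min_side=20, max_count=200):
    """Drop boxes with a side shorter than min_side, then keep the first max_count."""
    kept = [tuple(r) for r in rects if r[2] >= min_side and r[3] >= min_side]
    return kept[:max_count]

# Sample proposals in (x, y, w, h) format; values are made up.
rects = [(0, 0, 5, 5), (10, 10, 40, 60), (30, 5, 100, 15), (50, 50, 80, 80)]
kept = filter_proposals(rects)
print(kept)
```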


RCNN (Region-based CNN)

RCNN applies a CNN to each region proposal from Selective Search.

Steps in RCNN

Generate Region Proposals (Selective Search).

Extract Features from each region using a CNN (like AlexNet).

Classify Objects using an SVM (Support Vector Machine).
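
The three steps can be sketched as a loop over proposals. This is a toy illustration in plain Python: `extract_features` and `classify` are hypothetical stand-ins for the CNN and the SVM, and the image is a small grid of numbers rather than a real photo. The point it shows is that the feature extractor runs once per region.

```python
def crop(image, box):
    """Cut the (x, y, w, h) region out of a 2D image stored as a list of rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def warp(patch, size):
    """Nearest-neighbour resize to size x size, since the CNN expects a fixed input size."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def extract_features(patch):
    """Hypothetical stand-in for the CNN: the mean intensity as a 1-element feature."""
    values = [v for row in patch for v in row]
    return [sum(values) / len(values)]

def classify(features):
    """Hypothetical stand-in for the SVM: a simple threshold on the feature."""
    return "object" if features[0] > 0.5 else "background"

# 8x8 toy image: a bright 4x4 square on a dark background.
image = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(8)]
         for r in range(8)]

# Step 1: region proposals in (x, y, w, h) format, as Selective Search would return.
proposals = [(2, 2, 4, 4), (0, 0, 2, 2), (0, 0, 8, 8)]

results = []
for box in proposals:
    patch = warp(crop(image, box), size=4)     # Step 2: crop and warp the region...
    features = extract_features(patch)         # ...then one feature pass per region
    results.append((box, classify(features)))  # Step 3: classify the features

print(results)
```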

Pros: Works well for small datasets.

Cons: Slow, since it runs the CNN on every region (~2000 times per image).

Fast RCNN (Optimized RCNN)

To speed up RCNN:

Run the CNN once per image (not once per region).

Extract each region's features from the shared CNN feature map (RoI pooling).

In [None]:
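import cv2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# Load a pre-trained backbone without its classification head
base_model = ResNet50(weights="imagenet", include_top=False)

# Load the image, convert BGR -> RGB, add a batch dimension, and normalize
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
batch = preprocess_input(np.expand_dims(image.astype("float32"), axis=0))

# Run the CNN once; the resulting feature map is shared by all regions
features = base_model.predict(batch)
print("Feature Map Shape:", features.shape)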
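
Once the feature map exists, each region is reduced to a fixed-size grid by max-pooling inside it (RoI pooling). The sketch below is a simplified version under stated assumptions: a single-channel feature map stored as plain lists, and regions already given in feature-map coordinates. `roi_max_pool` is a hypothetical helper, not a Keras or OpenCV function.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one region of a 2D feature map down to out_size x out_size.

    roi is (x1, y1, x2, y2) in feature-map coordinates, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Row range of output bin i (always at least one row)
        r0 = y1 + (i * h) // out_size
        r1 = max(y1 + ((i + 1) * h) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            # Column range of output bin j (always at least one column)
            c0 = x1 + (j * w) // out_size
            c1 = max(x1 + ((j + 1) * w) // out_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# 4x4 toy feature map with values 0..15.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]

# Two regions of different sizes both come out as 2x2 grids.
print(roi_max_pool(feature_map, (0, 0, 4, 4)))
print(roi_max_pool(feature_map, (1, 1, 4, 4)))
```

Regions of any size produce the same output shape, which is what lets one classifier head handle every proposal without re-running the CNN.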


YOLO (You Only Look Once) - Real-Time Object Detection

Unlike RCNN, YOLO does not use region proposals.

Instead, YOLO splits an image into a grid and predicts bounding boxes + class probabilities in one pass.
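
The grid idea can be sketched as follows. The layout is an assumption modelled on the original YOLO formulation: each cell predicts a box center as an offset inside the cell, a box size as a fraction of the image, an objectness score, and class probabilities. The numbers are made up, and later versions such as YOLOv3 use anchor boxes and a different parameterization.

```python
GRID = 7                 # the image is split into GRID x GRID cells
IMG_W, IMG_H = 448, 448  # image size in pixels

def decode_cell(row, col, x, y, w, h):
    """Convert a cell-relative prediction to a pixel box (left, top, width, height)."""
    center_x = (col + x) / GRID * IMG_W  # x, y: center offset inside the cell, in [0, 1]
    center_y = (row + y) / GRID * IMG_H
    box_w, box_h = w * IMG_W, h * IMG_H  # w, h: box size as a fraction of the image
    return (center_x - box_w / 2, center_y - box_h / 2, box_w, box_h)

# Hypothetical network output for two of the 49 cells:
# (row, col) -> (x, y, w, h, objectness, class probabilities)
predictions = {
    (3, 3): (0.5, 0.5, 0.25, 0.5, 0.9, {"person": 0.8, "car": 0.2}),
    (0, 6): (0.5, 0.5, 0.10, 0.1, 0.1, {"person": 0.5, "car": 0.5}),
}

detections = []
for (row, col), (x, y, w, h, objectness, class_probs) in predictions.items():
    label = max(class_probs, key=class_probs.get)
    score = objectness * class_probs[label]  # class-specific confidence
    if score > 0.5:
        detections.append((label, score, decode_cell(row, col, x, y, w, h)))

print(detections)
```

All cells are decoded from one network output, so there is no per-region CNN pass as in RCNN.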

Implementing YOLO (Using OpenCV)

In [None]:
import cv2
import numpy as np

# Load YOLO model and COCO classes
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
with open("coco.names") as f:
    classes = f.read().strip().split("\n")

# Load image
image = cv2.imread("example.jpg")
height, width = image.shape[:2]

# Prepare image for YOLO
blob = cv2.dnn.blobFromImage(image, 1/255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Get output layer names (avoids indexing differences between OpenCV versions)
output_layers = net.getUnconnectedOutLayersNames()

# Forward pass (get detections)
outputs = net.forward(output_layers)

# Draw detections
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        
        if confidence > 0.5:
            # Scale the normalized (center_x, center_y, w, h) box to pixels as plain ints
            center_x, center_y, w, h = (int(v) for v in detection[:4] * np.array([width, height, width, height]))
            x, y = int(center_x - w / 2), int(center_y - h / 2)
            
            # Draw rectangle
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(image, f"{classes[class_id]}: {confidence:.2f}", (x, y - 10), 
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# Show Image
cv2.imshow("YOLO Detection", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
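
The loop above draws every box that passes the confidence threshold, so one object can end up with several overlapping boxes. A common remedy is non-maximum suppression (NMS). The sketch below is a plain-Python, class-agnostic version for illustration; the function names, the sample boxes, and the 0.4 threshold are assumptions, and OpenCV offers its own routine (`cv2.dnn.NMSBoxes`) for the same purpose.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.4):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two near-duplicate boxes on one object, plus one separate box.
boxes = [(100, 100, 50, 80), (104, 98, 50, 80), (300, 200, 40, 40)]
scores = [0.9, 0.75, 0.6]
print(non_max_suppression(boxes, scores))
```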


Summary of Object Detection Models

| **Model** | **Method** | **Speed** | **Accuracy** | **Best Use Case** |
|---|---|---|---|---|
| **RCNN** | Selective Search + CNN | Slow | High | Medical Imaging |
| **Fast RCNN** | CNN feature extraction | Medium | High | General Object Detection |
| **YOLO** | Single-pass detection | Fast | High | Real-time applications |