# Object Detection Model Comparison

In this notebook, we will test and compare the performance of three different object detection models on a video. The models we will use are:

- **YOLOv7**: A state-of-the-art model known for its accuracy and speed.
- **Faster R-CNN**: A popular two-stage detector, valued for its robustness in object detection tasks.
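- **SSD (Single Shot MultiBox Detector)**: A model that provides a good balance between speed and accuracy. We use torchvision's lightweight SSDlite320 variant with a MobileNetV3-Large backbone, referred to as MobileNet later in the notebook.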

We will process specific frames from the video with each model, visualize the results, and compare their performance. Additionally, we will provide an option to process the entire video if desired and compress the final output for easy download.

Let's begin with loading and setting up the necessary tools.


**Mounting Google Drive**

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


**Setting Up YOLOv7 Environment**

To set up the YOLOv7 environment, we need to:
1. Navigate to our Google Drive directory.
2. Clone the YOLOv7 repository from GitHub (skipped if a previous run already cloned it).
3. Install the packages listed in the repository's `requirements.txt`.

In [2]:
%%bash
cd /content/gdrive/MyDrive
# Clone only if the folder does not exist yet, so the cell can be re-run safely
[ -d yolov7 ] || git clone https://github.com/WongKinYiu/yolov7.git
cd yolov7
# The cloned repository already ships requirements.txt, so no separate download is needed
pip install -r requirements.txt

Collecting numpy<1.24.0,>=1.18.5 (from -r requirements.txt (line 5))
  Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.3 kB)
Collecting thop (from -r requirements.txt (line 36))
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl.metadata (2.7 kB)
Collecting jedi>=0.16 (from ipython->-r requirements.txt (line 34))
  Using cached jedi-0.19.1-py2.py3-none-any.whl.metadata (22 kB)
Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 74.3 MB/s eta 0:00:00
Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Using cached jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: numpy, jedi, thop
  Attempting uninstall: numpy
    Found existing installation: numpy 1.26.4
    Uninstalling numpy-1.26.4:
      Successfully uninstalled numpy-1.26.4
Successfully installed jedi-0.19.1 numpy-1.23.5 thop-0.1.1.post2209072238


Cloning into 'yolov7'...

**Setting Up YOLOv7 Path**

To ensure that we can import YOLOv7 modules, we need to add the YOLOv7 directory to the Python path. This allows us to use functions and classes from YOLOv7 directly in our notebook.
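To see why appending the folder is enough, here is a small self-contained sketch. It writes a throwaway module (`demo_utils`, a hypothetical stand-in for the YOLOv7 packages) into a temporary directory and shows that the module only becomes importable once that directory is on `sys.path`.

```python
import sys
import tempfile
from pathlib import Path

# Create a temporary directory holding a tiny module (a stand-in for the yolov7 repo)
repo_dir = Path(tempfile.mkdtemp())
(repo_dir / "demo_utils.py").write_text("def greet():\n    return 'imported from repo'\n")

# Before the directory is on sys.path, the import system cannot find the module
try:
    import demo_utils
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Appending the directory exposes its top-level modules, just as appending
# /content/gdrive/MyDrive/yolov7 exposes packages such as utils and models
sys.path.append(str(repo_dir))
import demo_utils

print(found_before)        # False
print(demo_utils.greet())  # imported from repo
```

Because `append` puts the folder at the end of the search path, an installed package with the same name (for example a third-party `utils` or `models`) found earlier on `sys.path` would take precedence; `sys.path.insert(0, path)` is a common way to give the repository priority.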


In [3]:
import os
import sys
sys.path.append('/content/gdrive/MyDrive/yolov7')

**Navigating to YOLOv7 Directory**

Before running YOLOv7 commands or scripts, we need to change the current working directory to the YOLOv7 directory in Google Drive.

In [4]:
%cd /content/gdrive/MyDrive/yolov7

/content/gdrive/MyDrive/yolov7


**Downloading File from Google Drive**

To obtain the input video, we use `gdown` to download it directly from Google Drive. The `--fuzzy` option lets `gdown` extract the file ID from a regular share link (the `.../file/d/<ID>/view?usp=sharing` form), so the link can be pasted as-is.
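The download log below shows the share link being turned into a `uc?id=<ID>` URL. The following sketch reproduces that conversion with a regular expression; it is only an illustration of the idea, not `gdown`'s own implementation, and `drive_file_id` and `direct_download_url` are hypothetical helper names.

```python
import re


def drive_file_id(share_url):
    """Extract the file ID from a Drive share link of the form .../file/d/<ID>/view..."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError(f"No Drive file ID found in {share_url!r}")
    return match.group(1)


def direct_download_url(share_url):
    """Build the uc?id=<ID> URL that appears as 'From (original)' in the gdown log."""
    return f"https://drive.google.com/uc?id={drive_file_id(share_url)}"


share = "https://drive.google.com/file/d/18ygYV_jkP3nurU2vDBhErwW9wExSd0vD/view?usp=sharing"
print(direct_download_url(share))
# https://drive.google.com/uc?id=18ygYV_jkP3nurU2vDBhErwW9wExSd0vD
```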



In [5]:
# Replace this URL with the share link of your own video
!gdown --fuzzy https://drive.google.com/file/d/18ygYV_jkP3nurU2vDBhErwW9wExSd0vD/view?usp=sharing

Downloading...
From (original): https://drive.google.com/uc?id=18ygYV_jkP3nurU2vDBhErwW9wExSd0vD
From (redirected): https://drive.google.com/uc?id=18ygYV_jkP3nurU2vDBhErwW9wExSd0vD&confirm=t&uuid=3a0481eb-a265-4ae3-af86-609c3976ac46
To: /content/gdrive/MyDrive/yolov7/4K Road traffic video for object detection and tracking - free download now!.mp4
100% 144M/144M [00:05<00:00, 25.2MB/s]


**Specify Video File Path**

In [6]:
# Full path to the video; gdown saved it in the yolov7 folder (the current directory)
video_path = '/content/gdrive/MyDrive/yolov7/4K Road traffic video for object detection and tracking - free download now!.mp4'

**Object Detection on Specific Frames**

In this section, we will:
1. **Import Required Libraries**: We import libraries for object detection, image processing, and handling video frames.
2. **Define Helper Functions**:
   - `process_frame()`: Converts video frames to tensors and prepares them for model inference.
   - `draw_boxes()`: Draws bounding boxes and labels on the frames based on model predictions.
3. **Compare Object Detection Models**:
   - We load the SSD and Faster R-CNN models.
   - Process specified frames from the video using both models.
   - Draw bounding boxes on the frames and store them for further analysis.
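The filtering inside `draw_boxes()` can be isolated into a plain-Python sketch that runs without a model. The function name `filter_vehicle_detections` and the toy predictions are hypothetical. The torchvision label IDs are the ones hard-coded in `draw_boxes()`; the YOLOv7 indices differ because torchvision keeps the original COCO category numbering with 0 reserved for background, while YOLOv7's `names` list has 80 zero-based classes.

```python
# Class-ID-to-name maps for the two vehicle classes of interest.
# torchvision detectors: COCO category IDs with 0 = background (car = 3, truck = 8).
# YOLOv7 `names`: 80 zero-based classes (car = 2, truck = 7).
TORCHVISION_VEHICLES = {3: "car", 8: "truck"}
YOLOV7_VEHICLES = {2: "car", 7: "truck"}


def filter_vehicle_detections(boxes, labels, scores, label_map, threshold=0.5):
    """Keep car/truck detections whose score exceeds `threshold`.

    `boxes`, `labels` and `scores` are parallel sequences of plain Python values
    (e.g. predictions[0]['boxes'].tolist()). Returns (name, score, box) tuples with
    box coordinates truncated to ints, as draw_boxes does with astype(int).
    """
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        name = label_map.get(label)
        if name is not None and score > threshold:
            kept.append((name, score, [int(v) for v in box]))
    return kept


# Toy predictions (made up for illustration): a car, a person, a low-score truck, another car
boxes = [[10.2, 20.7, 110.9, 80.1], [0, 0, 5, 5], [50, 60, 150, 160], [200, 10, 260, 90]]
labels = [3, 1, 8, 3]
scores = [0.91, 0.99, 0.42, 0.77]

kept = filter_vehicle_detections(boxes, labels, scores, TORCHVISION_VEHICLES)
print(kept)
# [('car', 0.91, [10, 20, 110, 80]), ('car', 0.77, [200, 10, 260, 90])]
```

Passing `YOLOV7_VEHICLES` with the same threshold would apply an identical class filter to YOLOv7's detections, which would make the visual comparison more like-for-like.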

In [7]:
import torch
import torchvision.transforms as T
import cv2
import numpy as np
from torchvision.models.detection import ssdlite320_mobilenet_v3_large, fasterrcnn_resnet50_fpn
from google.colab.patches import cv2_imshow

frames_yolov7 = []
frames_fasterrcnn = []
frames_mobilenet = []

# Load and prepare frame from video
def process_frame(frame, device):
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tensor = T.ToTensor()(image).unsqueeze(0).to(device)
    return tensor

# Draw bounding boxes on frame and label objects
def draw_boxes(frame, predictions, model_name):
    for box, label, score in zip(predictions[0]['boxes'], predictions[0]['labels'], predictions[0]['scores']):
        if score > 0.5:  # Confidence threshold
            box = box.cpu().numpy().astype(int)  # Move box coordinates to CPU
            label_text = "car" if label.item() == 3 else "truck" if label.item() == 8 else None  # torchvision COCO IDs: 3 = car, 8 = truck
            if label_text:
                cv2.rectangle(frame, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)
                cv2.putText(frame, f'{label_text} {score.item():.2f} ({model_name})', (box[0], box[1]-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return frame

# Function to process specific frames using both SSD and Faster R-CNN
def compare_models_on_specific_frames(video_path, frame_numbers):
    # Select device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load COCO-pretrained models (weights="DEFAULT" replaces the deprecated pretrained=True)
    ssd_model = ssdlite320_mobilenet_v3_large(weights="DEFAULT").to(device)
    ssd_model.eval()
    faster_rcnn_model = fasterrcnn_resnet50_fpn(weights="DEFAULT").to(device)
    faster_rcnn_model.eval()

    # Open video file
    cap = cv2.VideoCapture(video_path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    for frame_number in frame_numbers:
        if frame_number >= total_frames:
            print(f"Frame number {frame_number} is out of range. Total frames: {total_frames}")
            continue

        # Set the video frame position to the desired frame
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
        ret, frame = cap.read()
        if not ret:
            print(f"Failed to read frame number {frame_number}.")
            continue

        # Prepare frame for inference
        tensor = process_frame(frame, device)

        # Test with SSD
        with torch.no_grad():
            ssd_predictions = ssd_model(tensor)
        frame_with_ssd_boxes = frame.copy()
        frame_with_ssd_boxes = draw_boxes(frame_with_ssd_boxes, ssd_predictions, "SSD")

        # Test with Faster R-CNN
        with torch.no_grad():
            faster_rcnn_predictions = faster_rcnn_model(tensor)
        frame_with_rcnn_boxes = frame.copy()
        frame_with_rcnn_boxes = draw_boxes(frame_with_rcnn_boxes, faster_rcnn_predictions, "FasterRCNN")


        if frame_with_ssd_boxes is not None:
            frames_mobilenet.append(frame_with_ssd_boxes)
        if frame_with_rcnn_boxes is not None:
            frames_fasterrcnn.append(frame_with_rcnn_boxes)


    cap.release()
    cv2.destroyAllWindows()

# Example usage
frame_numbers = [16, 159, 261, 282, 349]  # Replace with your chosen frame numbers

compare_models_on_specific_frames(video_path, frame_numbers)


Downloading: "https://download.pytorch.org/models/ssdlite320_mobilenet_v3_large_coco-a79551df.pth" to /root/.cache/torch/hub/checkpoints/ssdlite320_mobilenet_v3_large_coco-a79551df.pth
100%|██████████| 13.4M/13.4M [00:00<00:00, 44.9MB/s]
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:01<00:00, 124MB/s]


**Object Detection with YOLOv7**

In this section, we will:

1. **Set Up YOLOv7 Parameters**:
   - Define the configuration options, including weights, image size, confidence threshold, and IoU threshold.

2. **Initialize Video Object**:
   - Open the video file and retrieve video properties like FPS, width, height, and total frame count.
   - Prepare an output video file for saving results.

3. **Define Processing Functions**:
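   - `process_frame()`: Reads the frame with the given index, runs YOLOv7 detection on it, and draws the bounding boxes. Note that this redefines the `process_frame()` helper from the previous section with a different signature.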

4. **Load and Prepare YOLOv7 Model**:
   - Load the YOLOv7 model, set it to inference mode, and configure it for the specified device.

5. **Process Selected Frames**:
   - For each selected frame, process the frame using the YOLOv7 model and store the results for further analysis.
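`letterbox` shrinks each frame to fit the 640-pixel input while keeping its aspect ratio, and `scale_coords` maps the detected boxes back to the original frame. The sketch below reproduces that arithmetic in plain Python under two assumptions: `letterbox` runs with its default `auto=True` (pad only up to a multiple of the stride), and the frames are 3840×2160, as the video's "4K" title suggests. It is an illustration, not YOLOv7's code: `letterbox_geometry` and `unletterbox_box` are hypothetical names, and the real `scale_coords` also clips boxes to the image bounds.

```python
def letterbox_geometry(height, width, new_size=640, stride=32):
    """Scale ratio, (left, top) padding and padded (height, width) for a YOLO-style letterbox."""
    r = min(new_size / height, new_size / width)  # one ratio for both axes keeps the aspect ratio
    new_w, new_h = int(round(width * r)), int(round(height * r))
    # auto=True: pad only up to the next multiple of the stride, not to a full square
    dw, dh = (new_size - new_w) % stride, (new_size - new_h) % stride
    left, top = dw / 2, dh / 2  # padding is split between opposite sides
    return r, (left, top), (new_h + dh, new_w + dw)


def unletterbox_box(box, r, pad):
    """Map an (x1, y1, x2, y2) box from letterboxed coordinates back to the original frame."""
    left, top = pad
    x1, y1, x2, y2 = box
    return ((x1 - left) / r, (y1 - top) / r, (x2 - left) / r, (y2 - top) / r)


r, pad, padded_shape = letterbox_geometry(2160, 3840)
print(pad, padded_shape)  # (0.0, 12.0) (384, 640)

# A box detected in the 384x640 network input, mapped back to 4K pixel coordinates
print([round(v) for v in unletterbox_box((100, 72, 200, 132), r, pad)])  # [600, 360, 1200, 720]
```

Under these assumptions each frame enters the network as a 3 × 384 × 640 tensor rather than a square 640 × 640 one.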



**Special note:** The frames below were hand-picked to highlight the differences between the three models' outputs.

In [10]:
import random
from pathlib import Path
from utils.general import check_img_size, non_max_suppression, scale_coords, set_logging
from utils.torch_utils import select_device, time_synchronized
from utils.datasets import letterbox
from models.experimental import attempt_load
from utils.plots import plot_one_box


opt  = {
    "weights": "yolov7.pt",
    "img-size": 640, # default image size
    "conf-thres": 0.25, # confidence threshold for inference.
    "iou-thres" : 0.45, # NMS IoU threshold for inference.
    "device" : '0',
}



# Initializing video object
video = cv2.VideoCapture(video_path)

# Video information
fps = video.get(cv2.CAP_PROP_FPS)
w = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
nframes = int(video.get(cv2.CAP_PROP_FRAME_COUNT))

# Initializing object for writing video output
output = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))  # 'mp4v' is a codec tag the .mp4 container accepts
torch.cuda.empty_cache()

# User-defined frame numbers for initial evaluation
selected_frames = [16, 159, 261, 282, 349]   # Modify this list to choose specific frames

# Function to process a single frame
def process_frame(frame_number, model, device, half, imgsz, stride, colors, names):
    video.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
    ret, img0 = video.read()
    if ret:
        img = letterbox(img0, imgsz, stride=stride)[0]
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW (3 x H x W)
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Inference
        with torch.no_grad():
            pred = model(img, augment=False)[0]
            pred = non_max_suppression(pred, opt['conf-thres'], opt['iou-thres'], agnostic=False)

        for det in pred:  # detections per image
            if len(det):
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], img0.shape).round()
                for *xyxy, conf, cls in reversed(det):
                    label = f'{names[int(cls)]} {conf:.2f}'
                    plot_one_box(xyxy, img0, label=label, color=colors[int(cls)], line_thickness=3)
        return img0
    else:
        print(f"Failed to read frame number {frame_number}.")
        return None

# Model initialization and setting it for inference
with torch.no_grad():
    weights, imgsz = opt['weights'], opt['img-size']
    set_logging()
    device = select_device(opt['device'])
    half = device.type != 'cpu'
    model = attempt_load(weights, map_location=device)  # Load FP32 model
    stride = int(model.stride.max())  # Model stride
    imgsz = check_img_size(imgsz, s=stride)  # Check img_size
    if half:
        model.half()

    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))

    # Display selected frames
    for frame_num in selected_frames:
        frame = process_frame(frame_num, model, device, half, imgsz, stride, colors, names)
        if frame is not None:
            frames_yolov7.append(frame)

cv2.destroyAllWindows()

Downloading https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt to yolov7.pt...


100%|██████████| 72.1M/72.1M [00:00<00:00, 177MB/s]



Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block


  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
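The `conf-thres` and `iou-thres` entries in `opt` are consumed by `non_max_suppression`. As a rough illustration of what those two numbers control, here is a plain-Python greedy, class-aware NMS (matching `agnostic=False`). It is a simplified sketch, not YOLOv7's implementation, which works on tensors, multiplies objectness by class confidence, and delegates the suppression step to torchvision; `iou` and `greedy_nms` are hypothetical names.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, conf_thres=0.25, iou_thres=0.45):
    """Class-aware greedy NMS over (box, score, class_id) tuples.

    Detections scoring at or below conf_thres are dropped; the rest are visited from
    highest to lowest score, and a box is discarded if it overlaps an already kept box
    of the same class with IoU above iou_thres.
    """
    candidates = sorted((d for d in detections if d[1] > conf_thres),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, cls in candidates:
        if all(k_cls != cls or iou(box, k_box) <= iou_thres for k_box, _, k_cls in kept):
            kept.append((box, score, cls))
    return kept


# Toy detections (made up): class 2 = car, class 7 = truck in YOLOv7's COCO names
detections = [
    ((0, 0, 100, 100), 0.90, 2),
    ((5, 5, 105, 105), 0.80, 2),      # same car, IoU ~0.82 with the first box -> suppressed
    ((5, 5, 105, 105), 0.70, 7),      # different class, so it is not compared -> kept
    ((300, 300, 400, 400), 0.20, 2),  # below conf_thres -> dropped
]
survivors = greedy_nms(detections)
print(survivors)
# [((0, 0, 100, 100), 0.9, 2), ((5, 5, 105, 105), 0.7, 7)]
```

Lowering `iou_thres` suppresses more overlapping boxes; raising `conf_thres` discards more low-confidence detections before suppression even starts.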


**Combining and Displaying Object Detection Results**

In this section, we will:

1. **Resize Frames**:
   - Resize the frames produced by the different object detection models (YOLOv7, Faster R-CNN, and SSD/MobileNet) to the same dimensions, because `np.hstack` can only join images of equal height.

2. **Combine Frames**:
   - Horizontally concatenate the resized frames from the different models to compare their detection results in a single view.

3. **Display Combined Frames**:
   - Show each combined frame inline with `cv2_imshow`. Colab has no GUI window, so the frames are displayed one after another without waiting for a key press.

In [11]:
for fasterrcnn_frame, mobilenet_frame, yolov7_frame in zip(frames_fasterrcnn, frames_mobilenet, frames_yolov7):
    # Resize frames to ensure they have the same dimensions
    yolov7_frame_resized = cv2.resize(yolov7_frame, (640, 640))  # Adjust size as needed
    fasterrcnn_frame_resized = cv2.resize(fasterrcnn_frame, (640, 640))  # Adjust size as needed
    mobilenet_frame_resized = cv2.resize(mobilenet_frame, (640, 640))  # Adjust size as needed

    # Combine frames horizontally: SSD (MobileNet) | Faster R-CNN | YOLOv7
    combined_frame = np.hstack((mobilenet_frame_resized, fasterrcnn_frame_resized, yolov7_frame_resized))

    # Display the combined frame inline in the notebook
    cv2_imshow(combined_frame)

Output hidden; open in https://colab.research.google.com to view.
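Resizing every frame to `(640, 640)` squashes a 16:9 frame into a square, which distorts the vehicles being compared. A small hypothetical helper can compute a size that keeps the aspect ratio instead; note that `cv2.resize` takes `dsize` as `(width, height)`. Since all three frames come from the same video, they all get the same height, which is what `np.hstack` needs. The 3840×2160 frame size is an assumption based on the video's title.

```python
def fit_to_width(width, height, target_width=640):
    """Return a (width, height) dsize for cv2.resize that keeps the frame's aspect ratio."""
    scale = target_width / width
    return target_width, int(round(height * scale))


# Assumed 4K UHD frame size (3840 x 2160), based on the video's title
dsize = fit_to_width(3840, 2160)
print(dsize)  # (640, 360)
```

In the cell above this would be used as, for example, `cv2.resize(yolov7_frame, fit_to_width(w, h))`, with `w` and `h` read from the video earlier.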

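**Conclusion:** In a visual comparison of the selected frames, YOLOv7 gives the best detections of the three models, so we use it to process the entire video. This is a qualitative check on five hand-picked frames, and the settings differ: the SSD and Faster R-CNN frames show only cars and trucks scoring above 0.5, while YOLOv7 draws every class above its 0.25 confidence threshold.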

**Processing the Entire Video**

In this section, we will:

1. **Prompt the User**:
   - Ask the user if they want to process the entire video after displaying selected frames. This allows for flexibility in choosing whether to process all frames or just the selected ones.

2. **Process the Video**:
   - If the user chooses to process the entire video, iterate through all the frames:
     - Read each frame and run YOLOv7 detection on it, drawing the bounding boxes onto the frame.
     - Write the processed frames to the output video file (`output.mp4`).

3. **Release Resources**:
   - Release the video writer and video capture objects to free up system resources.
4. **Note**:
   - Processing every frame takes a long time on typical Colab GPUs (this video has 9,184 frames), so you can skip this step and directly download the processed file with the cells at the end of the notebook.
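Printing a line for every frame floods the notebook: with 9,184 frames, Colab truncates the cell's log to its last 5000 lines. A hypothetical alternative is to report progress only every N frames. The sketch below shows that counting logic on its own; inside the real loop, the same `if` condition would guard the `print`.

```python
def progress_messages(nframes, every=100):
    """Yield a progress line every `every` frames, plus one for the final frame."""
    for j in range(nframes):
        done = j + 1
        if done % every == 0 or done == nframes:
            yield f"{done}/{nframes} frames processed"


print(list(progress_messages(250)))
# ['100/250 frames processed', '200/250 frames processed', '250/250 frames processed']
```

For this video, `every=100` would print 92 lines instead of 9,184.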



In [12]:
# Ask the user whether to process the entire video after reviewing the selected frames
process_full_video = input("Do you want to process the entire video? (y/n): ").strip().lower()
if process_full_video == 'y':
    for j in range(nframes):
        frame = process_frame(j, model, device, half, imgsz, stride, colors, names)
        if frame is not None:
            print(f"{j + 1}/{nframes} frames processed")
            output.write(frame)
        else:
            break  # stop at the first frame that cannot be read

# Release the writer (finalizing output.mp4) and the capture
output.release()
video.release()

Streaming output truncated to the last 5000 lines.
4185/9184 frames processed
4186/9184 frames processed
...
4218/9184 frames processed

# Video Compression

To manage the large video file size, we compress the video with `FFmpeg`, which reduces the file size by lowering the video bitrate and scaling the frames down to a width of 640 pixels.

## Steps:

1. **Define Input and Output Paths**: Specify where the original and compressed video files are located.
2. **Set Compression Parameters**: Use `FFmpeg` to compress the video by adjusting the bitrate and resolution.
3. **Execute Compression**: Run the compression command to create a smaller, more manageable video file.
4. **Note**: Compression also takes some time, so you can skip this step and download the compressed file directly with the last cell of the notebook.
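Because `-b:v 1000k` sets a target average video bitrate, the compressed size depends mainly on the video's duration rather than on its resolution. The back-of-the-envelope sketch below estimates that size; `estimated_video_mb` is a hypothetical helper, and the 30 fps frame rate is an assumption, since the notebook reads `fps` from the file but never prints it. The real file will deviate somewhat because the encoder treats the bitrate as a target.

```python
def estimated_video_mb(bitrate_kbps, duration_s):
    """Rough size of a video stream in megabytes (10**6 bytes) at a given average bitrate.

    kbit/s x s = kilobits; / 8 -> kilobytes; / 1000 -> megabytes.
    Audio and container overhead are ignored.
    """
    return bitrate_kbps * duration_s / 8 / 1000


print(estimated_video_mb(1000, 60))  # 7.5

# Hypothetical: 9,184 frames at an assumed 30 fps is about 306 s of video
duration_s = 9184 / 30
print(round(estimated_video_mb(1000, duration_s), 1))  # 38.3
```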




In [13]:
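import subprocess

# Define input and output file paths
input_video_path = '/content/gdrive/MyDrive/yolov7/output.mp4'
output_video_path = '/content/gdrive/MyDrive/yolov7/compressed_video.mp4'

# Compression command as an argument list (no shell quoting needed):
# -y overwrites an existing output file instead of waiting for a prompt,
# -b:v sets the target video bitrate (e.g., 1000k for about 1 Mbps; adjust as needed),
# scale=640:-2 sets the width to 640 px and keeps the aspect ratio with an even height
compression_command = ["ffmpeg", "-y", "-i", input_video_path,
                       "-b:v", "1000k", "-vf", "scale=640:-2", output_video_path]

# Run the compression command; a return code of 0 means ffmpeg succeeded
subprocess.run(compression_command).returncode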


0

**Directly download output video**

In [None]:
import os
from google.colab import files

# Path to the processed (uncompressed) video; adjust it to where your copy is stored
output_video_path = '/content/gdrive/MyDrive/Videos/output.mp4'
# Ensure the output video file exists
if os.path.exists(output_video_path):
    # Download the video file
    files.download(output_video_path)
else:
    print("Output video file not found.")

**Directly download compressed video**

In [None]:
import os
from google.colab import files

# Path to the compressed video; adjust it to where your copy is stored
output_video_path = '/content/gdrive/MyDrive/Videos/compressed_video.mp4'
# Ensure the compressed video file exists
if os.path.exists(output_video_path):
    # Download the video file
    files.download(output_video_path)
else:
    print("Compressed video file not found.")