# Part-1

## Step 1 - Source video collection and frame extraction

In [15]:
import cv2
import os

video_path = 'whatsapp_video_converted.mp4'
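vidcap = cv2.VideoCapture(video_path)
# Fail early if the file is missing or cannot be decoded
if not vidcap.isOpened():
    raise IOError(f"Could not open video: {video_path}")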

frames_dir = '/Users/shabarivignesh/Data 255 - Deep Learning/frames/'

# Create the output directory if it does not already exist
os.makedirs(frames_dir, exist_ok=True)

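# Read the first frame; `success` is False once no frame can be read
success, image = vidcap.read()
count = 0
while success:
    # Save the current frame as frame<count>.jpg
    frame_filename = os.path.join(frames_dir, f"frame{count}.jpg")
    cv2.imwrite(frame_filename, image)

    print(f'Saved {frame_filename}')

    # Try to read the next frame; the loop ends when this fails
    success, image = vidcap.read()
    print('Read a new frame:', success)

    count += 1

# Release the capture handle
vidcap.release()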


Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame0.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame1.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame2.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame3.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame4.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame5.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame6.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame7.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame8.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/frame9.jpg
Read a new frame: True
Saved /Users/shabarivignesh/Data 255 - Deep Learning/frames/

This cell uses OpenCV (`cv2`) to read a video file frame by frame and save each frame as a `.jpg` image in the output directory, which is created if it does not already exist. Frames are written sequentially as `frame0.jpg`, `frame1.jpg`, and so on, until `vidcap.read()` reports that no more frames are available. The capture object is then released to free its resources.
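
The `frame{count}.jpg` naming has one side effect worth noting: a plain `sorted()` over these names orders them lexicographically, so `frame10.jpg` comes before `frame2.jpg`. One possible way around this is to zero-pad the index so that lexicographic and numeric order agree. The sketch below is illustrative only; the helper `frame_name` and the padding width of 6 digits are assumptions, not part of the notebook.

```python
def frame_name(index: int, width: int = 6) -> str:
    """Build a zero-padded frame filename, e.g. frame000012.jpg.

    `frame_name` and the default width are illustrative assumptions,
    not part of the original notebook.
    """
    return f"frame{index:0{width}d}.jpg"


# Unpadded names sort lexicographically, not by frame number.
unpadded = sorted(f"frame{i}.jpg" for i in (0, 1, 2, 10))

# Padded names sort in frame order with a plain sorted().
padded = sorted(frame_name(i) for i in (0, 1, 2, 10))

print(unpadded)  # ['frame0.jpg', 'frame1.jpg', 'frame10.jpg', 'frame2.jpg']
print(padded)    # ['frame000000.jpg', 'frame000001.jpg', 'frame000002.jpg', 'frame000010.jpg']
```

With padded names, any later step that lists the directory could rely on `sorted()` alone, at the cost of fixing a maximum frame count in advance.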

## Step 2 - Vehicle detection and bounding box inference on video frames

In [19]:
import torch
import torchvision
import cv2
from torchvision.transforms import functional as F
import os

COCO_LABELS = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',
    'traffic light', 'fire hydrant', 'N/A', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee',
    'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
    'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A', 'N/A', 'toilet',
    'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
    'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

input_frames_dir = '/Users/shabarivignesh/Data 255 - Deep Learning/frames/'
output_frames_dir = '/Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/'

# Create the output directory if it does not already exist
os.makedirs(output_frames_dir, exist_ok=True)

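def draw_bounding_boxes(image, boxes, labels):
    """Draw each box and its COCO label on `image` (BGR, modified in place)."""
    for box, label in zip(boxes, labels):
        x1, y1, x2, y2 = map(int, box)
        # OpenCV colors are BGR: (255, 0, 0) is blue, (0, 255, 0) is green
        cv2.rectangle(image, (x1, y1), (x2, y2), color=(255, 0, 0), thickness=2)
        label_text = COCO_LABELS[label]
        # Keep the label inside the frame when the box touches the top edge
        cv2.putText(image, label_text, (x1, max(y1 - 10, 20)), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    return image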

frame_filenames = sorted([f for f in os.listdir(input_frames_dir) if f.endswith('.jpg')])

for frame_filename in frame_filenames:
    frame_path = os.path.join(input_frames_dir, frame_filename)
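    image = cv2.imread(frame_path)
    # OpenCV loads images as BGR, but the torchvision model expects RGB input
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_tensor = F.to_tensor(image_rgb).unsqueeze(0)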

    with torch.no_grad():
        predictions = model(img_tensor)

    boxes = predictions[0]['boxes'].cpu().numpy()
    labels = predictions[0]['labels'].cpu().numpy()

    image_with_boxes = draw_bounding_boxes(image, boxes, labels)

    output_frame_path = os.path.join(output_frames_dir, frame_filename)
    cv2.imwrite(output_frame_path, image_with_boxes)
    print(f'Saved frame with bounding boxes and labels: {output_frame_path}')

Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame0.jpg
Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame1.jpg
Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame10.jpg
Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame100.jpg
Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame101.jpg
Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame102.jpg
Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame103.jpg
Saved frame with bounding boxes and labels: /Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/frame104.jpg
Saved frame with bounding box

This script uses a Faster R-CNN model with a ResNet-50 FPN backbone, pre-trained on COCO, to detect objects in the extracted frames. Each frame is read from the input directory, converted to a tensor, and passed to the model in evaluation mode with gradient tracking disabled. Every detection the model returns is drawn on the frame as a bounding box with its COCO class label; no confidence threshold or class filter is applied, so low-scoring detections and non-vehicle objects are drawn as well. The annotated frames are saved to an output directory, which is created if it does not already exist.
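
Because the step is titled vehicle detection, it may help to keep only vehicle classes above some confidence. torchvision detection models return a `scores` entry alongside `boxes` and `labels`, so a filter can be applied before drawing. The following is a minimal sketch under stated assumptions: the helper `filter_vehicle_detections`, the choice of which COCO ids count as vehicles, and the 0.5 threshold are all illustrative and not taken from the notebook.

```python
from typing import Iterable, List, Sequence, Tuple

# COCO category ids treated as vehicles in this sketch (an assumption;
# bicycle=2 and train=7 could reasonably be added).
VEHICLE_LABELS = {3: "car", 4: "motorcycle", 6: "bus", 8: "truck"}

Detection = Tuple[List[float], str, float]


def filter_vehicle_detections(
    boxes: Iterable[Sequence[float]],
    labels: Iterable[int],
    scores: Iterable[float],
    score_threshold: float = 0.5,  # illustrative default, not from the notebook
) -> List[Detection]:
    """Keep detections that are vehicles and score at least `score_threshold`."""
    kept: List[Detection] = []
    for box, label, score in zip(boxes, labels, scores):
        name = VEHICLE_LABELS.get(int(label))
        if name is not None and float(score) >= score_threshold:
            kept.append((list(box), name, float(score)))
    return kept


# Toy detections in (x1, y1, x2, y2) format, standing in for model output.
boxes = [[10, 20, 110, 90], [5, 5, 40, 60], [200, 50, 320, 140], [0, 0, 30, 30]]
labels = [3, 1, 8, 3]  # car, person, truck, car
scores = [0.92, 0.88, 0.71, 0.30]

vehicles = filter_vehicle_detections(boxes, labels, scores)
for box, name, score in vehicles:
    print(name, score, box)
```

In the notebook, the inputs would be `predictions[0]['boxes']`, `predictions[0]['labels']`, and `predictions[0]['scores']` after `.cpu().numpy()`; the person and the low-scoring car in the toy data are dropped.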

## Step 3 - Compile image frames into a video

In [25]:
import os
import re

import cv2

input_frames_dir = '/Users/shabarivignesh/Data 255 - Deep Learning/frames_with_boxes/'
output_video_path = '/Users/shabarivignesh/Data 255 - Deep Learning/output_video.mp4'

def numerical_sort(value):
    # Sort key: the first run of digits in the filename (e.g. 12 for frame12.jpg)
    numbers = re.findall(r'\d+', value)
    return int(numbers[0]) if numbers else -1

frame_filenames = sorted([f for f in os.listdir(input_frames_dir) if f.endswith('.jpg')], key=numerical_sort)

first_frame_path = os.path.join(input_frames_dir, frame_filenames[0])
first_frame = cv2.imread(first_frame_path)
height, width, layers = first_frame.shape

# Encode with the 'mp4v' codec at a fixed frame rate
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fps = 30  # hard-coded; not read from the source video
output_video = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))


for frame_filename in frame_filenames:
    frame_path = os.path.join(input_frames_dir, frame_filename)
    frame = cv2.imread(frame_path)
    output_video.write(frame)


output_video.release()

print(f"Video saved to {output_video_path}")


Video saved to /Users/shabarivignesh/Data 255 - Deep Learning/output_video.mp4


This script combines the annotated frames into a video. It lists the `.jpg` files in the input directory, sorts them by the number in each filename so that `frame10.jpg` follows `frame9.jpg` rather than `frame1.jpg`, and writes them with OpenCV's `VideoWriter` to an `.mp4` file using the `mp4v` codec. The first frame determines the video's width and height. The frame rate is fixed at 30 fps, so the output plays at the original speed only if the source video was also recorded at 30 fps. Once all frames are written, the writer is released and the file is finalized at the output path.
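
The effect of the fixed frame rate can be worked out with simple arithmetic: the same frames written at a different rate change the playback duration. The sketch below illustrates this with hypothetical numbers; the helpers `playback_duration` and `speed_factor` and the 24 fps source rate are assumptions for illustration, not values from the notebook.

```python
def playback_duration(frame_count: int, fps: float) -> float:
    """Duration in seconds of `frame_count` frames played at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def speed_factor(source_fps: float, output_fps: float) -> float:
    """How much faster (>1) or slower (<1) the output plays than the source."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return output_fps / source_fps


# Hypothetical numbers: 300 frames recorded at 24 fps, written out at 30 fps.
print(playback_duration(300, 24))  # 12.5
print(playback_duration(300, 30))  # 10.0
print(speed_factor(24, 30))        # 1.25
```

To avoid the mismatch, the source rate could be read in Step 1 with `vidcap.get(cv2.CAP_PROP_FPS)` and passed to `VideoWriter` in place of the constant; some files report 0 for this property, so a fallback value would still be needed.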