### Step 1: Collect a Source Video and Divide into Frames

In [2]:
pip install opencv-python

Collecting opencv-python
  Downloading opencv_python-4.10.0.84-cp37-abi3-macosx_11_0_arm64.whl.metadata (20 kB)
Downloading opencv_python-4.10.0.84-cp37-abi3-macosx_11_0_arm64.whl (54.8 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.10.0.84
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install opencv-python-headless

Collecting opencv-python-headless
  Downloading opencv_python_headless-4.10.0.84-cp37-abi3-macosx_11_0_arm64.whl.metadata (20 kB)
Downloading opencv_python_headless-4.10.0.84-cp37-abi3-macosx_11_0_arm64.whl (54.8 MB)
Installing collected packages: opencv-python-headless
Successfully installed opencv-python-headless-4.10.0.84
Note: you may need to restart the kernel to use updated packages.


In [4]:
import cv2
# This will print the OpenCV version if the installation was successful
print(cv2.__version__)  


4.10.0


In [5]:
import cv2
import os

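# Create a directory to store the video frames (no error if it already exists)
os.makedirs('video_frames', exist_ok=True)

# Load the source video and fail early if it cannot be opened
video_path = 'source.mp4'
vidcap = cv2.VideoCapture(video_path)
if not vidcap.isOpened():
    raise IOError(f"Could not open video: {video_path}")
success, image = vidcap.read()
count = 0

# Loop through the video and save each frame as an image
while success:
    # Save the frame as a JPEG file with a zero-padded index (frame0000.jpg, ...)
    cv2.imwrite(f"video_frames/frame{count:04d}.jpg", image)
    success, image = vidcap.read()
    count += 1

# Release the file handle held by the capture object
vidcap.release()
print(f"{count} frames extracted.")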

309 frames extracted.


1. cv2.VideoCapture() opens the video file.
2. The while success: loop reads each frame, and cv2.imwrite() saves it as a JPEG image in the video_frames directory.
3. Each frame is saved under a zero-padded file name, starting from zero: frame0000.jpg, frame0001.jpg, and so on.

The video is now divided into discrete image frames.
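
The zero-padded file names matter because the later steps rely on sorted(os.listdir(...)) to recover frame order, and that sort is lexicographic. The sketch below is only an illustration with made-up frame indices: it compares padded and unpadded names, and shows a hypothetical helper, frame_index(), that sorts numerically as a fallback (for example, if a video had more than 9999 frames).

```python
import re

indices = [0, 1, 2, 10, 11, 100]

# Names as written by the extraction cell (zero-padded to 4 digits)
padded = [f"frame{i:04d}.jpg" for i in indices]
# Names as they would look without padding
unpadded = [f"frame{i}.jpg" for i in indices]


def frame_index(name):
    """Hypothetical helper: extract the numeric index from a frame file name."""
    match = re.search(r"(\d+)", name)
    if match is None:
        raise ValueError(f"No frame index in {name!r}")
    return int(match.group(1))


# Lexicographic sort keeps padded names in chronological order...
print(sorted(padded))
# ...but scrambles unpadded ones (frame10.jpg sorts before frame2.jpg)
print(sorted(unpadded))
# Sorting on the numeric index restores the order either way
print(sorted(unpadded, key=frame_index))
```

With four digits of padding, plain sorted() is sufficient for the 309 frames extracted here.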

### Step 2: Run Inference on Each Frame and Draw Bounding Boxes Around Detected Vehicles

In [8]:
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image
import cv2

# Load the pre-trained Faster R-CNN model
model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Check if a GPU is available, if not use the CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

# COCO label indices (as used by torchvision detection models) for traffic-related objects
TRAFFIC_LABELS = {
    1: 'person',
    2: 'bicycle',
    3: 'car',
    4: 'motorbike',
    6: 'bus',
    8: 'truck',
    10: 'traffic light',
    13: 'stop sign',  # index 12 is an unused placeholder in torchvision's COCO category list
}
def detect_vehicles(image_path):
    # Load image using PIL
    image = Image.open(image_path).convert("RGB")

    # Transform image to tensor
    image_tensor = F.to_tensor(image).unsqueeze(0).to(device)

    # Perform inference
    with torch.no_grad():
        predictions = model(image_tensor)

    # Extract bounding boxes, labels, and scores
    boxes = predictions[0]['boxes'].cpu().numpy()
    labels = predictions[0]['labels'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()

    return boxes, labels, scores

def draw_boxes_for_traffic_objects(image_path, boxes, labels, scores, threshold=0.5):
    image = cv2.imread(image_path) 
    for i, box in enumerate(boxes):
        # Only draw boxes for traffic-related objects with a score above the threshold
        if labels[i] in TRAFFIC_LABELS and scores[i] > threshold:
            # Extract bounding box coordinates
            x1, y1, x2, y2 = map(int, box)
            # Draw bounding box
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Add label to the bounding box
            cv2.putText(image, TRAFFIC_LABELS[labels[i]], (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)

    return image

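# Process all frames
output_dir = 'output_frames'
os.makedirs(output_dir, exist_ok=True)  # os was imported in the frame-extraction cell

# List all frames in order (zero-padded names sort chronologically), skipping non-image files
frame_list = sorted(f for f in os.listdir('video_frames') if f.endswith('.jpg'))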

# Run detection and save results
for frame_file in frame_list:
    frame_path = os.path.join('video_frames', frame_file)
    boxes, labels, scores = detect_vehicles(frame_path)
    result_frame = draw_boxes_for_traffic_objects(frame_path, boxes, labels, scores)
    cv2.imwrite(os.path.join(output_dir, frame_file), result_frame)

print(f"Processed {len(frame_list)} frames.")

Processed 309 frames.


1. Faster R-CNN is an object detection model that locates objects in an image and predicts a bounding box, class label, and confidence score for each one. The pretrained=True flag loads weights from a model pre-trained on the COCO dataset; torchvision 0.13 and later deprecate this flag in favor of the weights argument.

2. model.eval() puts the model into evaluation mode, so that layers such as dropout and batch normalization behave as intended at inference time. For torchvision detection models it also makes the forward pass return predictions rather than training losses.
3. TRAFFIC_LABELS maps COCO class indices to names for common traffic-related objects such as person, car, bus, and truck. It is used later to decide which detections to draw and how to label them.

4. Loading the image: The image is loaded and converted to RGB format using PIL.
5. Converting to a tensor: F.to_tensor() converts the image into a float PyTorch tensor with values scaled to [0, 1], and unsqueeze(0) adds the batch dimension.
6. Performing inference: The model is applied to the image tensor to make predictions. torch.no_grad() ensures no gradients are computed, speeding up inference and saving memory.
7. Extracting results: The model outputs bounding boxes (boxes), object labels (labels), and confidence scores (scores), which are moved to the CPU and converted to NumPy arrays for easy manipulation.
8. Thresholding: In draw_boxes_for_traffic_objects(), only detections whose label is in TRAFFIC_LABELS and whose confidence score exceeds the threshold (0.5 by default) are drawn.
9. Drawing boxes: For each detection that passes the filter, the bounding box with corners (x1, y1) and (x2, y2) is drawn on the image using cv2.rectangle(), and the corresponding traffic object label is added using cv2.putText().

10. Processing the frames: Each frame from the video is loaded from the video_frames folder. The detect_vehicles() function is applied to the frame to get the bounding boxes, labels, and scores.
11. Saving the results: After drawing bounding boxes, the modified frames are saved in the output_frames directory.
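
The label-and-threshold filter described above can also be applied without drawing anything, for example to count how many objects of each class a frame contains. The following is a minimal sketch, not part of the original notebook: count_detections() and LABEL_NAMES are hypothetical names, the mapping is a small subset chosen for illustration, and the labels and scores are made-up values standing in for one frame's model output.

```python
from collections import Counter

# Hypothetical subset of the class-index-to-name mapping, for illustration only
LABEL_NAMES = {1: 'person', 3: 'car', 6: 'bus', 8: 'truck'}


def count_detections(labels, scores, label_names, threshold=0.5):
    """Count detections per class, keeping only known labels above the threshold."""
    counts = Counter()
    for label, score in zip(labels, scores):
        label = int(label)  # also accepts NumPy integer labels
        if label in label_names and score > threshold:
            counts[label_names[label]] += 1
    return counts


# Made-up detections for a single frame
labels = [3, 3, 1, 8, 62, 3]
scores = [0.98, 0.42, 0.77, 0.51, 0.99, 0.50]

frame_counts = count_detections(labels, scores, LABEL_NAMES)
print(frame_counts)
```

The comparison is strict (score > threshold), matching the drawing function, so a detection scoring exactly 0.5 is dropped. Label 62 is ignored because it is not in the mapping, however confident the model is.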

### Step 3: Formatting the Results Back into a Video

In [9]:
def frames_to_video(output_dir, output_video_path, fps=30):
    frame_files = sorted([f for f in os.listdir(output_dir) if f.endswith('.jpg')])
    first_frame = cv2.imread(os.path.join(output_dir, frame_files[0]))
    height, width, layers = first_frame.shape

    # Define the codec and create VideoWriter object
    video = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    for frame_file in frame_files:
        frame = cv2.imread(os.path.join(output_dir, frame_file))
        video.write(frame)

    video.release()
    print(f"Video saved to {output_video_path}")

# Combine frames into video
output_video_path = 'detected_vehicles_video.mp4'
frames_to_video('output_frames', output_video_path)


Video saved to detected_vehicles_video.mp4


1. The function above takes the individual image frames from a specified directory, reads them in sorted order, and writes them into a video file using OpenCV's VideoWriter. The frame size passed to VideoWriter is taken from the first frame, so all frames must share that size.

2. The output is a video that plays the annotated frames at the specified frame rate (30 fps by default). Playback matches the original speed only if this value equals the source video's frame rate.
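
Because fps is hardcoded to 30, the output plays faster or slower than the source whenever the source was recorded at a different rate. The sketch below shows one way to reason about this; choose_fps() and playback_seconds() are hypothetical helpers, not part of the notebook. It assumes the source rate is read with vidcap.get(cv2.CAP_PROP_FPS), which may report 0 or another unusable value for some files, hence the fallback.

```python
import math


def choose_fps(reported_fps, fallback=30.0):
    """Return the reported frame rate if it is usable, otherwise the fallback."""
    if reported_fps is None or math.isnan(reported_fps) or reported_fps <= 0:
        return fallback
    return float(reported_fps)


def playback_seconds(n_frames, fps):
    """Duration of a video with n_frames frames played at fps frames per second."""
    return n_frames / fps


# In the notebook, the reported value could be obtained with:
#   reported = cv2.VideoCapture('source.mp4').get(cv2.CAP_PROP_FPS)
# Here a few example values are used instead.
for reported in (24.0, 0.0, float('nan')):
    fps = choose_fps(reported)
    print(f"reported={reported} -> fps={fps}, "
          f"duration of 309 frames: {playback_seconds(309, fps):.2f} s")
```

The chosen value could then be passed as frames_to_video('output_frames', output_video_path, fps=fps).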