# Vehicle Detection in Videos Using Faster R-CNN with ResNet101 and PyTorch

### Vehicle detection is a core task in computer vision, with applications ranging from autonomous driving to traffic management. In this tutorial, we use Faster R-CNN, a widely used object detection model, to detect vehicles in a video, with PyTorch and torchvision as the deep learning framework.

### The goal is to run a pre-trained Faster R-CNN model on a video, frame by frame, and save an annotated copy. Note that the code loads torchvision's `fasterrcnn_resnet50_fpn`, whose backbone is ResNet-50 with a feature pyramid network rather than ResNet101. This tutorial assumes a basic understanding of Python and PyTorch.

<figure>
        <img src="https://storage.googleapis.com/kaggle-datasets-images/4733451/8031295/9065058552fd8c6529abfb736ba03b32/dataset-cover.png?t=2024-04-04-19-51-20" alt="Vehicle detection dataset cover" style='width:800px;height:500px;'>
</figure>

In [8]:
!pip install ipywidgets


In [9]:
import cv2
import torch
import shutil
from torchvision import models, transforms

## Define the transformation for the input video frames

In [10]:
transform = transforms.Compose([
    transforms.ToTensor()
])
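To make the effect of `transforms.ToTensor()` concrete, the sketch below reproduces it with NumPy on a synthetic frame: an H x W x C `uint8` image becomes a C x H x W `float32` array scaled to the range [0, 1]. The helper name `to_chw_float` and the tiny test frame are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def to_chw_float(frame_hwc: np.ndarray) -> np.ndarray:
    """Mimic transforms.ToTensor() for a uint8 H x W x C image (hypothetical helper)."""
    # Move channels first: (H, W, C) -> (C, H, W)
    chw = np.transpose(frame_hwc, (2, 0, 1))
    # Scale 0..255 integers to 0.0..1.0 floats
    return chw.astype(np.float32) / 255.0

# Synthetic 2x3 RGB frame with every pixel set to (255, 128, 0)
frame = np.zeros((2, 3, 3), dtype=np.uint8)
frame[..., 0] = 255
frame[..., 1] = 128

tensor_like = to_chw_float(frame)
print(tensor_like.shape)     # channels-first shape
print(tensor_like[:, 0, 0])  # per-channel values of the top-left pixel
```

In the notebook, the later `unsqueeze(0)` call adds a leading batch dimension on top of this layout, since the model expects a batch of images.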

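## Load the pre-trained Faster R-CNN model

In [11]:
# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO.
# Newer torchvision releases prefer weights="DEFAULT" over pretrained=True.
model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()  # switch to inference mode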

FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=0.0)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=0.0)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=0.0)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=0.0)
          (relu): ReLU(

## Define the video path

In [12]:
video_path = '/kaggle/input/vehicle-detection-image-dataset/Sample_Video_HighQuality.mp4'

## Open the video file

In [13]:
cap = cv2.VideoCapture(video_path)

## Get video properties

In [14]:
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
output_path = '/kaggle/working/output_video.mp4' 
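OpenCV can report a frame rate of 0 when the backend cannot read it from the file, and a `VideoWriter` created with such a value may produce an unplayable video. The following is a minimal sketch of a guard, assuming a fallback of 30 frames per second is acceptable; the helper name `safe_fps` and the default value are hypothetical and not part of the notebook.

```python
def safe_fps(reported_fps: float, default: float = 30.0) -> float:
    """Return a usable frame rate (hypothetical helper; 30.0 is an assumed default)."""
    # Treat zero, negative, and NaN values as "unknown"
    if not reported_fps or reported_fps <= 0 or reported_fps != reported_fps:
        return default
    return float(reported_fps)

print(safe_fps(25.0))  # a valid rate is passed through
print(safe_fps(0.0))   # an unknown rate falls back to the default
```

With the notebook's variables, this would be used as `fps = safe_fps(cap.get(cv2.CAP_PROP_FPS))`.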

## Define the codec and create a VideoWriter object

In [15]:
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break  # end of the video or a read failure

    # Convert the frame from BGR (OpenCV order) to RGB and add a batch dimension
    img_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img_tensor = transform(img_rgb).unsqueeze(0)

    # Perform object detection without tracking gradients
    with torch.no_grad():
        predictions = model(img_tensor)[0]  # detections for the single frame in the batch

    # Draw bounding boxes and labels on the frame
    for box, label_id, score in zip(predictions['boxes'], predictions['labels'], predictions['scores']):
        if score > 0.5:  # confidence threshold
            # Convert the box coordinates to plain Python ints for OpenCV
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # The label shown is the numeric COCO category id, followed by the score
            label = f"{label_id.item()}: {score.item():.2f}"
            cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Write the frame with detections
    out.write(frame)

# Release the video capture and writer objects
cap.release()
out.release()
cv2.destroyAllWindows()

print(f"Processed video saved to {output_path}")

Processed video saved to /kaggle/working/output_video.mp4
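The loop above draws every COCO class the model detects (people, traffic lights, and so on) and labels each box with a numeric category id. A minimal sketch of restricting the output to vehicles and showing readable names follows. The ids come from the COCO category list used by torchvision's detection models; which classes count as "vehicles", the helper name `filter_vehicles`, and the synthetic detections are assumptions made for illustration.

```python
# COCO category ids used by torchvision detection models for vehicle-like classes.
# Which classes count as "vehicles" is an assumption made for this sketch.
VEHICLE_CLASSES = {2: "bicycle", 3: "car", 4: "motorcycle", 6: "bus", 8: "truck"}

def filter_vehicles(boxes, labels, scores, threshold=0.5):
    """Keep detections that are vehicles and whose score exceeds the threshold."""
    kept = []
    for box, label_id, score in zip(boxes, labels, scores):
        if label_id in VEHICLE_CLASSES and score > threshold:
            # Truncate coordinates to ints so they can be passed to cv2.rectangle
            coords = tuple(int(v) for v in box)
            kept.append((coords, VEHICLE_CLASSES[label_id], float(score)))
    return kept

# Synthetic detections in the same layout as the model output, as plain lists
boxes = [[10.4, 20.9, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0], [0.0, 0.0, 30.0, 30.0]]
labels = [3, 1, 8]           # car, person, truck
scores = [0.92, 0.99, 0.30]  # the truck falls below the threshold

for box, name, score in filter_vehicles(boxes, labels, scores):
    print(f"{name}: {score:.2f} at {box}")
```

With the real model output, the three lists can be obtained by calling `.tolist()` on the `'boxes'`, `'labels'`, and `'scores'` tensors of a prediction.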
