<a href="https://colab.research.google.com/github/VenkataramanSuriya/Real-Time-Vehicle-Detection/blob/main/Yolo_Vehicle_Detection_Deep_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Vehicle Detection and Tracking System using YOLOv8 and Faster R-CNN**

**Install YOLO dependencies**

In [None]:
!pip install ultralytics -q
!pip install opencv-python -q

**Run YOLOv8 for object detection**

In [None]:
!yolo detect predict model=yolov8m.pt source="/content/Vid.mp4"

Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m.pt to 'yolov8m.pt'...
100% 49.7M/49.7M [00:00<00:00, 308MB/s]
Ultralytics YOLOv8.2.102 🚀 Python-3.10.12 torch-2.4.1+cu121 CPU (Intel Xeon 2.20GHz)
YOLOv8m summary (fused): 218 layers, 25,886,080 parameters, 0 gradients, 78.9 GFLOPs

video 1/1 (frame 1/313) /content/Vid.mp4: 480x640 (no detections), 1366.7ms
video 1/1 (frame 2/313) /content/Vid.mp4: 480x640 1 bench, 1160.6ms
video 1/1 (frame 3/313) /content/Vid.mp4: 480x640 (no detections), 1167.7ms
video 1/1 (frame 4/313) /content/Vid.mp4: 480x640 1 person, 1154.4ms
video 1/1 (frame 5/313) /content/Vid.mp4: 480x640 1 person, 1843.8ms
video 1/1 (fram
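The per-frame timings printed by the `yolo` command can be summarised without rerunning inference. The following is a minimal sketch, assuming each log line ends with a latency such as `1160.6ms`; the helper name `mean_latency_ms` and the regular expression are illustrative and not part of Ultralytics. The sample lines are the first five frames from the log above.

```python
import re

# Matches the latency at the end of a log line, e.g. "... 1 bench, 1160.6ms".
LATENCY_RE = re.compile(r"(\d+(?:\.\d+)?)ms\s*$")


def mean_latency_ms(log_lines):
    """Return the mean per-frame latency in milliseconds (hypothetical helper)."""
    latencies = []
    for line in log_lines:
        match = LATENCY_RE.search(line)
        if match:
            latencies.append(float(match.group(1)))
    if not latencies:
        raise ValueError("no latency values found in the log lines")
    return sum(latencies) / len(latencies)


log = [
    "video 1/1 (frame 1/313) /content/Vid.mp4: 480x640 (no detections), 1366.7ms",
    "video 1/1 (frame 2/313) /content/Vid.mp4: 480x640 1 bench, 1160.6ms",
    "video 1/1 (frame 3/313) /content/Vid.mp4: 480x640 (no detections), 1167.7ms",
    "video 1/1 (frame 4/313) /content/Vid.mp4: 480x640 1 person, 1154.4ms",
    "video 1/1 (frame 5/313) /content/Vid.mp4: 480x640 1 person, 1843.8ms",
]

avg_ms = mean_latency_ms(log)
# Frames per second is the reciprocal of the mean per-frame latency in seconds.
print(f"mean latency: {avg_ms:.1f} ms, roughly {1000 / avg_ms:.2f} FPS")
```

On this CPU runtime the five sampled frames average a little over 1.3 seconds each, so throughput is below one frame per second.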

**Convert the output video to MP4 using ffmpeg**

In [None]:
!ffmpeg -i "/content/runs/detect/predict/Vid.avi" -vcodec libx264 "final.mp4"

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enab

**Import display function to show video in the notebook**

In [None]:
from IPython.display import Video, display

**Display the final output video in the notebook**

In [None]:
display(Video("/content/final.mp4", embed=True))

**Install necessary libraries**

In [None]:
!pip install torch torchvision torchaudio -q

## **---- FASTER R-CNN implementation ----**

In [None]:
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
import cv2
import matplotlib.pyplot as plt

**Load Faster R-CNN model pre-trained on COCO dataset**

In [None]:
model_frcnn = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # `pretrained=True` is deprecated in torchvision 0.13 and later
model_frcnn.eval()

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:06<00:00, 25.5MB/s]


FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=0.0)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=0.0)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=0.0)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=0.0)
          (relu): ReLU(

**Prepare input video for Faster R-CNN detection**

In [None]:
video_path = "/content/Vid.mp4"
cap = cv2.VideoCapture(video_path)
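frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 if the container reports no frame rate
out_frcnn = cv2.VideoWriter("frcnn_final.mp4", cv2.VideoWriter_fourcc(*'mp4v'), fps, (frame_width, frame_height))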


**Run Faster R-CNN on each frame**

In [None]:
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

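    # OpenCV decodes frames as BGR, but the torchvision detector expects RGB input.
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img_tensor = F.to_tensor(rgb_frame).unsqueeze(0)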
    with torch.no_grad():
        predictions = model_frcnn(img_tensor)


    boxes = predictions[0]['boxes'].cpu().numpy()
    labels = predictions[0]['labels'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()


    for i, box in enumerate(boxes):
        if scores[i] > 0.5:
            x1, y1, x2, y2 = box.astype(int)
            label = labels[i]
            confidence = scores[i]
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f'Class: {label}, Conf: {confidence:.2f}', (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)


    out_frcnn.write(frame)

cap.release()
out_frcnn.release()
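The loop above draws every detection above the score threshold and labels it with a numeric class ID. For a vehicle detector it may be useful to keep only vehicle classes and show readable names. The following is a minimal sketch, assuming the COCO category IDs used by torchvision's COCO-pretrained detection models (2 bicycle, 3 car, 4 motorcycle, 6 bus, 8 truck). The helper `filter_vehicles`, the chosen subset of classes, and the sample values are illustrative assumptions.

```python
# COCO category IDs as used by torchvision's COCO-pretrained detectors.
# Which classes count as "vehicles" is an assumption made for this sketch.
VEHICLE_CLASSES = {2: "bicycle", 3: "car", 4: "motorcycle", 6: "bus", 8: "truck"}


def filter_vehicles(boxes, labels, scores, threshold=0.5):
    """Keep detections that are vehicles and score above the threshold."""
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        name = VEHICLE_CLASSES.get(int(label))
        if name is not None and score > threshold:
            kept.append((name, float(score), tuple(int(v) for v in box)))
    return kept


# Made-up detections: a confident car, a person, and a low-confidence truck.
boxes = [(10, 20, 110, 220), (5, 5, 50, 50), (30, 40, 90, 120)]
labels = [3, 1, 8]
scores = [0.92, 0.88, 0.41]

for name, score, (x1, y1, x2, y2) in filter_vehicles(boxes, labels, scores):
    print(f"{name}: {score:.2f} at ({x1}, {y1}, {x2}, {y2})")
```

The same function accepts the NumPy arrays produced in the loop, because it only iterates over them and converts each value to a plain Python type.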

**Display Faster R-CNN video output**

In [None]:
display(Video("frcnn_final.mp4", embed=True))

## **---- Model Performance Comparison ----**

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

In [None]:
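# Hand-written placeholder class labels for illustration only;
# they are not derived from the detections produced above.
y_true = [0, 1, 2]
y_pred_yolo = [0, 1, 1]
y_pred_frcnn = [0, 1, 2]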
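Detector precision and recall are commonly computed by matching predicted boxes to annotated boxes using intersection over union (IoU), rather than by comparing flat label lists. The following is a minimal sketch of that idea, assuming boxes in `(x1, y1, x2, y2)` format, predictions already sorted by descending confidence, and a greedy one-to-one match. The function names and sample boxes are hypothetical and no annotations for `Vid.mp4` are implied.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedily match predictions to ground truth and return (tp, fp, fn)."""
    unmatched_gt = list(gt_boxes)
    tp = 0
    for pred in pred_boxes:
        # Pick the remaining ground-truth box that overlaps this prediction most.
        best = max(unmatched_gt, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched_gt.remove(best)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(unmatched_gt)
    return tp, fp, fn


# Made-up boxes: one good match, one spurious prediction, one missed object.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]

tp, fp, fn = match_detections(preds, gt)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"TP={tp} FP={fp} FN={fn} precision={precision:.2f} recall={recall:.2f}")
```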

**Calculate precision, recall, F1 for both models**

In [None]:
precision_yolo = precision_score(y_true, y_pred_yolo, average='weighted')
recall_yolo = recall_score(y_true, y_pred_yolo, average='weighted')
f1_yolo = f1_score(y_true, y_pred_yolo, average='weighted')

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


In [None]:
precision_frcnn = precision_score(y_true, y_pred_frcnn, average='weighted')
recall_frcnn = recall_score(y_true, y_pred_frcnn, average='weighted')
f1_frcnn = f1_score(y_true, y_pred_frcnn, average='weighted')

In [None]:
print(f"YOLOv8 Performance: Precision={precision_yolo}, Recall={recall_yolo}, F1-score={f1_yolo}")
print(f"Faster R-CNN Performance: Precision={precision_frcnn}, Recall={recall_frcnn}, F1-score={f1_frcnn}")


YOLOv8 Performance: Precision=0.5, Recall=0.6666666666666666, F1-score=0.5555555555555555
Faster R-CNN Performance: Precision=1.0, Recall=1.0, F1-score=1.0
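The weighted scores printed above can be reproduced by hand, which shows where the 0.5 precision for the YOLOv8 label list comes from: class 2 is never predicted, so its precision is counted as 0. The following is a minimal pure-Python sketch of support-weighted averaging, assuming an undefined per-class precision is treated as 0. The function `weighted_scores` is illustrative and is not the scikit-learn implementation.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, f1) for two label lists."""
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for cls, n in support.items():
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / len(y_true)   # each class is weighted by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


p, r, f = weighted_scores([0, 1, 2], [0, 1, 1])
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

With three samples, a single changed label moves each score by a large step, so these figures illustrate the metric definitions and do not measure the detectors.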


**Compare frame rates (YOLOv8 vs Faster R-CNN)**

In [None]:
import time
from ultralytics import YOLO
model_yolo = YOLO('yolov8m.pt')

start_time = time.time()
model_yolo.predict(source="/content/Vid.mp4")
yolo_time = time.time() - start_time



errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/313) /content/Vid.mp4: 480x640 (no detections), 1221.2ms
video 1/1 (frame 2/313) /content/Vid.mp4: 480x640 1 bench, 1540.8ms
video 1/1 (frame 3/313) /content/Vid.mp4: 480x640 (no detections), 1890.9ms
video 1/1 (frame 4/313) /content/Vid.mp4: 480x640 1 person, 1582.2ms
video 1/1 (frame 5/313) /content/Vid.mp4: 480x640 1 person, 1159.8ms
video 1/1 (frame 6/313) /content/Vid.mp4: 480x640 (no detections), 2137.7ms
video 1/1 (frame 7/313) /content/Vid.mp4: 480x640 1 person, 1165.6ms
video 1/1 (frame 8/313) /content/Vid.mp4: 480x640 (no detections), 11

**Time the Faster R-CNN inference**

In [None]:
start_time = time.time()
# The Faster R-CNN frame loop from the earlier cell must run here; without it, nothing is timed.
frcnn_time = time.time() - start_time
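Frames per second is the number of processed frames divided by the elapsed wall-clock time, so the timed region has to enclose the inference itself. The following is a minimal sketch of a reusable timing helper. The name `measure_fps` is hypothetical, and the one-millisecond sleep is a stand-in workload where a real model call would go.

```python
import time


def measure_fps(process_frame, frames):
    """Run process_frame on every frame; return (frame_count, elapsed_seconds, fps)."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    fps = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, fps


# Stand-in workload: sleep about 1 ms per "frame" instead of running a detector.
count, elapsed, fps = measure_fps(lambda _frame: time.sleep(0.001), range(20))
print(f"{count} frames in {elapsed:.3f} s -> {fps:.1f} FPS")
```

For a real comparison, `process_frame` would wrap the per-frame detector call and `frames` would be a generator that yields frames read from `cv2.VideoCapture`, so both models are timed over the same input.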

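In [None]:
# FPS is frames processed divided by elapsed seconds, not the reciprocal of the total run time.
num_frames = 313  # frame count of /content/Vid.mp4, taken from the YOLOv8 log above
print(f"YOLOv8 FPS: {num_frames / yolo_time:.2f}")
print(f"Faster R-CNN FPS: {num_frames / frcnn_time:.2f}")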
