# ECSE 415 Final Project: Aman Sidhu (260885556) Partick Ohl ()

# Setup

In [1]:
from google.colab import drive
drive.mount('/content/drive')

path = '/content/drive/MyDrive/Olympus in the Sky/McGill/2024 - Winter/ECSE 415/Final Project/'

video_1 = 'st-catherines_drive.mp4'
video_2 = 'mcgill_drive.mp4'

Mounted at /content/drive


In [2]:
import matplotlib.pyplot as plt
import matplotlib.image  as mpimg
import numpy             as np
import cv2
import math

%matplotlib inline

def print_img(img, name = "", cmap = plt.get_cmap('gray')):
    plt.imshow(img, cmap=cmap)
    print(name + " Shape:", np.shape(img))
    plt.xticks([]), plt.yticks([])
    plt.show()

!pip install ultralytics

from ultralytics import YOLO
from ultralytics.solutions import object_counter


In [None]:
#detailed description of the overall approach taken. State clearly any assumptions that you made.
#descriptions of each software package or routine used
#summary of program output on the two videos, with comparison to manually obtained ground truth values
#discussion of program performance and problems

# Description of Overall Approach and Assumptions

Our approach relies heavily on the Ultralytics YOLO model to perform the key computer vision tasks this problem requires: object detection and classification, multi-object tracking, and bounding box generation. Specifically, we use the pre-trained YOLOv8 model (the nano variant, `yolov8n.pt`), which handles all of these tasks while providing fast and accurate results. Furthermore, since we want to count the cars and people we pass in the scene, we restrict detection to the 'car', 'bus', 'truck', and 'person' classes.

To count unique objects, we keep two Python sets, one for cars and one for pedestrians. A set is a collection that ignores duplicate entries, so adding the same tracking ID more than once has no effect. One set holds the tracking ID of every pedestrian we have passed, and the other holds the tracking ID of every car we have passed, whether it is parked or moving.

This leads to our first assumption: a person is considered passed once the center of their bounding box lies anywhere in the bottom 40% of the frame; for cars, the region is the bottom 26.5%. In our testing, this helped in situations where an object is only detected in the few frames before we pass it, and reduced the cases where occlusion, caused by moving cars and our own moving viewpoint, prevents an object from being detected at all. Tighter thresholds would perhaps be more rigorous, but they would greatly shrink the window in which objects can be detected. The threshold also reduces how often the same object is counted twice, and avoids counting objects near the end of the video that we have not yet passed. Our second assumption is that any person detected is a pedestrian, which also includes bikers.

Lastly, to distinguish moving cars from parked cars, we use the object_counter module from Ultralytics to record the tracking IDs of cars whose bounding box centers pass through a specific region of the screen, in our case a line. This exploits the fact that the moving cars we pass generally follow a similar path across the screen. The line is angled and positioned near the very bottom of the screen so that only passed, moving cars are counted. These tracking IDs are then removed from the set of all car IDs, which gives the final counts of parked and moving cars.
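The counting logic described above can be illustrated without a video or a model. The following is a minimal sketch, not the notebook's actual pipeline: the detections, the frame height, and the `moving_ids` set are made-up values, and the helper names `center_y` and `is_passed` are hypothetical. It only shows how the bottom-of-frame threshold, the sets, and the final set difference fit together.

```python
def center_y(box_xyxy):
    """Vertical center (in pixels) of an (x1, y1, x2, y2) bounding box."""
    _, y1, _, y2 = box_xyxy
    return (y1 + y2) / 2


def is_passed(box_xyxy, frame_height, bottom_fraction):
    """True if the box center lies in the bottom `bottom_fraction` of the frame."""
    return center_y(box_xyxy) > (1.0 - bottom_fraction) * frame_height


# Fraction of the frame (measured from the bottom) in which an object counts as passed.
BOTTOM_FRACTION = {"car": 0.265, "bus": 0.265, "truck": 0.265, "person": 0.40}

# Made-up detections for illustration: (track_id, label, box in pixels).
frame_height = 1000
detections = [
    (1, "car", (100, 700, 300, 900)),     # center y = 800, inside the car region
    (1, "car", (120, 720, 320, 920)),     # same ID in a later frame, not recounted
    (2, "car", (400, 300, 500, 400)),     # center y = 350, too high to count
    (3, "person", (50, 550, 90, 700)),    # center y = 625, inside the person region
    (4, "truck", (600, 760, 900, 980)),   # center y = 870, inside the car region
]

unique_cars, unique_people = set(), set()
for track_id, label, box in detections:
    if not is_passed(box, frame_height, BOTTOM_FRACTION[label]):
        continue
    if label == "person":
        unique_people.add(track_id)
    else:
        unique_cars.add(track_id)

# Assumed output of the line counter: IDs of vehicles that crossed the line.
moving_ids = {4}
parked_cars = unique_cars - moving_ids

print("all cars:", sorted(unique_cars))
print("parked cars:", sorted(parked_cars))
print("pedestrians:", sorted(unique_people))
```

Because the sets are keyed on tracking ID, the result is only as reliable as the tracker: if an object loses its ID and receives a new one, it would be counted twice.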

# Software Packages Used

**YOLO (Ultralytics)**
You Only Look Once (YOLO) is a popular object detection and image segmentation model, introduced in the paper "You Only Look Once: Unified, Real-Time Object Detection" by Joseph Redmon et al. from the University of Washington. The original YOLO implementation uses a single CNN to predict bounding boxes and class probabilities directly from a full image in one evaluation. It first divides the image into an S×S grid, and each grid cell predicts bounding boxes together with the corresponding class confidence. Ultralytics provides improved versions of YOLO that give fast, meaningful results and support a variety of tasks, including detection, segmentation, pose estimation, tracking, and classification. For our purposes, we use the object classification, tracking, bounding box generation, and object_counter functionality from Ultralytics.
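To make the grid idea concrete, the sketch below maps an object's center to the grid cell that contains it, which in the original YOLO paper is the cell responsible for predicting that object. This illustrates the original formulation only; it is not how the YOLOv8 model used in this project assigns predictions. The function name `responsible_cell` is hypothetical, and S = 7 with a 448×448 input are the values used in the original paper.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return the (row, col) of the S x S grid cell containing the point (cx, cy).

    Coordinates are in pixels. Points on the right or bottom image edge are
    clamped into the last cell so the result is always a valid index.
    """
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# An object centered at (224, 100) in a 448 x 448 image.
print(responsible_cell(224, 100, 448, 448))
```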

**OpenCV**
Open Source Computer Vision Library (OpenCV) is a widely used computer vision library that provides functionality such as image processing and feature extraction. In this project, OpenCV is used to read individual frames from each video, which are then passed to the YOLO model for tracking, object detection, line counting, and bounding box generation. It is also used to write the annotated frames to an output video.

# Program Results

In [4]:
model = YOLO("yolov8n.pt")
# Yolo, Object_Counter, threshold Arguments
classes_to_count            = [0, 2, 5, 7] # 0: person, 2: car, 5: bus, 7: truck
car_confidence_threshold    = 0.8
person_confidence_threshold = 0.6
count_names                 = {i: model.names[i] for i in classes_to_count}

for vid_num, video in enumerate((video_1, video_2)):
  cap = cv2.VideoCapture(path + video)
  assert cap.isOpened(), "Error reading video file"
  w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

  bottom_threshold_cars       = int(0.735*h)
  bottom_threshold_person     = int(0.6*h)

  # Init Object Counters
  # Tracks the motion of cars that pass through the bottom of the screen
  counter_cars = object_counter.ObjectCounter()
  line_points = [(int(w*0.2), int(0.762*h)), (int(w*0.8), int(0.9125*h))]
  counter_cars.set_args(view_img = True, reg_pts = line_points, classes_names = count_names, draw_tracks=True)

  # Video writer
  video_writer = cv2.VideoWriter("video_{}.avi".format(vid_num), cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))

  unique_cars   = set()
  unique_people = set()

  while cap.isOpened():
      success, im0 = cap.read()
      if not success:
          print("Video frame is empty or video processing has been successfully completed.")
          break

      # Tracking
      tracks = model.track(im0, persist=True, show=False, classes=classes_to_count, verbose = False)

      # Process Bounding Boxes
      for r in tracks:
        for box in r.boxes:
          conf = math.ceil(box.conf[0]*100)/100

          # Skip detections that the tracker has not assigned an ID to yet
          if box.id is None:
            continue

          label = r.names[int(box.cls[0])]
          x_tl, y_tl, x_br, y_br = box.xyxy[0]
          y_center = y_tl + (y_br - y_tl)//2  # vertical center of the bounding box

          # Count vehicles whose center is in the bottom 26.5% of the frame
          if label in ('bus', 'car', 'truck') and conf > car_confidence_threshold:
            if y_center > bottom_threshold_cars:
              unique_cars.add(int(box.id))

          # Count people whose center is in the bottom 40% of the frame
          if label == 'person' and conf > person_confidence_threshold:
            if y_center > bottom_threshold_person:
              unique_people.add(int(box.id))

      im0 = counter_cars.start_counting(im0, tracks)
      video_writer.write(im0)

  cap.release()
  video_writer.release()
  cv2.destroyAllWindows()

  print('-'*100)
  print('Video: ' + video)
  all_cars = len(unique_cars)
  print('Total Number of Cars: {}'.format(all_cars)) # all cars

  for moving_car in counter_cars.count_ids:
    unique_cars.discard(moving_car)

  parked_cars = len(unique_cars)
  print('Total Number of Parked Cars: {}'.format(parked_cars)) # all cars minus cars counted as moving
  print('Total Number of Moving Cars: {}'.format(len(counter_cars.count_ids))) # IDs that crossed the counting line
  print('Total Number of Pedestrians: {}'.format(len(unique_people))) # number of passed pedestrians


Line Counter Initiated.
Video frame is empty or video processing has been successfully completed.
----------------------------------------------------------------------------------------------------
Video: st-catherines_drive.mp4
Total Number of Cars: 54
Total Number of Parked Cars: 53
Total Number of Moving Cars: 1
Total Number of Pedestrians: 61

Line Counter Initiated.
Video frame is empty or video processing has been successfully completed.
----------------------------------------------------------------------------------------------------
Video: mcgill_drive.mp4
Total Number of Cars: 33
Total Number of Parked Cars: 18
Total Number of Moving Cars: 15
Total Number of Pedestrians: 21


Ground Truth:

**St. Catherine**
* Total Number of Passed Moving Cars: 3
* Total Number of Passed Parked Cars: 55
* Total Number of Pedestrians: 104

**McGill Drive**
* Total Number of Passed Moving Cars: 25
* Total Number of Passed Parked Cars: 11
* Total Number of Pedestrians: 30
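
The gap between the program output and the ground truth can be tabulated with a short script. The sketch below is one possible way to do this, using only the counts reported above; the dictionary layout and the `compare` helper are hypothetical and not part of the notebook's pipeline. The signed error is the predicted count minus the ground truth, so negative values mean the program undercounted.

```python
# Counts copied from the program output and the manual ground truth above.
predicted = {
    "st-catherines_drive.mp4": {"moving cars": 1, "parked cars": 53, "pedestrians": 61},
    "mcgill_drive.mp4": {"moving cars": 15, "parked cars": 18, "pedestrians": 21},
}
ground_truth = {
    "st-catherines_drive.mp4": {"moving cars": 3, "parked cars": 55, "pedestrians": 104},
    "mcgill_drive.mp4": {"moving cars": 25, "parked cars": 11, "pedestrians": 30},
}


def compare(pred, truth):
    """Map (video, category) to (signed error, signed error as % of ground truth)."""
    errors = {}
    for video, counts in truth.items():
        for category, true_count in counts.items():
            diff = pred[video][category] - true_count
            errors[(video, category)] = (diff, 100.0 * diff / true_count)
    return errors


errors = compare(predicted, ground_truth)
for (video, category), (diff, pct) in errors.items():
    print(f"{video:26s} {category:12s} error: {diff:+4d} ({pct:+6.1f}%)")
```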

# TODO: create a confusion matrix to show results, potentially play videos in notebook

# Discussion of Program Performance and Problems

One flaw in our approach is that we do not distinguish between parked cars and cars that are stopped at intersections while travelling in the perpendicular direction. Since both appear in similar areas of the screen and are not moving, the stopped cars are misclassified as parked. Similarly, bikers are misclassified as pedestrians. The bottom-of-frame threshold is also less effective for larger vehicles, since the center of their bounding box generally stays above the threshold as they pass out of view. The object_counter used to detect moving cars captures many of the cars that pass us, but since moving cars and parked cars can sometimes occupy the same region of the screen, the counting line alone cannot always separate the two.
The speed of YOLO helped significantly when testing and validating different approaches; however, a more accurate model could have provided more consistent results.