### Authors: Amy Qi Wang 

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
#@title dependencies
!pip install ultralytics
!pip install cvzone
# !pip install tracker  # not needed: the Tracker class used below is defined in a cell of this notebook

Collecting ultralytics
  Downloading ultralytics-8.0.222-py3-none-any.whl (653 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m654.0/654.0 kB[0m [31m9.2 MB/s[0m eta [36m0:00:00[0m
Collecting thop>=0.1.1 (from ultralytics)
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Installing collected packages: thop, ultralytics
Successfully installed thop-0.1.1.post2209072238 ultralytics-8.0.222
Collecting cvzone
  Downloading cvzone-1.6.1.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: cvzone
  Building wheel for cvzone (setup.py) ... done
  Created wheel for cvzone: filename=cvzone-1.6.1-py3-none-any.whl size=26297 sha256=f99719200b444d4a366d283e3871dfc2f6ef5ba5adb254fda1374b0f9dd11cb5
  Stored in directory: /root/.cache/pip/wheels/2c/9f/b3/92e945ac4a71bf727a92463f38155cc5a4fa49c5010b38ec4c
Successfully built cvzone
Installing collected packages: cvzone
Successfully installed cvzone

In [3]:
!git clone https://github.com/pjreddie/darknet
%cd darknet
!make

Cloning into 'darknet'...
remote: Enumerating objects: 5955, done.
remote: Total 5955 (delta 0), reused 0 (delta 0), pack-reused 5955
Receiving objects: 100% (5955/5955), 6.37 MiB | 12.57 MiB/s, done.
Resolving deltas: 100% (3932/3932), done.
/content/darknet
mkdir -p obj
mkdir -p backup
mkdir -p results
gcc -Iinclude/ -Isrc/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -c ./src/gemm.c -o obj/gemm.o
gcc -Iinclude/ -Isrc/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -c ./src/utils.c -o obj/utils.o
gcc -Iinclude/ -Isrc/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -c ./src/cuda.c -o obj/cuda.o
gcc -Iinclude/ -Isrc/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -c ./src/deconvolutional_layer.c -o obj/deconvolutional_layer.o
gcc -Iinclude/ -Isrc/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -c ./src/convolutional_layer.c -o obj/convolutional_

In [4]:
!wget https://pjreddie.com/media/files/yolov3.weights

--2023-12-06 00:52:52--  https://pjreddie.com/media/files/yolov3.weights
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 248007048 (237M) [application/octet-stream]
Saving to: ‘yolov3.weights’


2023-12-06 00:53:09 (14.9 MB/s) - ‘yolov3.weights’ saved [248007048/248007048]



In [5]:
#@title Tracker
import math

class Tracker:
    def __init__(self):
        self.center_points = {}  # Stores the center positions of objects
        self.id_count = 0  # Counter for the IDs

    def _calculate_center(self, rect):
        """Calculate the center point of a rectangle."""
        x, y, w, h = rect
        return (x + w / 2, y + h / 2)

    def update(self, objects_rect):
        objects_bbs_ids = []

        for rect in objects_rect:
            center_x, center_y = self._calculate_center(rect)
            same_object_detected = False

            for obj_id, (tracked_x, tracked_y) in self.center_points.items():
                if math.hypot(center_x - tracked_x, center_y - tracked_y) < 35:
                    self.center_points[obj_id] = (center_x, center_y)
                    objects_bbs_ids.append([*rect, obj_id])
                    same_object_detected = True
                    break

            if not same_object_detected:
                self.center_points[self.id_count] = (center_x, center_y)
                objects_bbs_ids.append([*rect, self.id_count])
                self.id_count += 1

        # Update the dictionary by keeping only the IDs that are still in use
        self.center_points = {obj_id: self.center_points[obj_id] for obj_id in [obj[-1] for obj in objects_bbs_ids]}

        return objects_bbs_ids


import math:
This imports the math module, which provides mathematical functions. In this code, it's used for the hypot function to calculate the Euclidean distance between points.

class Tracker:
This line starts the definition of a class named Tracker, which will contain methods and attributes for tracking objects.

    def __init__(self):
This is the initializer for the Tracker class. It's automatically called when you create an instance of this class.

    self.center_points = {}
Within the initializer, this line creates an empty dictionary center_points as an attribute of the class. This dictionary will store the center positions of tracked objects.

    self.id_count = 0
Another attribute, id_count, is initialized to 0. This will be used to assign unique IDs to new objects as they are tracked.

    def _calculate_center(self, rect):
This is an internal helper method of the Tracker class. The leading underscore is Python's convention for "internal use"; it is not an enforced access restriction. The method calculates the center point of a rectangle.

    x, y, w, h = rect
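This line unpacks the rect argument, which is expected to be a tuple or list holding the top-left corner (x, y) of a rectangle, its width (w) and its height (h). Note that the detection loops below actually pass YOLO's corner coordinates (x1, y1, x2, y2), so the value computed there is (x1 + x2/2, y1 + y2/2) rather than the true box center.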
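
To make the coordinate mismatch concrete, here is a small standalone check. It assumes, as in Ultralytics' `boxes.data`, that YOLO reports boxes as corners (x1, y1, x2, y2). `center_xywh` mirrors `_calculate_center`, and `xyxy_to_xywh` is a hypothetical helper that is not part of the original code.

```python
def center_xywh(rect):
    """Same formula as Tracker._calculate_center: assumes (x, y, w, h)."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def xyxy_to_xywh(box):
    """Hypothetical helper: convert corners (x1, y1, x2, y2) to (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


box = (100, 200, 140, 260)  # corners as YOLO reports them: x1, y1, x2, y2

# Feeding corners directly treats x2 and y2 as width and height
print(center_xywh(box))                # (170.0, 330.0)
# Converting first recovers the true center of the box
print(center_xywh(xyxy_to_xywh(box)))  # (120.0, 230.0)
```

The tracker still works with the uncorrected values, since it only compares them with each other. However, for a box that moves without changing size, x1 + x2/2 shifts about 1.5 times as far as the true center, so the 35-pixel radius is effectively tighter than it looks.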

    return (x + w / 2, y + h / 2)
The method calculates and returns the center point of the rectangle. The center's x-coordinate is x + w/2, and the y-coordinate is y + h/2.

    def update(self, objects_rect):
This is a public method of the Tracker class. It's used to update the tracking status of objects based on their new positions (rectangles).

    objects_bbs_ids = []
This line initializes an empty list, objects_bbs_ids, which will store information about the objects and their IDs.

    for rect in objects_rect:
This loop iterates over each rectangle (rect) in the input objects_rect, which represents the current positions of detected objects.

    center_x, center_y = self._calculate_center(rect)
For each rectangle, this line calculates its center point using the _calculate_center method.

    same_object_detected = False
This flag is set to False for each new detection. It records whether the detection has been matched to an object tracked in the previous processed frame.

    for obj_id, (tracked_x, tracked_y) in self.center_points.items():
This nested loop iterates over the center_points dictionary to check if any previously tracked object is close to the current object.

    if math.hypot(center_x - tracked_x, center_y - tracked_y) < 35:
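The math.hypot function calculates the Euclidean distance between the current object's center and the tracked object's center. If this distance is less than 35 pixels (the threshold), the two are assumed to be the same object. The loop stops at the first tracked object within range rather than searching for the nearest one.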
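
A worked example of the threshold test, using made-up center coordinates (only the 35-pixel threshold comes from the code):

```python
import math

THRESHOLD = 35  # pixels, the constant used in Tracker.update

previous_center = (100.0, 200.0)
nearby_center = (120.0, 215.0)  # moved 20 px right and 15 px down
far_center = (150.0, 200.0)     # moved 50 px right

d_near = math.hypot(nearby_center[0] - previous_center[0],
                    nearby_center[1] - previous_center[1])
d_far = math.hypot(far_center[0] - previous_center[0],
                   far_center[1] - previous_center[1])

print(d_near, d_near < THRESHOLD)  # 25.0 True  -> treated as the same object
print(d_far, d_far < THRESHOLD)    # 50.0 False -> treated as a different object
```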

    self.center_points[obj_id] = (center_x, center_y)...
If the same object is detected, its center point is updated in the center_points dictionary, and its rectangle and ID are added to objects_bbs_ids. The flag same_object_detected is set to True.

    if not same_object_detected:...
If the object is not found in the existing center_points, it is considered a new object. It's then added to the center_points dictionary with a new ID, and its information is appended to objects_bbs_ids. The id_count is incremented to ensure a unique ID for the next object.

    self.center_points = {obj_id:...
After processing all objects, the center_points dictionary is rebuilt to keep only the IDs matched or created in the current frame, removing any objects not detected in it. As a result, an object missed in even one processed frame is given a new ID when it reappears.

    return objects_bbs_ids:
Finally, the method returns the list objects_bbs_ids, containing information about all tracked objects and their IDs in the current frame.
This code is a simple yet effective way to track objects across video frames by updating their positions and maintaining their unique IDs.
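
The snippet below is a condensed, standalone copy of the Tracker logic (using Python's for/else in place of the same_object_detected flag), fed with made-up boxes over four frames. It shows an ID surviving a small move, and the same object receiving a new ID after a single missed detection, which is the occlusion weakness discussed in the report.

```python
import math


class Tracker:
    """Condensed copy of the notebook's Tracker so this snippet runs on its own."""

    def __init__(self):
        self.center_points = {}  # ID -> last known center
        self.id_count = 0

    def update(self, objects_rect):
        objects_bbs_ids = []
        for x, y, w, h in objects_rect:
            cx, cy = x + w / 2, y + h / 2
            for obj_id, (tx, ty) in self.center_points.items():
                if math.hypot(cx - tx, cy - ty) < 35:
                    self.center_points[obj_id] = (cx, cy)  # same object: update its center
                    objects_bbs_ids.append([x, y, w, h, obj_id])
                    break
            else:  # loop finished without a match: register a new ID
                self.center_points[self.id_count] = (cx, cy)
                objects_bbs_ids.append([x, y, w, h, self.id_count])
                self.id_count += 1
        # Keep only the IDs seen in this frame
        self.center_points = {o[-1]: self.center_points[o[-1]] for o in objects_bbs_ids}
        return objects_bbs_ids


tracker = Tracker()
frame1 = tracker.update([(100, 100, 40, 40)])  # first sighting -> ID 0
frame2 = tracker.update([(110, 105, 40, 40)])  # center moved ~11 px -> still ID 0
frame3 = tracker.update([])                    # missed detection: ID 0 is forgotten
frame4 = tracker.update([(115, 108, 40, 40)])  # same object reappears -> new ID 1

print([r[-1] for r in frame1], [r[-1] for r in frame2], [r[-1] for r in frame4])
# [0] [0] [1]
```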

In [None]:
#@title submission code - mcgill_drive.mp4
import cv2
import pandas as pd
import numpy as np
from ultralytics import YOLO
import cvzone
import torch

# Initialize the YOLO model
model = YOLO('yolov8s.pt')


# Video and class names paths
two_lane_street = '/content/drive/MyDrive/Final_project/mcgill_drive.mp4'
coco_name_path = '/content/darknet/data/coco.names'


# Load class list
with open(coco_name_path, "r") as file:
    class_list = file.read().split("\n")

# Initialize trackers for cars and pedestrians
car_tracker = Tracker()
pedestrian_tracker = Tracker()

# Define areas of interest
left_lane_area = [(303,358), (257,368), (427,376), (440,366)]  # Left 2 lane cars
parked_cars_left_area = [(42,348), (42,382), (58,382), (59,344)]  # Left parked cars
pedestrians_area_1 = [(787,324), (787,413), (818,413), (818,324)]
parked_cars_right_area = [(812,375),(812,428),(837,428),(837,375)]
pedestrians_area_2 = [(57,329), (57,373), (80,373), (80,329)]

# Sets to keep track of distinct objects in each area
left_lane_cars = set()
parked_cars = set()
pedestrians = set()

# Open the video
cap = cv2.VideoCapture(two_lane_street)
count = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break

    count += 1
    if count % 2 != 0:
        continue

    frame = cv2.resize(frame, (1020, 500))

    # Detect objects in the frame
    results = model.predict(frame)
    if torch.cuda.is_available():
        detection_data = pd.DataFrame(results[0].boxes.cpu().data).astype("float")
    else:
        detection_data = pd.DataFrame(results[0].boxes.data).astype("float")

    # Process each detection
    cars = []
    people = []
    for index, row in detection_data.iterrows():
        bbox = [int(row[i]) for i in range(4)]  # Extract bounding box
        class_id = int(row[5])
        class_name = class_list[class_id]

        if 'car' in class_name:
            cars.append(bbox)
        elif 'person' in class_name:
            people.append(bbox)

    # Update trackers
    cars_tracked = car_tracker.update(cars)
    people_tracked = pedestrian_tracker.update(people)

    # Process tracked cars
    for x, y, w, h, id in cars_tracked:
        # The tracked boxes are YOLO corner coordinates, so w and h here are really x2 and y2
        cx, cy = (x + w) // 2, (y + h) // 2
        if cv2.pointPolygonTest(np.array(left_lane_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)  # draw from (x1, y1) to (x2, y2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            left_lane_cars.add(id)
        # NOTE: this condition tests parked_cars_left_area twice; parked_cars_right_area is never checked
        elif cv2.pointPolygonTest(np.array(parked_cars_left_area, np.int32), (cx, cy), False) > 0 or cv2.pointPolygonTest(np.array(parked_cars_left_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            parked_cars.add(id)

    # Process tracked pedestrians
    for x, y, w, h, id in people_tracked:

        cx, cy = (x + w) // 2, (y + h) // 2
        if cv2.pointPolygonTest(np.array(pedestrians_area_1, np.int32), (cx, cy), False) > 0 or cv2.pointPolygonTest(np.array(pedestrians_area_2, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            pedestrians.add(id)

    # Draw the polygons on the frame
    cv2.polylines(frame, [np.array(left_lane_area, np.int32)], True, (0, 255, 0), 2)
    cv2.polylines(frame, [np.array(parked_cars_left_area, np.int32)], True, (255, 0, 0), 2)
    cv2.polylines(frame, [np.array(pedestrians_area_1, np.int32)], True, (0, 0, 255), 2)
    cv2.polylines(frame, [np.array(pedestrians_area_2, np.int32)], True, (0, 0, 255), 2)
    cv2.polylines(frame, [np.array(parked_cars_right_area, np.int32)], True, (0, 0, 255), 2)

# Print results
print("Number of moving cars passed:", len(left_lane_cars))
print("Number of parked cars passed:", len(parked_cars))
print("Number of pedestrians passed:", len(pedestrians))

# Clean up
cap.release()


Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt to 'yolov8s.pt'...


100%|██████████| 21.5M/21.5M [00:00<00:00, 201MB/s]



0: 320x640 6 cars, 110.9ms
Speed: 7.8ms preprocess, 110.9ms inference, 40.0ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 5 cars, 8.5ms
Speed: 4.1ms preprocess, 8.5ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 5 cars, 1 traffic light, 6.9ms
Speed: 4.2ms preprocess, 6.9ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 5 cars, 1 traffic light, 7.7ms
Speed: 2.6ms preprocess, 7.7ms inference, 1.4ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 6 cars, 1 traffic light, 7.4ms
Speed: 3.3ms preprocess, 7.4ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 1 person, 8 cars, 1 traffic light, 8.1ms
Speed: 2.7ms preprocess, 8.1ms inference, 1.7ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 1 person, 8 cars, 1 traffic light, 7.4ms
Speed: 3.2ms preprocess, 7.4ms inference, 2.0ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 7 cars, 1 traffic light,

In [7]:
#@title submission code - st_catherine.mp4
import cv2
import pandas as pd
import numpy as np
from ultralytics import YOLO
import cvzone
import torch

# Initialize the YOLO model
model = YOLO('yolov8s.pt')

# Video and class names paths
one_way_street = '/content/drive/MyDrive/Final_project/st_catherine.mp4'
coco_name_path = '/content/darknet/data/coco.names'

# Load class list
with open(coco_name_path, "r") as file:
    class_list = file.read().split("\n")

# Initialize trackers for cars and pedestrians
car_tracker = Tracker()
pedestrian_tracker = Tracker()

# Define areas of interest
moving_cars_area = [(303,358),(257,368),(427,376),(440,366)]
parked_cars_left_area = [(38,348),(38,421),(67,421),(66,352)]
parked_cars_right_area = [(864,352),(864,425),(891,425),(891,352)]
pedestrians_right_area = [(890,295),(890,415),(852,410),(852,295)]
pedestrians_left_area = [(60,291),(60,385),(109,385),(109,291)]

# Sets to keep track of distinct objects in each area
moving_cars = set()
parked_cars = set()
pedestrians = set()

# Open the video
cap = cv2.VideoCapture(one_way_street)
count = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break

    count += 1
    if count % 2 != 0:
        continue

    frame = cv2.resize(frame, (1020, 500))

    # Detect objects in the frame
    results = model.predict(frame)
    detection_data = pd.DataFrame(results[0].boxes.cpu().data if torch.cuda.is_available() else results[0].boxes.data).astype("float")

    # Process each detection
    cars = []
    people = []
    for index, row in detection_data.iterrows():
        bbox = [int(row[i]) for i in range(4)]  # Extract bounding box
        class_id = int(row[5])
        class_name = class_list[class_id]

        if 'car' in class_name:
            cars.append(bbox)
        elif 'person' in class_name:
            people.append(bbox)

    # Update trackers
    cars_tracked = car_tracker.update(cars)
    people_tracked = pedestrian_tracker.update(people)

    # Process tracked cars
    for x, y, w, h, id in cars_tracked:
        cx, cy = (x + w) // 2, (y + h) // 2
        if cv2.pointPolygonTest(np.array(moving_cars_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            moving_cars.add(id)
        elif cv2.pointPolygonTest(np.array(parked_cars_left_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            parked_cars.add(id)
        elif cv2.pointPolygonTest(np.array(parked_cars_right_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            parked_cars.add(id)

    # Process tracked pedestrians
    for x, y, w, h, id in people_tracked:
        cx, cy = (x + w) // 2, (y + h) // 2
        if cv2.pointPolygonTest(np.array(pedestrians_right_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            pedestrians.add(id)
        elif cv2.pointPolygonTest(np.array(pedestrians_left_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            pedestrians.add(id)

    # Draw the polygons on the frame
    cv2.polylines(frame, [np.array(moving_cars_area, np.int32)], True, (0, 255, 0), 2)
    cv2.polylines(frame, [np.array(parked_cars_left_area, np.int32)], True, (255, 0, 0), 2)
    cv2.polylines(frame, [np.array(parked_cars_right_area, np.int32)], True, (255, 0, 0), 2)
    cv2.polylines(frame, [np.array(pedestrians_right_area, np.int32)], True, (0, 0, 255), 2)
    cv2.polylines(frame, [np.array(pedestrians_left_area, np.int32)], True, (0, 0, 255), 2)

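    # Break the loop on ESC key press. No OpenCV window is opened in this notebook,
    # so no key press can reach it and this check does not stop the loop in Colab.
    if cv2.waitKey(0) & 0xFF == 27:
        break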

# Print results
print("Number of moving cars passed:", len(moving_cars))
print("Number of parked cars passed:", len(parked_cars))
print("Number of pedestrians:", len(pedestrians))


# Clean up
cap.release()
cv2.destroyAllWindows()



0: 320x640 3 persons, 7 cars, 1 truck, 9.1ms
Speed: 1.8ms preprocess, 9.1ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 3 persons, 7 cars, 1 truck, 7.3ms
Speed: 4.5ms preprocess, 7.3ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 3 persons, 7 cars, 1 truck, 6.6ms
Speed: 2.1ms preprocess, 6.6ms inference, 1.4ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 3 persons, 8 cars, 1 truck, 7.2ms
Speed: 2.5ms preprocess, 7.2ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 3 persons, 7 cars, 1 truck, 7.4ms
Speed: 3.9ms preprocess, 7.4ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 3 persons, 8 cars, 1 truck, 7.3ms
Speed: 2.5ms preprocess, 7.3ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 640)

0: 320x640 2 persons, 1 bicycle, 8 cars, 1 truck, 8.6ms
Speed: 2.4ms preprocess, 8.6ms inference, 1.5ms postprocess per image at shape (1, 3, 320, 

Import statements:


*   import cv2
*   import pandas as pd
*   import numpy as np
*   from ultralytics import YOLO
*   import cvzone
*   import torch

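These lines import the libraries the pipeline needs: cv2 (OpenCV) for reading, resizing and annotating video frames and for point-in-polygon tests; pandas for holding each frame's detections in a DataFrame; numpy for building the polygon arrays; ultralytics for the YOLO model; cvzone for the putTextRect helper that draws ID labels on OpenCV frames; and torch to check whether a CUDA GPU is available.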

    model = YOLO('yolov8s.pt')


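*   This line loads the small YOLOv8 model (yolov8s.pt), pretrained on the COCO dataset. If the weights file is not present, Ultralytics downloads it on first use, as the download message in the cell output shows. YOLO is a widely used family of real-time object detection models.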


    two_lane_street = '/content/drive/MyDrive/Final_project/mcgill_drive.mp4'
    one_way_street = '/content/drive/MyDrive/Final_project/st_catherine.mp4'
The same pipeline is used for both the two-lane street (mcgill_drive.mp4) and the one-way street (st_catherine.mp4). However, the counts depend on where the camera car is positioned on the road, and the areas of interest are tuned to the camera position in these particular videos. I placed the boxes where parked and moving cars pass on either side of the frame. This relies on two observations: parked cars always appear at the far left or far right of the frame, and the distance between the camera car and the parked-car areas stays about the same throughout each video.


    with open(coco_name_path, "r") as file:
        class_list = file.read().split("\n")


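*   This block reads the class names from coco.names (the file at coco_name_path in the cloned darknet repository) and splits the content into a list, one name per line. A class ID returned by YOLO is used as an index into this list; this works because coco.names lists the 80 COCO classes in the same order the model uses.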

    car_tracker = Tracker()
    pedestrian_tracker = Tracker()


*   Initializes two Tracker objects (from the previously defined Tracker class) to track cars and pedestrians separately.

    left_lane_area = [(303,358), (257,368), (427,376), (440,366)]...


*   These lines define various polygonal areas of interest within the video frame, such as lanes and parking areas, using coordinates. These are used to track and categorize objects based on their location.
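
The zone checks rely on cv2.pointPolygonTest(..., False), which returns +1 inside the polygon, 0 on its edge and -1 outside; the code counts only strictly positive results. As a dependency-free illustration, the sketch below uses a standard ray-casting (even-odd) test in its place. `point_in_polygon` is a hypothetical helper, and unlike OpenCV it does not report the on-edge case separately. Coordinates are in the resized 1020x500 frame.

```python
def point_in_polygon(pt, polygon):
    """Hypothetical ray-casting test: True if pt lies inside polygon (list of (x, y))."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x where the edge crosses that line; count it if it lies to the right of pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


left_lane_area = [(303, 358), (257, 368), (427, 376), (440, 366)]
pedestrians_area_1 = [(787, 324), (787, 413), (818, 413), (818, 324)]

print(point_in_polygon((350, 367), left_lane_area))      # True
print(point_in_polygon((800, 370), pedestrians_area_1))  # True
print(point_in_polygon((700, 370), pedestrians_area_1))  # False
```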

    left_lane_cars = set()...

* Creates empty sets to keep track of unique IDs of cars and pedestrians in each area of interest.



    count = 0

    while True:
        ret, frame = cap.read()
        if not ret:
            break

This loop iterates over each frame of the video. ret is a boolean indicating whether the frame was read correctly. If ret is False, the loop breaks, indicating the end of the video or an error.

    count += 1
    if count % 2 != 0:
        continue

This part increments a frame counter and skips every other frame (i.e., processes only even-numbered frames) to reduce computational load.
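
A small simulation of the skipping rule, with a hypothetical `frames_to_process` function standing in for the video loop:

```python
def frames_to_process(total_frames, step=2):
    """Hypothetical helper: 1-based frame numbers kept by the count % step rule."""
    kept = []
    count = 0
    for _ in range(total_frames):
        count += 1
        if count % step != 0:
            continue  # odd-numbered frame: skipped, no detection is run
        kept.append(count)
    return kept


print(frames_to_process(7))  # [2, 4, 6]
```

Halving the number of frames halves the detection work, but it also doubles how far an object moves between two processed frames, and the tracker's fixed 35-pixel radius has to absorb that larger step.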


    frame = cv2.resize(frame, (1020, 500))

Resizes the current video frame to a specified size (1020x500 pixels) for consistent processing.
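
Because the areas of interest are hard-coded in the resized 1020x500 coordinate system, they only line up if every frame is resized the same way, and this fixed size does not necessarily preserve the source aspect ratio. If the resize target changes, the polygons have to be rescaled too. A minimal sketch, with a hypothetical `scale_polygon` helper and an assumed 1920x1080 target (the notebook does not state the videos' native resolution):

```python
def scale_polygon(polygon, src_size, dst_size):
    """Hypothetical helper: map (x, y) vertices from a src_size=(w, h) frame to dst_size=(w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [(round(x * sx), round(y * sy)) for x, y in polygon]


pedestrians_area_1 = [(787, 324), (787, 413), (818, 413), (818, 324)]
# 1920x1080 is an assumed target size, for illustration only
print(scale_polygon(pedestrians_area_1, (1020, 500), (1920, 1080)))
# [(1481, 700), (1481, 892), (1540, 892), (1540, 700)]
```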

    results = model.predict(frame)
    if torch.cuda.is_available():
        detection_data = pd.DataFrame(results[0].boxes.cpu().data).astype("float")
    else:
        detection_data = pd.DataFrame(results[0].boxes.data).astype("float")

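This section runs the YOLO model on the resized frame and copies the detection results into a pandas DataFrame for easier processing. Each row is one detection: x1, y1, x2, y2 (the box corners in pixels), the confidence score, and the class ID. On a GPU, .cpu() first copies the tensor out of GPU memory so that pandas can read it.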

    cars = []
    people = []

Initializes two lists to store bounding boxes of detected cars and people separately.

    for index, row in detection_data.iterrows():
        bbox = [int(row[i]) for i in range(4)]  # Extract bounding box
        class_id = int(row[5])
        class_name = class_list[class_id]

        if 'car' in class_name:
            cars.append(bbox)
        elif 'person' in class_name:
            people.append(bbox)

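Iterates over each object detected by YOLO. The first four values of each row are truncated to integers to form the bounding box, and the class ID is looked up in class_list. Detections named 'car' are added to the cars list and 'person' to the people list; other classes, such as truck, are ignored. Because the check is a substring test, 'car' in class_name would also match the COCO class 'carrot'; comparing with class_name == 'car' is stricter.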
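
A standalone sketch of this classification step with made-up detection rows in the [x1, y1, x2, y2, confidence, class_id] layout. The dictionary holds a few real COCO class IDs (0 person, 2 car, 7 truck, 51 carrot) instead of the full coco.names list, and the loop compares the notebook's substring test with an exact comparison:

```python
# A few real COCO class IDs; the notebook looks names up in the full coco.names list instead.
class_names = {0: "person", 2: "car", 7: "truck", 51: "carrot"}

# Made-up rows in the results[0].boxes.data layout: x1, y1, x2, y2, confidence, class_id
rows = [
    [100.4, 200.9, 140.2, 260.0, 0.91, 2.0],   # car
    [300.0, 150.0, 320.0, 220.0, 0.85, 0.0],   # person
    [500.0, 210.0, 600.0, 300.0, 0.77, 7.0],   # truck: not counted as a car
    [700.0, 400.0, 710.0, 420.0, 0.40, 51.0],  # carrot: a hypothetical false positive
]

substring_cars, cars, people = [], [], []
for row in rows:
    bbox = [int(v) for v in row[:4]]  # int() truncates, as in the notebook
    name = class_names[int(row[5])]
    if "car" in name:  # the notebook's test, which also matches "carrot"
        substring_cars.append(bbox)
    if name == "car":  # exact comparison
        cars.append(bbox)
    elif name == "person":
        people.append(bbox)

print(len(substring_cars), len(cars), len(people))  # 2 1 1
```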

            
    cars_tracked = car_tracker.update(cars)
    people_tracked = pedestrian_tracker.update(people)

Updates the car and pedestrian trackers with the latest detected bounding boxes.

    for x, y, w, h, id in cars_tracked:
        # The tracked boxes are YOLO corner coordinates, so w and h here are really x2 and y2
        cx, cy = (x + w) // 2, (y + h) // 2
        if cv2.pointPolygonTest(np.array(left_lane_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)  # draw from (x1, y1) to (x2, y2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            left_lane_cars.add(id)
        # NOTE: this condition tests parked_cars_left_area twice; parked_cars_right_area is never checked
        elif cv2.pointPolygonTest(np.array(parked_cars_left_area, np.int32), (cx, cy), False) > 0 or cv2.pointPolygonTest(np.array(parked_cars_left_area, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            parked_cars.add(id)

    for x, y, w, h, id in people_tracked:
        cx, cy = (x + w) // 2, (y + h) // 2
        if cv2.pointPolygonTest(np.array(pedestrians_area_1, np.int32), (cx, cy), False) > 0 or cv2.pointPolygonTest(np.array(pedestrians_area_2, np.int32), (cx, cy), False) > 0:
            cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
            cv2.rectangle(frame,(x,y),(w,h),(255,255,255),2)
            cvzone.putTextRect(frame,f'{id}',(x,y),1,1)
            pedestrians.add(id)

These blocks iterate over tracked cars and pedestrians. For each object, the code checks if it falls within any predefined area of interest and, if so, performs visualizations and updates the sets with the object's ID.
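
The repeated if/elif blocks can be expressed as one table-driven helper. The sketch below is a hypothetical refactoring, not the notebook's code: `count_in_areas` takes tracked boxes in the [x1, y1, x2, y2, id] form the trackers return, computes each center, and adds the ID to a set when the center lies in any of the given polygons. It uses a ray-casting `point_in_polygon` as a stand-in for cv2.pointPolygonTest so that it runs without OpenCV. Listing both parked-car polygons in one place also avoids the slip in the McGill cell, where parked_cars_left_area is tested twice and parked_cars_right_area is never tested.

```python
def point_in_polygon(pt, polygon):
    """Ray casting; stands in for cv2.pointPolygonTest(..., False) > 0 (edge points may differ)."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def count_in_areas(tracked, polygons, seen):
    """Hypothetical helper: add the ID of each box whose center lies in any polygon to seen."""
    for x1, y1, x2, y2, obj_id in tracked:
        center = ((x1 + x2) // 2, (y1 + y2) // 2)  # boxes are corners, so this is the true center
        if any(point_in_polygon(center, poly) for poly in polygons):
            seen.add(obj_id)


parked_areas = [
    [(42, 348), (42, 382), (58, 382), (59, 344)],      # parked_cars_left_area (McGill)
    [(812, 375), (812, 428), (837, 428), (837, 375)],  # parked_cars_right_area (McGill)
]
parked_cars = set()
# Made-up tracker output: car 3 in two processed frames, car 4 outside both areas
count_in_areas([[40, 340, 60, 390, 3]], parked_areas, parked_cars)
count_in_areas([[41, 342, 61, 392, 3], [400, 300, 500, 360, 4]], parked_areas, parked_cars)
print(parked_cars)  # {3}
```

Because the set stores IDs, the car seen in two frames is counted once; the count only rises when the tracker hands out a new ID.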

    cv2.polylines(frame, [np.array(left_lane_area, np.int32)], True, (0, 255, 0), 2)
    cv2.polylines(frame, [np.array(parked_cars_left_area, np.int32)], True, (255, 0, 0), 2)
    cv2.polylines(frame, [np.array(pedestrians_area_1, np.int32)], True, (0, 0, 255), 2)
    cv2.polylines(frame, [np.array(pedestrians_area_2, np.int32)], True, (0, 0, 255), 2)
    cv2.polylines(frame, [np.array(parked_cars_right_area, np.int32)], True, (0, 0, 255), 2)


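Draws the predefined areas of interest as polygons on the frame for visualization. The annotated frames are not displayed or saved in these cells; to view them in Colab, one option is cv2_imshow from google.colab.patches.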

    print("Number of moving cars passed:", len(left_lane_cars))
    print("Number of parked cars passed:", len(parked_cars))
    print("Number of pedestrians passed:", len(pedestrians))
After processing the entire video, these lines print the count of unique moving cars, parked cars, and pedestrians detected in the predefined areas.

**Report on Object Tracking Code Implementation**

This Jupyter notebook presents our Computer Vision final project on object tracking in video. We were tasked with processing two video files, 'st-catherine_drive.mp4' and 'mcgill_drive.mp4', with the goal of identifying, tracking and counting objects such as cars and pedestrians. Our pipeline combines a pretrained object detection model, a custom tracking algorithm, and a frame-by-frame video processing loop. We also had to use the YOLO (You Only Look Once) model; we used the 'yolov8s.pt' version for its efficiency and accuracy in real-time object detection. With this approach, we aimed for a fast and responsive detection and counting system.

Before we began, we made several assumptions. First, we assumed the videos have a sufficient level of quality and fairly specific conditions, and our detection and tracking setup is tuned to these videos in particular. These conditions can be specific to the video or to the environment, including camera angle, lighting, weather and traffic patterns. We assume a stable environment, since drastic changes could affect detection and tracking accuracy. We also rely on the pretrained yolov8s.pt weights having been trained on the COCO dataset, which includes the classes we need, such as cars and people. However, the YOLO model has limitations that affect the project's scope and effectiveness, particularly its detection accuracy under varying conditions. Additionally, the custom tracker's performance is likely influenced by the size and speed of the objects, as well as by occlusions.

We first implemented a custom Tracker class to handle object tracking. It tracks objects by their center points: when a new object is detected, it is assigned a unique ID, and its position is updated across successive frames. If a detection's center point lies within a threshold distance of 35 pixels of a previously tracked object's center, it is assumed to be the same object.

To process the videos, our code initializes the YOLO model and defines specific areas of interest in the video frame, such as lanes, parked-car zones and pedestrian zones. These areas are critical for categorizing the tracked cars and pedestrians. A loop reads the video frames, keeps every second frame, resizes it, and feeds it to the YOLO model for object detection. The detections are split into cars and people by class ID and passed to two separate trackers, which assign persistent IDs. For each tracked object, we then check whether its center falls within one of the predefined areas of interest. This step is what distinguishes, for example, moving cars, parked cars and pedestrians in specific zones. The code stores the IDs for each category in a set, so each distinct car or pedestrian is counted once per area. To visualize the detections, we mark the tracked objects with circles, rectangles and their IDs. Finally, the program prints the number of cars and pedestrians detected in each area.

In terms of software and routines, the project uses several tools. As mentioned, we use the Ultralytics YOLO package for object detection, together with our custom Tracker class, which assigns unique IDs to detected objects and updates their positions across video frames. The Python math module provides hypot, which computes the Euclidean distance between points; the tracker uses it to decide whether two detections are close enough to be the same object. OpenCV and cvzone handle video processing and annotation: reading and resizing frames, drawing rectangles, circles and labels for visualization, and testing whether points lie inside the area polygons. PyTorch is the machine learning library underlying the YOLO model; Ultralytics uses it for tensor computations and GPU acceleration when available, and our own code calls it only to check whether CUDA is available. NumPy is used mainly to build the coordinate arrays that represent the areas of interest. Lastly, pandas holds the detections returned by the YOLO model (bounding box coordinates, confidence scores and class IDs) in a DataFrame, which makes them easy to iterate over and analyze.

Overall, the program's performance was sufficient. First, it proved efficient: it uses the YOLO model, which is known for fast object detection, and it only processes every second frame. Combined with libraries such as NumPy and pandas, this let the program process video frames quickly. Video processing took 1 min 36 s for mcgill_drive and 1 min 54 s for st_catherine, which is a good execution time considering that the video files are fairly large, 61.5 MB and 83.5 MB respectively. However, execution speed also depends on the hardware. We chose to use a GPU because YOLOv8 can be computationally intensive, so less powerful hardware may limit the program's usability or efficiency.

In terms of accuracy, the program is reasonably accurate: the counts it produced did not differ much from the manually obtained ground truths. For mcgill_drive.mp4, the largest error was 20.5% for pedestrians (31 detected vs 39 counted manually), while the errors for moving and parked cars were 4.3% and 8.3% respectively. For st-catherine_drive.mp4, the only discrepancy was in parked cars, with a 3.7% error (52 vs 54). These discrepancies could have several causes. The program may miss objects in poorer lighting, e.g. the pedestrians in mcgill_drive.mp4. The tracker also does not handle occlusions, where an object is temporarily blocked by another, which occurs in both videos: an object that is not detected in a processed frame is dropped from the tracker and receives a new ID if it reappears. Furthermore, the fixed distance threshold of 35 pixels may not suit all scenarios; when the camera angle or the distance to the objects varies significantly, it can lead to incorrect tracking. Finally, the program's effectiveness depends heavily on the predefined areas of interest, and any misalignment or incorrect definition of these areas could lead to objects being miscategorized.
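
The error percentages above appear to be the absolute difference between program output and manual count, divided by the manual count. The snippet below recomputes them from the comparison tables at the end of this notebook, and it reproduces the 20.5%, 8.3%, 4.3% and 3.7% figures.

```python
# (program output, manual count), copied from the comparison tables
results = {
    "mcgill_drive.mp4": {"parked cars": (13, 12), "moving cars": (24, 23), "pedestrians": (31, 39)},
    "st-catherine_drive.mp4": {"parked cars": (52, 54), "moving cars": (1, 1), "pedestrians": (100, 100)},
}


def relative_error(program, manual):
    """Absolute difference as a percentage of the manual (ground truth) count."""
    return abs(program - manual) / manual * 100


for video, counts in results.items():
    for label, (program, manual) in counts.items():
        print(f"{video:24} {label:12} {relative_error(program, manual):5.1f}%")
```

This measure ignores direction: the 13 vs 12 parked cars in mcgill_drive.mp4 is an over-count, while the 31 vs 39 pedestrians is an under-count.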

In summary, our implementation applies modern computer vision and machine learning techniques to object detection and tracking in video streams. It points toward applications such as traffic monitoring, surveillance or automated video analysis, using Python libraries and pretrained models to process and interpret video data. Its strengths are fast processing and area-based categorization. However, it is limited in adaptability, robustness under varied conditions, and handling of complex scenarios such as occlusions. Future improvements could include adaptive thresholds for tracking, more dynamic area definitions, and recognition of more object classes to make it more versatile and robust.
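
As one possible direction for the adaptive thresholding mentioned above, the hypothetical sketch below scales the match radius with each box's diagonal, so large, nearby cars may move farther between processed frames than small, distant ones. It also picks the nearest tracked center instead of the first one within range. The scale factor 0.5 and the 15-pixel floor are arbitrary illustration values, not tuned for these videos.

```python
import math


def adaptive_threshold(box, scale=0.5, min_px=15.0):
    """Hypothetical rule: match radius proportional to the box diagonal, with a floor."""
    x1, y1, x2, y2 = box
    return max(min_px, scale * math.hypot(x2 - x1, y2 - y1))


def match(prev_centers, box):
    """Return the ID of the nearest previous center within the adaptive radius, else None."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    best_id, best_d = None, adaptive_threshold(box)
    for obj_id, (px, py) in prev_centers.items():
        d = math.hypot(cx - px, cy - py)
        if d < best_d:  # closer than the radius and than any earlier candidate
            best_id, best_d = obj_id, d
    return best_id


prev = {0: (120.0, 230.0), 1: (600.0, 300.0)}  # made-up centers from the previous frame
near_car = (580, 280, 720, 420)   # 140x140 box whose center moved about 71 px from ID 1
small_box = (150, 222, 166, 234)  # 16x12 box whose center is about 38 px from ID 0

print(match(prev, near_car), match(prev, small_box))  # 1 None
```

With the fixed 35-pixel rule, the large box would have received a new ID, while the small, distant box is held to a tighter radius than 35 pixels.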

### Comparison of Program Output vs. Manually Obtained Ground Truth
Note: we did not count cars that were stopped, e.g. at a red light, as parked cars. We counted people who were walking or standing as pedestrians, excluding people sitting on the ground. We also did not count trucks as cars. We defined "passing" an object as physically moving past it.

mcgill_drive.mp4
|  | Program Output | Manual Detection | 
| --- | --- | --- | 
| Number of Parked Cars | 13 | 12 |
| Number of Moving Cars | 24 | 23 |
| Number of Pedestrians| 31 | 39 |

st-catherine_drive.mp4
|  | Program Output | Manual Detection | 
| --- | --- | --- | 
| Number of Parked Cars | 52 | 54 |
| Number of Moving Cars | 1 | 1 |
| Number of Pedestrians| 100 | 100 |
