# Final Project

Dania Pennimpede (260986441) and Samantha Handal (260983914)

## Goal


Write a Python program that will analyze the two dashcam videos provided (mcgill_drive.mp4 and st-catherines_drive.mp4, both recorded at 30 frames per second with the same car and dashcam) and provide the following analytics:

* Number of parked cars passed
* Number of moving cars passed
* Number of pedestrians passed

## Our Approach

Our approach to analyzing the two dashcam videos is a multi-step pipeline that detects, tracks and classifies the cars and pedestrians visible in the footage. The key steps are outlined below:


**Step 1 - Object Detection and Tracking:**
* We used the YOLOv8 object detection model (the pretrained `yolov8n.pt` weights) to detect cars and pedestrians in each frame of the videos.
* The detected objects are then tracked across frames using the Deep SORT algorithm.
* Tracking assigns each object a persistent ID, so we can follow its movement throughout the video and count it only once, even though it appears in many frames.
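The deduplication that tracking provides can be illustrated without YOLO or Deep SORT. The sketch below assumes each frame has already been reduced to `(track_id, class_name)` pairs; `count_unique_tracks` and its `min_frames` parameter are hypothetical names for illustration, not part of our program.

```python
from collections import Counter


def count_unique_tracks(frames, min_frames=1):
    """Count distinct objects per class from per-frame (track_id, class_name) pairs.

    frames: list of frames, each a list of (track_id, class_name) tuples.
    min_frames: hypothetical filter; a track must appear in at least this many frames.
    """
    appearances = Counter()
    labels = {}
    for frame_tracks in frames:
        for track_id, class_name in frame_tracks:
            appearances[track_id] += 1       # how many frames this track appears in
            labels[track_id] = class_name    # keep the most recent label for the track
    totals = Counter()
    for track_id, n in appearances.items():
        if n >= min_frames:
            totals[labels[track_id]] += 1    # each track ID contributes exactly once
    return dict(totals)


frames = [
    [(1, "car"), (2, "person")],
    [(1, "car"), (2, "person")],
    [(1, "car"), (3, "car")],
]
print(count_unique_tracks(frames))  # {'car': 2, 'person': 1}
```

Car 1 is visible in three frames but is counted once, because all three detections share the same track ID.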

**Step 2 - Speed Estimation:**
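* For each tracked object, we estimate its speed from its trajectory over time.
* Five horizontal reference lines are placed across the frame. When the center of a tracked bounding box comes within 25 pixels (vertically) of one of these lines, we divide the vertical distance the center moved since the previous frame in which that track was processed by the time elapsed in between.
* This gives an apparent speed in pixels per second. Because the dashcam itself is moving, this apparent speed combines the object's own motion with the motion of the camera.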
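A minimal sketch of the displacement-over-time calculation described above, plus the reference-line check our `SpeedEstimator` applies before measuring. It is a simplification, not our class: the helper names are hypothetical, the 1280-pixel frame width is an assumed value, and elapsed time is computed from frame indices at the stated 30 fps rather than from the wall-clock time our code uses.

```python
def near_reference_line(center, lines, thresh=25):
    """Return True if a bbox center lies within `thresh` pixels (vertically) of a horizontal line.

    Each line is [(x_start, y), (x_end, y)], the same format as reg_pts in our code.
    """
    x, y = center
    for (x_start, y_line), (x_end, _) in lines:
        if x_start < x < x_end and abs(y - y_line) < thresh:
            return True
    return False


def apparent_speed(prev_center, curr_center, prev_frame_idx, curr_frame_idx, fps=30.0):
    """Vertical speed of a bbox center in pixels/second between two frames.

    Assumption: elapsed time comes from frame indices and the frame rate
    (our program uses wall-clock time instead).
    """
    frames_elapsed = curr_frame_idx - prev_frame_idx
    if frames_elapsed <= 0:
        return 0.0
    dt = frames_elapsed / fps                   # seconds between the two frames
    dy = abs(curr_center[1] - prev_center[1])   # vertical displacement in pixels
    return dy / dt


# Hypothetical reference lines for a 1280-pixel-wide frame
lines = [[(0, 1000), (1280, 1000)], [(0, 1050), (1280, 1050)]]
prev_center, curr_center = (640, 1000), (642, 1006)
if near_reference_line(curr_center, lines):
    print(apparent_speed(prev_center, curr_center, 90, 105))  # 12.0
```

A 6-pixel vertical move over 15 frames (0.5 s at 30 fps) gives 12 pixels/second.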

**Step 3 - Classification:**
* Object categories come from the class predicted by YOLO and carried along by the tracker: tracks labelled `person` are counted as pedestrians, and tracks labelled `car` are split into parked and moving cars using their estimated speeds.
* A car must be tracked for more than 10 frames to be counted. Because the dashcam is moving, parked cars are not stationary in the image: they drift past at an apparent speed that depends mainly on our own car's motion. In our program, a car is classified as parked when its average apparent speed (after removing outliers) falls between 3 and 7 pixels/second, and as moving otherwise.
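The car classification performed at the end of `process_video` can be written as a small standalone function. It mirrors the thresholds used there (more than 10 frames, 3 to 7 pixels/second), but `classify_track` itself is a hypothetical helper; the example inputs are taken from the McGill track log further down.

```python
def classify_track(obj_type, frames_seen, avg_speed, min_frames=10, parked_band=(3.0, 7.0)):
    """Map one track to a category using the same rules as process_video.

    Returns "pedestrian", "parked car", "moving car", or None for tracks that are not counted.
    """
    if obj_type == "person":
        return "pedestrian"  # pedestrians are counted regardless of speed or track length
    if obj_type == "car" and frames_seen > min_frames:
        low, high = parked_band
        # Apparent speeds inside the band are treated as parked, everything else as moving
        return "parked car" if low <= avg_speed <= high else "moving car"
    return None  # other classes, and cars tracked for too few frames, are ignored


# Example inputs from the McGill track log (Count, Type, Avg Speed)
print(classify_track("car", 62, 6.42))     # parked car
print(classify_track("car", 68, 16.03))    # moving car
print(classify_track("car", 2, 0.82))      # None (too few frames)
print(classify_track("person", 10, 1.03))  # pedestrian
```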




## Assumptions Made

* We assumed that the dashcam videos provided were captured under typical driving conditions with sufficient lighting and visibility.
* We assumed that the objects of interest (cars, pedestrians, etc.) would be adequately captured and distinguishable within the video frames.
* We assumed a constant frame rate of 30 frames per second for both input videos.
* We assumed no cars were reversing.
* We assumed no major obstructions prevented accurate object detection and tracking.


## Software Packages Used
1.   **Ultralytics YOLO:**
* As seen in previous assignments, YOLO is a real-time object detection system that processes images or video frames and detects objects within them.
* Our program uses the pretrained YOLOv8n model (`yolov8n.pt`) to detect cars and pedestrians in each frame of the dashcam videos.
* The Ultralytics YOLOv8 speed estimation solution served as the starting point for our speed estimator, which we use to separate parked cars from moving cars: https://github.com/ultralytics/ultralytics/blob/main/ultralytics/solutions/speed_estimation.py
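As a sketch of how YOLO output feeds the rest of the pipeline: each row of `result.boxes.data.tolist()` has the form `[x1, y1, x2, y2, score, class_id]`, and our program keeps rows whose score exceeds 0.5. The function below reproduces that filter on plain lists; the optional `keep_classes` argument is a hypothetical extension (our code keeps every class and only filters by type when counting).

```python
def filter_detections(rows, threshold=0.5, keep_classes=None):
    """Keep YOLO rows [x1, y1, x2, y2, score, class_id] whose score exceeds `threshold`.

    keep_classes is a hypothetical option: a set of class IDs to retain (None keeps all).
    """
    detections = []
    for x1, y1, x2, y2, score, class_id in rows:
        if score <= threshold:
            continue
        if keep_classes is not None and int(class_id) not in keep_classes:
            continue
        # Integer pixel coordinates, as passed to the tracker in our program
        detections.append([int(x1), int(y1), int(x2), int(y2), score, int(class_id)])
    return detections


# Toy rows standing in for result.boxes.data.tolist(); COCO IDs: 0 = person, 2 = car, 9 = traffic light
rows = [
    [100.4, 200.7, 180.2, 260.9, 0.91, 2.0],
    [50.0, 60.0, 70.0, 90.0, 0.32, 0.0],
    [300.0, 310.0, 330.0, 400.0, 0.77, 9.0],
]
print(filter_detections(rows))                       # car and traffic light kept
print(filter_detections(rows, keep_classes={0, 2}))  # only the car kept
```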

2.   **Deep SORT:**
* As seen in class, Deep SORT is an extension of the SORT algorithm that integrates appearance information into the tracking process.
* Our program utilizes Deep SORT to track the objects detected by YOLO across video frames.
* The Deep SORT source code (https://github.com/nwojke/deep_sort/tree/master) was adjusted to run on a recent version of Colab; its dependencies are installed at the start of the Code Developed section.
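Deep SORT's appearance term compares embedding vectors with a cosine distance and rejects matches above a threshold (our tracker uses `max_cosine_distance = 0.4`). The sketch below shows that gating step with NumPy under a simplifying assumption: each track holds a single feature vector, whereas Deep SORT's `NearestNeighborDistanceMetric` takes the smallest distance over a gallery of past features per track. The 2-D vectors are toy stand-ins for the 128-dimensional embeddings produced by `mars-small128.pb`.

```python
import numpy as np


def cosine_distance_matrix(track_features, det_features):
    """Pairwise cosine distance (1 - cosine similarity) between track and detection features."""
    a = np.asarray(track_features, dtype=float)
    b = np.asarray(det_features, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize each row
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T                               # shape: (num_tracks, num_detections)


def appearance_gate(cost, max_cosine_distance=0.4):
    """True where a track/detection pair is close enough in appearance to be matched."""
    return cost <= max_cosine_distance


track_features = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings, one per existing track
det_features = [[0.9, 0.1], [0.1, 0.9]]    # toy embeddings for two new detections
cost = cosine_distance_matrix(track_features, det_features)
print(appearance_gate(cost))  # track 0 may match detection 0, track 1 may match detection 1
```

Pairs that pass the gate are then resolved by the assignment step together with the Kalman-filter motion gate; pairs that fail it can never be matched, which is what keeps two similar-looking objects from swapping IDs too easily.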


## Results

**Manual Approximation**

*McGill:*

*   Number of Pedestrians: ~30
* Number of Moving Cars: ~25
* Number of Parked Cars: ~15

*Saint Catherine:*

*   Number of Pedestrians: ~90
* Number of Moving Cars: ~8
* Number of Parked Cars: ~60

**Detected By Program**

*McGill:*

*   Number of Pedestrians: 25
* Number of Moving Cars: 37
* Number of Parked Cars: 15

*Saint Catherine:*

*   Number of Pedestrians: 64
* Number of Moving Cars: 57
* Number of Parked Cars: 12



## Program Performance and Problems

**Performance**

Overall, the program performed reasonably well at detecting moving cars, parked cars and pedestrians in the McGill video:

* 25 of ~30 pedestrians were detected -> 0 false positives, 5 missed pedestrians
* 37 moving cars were reported against ~25 in the manual count -> 12 false positives, 0 missed moving cars
* 15 of ~15 parked cars were detected -> 0 false positives, 0 missed parked cars

For the Saint Catherine video, the program detected most of the cars but struggled to differentiate between moving and parked cars: it reported 57 + 12 = 69 cars in total against ~8 + ~60 = ~68 in the manual count, yet most of the parked cars were classified as moving. Pedestrian detection was also less complete than in the McGill video:

* 64 of ~90 pedestrians were detected -> 0 false positives, 26 missed pedestrians
* 57 moving cars were reported against ~8 in the manual count -> 49 false positives, 0 missed moving cars
* 12 of ~60 parked cars were detected -> 0 false positives, 48 missed parked cars
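The error figures above follow directly from the counts in the Results section. The snippet below recomputes the signed differences (program count minus manual count) and the total number of cars per video. The manual counts are approximate, and a signed difference is a net figure, so an over-count can hide individual misses.

```python
# Counts copied from the Results section (manual counts are approximate)
manual = {
    "McGill": {"pedestrians": 30, "moving cars": 25, "parked cars": 15},
    "Saint Catherine": {"pedestrians": 90, "moving cars": 8, "parked cars": 60},
}
detected = {
    "McGill": {"pedestrians": 25, "moving cars": 37, "parked cars": 15},
    "Saint Catherine": {"pedestrians": 64, "moving cars": 57, "parked cars": 12},
}


def signed_errors(manual, detected):
    """Program count minus manual count for each video and category (positive = over-count)."""
    return {
        video: {category: detected[video][category] - manual[video][category] for category in manual[video]}
        for video in manual
    }


def total_cars(counts):
    """Parked plus moving cars, i.e. all cars regardless of classification."""
    return counts["moving cars"] + counts["parked cars"]


errors = signed_errors(manual, detected)
for video in manual:
    print(video, errors[video],
          f"total cars: {total_cars(detected[video])} (program) vs ~{total_cars(manual[video])} (manual)")
```

For Saint Catherine the moving-car over-count (+49) and the parked-car under-count (-48) nearly cancel in the total, which is why the problem shows up as misclassification rather than missed detections.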



**Problems**

*Speed Estimation*

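* The speed estimation technique in our program relies solely on the vertical displacement of each object's bounding-box center, recorded at most once per track near the reference lines. This cannot capture complex movements such as acceleration or deceleration, and it mixes the object's own motion with the motion of the dashcam. In addition, elapsed time is measured with the wall-clock `time()` function rather than from frame indices, so the measured speeds also depend on how fast each frame is processed. These factors led to inaccuracies in the speed measurements.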

*Car Classification*

* Our program encountered challenges in differentiating between moving and parked cars, particularly in crowded or complex scenes. This led to a high number of false positives in the classification of moving cars, as well as a significant number of missing parked cars.


**Possible Improvements**

* Improve the speed estimation by converting pixel displacements into real-world distances (scaling), timing measurements from frame indices at the known frame rate, and averaging over several frames so that acceleration and deceleration are taken into account.
* By analyzing the movements of detected cars relative to static objects in the scene, our program could better classify parked cars.
* Handle edge cases such as our car turning or switching lanes, and changes in lighting throughout the video.
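One possible way to make speeds less sensitive to per-frame jitter is to fit a line to a track's recent positions instead of using a single displacement. The sketch below is hypothetical (not implemented in our program): it assumes one vertical center position per consecutive frame and a 30 fps video, and uses `numpy.polyfit` to get the fitted slope in pixels per second. A short window lets the estimate follow changes in speed, such as acceleration or deceleration, while averaging out noise from the bounding boxes.

```python
import numpy as np


def windowed_speed(centers_y, fps=30.0, window=15):
    """Least-squares vertical speed (pixels/second) over the last `window` samples.

    centers_y: vertical bbox-center positions, one per consecutive frame (assumption).
    """
    y = np.asarray(centers_y[-window:], dtype=float)
    if y.size < 2:
        return 0.0
    t = np.arange(y.size) / fps               # timestamps in seconds, derived from frame indices
    slope, _intercept = np.polyfit(t, y, 1)   # degree-1 fit: slope in pixels per second
    return float(abs(slope))


# A center moving down 2 pixels per frame at 30 fps
y = [1000 + 2 * k for k in range(30)]
print(round(windowed_speed(y), 6))  # 60.0
```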



## Code Developed

In [None]:
# Install the necessary dependencies
!pip install ultralytics==8.0.33 tensorflow==2.11.0 scikit-image==0.19.3 filterpy==1.4.5

In [None]:
!pip install scikit-learn==0.22.2.post1

In [None]:
# Ultralytics YOLO 🚀, AGPL-3.0 license

from collections import defaultdict
from time import time

import cv2
import numpy as np

class SpeedEstimator:
    """A class to estimation speed of objects in real-time video stream based on their tracks."""

    def __init__(self):
        """Initializes the speed-estimator class with default values for Visual, Image, track and speed parameters."""

        # Region information: list of horizontal reference lines, each [(x_start, y), (x_end, y)]
        self.reg_pts = [[(20, 400), (1260, 400)]]

        # Predict/track information
        self.boxes = None
        self.trk_ids = None
        self.trk_pts = None
        self.trk_history = defaultdict(list)

        # Speed estimator information
        self.current_time = 0
        self.dist_data = {}
        self.trk_idslist = []
        self.spdl_dist_thresh = 10
        self.trk_previous_times = {}
        self.trk_previous_points = {}

    def set_args(
        self,
        reg_pts,
        spdl_dist_thresh=10
    ):
        """
        Configures the speed estimation and display parameters.

        Args:
            reg_pts (list): Horizontal reference lines, each given as [(x_start, y), (x_end, y)].
            spdl_dist_thresh (int): Maximum vertical distance (pixels) from a reference line at which a speed is measured.
        """
        if reg_pts is None:
            print("Region points not provided, using default values")
        else:
            self.reg_pts = reg_pts
        self.spdl_dist_thresh = spdl_dist_thresh

    def store_track_info(self, track_id, box):
        """
        Store track data.

        Args:
            track_id (int): object track id.
            box (list): object bounding box data
        """
        track = self.trk_history[track_id]
        bbox_center = (float((box[0] + box[2]) / 2), float((box[1] + box[3]) / 2))
        track.append(bbox_center)

        if len(track) > 30:
            track.pop(0)

        self.trk_pts = np.hstack(track).astype(np.int32).reshape((-1, 1, 2))
        return track

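    def calculate_speed(self, trk_id, track):
        """
        Calculation of object speed.

        A speed is recorded (at most once per track) when the latest bounding-box center
        lies horizontally within a reference line and within spdl_dist_thresh pixels of it
        vertically.

        Args:
            trk_id (int): object track id.
            track (list): recent bounding-box centers for this track.

        Returns:
            float: apparent vertical speed in pixels/second, or 0 if no speed was recorded.
        """

        for line in self.reg_pts:
            (x_start, y_start), (x_end, y_end) = line

            # Ignore lines whose horizontal extent does not contain the object center
            if not x_start < track[-1][0] < x_end:
                continue

            # The center is "known" to be at the line if it is within the vertical threshold
            if y_end - self.spdl_dist_thresh < track[-1][1] < y_end + self.spdl_dist_thresh:
                direction = "known"
            elif y_start - self.spdl_dist_thresh < track[-1][1] < y_start + self.spdl_dist_thresh:
                direction = "known"
            else:
                direction = "unknown"

            if self.trk_previous_times[trk_id] != 0 and direction != "unknown" and trk_id not in self.trk_idslist:
                self.trk_idslist.append(trk_id)

                # Elapsed wall-clock time since this track was last processed
                time_difference = time() - self.trk_previous_times[trk_id]
                if time_difference > 0:
                    # Vertical displacement (pixels) of the bbox center since the last update
                    dist_difference = np.abs(track[-1][1] - self.trk_previous_points[trk_id][1])
                    speed = dist_difference / time_difference
                    self.dist_data[trk_id] = speed

                    self.trk_previous_times[trk_id] = time()
                    self.trk_previous_points[trk_id] = track[-1]
                    return speed

        # No speed recorded this frame: remember the current time and position for the next update
        self.trk_previous_times[trk_id] = time()
        self.trk_previous_points[trk_id] = track[-1]
        return 0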

    def estimate_speed(self, track):
        """
        Estimate the speed of a tracked object from its tracking data.

        Args:
            track: track obtained from the object tracking process.
        """
        self.box = track.bbox
        self.trk_id = track.track_id

        track_new = self.store_track_info(self.trk_id, self.box)

        if self.trk_id not in self.trk_previous_times:
            self.trk_previous_times[self.trk_id] = 0

        return_speed = self.calculate_speed(self.trk_id, track_new)

        return return_speed

if __name__ == "__main__":
    SpeedEstimator()

In [None]:
import os
import random
from collections import defaultdict

import cv2
import numpy as np
from ultralytics import YOLO

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

from drive.MyDrive.ECSE415.Final_Project.deep_sort.deep_sort.tracker import Tracker as DeepSortTracker
from drive.MyDrive.ECSE415.Final_Project.deep_sort.deep_sort.track import Track as DeepSortTrack
from drive.MyDrive.ECSE415.Final_Project.deep_sort.tools import generate_detections as gdet
from drive.MyDrive.ECSE415.Final_Project.deep_sort.deep_sort import nn_matching
from drive.MyDrive.ECSE415.Final_Project.deep_sort.deep_sort.detection import Detection

# Helper to pass the class_id along
class EnhancedDetection(Detection):
    def __init__(self, bbox, score, feature, class_id):
        super().__init__(bbox, score, feature)
        self.class_id = class_id

class EnhancedTrack(DeepSortTrack):
    def __init__(self, mean, covariance, track_id, n_init, max_age, feature, class_id):
        super().__init__(mean, covariance, track_id, n_init, max_age, feature)
        self.class_id = class_id              # Store object type

class EnhancedTracker(DeepSortTracker):
    def _initiate_track(self, detection):
        mean, covariance = self.kf.initiate(detection.to_xyah())
        self.tracks.append(EnhancedTrack(
            mean, covariance, self._next_id, self.n_init, self.max_age,
            detection.feature, detection.class_id))
        self._next_id += 1

# COCO class names, indexed by the class IDs output by the pretrained YOLOv8 model
class_names = {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}

class Tracker:
    tracker = None
    encoder = None
    tracks = None

    def __init__(self):

        max_cosine_distance = 0.4
        nn_budget = None

        encoder_model_filename = '/content/drive/MyDrive/ECSE415/Final_Project/mars-small128.pb'

        metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget)
        self.tracker = EnhancedTracker(metric)
        self.encoder = gdet.create_box_encoder(encoder_model_filename, batch_size=1)

    def update(self, frame, detections):

        if len(detections) == 0:
            # No detections in this frame: still advance the Kalman filters so that
            # unmatched tracks age and are eventually deleted
            self.tracker.predict()
            self.tracker.update([])
            self._collect_tracks()
            return

        bboxes = np.asarray([d[:4] for d in detections])    # Extract bbox coordinates
        bboxes[:, 2:] = bboxes[:, 2:] - bboxes[:, 0:2]      # Convert coordinates to [x, y, w, h]
        scores = [d[4] for d in detections]
        class_ids = [d[5] for d in detections]              # Extract class IDs

        # Extract appearance features using the pre-trained encoder
        features = self.encoder(frame, bboxes)

        # Prepare detections for the tracker
        dets = []
        for bbox_id, bbox in enumerate(bboxes):
            det = EnhancedDetection(bbox, scores[bbox_id], features[bbox_id], class_ids[bbox_id])
            dets.append(det)

        # Update the tracker with the detections
        self.tracker.predict()
        self.tracker.update(dets)
        self._collect_tracks()

    def _collect_tracks(self):
        """Expose confirmed, recently matched tracks as simple Track objects."""
        tracks = []
        for track in self.tracker.tracks:
            # Skip tentative tracks and tracks that have gone unmatched for more than one frame
            if not track.is_confirmed() or track.time_since_update > 1:
                continue
            bbox = track.to_tlbr()
            obj_type = class_names[track.class_id]
            tracks.append(Track(track.track_id, bbox, obj_type))

        self.tracks = tracks


class Track:
    track_id = None
    bbox = None
    obj_type = None

    def __init__(self, id, bbox, obj_type):
        self.track_id = id
        self.bbox = bbox
        self.obj_type = obj_type


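def calculate_filtered_mean(speeds, threshold=5):
    """Mean of `speeds` after discarding values more than `threshold` standard deviations from the mean."""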
    if not speeds:
        return 0
    mean_speed = np.mean(speeds)
    std_dev = np.std(speeds)
    filtered_speeds = [speed for speed in speeds if (mean_speed - threshold * std_dev <= speed <= mean_speed + threshold * std_dev)]
    return np.mean(filtered_speeds) if filtered_speeds else 0

# Process a video: detect, track and estimate speeds, then count pedestrians, parked cars and moving cars
def process_video(video_path, video_out_path):

    cap = cv2.VideoCapture(video_path)
    ret, frame = cap.read()
    if not ret:
        raise IOError(f"Could not read video: {video_path}")

    # Frames per second of the input video (30 for both provided videos)
    fps = cap.get(cv2.CAP_PROP_FPS)

    cap_out = cv2.VideoWriter(video_out_path, cv2.VideoWriter_fourcc(*'MP4V'), fps,
                              (frame.shape[1], frame.shape[0]))

    model = YOLO("yolov8n.pt")

    # Five horizontal reference lines spanning the full frame width; speeds are measured near them
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    reg_pts = [[(0, 1000),(w, 1000)], [(0, 1050),(w, 1050)], [(0, 1135),(w, 1135)], [(0, 1220),(w, 1220)], [(0, 1370),(w, 1370)]]
    speed_obj = SpeedEstimator()
    speed_obj.set_args(reg_pts=reg_pts, spdl_dist_thresh=25)

    tracker = Tracker()

    colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) for j in range(10)]

    detection_threshold = 0.5

    # Initialize the counter dictionary
    track_counts = {}

    # Dictionary to store object types
    obj_types = {}

    speed_history = defaultdict(list)


    while ret:

        results = model(frame)

        for result in results:
            detections = []
            for r in result.boxes.data.tolist():
                x1, y1, x2, y2, score, class_id = r
                if score > detection_threshold:
                    detections.append([int(x1), int(y1), int(x2), int(y2), score, int(class_id)])

            tracker.update(frame, detections)

            for track in tracker.tracks:
                bbox = track.bbox
                x1, y1, x2, y2 = bbox
                track_id = track.track_id
                obj_type = track.obj_type  # Retrieve object type for this track

                # Update the count for this track ID
                track_counts[track_id] = track_counts.get(track_id, 0) + 1

                # Store object type based on track ID
                obj_types[track_id] = obj_type

                speed = speed_obj.estimate_speed(track)  # returns 0 when no speed was measured this frame

                if speed != 0:
                    speed_history[track_id].append(speed)

                if speed_history[track_id]:  # Check if there's any speed recorded
                    newest_speed = speed_history[track_id][-1]  # Access the last speed recorded
                else:
                    newest_speed = 0  # Default speed if no speed is recorded

                # Draw the bounding box and a label with the track ID, class and latest speed
                cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (colors[track_id % len(colors)]), 3)
                cv2.putText(frame, f"ID: {track_id} - Type: {obj_type} - Speed: {newest_speed:.2f} pixels/second", (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)

        # Draw the speed reference lines once per frame (green, 2 px thick)
        for pt1, pt2 in reg_pts:
            cv2.line(frame, pt1, pt2, (0, 255, 0), 2)

        cap_out.write(frame)
        ret, frame = cap.read()

    cap.release()
    cap_out.release()
    cv2.destroyAllWindows()

    # Initialize counters
    num_people = 0
    num_parked_cars = 0
    num_moving_cars = 0

    # Iterate over the tracks
    for track_id, count in track_counts.items():
        obj_type = obj_types[track_id]
        if len(speed_history[track_id]) > 0:  # Ensure there is speed data
            avg_speed = calculate_filtered_mean(speed_history[track_id], 5)  # Adjust threshold as needed
        else:
            avg_speed = 0  # Default to 0 if no speed data

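        # Pedestrians are counted directly; cars seen in more than 10 frames are parked if their
        # filtered average apparent speed is between 3 and 7 pixels/second, otherwise moving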
        if obj_type == 'person':
            num_people += 1
        elif obj_type == 'car' and count > 10:
            if 3 <= avg_speed <= 7:
                num_parked_cars += 1
            else:
                num_moving_cars += 1

        print(f"Track ID: {track_id} - Count: {count} - Type: {obj_types[track_id]} - Avg Speed: {avg_speed:.2f} pixels/second")

    return num_people, num_parked_cars, num_moving_cars

Mounted at /content/drive


In [None]:
num_people, num_parked_cars, num_moving_cars = process_video('/content/drive/MyDrive/ECSE415/Final_Project/mcgill_drive.mp4', '/content/drive/MyDrive/ECSE415/Final_Project/mcgill_drive_out.mp4')

# Print summary counts
print(f"Total number of people: {num_people}")
print(f"Total number of parked cars: {num_parked_cars}")
print(f"Total number of moving cars: {num_moving_cars}")

Ultralytics YOLOv8.0.33 🚀 Python-3.10.12 torch-2.2.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

0: 384x640 6 cars, 7.7ms
Speed: 0.4ms pre-process, 7.7ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 6 cars, 8.7ms
Speed: 0.5ms pre-process, 8.7ms inference, 1.8ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 1 person, 6 cars, 8.7ms
Speed: 0.6ms pre-process, 8.7ms inference, 1.8ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 6 cars, 8.2ms
Speed: 0.5ms pre-process, 8.2ms inference, 1.7ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 6 cars, 9.0ms
Speed: 0.7ms pre-process, 9.0ms inference, 2.4ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 1 person, 6 cars, 10.4ms
Speed: 0.7ms pre-process, 10.4ms inference, 2.1ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 6 cars, 10.6ms
Speed: 0.6ms pre-process, 10.6ms inference, 2

Track ID: 1 - Count: 61 - Type: car - Avg Speed: 0.75 pixels/second
Track ID: 2 - Count: 62 - Type: car - Avg Speed: 6.42 pixels/second
Track ID: 3 - Count: 987 - Type: car - Avg Speed: 3.81 pixels/second
Track ID: 4 - Count: 95 - Type: car - Avg Speed: 2.22 pixels/second
Track ID: 5 - Count: 68 - Type: car - Avg Speed: 16.03 pixels/second
Track ID: 8 - Count: 2 - Type: car - Avg Speed: 0.82 pixels/second
Track ID: 9 - Count: 76 - Type: car - Avg Speed: 0.54 pixels/second
Track ID: 11 - Count: 89 - Type: car - Avg Speed: 10.98 pixels/second
Track ID: 13 - Count: 2 - Type: car - Avg Speed: 25.50 pixels/second
Track ID: 17 - Count: 16 - Type: car - Avg Speed: 0.00 pixels/second
Track ID: 18 - Count: 10 - Type: person - Avg Speed: 1.03 pixels/second
Track ID: 20 - Count: 84 - Type: car - Avg Speed: 0.28 pixels/second
Track ID: 22 - Count: 42 - Type: car - Avg Speed: 9.43 pixels/second
Track ID: 37 - Count: 69 - Type: car - Avg Speed: 10.10 pixels/second
Track ID: 38 - Count: 8 - Type: tra

In [None]:
num_people, num_parked_cars, num_moving_cars = process_video('/content/drive/MyDrive/ECSE415/Final_Project/st-catherines_drive.mp4', '/content/drive/MyDrive/ECSE415/Final_Project/st-catherines_drive_out.mp4')

# Print summary counts
print(f"Total number of people: {num_people}")
print(f"Total number of parked cars: {num_parked_cars}")
print(f"Total number of moving cars: {num_moving_cars}")

Ultralytics YOLOv8.0.33 🚀 Python-3.10.12 torch-2.2.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

0: 384x640 2 persons, 7 cars, 1 bus, 1 truck, 8.1ms
Speed: 0.4ms pre-process, 8.1ms inference, 1.7ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 2 persons, 8 cars, 1 bus, 1 truck, 8.0ms
Speed: 0.7ms pre-process, 8.0ms inference, 1.7ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 2 persons, 7 cars, 1 bus, 1 truck, 11.6ms
Speed: 0.6ms pre-process, 11.6ms inference, 2.2ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 2 persons, 6 cars, 1 bus, 1 truck, 8.6ms
Speed: 0.5ms pre-process, 8.6ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 1 person, 6 cars, 1 bus, 1 truck, 1 traffic light, 8.4ms
Speed: 0.6ms pre-process, 8.4ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 2 persons, 7 cars, 1 bus, 1 truck, 10.6ms
Speed: 0.6

Track ID: 1 - Count: 39 - Type: car - Avg Speed: 10.50 pixels/second
Track ID: 2 - Count: 326 - Type: car - Avg Speed: 0.00 pixels/second
Track ID: 3 - Count: 59 - Type: car - Avg Speed: 14.81 pixels/second
Track ID: 4 - Count: 82 - Type: car - Avg Speed: 9.32 pixels/second
Track ID: 5 - Count: 232 - Type: truck - Avg Speed: 0.00 pixels/second
Track ID: 6 - Count: 158 - Type: car - Avg Speed: 10.07 pixels/second
Track ID: 10 - Count: 2 - Type: person - Avg Speed: 0.00 pixels/second
Track ID: 16 - Count: 101 - Type: car - Avg Speed: 17.74 pixels/second
Track ID: 23 - Count: 17 - Type: bicycle - Avg Speed: 79.14 pixels/second
Track ID: 25 - Count: 20 - Type: bicycle - Avg Speed: 4.49 pixels/second
Track ID: 30 - Count: 145 - Type: car - Avg Speed: 8.44 pixels/second
Track ID: 31 - Count: 3 - Type: person - Avg Speed: 0.00 pixels/second
Track ID: 39 - Count: 85 - Type: car - Avg Speed: 6.87 pixels/second
Track ID: 45 - Count: 4 - Type: car - Avg Speed: 0.00 pixels/second
Track ID: 53 - Co