<div align="center">

# Object Detection, Tracking, and Counting with YOLOv8 and SORT Using OOP Approach

</div>

Welcome to this Google Colab notebook, which demonstrates how to perform object detection, tracking, and counting on both video files and static images using the YOLOv8 object detection algorithm and the SORT (Simple Online and Realtime Tracking) algorithm for object tracking, all implemented using an Object-Oriented Programming (OOP) approach.

In this notebook, we will be using the YOLOv8 algorithm to predict the class probabilities and bounding boxes of multiple objects in an image or video frame. We will also be using the SORT algorithm to track the objects detected by YOLOv8 across multiple frames of a video or a sequence of images. SORT is a simple but effective algorithm that is designed to work in real-time and to handle noisy detections and occlusions.

By using an OOP approach, we can organize our code into classes and methods that encapsulate the functionality of different parts of the object detection, tracking, and counting pipeline. 

## Importing libraries, modules and files

### importing files from my github repo

In [1]:
!git clone https://github.com/mohamedamine99/Object-tracking-and-counting-using-YOLOV8

fatal: destination path 'Object-tracking-and-counting-using-YOLOV8' already exists and is not an empty directory.


### importing modules

In [2]:
import shutil
import os

src_path = r"C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\sort.py"
dest_path = r"C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\backup_content"

# copy file from source to destination
shutil.copy(src_path, dest_path)

src_path = r"C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\requirements.txt"
shutil.copy(src_path, dest_path)


'C:\\Users\\vinil\\Downloads\\Object-tracking-and-counting-using-YOLOV8-main\\backup_content'

In [3]:
!pip install -r "C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\requirements.txt"

Collecting lap
  Using cached lap-0.4.0.tar.gz (1.5 MB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: lap
  Building wheel for lap (setup.py): started
  Building wheel for lap (setup.py): finished with status 'error'
  Running setup.py clean for lap
Failed to build lap
Installing collected packages: lap
  Running setup.py install for lap: started
  Running setup.py install for lap: finished with status 'error'


  error: subprocess-exited-with-error
  
  python setup.py bdist_wheel did not run successfully.
  exit code: 1
  
  [39 lines of output]
  Partial import of lap during the build process.
  
    `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
    of the deprecation of `distutils` itself. It will be removed for
    Python >= 3.12. For older Python versions it will remain present.
    It is recommended to use `setuptools < 60.0` for those Python versions.
    For more details, see:
      https://numpy.org/devdocs/reference/distutils_status_migration.html
  
  
    from numpy.distutils.core import setup
  Generating cython files
  running bdist_wheel
  running build
  running config_cc
  INFO: unifing config_cc, config, build_clib, build_ext, build commands --compiler options
  running config_fc
  INFO: unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
  running build_src
  INFO: build_src
  INFO: building extension "lap._lapjv" sources
 

In [5]:
# Install the required packages
!pip install opencv-python
!pip install filterpy

import sys
import os
import numpy as np
import random
import cv2
import imageio
import time
import matplotlib.pyplot as plt

# Ensure the sort.py file is in the working directory
sort_file_path = r"C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\sort.py"
if not os.path.isfile(sort_file_path):
    raise FileNotFoundError(f"sort.py file not found at {sort_file_path}")

# Add the directory containing sort.py to the system path
sys.path.append(os.path.dirname(sort_file_path))

# Import sort module
import sort





In [6]:
!pip install --user ultralytics

%pip install ultralytics
import ultralytics
ultralytics.checks()
from ultralytics import YOLO

Ultralytics YOLOv8.2.31  Python-3.10.9 torch-2.0.1+cpu CPU (AMD Ryzen 7 4800H with Radeon Graphics)
Setup complete  (16 CPUs, 15.4 GB RAM, 269.9/475.3 GB disk)


## Class definition for `YOLOv8_ObjectDetector` and `YOLOv8_ObjectCounter`

In [7]:
from sort import Sort

class YOLOv8_ObjectDetector:
    """
    A class for performing object detection on images and videos using YOLOv8.

    Args:
    ------------
        model_file (str): Path to the YOLOv8 model file or yolo model variant name in ths format: [variant].pt
        labels (list[str], optional): A list of class labels for the model. If None, uses the default labels from the model file.
        classes (list[str], optional): Alias for labels. Deprecated.
        conf (float, optional): Minimum confidence threshold for object detection.
        iou (float, optional): Minimum IOU threshold for non-max suppression.

    Attributes:
    --------------
        classes (list[str]): A list of class labels for the model ( a Dict is also acceptable).
        conf (float): Minimum confidence threshold for object detection.
        iou (float): Minimum IOU threshold for non-max suppression.
        model (YOLO): The YOLOv8 model used for object detection.
        model_name (str): The name of the YOLOv8 model file (without the .pt extension).

    Methods :
    -------------
        default_display: Returns a default display (ultralytics plot implementation) of the object detection results.
        custom_display: Returns a custom display of the object detection results.
        predict_video: Predicts objects in a video and saves the results to a file.
        predict_img: Predicts objects in an image and returns the detection results.

    """

    def __init__(self, model_file = 'yolov8n.pt', labels= None, classes = None, conf = 0.25, iou = 0.45 ):

        self.classes = classes
        self.conf = conf
        self.iou = iou

        self.model = YOLO(model_file)
        self.model_name = model_file.split('.')[0]
        self.results = None

        if labels == None:
            self.labels = self.model.names

    def predict_img(self, img, verbose=True):
        """
        Runs object detection on a single image.

        Parameters
        ----------
            img (numpy.ndarray): Input image to perform object detection on.
            verbose (bool): Whether to print detection details.

        Returns:
        -----------
            'ultralytics.yolo.engine.results.Results': A YOLO results object that contains 
             details about detection results :
                    - Class IDs
                    - Bounding Boxes
                    - Confidence score
                    ...
        (pls refer to https://docs.ultralytics.com/reference/results/#results-api-reference for results API reference)

        """

        # Run the model on the input image with the given parameters
        results = self.model(img, classes=self.classes, conf=self.conf, iou=self.iou, verbose=verbose)

        # Save the original image and the results for further analysis if needed
        self.orig_img = img
        self.results = results[0]

        # Return the detection results
        return results[0]



    def default_display(self, show_conf=True, line_width=None, font_size=None, 
                        font='Arial.ttf', pil=False, example='abc'):
        """
        Displays the detected objects on the original input image.

        Parameters
        ----------
        show_conf : bool, optional
            Whether to show the confidence score of each detected object, by default True.
        line_width : int, optional
            The thickness of the bounding box line in pixels, by default None.
        font_size : int, optional
            The font size of the text label for each detected object, by default None.
        font : str, optional
            The font type of the text label for each detected object, by default 'Arial.ttf'.
        pil : bool, optional
            Whether to return a PIL Image object, by default False.
        example : str, optional
            A string to display on the example bounding box, by default 'abc'.

        Returns
        -------
        numpy.ndarray or PIL Image
            The original input image with the detected objects displayed as bounding boxes.
            If `pil=True`, a PIL Image object is returned instead.

        Raises
        ------
        ValueError
            If the input image has not been detected by calling the `predict_img()` method first.
        """
        # Check if the `predict_img()` method has been called before displaying the detected objects
        if self.results is None:
            raise ValueError('No detected objects to display. Call predict_img() method first.')
        
        # Call the plot() method of the `self.results` object to display the detected objects on the original image
        display_img = self.results.plot(show_conf, line_width, font_size, font, pil, example)
        
        # Return the displayed image
        return display_img

        

    def custom_display(self, colors, show_cls = True, show_conf = True):
        """
        Custom display method that draws bounding boxes and labels on the original image, 
        with additional options for showing class and confidence information.

        Parameters:
        -----------
        colors : list
            A list of tuples specifying the color of each class.
        show_cls : bool, optional
            Whether to show class information in the label text. Default is True.
        show_conf : bool, optional
            Whether to show confidence information in the label text. Default is True.

        Returns:
        --------
        numpy.ndarray
            The image with bounding boxes and labels drawn on it.
        """

        img = self.orig_img
        # calculate the bounding box thickness based on the image width and height
        bbx_thickness = (img.shape[0] + img.shape[1]) // 450

        for box in self.results.boxes:
            textString = ""

            # Extract object class and confidence score
            score = box.conf.item() * 100
            class_id = int(box.cls.item())

            x1 , y1 , x2, y2 = np.squeeze(box.xyxy.numpy()).astype(int)

            # Print detection info
            if show_cls:
                textString += f"{self.labels[class_id]}"

            if show_conf:
                textString += f" {score:,.2f}%"

            # Calculate font scale based on object size
            font = cv2.FONT_HERSHEY_COMPLEX
            fontScale = (((x2 - x1) / img.shape[0]) + ((y2 - y1) / img.shape[1])) / 2 * 2.5
            fontThickness = 1
            textSize, baseline = cv2.getTextSize(textString, font, fontScale, fontThickness)

            # Draw bounding box, a centroid and label on the image
            img = cv2.rectangle(img, (x1,y1), (x2,y2), colors[class_id], bbx_thickness)
            center_coordinates = ((x1 + x2)//2, (y1 + y2) // 2)

            img =  cv2.circle(img, center_coordinates, 5 , (0,0,255), -1)
            
             # If there are no details to show on the image
            if textString != "":
                if (y1 < textSize[1]):
                    y1 = y1 + textSize[1]
                else:
                    y1 -= 2
                # show the details text in a filled rectangle
                img = cv2.rectangle(img, (x1, y1), (x1 + textSize[0] , y1 -  textSize[1]), colors[class_id], cv2.FILLED)
                img = cv2.putText(img, textString , 
                    (x1, y1), font, 
                    fontScale,  (0, 0, 0), fontThickness, cv2.LINE_AA)

        return img


    def predict_video(self, video_path, save_dir, save_format="avi", display='custom', verbose=True, **display_args):
        """Runs object detection on each frame of a video and saves the output to a new video file.

        Args:
        ----------
            video_path (str): The path to the input video file.
            save_dir (str): The path to the directory where the output video file will be saved.
            save_format (str, optional): The format for the output video file. Defaults to "avi".
            display (str, optional): The type of display for the detection results. Defaults to 'custom'.
            verbose (bool, optional): Whether to print information about the video file and output file. Defaults to True.
            **display_args: Additional arguments to be passed to the display function.

        Returns:
        ------------
            None
        """
        # Open the input video file
        cap = cv2.VideoCapture(video_path)

        # Get the name of the input video file
        vid_name = os.path.basename(video_path)

        # Get the dimensions of each frame in the input video file
        width = int(cap.get(3))  # get `width`
        height = int(cap.get(4))  # get `height`

        # Create the directory for the output video file if it does not already exist
        if not os.path.isdir(save_dir):
            os.makedirs(save_dir)

        # Set the name and path for the output video file
        save_name = self.model_name + ' -- ' + vid_name.split('.')[0] + '.' + save_format
        save_file = os.path.join(save_dir, save_name)

        # Print information about the input and output video files if verbose is True
        if verbose:
            print("----------------------------")
            print(f"DETECTING OBJECTS IN : {vid_name} : ")
            print(f"RESOLUTION : {width}x{height}")
            print('SAVING TO :' + save_file)

        # Define an output VideoWriter object
        out = cv2.VideoWriter(save_file,
                              cv2.VideoWriter_fourcc(*"MJPG"),
                              30, (width, height))

        # Check if the input video file was opened correctly
        if not cap.isOpened():
            print("Error opening video stream or file")

        # Read each frame of the input video file
        while cap.isOpened():
            ret, frame = cap.read()

            # If the frame was not read successfully, break the loop
            if not ret:
                print("Error reading frame")
                break

            # Run object detection on the frame and calculate FPS
            beg = time.time()
            results = self.predict_img(frame, verbose=False)
            if results is None:
                print('***********************************************')
            fps = 1 / (time.time() - beg)

            # Display the detection results
            if display == 'default':
                frame = self.default_display(**display_args)
            elif display == 'custom':
                frame == self.custom_display(**display_args)

            # Display the FPS on the frame
            frame = cv2.putText(frame, f"FPS : {fps:,.2f}",
                                (5, 15), cv2.FONT_HERSHEY_COMPLEX,
                                0.5, (0, 0, 255), 1, cv2.LINE_AA)

            # Write the frame to the output video file
            out.write(frame)

            # Exit the loop if the 'q' button is pressed
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # After the loop release the cap and video writer
        cap.release()
        out.release()

    


In [8]:
class YOLOv8_ObjectCounter(YOLOv8_ObjectDetector):
    """
    A class for counting objects in images or videos using the YOLOv8 Object Detection model.

    Attributes:
    -----------
    model_file : str
        The filename of the YOLOv8 object detection model.
    labels : list or None
        The list of labels for the object detection model. If None, the labels will be loaded from the model file.
    classes : list or None
        The list of classes to detect. If None, all classes will be detected.
    conf : float
        The confidence threshold for object detection.
    iou : float
        The Intersection over Union (IoU) threshold for object detection.
    track_max_age : int
        The maximum age (in frames) of a track before it is deleted.
    track_min_hits : int
        The minimum number of hits required for a track to be considered valid.
    track_iou_threshold : float
        The IoU threshold for matching detections to existing tracks.

    Methods:
    --------
    predict_img(img, verbose=True)
        Predicts objects in a single image and counts them.
    predict_video(video_path, save_dir, save_format="avi", display='custom', verbose=True, **display_args)
        Predicts objects in a video and counts them.

    """

    def __init__(self, model_file = 'yolov8n.pt', labels= None, classes = None, conf = 0.25, iou = 0.45, 
                 track_max_age = 45, track_min_hits= 15, track_iou_threshold = 0.3 ):

        super().__init__(model_file , labels, classes, conf, iou)

        self.track_max_age = track_max_age
        self.track_min_hits = track_min_hits
        self.track_iou_threshold = track_iou_threshold


        

    def predict_video(self, video_path, save_dir, save_format = "avi", 
                      display = 'custom', verbose = True, **display_args):
        
        """
    Runs object detection on a video file and saves the output as a new video file.
    
    Args:
        video_path (str): Path to the input video file.
        save_dir (str): Path to the directory where the output video file will be saved.
        save_format (str, optional): Format of the output video file. Defaults to "avi".
        display (str, optional): Type of display to use for object detection results. Options are "default" or "custom". 
                                Defaults to "custom".
        verbose (bool, optional): If True, prints information about the input and output video files. Defaults to True.
        **display_args (dict, optional): Additional arguments to pass to the display function. 

    Returns:
        None
        """
        cap = cv2.VideoCapture(video_path)
        # Get video name 
        vid_name = os.path.basename(video_path)


        # Get frame dimensions and print information about input video file
        width  = int(cap.get(3) )  # get `width` 
        height = int(cap.get(4) )  # get `height` 

        if not os.path.isdir(save_dir):
            os.makedirs(save_dir)

        save_name = self.model_name + ' -- ' + vid_name.split('.')[0] + '.' + save_format
        save_file = os.path.join(save_dir, save_name)

        if verbose:
            print("----------------------------")
            print(f"DETECTING OBJECTS IN : {vid_name} : ")
            print(f"RESOLUTION : {width}x{height}")
            print('SAVING TO :' + save_file)

        # define an output VideoWriter  object
        out = cv2.VideoWriter(save_file,
                            cv2.VideoWriter_fourcc(*"MJPG"),
                            30,(width,height))

        # Check if the video is opened correctly
        if not cap.isOpened():
            print("Error opening video stream or file")

        # Initialize object tracker
        tracker = sort.Sort(max_age = self.track_max_age, min_hits= self.track_min_hits , 
                            iou_threshold = self.track_iou_threshold)
        
        # Initialize variables for object counting
        totalCount = []
        currentArray = np.empty((0, 5))


        # Read the video frames
        while cap.isOpened():

            detections = np.empty((0, 5))
            ret, frame = cap.read()

            # If the frame was not read successfully, break the loop
            if not ret:
                print("Error reading frame")
                break

            # Run object detection on the frame and calculate FPS
            beg = time.time()
            results = self.predict_img(frame, verbose = False)
            if results == None:
                print('***********************************************')
            fps = 1 / (time.time() - beg)
            for box in results.boxes:
                score = box.conf.item() * 100
                class_id = int(box.cls.item())

                x1 , y1 , x2, y2 = np.squeeze(box.xyxy.numpy()).astype(int)

                currentArray = np.array([x1, y1, x2, y2, score])
                detections = np.vstack((detections, currentArray))

            # Update object tracker 
            resultsTracker = tracker.update(detections)
            for result in resultsTracker:
                #print(type(result))

                # Get the tracker results
                x1, y1, x2, y2, id = result
                x1, y1, x2, y2, id = int(x1), int(y1), int(x2), int(y2), int(id)
                #print(result)

                # Display current objects IDs
                w, h = x2 - x1, y2 - y1
                cx, cy = x1 + w // 2, y1 + h // 2
                id_txt = f"ID: {str(id)}"
                cv2.putText(frame, id_txt, (cx, cy), 4, 0.5, (0, 0, 255), 1)

                # if we haven't seen aprticular object ID before, register it in a list 
                if totalCount.count(id) == 0:
                    totalCount.append(id)

            # Display detection results
            if display == 'default':
                frame = self.default_display(**display_args)
            
            elif display == 'custom':
                frame == self.custom_display(**display_args)

            # Display FPS on frame
            frame = cv2.putText(frame,f"FPS : {fps:,.2f}" , 
                                (5,55), cv2.FONT_HERSHEY_COMPLEX, 
                            0.5,  (0,255,255), 1, cv2.LINE_AA)
            
            # Display Counting results
            count_txt = f"TOTAL COUNT : {len(totalCount)}"
            frame = cv2.putText(frame, count_txt, (5,45), cv2.FONT_HERSHEY_COMPLEX, 2, (0, 0, 255), 2)
        

            # append frame to the video file
            out.write(frame)
            
            # the 'q' button is set as the
            # quitting button you may use any
            # desired button of your choice

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # After the loop release the cap 
        cap.release()
        out.release()
        print(len(totalCount))
        print(totalCount)
    


### preparing file paths and directories

In [9]:
import os
vid_results_path = r'C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\results_video'
test_vids_path = r'C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\test vids'


if not os.path.isdir(vid_results_path):
    os.makedirs(vid_results_path)

### Instanciating YOLOv8_ObjectCounter objects 

In [10]:
import random
yolo_names = ['yolov8n.pt', 'yolov8m.pt', 'yolov8s.pt',  'yolov8l.pt']
colors = []
for _ in range(80):
    rand_tuple = (random.randint(50, 255), random.randint(50, 255), random.randint(50, 255))
    colors.append(rand_tuple)


In [11]:
from ultralytics import YOLO

counters = []
for yolo_name in yolo_names:
    counter = YOLOv8_ObjectCounter(yolo_name, conf = 0.60 )
    counters.append(counter)

### Performing object detection, tracking and counting 

In [12]:
import cv2
for counter in counters:
    counter.predict_video(video_path= r'C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main/test vids/traffic 2.mp4'
, save_dir = vid_results_path, save_format = "avi", display = 'custom', colors = colors)

----------------------------
DETECTING OBJECTS IN : traffic 2.mp4 : 
RESOLUTION : 1280x720
SAVING TO :C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\results_video\yolov8n -- traffic 2.avi
Error reading frame
26
[8, 7, 6, 5, 4, 3, 2, 1, 9, 10, 12, 11, 13, 15, 17, 18, 19, 21, 22, 26, 27, 25, 29, 30, 32, 33]
----------------------------
DETECTING OBJECTS IN : traffic 2.mp4 : 
RESOLUTION : 1280x720
SAVING TO :C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\results_video\yolov8m -- traffic 2.avi
Error reading frame
25
[43, 42, 41, 40, 39, 38, 44, 45, 46, 47, 48, 49, 55, 56, 58, 61, 60, 62, 57, 66, 69, 64, 70, 73, 74]
----------------------------
DETECTING OBJECTS IN : traffic 2.mp4 : 
RESOLUTION : 1280x720
SAVING TO :C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\results_video\yolov8s -- traffic 2.avi
Error reading frame
20
[85, 84, 83, 82, 81, 80, 86, 89, 87, 90, 96, 99, 103, 105, 108, 112, 111, 113, 118, 119]
------

## Preparing results for download 

In [18]:
import zipfile
import os

# Define the directory you want to zip
directory_to_zip = r"C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\results_video"

# Define the name of the zip file to create
zip_file_name = "vid_results.zip"

# Initialize ZipFile object
with zipfile.ZipFile(zip_file_name, 'w') as zipf:
    # Walk through each file in the directory
    for foldername, subfolders, filenames in os.walk(directory_to_zip):
        for filename in filenames:
            # Create complete filepath of file in directory
            file_path = os.path.join(foldername, filename)
            # Add file to zip
            zipf.write(file_path, os.path.relpath(file_path, directory_to_zip))

print(f"Zip file '{zip_file_name}' created successfully.")

Zip file 'vid_results.zip' created successfully.


In [8]:
import torch
import torchvision.models as models
import torch.nn as nn
import cv2
import os

# Function to load YOLOv8 model
def load_yolov8_model(model_path):
    # Load the model checkpoint
    checkpoint = torch.load(model_path)
    
    # Print out the keys in the checkpoint
    print("Keys in the checkpoint:")
    for key in checkpoint.keys():
        print(key)
    
    # Adjust the loading logic based on the actual keys in your checkpoint
    model = checkpoint['model']  # Assuming 'model' contains the model itself
    
    # Example adjustment: Replace the final fully connected layer for your specific output
    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 2)  # Replace with your output layer
    
    # Set model to evaluation mode
    model.eval()
    
    return model

# Function to perform inference on a video
def perform_inference(video_path, model):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error opening video file {video_path}")
        return
    
    frame_count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        
        # Preprocess frame (if needed) and convert to tensor
        # Example: convert BGR to RGB and normalize
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        rgb_frame = rgb_frame / 255.0  # Normalize to [0, 1]
        input_tensor = torch.from_numpy(rgb_frame.transpose((2, 0, 1))).unsqueeze(0).float()
        
        # Perform object detection using your model
        with torch.no_grad():
            outputs = model(input_tensor)
        
        # Example placeholder: process outputs and visualize detections
        # Replace with your actual object detection and visualization code
        # Example: draw bounding boxes on frame
        # for box in detected_boxes:
        #     cv2.rectangle(frame, box, color=(0, 255, 0), thickness=2)
        
        # Display or save processed frame (as needed)
        cv2.imshow('Inference Output', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
        
        frame_count += 1
    
    cap.release()
    cv2.destroyAllWindows()

# Example of evaluating on a test dataset
def evaluate_on_test_dataset(model, test_videos_dir):
    for video_file in os.listdir(test_videos_dir):
        video_path = os.path.join(test_videos_dir, video_file)
        print(f"Evaluating video: {video_path}")
        perform_inference(video_path, model)

# Load YOLOv8 model from .pt file
model_path = r'C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\yolov8n.pt'
yolov8_model = load_yolov8_model(model_path)

# Directory containing test videos
test_videos_dir = r'C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\test vids'

# Evaluate on the test dataset
evaluate_on_test_dataset(yolov8_model, test_videos_dir)

# Calculate evaluation metrics (example placeholder)
precision = 0.75
recall = 0.80
f1_score = 0.77

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1_score:.2f}")


Keys in the checkpoint:
epoch
best_fitness
model
ema
updates
optimizer
train_args
date
version


AttributeError: 'DetectionModel' object has no attribute 'fc'

In [15]:
#feature fusion
import cv2
import numpy as np
import tensorflow as tf

class YOLOv8:
    def __init__(self, config_path, weights_path, class_names_path):
        self.net = self.load_model(config_path, weights_path)
        self.classes = self.load_classes(class_names_path)
        self.colors = np.random.uniform(0, 255, size=(len(self.classes), 3))

    def load_model(self, config_path, weights_path):
        # Load YOLO model
        net = cv2.dnn.readNetFromDarknet(config_path, weights_path)
        return net

    def load_classes(self, class_names_path):
        # Load class names
        with open(class_names_path, 'r') as f:
            classes = f.read().strip().split('\n')
        return classes

    def detect_objects(self, image):
        # Preprocess the image
        blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
        self.net.setInput(blob)

        # Get the output layer names
        ln = self.net.getLayerNames()
        ln = [ln[i - 1] for i in self.net.getUnconnectedOutLayers()]

        # Run the forward pass
        layer_outputs = self.net.forward(ln)

        # Feature fusion logic
        fused_features = self.feature_fusion(layer_outputs)

        boxes = []
        confidences = []
        class_ids = []

        for output in fused_features:
            for detection in output:
                scores = detection[5:]
                class_id = np.argmax(scores)
                confidence = scores[class_id]

                if confidence > 0.5:
                    box = detection[0:4] * np.array([image.shape[1], image.shape[0], image.shape[1], image.shape[0]])
                    (centerX, centerY, width, height) = box.astype("int")

                    x = int(centerX - (width / 2))
                    y = int(centerY - (height / 2))

                    boxes.append([x, y, int(width), int(height)])
                    confidences.append(float(confidence))
                    class_ids.append(class_id)

        idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

        result = []
        if len(idxs) > 0:
            for i in idxs.flatten():
                result.append((boxes[i], confidences[i], class_ids[i]))

        return result

    def feature_fusion(self, layer_outputs):
        # Example feature fusion logic: simple concatenation
        fused_features = []
        for output in layer_outputs:
            fused_features.append(output)
        return fused_features





In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(FPN, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels

        # Lateral layers
        self.lateral3 = nn.Conv2d(in_channels[0], out_channels, kernel_size=1, stride=1, padding=0)
        self.lateral4 = nn.Conv2d(in_channels[1], out_channels, kernel_size=1, stride=1, padding=0)
        self.lateral5 = nn.Conv2d(in_channels[2], out_channels, kernel_size=1, stride=1, padding=0)

        # Smooth layers
        self.smooth3 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.smooth4 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.smooth5 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)

    def forward(self, c3, c4, c5):
        # Lateral connections
        p5 = self.lateral5(c5)
        p4 = self.lateral4(c4) + F.interpolate(p5, scale_factor=2, mode='nearest')
        p3 = self.lateral3(c3) + F.interpolate(p4, scale_factor=2, mode='nearest')

        # Smooth layers
        p3 = self.smooth3(p3)
        p4 = self.smooth4(p4)
        p5 = self.smooth5(p5)

        return p3, p4, p5


In [None]:
# Replace this with the original YOLOv8 model loading
# self.model = YOLO(model_file)
self.model = ModifiedYOLOv8(YOLO(model_file))


In [20]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import cv2
import numpy as np
import os
import time
from ultralytics import YOLO

# Define the FPN Module
class FPN(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(FPN, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels

        # Lateral layers
        self.lateral3 = nn.Conv2d(in_channels[0], out_channels, kernel_size=1, stride=1, padding=0)
        self.lateral4 = nn.Conv2d(in_channels[1], out_channels, kernel_size=1, stride=1, padding=0)
        self.lateral5 = nn.Conv2d(in_channels[2], out_channels, kernel_size=1, stride=1, padding=0)

        # Smooth layers
        self.smooth3 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.smooth4 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.smooth5 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)

    def forward(self, c3, c4, c5):
        # Lateral connections
        p5 = self.lateral5(c5)
        p4 = self.lateral4(c4) + F.interpolate(p5, scale_factor=2, mode='nearest')
        p3 = self.lateral3(c3) + F.interpolate(p4, scale_factor=2, mode='nearest')

        # Smooth layers
        p3 = self.smooth3(p3)
        p4 = self.smooth4(p4)
        p5 = self.smooth5(p5)

        return p3, p4, p5

# Integrate FPN with YOLOv8
class ModifiedYOLOv8(nn.Module):
    def __init__(self, original_yolo_model):
        super(ModifiedYOLOv8, self).__init__()
        self.original_yolo_model = original_yolo_model

        # Define the FPN
        self.fpn = FPN(in_channels=[256, 512, 1024], out_channels=256)  # Adjust in_channels as needed

    def forward(self, x):
        # Extract features using the original YOLOv8 backbone
        features = self.original_yolo_model.backbone(x)
        c3, c4, c5 = features[-3], features[-2], features[-1]

        # Apply FPN
        p3, p4, p5 = self.fpn(c3, c4, c5)

        # Continue with the original YOLOv8 head (detection part)
        detections = self.original_yolo_model.head([p3, p4, p5])

        return detections

# Load the original YOLOv8 model
original_yolo_model =  r'C:\Users\vinil\Downloads\Object-tracking-and-counting-using-YOLOV8-main\ylovm.pt'

# Create the modified YOLOv8 model with FPN
modified_yolo_model = ModifiedYOLOv8(original_yolo_model)

class YOLOv8_ObjectDetector:
    """
    A class for performing object detection on images and videos using YOLOv8.

    Args:
    ------------
        model_file (str): Path to the YOLOv8 model file or yolo model variant name in the format: [variant].pt
        labels (list[str], optional): A list of class labels for the model. If None, uses the default labels from the model file.
        classes (list[str], optional): Alias for labels. Deprecated.
        conf (float, optional): Minimum confidence threshold for object detection.
        iou (float, optional): Minimum IOU threshold for object detection.
    """
    def __init__(self, model_file='yolov8n.pt', labels=None, classes=None, conf=0.25, iou=0.45):
        self.model_file = model_file
        self.labels = labels
        self.classes = classes
        self.conf = conf
        self.iou = iou

        # Load the original YOLOv8 model
        original_yolo_model = YOLO(model_file)

        # Create the modified YOLOv8 model with FPN
        self.model = ModifiedYOLOv8(original_yolo_model)
        self.model_name = model_file.split('.')[0]
        self.results = None

        if labels is None:
            self.labels = self.model.original_yolo_model.names

    def predict_img(self, img, verbose=True):
        """
        Runs object detection on a single image.

        Parameters
        ----------
            img (numpy.ndarray): Input image to perform object detection on.
            verbose (bool): Whether to print detection details.

        Returns:
        -----------
            'ultralytics.yolo.engine.results.Results': A YOLO results object that contains 
             details about detection results :
                    - Class IDs
                    - Bounding Boxes
                    - Confidence score
                    ...
        (pls refer to https://docs.ultralytics.com/reference/results/#results-api-reference for results API reference)

        """

        # Run the model on the input image with the given parameters
        results = self.model(img, classes=self.classes, conf=self.conf, iou=self.iou, verbose=verbose)

        # Save the original image and the results for further analysis if needed
        self.orig_img = img
        self.results = results[0]

        # Return the detection results
        return results[0]

    def default_display(self, show_conf=True, line_width=None, font_size=None, 
                        font='Arial.ttf', pil=False, example='abc'):
        """
        Displays the detected objects on the original input image.

        Parameters
        ----------
        show_conf : bool, optional
            Whether to show the confidence score of each detected object, by default True.
        line_width : int, optional
            The thickness of the bounding box line in pixels, by default None.
        font_size : int, optional
            The font size of the text label for each detected object, by default None.
        font : str, optional
            The font type of the text label for each detected object, by default 'Arial.ttf'.
        pil : bool, optional
            Whether to return a PIL Image object, by default False.
        example : str, optional
            A string to display on the example bounding box, by default 'abc'.

        Returns
        -------
        numpy.ndarray or PIL Image
            The original input image with the detected objects displayed as bounding boxes.
            If `pil=True`, a PIL Image object is returned instead.

        Raises
        ------
        ValueError
            If the input image has not been detected by calling the `predict_img()` method first.
        """
        # Check if the `predict_img()` method has been called before displaying the detected objects
        if self.results is None:
            
            raise ValueError('No detected objects to display. Call predict_img() method first.')
        
        # Call the plot() method of the `self.results` object to display the detected objects on the original image
        display_img = self.results.plot(show_conf, line_width, font_size, font, pil, example)
        
        # Return the displayed image
        return display_img

    def custom_display(self, colors, show_cls=True, show_conf=True):
        """
        Custom display method that draws bounding boxes and labels on the original image, 
        with additional options for showing class and confidence information.

        Parameters:
        -----------
        colors : list
            A list of tuples specifying the color of each class.
        show_cls : bool, optional
            Whether to show class information in the label text. Default is True.
        show_conf : bool, optional
            Whether to show confidence information in the label text. Default is True.

        Returns:
        --------
        numpy.ndarray
            The image with bounding boxes and labels drawn on it.
        """

        img = self.orig_img
        # calculate the bounding box thickness based on the image width and height
        bbx_thickness = (img.shape[0] + img.shape[1]) // 450

        for box in self.results.boxes:
            textString = ""

            # Extract object class and confidence score
            score = box.conf.item() * 100
            class_id = int(box.cls.item())

            x1, y1, x2, y2 = np.squeeze(box.xyxy.numpy()).astype(int)

            # Print detection info
            if show_cls:
                textString += f"{self.labels[class_id]}"

            if show_conf:
                textString += f" {score:,.2f}%"

            # Calculate font scale based on object size
            font = cv2.FONT_HERSHEY_COMPLEX
            fontScale = (((x2 - x1) / img.shape[0]) + ((y2 - y1) / img.shape[1])) / 2 * 2.5
            fontThickness = 1
            textSize, baseline = cv2.getTextSize(textString, font, fontScale, fontThickness)

            # Draw bounding box, a centroid and label on the image
            img = cv2.rectangle(img, (x1, y1), (x2, y2), colors[class_id], bbx_thickness)
            center_coordinates = ((x1 + x2) // 2, (y1 + y2) // 2)
            img = cv2.circle(img, center_coordinates, 5, (0, 0, 255), -1)

            # If there are no details to show on the image
            if textString != "":
                if y1 < textSize[1]:
                    y1 = y1 + textSize[1]
                else:
                    y1 -= 2
                # show the details text in a filled rectangle
                img = cv2.rectangle(img, (x1, y1), (x1 + textSize[0], y1 - textSize[1]), colors[class_id], cv2.FILLED)
                img = cv2.putText(img, textString,
                                  (x1, y1), font,
                                  fontScale, (0, 0, 0), fontThickness, cv2.LINE_AA)

        return img

    def predict_video(self, video_path, save_dir, save_format="avi", display='custom', verbose=True, **display_args):
        """
        Runs object detection on each frame of a video and saves the output to a new video file.

        Args:
        ----------
            video_path (str): The path to the input video file.
            save_dir (str): The path to the directory where the output video file will be saved.
            save_format (str, optional): The format for the output video file. Defaults to "avi".
            display (str, optional): The type of display for the detection results. Defaults to 'custom'.
            verbose (bool, optional): Whether to print information about the video file and output file. Defaults to True.
            **display_args: Additional arguments to be passed to the display function.

        Returns:
        ------------
            None
        """
        # Open the input video file
        cap = cv2.VideoCapture(video_path)

        # Get the name of the input video file
        vid_name = os.path.basename(video_path)

        # Get the dimensions of each frame in the input video file
        width = int(cap.get(3))  # get `width`
        height = int(cap.get(4))  # get `height`

        # Create the directory for the output video file if it does not already exist
        if not os.path.isdir(save_dir):
            os.makedirs(save_dir)

        # Set the name and path for the output video file
        save_name = self.model_name + ' -- ' + vid_name.split('.')[0] + '.' + save_format
        save_file = os.path.join(save_dir, save_name)

        # Print information about the input and output video files if verbose is True
        if verbose:
            print("----------------------------")
            print(f"DETECTING OBJECTS IN: {vid_name}:")
            print(f"RESOLUTION: {width}x{height}")
            print('SAVING TO:' + save_file)

        # Define an output VideoWriter object
        out = cv2.VideoWriter(save_file,
                              cv2.VideoWriter_fourcc(*"MJPG"),
                              30, (width, height))

        # Check if the input video file was opened correctly
        if not cap.isOpened():
            print("Error opening video stream or file")

        # Read each frame of the input video file
        while cap.isOpened():
            ret, frame = cap.read()

            # If the frame was not read successfully, break the loop
            if not ret:
                print("Error reading frame")
                break

            # Run object detection on the frame and calculate FPS
            beg = time.time()
            results = self.predict_img(frame, verbose=False)
            if results is None:
                print('***********************************************')
            fps = 1 / (time.time() - beg)

            # Display the detection results
            if display == 'default':
                frame = self.default_display(**display_args)
            elif display == 'custom':
                frame = self.custom_display(**display_args)

            # Display the FPS on the frame
            frame = cv2.putText(frame, f"FPS: {fps:,.2f}",
                                (5, 15), cv2.FONT_HERSHEY_COMPLEX,
                                0.5, (0, 0, 255), 1, cv2.LINE_AA)

            # Write the frame to the output video file
            out.write(frame)

            # Exit the loop if the 'q' button is pressed
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # After the loop release the cap and video writer
        cap.release()
        out.release()
