# YOLO for Object Tracking
## Tracking Traffic Signs and Traffic Lights
The main goal of this project is to create a model capable of **detecting** and also **tracking** traffic signs and traffic lights.

### Dependencies
Before starting with this notebook, we need to install the correct dependencies for this project: the `roboflow` library, which we use to download our datasets, and `ultralytics`, which we use to import and run YOLO.

### YOLO
The YOLO network is highly efficient for object detection tasks. However, YOLO detects objects in each frame independently, so we also need an additional algorithm that associates detections across frames to turn object detection into object tracking.

In [None]:
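!pip install roboflow
!pip install ultralytics
# kagglehub is used below to download the test videos (it may already be installed on Colab)
!pip install kagglehub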

### Dataset
First, we download the two datasets used to train the YOLOv11 models. We also download a set of videos from a separate Kaggle dataset, which we use to test whether the trained models work correctly.

In [None]:
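import os
import kagglehub
from roboflow import Roboflow

traffic_lights_video_path = kagglehub.dataset_download('matteoiorio/traffic-lights-video')

# Read the Roboflow API key from an environment variable instead of hard-coding it in the notebook
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])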
project = rf.workspace("ithb-5ka4m").project("lisa-traffic-light-detection-8vuch")
version = project.version(3)
traffic_lights = version.download("yolov11")

project = rf.workspace("usmanchaudhry622-gmail-com").project("traffic-and-road-signs")
version = project.version(1)
traffic_signs = version.download("yolov11")


In [None]:
print(traffic_lights.location)
print(traffic_signs.location)

#### Colab Utility
This utility function keeps Colab from disconnecting while the YOLO networks are training. It clicks the Connect button every 60,000 milliseconds (once a minute), so the session is not treated as idle and disconnected.

In [4]:
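%%javascript
// Ensure Colab doesn't disconnect: click the Connect button every 60,000 ms.
// The %%javascript cell magic must be the first line of the cell.
function ClickConnect() {
  console.log("Working");
  document.querySelector("colab-toolbar-button#connect").click();
}
setInterval(ClickConnect, 60000);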


### Imports
Next, we import all the libraries used to train the networks and to run the trained networks for tracking traffic signs and traffic lights.

In [None]:
# Import Essential Libraries
import os
import random
import cv2
from ultralytics import YOLO
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import pathlib
from IPython.display import HTML
from base64 import b64encode
import warnings
warnings.filterwarnings('ignore')

## Strategy
The main problem we faced in this project is that there was no good dataset containing information about both traffic lights and traffic signs. Deep neural networks need datasets with many images to reach high accuracy, so our approach is to use two different YOLO models: one trained only on traffic signs and one trained only on traffic lights. This way each model is specialized for its own task, and by combining them we can accomplish our final task. Both datasets use the YOLO label format, which makes training much smoother.

Ideally, we would have a single model capable of both tasks. That would require a merged dataset, i.e. the union of the two datasets with a single combined set of classes, and training on it would take longer.
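A single dataset covering both tasks would mostly require giving the two label sets disjoint class IDs. The sketch below is not part of this notebook's pipeline; it assumes the standard YOLO label format (one `class_id x_center y_center width height` line per object, with coordinates normalized to [0, 1]) and shifts every class ID of the second dataset by the number of classes in the first. The helpers `merge_class_names`, `remap_label_line` and `remap_label_file_text` are hypothetical names, and the sign class names are placeholders.

```python
def merge_class_names(names_a: list, names_b: list) -> list:
    """Concatenate two class-name lists; IDs from names_b are shifted by len(names_a)."""
    return list(names_a) + list(names_b)


def remap_label_line(line: str, offset: int) -> str:
    """
    Shift the class ID of one YOLO label line by `offset`.
    Only the first field changes; the normalized box coordinates are kept as-is.
    """
    parts = line.split()
    if not parts:
        return line  # keep blank lines untouched
    parts[0] = str(int(parts[0]) + offset)
    return " ".join(parts)


def remap_label_file_text(text: str, offset: int) -> str:
    """Apply remap_label_line to every line of a label file's contents."""
    return "\n".join(remap_label_line(l, offset) for l in text.splitlines())


# The 29 sign classes come first, so traffic-light IDs start at 29
sign_names = [f"sign_{i}" for i in range(29)]  # placeholder names for the 29 sign classes
light_names = ["go", "stop", "warning"]
merged = merge_class_names(sign_names, light_names)
offset = len(sign_names)

print(merged[offset + 1])                              # stop
print(remap_label_line("1 0.5 0.4 0.1 0.2", offset))  # 30 0.5 0.4 0.1 0.2
```

The traffic-light label files would then be rewritten with `offset = 29`, and the merged `names` list would go into the combined `data.yaml`.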

### Traffic Signs Dataset
The traffic signs dataset, available at https://universe.roboflow.com/usmanchaudhry622-gmail-com/traffic-and-road-signs, has 29 classes and is a subset of the GTSRB (German Traffic Sign Recognition Benchmark). Our first idea was to use the entire GTSRB dataset, which has more than 50,000 images, but because of its size and because we did not have the resources to train a model on it, we switched to this subset. The subset has exactly 10,000 images of traffic signs; even at this size, training the model takes a considerable amount of time.

#### Traffic Signs Classes
This dataset contains 29 different classes, which are:
- Road narrows on right
- 50 KMPh speed limit
- Attention Please
- Beware of children
- CYCLE ROUTE AHEAD WARNING
- Dangerous Left Curve Ahead
- Dangerous Right Curve Ahead
- End of all speed and passing limits
- Give Way
- Go Straight or Turn Right
- Go straight or turn left
- Keep-Left
- Keep-Right
- Left Zig Zag Traffic
- No Entry
- No Over Taking
- Overtaking by trucks is prohibited
- Pedestrian Crossing
- Round-About
- Slippery Road Ahead
- Speed Limit 20 KMPh
- Speed Limit 30 KMPh
- Stop Sign
- Straight Ahead Only
- Traffic signal
- Truck traffic is prohibited
- Turn left ahead
- Turn right ahead
- Uneven Road

---
### Traffic Lights Dataset
The second dataset we used is the LISA Traffic Light dataset, which has more than 9,000 images of traffic lights. This dataset is also a subset of the original LISA Traffic Light Dataset (https://www.kaggle.com/datasets/mbornoe/lisa-traffic-light-dataset), because the original one contains more than 100K annotated traffic lights. The subset can be accessed at https://universe.roboflow.com/ithb-5ka4m/lisa-traffic-light-detection-8vuch.


#### Traffic Lights Classes
1. go
2. stop
3. warning

In [6]:
def show_train_images(directory):
    """
    Plots up to 9 random images from a YOLO-format split (a directory containing
    `images/` and `labels/` subfolders), overlaying the ground-truth boxes and class IDs.
    """
    num_samples = 9
    images_path = os.path.join(directory, "images")
    labels_path = os.path.join(directory, "labels")

    image_files = os.listdir(images_path)
    labels_files = {os.path.splitext(file)[0]: file for file in os.listdir(labels_path)}

    # Randomly select up to num_samples images (fewer if the split is smaller)
    rand_images = random.sample(image_files, min(num_samples, len(image_files)))

    fig, axes = plt.subplots(3, 3, figsize=(11, 11))

    for i, image in enumerate(rand_images):
        image_name = os.path.splitext(image)[0]
        image_path = os.path.join(images_path, image)
        label_file = labels_files.get(image_name, None)

        # Load image
        img = plt.imread(image_path)

        ax = axes[i // 3, i % 3]
        ax.imshow(img)
        ax.axis('off')

        # Overlay labels if they exist
        if label_file:
            label_path = os.path.join(labels_path, label_file)
            with open(label_path, 'r') as f:
                for line in f:
                    parts = line.strip().split()
                    if len(parts) != 5:
                        continue  # skip blank lines and non-box (e.g. polygon) labels
                    class_id = int(parts[0])
                    x_center, y_center, width, height = map(float, parts[1:])

                    # Convert YOLO format to rectangle coordinates
                    img_h, img_w = img.shape[:2]
                    x1 = int((x_center - width / 2) * img_w)
                    y1 = int((y_center - height / 2) * img_h)
                    x2 = int((x_center + width / 2) * img_w)
                    y2 = int((y_center + height / 2) * img_h)

                    # Draw rectangle and label
                    rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, edgecolor='red', facecolor='none')
                    ax.add_patch(rect)
                    ax.text(x1, y1 - 10, f"Class: {class_id}", color='red', fontsize=8, backgroundcolor='white')

    plt.tight_layout()
    plt.show()


### Training Images
Below are some sample images from the training split of each dataset, with their ground-truth boxes. The two datasets have completely different classes, so each model will specialize in a different task: the first dataset contains only road sign images, while the second contains only traffic light images.

In [None]:
traffic_signs_images_dir = os.path.join(traffic_signs.location, "train")
traffic_lights_images_dir = os.path.join(traffic_lights.location, "train")
show_train_images(traffic_signs_images_dir)

In [None]:
show_train_images(traffic_lights_images_dir)

### YOLOv11 import and training
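We trained the YOLOv11 model instead of the YOLOv8 model because we needed a small model capable of performing this task with higher accuracy; we use the nano variant, `yolo11n.pt`. Both models are trained for 30 epochs with `optimizer='auto'`, which lets Ultralytics choose the optimizer (it is not necessarily Adam), and with `batch=-1`, which enables Ultralytics' AutoBatch to pick the batch size based on the available GPU memory.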
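With `optimizer='auto'`, Ultralytics picks the optimizer itself. As far as we understand recent versions of its trainer, it estimates the number of optimizer steps and uses SGD for long runs and AdamW for short ones. The function below is a hypothetical paraphrase of that heuristic, not the library's code: the 10,000-iteration threshold, the nominal batch size `nbs=64` and the learning-rate formula are assumptions to verify against the installed `ultralytics` version, and the image count in the example is illustrative rather than the real split size.

```python
import math


def auto_optimizer_choice(num_train_images: int, epochs: int, batch: int,
                          num_classes: int, nbs: int = 64):
    """
    Hypothetical paraphrase of the optimizer='auto' heuristic.
    Returns (optimizer_name, initial_learning_rate).
    """
    # Estimated optimizer steps: gradients are assumed to be accumulated up to nbs images
    iterations = math.ceil(num_train_images / max(batch, nbs)) * epochs
    if iterations > 10_000:
        return "SGD", 0.01
    # Short runs: AdamW with a learning rate that shrinks as the number of classes grows
    lr_fit = round(0.002 * 5 / (4 + num_classes), 6)
    return "AdamW", lr_fit


# Illustrative numbers only: 7,000 training images, 30 epochs, batch 16, 29 sign classes
print(auto_optimizer_choice(7000, epochs=30, batch=16, num_classes=29))
```

Under these assumptions a 30-epoch run on a dataset of this order of size stays well below the threshold, so the trainer would use AdamW; passing an explicit `optimizer` argument removes the ambiguity.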

#### Traffic Signs Model
The first model to be trained is the traffic sign model, which uses the traffic signs dataset. Because this dataset is also large, training takes a long time.

In [None]:
traffic_sings_model = YOLO('yolo11n.pt')
output_data_path = os.path.join(traffic_signs.location, "data.yaml")
results = traffic_sings_model.train(data=output_data_path, epochs = 30, batch = -1, optimizer = 'auto')

#### Traffic Light Model
The second model to be trained is the traffic light model, which uses the LISA dataset. We use the same settings as before: 30 epochs, `optimizer='auto'`, and an automatically chosen batch size (`batch=-1`).

In [None]:
traffic_lights_model = YOLO('yolo11n.pt')
output_data_path = os.path.join(traffic_lights.location, "data.yaml")
results = traffic_lights_model.train(data=output_data_path, epochs = 30, batch = -1, optimizer = 'auto')

#### Combination of the Models

Now that both models are trained, we need to combine them to process a single frame. This allows us to call both models for predictions on a given frame and return any detections to the caller.

---
#### Detectors Class
To achieve this, we created a class called `Detectors`. Its constructor takes a list of models; in this example, we pass the traffic sign model and the traffic light model. The class exposes a `detect` method that takes a single frame as input. Internally, `detect` calls the private `__single_detect` method once per model, and each model identifies objects belonging to its own trained classes, keeping only predictions with a confidence above 0.5.

The combined detections are returned as a list in which each entry holds the bounding box coordinates (x1, y1, x2, y2) and the class name.
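Each row of `results[0].boxes.data` from an Ultralytics detection model holds `x1, y1, x2, y2, confidence, class_id`, with the class ID stored as a float. The sketch below performs a conversion similar to the one in `__single_detect`, but on hand-made NumPy arrays instead of real model output, so it runs without weights. `rows_to_detections` is a hypothetical helper, the dictionaries stand in for `model.names`, and the class ID used for `'Stop Sign'` is chosen for illustration only.

```python
import numpy as np


def rows_to_detections(rows: np.ndarray, names: dict, min_conf: float = 0.5) -> list:
    """
    Convert an (N, 6) array of [x1, y1, x2, y2, conf, cls] rows into
    [x1, y1, x2, y2, class_name] lists, keeping rows with conf >= min_conf.
    """
    detections = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < min_conf:
            continue
        # Class IDs are floats in the raw array; cast before the dictionary lookup
        detections.append([float(x1), float(y1), float(x2), float(y2), names[int(cls)]])
    return detections


# Fake output of the two models on the same frame
sign_rows = np.array([[100, 50, 200, 150, 0.91, 22]], dtype=np.float32)
light_rows = np.array([[300, 100, 400, 200, 0.77, 1],
                       [10, 10, 20, 30, 0.30, 0]], dtype=np.float32)
sign_names = {22: "Stop Sign"}                    # illustrative subset of model.names
light_names = {0: "go", 1: "stop", 2: "warning"}

# Concatenating per-model lists is what combining the two detectors amounts to
combined = rows_to_detections(sign_rows, sign_names) + rows_to_detections(light_rows, light_names)
print(combined)
# [[100.0, 50.0, 200.0, 150.0, 'Stop Sign'], [300.0, 100.0, 400.0, 200.0, 'stop']]
```

The low-confidence `go` row is dropped, and the two models' results end up in one flat list that the tracker can consume.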

---

### Example Usage

```python
detector = Detectors([signs_model, lights_model])
detections = detector.detect(frame)

print(detections)
# e.g. [[100, 50, 200, 150, 'Stop Sign'], [300, 100, 400, 200, 'stop']]
```

In [28]:
class Detectors:
    """
    A class that combines multiple detection models (e.g., traffic signs and lights)
    to process a single frame and provide a consolidated list of detections.

    Attributes:
        models: All the models that will be used for executing the detection on the input frames.
    """

    def __init__(self, models: list):
        """
        Initializes the Detectors class with the specified models.

        Args:
            models: List of all the models used for the detection task.
        """
        self.models = list(models)

    def __single_detect(self, model, frame):
        """
        Uses the specified model to detect objects in the provided frame.

        Args:
            model: The detection model to use for predictions.
            frame: The image frame to analyze.

        Returns:
            A list of detected objects. Each object is represented as a list
            containing the bounding box coordinates (x1, y1, x2, y2) and
            the class name.
        """
        bbox_list = []
        objs = model.predict(frame, verbose=False, conf=0.5)[0].boxes.data.cpu().numpy()
        for det in objs:
            x1, y1, x2, y2, conf, cls = det
            # Class IDs are stored as floats; cast before looking up the class name
            bbox_list.append([x1, y1, x2, y2, model.names[int(cls)]])
        return bbox_list

    def detect(self, frame):
        """
        Detects objects in the given frame using all the models stored inside this class.

        Args:
            frame: The image frame to analyze.

        Returns:
            A combined list of detections from all the models. Each detection
            includes the bounding box coordinates and the class name.
        """
        detections = []
        for model in self.models:
            detections.extend(self.__single_detect(model, frame))
        return detections

detector = Detectors([traffic_sings_model, traffic_lights_model])

### Error handling
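On Colab, shell commands (`!` cells) can fail after some libraries are loaded, with an error such as `NotImplementedError: A UTF-8 locale is required`. A common workaround is to override `locale.getpreferredencoding` so that it always reports UTF-8.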

In [12]:
import locale

def getpreferredencoding(do_setlocale: bool = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

### Object Tracking Phase
#### Intersection over Union
Intersection over Union (IoU) is a metric used to evaluate the accuracy of object detection and tracking systems. It measures the overlap between two bounding boxes:

1. The predicted bounding box (what the model detects).
2. The ground truth bounding box (the actual or correct bounding box).

Mathematically, it is defined as:
$$ IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}} $$
Where:

1. **Area of Overlap** is the region where the predicted and ground truth boxes intersect.
2. **Area of Union** is the total area covered by both bounding boxes combined, subtracting any overlap to avoid double-counting.
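Because the union counts the overlapping region only once, it equals the sum of the two box areas minus the overlap. As a worked example with two illustrative boxes $A = (0, 0, 10, 10)$ and $B = (5, 5, 15, 15)$ in $(x_1, y_1, x_2, y_2)$ form:

$$
\begin{aligned}
\text{Area of Overlap} &= (10 - 5)\,(10 - 5) = 25 \\
\text{Area of Union} &= \text{Area}(A) + \text{Area}(B) - \text{Area of Overlap} = 100 + 100 - 25 = 175 \\
IoU &= \frac{25}{175} = \frac{1}{7} \approx 0.143
\end{aligned}
$$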

IoU serves as a quantitative measure of how well the predicted bounding box aligns with the ground truth. It ranges between 0 and 1:

1. IoU = 0 means no overlap between the boxes.
2. IoU = 1 means perfect overlap.

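When tracking an object across frames, we use IoU slightly differently: instead of a prediction and a ground truth, we compare a box from the previous frame with a box from the current frame. A high IoU suggests that the object detected in the current frame is the same object detected in the previous frame.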


In [13]:
def compute_iou(bb1: list, bb2: list) -> float:
  """
  Computes the Intersection over Union of two boxes given as [x1, y1, x2, y2, ...].
  Only the first four values are used; a trailing class name is ignored.
  """
  bb1_x1, bb1_y1, bb1_x2, bb1_y2 = bb1[:4]
  bb2_x1, bb2_y1, bb2_x2, bb2_y2 = bb2[:4]

  # Corners of the intersection rectangle
  x_left = max(bb1_x1, bb2_x1)
  y_top = max(bb1_y1, bb2_y1)
  x_right = min(bb1_x2, bb2_x2)
  y_bottom = min(bb1_y2, bb2_y2)

  # The boxes do not overlap
  if x_right < x_left or y_bottom < y_top:
    return 0.0

  intersection_area = (x_right - x_left) * (y_bottom - y_top)

  bb1_area = (bb1_x2 - bb1_x1) * (bb1_y2 - bb1_y1)
  bb2_area = (bb2_x2 - bb2_x1) * (bb2_y2 - bb2_y1)
  union_area = bb1_area + bb2_area - intersection_area

  # Guard against degenerate (zero-area) boxes
  if union_area <= 0:
    return 0.0

  iou = intersection_area / float(union_area)
  assert 0.0 <= iou <= 1.0
  return iou
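`compute_iou` compares one pair of boxes at a time, so `IOUTracker.update` calls it once per (tracked object, detection) pair. When there are many boxes, the same quantity can be computed for all pairs at once with NumPy broadcasting. This is an optional alternative that the tracker below does not use; `pairwise_iou` is a hypothetical helper that takes numeric `(N, 4)` and `(M, 4)` arrays of `x1, y1, x2, y2` corners (class names stripped) and returns an `(N, M)` IoU matrix.

```python
import numpy as np


def pairwise_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """IoU between every box in boxes_a (N, 4) and every box in boxes_b (M, 4)."""
    a = np.asarray(boxes_a, dtype=np.float64)[:, None, :]  # shape (N, 1, 4)
    b = np.asarray(boxes_b, dtype=np.float64)[None, :, :]  # shape (1, M, 4)

    # Width and height of every pairwise intersection; negative values mean no overlap
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    intersection = inter_w * inter_h  # shape (N, M)

    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])  # shape (N, 1)
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])  # shape (1, M)
    union = area_a + area_b - intersection                      # shape (N, M)

    # IoU is defined as 0 wherever the union is 0 (degenerate boxes)
    return np.divide(intersection, union, out=np.zeros_like(intersection), where=union > 0)


tracks = np.array([[0, 0, 10, 10], [20, 20, 30, 30]])
dets = np.array([[5, 5, 15, 15], [0, 0, 10, 10]])
iou_matrix = pairwise_iou(tracks, dets)
print(np.round(iou_matrix, 3))  # first row: about 0.143 and 1.0; second row: zeros
```

Row `i` holds the IoU of tracked box `i` with every detection, so a matching step only needs to read this matrix.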

### Bounding Box tracker
The IOUTracker class is designed for simple object tracking in video or image sequences using Intersection-over-Union (IoU) as the basis for associating detected objects between frames. This tracker maintains a set of tracked objects and updates their positions based on new detections in each frame.

---

#### Initialization
1. `__init__(iou_threshold=0.5)`: Initializes the tracker with the following attributes:
  * `iou_threshold`: a float specifying the minimum IoU required to consider two bounding boxes the same object (default 0.5).
  * `tracked_objects`: a dictionary mapping object IDs to their bounding boxes.
  * `next_object_id`: an integer counter used to assign unique IDs to new objects.

#### Methods

1. `update(detections)`: Updates the set of tracked objects based on new detections.
  * **Parameters**: `detections`: A list of bounding boxes in the format [[x1, y1, x2, y2, class name], ...], where (x1, y1) is the top-left corner, and (x2, y2) is the bottom-right corner of the bounding box.

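#### Process
* For each existing tracked object, it computes the IoU with every detection that has not been assigned yet.
* It matches the tracked object to the detection with the highest IoU, provided that IoU is at least `iou_threshold`. Tracked objects are processed in order, and class names are not compared.
* Any unmatched detections are considered new objects and are assigned new unique IDs.
* Tracked objects with no matching detection are dropped immediately, so an object that is missed in a single frame gets a new ID when it reappears.

**Returns**: an updated dictionary of tracked objects (`object_id -> bounding_box`).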
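Because `IOUTracker.update` matches greedily in the order of the existing tracks, an earlier track can claim a detection that overlaps a later track even more, leaving that later track unmatched. A common variant, sketched below as an alternative rather than the method used in this notebook, sorts all (track, detection) pairs by IoU and accepts the highest-IoU pairs first. `match_detections` and `_iou` are hypothetical helpers; `tracks` uses the same `{id: [x1, y1, x2, y2, class_name]}` format as `tracked_objects`.

```python
def _iou(a, b):
    """IoU of two [x1, y1, x2, y2, ...] boxes (extra fields ignored)."""
    x_left, y_top = max(a[0], b[0]), max(a[1], b[1])
    x_right, y_bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(tracks: dict, detections: list, iou_threshold: float = 0.5):
    """
    Globally greedy IoU matching.
    Returns (matches, unmatched), where matches maps track ID -> detection index
    and unmatched lists the detection indices that should become new objects.
    """
    # Every (IoU, track ID, detection index) candidate, best IoU first
    pairs = sorted(
        ((_iou(box, det), obj_id, idx)
         for obj_id, box in tracks.items()
         for idx, det in enumerate(detections)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches, used_dets = {}, set()
    for iou, obj_id, idx in pairs:
        if iou < iou_threshold:
            break  # all remaining pairs have an even lower IoU
        if obj_id in matches or idx in used_dets:
            continue
        matches[obj_id] = idx
        used_dets.add(idx)
    unmatched = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched


tracks = {0: [0, 0, 10, 10, 'go'], 1: [3, 0, 13, 10, 'go']}
dets = [[2, 0, 12, 10, 'go'], [-3, 0, 7, 10, 'go']]
print(match_detections(tracks, dets))  # ({1: 0, 0: 1}, [])
```

Here track 1 gets detection 0 (IoU about 0.82) and track 0 falls back to detection 1 (about 0.54). The track-ordered loop would give detection 0 to track 0 (about 0.67) and leave track 1 unmatched. Class names are deliberately not compared, since a traffic light keeps its identity when it switches between `go`, `warning` and `stop`.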

#### Key Features

* Tracks objects over multiple frames by associating new detections to existing tracked objects using IoU.
* Dynamically assigns new IDs to previously unseen objects.
* Provides a straightforward and efficient tracking mechanism, suitable for scenarios with moderate object movement and consistent detections.
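Since `IOUTracker` forgets a track as soon as it has no matching detection, a single missed frame (for example a frame where the detection confidence drops below 0.5) gives the object a new ID. A simple extension, sketched here under the assumption that a few frames of memory are acceptable, keeps unmatched tracks for up to `max_missed` frames and only returns the objects seen in the current frame. `PatientIOUTracker`, `max_missed` and `_iou` are hypothetical names; the class is self-contained and reuses the same track-ordered greedy matching as `IOUTracker`.

```python
def _iou(a, b):
    """IoU of two [x1, y1, x2, y2, ...] boxes (extra fields ignored)."""
    x_left, y_top = max(a[0], b[0]), max(a[1], b[1])
    x_right, y_bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class PatientIOUTracker:
    """IoU tracker that remembers unmatched tracks for up to max_missed frames."""

    def __init__(self, iou_threshold=0.5, max_missed=5):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed
        self.tracks = {}   # object ID -> last known box [x1, y1, x2, y2, class_name]
        self.missed = {}   # object ID -> consecutive frames without a match
        self.next_object_id = 0

    def update(self, detections: list) -> dict:
        """Returns only the objects matched or created in this frame."""
        visible, assigned = {}, set()
        for obj_id, prev_box in self.tracks.items():
            best_iou, best_idx = 0.0, -1
            for idx, det in enumerate(detections):
                if idx in assigned:
                    continue
                iou = _iou(prev_box, det)
                if iou >= self.iou_threshold and iou > best_iou:
                    best_iou, best_idx = iou, idx
            if best_idx != -1:
                visible[obj_id] = detections[best_idx]
                assigned.add(best_idx)
                self.missed[obj_id] = 0
            else:
                self.missed[obj_id] += 1

        # Forget tracks missing for more than max_missed frames; others keep their last box
        for obj_id in [i for i, n in self.missed.items() if n > self.max_missed]:
            del self.tracks[obj_id]
            del self.missed[obj_id]
        self.tracks.update(visible)

        # Unassigned detections become new objects
        for idx, det in enumerate(detections):
            if idx not in assigned:
                self.tracks[self.next_object_id] = det
                self.missed[self.next_object_id] = 0
                visible[self.next_object_id] = det
                self.next_object_id += 1
        return visible


tracker = PatientIOUTracker(max_missed=2)
print(tracker.update([[100, 50, 200, 150, 'Stop Sign']]))  # {0: [100, 50, 200, 150, 'Stop Sign']}
print(tracker.update([]))                                    # {} (track 0 is still remembered)
print(tracker.update([[102, 52, 202, 152, 'Stop Sign']]))   # {0: [102, 52, 202, 152, 'Stop Sign']}
```

The trade-off is that a remembered box can be matched by a different object that happens to appear at the same position within `max_missed` frames.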

---

### Example Usage

```python
tracker = IOUTracker(iou_threshold=0.5)
detections = [[100, 50, 200, 150, 'Stop Sign'], [300, 100, 400, 200, 'stop']]
tracked_objects = tracker.update(detections)
print(tracked_objects)
# {0: [100, 50, 200, 150, 'Stop Sign'], 1: [300, 100, 400, 200, 'stop']}
```


In [14]:
class IOUTracker:
    """
    A tracker that uses Intersection Over Union (IOU) to match and track objects across frames.

    Attributes:
        iou_threshold (float): The minimum IOU required to consider a detection as the same object.
        tracked_objects (dict): A dictionary mapping object IDs to their bounding boxes.
        next_object_id (int): The next available ID for newly detected objects.
    """

    def __init__(self, iou_threshold=0.5):
        """
        Initializes the IOUTracker with a specified IOU threshold.

        Args:
            iou_threshold (float): The IOU threshold for associating detections with tracked objects.
        """
        self.iou_threshold = iou_threshold
        self.tracked_objects = {}  # Object ID -> Bounding Box
        self.next_object_id = 0

    def update(self, detections: list):
        """
        Updates the tracker with new detections and returns the currently tracked objects.

        Args:
            detections (list): A list of bounding boxes from the current frame.
                               Each bounding box is a list [x1, y1, x2, y2, class_name].

        Returns:
            dict: A dictionary of tracked objects where keys are object IDs and values are bounding boxes.
                  Each bounding box is in the format [x1, y1, x2, y2, class_name].
        """
        updated_tracked_objects = {}
        assigned_detections = set()

        # Match existing tracked objects to new detections
        for obj_id, prev_bbox in self.tracked_objects.items():
            best_iou = 0
            best_det_idx = -1

            for idx, det_bbox in enumerate(detections):
                if idx in assigned_detections:
                    continue
                iou = compute_iou(prev_bbox[:4], det_bbox[:4])
                if iou > best_iou and iou >= self.iou_threshold:
                    best_iou = iou
                    best_det_idx = idx

            if best_det_idx != -1:
                updated_tracked_objects[obj_id] = detections[best_det_idx]
                assigned_detections.add(best_det_idx)

        # Add new detections as new objects
        for idx, det_bbox in enumerate(detections):
            if idx not in assigned_detections:
                updated_tracked_objects[self.next_object_id] = det_bbox
                self.next_object_id += 1

        # Update the tracked objects
        self.tracked_objects = updated_tracked_objects
        return self.tracked_objects


#### Video Processing with Object Detection and Tracking

This function, `create_new_video`, processes a video to detect and track objects in each frame using YOLO-based detection and an IOU tracker. Here's how it works:

1. **Input and Output Initialization**:
   - The input video file is read using OpenCV, and its properties, such as frame rate, width, and height, are extracted.
   - The output video path is generated based on the input video name, and a `VideoWriter` is set up to save the processed frames.

2. **Object Detection**:
   - Each frame is read from the video, and YOLO-based object detection is performed using the global `detector` object (an instance of the `Detectors` class). The detected objects, including their bounding boxes and class names, are extracted.

3. **Object Tracking**:
   - The bounding boxes are passed to an IOU-based tracker that tracks objects across frames, maintaining consistent IDs for tracked objects.

4. **Frame Annotation**:
   - Tracked objects are drawn onto each frame. This includes:
     - Bounding boxes around detected objects.
     - Class names annotated near the bounding boxes.

5. **Output Video Generation**:
   - Each annotated frame is written to the output video file.
   - Once processing is complete, all resources are released, and the path to the output video is returned.

This function is useful for creating videos that visually display the results of object detection and tracking models, enabling easier analysis and visualization of model performance.


In [30]:
def create_new_video(video_name: str):
    """
    Processes a video to detect and track objects, saving the results as a new video.

    Args:
        video_name (str): The name of the input video file.

    Returns:
        str: The file path of the processed output video.
    """
    # Construct input and output video paths
    input_path = os.path.join(traffic_lights_video_path, video_name)
    output_video_path = video_name.split('.')[0] + "-output.mp4"

    # Open the input video file
    cap = cv2.VideoCapture(input_path)

    # Get video properties
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Set up the video writer for the output video
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))

    # Initialize the tracker
    tracker = IOUTracker(iou_threshold=0.5)

    # Process each frame of the video
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Perform detection on the current frame
        results = detector.detect(frame)

        # Extract bounding boxes from the detections
        bbox_list = []
        for det in results:
            x1, y1, x2, y2, cls = det
            bbox_list.append([x1, y1, x2, y2, cls])

        # Update tracker with current frame's bounding boxes
        tracked_objects = tracker.update(bbox_list)

        # Draw tracked objects on the frame
        for obj_id, bbox in tracked_objects.items():
            x1, y1, x2, y2 = map(int, bbox[:4])
            class_name = bbox[4]
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f'Class: {class_name}', (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

        # Write the processed frame to the output video
        out.write(frame)

    # Release resources
    cap.release()
    out.release()

    return output_video_path


### Test a video
Now that both models are trained, we can test the pipeline on a video using the `create_new_video` function. It runs detection and tracking on every frame and writes an annotated output video, which the following cell displays inline. Note that browsers generally play only H.264-encoded MP4 files inline, so if the player stays blank, the output may need to be re-encoded (for example with ffmpeg).

In [None]:
print(os.listdir(traffic_lights_video_path))
video = create_new_video("traffic_lights_red_1.mp4")

# Embed the output video in the notebook as a base64 data URL
with open(video, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

HTML(""" <video controls><source src="%s" type="video/mp4"></video>""" % data_url)