## Webcam Detection for traffic signs

Traffic Sign Detection (TSD) is widely used to recognise the signs a vehicle encounters while driving and to check that the vehicle obeys the traffic rules. In this project, a new TSD model was trained from the pretrained YOLOv8l model on a custom dataset containing five classes of traffic signs (U-turn, Turn left, Turn right, Go straight and Park). After training, the model was tested live with a mobile phone camera as the input and successfully detected and classified all five kinds of traffic signs.

The following workflow shows how to train the model and how to test it with the built-in camera of a mobile phone.


### Step 1. Process the data

Before starting, import the following external packages, which are used throughout the workflow.

In [None]:
import os
import random
import shutil
import torch
from ultralytics import YOLO
import argparse
import cv2
import supervision as sv

The original dataset contains 1789 pictures in total: 200 for the U-turn class, 389 for the Turn right class, and 400 for each of the remaining three classes.

The dataset was split into a training set and a validation set so that the model can be evaluated on pictures it was not trained on.

The original dataset should be arranged like the tree diagram printed by the code below.
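The per-class totals above can be checked directly against the label files. The following is a minimal sketch, assuming the labels are YOLO-format `.txt` files (one `class x_center y_center width height` line per box) and that the class order matches the `names` list in this project's dataset.yaml. The function name `count_images_per_class` is hypothetical and not part of the original project.

```python
from collections import Counter
from pathlib import Path

# Class names in index order, as listed in this project's dataset.yaml
CLASS_NAMES = ['U-turn', 'Turn-right', 'Turn-left', 'Park', 'Go-straight']


def count_images_per_class(labels_dir, class_names=CLASS_NAMES):
    """Count how many label files contain at least one box of each class."""
    counts = Counter({name: 0 for name in class_names})
    for label_file in sorted(Path(labels_dir).glob("*.txt")):
        class_ids = set()
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                class_ids.add(int(parts[0]))
        # Each picture counts once per class, however many boxes it holds
        for class_id in class_ids:
            counts[class_names[class_id]] += 1
    return dict(counts)
```

Calling `count_images_per_class("dataset/labels")` should then reproduce the figures quoted above if every picture contains signs of a single class.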

In [None]:
def print_directory_tree(path, indent=""):
    for file in os.listdir(path):
        file_path = os.path.join(path, file)
        if os.path.isdir(file_path):
            print(f"{indent}├── {file}/")
            print_directory_tree(file_path, indent + "│   ")
        else:
            print(f"{indent}├── {file}")

print("dataset/")
print_directory_tree("dataset")

The following code splits the dataset and stores the result in the directories defined at the top of the cell.

In [None]:
# Paths to dataset
images_path = "dataset/images"
labels_path = "dataset/labels"
train_images_path = "dataset_split/train/images"
train_labels_path = "dataset_split/train/labels"
val_images_path = "dataset_split/val/images"
val_labels_path = "dataset_split/val/labels"

# Create directories for train and validation splits
os.makedirs(train_images_path, exist_ok=True)  # exist_ok=True avoids an error if the directory already exists
os.makedirs(train_labels_path, exist_ok=True)
os.makedirs(val_images_path, exist_ok=True)
os.makedirs(val_labels_path, exist_ok=True)

# List all image files (ending with .jpg or .png, in any letter case)
image_files = [f for f in os.listdir(images_path) if f.lower().endswith(('.jpg', '.png'))]

# Ensure randomization of the dataset
random.shuffle(image_files)

# Split the dataset
split_ratio = 0.8  # 80% training, 20% validation
split_index = int(len(image_files) * split_ratio)
train_files = image_files[:split_index]
val_files = image_files[split_index:]


# Copy files to their respective directories
def copy_files(files, src_images_path, src_labels_path, dest_images_path, dest_labels_path):
    for image_file in files:
        # Define source and destination paths for images
        src_image = os.path.join(src_images_path, image_file)
        dest_image = os.path.join(dest_images_path, image_file)

        # Copy the image file
        shutil.copy(src_image, dest_image)

        # Define source and destination paths for labels
        label_file = os.path.splitext(image_file)[0] + ".txt"  # Match the label file
        src_label = os.path.join(src_labels_path, label_file)
        dest_label = os.path.join(dest_labels_path, label_file)

        # Check if the label file exists before copying
        if os.path.exists(src_label):
            shutil.copy(src_label, dest_label)
        else:
            print(f"Warning: Label file not found for {image_file}")


# Copy the training and validation files
copy_files(train_files, images_path, labels_path, train_images_path, train_labels_path)
copy_files(val_files, images_path, labels_path, val_images_path, val_labels_path)

print(f"Dataset split completed! Training images: {len(train_files)}, Validation images: {len(val_files)}")

After running the code, the split data is stored in a folder called dataset_split, divided into training and validation data that can be used for training.
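Because `random.shuffle` is called without a seed, each run of the cell above produces a different split. If a repeatable split is wanted, a seeded variant can be sketched as follows. This is only an illustration: the helper name `split_files` and the seed value 42 are assumptions, not part of the original code.

```python
import random


def split_files(files, split_ratio=0.8, seed=42):
    """Return (train, val) lists using a seeded shuffle so the split is repeatable."""
    # Sort first, because the order returned by os.listdir is not guaranteed
    shuffled = sorted(files)
    # A private Random instance leaves the global random state untouched
    random.Random(seed).shuffle(shuffled)
    split_index = int(len(shuffled) * split_ratio)
    return shuffled[:split_index], shuffled[split_index:]
```

With the 1789 pictures of this dataset and the 0.8 ratio used above, this yields 1431 training files and 358 validation files, and the same files land in each set on every run.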

### Step 2. Train the model

Before training starts, a dataset.yaml file needs to be created and placed in the dataset_split folder.

This file states the paths to the training and validation data, the total number of classes and the class names. The order of the class names must match the class indices used when the pictures were labelled (e.g. the class 'U-turn' has index 0, so it must come first in the list).

The contents of the dataset.yaml file are shown below (they only take effect when saved in a YAML file).
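To illustrate why the order matters, the sketch below maps one line of a YOLO-format label file to a class name. It assumes the usual `class x_center y_center width height` layout with normalised coordinates and uses the class list from this project's dataset.yaml. The helper `describe_label_line` is hypothetical.

```python
# Class names in the same order as the `names` entry of dataset.yaml
CLASS_NAMES = ['U-turn', 'Turn-right', 'Turn-left', 'Park', 'Go-straight']


def describe_label_line(line, class_names=CLASS_NAMES):
    """Return (class_name, (x_center, y_center, width, height)) for one label line."""
    class_id, *box = line.split()
    if len(box) != 4:
        raise ValueError(f"Expected 5 values per label line, got {len(box) + 1}")
    # The integer at the start of the line is an index into the names list
    return class_names[int(class_id)], tuple(float(value) for value in box)
```

For example, `describe_label_line("0 0.5 0.5 0.2 0.3")` resolves to the 'U-turn' class. If the names were listed in a different order, the same label line would silently be interpreted as another sign.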

In [None]:
train: train/images
val: val/images

nc: 5

names: ['U-turn','Turn-right','Turn-left','Park','Go-straight']

After creating the YAML file, the dataset_split folder should look like the tree diagram printed by the code below.

In [None]:
print("dataset_split/")
print_directory_tree("dataset_split")

Now the training can begin.

The following code defines command-line arguments that simplify the training process.

The code can be placed in a separate Python file (e.g. train.py) and run from the terminal. For example, python train.py --epochs 50 --batch-size 8 trains for 50 epochs (50 full passes over the training set) with 8 training examples processed per batch.
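As a side note on what these two flags imply, the number of optimisation steps can be estimated with simple arithmetic. The sketch below assumes 1431 training pictures (80% of the 1789 pictures, as produced by the split code in Step 1) and assumes that a final, partially filled batch still counts as one step. The actual trainer may differ in such details, and the function name `training_steps` is hypothetical.

```python
import math


def training_steps(num_train_images, batch_size, epochs):
    """Return (steps_per_epoch, total_steps) for a given batch size and epoch count."""
    # A partially filled last batch is counted as a full step (assumption)
    steps_per_epoch = math.ceil(num_train_images / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


if __name__ == "__main__":
    num_train_images = int(1789 * 0.8)  # same rounding as the split code
    per_epoch, total = training_steps(num_train_images, batch_size=8, epochs=50)
    print(f"{per_epoch} steps per epoch, {total} steps in total")
```

Under these assumptions, the example command corresponds to 179 steps per epoch and 8950 steps in total.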

In [None]:
data_path = r'/Users/andrewyuyy/Documents/GitHub/Webcam_Detection/dataset_split/dataset.yaml'
# Replace with the path to your own dataset.yaml file

def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(description="Train YOLO model on custom dataset")
    parser.add_argument('--model', type=str, default='yolov8l.pt', help="Pretrained YOLO model (e.g., yolov8n.pt, yolov8s.pt, yolov8l.pt)")
    parser.add_argument('--epochs', type=int, default=100, help="Number of training epochs")
    parser.add_argument('--batch-size', type=int, default=16, help="Batch size for training")
    parser.add_argument('--imgsz', type=int, default=640, help="Image size for training and validation")
    parser.add_argument('--device', type=str, default='mps', help="Device to train on: 'cpu', 'cuda', or 'mps'")
    return parser.parse_args()


def main():
    # Parse arguments
    args = parse_args()

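    # Verify the requested device and fall back to the CPU if it is unavailable
    device = args.device
    if device == 'mps' and not torch.backends.mps.is_available():
        device = 'cpu'
    elif device.startswith('cuda') and not torch.cuda.is_available():
        device = 'cpu'
    print(f"Using device: {device}")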

    # Initialize YOLO model
    model = YOLO(args.model)

    # Train the model
    print("Starting training...")
    model.train(
        data=data_path,
        epochs=args.epochs,
        batch=args.batch_size,
        imgsz=args.imgsz,
        device=device
    )
    print("Training completed!")


if __name__ == "__main__":
    main()

After training, the training details can be found in the runs folder. The best model is saved as 'best.pt' in the 'weights' folder of the training run. This model can then be used in the project to detect traffic signs.
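Each training run writes to its own subfolder, so the path to best.pt changes from run to run (the detection code in Step 3 uses runs/detect/train4/weights/best.pt). A small helper can pick the most recently written weights instead of hard-coding the run name. This is a sketch only: it assumes the `runs/detect/<run>/weights/best.pt` layout seen in this project, and `find_latest_best_weights` is a hypothetical name.

```python
from pathlib import Path


def find_latest_best_weights(runs_dir="runs/detect"):
    """Return the most recently modified best.pt under runs_dir, or None if absent."""
    candidates = list(Path(runs_dir).glob("*/weights/best.pt"))
    if not candidates:
        return None
    # The newest file by modification time belongs to the latest finished run
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

The returned path can be converted with `str(...)` and passed to `YOLO(...)` in place of a fixed path.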

### Step 3. Use the model to detect traffic signs

Running the following code opens the laptop's built-in camera to detect traffic signs. A different camera can be selected by changing the index passed to cv2.VideoCapture. The order of the connected cameras can be checked with **FFmpeg** in the terminal. Index 0 here is the default built-in camera of a MacBook.

The code creates a new window named **'Traffic_sign_detection'**. Each traffic sign captured by the camera is marked with a bounding box and labelled with its class name and confidence value.

In [None]:
def parse_args():
    parser = argparse.ArgumentParser(description='YOLOv8 live')
    parser.add_argument(
        '--webcam-resolution',
        default=[1280,720],
        nargs=2,
        type=int
    )
    args = parser.parse_args()
    return args

def main():
    args = parse_args()
    frame_width, frame_height = args.webcam_resolution

    cap = cv2.VideoCapture(0)  # 0 selects the laptop's built-in camera; change the index to use another connected camera
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, frame_width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, frame_height)

    model = YOLO(r'/Users/andrewyuyy/Documents/GitHub/Webcam_Detection/runs/detect/train4/weights/best.pt')
    # Replace with the path to your own best.pt file

    box_annotator = sv.BoxAnnotator(
        thickness=2,
        text_thickness=2,
        text_scale=1,
    )

    while True:
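        ret, frame = cap.read()
        if not ret:  # Stop if the camera did not return a frame
            cap.release()
            cv2.destroyAllWindows()
            break

        result = model(frame)[0]
        # from_yolov8 is the converter in older supervision releases;
        # newer releases name it from_ultralytics (check your installed version)
        detections = sv.Detections.from_yolov8(result)
        labels = [
            f"{model.names[class_id]} {confidence:0.2f}"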
            for bbox, confidence, class_id
            in zip(detections.xyxy, detections.confidence, detections.class_id)
        ]

        frame = box_annotator.annotate(
            scene=frame,
            detections=detections,
            labels=labels
        )

        cv2.imshow('Traffic_sign_detection', frame)

        if cv2.waitKey(30) == 27:  # Press Esc to exit
            cap.release()
            cv2.destroyAllWindows()
            break

if __name__ == "__main__":
    main()

### Issues & Improvements

Some issues occurred during detection. This section discusses them and suggests possible improvements.

As shown in the picture below, the **U-turn** sign was correctly detected with a confidence value of **0.79**.

![Webcam_Detection](Test/U-turn-Upward.png)

However, when the **U-turn** sign was placed in another orientation, the current model could not detect it reliably. As shown in the picture below, the **U-turn** sign was placed upside down and was wrongly detected as one **Park** sign and two **Go-straight** signs.

![Webcam_Detection](Test/U-turn-Upside_down.png)

The same problem occurred when the U-turn sign pointed to the right or to the left.

The main cause may be a lack of variety in the training data. Most of the training pictures were taken with the signs placed upright, so the trained model performs poorly on signs placed in other orientations.

In addition, fewer pictures were collected for the U-turn sign than for the other classes, which might be another reason for its poorer detection performance. The picture below shows the Park sign detected while placed upright. Its confidence value is higher than that of the U-turn sign, which was also placed upright.

![Webcam_Detection](Test/Park-Upward.png)

The picture below shows that the Park sign was correctly detected, with a confidence of 0.85, even when it was rotated to point to the right. The distinctive features of the Park sign might explain why the model performs better on this sign.

![Webcam_Detection](Test/Park-To_right.png)

This issue might be fixed by collecting more pictures of U-turn signs and of signs placed in different orientations, and then retraining the model on the new dataset.
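As a complement to collecting new pictures, rotated copies of existing pictures could be generated as a form of data augmentation. This is a swapped-in technique and was not part of the original workflow. The sketch below only handles the label side: it rotates a YOLO-format box (normalised `x_center y_center width height`) by quarter turns clockwise, which would match an image rotated with `cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)`. The function names are hypothetical.

```python
def rotate_yolo_box_90_clockwise(x_center, y_center, width, height):
    """Rotate one normalised YOLO box by 90 degrees clockwise."""
    # A point (x, y) moves to (1 - y, x); width and height swap places
    return 1.0 - y_center, x_center, height, width


def rotate_label_line(line, quarter_turns=1):
    """Rotate one 'class x y w h' label line by the given number of quarter turns."""
    parts = line.split()
    class_id = parts[0]
    box = tuple(float(value) for value in parts[1:5])
    for _ in range(quarter_turns % 4):
        box = rotate_yolo_box_90_clockwise(*box)
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in box)
```

Whether a rotated directional sign should keep its original class is a labelling decision. The tests above treat a rotated U-turn sign as still being a U-turn sign, and the sketch follows that convention by leaving the class index unchanged.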