
<h1 align=center><font size = 5>CAPSTONE PROJECT</font></h1>
<h2 align=center><font size = 5>AIML Certification Programme</font></h2>



## Student Name and ID:
Mention your name and ID if done individually.<br>
If done as a group, clearly mention the contribution of each group member, qualitatively and as a percentage.<br>
1. KUNA MURALI (ID: 2024AIML030)                          

2. MADIRE MAHESHKUMAR (ID: 2024AIML079)

3. V VIJAY KUMAR (ID: 2024AIML100)

4. GADIGA MOUNESWAR BABU (ID: 2024AIML095)


## Helmet Violation Detection from Indian CCTV Video

**Problem statement:**
Detect and flag two-wheeler helmet violations (riding without a helmet) from traffic camera frames in Indian cities in real time.

**Description:**
Create a computer vision system using YOLOv8 and object tracking to detect two-wheeler riders and classify helmet usage. Optionally perform license plate OCR for enforcement.

**Dataset:**

* Indian Helmet Detection Dataset

* Research-generated dataset of Indian two-wheeler violations (images and video with annotations for helmets and number plates)

   

## Setup

Install and import the required libraries:

In [None]:
!pip install opencv-python==4.9.0.80
!pip install matplotlib==3.8.4
!pip install numpy==1.26.4
!pip install pillow==10.3.0
!pip install pandas==2.2.2
!pip install seaborn==0.13.2
!pip install scikit-learn==1.4.2
!pip install torch==2.3.0
!pip install notebook==7.2.0
!pip install albumentations==1.4.8
!pip install albucore==0.0.16
!pip install ultralytics==8.0.134
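# Note: this line upgrades ultralytics and torch beyond the versions pinned above;
# remove it to keep the pinned, reproducible versions.
!pip install --upgrade ultralytics torch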

In [None]:
import sys
import os
import random
import shutil
import hashlib
import warnings
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
warnings.filterwarnings("ignore")
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter
import albumentations as A
import glob
import pandas as pd
import seaborn as sns
from itertools import combinations
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

scripts_dir = os.path.abspath(os.path.join(os.getcwd(), '..', 'scripts'))
sys.path.append(scripts_dir)
for entry in os.listdir(scripts_dir):
    entry_path = os.path.join(scripts_dir, entry)
    if os.path.isdir(entry_path):
        sys.path.append(entry_path)

from utils import show_images_grid, split_and_copy_dataset, show_random_images_grid
from flip import HorizontalFlip
from zoom import DynamicZoomer
from mosaic import MosaicAugmentor
from cutout import CutoutAugmentor
from synthetic import SyntheticImageAugmentor
from edgedetect import EdgeDetectAugmentor
from cutmix import CutMixAugmentor
from rotate import RotateAugmentor
from shadow import ShadowCastingAugmentor
from grayscale import GrayscaleAugmentor
from noise import NoiseInjectionAugmentor

## Data Augmentation

In [None]:
image_folder = "..\\data\\raw\\train\\images"
label_folder = "..\\data\\raw\\train\\labels"

class_map = {
    0: "NumberPlate",
    1: "Person",
    2: "Helmet",
    3: "Head",
    4: "Motorbike"
}

images = os.listdir(image_folder)
labels = os.listdir(label_folder)


print("Total images:", len(images))
print("Total label files:", len(labels))
print("Missing label files:", set(os.path.splitext(i)[0] for i in images) - set(os.path.splitext(l)[0] for l in labels))


## Geometric Transformations

**Horizontal Flipping** – Simulates riders and helmets seen from the opposite direction.

* Horizontal flip in data augmentation is a technique that mirrors images along the vertical axis (left to right). 
* It effectively creates a flipped version of each image, doubling the variety of orientations the model sees during training. 
* This simple transformation helps models learn that objects and features can appear in different left-right positions, improving their generalization and robustness.

<b>Benefits:</b>

* Horizontal flipping doubles the dataset size by creating mirrored versions of images.

* Helps models generalize better by exposing them to left-right variations of objects, improving robustness, especially in tasks like object detection.

* The bounding box labels must be mirrored along with the image (in YOLO format, `x_center` becomes `1 - x_center`), so annotations stay accurate after flipping, which is critical for supervised learning.

* This augmentation technique combats overfitting and improves model performance on unseen test data by diversifying the training examples.
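A minimal sketch of the label update for a horizontal flip, assuming labels are already parsed into `(class_id, x_center, y_center, width, height)` tuples in normalized YOLO format. The function name `hflip_yolo` is hypothetical; this is not the project's `HorizontalFlip` class, whose implementation lives in `scripts/` and is not shown here.

```python
import numpy as np


def hflip_yolo(image, labels):
    """Mirror an H x W x C image left to right and update its YOLO labels.

    labels: list of (class_id, x_center, y_center, width, height) tuples,
    with coordinates normalized to [0, 1] as in YOLO label files.
    """
    flipped = image[:, ::-1].copy()  # reverse the width (column) axis
    # A point at normalized x moves to 1 - x; y, width and height are unchanged
    new_labels = [(c, 1.0 - x, y, w, h) for c, x, y, w, h in labels]
    return flipped, new_labels


img = np.arange(12, dtype=np.uint8).reshape(2, 6, 1)
flipped, new_labels = hflip_yolo(img, [(2, 0.25, 0.5, 0.1, 0.2)])
print(new_labels)  # [(2, 0.75, 0.5, 0.1, 0.2)]
```

Only the x coordinate of the box centre changes, which is why flipping is one of the cheapest geometric augmentations to apply correctly.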

In [None]:
flip_image_folder = '..\\data\\processed\\train-flip\\images'
flip_label_folder = '..\\data\\processed\\train-flip\\labels'

flipper = HorizontalFlip(image_folder, label_folder, flip_image_folder, flip_label_folder)
flipper.process()


In [None]:
show_images_grid(
            [image_folder, flip_image_folder],
            [label_folder, flip_label_folder],
            ['013_16_jpeg.rf.cc06dc94c659d717549dc88f601c9ff1.jpg','013_16_jpeg.rf.cc06dc94c659d717549dc88f601c9ff1_flip.jpg'],
            class_map=class_map
        )

In [None]:
show_random_images_grid(flip_image_folder, flip_label_folder, class_map, N=6)

**Scaling/Zooming** – Randomly zoom in to focus on object of interest in the frame.

* Zoom augmentation randomly crops a region of the image and resizes it back to the original size, effectively zooming in.
* This simulates objects appearing larger or closer, helping the model learn to detect helmets and motorbikes at different scales and distances.
* It improves robustness to camera zoom and real-world variations in object size.

<b>Benefits:</b>

* Focuses Training on Relevant Objects:
By zooming in tightly on objects (e.g., helmets), the model sees them larger and in more detail, which can improve detection accuracy, especially for small objects.

* Increases Variation in Scale:
Varying zoom levels mimic real-world scenarios where objects may be closer or farther.

* Augments Dataset Without Changing Image Size:
Keeps input size consistent with model expectations but changes spatial content.

* Improves Model Robustness to Object Size Variability:
Helps the model learn to detect objects at different scales and reduces bias toward object size distribution in original data.
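The crop-and-rescale step behind zoom augmentation can be sketched as below. The function `zoom_crop_yolo`, its pixel-based crop window, and the `min_visible` threshold for dropping mostly cut-off objects are illustrative assumptions, not the behavior of the project's `DynamicZoomer`.

```python
import numpy as np


def zoom_crop_yolo(image, labels, x0, y0, crop_w, crop_h, min_visible=0.3):
    """Crop image[y0:y0+crop_h, x0:x0+crop_w] and re-express YOLO labels relative to the crop.

    labels: (class_id, x_center, y_center, width, height) tuples, normalized to the full image.
    Boxes are clipped to the crop; a box is dropped when less than min_visible of its area remains.
    """
    img_h, img_w = image.shape[:2]
    crop = image[y0:y0 + crop_h, x0:x0 + crop_w]
    new_labels = []
    for c, xc, yc, w, h in labels:
        # Box corners in pixels of the full image
        x1, y1 = (xc - w / 2) * img_w, (yc - h / 2) * img_h
        x2, y2 = (xc + w / 2) * img_w, (yc + h / 2) * img_h
        # Intersect with the crop window
        cx1, cy1 = max(x1, x0), max(y1, y0)
        cx2, cy2 = min(x2, x0 + crop_w), min(y2, y0 + crop_h)
        if cx2 <= cx1 or cy2 <= cy1:
            continue  # box lies entirely outside the crop
        if (cx2 - cx1) * (cy2 - cy1) < min_visible * (x2 - x1) * (y2 - y1):
            continue  # too little of the object is left in view
        # Normalize relative to the crop (unchanged by resizing the crop back to full size)
        new_labels.append((c,
                           ((cx1 + cx2) / 2 - x0) / crop_w,
                           ((cy1 + cy2) / 2 - y0) / crop_h,
                           (cx2 - cx1) / crop_w,
                           (cy2 - cy1) / crop_h))
    return crop, new_labels
```

In practice the crop window would be drawn at random (for a zoom factor `z`, a crop of roughly `W / z` by `H / z` pixels) and the crop resized back to full resolution, for example with `cv2.resize(crop, (img_w, img_h))`; because YOLO coordinates are normalized, that resize leaves the labels unchanged.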

In [None]:
zoomed_image_folder = '..\\data\\processed\\train-zoom\\images'
zoomed_label_folder = '..\\data\\processed\\train-zoom\\labels'

zoomer = DynamicZoomer(image_folder, label_folder, zoomed_image_folder, zoomed_label_folder)
zoomer.process()

In [None]:
show_images_grid(
            [image_folder, zoomed_image_folder],
            [label_folder, zoomed_label_folder],
            ['013_48_jpeg.rf.dee0ddaded9e687ea455ad1ddb14d7fc.jpg','013_48_jpeg.rf.dee0ddaded9e687ea455ad1ddb14d7fc_zoom.jpg'],
            class_map=class_map
        )

In [None]:
show_random_images_grid(zoomed_image_folder, zoomed_label_folder, class_map, N=6)

**Rotation Data Augmentation** – Rotates images and their bounding boxes by a random angle (e.g., ±10–30 degrees). 

* This technique helps the model become invariant to the orientation of objects, simulating real-world scenarios where two-wheelers and helmets may not always appear perfectly upright in the frame. 
* Rotation augmentation improves robustness to camera tilt, road slope, and diverse viewpoints, leading to better generalization on unseen data.

<b>Benefits:</b>

* Rotation simulates viewpoint changes and varied camera angles, increasing model robustness to variations in object orientation.

* Albumentations handles proper geometric transformation of bounding boxes, preventing label mismatch.

* Controlled rotation angles prevent unrealistic large rotations that would harm training.

* Filling borders with a neutral color avoids visual artifacts near image edges.

* Augmented data help models generalize better in real-world scenarios where objects appear rotated.
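To show why bounding boxes need special handling under rotation, the sketch below rotates a pixel box's four corners about the image center and takes their axis-aligned envelope. It reproduces, in plain NumPy, the affine matrix documented for OpenCV's `cv2.getRotationMatrix2D(center, angle, 1.0)`; `rotate_box_xyxy` is a hypothetical helper, and Albumentations may use a different bounding box rotation strategy internally.

```python
import numpy as np


def rotate_box_xyxy(box, angle_deg, img_w, img_h):
    """Rotate a pixel box (x1, y1, x2, y2) about the image center; return its axis-aligned envelope.

    Uses the same matrix as cv2.getRotationMatrix2D(center, angle, 1.0):
    a positive angle rotates counter-clockwise as displayed (origin at top-left).
    """
    cx, cy = img_w / 2, img_h / 2
    a, b = np.cos(np.deg2rad(angle_deg)), np.sin(np.deg2rad(angle_deg))
    M = np.array([[a, b, (1 - a) * cx - b * cy],
                  [-b, a, b * cx + (1 - a) * cy]])
    x1, y1, x2, y2 = box
    # Homogeneous coordinates of the four corners
    corners = np.array([[x1, y1, 1], [x2, y1, 1], [x1, y2, 1], [x2, y2, 1]], dtype=float)
    rotated = corners @ M.T  # shape (4, 2)
    nx1, ny1 = rotated.min(axis=0)
    nx2, ny2 = rotated.max(axis=0)
    # The canvas keeps its original size, so clip the envelope to it
    return (float(np.clip(nx1, 0, img_w)), float(np.clip(ny1, 0, img_h)),
            float(np.clip(nx2, 0, img_w)), float(np.clip(ny2, 0, img_h)))


print(rotate_box_xyxy((10, 20, 30, 40), 90, 100, 100))  # approximately (20, 70, 40, 90)
```

For angles that are not multiples of 90°, the envelope encloses extra background around the object, and the slack increases as the angle moves from 0° toward 45°; this is one reason to keep rotation angles small.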

In [None]:
rotation_image_folder = '..\\data\\processed\\train-rotation\\images'     # Directory to save rotated images
rotation_label_folder = '..\\data\\processed\\train-rotation\\labels'  # Directory to save rotated labels

rotate_augmentor = RotateAugmentor(image_folder, label_folder, rotation_image_folder, rotation_label_folder)
rotate_augmentor.process()

In [None]:
show_random_images_grid(rotation_image_folder, rotation_label_folder, class_map, N=6)

In [None]:
show_images_grid(
            [image_folder, rotation_image_folder],
            [label_folder, rotation_label_folder],
            ['013_16_jpeg.rf.cc06dc94c659d717549dc88f601c9ff1.jpg','013_16_jpeg.rf.cc06dc94c659d717549dc88f601c9ff1_rotation.jpg'],
            class_map=class_map
        )

## Bounding Box–Specific Techniques

**Mosaic Augmentation** – Mosaic augmentation combines four different images into one by stitching them together in a grid.

* This technique increases dataset diversity, helps the model learn to detect small objects, and improves robustness to varied object scales and contexts.

<b>Benefits:</b>
* Combines Multiple Contexts:
Mosaic augmentation merges four different scenes into one image, increasing contextual diversity and object co-occurrence variety.

* Improves Small Object Detection:
Resizing and grouping multiple images allows models to see varying object scales and densities.

* Increases Dataset Variety:
Creates many novel combinations from existing data, boosting effective dataset size without new data collection.

* Helps Models Generalize:
Exposes models to crowded or complex scenes, better preparing for real-world variability, especially useful in object detection tasks like helmet or number plate detection.
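A simplified, fixed-grid version of the idea is sketched below: four equally sized images are each downscaled by 2 and placed in one quadrant, and every normalized box is halved and shifted into its quadrant. `simple_mosaic` is hypothetical and uses naive stride-2 subsampling instead of proper resizing. Ultralytics' trainer also applies its own mosaic on the fly (controlled by the `mosaic` training argument), with a random mosaic center rather than a fixed 2x2 split.

```python
import numpy as np


def simple_mosaic(images, labels_list):
    """Tile four equally sized H x W x C images into one H x W mosaic (2 x 2 grid).

    Quadrant order: top-left, top-right, bottom-left, bottom-right.
    labels_list holds each image's YOLO-normalized (class_id, xc, yc, w, h) tuples.
    Assumes H and W are even.
    """
    h, w = images[0].shape[:2]
    hh, hw = h // 2, w // 2
    canvas = np.zeros_like(images[0])
    out_labels = []
    for idx, (img, labels) in enumerate(zip(images, labels_list)):
        qx, qy = idx % 2, idx // 2  # quadrant column and row
        # Naive 2x downscale by keeping every second pixel (cv2.resize would be smoother)
        small = img[::2, ::2]
        canvas[qy * hh:(qy + 1) * hh, qx * hw:(qx + 1) * hw] = small
        for c, xc, yc, bw, bh in labels:
            # Halve each box and shift it into its quadrant
            out_labels.append((c, (xc + qx) / 2, (yc + qy) / 2, bw / 2, bh / 2))
    return canvas, out_labels
```

Because every source image is halved, its objects also shrink by half, which is where the small-object benefit described above comes from.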

In [None]:
mosaic_image_folder = '..\\data\\processed\\train-mosaic\\images'     # Directory to save mosaic images
mosaic_label_folder = '..\\data\\processed\\train-mosaic\\labels'  # Directory to save mosaic labels

mosaic_augmentor = MosaicAugmentor(image_folder, label_folder, mosaic_image_folder, mosaic_label_folder)
mosaic_augmentor.process()

In [None]:
show_random_images_grid(mosaic_image_folder, mosaic_label_folder, class_map, N=6)

**CutMix Augmentation** – CutMix augmentation creates new training samples by cutting a patch from one image and pasting it onto another, while updating the bounding boxes accordingly. 

* This technique helps improve model robustness by exposing it to mixed-context images and encourages better generalization to occlusions and varied object arrangements.

<b>Benefits:</b>
* Increases Data Diversity:
Creates composite training samples by combining content from two images, enriching variability without collecting new data.

* Improves Robustness to Occlusion:
Simulates occlusions by partially replacing image areas with different objects/scenes, helping the model handle occluded or mixed environments.

* Enhances Learning of Multiple Object Contexts:
Mixes objects from two scenes, encouraging more generalizable feature representations.

* Balances Class Distribution:
Can increase the number of examples of underrepresented classes if patches containing those classes are chosen preferentially.
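The following is a minimal sketch of CutMix adapted to detection, assuming both images share the same size and the patch is given in pixels. The thresholds `max_occluded` (drop an object of image A when the patch hides more than this fraction of it) and `min_visible` (keep an object of image B only when at least this fraction of it lands inside the patch) are illustrative choices, not values from the project's `CutMixAugmentor`.

```python
import numpy as np


def cutmix_yolo(img_a, labels_a, img_b, labels_b, patch, max_occluded=0.6, min_visible=0.3):
    """Paste the pixel rectangle patch = (x1, y1, x2, y2) from img_b into a copy of img_a.

    Labels are YOLO-normalized (class_id, x_center, y_center, width, height) tuples.
    """
    h, w = img_a.shape[:2]
    x1, y1, x2, y2 = patch
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Patch bounds in normalized coordinates
    px1, py1, px2, py2 = x1 / w, y1 / h, x2 / w, y2 / h

    def to_xyxy(xc, yc, bw, bh):
        return xc - bw / 2, yc - bh / 2, xc + bw / 2, yc + bh / 2

    def overlap(box):
        ix = max(0.0, min(box[2], px2) - max(box[0], px1))
        iy = max(0.0, min(box[3], py2) - max(box[1], py1))
        return ix * iy

    new_labels = []
    for c, xc, yc, bw, bh in labels_a:
        # Keep A's object unless the pasted patch hides most of it
        if overlap(to_xyxy(xc, yc, bw, bh)) <= max_occluded * bw * bh:
            new_labels.append((c, xc, yc, bw, bh))
    for c, xc, yc, bw, bh in labels_b:
        box = to_xyxy(xc, yc, bw, bh)
        if overlap(box) < min_visible * bw * bh:
            continue  # too little of B's object was pasted
        # Clip B's box to the patch and convert back to center format
        bx1, by1 = max(box[0], px1), max(box[1], py1)
        bx2, by2 = min(box[2], px2), min(box[3], py2)
        new_labels.append((c, (bx1 + bx2) / 2, (by1 + by2) / 2, bx2 - bx1, by2 - by1))
    return out, new_labels
```

Partially covered boxes from image A are kept at full size in this sketch; a stricter variant could also shrink them to their visible part.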

In [None]:
cutmix_image_folder = '..\\data\\processed\\train-cutmix\\images'     # Directory to save CutMix images
cutmix_label_folder = '..\\data\\processed\\train-cutmix\\labels'  # Directory to save CutMix labels

cutmix_augmentor = CutMixAugmentor(image_folder, label_folder, cutmix_image_folder, cutmix_label_folder)
cutmix_augmentor.process()

In [None]:
show_random_images_grid(cutmix_image_folder, cutmix_label_folder, class_map, N=6)

**Cutout/Random Erasing** – Randomly mask parts of helmets or background for occlusion robustness.

* Cutout data augmentation is a technique that randomly masks out (removes) a contiguous square region of an input image during training. 
* This masks a portion of the visual data, forcing the model to rely on less obvious or less prominent features to correctly recognize objects. By doing so, it improves the model's robustness to partial occlusions and over-reliance on specific image details.

<b>Benefits:</b>
* Increases Robustness to Occlusion:
Models learn to recognize objects despite missing or blocked parts by exposing them to images with random black regions.

* Reduces Overfitting:
By forcing reliance on multiple cues, the model generalizes better to unseen and partially occluded objects.

* Easy to Implement and Efficient:
Cutout is computationally inexpensive and can be applied either online during training or offline as a preprocessing step.

* Improves Performance in Object Detection and Classification:
Especially effective for scenarios where objects may be partially obstructed or appear in cluttered scenes.
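A minimal Cutout sketch is shown below; `cutout`, `num_holes` and `size_frac` are hypothetical names. In this version the labels are left untouched, which matches the idea of training the detector to find partially hidden objects; some implementations additionally drop a label when a hole covers its box entirely.

```python
import numpy as np


def cutout(image, num_holes=1, size_frac=0.25, fill=0, rng=None):
    """Return a copy of an H x W (x C) image with num_holes random square regions set to fill.

    Each square has side size_frac * min(H, W) and lies fully inside the image.
    YOLO labels are reused unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = out.shape[:2]
    side = max(1, int(min(h, w) * size_frac))
    for _ in range(num_holes):
        y = int(rng.integers(0, h - side + 1))  # top-left corner; upper bound is exclusive
        x = int(rng.integers(0, w - side + 1))
        out[y:y + side, x:x + side] = fill
    return out
```

Passing a seeded `np.random.default_rng(seed)` makes the masks reproducible across runs.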

In [None]:
cutout_image_folder = '..\\data\\processed\\train-cutout\\images'     # Directory to save cutout images
cutout_label_folder = '..\\data\\processed\\train-cutout\\labels'  # Directory to save cutout labels

cutout_augmentor = CutoutAugmentor(image_folder, label_folder, cutout_image_folder, cutout_label_folder)
cutout_augmentor.process()

In [None]:
show_random_images_grid(cutout_image_folder, cutout_label_folder, class_map, N=6)

## Environmental Simulations

**Synthetic Data Augmentation Techniques:**

- **Fog Simulation:** Adds artificial fog or haze to images, mimicking low-visibility conditions. This helps the model learn to detect objects in adverse weather, improving robustness to real-world foggy scenes.

- **Rain Simulation:** Overlays rain streaks or droplets onto images, simulating rainy weather. This augmentation teaches the model to recognize helmets and vehicles even when visibility is reduced by rain.

- **Blur Augmentation:** Applies motion blur or defocus blur to images, replicating camera shake or out-of-focus scenarios. This helps the model handle blurry frames from CCTV footage or fast-moving vehicles.

- **Illumination Variation:** Adjusts brightness to simulate different lighting conditions. This ensures the model is robust to varying illumination and can generalize across different times of day and lighting environments.
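Three of these effects reduce to simple per-pixel arithmetic, sketched below in NumPy: fog as a blend toward a light-grey haze, illumination change as intensity scaling, and motion blur as a horizontal box filter. The function names and default parameters are assumptions for illustration, not the project's `SyntheticImageAugmentor`; rain streaks need a drawing step and are omitted. None of these effects changes geometry, so YOLO labels can be reused unchanged.

```python
import numpy as np


def add_fog(image, intensity=0.4, fog_value=200):
    """Blend the image toward a uniform light-grey haze; intensity in [0, 1]."""
    fogged = image.astype(np.float32) * (1 - intensity) + fog_value * intensity
    return np.clip(np.rint(fogged), 0, 255).astype(np.uint8)


def adjust_brightness(image, factor=0.6):
    """Scale intensities: factor < 1 darkens (dusk or night), factor > 1 brightens (glare)."""
    return np.clip(np.rint(image.astype(np.float32) * factor), 0, 255).astype(np.uint8)


def horizontal_motion_blur(image, k=5):
    """Average each pixel with its horizontal neighbors using a 1 x k box kernel."""
    kernel = np.ones(k, dtype=np.float32) / k
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode='same'), 1, image.astype(np.float32))
    return np.clip(np.rint(blurred), 0, 255).astype(np.uint8)
```

`np.convolve(..., mode='same')` pads with zeros, so the leftmost and rightmost columns darken slightly; a production version might pad by edge replication instead.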

In [None]:
synthetic_image_folder = '..\\data\\processed\\train-synthetic\\images'     # Directory to save synthetic images
synthetic_label_folder = '..\\data\\processed\\train-synthetic\\labels'  # Directory to save synthetic labels

synthetic_augmentor = SyntheticImageAugmentor(image_folder, label_folder, synthetic_image_folder, synthetic_label_folder)
synthetic_augmentor.process()

In [None]:
show_random_images_grid(synthetic_image_folder, synthetic_label_folder, class_map, N=6)

**Shadow Augmentation:** Adds synthetic shadows to simulate real-world conditions (e.g., vehicles passing beside flyovers or under foot overbridges).

* It involves overlaying artificial shadows onto images to simulate real-world lighting conditions and occlusions. 
* This technique helps computer vision models, such as object detectors, learn to recognize and localize objects even when parts of the scene are darkened or partially obscured by shadows. 
* By introducing varying shapes, positions, and intensities of shadows, shadow augmentation improves a model's robustness to lighting variability and partial object occlusion, which are common challenges in real-world environments.

<b>Benefits:</b>
* Simulates Realistic Environmental Shadows:
Shadows frequently occur in outdoor scenes (e.g., streets, vehicles); augmenting images with shadows enhances model robustness to illumination variability.

* Improves Generalization:
Helps models learn invariant representations despite partial shadow occlusion of objects, reducing false negatives.

* Adds Lighting Diversity Without Geometric Changes:
Shadows alter pixel intensities without modifying spatial layout; labels remain valid.

* Enhances Dataset Variability at Low Cost:
Shadows are easy to generate programmatically by polygon overlays with varied shapes and intensities.
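The sketch below darkens a trapezoid (a four-sided polygon spanning the image height) by scaling its pixel intensities, using only NumPy. `cast_trapezoid_shadow` and its parameters are hypothetical; a randomized version would sample the `top`/`bottom` extents and `darkness` per image, and the project's `ShadowCastingAugmentor` may use arbitrary polygons instead.

```python
import numpy as np


def cast_trapezoid_shadow(image, top=(0.2, 0.5), bottom=(0.4, 0.8), darkness=0.5):
    """Darken a trapezoid that spans the full image height.

    top / bottom give the (left, right) x-extent of the shadow, as fractions of the
    width, at the first and last row; extents are linearly interpolated in between.
    Only intensities change, so YOLO labels are reused unchanged.
    """
    h, w = image.shape[:2]
    t = np.linspace(0.0, 1.0, h)[:, None]           # 0 at the top row, 1 at the bottom row
    left = (top[0] + (bottom[0] - top[0]) * t) * w  # shape (h, 1)
    right = (top[1] + (bottom[1] - top[1]) * t) * w
    cols = np.arange(w)[None, :]                    # shape (1, w)
    mask = (cols >= left) & (cols < right)          # (h, w) boolean shadow mask
    out = image.astype(np.float32)
    out[mask] *= (1 - darkness)                     # darken every channel inside the mask
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), mask
```

Returning the mask makes it easy to visualize where the shadow fell when inspecting augmented samples.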

In [None]:
shadow_image_folder = '..\\data\\processed\\train-shadow\\images'     # Directory to save shadow images
shadow_label_folder = '..\\data\\processed\\train-shadow\\labels'  # Directory to save shadow labels

shadow_augmentor = ShadowCastingAugmentor(image_folder, label_folder, shadow_image_folder, shadow_label_folder)
shadow_augmentor.process()


In [None]:
show_random_images_grid(shadow_image_folder, shadow_label_folder, class_map, N=6)

## Photometric Augmentations

**Edge Detection:** Highlights object boundaries and textures by converting images into edge maps.

* Edge detection in data augmentation is the process of transforming images by highlighting their boundaries and shape outlines using algorithms like Sobel or Canny filters. 
* This technique enriches the dataset with edge-enhanced images, helping models focus on the contours and structural features of objects rather than textures or colors. 
* Edge-based augmentation encourages the model to learn shape cues, which can make it more robust to color and texture variations and to partial occlusions, and is particularly valuable for tasks that depend on accurate object localization and boundary recognition.
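A dependency-free sketch of Sobel edge extraction is shown below; `sobel_edges` is a hypothetical helper that computes the gradient magnitude of a grayscale image with the two standard 3x3 Sobel kernels. The project's `EdgeDetectAugmentor` may instead rely on OpenCV functions such as `cv2.Sobel` or `cv2.Canny`. Because the transform is applied per pixel without moving anything, the bounding boxes stay valid.

```python
import numpy as np


def sobel_edges(gray):
    """Return the Sobel gradient magnitude of a 2-D grayscale array, scaled to 0-255 uint8."""
    h, w = gray.shape
    g = np.pad(gray.astype(np.float32), 1, mode='edge')  # replicate borders
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)  # horizontal gradient
    ky = kx.T                                                              # vertical gradient
    gx = np.zeros((h, w), np.float32)
    gy = np.zeros((h, w), np.float32)
    # 3x3 correlation written as a weighted sum of shifted slices
    for dy in range(3):
        for dx in range(3):
            patch = g[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * patch
            gy += ky[dy, dx] * patch
    mag = np.hypot(gx, gy)
    if mag.max() > 0:
        mag = mag / mag.max() * 255  # normalize the strongest edge to 255
    return mag.astype(np.uint8)
```

To feed the result to a detector trained on 3-channel input, the edge map can be stacked into three identical channels, e.g. `np.stack([edges] * 3, axis=-1)`.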

In [None]:
edge_image_folder = '..\\data\\processed\\train-edge\\images'     # Directory to save edge-map images
edge_label_folder = '..\\data\\processed\\train-edge\\labels'  # Directory to save edge-map labels

edge_augmentor = EdgeDetectAugmentor(image_folder, label_folder, edge_image_folder, edge_label_folder)
edge_augmentor.process()

In [None]:
show_random_images_grid(edge_image_folder, edge_label_folder, class_map, N=6)

**Grayscale Conversion:** A color transformation that removes color information and keeps only intensity.

* It converts color images to grayscale as part of the training process.
* This technique encourages models to focus on texture, shape, and structural features rather than color information.
* By exposing models to grayscale versions of images, it improves robustness to varying lighting conditions and color distortions, helping the model perform better when color cues are unreliable or missing.

<b>Benefits:</b>
* Simulates Variations in Illumination and Color:
Training on grayscale images forces models to rely more on texture, shape, and edge information, improving generalization when color cues are unavailable or misleading.

* Improves Robustness Across Domains:
Models become less sensitive to color distribution bias, helpful in scenarios where input images vary widely in color characteristics (e.g., night/day or different cameras).

* Simple yet Effective Augmentation:
Grayscale conversion is computationally efficient and easy to apply as a part of broader augmentation pipelines.
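A minimal sketch of grayscale augmentation for a detector that still expects 3-channel input: compute luma with the ITU-R BT.601 weights (the same weights OpenCV documents for its RGB to gray conversion) and replicate it across three channels. `to_grayscale_3ch` is a hypothetical name; if images are loaded with `cv2.imread`, the channel order is BGR and the weights must be reversed.

```python
import numpy as np


def to_grayscale_3ch(rgb):
    """Convert an H x W x 3 RGB uint8 image to luma, replicated to 3 channels.

    Bounding boxes are unaffected because no pixel moves.
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # BT.601 R, G, B weights
    luma = rgb.astype(np.float32) @ weights                       # shape (H, W)
    luma = np.clip(np.rint(luma), 0, 255).astype(np.uint8)
    return np.repeat(luma[..., None], 3, axis=-1)                 # back to (H, W, 3)
```

Keeping three channels means the augmented images can be mixed into the same training set as the color images without changing the model's input layer.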

In [None]:
grayscale_image_folder = '..\\data\\processed\\train-grayscale\\images'     # Directory to save grayscale images
grayscale_label_folder = '..\\data\\processed\\train-grayscale\\labels'  # Directory to save grayscale labels

grayscale_augmentor = GrayscaleAugmentor(image_folder, label_folder, grayscale_image_folder, grayscale_label_folder)
grayscale_augmentor.process()

In [None]:
show_random_images_grid(grayscale_image_folder, grayscale_label_folder, class_map, N=6)

**Noise Augmentation:** Random noise is deliberately added to training data 

* Gaussian noise augmentation introduces random variations in pixel intensity following a normal distribution, which simulates sensor or environmental noise. 
* Salt and pepper noise augmentation randomly sets some pixels to black or white, mimicking impulse noise found in low-quality or corrupted images. 
* Both techniques help models become more robust to noisy, low-quality, or imperfect real-world data by training them on visually challenging samples, making predictions more reliable under non-ideal imaging conditions.

<b>Benefits:</b>
* Simulates Realistic Sensor Noise:
Noise injection mimics common real-world artifacts like sensor noise, low light grain, or transmission errors, making models robust to noisy inputs.

* Improves Model Generalization:
Forces models to learn more discriminative features by reducing reliance on exact pixel patterns corrupted by noise.

* Encourages Robust Feature Extraction:
Models trained on noisy data handle noisy or corrupted images better during inference.

* Simple and Fast to Implement:
Adds minimal computational overhead and can be used as an offline or online augmentation.
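Both noise types can be sketched in a few lines of NumPy. The function names and the `sigma` and `amount` defaults are illustrative assumptions rather than the settings used by the project's `NoiseInjectionAugmentor`; labels are unchanged because noise does not move any pixels.

```python
import numpy as np


def gaussian_noise(image, sigma=15.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma (in 0-255 intensity units)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def salt_pepper_noise(image, amount=0.02, rng=None):
    """Set about `amount` of the pixel locations to black (pepper) or white (salt), half each on average."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    r = rng.random(image.shape[:2])               # one draw per pixel location, shared by all channels
    out[r < amount / 2] = 0                       # pepper
    out[(r >= amount / 2) & (r < amount)] = 255   # salt
    return out
```

Gaussian noise perturbs every pixel slightly, while salt-and-pepper noise corrupts a few pixels completely; using both covers sensor grain as well as transmission dropouts.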

In [None]:
gaussian_image_folder = '..\\data\\processed\\train-gaussian\\images'     # Directory to save Gaussian-noise images
gaussian_label_folder = '..\\data\\processed\\train-gaussian\\labels'  # Directory to save Gaussian-noise labels

gaussian_noise_augmentor = NoiseInjectionAugmentor(image_folder, label_folder, gaussian_image_folder, gaussian_label_folder, noise_type='gaussian')
gaussian_noise_augmentor.process()

salt_pepper_image_folder = '..\\data\\processed\\train-salt_pepper\\images'     # Directory to save salt-and-pepper images
salt_pepper_label_folder = '..\\data\\processed\\train-salt_pepper\\labels'  # Directory to save salt-and-pepper labels

salt_pepper_noise_augmentor = NoiseInjectionAugmentor(image_folder, label_folder, salt_pepper_image_folder, salt_pepper_label_folder, noise_type='salt_pepper')
salt_pepper_noise_augmentor.process()

In [None]:
show_random_images_grid(gaussian_image_folder, gaussian_label_folder, class_map, N=6)

In [None]:
show_random_images_grid(salt_pepper_image_folder, salt_pepper_label_folder, class_map, N=6)

## Dataset Splitting and Copying for Train/Validation Sets
The functions below split image datasets into training and validation sets and organize them into the proper directory structure. They work with datasets whose images have corresponding YOLO-format label files (same file stem, `.txt` extension).

**split_and_copy_dataset**

This function performs a straightforward split of a dataset into training and validation subsets based on a specified ratio, then copies images and corresponding label files to destination folders.

**Workflow**:

1. Deletes any existing data in the destination folders for a clean start.
2. Reads all images in the source folder and shuffles them randomly.
3. Splits the shuffled images into training and validation sets according to `split_ratio`.
4. Copies the images and their associated label `.txt` files to the train/val folders.
5. Prints the number of images copied to each subset.

**split_and_copy_all_processed**

This function extends the above with support for multiple augmentation subfolders inside a processed root directory. It samples a fraction of images from each augmentation folder, then splits and copies them similarly.

**Workflow**:

1. Iterates over the augmentation folders in `processed_root`.
2. Removes the `train-` prefix from each folder name to form the destination folder name.
3. For each augmentation, randomly samples a fraction (`sample_ratio`) of the images.
4. Splits the sampled images into train and val sets according to `split_ratio`.
5. Cleans and recreates the destination directories, then copies the sampled images and labels to `model/train/<aug_type>` and `model/val/<aug_type>`.
6. Prints the processed folder name and the counts per split.
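One caveat with this workflow: `split_and_copy_all_processed` splits each augmentation folder independently of the raw split, so a flipped or zoomed copy of a validation image can land in the training set and make validation scores optimistic. A minimal sketch of a group-aware alternative is below: it hashes each file's *source* stem, so every augmented copy of an image goes to the same split. The suffix list is an assumption based on the `_flip`, `_zoom` and `_rotation` file names shown earlier; `source_stem` and `assign_split` are hypothetical helpers.

```python
import hashlib
import os

# Assumed augmentation suffixes, taken from the file names used in the cells above
AUG_SUFFIXES = ('_flip', '_zoom', '_rotation')


def source_stem(filename):
    """Strip the extension and a known augmentation suffix to recover the original image stem."""
    stem = os.path.splitext(filename)[0]
    for suffix in AUG_SUFFIXES:
        if stem.endswith(suffix):
            return stem[: -len(suffix)]
    return stem


def assign_split(filename, val_ratio=0.1):
    """Deterministically map a file to 'train' or 'val' by hashing its source stem."""
    digest = hashlib.sha256(source_stem(filename).encode('utf-8')).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return 'val' if bucket < val_ratio else 'train'
```

Applying the same `assign_split` to the raw images keeps the raw and augmented splits consistent. Files whose names carry no known suffix (for example mosaic or CutMix outputs that combine several source images) fall back to their own stem and can still leak; keeping such composites in the training split only is one simple option.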

In [None]:
def split_and_copy_dataset(
    src_img_dir,
    src_lbl_dir,
    dst_train_img_dir,
    dst_train_lbl_dir,
    dst_val_img_dir,
    dst_val_lbl_dir,
    split_ratio=0.9
):
    """
    Splits images and labels into train/val sets and copies them to destination folders.
    """
    if os.path.exists(dst_train_img_dir):
        shutil.rmtree(dst_train_img_dir)
    if os.path.exists(dst_train_lbl_dir):
        shutil.rmtree(dst_train_lbl_dir)
    if os.path.exists(dst_val_img_dir):
        shutil.rmtree(dst_val_img_dir)
    if os.path.exists(dst_val_lbl_dir):
        shutil.rmtree(dst_val_lbl_dir)
    os.makedirs(dst_train_img_dir, exist_ok=True)
    os.makedirs(dst_train_lbl_dir, exist_ok=True)
    os.makedirs(dst_val_img_dir, exist_ok=True)
    os.makedirs(dst_val_lbl_dir, exist_ok=True)

    img_files = [f for f in os.listdir(src_img_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
    random.shuffle(img_files)
    split_idx = int(len(img_files) * split_ratio)
    train_files = img_files[:split_idx]
    val_files = img_files[split_idx:]

    def copy_files(file_list, img_dst, lbl_dst):
        for img_file in file_list:
            img_src_path = os.path.join(src_img_dir, img_file)
            lbl_src_path = os.path.join(src_lbl_dir, os.path.splitext(img_file)[0] + '.txt')
            shutil.copy2(img_src_path, os.path.join(img_dst, img_file))
            if os.path.exists(lbl_src_path):
                shutil.copy2(lbl_src_path, os.path.join(lbl_dst, os.path.splitext(img_file)[0] + '.txt'))

    copy_files(train_files, dst_train_img_dir, dst_train_lbl_dir)
    copy_files(val_files, dst_val_img_dir, dst_val_lbl_dir)
    print(f"Copied {len(train_files)} images to train and {len(val_files)} images to val folders.")

def split_and_copy_all_processed(processed_root, dst_model_root, split_ratio=0.9, sample_ratio=0.2):
    """
    Dynamically go through all augmentation folders in processed, randomly select sample_ratio of images,
    then split and copy to model/train and model/val. Removes 'train-' prefix from destination folder names.
    """
    for aug_type in os.listdir(processed_root):
        aug_img_dir = os.path.join(processed_root, aug_type, 'images')
        aug_lbl_dir = os.path.join(processed_root, aug_type, 'labels')
        if not (os.path.isdir(aug_img_dir) and os.path.isdir(aug_lbl_dir)):
            continue
        dst_folder = aug_type.replace('train-', '')
        dst_train_img = os.path.join(dst_model_root, 'train', dst_folder, 'images')
        dst_train_lbl = os.path.join(dst_model_root, 'train', dst_folder, 'labels')
        dst_val_img = os.path.join(dst_model_root, 'val', dst_folder, 'images')
        dst_val_lbl = os.path.join(dst_model_root, 'val', dst_folder, 'labels')

        # Select a random sample of images
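        img_files = [f for f in os.listdir(aug_img_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
        if not img_files:
            continue  # nothing to sample; random.sample would fail on an empty folder
        sample_size = max(1, int(len(img_files) * sample_ratio))
        sampled_files = random.sample(img_files, sample_size)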

        # Split sampled files into train/val
        random.shuffle(sampled_files)
        split_idx = int(len(sampled_files) * split_ratio)
        train_files = sampled_files[:split_idx]
        val_files = sampled_files[split_idx:]

        def copy_files(file_list, img_dst, lbl_dst):
            # Delete folders if they exist, then create again
            if os.path.exists(img_dst):
                shutil.rmtree(img_dst)
            if os.path.exists(lbl_dst):
                shutil.rmtree(lbl_dst)
            os.makedirs(img_dst, exist_ok=True)
            os.makedirs(lbl_dst, exist_ok=True)
            for img_file in file_list:
                img_src_path = os.path.join(aug_img_dir, img_file)
                lbl_src_path = os.path.join(aug_lbl_dir, os.path.splitext(img_file)[0] + '.txt')
                shutil.copy2(img_src_path, os.path.join(img_dst, img_file))
                if os.path.exists(lbl_src_path):
                    shutil.copy2(lbl_src_path, os.path.join(lbl_dst, os.path.splitext(img_file)[0] + '.txt'))

        copy_files(train_files, dst_train_img, dst_train_lbl)
        copy_files(val_files, dst_val_img, dst_val_lbl)
        print(f"Processed augmentation: {dst_folder} | Train: {len(train_files)} | Val: {len(val_files)}")

# Example usage:
split_and_copy_dataset(
    src_img_dir='../data/raw/train/images',
    src_lbl_dir='../data/raw/train/labels',
    dst_train_img_dir='../data/model/train/raw/images',
    dst_train_lbl_dir='../data/model/train/raw/labels',
    dst_val_img_dir='../data/model/val/raw/images',
    dst_val_lbl_dir='../data/model/val/raw/labels',
    split_ratio=0.9)

# Example usage:
split_and_copy_all_processed('../data/processed', '../data/model', split_ratio=0.9, sample_ratio=0.2)


## Visualizing Object Detection Results on Test Images
This code snippet demonstrates how to visualize YOLOv8 model predictions alongside ground truth annotations on a sample of test images.

**Workflow**
1. **Load Pretrained Model**
The YOLOv8 model is loaded from the best checkpoint saved during training (`../runs/train/motorbike_yolov8s/weights/best.pt`).

2. **Prepare Test Data Paths**
Paths to the test images and their corresponding label files (in YOLO format) are specified.

3. **Random Sampling of Test Images**
A configurable number (`num_images`) of test images is randomly selected for display.

4. **Color Coding for Bounding Boxes**
A predefined RGB color is assigned to each object class so that classes are easy to tell apart during visualization.

5. **Side-by-Side Plotting**
For each sampled image, the original image, the ground-truth boxes read from its label file, and the boxes predicted by the model are shown in three columns.

**Benefits**

* Allows side-by-side comparison of model output with ground truth to assess detection quality visually.

* Color-coded bounding boxes make it easy to distinguish between object classes.

* Random sampling avoids hand-picking favorable examples, giving a quick, unbiased (though small-sample) view of model performance on the test set.

**Usage**
Modify num_images to control how many test images are visualized in each run. Ensure test image and label directories are correctly set to the dataset locations.

In [None]:
import os
import random
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
from ultralytics import YOLO

# Load the best checkpoint saved by the training run
model = YOLO('../runs/train/motorbike_yolov8s/weights/best.pt')

# Path to test images and their YOLO-format label files
test_img_dir = '../data/raw/test/images'
test_label_dir = '../data/raw/test/labels'

# Configure number of images to display
num_images = 10

img_files = [f for f in os.listdir(test_img_dir) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
num_images = min(num_images, len(img_files))  # Avoid exceeding available images
selected_files = random.sample(img_files, num_images)

# RGB outline color per class id (see class_map): 0 NumberPlate, 1 Person, 2 Helmet, 3 Head, 4 Motorbike
colors = {0: (255, 0, 0), 1: (0, 255, 0), 2: (0, 255, 255), 3: (255, 165, 0), 4: (0, 0, 255)}

def draw_boxes(image_path, boxes, class_ids=None):
    """Draw pixel [x1, y1, x2, y2] boxes on a freshly loaded copy of the image."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for i, box in enumerate(boxes):
        # Unknown class ids fall back to white
        color = colors.get(int(class_ids[i]), (255, 255, 255)) if class_ids is not None else (255, 0, 0)
        draw.rectangle([float(v) for v in box], outline=color, width=2)
    return img

def read_label_file(label_file, image_path):
    """Convert normalized YOLO labels (class xc yc w h) to pixel [x1, y1, x2, y2] boxes."""
    boxes, class_ids = [], []
    if not os.path.exists(label_file):  # images without objects may have no label file
        return boxes, class_ids
    img_w, img_h = Image.open(image_path).size  # read the image size once, not once per line
    with open(label_file, 'r') as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) < 5:
                continue  # skip blank or malformed lines
            class_id = int(parts[0])
            x_center, y_center, w, h = map(float, parts[1:5])
            boxes.append(((x_center - w / 2) * img_w, (y_center - h / 2) * img_h,
                          (x_center + w / 2) * img_w, (y_center + h / 2) * img_h))
            class_ids.append(class_id)
    return boxes, class_ids

# squeeze=False keeps axes 2-D even when num_images == 1
fig, axes = plt.subplots(num_images, 3, figsize=(15, num_images * 5), squeeze=False)

for i, img_file in enumerate(selected_files):
    img_path = os.path.join(test_img_dir, img_file)
    label_path = os.path.join(test_label_dir, os.path.splitext(img_file)[0] + '.txt')

    # Original image
    orig_img = Image.open(img_path).convert("RGB")
    axes[i, 0].imshow(orig_img)
    axes[i, 0].set_title("Original Image")
    axes[i, 0].axis('off')

    # Ground truth boxes from the YOLO label file
    gt_boxes, gt_classes = read_label_file(label_path, img_path)
    img_gt = draw_boxes(img_path, gt_boxes, class_ids=gt_classes)
    axes[i, 1].imshow(img_gt)
    axes[i, 1].set_title("Ground Truth")
    axes[i, 1].axis('off')

    # Model prediction (boxes in pixel [x1, y1, x2, y2])
    results = model(img_path, verbose=False)
    boxes_pred = results[0].boxes.xyxy.cpu().numpy()
    class_ids_pred = results[0].boxes.cls.cpu().numpy().astype(int)
    img_pred = draw_boxes(img_path, boxes_pred, class_ids=class_ids_pred)
    axes[i, 2].imshow(img_pred)
    axes[i, 2].set_title("Model Prediction")
    axes[i, 2].axis('off')

plt.tight_layout()
plt.show()
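The visual check above can be complemented with a simple numeric one. The sketch below computes pairwise IoU between pixel `[x1, y1, x2, y2]` boxes and greedily counts ground-truth boxes matched by a same-class prediction at IoU ≥ 0.5. `box_iou` and `count_matched` are hypothetical helpers, not an Ultralytics API; for standard metrics such as mAP, Ultralytics' `model.val()` is the usual route.

```python
import numpy as np


def box_iou(boxes_a, boxes_b):
    """Pairwise IoU between (N, 4) and (M, 4) arrays of [x1, y1, x2, y2] boxes."""
    a = np.asarray(boxes_a, dtype=float).reshape(-1, 4)
    b = np.asarray(boxes_b, dtype=float).reshape(-1, 4)
    ix1 = np.maximum(a[:, None, 0], b[None, :, 0])
    iy1 = np.maximum(a[:, None, 1], b[None, :, 1])
    ix2 = np.minimum(a[:, None, 2], b[None, :, 2])
    iy2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def count_matched(gt_boxes, gt_classes, pred_boxes, pred_classes, iou_thr=0.5):
    """Greedily count ground-truth boxes matched by an unused same-class prediction with IoU >= iou_thr."""
    iou = box_iou(gt_boxes, pred_boxes)
    used, matched = set(), 0
    for g in range(iou.shape[0]):
        for p in np.argsort(-iou[g]):  # predictions in order of decreasing overlap
            if iou[g, p] < iou_thr:
                break  # remaining candidates overlap even less
            if int(p) not in used and pred_classes[p] == gt_classes[g]:
                used.add(int(p))
                matched += 1
                break
    return matched
```

Inside the plotting loop, `count_matched(gt_boxes, gt_classes, boxes_pred, class_ids_pred)` would give the number of ground-truth objects recovered in each image.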
