# Project ARI3129 - Object Detection & Localisation using YOLOv5
---

**Name:** Sean David Muscat

**ID No:** 0172004L

---

## Automated Dataset Management with Roboflow and Folder Organization

This script automates dataset setup with Roboflow. It creates the `Versions` directory if needed, checks whether the `roboflow` package is installed and installs it via `pip` if not, then downloads the dataset in YOLOv5 format. Finally, it renames the downloaded folder and moves it to `../Versions/MDD-SDM-yolov5`. If that folder already exists, the download is skipped.

In [1]:
import os
import subprocess
import shutil
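import importlib.util  # find_spec lives in the importlib.util submodule, which a bare "import importlib" does not reliably load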


# Constants for colored output
COLORS = {
    "green": "\033[92m",  # Green text
    "red": "\033[91m",    # Red text
    "reset": "\033[0m"    # Reset to default color
}

# Define the path to the Versions folder and the target subfolder
versions_path = os.path.abspath(os.path.join("..", "Versions"))
target_subfolder = os.path.join(versions_path, "MDD-SDM-yolov5")

# Check if the Versions folder exists, if not, create it
if not os.path.exists(versions_path):
    os.makedirs(versions_path)
    print(f"[{COLORS['green']}✔{COLORS['reset']}] Folder created at: {versions_path}")

# Check if the MDD-SDM-yolov5 subfolder exists
if os.path.exists(target_subfolder):
    print(f"[{COLORS['green']}✔{COLORS['reset']}] The subfolder '{target_subfolder}' already exists. Skipping download!")
else:
    # Check if roboflow is installed
    if importlib.util.find_spec("roboflow") is not None:  # type: ignore
        print(f"[{COLORS['green']}✔{COLORS['reset']}] Roboflow is already installed!")
    else:
        # Install roboflow using pip
        try:
            subprocess.check_call(["pip", "install", "roboflow"])
            print(f"[{COLORS['green']}✔{COLORS['reset']}] Roboflow successfully installed!")
        except subprocess.CalledProcessError:
            print(f"[{COLORS['red']}✖{COLORS['reset']}] Failed to install Roboflow. Please check your setup.")
            raise  # Re-raise with the original traceback

    # Import and use Roboflow
    from roboflow import Roboflow  # type: ignore

    # Prompt the user once for their Roboflow API key
    api_key = input("Please enter your Roboflow API key to download the dataset: ")

    # Initialize Roboflow with the provided API key
    rf = Roboflow(api_key=api_key)

    # Retrieve project and version
    project = rf.workspace("advanced-cv").project("maltese-domestic-dataset")
    version = project.version(1)

    # Download the dataset
    dataset = version.download("yolov5")

    current_folder = os.getcwd()  # Get the current working directory
    original_folder = os.path.join(current_folder, "Maltese-Domestic-Dataset--1")
    renamed_folder = os.path.join(current_folder, "MDD-SDM-yolov5")
    target_folder = os.path.join(versions_path, "MDD-SDM-yolov5")

    # Check if the original folder exists
    if os.path.exists(original_folder):
        # Rename the folder
        os.rename(original_folder, renamed_folder)

        # Move the renamed folder to ../Versions/
        shutil.move(renamed_folder, target_folder)
        print(f"[{COLORS['green']}✔{COLORS['reset']}] Folder downloaded to: {target_folder}")
    else:
        print(f"[{COLORS['red']}✖{COLORS['reset']}] Folder '{original_folder}' does not exist. No action taken.")


[[92m✔[0m] The subfolder 'e:\SEAN\Adv_CV\Versions\MDD-SDM-yolov5' already exists. Skipping download!


## Automated Library Installer in Python

This script checks the libraries listed in a JSON file that maps each `pip` package name to its import name. Libraries that cannot be found are installed via `pip`, with colored status messages for success and failure, and a missing or malformed JSON file is reported as an error. The cell then imports the libraries used in the rest of the notebook.

In [2]:
import json
import importlib.util

# Path to the JSON file
lib_file_path = os.path.join("..", "Libraries", "Task2_SDM_Lib.json")

# Read the libraries from the JSON file
try:
    with open(lib_file_path, 'r') as file:
        libraries = json.load(file)
except FileNotFoundError:
    print(f"{COLORS['red']}Error: Library file not found at {lib_file_path}{COLORS['reset']}")
    raise SystemExit(1)  # Stop the cell; the exit() helper is not guaranteed outside interactive sessions
except json.JSONDecodeError:
    print(f"{COLORS['red']}Error: Failed to decode JSON from the library file.{COLORS['reset']}")
    raise SystemExit(1)

# Function to check and install libraries
def check_and_install_libraries(libraries):
    for lib, import_name in libraries.items():
        # Check if the library is installed by checking its module spec
        if importlib.util.find_spec(import_name) is not None:
            print(f"[{COLORS['green']}✔{COLORS['reset']}] Library '{lib}' is already installed.")
        else:
            # If the library is not found, try to install it
            print(f"[{COLORS['red']}✖{COLORS['reset']}] Library '{lib}' is not installed. Installing...")
            try:
                subprocess.check_call(["pip", "install", lib])
                print(f"[{COLORS['green']}✔{COLORS['reset']}] Successfully installed '{lib}'.")
            except subprocess.CalledProcessError:
                print(f"[{COLORS['red']}✖{COLORS['reset']}] Failed to install '{lib}'. Please install it manually.")


# Execute the function to check and install libraries
check_and_install_libraries(libraries)

# Import necessary libraries 
import cv2  # OpenCV for image processing
import time
import random
import matplotlib.pyplot as plt  # For plotting and visualisation
import seaborn as sns  # For advanced visualisation
import numpy as np  # For numerical operations
import matplotlib.patches as patches  # For drawing patches on plots
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm  # For progress bars
import torch  # PyTorch for Yolov5
import pandas as pd  # For handling tabular data
import yaml  # For working with data.yaml files
import scipy  # For advanced scientific calculations
from pathlib import Path  # For handling file paths
from ultralytics import YOLO


[[92m✔[0m] Library 'opencv-python' is already installed.
[[92m✔[0m] Library 'matplotlib' is already installed.
[[92m✔[0m] Library 'tqdm' is already installed.
[[92m✔[0m] Library 'ultralytics' is already installed.
[[92m✔[0m] Library 'torch' is already installed.
[[92m✔[0m] Library 'pandas' is already installed.
[[92m✔[0m] Library 'seaborn' is already installed.
[[92m✔[0m] Library 'pyyaml' is already installed.
[[92m✔[0m] Library 'scipy' is already installed.
[[92m✔[0m] Library 'tensorboard' is already installed.


In [12]:
# Absolute paths
data_yaml_path = os.path.join(os.path.abspath(os.path.join(os.getcwd(), os.pardir)), "Versions", "MDD-SDM-yolov5", "data.yaml")



# Check if data.yaml exists
if os.path.exists(data_yaml_path):
    print(f"[✔] data.yaml file exists at: {data_yaml_path}")
else:
    print(f"[✖] data.yaml file not found at: {data_yaml_path}")


[✔] data.yaml file exists at: e:\SEAN\Adv_CV\Versions\MDD-SDM-yolov5\data.yaml
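
Checking that `data.yaml` exists does not confirm that its contents are usable. As a rough sketch, the file could also be parsed and sanity-checked before training. The key names below (`train`, `val`, `nc`, `names`) are assumed to be those typically found in a YOLOv5-format export, and both helper functions are hypothetical.

```python
REQUIRED_KEYS = ("train", "val", "nc", "names")  # assumed keys, see note above


def check_dataset_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in a parsed data.yaml."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if "nc" in cfg and "names" in cfg and cfg["nc"] != len(cfg["names"]):
        problems.append(f"nc={cfg['nc']} but {len(cfg['names'])} class names listed")
    return problems


def load_dataset_config(path) -> dict:
    """Parse data.yaml with PyYAML (imported lazily so the check above has no dependencies)."""
    import yaml

    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)
```

A possible use is `problems = check_dataset_config(load_dataset_config(data_yaml_path))`, printing each entry of `problems` before starting a long training run.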


In [None]:
# Path to the yolov5n.yaml file
model_config_path = os.path.join(os.path.abspath(os.path.join(os.getcwd(), os.pardir)), "Versions", "yolov5", "models", "yolov5n.yaml")

# Load the YOLO model
model = YOLO(model_config_path)

print(f"[✔] YOLOv5 model initialized with configuration: {model_config_path}")

FileNotFoundError: 'e:\SEAN\Adv_CV\Versions\yolov5\yolov5n.yaml' does not exist
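
The `FileNotFoundError` above only reports the path that was tried. A small guard run before `YOLO(...)` can make this kind of failure easier to diagnose by also listing what the parent directory contains. This is a sketch using only the standard library; `require_file` is a hypothetical helper, not part of Ultralytics.

```python
from pathlib import Path


def require_file(path) -> Path:
    """Return the resolved path if it is a file, otherwise raise with a diagnostic hint."""
    path = Path(path).resolve()
    if path.is_file():
        return path
    parent = path.parent
    if parent.is_dir():
        siblings = sorted(p.name for p in parent.iterdir())
        hint = f"Directory exists and contains: {siblings}"
    else:
        hint = f"Directory does not exist: {parent}"
    raise FileNotFoundError(f"{path} not found. {hint}")
```

With this helper the model could be created as `YOLO(str(require_file(model_config_path)))`, so a wrong folder name is reported together with the files that are actually present.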

In [None]:
# Path to the data.yaml file
data_yaml_path = os.path.join(os.path.abspath(os.path.join(os.getcwd(), os.pardir)), "Versions", "MDD-SDM-yolov5", "data.yaml")

# Train the YOLO model
results = model.train(
    data=data_yaml_path,  # Path to the dataset configuration
    epochs=50,            # Number of epochs (adjust as needed)
    imgsz=640,            # Image size
    batch=16,             # Batch size (adjust based on your GPU memory)
    workers=4,            # Number of data loader workers
    device='cpu'          # Train on the CPU (set to 0 to use the first CUDA GPU, if one is available)
)

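# train() returns a metrics object, so report the run directory from the trainer instead
print(f"[✔] Training completed! Results saved in: {model.trainer.save_dir}")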


Ultralytics 8.3.59  Python-3.11.9 torch-2.5.1+cpu CPU (Intel Core(TM) i5-8400 2.80GHz)
[34m[1mengine\trainer: [0mtask=detect, mode=train, model=E:\SEAN\Adv_CV\Versions\yolov5\models\yolov5n.yaml, data=E:\SEAN\Adv_CV\Versions\MDD-SDM-yolov5\data.yaml, epochs=50, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=cpu, workers=4, project=None, name=train2, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False

100%|██████████| 755k/755k [00:00<00:00, 5.70MB/s]


Overriding model.yaml nc=80 with nc=4

                   from  n    params  module                                       arguments                     
  0                  -1  1      1760  ultralytics.nn.modules.conv.Conv             [3, 16, 6, 2, 2]              
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      4800  ultralytics.nn.modules.block.C3              [32, 32, 1]                   
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  4                  -1  2     29184  ultralytics.nn.modules.block.C3              [64, 64, 2]                   
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  6                  -1  3    156928  ultralytics.nn.modules.block.C3              [128, 128, 3]                 
  7                  -1  1    295424  ultralytics

[34m[1mtrain: [0mScanning E:\SEAN\Adv_CV\Versions\MDD-SDM-yolov5\Maltese-Domestic-Dataset--1\train\labels... 748 images, 0 backgrounds, 0 corrupt: 100%|██████████| 748/748 [00:04<00:00, 186.37it/s]


[34m[1mtrain: [0mNew cache created: E:\SEAN\Adv_CV\Versions\MDD-SDM-yolov5\Maltese-Domestic-Dataset--1\train\labels.cache


[34m[1mval: [0mScanning E:\SEAN\Adv_CV\Versions\MDD-SDM-yolov5\Maltese-Domestic-Dataset--1\valid\labels... 70 images, 0 backgrounds, 0 corrupt: 100%|██████████| 70/70 [00:00<00:00, 164.20it/s]


[34m[1mval: [0mNew cache created: E:\SEAN\Adv_CV\Versions\MDD-SDM-yolov5\Maltese-Domestic-Dataset--1\valid\labels.cache
Plotting labels to C:\Users\Sean Muscat\runs\detect\train2\labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.00125, momentum=0.9) with parameter groups 69 weight(decay=0.0), 76 weight(decay=0.0005), 75 bias(decay=0.0)
[34m[1mTensorBoard: [0mmodel graph visualization added 
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to [1mC:\Users\Sean Muscat\runs\detect\train2[0m
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       1/50         0G      3.413      4.582      4.239         66        640:  51%|█████     | 24/47 [02:25<02:19,  6.06s/it]


KeyboardInterrupt: 
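
The run was interrupted partway through the first epoch. The figures in the log above allow a rough estimate of how long the full run would take on this CPU. The calculation below uses only numbers from the log and the `model.train` arguments; it ignores validation time and assumes the per-iteration rate stays constant, so it is an approximation rather than a measurement.

```python
import math

train_images = 748        # from the "train: Scanning" line in the log
batch_size = 16           # batch argument passed to model.train
epochs = 50               # epochs argument passed to model.train
seconds_per_iter = 6.06   # rate shown in the progress bar when the run was interrupted

# 748 / 16 rounds up to 47 batches, matching the "24/47" progress counter in the log
iters_per_epoch = math.ceil(train_images / batch_size)
seconds_per_epoch = iters_per_epoch * seconds_per_iter
total_hours = seconds_per_epoch * epochs / 3600

print(f"{iters_per_epoch} iterations per epoch")
print(f"about {seconds_per_epoch / 60:.1f} minutes per epoch")
print(f"about {total_hours:.1f} hours for {epochs} epochs (training batches only)")
```

At this rate the 50-epoch run would need roughly four hours of training time alone, which suggests either reducing `epochs` for a first trial or training on a GPU if one is available.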