# UN number detection


## Business Understanding
For this project, we have been tasked with developing a machine learning model capable of recognizing UN number hazard plates. These plates, commonly displayed on freight train wagons, indicate the types of hazardous materials being transported. The successful implementation of this model will contribute to a more efficient and secure railway system across the EU.

<img src="images/hazard_plate.jpg" alt="Hazard Plate" width="500"/>

The hazard plates play a crucial role in ensuring the safety of transportation by providing essential information about the nature of the substances on board, such as flammability, toxicity, or corrosiveness. By automating the recognition process with machine learning, the handling and tracking of these hazardous materials can be streamlined, reducing manual labor and minimizing potential human errors.

### Determine business objectives
#### Background
The specific expectations and objectives of the EU for this project are not yet fully defined, but the initiative's roots are clear. This project is spearheaded by the University of Twente, with researcher Melissa Tijink serving as our supervisor. Our team, composed of pre-master's Computer Science students, has been tasked with developing the machine learning model. Melissa Tijink plays a pivotal role as the intermediary between our team and major stakeholders, including ProRail, the EU, and other experts in the field.

This project is part of a broader initiative aimed at enhancing rail freight operations within Europe, aligning with the EU’s goals for improved efficiency and safety. More information on the initiative can be found on the official project site: [EU Rail FP5](https://projects.rail-research.europa.eu/eurail-fp5/).

Flagship Project 5: *TRANS4M-R aims to establish rail freight as the backbone of a low-emission, resilient European logistics chain that meets end-user needs. It focuses on two main technological clusters: 'Full Digital Freight Train Operation' and 'Seamless Freight Operation', which will develop and demonstrate solutions to increase rail capacity, efficiency, and cross-border coordination. By integrating Digital Automatic Coupler (DAC) solutions with software-defined systems, the project seeks to optimize network management and enhance cooperation among infrastructure managers. The ultimate goal is to create an EU-wide, interoperable rail freight framework with unified technologies and seamless operations across borders and various stakeholders, boosting the EU transport and logistics sector.*

#### Business objectives

**Primary Objective:** Develop an object detection model for UN number hazard plates on freight wagons.

**Sub-objectives:**
1. Detect and identify UN number hazard plates: Ensure the model can accurately locate hazard plates on freight wagons. 
2. Read and interpret the UN numbers: Implement recognition capabilities to accurately read the numbers on the detected plates.
3. Ensure model robustness and accuracy: Train the model to achieve high accuracy and reliability under various conditions (e.g., different lighting, weather).
4. Optimize model for speed: Make sure the model runs efficiently and in real-time to function on moving trains.
5. Adapt the model for moving environments: Design and test the model to handle the unique challenges of detecting and reading plates on trains in motion. 

### Assess Situation

#### Inventory of resources

**Business Experts:** Our team currently lacks extensive expertise in this area. We can consult Melissa for some questions, and we have an upcoming interview with a Swedish expert in the field of UN number hazard plates.

**Data Mining Team:** 
- Melissa Tijink (Researcher in Data Management & Biometrics/Electrical Engineering, Mathematics, and Computer Science)
- Ewaldo Nieuwenhuis (Pre-master student in Computer Science)
- Stanislav Levendeev (Pre-master student in Computer Science)

**Data:**
1. **Video Data of Freight Trains:** This consists of video footage of moving freight trains, where the freight wagons should display the UN numbers.
2. **Line Scan Camera Pictures:** These are high-resolution images of the train, but they are very spread out. It is still uncertain if these will be useful.
3. **Photos of ADR Warning Signs:** These are images of ADR signs on freight trains. However, this is not exactly what we need since our objective is to build a model that recognizes UN numbers.

**Computing Resources:** We have access to a cluster from the University of Twente, which we can use to train or fine-tune our model.

**Software:** We will use Python, Jupyter Notebook, Keras, PyTorch, and TensorFlow for analyzing, cleaning, preparing the data, and modeling. For data labeling, we will use [CVAT](https://www.cvat.ai/).

### Requirements, assumptions, and constraints

##### Requirements
- Object detection capability for UN number hazard plates.
- Text recognition to read and extract UN numbers.
- High accuracy and precision in detection and recognition.
- Robust performance under varying conditions (weather, lighting, speed).
- Speed optimization for fast processing with minimal lag.
- Real-time processing for operation on moving trains.

##### Assumptions
- Consistent access to a high-performance computational cluster for model training and testing.
- The high-performance cluster is necessary due to the heavy processing demands of deep learning models.
- Local machines are not sufficient for the required high computational tasks.
- Project-specific data, including images and videos of freight trains with hazard plates, will be provided as planned.
- Data will include varied conditions (different lighting and weather) to ensure robustness.
- Access to diverse data is essential for creating a model that generalizes well to real-world scenarios.
- If the planned data is unavailable, additional time will be needed to source and prepare alternative public datasets.
- Sourcing alternative datasets may affect the project timeline and the quality of the final outcomes.
- The stakeholders will provide timely feedback to guide any changes or adaptations needed in the project.

##### Constraints
- The team has restricted experience with advanced object detection methods, which may impact the initial development and refinement of the model.
- Most of the available data is not labeled, presenting a challenge for training supervised machine learning models. Some labeled data exists but belongs to another researcher, and access to it is uncertain.
- The project must be completed within a short, 9-week period, which constrains the depth and breadth of potential research and model development.
- The dataset may be skewed with an overrepresentation of specific UN numbers from certain wagons, which could limit the model's ability to generalize across different scenarios.
- The size of the dataset makes it difficult to filter out specific wagons or relevant segments efficiently, posing a challenge for data processing and targeted training.

#### Risks and Contingencies

**1. Lack of Data Access:**  
*Risk:* Currently, we do not have access to the necessary video or linescan data, and there is a risk that we may never obtain it.  
*Contingency Action:* Search for publicly available open-source datasets containing UN codes to proceed with model training and development.

**2. Loss of Access to the Computational Cluster:**  
*Risk:* While we currently have access to a high-performance cluster for training, loading, and fine-tuning models, there is a chance of losing this access due to technical failures or maintenance issues.  
*Contingency Action:* Prepare to train, load, and fine-tune a smaller version of the model locally on personal computers.

**3. Unavailability of Labeling Software:**  
*Risk:* We plan to label the data with the help of our supervisor, Melissa, which is essential for fine-tuning and evaluating the model. If this step is delayed or cannot occur, it will impede progress.  
*Contingency Action:* Learn how to use CVAT (Computer Vision Annotation Tool) and set it up on personal laptops to carry out data labeling independently.

**4. Inaccessibility of Personal Laptops:**  
*Risk:* Access to our laptops is crucial for development, data handling, and connecting to the cluster. If our laptops become unusable due to malfunction, our work will be disrupted.  
*Contingency Action:* Use backup laptops that are ready for project work to ensure continuity.

#### Terminology

**Business Terminology:**
- **UN Number Hazard Plates**: Identification plates with UN numbers that indicate the nature of hazardous materials, improving safety during transport.
- **Freight Trains**: Trains used for transporting goods, especially relevant when carrying hazardous materials.
- **Flagship Project 5**: A project within the European "Europe’s Rail" initiative, focused on applying technologies to enhance rail transport safety.
- **ADR (European Agreement concerning the International Carriage of Dangerous Goods by Road)**: International regulations governing the transport of hazardous goods.
- **ProRail**: The Dutch railway network manager responsible for maintaining the railways.
- **Line-Scan Camera**: A camera that captures images one line at a time for capturing objects like fast-moving trains.

**Data Mining Terminology:**
- **CRISP-DM (Cross-Industry Standard Process for Data Mining)**: A widely used methodology for managing data mining projects, consisting of six phases:
  - **Business Understanding**: Defining objectives from a business perspective.
  - **Data Understanding**: Collecting and analyzing data to gain insights.
  - **Data Preparation**: Preparing data, such as annotating and normalizing, for model training.
  - **Modeling**: Selecting and training models for the desired task.
  - **Evaluation**: Assessing model performance using specific metrics.
  - **Deployment**: Implementing the model in real-world applications.

- **Object Detection**: Identifying specific objects (e.g., hazard plates) within images or videos.
- **Optical Character Recognition (OCR)**: Extracting text from images, used here to read numbers on hazard plates.
- **YOLO (You Only Look Once)**: A fast object detection model ideal for real-time applications.
- **Faster R-CNN**: A more accurate but slightly slower object detection model, suitable for complex environments.
- **Annotation**: Marking data (e.g., video frames) with labels like bounding boxes to create ground truth for model training.
- **Bounding Boxes**: Rectangular boxes used in image processing to define regions of interest around an object.
- **Normalization**: Adjusting data to a standard scale to ensure consistency in model input.
- **Augmentation**: Enhancing training data through techniques like contrast adjustment to improve model robustness.
- **Average Precision (AP)**: A metric for evaluating the accuracy of object detection models.
- **Tesseract**: A commonly used OCR tool for extracting alphanumeric text from images.
- **HOG (Histogram of Oriented Gradients)**: A feature descriptor used in object detection, especially for detecting shapes or text.
- **Saliency Detection**: An algorithmic technique to identify key areas within images for focused analysis.
- **Support Vector Regression (SVR)**: A machine learning algorithm for regression tasks, sometimes used to create likelihood maps for image processing.


### Determine Data Mining Goals

#### Data Mining Goals
**Primary Data Mining Goal:** Create and train an object detection model capable of identifying and interpreting UN number hazard plates on freight wagons in real-time.

**Specific Data Mining Goals:**
1. **Object Detection and Localization**: Develop a model that achieves a high AP score for accurately detecting and localizing hazard plates on freight wagons within each video frame.

2. **OCR for UN Number Extraction:** Use Tesseract to apply Optical Character Recognition (OCR) for accurately reading UN numbers on detected plates, aiming to optimize precision and minimize errors in text recognition.

3. **Robustness Across Variable Conditions**: Enhance the model’s robustness by training it on datasets representing diverse lighting and weather conditions, with a goal to maintain high AP scores across these environments.

4. **Optimization for Real-Time Processing**: Implement real-time object detection and OCR capabilities to ensure the model operates at a frame rate suitable for analyzing images from moving trains.
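As a rough illustration of goal 2, the raw OCR text still has to be reduced to a single UN number. The sketch below assumes that the OCR engine returns a free-form string and that a UN number is always a standalone group of four digits. The function name `extract_un_number` and the character substitution table are illustrative assumptions, not part of the project code.

```python
import re

# Common OCR confusions between letters and digits (illustrative assumption).
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

def extract_un_number(ocr_text):
    """Return the first standalone four-digit group in the OCR output, or None."""
    cleaned = ocr_text.translate(OCR_DIGIT_FIXES)
    # Lookarounds reject four digits that are part of a longer digit run.
    match = re.search(r"(?<!\d)\d{4}(?!\d)", cleaned)
    return match.group(0) if match else None

# Upper line of a plate: hazard identification number; lower line: UN number.
print(extract_un_number("33\n1203"))
print(extract_un_number("no digits"))
```

Restricting the match to exactly four digits keeps the shorter hazard identification number on the upper half of the plate from being mistaken for the UN number.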

#### Data Mining Success Criteria

- **Object Detection AP**: Achieve a mean Average Precision (mAP) of at least 0.70 for detecting and localizing hazard plates across varied conditions.
- **OCR Precision for UN Numbers**: Ensure the Tesseract OCR module achieves high accuracy in reading UN numbers, even under challenging conditions, with a target precision score above 0.95.
- **Processing Speed**: Ensure the model achieves a processing time per frame under 100 milliseconds to maintain real-time functionality.
- **Environmental Robustness**: Maintain consistent mAP scores across different lighting and weather conditions.
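The three numeric criteria above can be written down as explicit checks, which makes it easy to test any later measurement against them. This is a minimal sketch: the dictionary keys and the `check_criteria` helper are hypothetical names, and the example measurements are made up for illustration. The robustness criterion is left out because it has no numeric threshold.

```python
import operator

# Thresholds copied from the success criteria; key names are hypothetical.
CRITERIA = {
    "map": (operator.ge, 0.70),            # at least 0.70
    "ocr_precision": (operator.gt, 0.95),  # above 0.95
    "ms_per_frame": (operator.lt, 100.0),  # under 100 ms per frame
}

def check_criteria(measured):
    """Map each criterion name to True/False for the measured values."""
    return {
        name: compare(measured[name], threshold)
        for name, (compare, threshold) in CRITERIA.items()
    }

# Hypothetical measurements, for illustration only.
example = {"map": 0.72, "ocr_precision": 0.95, "ms_per_frame": 40.0}
print(check_criteria(example))
```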


# Imports

In [None]:
# Imports
import os
import cv2
import pandas as pd
import matplotlib.pyplot as plt
import hashlib
from ultralytics import YOLO
from IPython.display import HTML
import kagglehub
import torch
import pytesseract
import regex as re
import math as Math
import easyocr
import numpy as np
from transformers import Idefics2Processor, Idefics2ForConditionalGeneration, BitsAndBytesConfig
from PIL import Image
import time
from torch.cuda.amp import autocast, GradScaler
from torchvision import models, transforms
from torch.utils.data import DataLoader, Dataset, Subset
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import torchvision.transforms as T
import torchvision.transforms.functional as F
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.ops import MultiScaleRoIAlign
import json
import contextlib
import io
import random
from datetime import datetime
from tqdm import tqdm
import albumentations as A

# Data Understanding

## Collect Initial Data

In [None]:
# Directory containing video files
video_directory = os.environ["PATH_TO_DATA"]
print (video_directory)

# Specify output directory for detected frames; fall back to a local default
# when OUTPUT_PATH is unset or does not point to an existing directory
output_path = os.environ.get("OUTPUT_PATH")
if output_path and os.path.exists(output_path):
    output_directory = output_path
else:
    output_directory = './data/output_frames'
    os.makedirs(output_directory, exist_ok=True)
print(output_directory)


In [None]:
# Get all filenames in the directory
video_files = [f for f in os.listdir(video_directory) if f.endswith(('.mp4'))]
video_files[0]
df = pd.read_csv('data/data_understanding_2024-11-28.csv')
df.head()

## Describe Data

**Column Description:**

- **Unnamed: 0**: The ID of the video.
- **filename**: The name of the file, including the `.mp4` extension.
- **fps**: Frames per second.
- **frame_count**: The total number of frames in the video.
- **width**: The width of the video in pixels.
- **height**: The height of the video in pixels.
- **resolution**: The video resolution, expressed as `width x height`.
- **duration_seconds**: The video's duration in seconds.
- **hash**: A hash of the video file, used to check whether the file is unique.
- **file_size_mb**: The file size in megabytes.
- **train_detected**: Indicates whether a train was detected using a YOLO model. If the model's confidence score exceeded 10%, a train is considered detected, though this may not always be accurate.
- **confidence**: The confidence score indicating how likely it is that the video contains a train.

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.isnull().sum()

## Explore Data

In [None]:
no_trains = sum(df['train_detected'] == 0)
print(f"No trains were detected in {no_trains} of {df.shape[0]} videos.")


In [None]:
no_trains_percentage = no_trains / df.shape[0] * 100
print(f"This is {no_trains_percentage:.1f}% of all videos.")

In [None]:
df['hash'].value_counts(ascending=False)

All hashes appear to be unique, so we probably cannot identify duplicates by hash alone.
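A file hash only matches when two files are byte-for-byte identical, which is why near-identical recordings are not caught this way. The sketch below illustrates this with two small stand-in files. SHA-256 and the `file_sha256` helper are assumptions made for the example; the algorithm used to fill the `hash` column is not stated here.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Two byte streams that differ only in a short tail, standing in for two
# recordings of the same scene where one runs slightly longer.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.bin")
    second = os.path.join(tmp, "b.bin")
    with open(first, "wb") as handle:
        handle.write(b"frame" * 1000)
    with open(second, "wb") as handle:
        handle.write(b"frame" * 1000 + b"extra")
    hashes_differ = file_sha256(first) != file_sha256(second)

print(hashes_differ)  # True: any byte-level difference changes the digest
```

Finding near-duplicates would need a content-based comparison, for example comparing sampled frames, rather than a file hash.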

In [None]:
# Load the MP4 video
video = "1690279852.mp4"
video2 = "1690281303.mp4"
video_path = video_directory+'/'+video
video2_path = video_directory+'/'+video2
# Embed video in the notebook

def get_video_html(video_path):
    return HTML(f"""
      <h1>Video {video_path}</h1>
      <video width="480" height="320" controls>
        <source src="{video_path}" type="video/mp4">
        Your browser does not support the video tag.
      </video>
    """)


In [None]:
get_video_html(video_path)

In [None]:
get_video_html(video2_path)

**These are the same video; the second is only one second longer than the first. The dataset therefore contains duplicate videos.**

In [None]:
df['train_detected'].value_counts()

In [None]:
df['confidence'].dropna().describe()

In [None]:
HTML(f"<b>Minimal confidence: {df['confidence'].min():.2f}, maximal confidence: {df['confidence'].max():.2f}, average confidence: {df['confidence'].mean():.2f}</b>")    

In [None]:
video_max_confidence = df[df['confidence'] == df['confidence'].max()].iloc[0]['filename']
video_min_confidence = df[df['confidence'] == df['confidence'].min()].iloc[0]['filename']

print(f"Video with maximal confidence: {video_max_confidence} ({df['confidence'].max():.2f}%)")
get_video_html(video_directory+"/"+video_max_confidence)

In [None]:
print(f"Video with minimal confidence: {video_min_confidence} ({df['confidence'].min():.2f}%)")
get_video_html(video_directory+"/"+video_min_confidence)

**In the second video, trains are detected, but there are no actual trains present. This indicates that the 'train_detected' column lacks certainty. It may be more effective for a human to review and filter videos to identify those with trains and those without. Alternatively, training a model specifically to recognize freight trains could be considered, although this falls outside the scope of this project.**

The frames per second (FPS) in your video data can influence the performance of training your model. FPS affects the temporal resolution and the amount of data fed into the model. When training on video data, the FPS determines how many frames are available to capture motion or other temporal patterns, which can influence model performance.

For example, if you use a higher FPS, your model will have more frames to analyze within a given time frame, potentially improving its ability to capture finer details in motion (e.g., in object detection or action recognition tasks). However, processing more frames per second can also lead to higher computational costs and may require more memory and processing power, which might reduce training efficiency unless properly optimized.
[source 1](https://library.fiveable.me/key-terms/deep-learning-systems/frames-per-second-fps)


Conversely, lower FPS can reduce computational demands but may also decrease the temporal resolution of your data, making it harder for your model to accurately capture fast movements or dynamic changes. Depending on your specific use case, you'll need to balance FPS with your model's ability to process the data effectively while managing computational resources.
[source 2](https://paulbridger.com/posts/video-analytics-pipeline-tuning/)

It's also important to consider other factors like video resolution and preprocessing techniques, which could further affect how FPS influences your model's performance.
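One simple way to trade temporal resolution for compute is to keep only every n-th frame. The sketch below shows the arithmetic; the helper name `sampled_frame_indices` and the example values are illustrative assumptions, not settings used in this project.

```python
def sampled_frame_indices(frame_count, source_fps, target_fps):
    """Indices of frames to keep when reducing source_fps to roughly target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    # Never use a step below 1, so a target above the source keeps every frame.
    step = max(1, round(source_fps / target_fps))
    return list(range(0, frame_count, step))

# A 10-second clip at 30 FPS reduced to about 5 FPS.
indices = sampled_frame_indices(frame_count=300, source_fps=30, target_fps=5)
print(len(indices), indices[:4])  # 50 [0, 6, 12, 18]
```

A lower target rate shrinks the number of frames to label and process, at the cost of fewer views of each plate as the wagon passes the camera.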

In [None]:
def plot_distribution(df, column, xlabel, ylabel="Frequency", bins=10):
    plt.figure(figsize=(8, 5))
    plt.hist(df[column], bins=bins, color='skyblue', edgecolor='black', alpha=0.7)
    plt.title(f"Distribution of {xlabel}", fontsize=14)
    plt.xlabel(xlabel, fontsize=12)
    plt.ylabel(ylabel, fontsize=12)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

In [None]:

plot_distribution(df, 'fps', xlabel='Frames Per Second (FPS)')
plot_distribution(df, 'frame_count', xlabel='Total Frames', bins=20)
plot_distribution(df, 'duration_seconds', xlabel='Duration (seconds)', bins=20)
plot_distribution(df, 'file_size_mb', xlabel='File Size (MB)', bins=15)
plot_distribution(df, 'resolution', xlabel='Resolution', bins=15)


In [None]:

def plot_resolutions(df):
    plt.figure(figsize=(8, 6))
    plt.scatter(df['width'], df['height'], c='orange', alpha=0.7, edgecolors='black')
    plt.title("Resolution Scatter Plot (Width vs Height)", fontsize=14)
    plt.xlabel("Width (pixels)", fontsize=12)
    plt.ylabel("Height (pixels)", fontsize=12)
    plt.grid(linestyle='--', alpha=0.7)
    plt.show()

plot_resolutions(df)


UN number codes with labels

In [None]:
df_un = pd.read_csv("./data/un-number-labels.csv")
def get_hin_description(hin):
    if not isinstance(hin, int):
        hin = int(hin)
    hin_row = df_un[df_un['number'] == hin]
    return hin_row['description'].values[0] if hin_row.shape[0] > 0 else None
df_un.head()

## Data Preparation

See `data-prep-coco.ipynb` and `data-preperation.ipynb`. 
See `data_augmentation_faster_rcnn.ipynb` for an attempt at data augmentation.

# Modeling YOLOV11

YOLO (You Only Look Once) is a popular family of object detection models known for its speed and accuracy. YOLO models detect objects in images or video frames by dividing the image into a grid and predicting bounding boxes and class probabilities for each grid cell. This project uses YOLO11 through the Ultralytics package.

In [None]:
path = kagglehub.dataset_download("stanislavlevendeev/hazmat-detection")
model = YOLO("yolo11n.pt")
print(torch.cuda.is_available())
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
print(path)
print(device)
# Sanity check: allocate a tensor on the selected device (works with or without CUDA)
torch.zeros(1).to(device)

### Training Yolo

In [None]:
results = model.train(
    data=path+'\\yolo\\dataset.yaml', 
    epochs=10,
    scale=0.5,
    shear=1.1,
    device=device,
    degrees=10.5,
    perspective=0.5,
    mosaic=0.5,
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    multi_scale=True,  # the Ultralytics argument is named multi_scale
    )
model.save("data\\yolo\\yolo11n_trained.pt")

# Evaluation YOLO v11

In [None]:
%matplotlib inline
def draw_rectangles(image_path, results):
    # Read the image
    image = cv2.imread(image_path)
    img_height, img_width, _ = image.shape
    boxes = results.boxes   
    for box in boxes:
        x_min, y_min, x_max, y_max = map(int, box.xyxy[0])  # Convert to integers
        confidence = box.conf[0]  # Confidence score
        class_id = int(box.cls[0])  # Class ID
        label = results.names[class_id]  # Class label

        # Draw the bounding box
        cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
        # Put the label and confidence score
        cv2.putText(image, f"{label} {confidence:.2f}", (x_min, y_min - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

    
    # Convert BGR image to RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Display the image using matplotlib
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    plt.axis('off')
    plt.show()
    

In [None]:
# Load model
model = YOLO(r".\data\yolo\best_scaled.pt")

In [None]:

# get predictions for this image
results = model("images/two_signs_different_distance.jpg")
for result in results:
    print(result.boxes)
    draw_rectangles(result.path, result)
    

#### Model metrics

In [None]:
from ultralytics import YOLO
import torch
import random
import matplotlib.pyplot as plt
import cv2
import pandas as pd

<p> 10 epochs </p>
<img src="images/yolo/results/results_10_epochs.png" alt="YOLOv11 Results" width="99%"/>
<p> 20 epochs </p>
<img src="images/yolo/results/results_20_epochs.png" alt="YOLOv11 Results" width="99%"/>


The YOLO11x model, the variant with the largest number of parameters, was chosen. It was trained for 10 and for 20 epochs on the data; the results are shown above. The fine-tuned model reached the following best values, listed with the epoch at which each occurred:
- **bounding box regression loss(train/box_loss)** → Epoch 20, value: 0.98676
- **classification loss(train/cls_loss)** → Epoch 20, value: 0.40708
- **distribution focal loss(train/dfl_loss)** → Epoch 20, value: 1.02294
- **precision(metrics/precision(B))** → Epoch 19, value: 0.99512
- **recall(metrics/recall(B))** → Epoch 20, value: 0.99319
- **mAP50(metrics/mAP50(B))** →  Epoch 15, value: 0.99477
- **mAP50-95(metrics/mAP50-95(B))** → Epoch 10, value: 0.56793

The checkpoint with the highest mAP@IoU=0.50:0.95 (overall mAP) was selected as the best model, which is the checkpoint at epoch 10.
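The selection rule can be sketched as picking the row with the highest value in the `metrics/mAP50-95(B)` column of the per-epoch results. The `best_epoch` helper is a hypothetical name, and only the epoch-10 value below is taken from the results above; the other two rows are placeholders for illustration.

```python
def best_epoch(rows, metric="metrics/mAP50-95(B)"):
    """Return the row with the highest value of `metric`."""
    return max(rows, key=lambda row: float(row[metric]))

# Only the epoch-10 value is a reported result; the others are placeholders.
rows = [
    {"epoch": 5, "metrics/mAP50-95(B)": 0.51},
    {"epoch": 10, "metrics/mAP50-95(B)": 0.56793},
    {"epoch": 20, "metrics/mAP50-95(B)": 0.55},
]
print(best_epoch(rows)["epoch"])
```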

When evaluating the best model on the test set we get these metrics:
- train/box_loss: 0.98615
- train/cls_loss: 0.40763
- train/dfl_loss: 0.96396
- metrics/precision(B): 0.98832
- metrics/recall(B): 0.98773
- metrics/mAP50(B): 0.99365
- metrics/mAP50-95(B): 0.57693

#### Training analysis



The training metrics show that the model performs well in terms of precision and recall: both are close to 1, indicating that the model can accurately detect and classify the UN number hazard plates. The mAP50 value is also high (about 0.99), while mAP50-95 is considerably lower (about 0.57). This suggests that plates are found reliably at an IoU threshold of 0.50, but that the predicted boxes are less tightly aligned with the ground truth at stricter IoU thresholds.
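IoU (Intersection over Union) is the overlap measure behind both mAP figures: mAP50 counts a detection as correct when its IoU with the ground truth is at least 0.50, while mAP50-95 averages over thresholds from 0.50 to 0.95. The sketch below computes IoU for two boxes in `(x_min, y_min, x_max, y_max)` format; the coordinates are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes get an empty intersection.
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

ground_truth = (100, 100, 200, 200)
prediction = (110, 110, 210, 210)  # shifted by 10 px in both directions
score = iou(ground_truth, prediction)
print(round(score, 3))  # 0.681
```

A 10-pixel shift on a 100-pixel box already drops the IoU to about 0.68, so this prediction counts as correct at the 0.50 threshold but as a miss at 0.70 and above.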

During the testing phase, the model trained for 20 epochs achieved worse results than the model trained for 10 epochs. This could be due to overfitting or other factors affecting the model's generalization ability. The model trained for 10 epochs showed better performance on the test set, with high precision, recall, and mAP values.

In [None]:
model_path = ".\\data\\yolo\\best_augmented_scaled.pt"
model = YOLO(model_path)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

#### Confusion matrix


<img src="images/yolo/results/confusion_matrix.png" alt="Confusion matrix" width="60%"/>

Of the 1047 images in the validation set, only 19 were counted as false positives and only 9 as false negatives. This matches the high precision and recall reported above: the model has a high true positive rate and a low false positive rate, indicating that it can accurately detect and classify UN number hazard plates.
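Precision and recall follow directly from the confusion matrix counts. In the sketch below, the false positive and false negative counts are the ones quoted above, while the true positive count is a hypothetical placeholder, since it is not stated in the text.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

FALSE_POSITIVES = 19   # from the confusion matrix discussion
FALSE_NEGATIVES = 9    # from the confusion matrix discussion
TRUE_POSITIVES = 1000  # hypothetical placeholder, not a reported value

precision, recall = precision_recall(TRUE_POSITIVES, FALSE_POSITIVES, FALSE_NEGATIVES)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```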

However, because the training data was captured under nearly identical conditions, the model may not generalize well to other scenarios or environments. Further testing on diverse datasets is recommended to assess the model's robustness and generalization. Continued data augmentation and training on more diverse data could also improve performance in real-world scenarios.

##### False negative



Because the dataset consists mostly of real-world footage in which the UN number placard is far from the camera, the model has difficulty detecting placards that are close to the camera. Although data augmentation was used, additional scale augmentation could improve the model's accuracy on close-up images.

For example, the UN number placard in the following image is not detected by the model because it is too close to the camera.

<img src="images/big_sign.jpg" alt="False negative" width="60%"/>

#### Weather conditions



For this project, it was essential to develop a robust model capable of performing well under various weather conditions. However, our dataset primarily consisted of images captured under different lighting conditions, such as day and night, without significant weather variations.  

To address this limitation, we evaluated the model using augmented images generated with the **Albumentations** library, applying weather-related transformations such as **rain, sunflare, shadow, and fog.** 

The model performed reasonably well under almost all of these conditions. Sunflare was the most challenging, as it significantly reduced the visibility of the UN number hazard plates, so the model's performance was lower under this condition than under the others.

For example, in this image the model failed to detect the UN number placard because of the sunflare.
<img src="images/yolo/results/sunflare.png" alt="Sunflare" width="60%"/>

Here are some examples of the model's performance under different weather conditions:
- **Rain**: The model was able to detect the UN number hazard plates accurately under rainy conditions, showing robustness to weather-related challenges.

<img src="images/yolo/rain_success.png" alt="Rain" width="60%"/>


- **Fog**: The model performed well under foggy conditions, indicating its ability to handle reduced visibility scenarios.


<img src="images/yolo/fog_success.png" alt="Fog" width="60%"/>


- **Shadow**: The model performed well under shadow conditions, indicating its ability to handle variations in lighting and contrast.


<img src="images/yolo/shadow_success.png" alt="Shadow" width="60%"/>


- **Sunflare**: The model struggled to detect the UN number hazard plates under sunflare conditions, likely due to the glare and reduced visibility caused by direct sunlight.

<img src="images/yolo/sunflare_success.png" alt="Sunflare" width="60%"/>

In [None]:
path_to_augmented_images = ".\\data\\augmented_images"
def show_image_with_boxes(image_path, boxes):
    # Read the image
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert to RGB for matplotlib

    # Plot the image
    plt.figure(figsize=(10, 10))
    plt.imshow(image)

    for box in boxes:
        x_min, y_min, x_max, y_max = map(int, box.xyxy[0])  # Convert to integers
        confidence = box.conf[0]  # Confidence score

        
        # Draw the bounding box
        plt.gca().add_patch(plt.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min, edgecolor='green', facecolor='none', linewidth=2))

        # Put the label and confidence score
        plt.text(x_min, y_min - 10, f"Code: {confidence:.2f}", color='green', fontsize=12)

    plt.axis('off')
    plt.show()

def predict_augmented_images(augment):
    results = model(path_to_augmented_images + f"\\" + augment, show_boxes=True, task="detect", verbose=False)
    random_results = random.sample(results, min(5, len(results)))
    for result in random_results:
        show_image_with_boxes(result.path, result.boxes)

##### Fog

In [None]:
predict_augmented_images("fog")

##### Rain

In [None]:
predict_augmented_images("rain")

##### Shadow

In [None]:
predict_augmented_images("shadow")

##### Sunflare

#### **Specific Data Mining Goals:**

##### **1. Object Detection and Localization:**
Develop a model that achieves a high AP score for accurately detecting and localizing hazard plates on freight wagons within each video frame.

**Approach and Outcome:**  
To accomplish this goal, we fine-tuned a YOLO11x model, which demonstrated promising results in localizing hazard placards. However, the model is not yet fully robust and sometimes struggles to detect placards in challenging scenarios, such as when they are placed close to the camera or under direct sunlight (sunflare). The model's performance is generally strong, with high precision and recall rates, as indicated by the confusion matrix. The false positive and false negative rates are low, suggesting that the model can accurately detect and classify UN number hazard plates.

**Improvement Strategies:**
- **Data Augmentation:** Further augmenting the dataset with scale transformations could help the model better handle placards placed close to the camera.
- **Model Optimization:** Fine-tuning the model architecture or hyperparameters could enhance its performance in challenging scenarios.
- **Diverse Dataset:** Training the model on a more diverse dataset, including images with varying distances and angles, could improve its generalization capabilities.


---

##### **2. Robustness Across Variable Conditions:**
Enhance the model’s robustness by training it on datasets representing diverse lighting and weather conditions, with a goal to maintain high AP scores across these environments.

**Approach and Outcome:**  

**Lighting Conditions:**
The model demonstrated robustness to different lighting conditions, including day and night scenarios, because the dataset contained images captured under these conditions. The model's performance was consistent across various lighting settings, indicating its ability to generalize well to different times of day.

**Weather Conditions:**
To evaluate performance under different weather conditions, we used data augmentation techniques from the **Albumentations** library. Rain, sunflare, shadow, and fog augmentations were applied to simulate adverse weather. The model performed well under most conditions, except for sunflare, where it struggled due to glare and reduced visibility.

**Further Improvements:**
- **Sunflare Challenge:** Addressing the sunflare challenge could involve developing specialized augmentation techniques or training the model on additional sunflare images to improve its performance in such conditions.
- **Weather-Specific Augmentations:** Creating weather-specific augmentation strategies tailored to each condition could enhance the model's robustness and prepare it for real-world deployment.
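
One way to quantify how each simulated condition affects detection is to compute, per augmentation folder, the share of images with at least one detected placard. The sketch below is illustrative only: `detection_rate_by_condition`, the `detect` callable, and the file names are hypothetical, and the stub detector merely stands in for the YOLO model.

```python
from typing import Callable, Dict, List


def detection_rate_by_condition(
    detect: Callable[[str], bool],
    images_by_condition: Dict[str, List[str]],
) -> Dict[str, float]:
    """Return the fraction of images per condition with at least one detection."""
    rates = {}
    for condition, paths in images_by_condition.items():
        if not paths:
            continue  # skip empty folders to avoid dividing by zero
        hits = sum(1 for path in paths if detect(path))
        rates[condition] = hits / len(paths)
    return rates


# Stub detector and placeholder file names, used only to show the call shape.
def stub_detect(path: str) -> bool:
    return not path.startswith("missed")


example = {
    "fog": ["a.jpg", "b.jpg"],
    "sunflare": ["c.jpg", "missed_d.jpg"],
}
print(detection_rate_by_condition(stub_detect, example))
```

With Ultralytics, `detect` could be something like `lambda p: len(model(p, verbose=False)[0].boxes) > 0`. This reports a detection rate rather than mAP; computing AP per condition would additionally require ground-truth boxes for each augmented image.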

---

##### **3. Optimization for Real-Time Processing:**
Implement real-time object detection and OCR capabilities to ensure the model operates at a frame rate suitable for analyzing images from moving trains.

**Approach and Outcome:**  
The detection model is optimized for real-time processing, with a frame rate suitable for analyzing images from moving trains. The YOLO11x architecture is known for its speed and efficiency, making it well-suited for real-time applications.

The average detection time per frame is under **100 milliseconds**, which is sufficient for real-time processing. However, reading the UN number from the placard with idefics2 takes more than **2 seconds** per image, so the overall processing time per frame is far above 100 milliseconds.

**Further Improvements:**
- **OCR Optimization:** Enhancing the OCR module's efficiency or exploring alternative OCR tools could reduce the overall processing time and improve real-time performance.
- **Hardware Acceleration:** Leveraging hardware accelerators like GPUs or TPUs could further optimize the model's processing speed and enhance its real-time capabilities.
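
As a rough worked example of the latency budget described above, the sketch below assumes that detection and OCR run sequentially on every frame and that each frame contains one placard. The 100 ms and 2 s figures are the bounds quoted above; the helper names are illustrative.

```python
def end_to_end_latency_ms(detect_ms: float, ocr_ms: float, plates_per_frame: int = 1) -> float:
    """Total time for one frame when every detected plate is sent through OCR."""
    return detect_ms + ocr_ms * plates_per_frame


def max_fps(latency_ms: float) -> float:
    """Highest frame rate a strictly sequential pipeline can sustain."""
    return 1000.0 / latency_ms


detect_ms = 100.0  # upper bound quoted above for the detector
ocr_ms = 2000.0    # lower bound quoted above for idefics2

total = end_to_end_latency_ms(detect_ms, ocr_ms)
print(f"{total:.0f} ms per frame, about {max_fps(total):.2f} frames per second")
```

Under these assumptions the pipeline handles roughly one frame every two seconds, which is why the OCR stage, not the detector, is the bottleneck. Running OCR only once per tracked placard instead of on every frame is one possible way to relax this budget.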

### Data Mining Success Criteria evaluation

- **Object Detection AP**: Achieve a Mean Average Precision (mAP) of at least 0.70 for detecting and localizing hazard plates across varied conditions.  
  **Outcome:** Not achieved, with an mAP of 0.56793.

- **OCR Precision for UN Numbers**: Ensure the Tesseract OCR module achieves high accuracy in reading UN numbers, even under challenging conditions, with a target precision score above 0.95.  
  **Outcome:** Tesseract was unable to consistently recognize codes in difficult conditions, so we switched to using the **idefics2 VLM**, which performed significantly better, even with low-quality images. However, accuracy metrics for idefics2 have not been formally evaluated.

- **Processing Speed**: Ensure the model achieves a processing time per frame under 100 milliseconds to maintain real-time functionality.  
  **Outcome:** Not achieved. The model itself meets the processing speed requirement, but the OCR module's processing time exceeds the threshold, impacting the overall real-time performance.

- **Environmental Robustness**: Maintain consistent mAP scores across different lighting and weather conditions.  
  **Outcome:** Partly achieved. Lighting conditions were well represented in the dataset, and the model performed consistently across different lighting variations. Weather conditions, however, were not included in the dataset. The model's performance under simulated weather conditions was generally good, except for sunflare, where it failed to detect placards effectively.
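
For reference, AP-style metrics are built on intersection over union (IoU): a predicted box counts as a correct detection when its IoU with a ground-truth box reaches a chosen threshold. The following is a minimal sketch of that overlap measure, assuming boxes in `[x_min, y_min, x_max, y_max]` pixel format; the helper name `iou` is illustrative and not part of the notebook.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero when the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))
```

The stricter the IoU threshold, the tighter the predicted box must fit, so an mAP averaged over several thresholds is usually lower than an mAP at a single loose threshold such as 0.50.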

In [None]:
predict_augmented_images("sunflare")  

# Modeling OCR/idefics2-8b

### Tesseract
Tesseract is an open-source OCR engine that can be used to extract text from images. It supports multiple languages and can be integrated into various programming languages, including Python. Tesseract is known for its accuracy and flexibility, making it a popular choice for text recognition tasks.

In [None]:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

In [None]:
def show_image(image):
    plt.figure(figsize=(10, 10))
    plt.axis('off')
    plt.imshow(image, cmap='gray' if len(image.shape) == 2 else None)
    plt.show()
    
def extract_un_number(text):
    un = re.findall(r'\d{2,}', text)
    return un

def extract_hin_number(text):
    hin = re.findall(r'\d{4,}', text)
    return hin

def get_text_from_image(image):
    # Split the grayscale crop horizontally into an upper and a lower half
    h, w = image.shape
    image_upper = image[0:h // 2, 0:w]
    image_lower = image[h // 2:h, 0:w]
    # Page segmentation mode 6: assume a single uniform block of text
    psm = 6
    option = f"--psm {psm}"
    text_un = pytesseract.image_to_string(image_upper, config=option)
    text_hin = pytesseract.image_to_string(image_lower, config=option)
    return extract_un_number(text_un), extract_hin_number(text_hin)

def extract_bounding_box(image_path, xtl, ytl, xbr, ybr):
    image = cv2.imread(image_path) 
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = cv2.equalizeHist(image)
    scale_percent = 200  # Scale by 200% (2x the size)
    width = int(image.shape[1] * scale_percent / 100)
    height = int(image.shape[0] * scale_percent / 100)
    resized = cv2.resize(image, (width, height), interpolation=cv2.INTER_CUBIC)
    crop_img = resized[
        int(ytl*scale_percent/100):int(ybr*scale_percent/100), 
        int(xtl*scale_percent/100):int(xbr*scale_percent/100)]
    return crop_img

In [None]:
model = YOLO(".\\data\\yolo\\best_augmented_scaled.pt")
image_path = ".\\images\\hazard_plate.jpg"
results = model(image_path)
for result in results:
    for box in result.boxes:
        x_min, y_min, x_max, y_max = map(int, box.xyxy[0])  # Convert to integers
        confidence = box.conf[0]  # Confidence score
        crop_img = extract_bounding_box(image_path, x_min, y_min, x_max, y_max)
        un_number, hin_number = get_text_from_image(crop_img)
        show_image(crop_img)
        print(f"UN number: {un_number}, HIN number: {hin_number}")

### EasyOCR
EasyOCR is a Python library that provides a simple interface for performing OCR tasks on images. It supports multiple languages and can detect text in various fonts and sizes. EasyOCR is designed to be user-friendly and efficient, making it a suitable choice for extracting text from images in real-time applications.

In [None]:
# Create the reader once; constructing it loads the detection and recognition models
reader = easyocr.Reader(["en"])

def get_text_from_image(image):
    """Read digits from the whole crop and from its upper and lower halves."""
    digits = "0123456789"
    result = reader.readtext(image, allowlist=digits, detail=0)
    # Works for both grayscale (h, w) and colour (h, w, c) arrays
    h, w = image.shape[:2]
    image_un = image[0:h // 2, 0:w]
    image_hin = image[h // 2:h, 0:w]
    result_un = reader.readtext(image_un, allowlist=digits, detail=0)
    result_hin = reader.readtext(image_hin, allowlist=digits, detail=0)
    return result_un, result_hin, result


def extract_bounding_box(image_path, xtl, ytl, xbr, ybr):
    image = image_path
    if(isinstance(image_path, str)):
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        image = cv2.equalizeHist(image)
    scale_percent = 200  # Scale by 200% (2x the size)
    width = int(image.shape[1] * scale_percent / 100)
    height = int(image.shape[0] * scale_percent / 100)
    resized = cv2.resize(image, (width, height), interpolation=cv2.INTER_CUBIC)
    crop_img = resized[
        int(ytl*scale_percent/100):int(ybr*scale_percent/100), 
        int(xtl*scale_percent/100):int(xbr*scale_percent/100)]
    return crop_img


def show_image(image):
    plt.figure(figsize=(10, 10))
    plt.axis("off")
    plt.imshow(image, cmap='gray' if len(image.shape) == 2 else None)
    plt.show()

In [None]:
model = YOLO(".\\data\\yolo\\best.pt")
image_path = ".\\images\\hazard_plate.jpg"
results = model(image_path)
for result in results:
    for box in result.boxes:
        x_min, y_min, x_max, y_max = map(int, box.xyxy[0])  # Convert to integers
        confidence = box.conf[0]  # Confidence score
        crop_img = extract_bounding_box(image_path, x_min, y_min, x_max, y_max)
        un_number, hin_number, all_text = get_text_from_image(crop_img)
        show_image(crop_img)
        print(f"UN number: {un_number}, HIN number: {hin_number}, full crop: {all_text}")


### idefics2 
idefics2 is an open vision-language model (VLM) from Hugging Face rather than a dedicated OCR library. It takes images and text as input and generates text, so it can be prompted to read the numbers on a cropped placard. We load the 8B-parameter checkpoint through the `transformers` library with 4-bit quantization to reduce GPU memory use. In our tests it was accurate but slow, so it does not currently meet real-time requirements.

In [None]:
# Check what version of PyTorch is installed
print(torch.__version__)

# Check the current CUDA version being used
print("CUDA Version: ", torch.version.cuda)

# Print the device name if CUDA is available
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
else:
    print("CUDA is not available")

# Check whether the FlashAttention backend for scaled dot-product attention is enabled
print("Flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())

In [None]:
# Load the processor and model
processor = Idefics2Processor.from_pretrained("HuggingFaceM4/idefics2-8b")
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
print('Processor loaded')
# Select the device the model will be loaded onto
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print('Device:', device)
model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
    device_map=device,
    quantization_config=quantization_config,
    # attn_implementation="flash_attention_2",
)
# device_map already places the weights; calling .to() on a bitsandbytes
# 4-bit model is unnecessary and is rejected by some transformers versions
print('Model loaded')
torch.cuda.empty_cache()


In [None]:
prompt = """
Analyze the image and extract two key values:

    The UN number visible on the upper part of the placard.
    The code visible on the lower part of the placard, located below the horizontal line separating the two sections.

Both codes are printed in black. If either the upper or lower part cannot be detected, replace the missing value with "0." Output the extracted values as plain text, separated by a comma if multiple codes are present. No additional context or formatting is needed.

Input Examples:

    {98 {line} 4567}
    (not found, {line}, 8901)
    {101 {line} 3345}
    (not found, {line}, {not found})
    {45 {line} 2789}
    {22 {line} 5678}

Desired Output:

    98, 4567
    0, 8901
    101, 3345
    0, 0
    45, 2789
    22, 5678

Expected Transformation:

    For each input example, extract the UN number and the code below the horizontal line.
    If either part is missing (i.e., "not found"), replace it with 0.
    Output the extracted values as plain text, separated by a comma, without any additional context or formatting.
"""
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": prompt},
            {"type": "image"},
        ],
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)


In [None]:
modelYolo = YOLO(".\\data\\yolo\\best_augmented_scaled.pt", task="detect")
model

In [None]:
!nvidia-smi

In [None]:
def preprocess_image(image_path):
    # Read the image using PIL
    image = Image.open(image_path).convert("RGB")
    return image

def extract_bounding_box(image_path, xtl, ytl, xbr, ybr):
    # Accept either a file path or an already loaded PIL image
    image = image_path
    if isinstance(image, str):
        image = Image.open(image).convert("RGB")
    return image.crop((xtl, ytl, xbr, ybr))

def perform_ocr(image):
    start_time = time.time()

    inputs = processor(images=image, text=text, return_tensors="pt").to(device)
    generated_ids = model.generate(**inputs, max_new_tokens=500)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    # Keep only the text after the last "Assistant:" marker
    assistant_output = generated_text.split("Assistant:")[-1].strip()
    elapsed_time = time.time() - start_time
    print(f"Elapsed time: {elapsed_time:.2f} seconds")

    # Split the output by comma, strip whitespace and stray periods, default empty parts to "0"
    numbers = [number.strip().replace('.', '') or "0" for number in assistant_output.split(",")]
    # Pad or truncate to two values so unpacking cannot fail on an unexpected reply
    numbers = (numbers + ["0", "0"])[:2]
    un_number, hin_number = numbers
    return un_number, hin_number

def show_image(image):
    plt.figure(figsize=(10, 10))
    plt.axis('off')
    plt.imshow(image)
    plt.show()

In [None]:

image_path = ".\\images\\good_conditions.jpg"
results = modelYolo(image_path,stream=True)
for result in results:
    for box in result.boxes:
        x_min, y_min, x_max, y_max = map(int, box.xyxy[0])  # Convert to integers
        crop_img = extract_bounding_box(image_path, x_min, y_min, x_max, y_max)
        show_image(crop_img)
        un_number, hin_number = perform_ocr(crop_img)
        print(f"UN number: {un_number}, HIN number: {hin_number}")
        desc = get_hin_description(hin_number)
        print(f"Description: {desc}")


# Evaluation OCR/idefics2-8b

We evaluated three different OCR techniques—Tesseract, EasyOCR, and idefics2—on low-quality images to simulate real-world conditions. The performance of these models was as follows:

- **Tesseract** preprocessed the test image in **2.0 ms** but failed to accurately extract the desired values.
- **EasyOCR** preprocessed the image in **8.3 ms**, but it also failed to extract the correct code.

The only model that successfully extracted the UN number and hazard placard code was **idefics2-8b**. We employed a specific prompt for this model:

#### Prompt:

Analyze the image and extract two key values:

    The UN number visible on the upper part of the placard.
    The code visible on the lower part of the placard, located below the horizontal line separating the two sections.

Both codes are printed in black. If either the upper or lower part cannot be detected, replace the missing value with "0." Output the extracted values as plain text, separated by a comma if multiple codes are present. No additional context or formatting is needed.

Input Examples:

    {98 {line} 4567}
    (not found, {line}, 8901)
    {101 {line} 3345}
    (not found, {line}, {not found})
    {45 {line} 2789}
    {22 {line} 5678}

Desired Output:

    98, 4567
    0, 8901
    101, 3345
    0, 0
    45, 2789
    22, 5678

Expected Transformation:

    For each input example, extract the UN number and the code below the horizontal line.
    If either part is missing (i.e., "not found"), replace it with 0.
    Output the extracted values as plain text, separated by a comma, without any additional context or formatting.
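
Since the prompt requests a plain `upper, lower` reply, the model output can be parsed defensively. The following is a sketch rather than the notebook's implementation: `parse_placard_reply` is a hypothetical helper, and it assumes that any missing value should fall back to `"0"`, as the prompt specifies.

```python
import re
from typing import Tuple


def parse_placard_reply(reply: str) -> Tuple[str, str]:
    """Return (upper, lower) codes from a reply such as '98, 4567'."""
    # Keep only the digit runs, which drops stray periods, spaces, and words
    numbers = re.findall(r"\d+", reply)
    # Pad with "0" so the function always returns exactly two values
    numbers = (numbers + ["0", "0"])[:2]
    return numbers[0], numbers[1]


print(parse_placard_reply("98, 4567"))
print(parse_placard_reply("0, 8901."))
print(parse_placard_reply("not found"))
```

One limitation of this approach: if the model returns a single number without a placeholder, the sketch cannot tell whether it belongs to the upper or the lower part and assigns it to the upper part.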


This model extracted the UN number and hazard code correctly. However, processing was much slower than with the other models, taking up to **17.49 seconds per image**. To reduce this, we applied 4-bit quantization, but the processing time still did not meet real-time requirements.

We recommend exploring further literature to identify additional VLMs or alternative OCR techniques that may improve both accuracy and processing time. Techniques such as **FlashAttention** could be explored to enhance speed.
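
The model-loading cell already carries a commented-out `attn_implementation="flash_attention_2"` argument. A cautious way to try it is to enable it only when the `flash_attn` package is importable. The sketch below assumes that the installed `transformers` version and the GPU support FlashAttention 2 for this model, which we have not verified; `pick_attn_implementation` is a hypothetical helper.

```python
import importlib.util


def pick_attn_implementation():
    """Return 'flash_attention_2' when flash_attn is importable, else None."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return None  # fall back to the library default


# Build optional keyword arguments for from_pretrained()
extra_kwargs = {}
attn = pick_attn_implementation()
if attn is not None:
    extra_kwargs["attn_implementation"] = attn
print(extra_kwargs)
```

The resulting `extra_kwargs` could then be passed as `**extra_kwargs` to `Idefics2ForConditionalGeneration.from_pretrained(...)`. Whether this yields a meaningful speed-up for single cropped placards would need to be measured.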

Given its superior accuracy in handling low-quality images, which are common when processing video frames, **idefics2-8b** remains our recommended OCR model for the time being, despite its slower processing speed.

# Modeling Faster R-CNN

## Functions

In [None]:
scaler = GradScaler()

# Define the dataset class
class HazmatDataset(Dataset):
    def __init__(self, data_dir, annotations_file, transforms=None):
        self.data_dir = data_dir
        self.transforms = transforms
        
        # Load annotations
        with open(annotations_file) as f:
            data = json.load(f)
        
        self.images = {img['id']: img for img in data['images']}
        self.annotations = data['annotations']
        
        # Create image_id to annotations mapping
        self.img_to_anns = {}
        for ann in self.annotations:
            img_id = ann['image_id']
            if img_id not in self.img_to_anns:
                self.img_to_anns[img_id] = []
            self.img_to_anns[img_id].append(ann)
        
        self.ids = list(self.images.keys())

    def __getitem__(self, idx):
        img_id = self.ids[idx]
        img_info = self.images[img_id]
        
        # Load image
        img_path = os.path.join(self.data_dir, 'images', img_info['file_name'])
        img = Image.open(img_path).convert('RGB')
        
        # Get annotations
        anns = self.img_to_anns.get(img_id, [])
        
        boxes = []
        labels = []
        areas = []
        iscrowd = []
        
        for ann in anns:
            bbox = ann['bbox']
            # Convert [x, y, w, h] to [x1, y1, x2, y2]
            boxes.append([
                bbox[0],
                bbox[1],
                bbox[0] + bbox[2],
                bbox[1] + bbox[3]
            ])
            labels.append(ann['category_id'])
            areas.append(ann['area'])
            iscrowd.append(ann['iscrowd'])
        
        # Convert to tensors; reshape keeps boxes at [N, 4] even for images without annotations
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        areas = torch.as_tensor(areas, dtype=torch.float32)
        iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)
        
        target = {
            'boxes': boxes,
            'labels': labels,
            'image_id': torch.tensor([img_id]),
            'area': areas,
            'iscrowd': iscrowd
        }
        
        if self.transforms is not None:
            for transform in self.transforms:
                img, target = transform(img, target)
        
        return img, target

    def __len__(self):
        return len(self.ids)
    
class ToTensor(object):
    def __call__(self, image, target):
        # Convert PIL image to tensor
        image = F.to_tensor(image)
        return image, target

class RandomHorizontalFlip(object):
    def __init__(self, prob):
        self.prob = prob

    def __call__(self, image, target):
        if torch.rand(1) < self.prob:
            height, width = image.shape[-2:]
            image = F.hflip(image)
            # Flip bounding boxes
            bbox = target["boxes"]
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]  # Flip x-coordinates
            target["boxes"] = bbox
        return image, target

def get_transform(train):
    transforms = []
    # Convert PIL image to tensor
    transforms.append(ToTensor())
    if train:
        # Add training augmentations here if needed
        transforms.append(RandomHorizontalFlip(0.5))
    return transforms

def collate_fn(batch):
    return tuple(zip(*batch))

def train_one_epoch(model, optimizer, data_loader, device, scaler):
    model.train()
    total_loss = 0
    total_classifier_loss = 0
    total_box_reg_loss = 0
    total_objectness_loss = 0
    total_rpn_box_reg_loss = 0

    # Add tqdm to show progress
    progress_bar = tqdm(data_loader, desc="Training", leave=True)
    
    for images, targets in progress_bar:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # Wrap the forward pass in autocast
        with autocast():
            loss_dict = model(images, targets)
            losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad()
        # Scale the loss and call backward
        scaler.scale(losses).backward()
        # Unscales the gradients and calls or skips optimizer.step()
        scaler.step(optimizer)
        # Updates the scale for next iteration
        scaler.update()

        # Accumulate the totals
        total_loss += losses.item()
        total_classifier_loss += loss_dict['loss_classifier'].item()
        total_box_reg_loss += loss_dict['loss_box_reg'].item()
        total_objectness_loss += loss_dict['loss_objectness'].item()
        total_rpn_box_reg_loss += loss_dict['loss_rpn_box_reg'].item()

        # Update the tqdm bar
        progress_bar.set_postfix({
            "Loss": f"{losses.item():.4f}",
            "Classifier": f"{loss_dict['loss_classifier'].item():.4f}",
            "BoxReg": f"{loss_dict['loss_box_reg'].item():.4f}",
        })

    avg_loss = total_loss / len(data_loader)
    avg_classifier_loss = total_classifier_loss / len(data_loader)
    avg_box_reg_loss = total_box_reg_loss / len(data_loader)
    avg_objectness_loss = total_objectness_loss / len(data_loader)
    avg_rpn_box_reg_loss = total_rpn_box_reg_loss / len(data_loader)

    return avg_loss, avg_classifier_loss, avg_box_reg_loss, avg_objectness_loss, avg_rpn_box_reg_loss



# Load ground truth annotations
coco_val = COCO('data/data_faster_rcnn/val/annotations/instances_val.json')

# Convert model outputs to the COCO results format expected by COCOeval
def convert_to_coco_format(outputs, image_ids):
    coco_results = []
    for output, image_id in zip(outputs, image_ids):
        boxes = output['boxes'].cpu().numpy()
        scores = output['scores'].cpu().numpy()
        labels = output['labels'].cpu().numpy()
        
        for box, score, label in zip(boxes, scores, labels):
            coco_results.append({
                'image_id': image_id,
                'category_id': int(label),
                'bbox': [float(box[0]), float(box[1]), float(box[2] - box[0]), float(box[3] - box[1])],  # [x, y, w, h]
                'score': float(score)
            })
    return coco_results

# Validation Function
def validate(model, data_loader, coco_gt, device):
    model.eval()
    results = []

    # Add tqdm
    progress_bar = tqdm(data_loader, desc="Validation", leave=True)

    with torch.no_grad():
        for images, targets in progress_bar:
            images = list(image.to(device) for image in images)
            outputs = model(images)
            
            image_ids = [target['image_id'].item() for target in targets]
            coco_results = convert_to_coco_format(outputs, image_ids)
            results.extend(coco_results)

            # Update tqdm-bar
            progress_bar.set_postfix({"Processed": len(results)})

    if not results:
        print("No predictions generated. Skipping evaluation.")
        return [0.0] * 6  # Return dummy metrics for empty results

    # Suppress COCOeval output
    with contextlib.redirect_stdout(io.StringIO()):
        coco_dt = coco_gt.loadRes(results)
        coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
        coco_eval.evaluate()
        coco_eval.accumulate()
        coco_eval.summarize()

    return coco_eval.stats


# Custom backbone to return a dictionary of feature maps
class BackboneWithChannels(torch.nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
    def forward(self, x):
        x = self.backbone(x)
        return {'0': x}
    
# Function to create a subset of the dataset
def create_subset(dataset, percentage):
    """
    Create a subset of the dataset based on the given percentage.
    
    Parameters:
    - dataset: The full dataset.
    - percentage: The fraction of the dataset to use (value between 0.0 and 1.0).
    
    Returns:
    - subset: A subset of the dataset containing the specified percentage of data.
    """
    if not (0.0 < percentage <= 1.0):
        raise ValueError("Percentage must be between 0.0 and 1.0.")
    
    # Determine the subset size
    total_samples = len(dataset)
    subset_size = int(total_samples * percentage)
    
    # Shuffle and select a random subset of indices
    indices = list(range(total_samples))
    random.shuffle(indices)
    subset_indices = indices[:subset_size]
    
    return Subset(dataset, subset_indices)

def create_directory(base_path="data/models"):
    """
    Create a directory inside the base path named 'faster-rcnn-finetuned-{date}'
    to store models and logs. The name includes the current date and time in the format 'DD-MM-YYYY_HH-MM-SS'.

    Parameters:
    - base_path (str): Base directory where the new directory will be created.

    Returns:
    - directory_path (str): Full path to the created directory.
    """
    # Get the current date and time; avoid spaces and colons, which are not valid in Windows directory names
    current_time = datetime.now().strftime("%d-%m-%Y_%H-%M-%S")
    
    # Define the full directory path
    directory_name = f"faster-rcnn-finetuned-{current_time}"
    directory_path = os.path.join(base_path, directory_name)
    
    # Create the directory
    os.makedirs(directory_path, exist_ok=True)
    
    print(f"Directory created: {directory_path}")
    return directory_path

def train_model(directory, model, optimizer, train_loader, device, train_metrics_list, best_val_map, lr_scheduler, val_loader, coco_val, scaler, epoch):
    
    epoch+=1
    # Start the timer
    start_time = time.time()
    
    # Train for one epoch
    train_loss, train_classifier_loss, train_box_reg_loss, train_objectness_loss, train_rpn_box_reg_loss = train_one_epoch(
        model, optimizer, train_loader, device, scaler)
    
    # Validate and get all COCO-metrics
    val_metrics = validate(model, val_loader, coco_val, device)
    val_map = val_metrics[0]  # mAP@IoU=0.50:0.95
    
    # Stop the timer
    end_time = time.time()
    elapsed_time = end_time - start_time
    minutes, seconds = divmod(elapsed_time, 60)
    
    # Obtain the current learning rate
    current_lr = optimizer.param_groups[0]['lr']
    
    # Prepare data for logging
    data = {
        "epoch": epoch,
        "time_elapsed": (int(minutes), int(seconds)),
        "learning_rate": current_lr,
        "train_loss": train_loss,
        "classifier_loss": train_classifier_loss,
        "box_reg_loss": train_box_reg_loss,
        "objectness_loss": train_objectness_loss,
        "rpn_box_reg_loss": train_rpn_box_reg_loss,
        "val_metrics": val_metrics
    }
    
    # Append current epoch data to metrics list
    train_metrics_list.append(data)
    
    # Print summary for this epoch
    print(f"📊 Epoch {epoch} | ⏳ Time: {int(minutes)}m {int(seconds)}s | 🔄 LR: {current_lr:.6f}")
    print(f"📉 Train Loss: {train_loss:.4f} | 🎯 Classifier: {train_classifier_loss:.4f} | 📦 Box Reg: {train_box_reg_loss:.4f}")
    print(f"🔍 Objectness: {train_objectness_loss:.4f} | 🗂️ RPN Box Reg: {train_rpn_box_reg_loss:.4f}")
    print(f"🧪 mAP | 🟢 mAP@IoU=0.50:0.95: {val_metrics[0]:.4f} | 🔵 mAP@IoU=0.50: {val_metrics[1]:.4f} | 🟣 mAP@IoU=0.75: {val_metrics[2]:.4f}")
    print(f"📏 Small mAP: {val_metrics[3]:.4f} | 📐 Medium mAP: {val_metrics[4]:.4f} | 📏 Large mAP: {val_metrics[5]:.4f}")
    
    # Save epoch data to a log file
    save_epoch_data(directory, data)
    
    # Update learning rate
    lr_scheduler.step()
    
    # Save the latest checkpoint with all metrics
    checkpoint = {
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'val_map': val_map,
        'train_metrics_list': train_metrics_list  # Save all metrics
    }
    torch.save(checkpoint, os.path.join(directory, "latest_model.pth"))
    
    # Save the best model if the val_map is the highest so far
    if val_map > best_val_map:
        best_val_map = val_map
        torch.save(checkpoint, os.path.join(directory, "best_model.pth"))
    
    return best_val_map
        


def save_epoch_data(directory, data):
    """
    Save training statistics for each epoch in a text file.

    Parameters:
    - directory (str): Path to the directory.
    - data (dict): Contains data on metrics such as epoch, losses, and validation metrics.
    """
    log_file_path = os.path.join(directory, "training_log.txt")
    
    with open(log_file_path, "a") as log_file:
        log_file.write(f"📊 Epoch {data['epoch']} | ⏳ Time: {data['time_elapsed'][0]}m {data['time_elapsed'][1]}s | 🔄 LR: {data['learning_rate']:.6f}\n")
        log_file.write(f"📉 Train Loss: {data['train_loss']:.4f} | 🎯 Classifier: {data['classifier_loss']:.4f} | 📦 Box Reg: {data['box_reg_loss']:.4f}\n")
        log_file.write(f"🔍 Objectness: {data['objectness_loss']:.4f} | 🗂️ RPN Box Reg: {data['rpn_box_reg_loss']:.4f}\n")
        log_file.write(f"🧪 Validation Metrics | 🟢 mAP@IoU=0.50:0.95: {data['val_metrics'][0]:.4f} | 🔵 mAP@IoU=0.50: {data['val_metrics'][1]:.4f} | 🟣 mAP@IoU=0.75: {data['val_metrics'][2]:.4f}\n")
        log_file.write(f"📏 Small mAP: {data['val_metrics'][3]:.4f} | 📐 Medium mAP: {data['val_metrics'][4]:.4f} | 📏 Large mAP: {data['val_metrics'][5]:.4f}\n")
        log_file.write("\n")
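
The dataset class and `convert_to_coco_format` above move boxes between COCO's `[x, y, width, height]` layout and the `[x1, y1, x2, y2]` layout that torchvision detection models expect, and `RandomHorizontalFlip` mirrors boxes along the x-axis. The plain-Python sketch below restates those three conversions on a single box so they can be checked by hand; the helper names are illustrative and not part of the notebook.

```python
def coco_to_xyxy(bbox):
    """[x, y, width, height] -> [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_coco(box):
    """[x1, y1, x2, y2] -> [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def hflip_xyxy(box, image_width):
    """Mirror a box horizontally; the old right edge becomes the new left edge."""
    x1, y1, x2, y2 = box
    return [image_width - x2, y1, image_width - x1, y2]


box = coco_to_xyxy([10, 20, 30, 40])
print(box)
print(xyxy_to_coco(box))
print(hflip_xyxy(box, image_width=100))
```

Swapping the two x-coordinates during the flip keeps `x1 < x2`, which is the same effect as the `width - bbox[:, [2, 0]]` indexing in `RandomHorizontalFlip`.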

## Initialize model

In [None]:
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
print(f"Training model on {device}")

# Create datasets
train_dataset = HazmatDataset(
    data_dir='data/data_faster_rcnn/train',
    annotations_file='data/data_faster_rcnn/train/annotations/instances_train.json',
    transforms=get_transform(train=True)
)

val_dataset = HazmatDataset(
    data_dir='data/data_faster_rcnn/val',
    annotations_file='data/data_faster_rcnn/val/annotations/instances_val.json',
    transforms=get_transform(train=False)
)

# Set the fraction of the training dataset to use (0.0 < value <= 1.0)
train_percentage = 1

# Create a subset of the training dataset
train_dataset_subset = create_subset(train_dataset, train_percentage)

# Set the fraction of the validation dataset to use (0.0 < value <= 1.0)
val_percentage = 1

# Create a subset of the validation dataset
val_dataset_subset = create_subset(val_dataset, val_percentage)

# Number of DataLoader worker processes
workers = 2

# Create data loaders
train_loader = DataLoader(
    train_dataset_subset,
    batch_size=16,
    shuffle=True,
    collate_fn=collate_fn,
    num_workers=workers,
    pin_memory=True
)

val_loader = DataLoader(
    val_dataset_subset,
    batch_size=16,
    shuffle=False,
    collate_fn=collate_fn,
    num_workers=workers,
    pin_memory=True
)

# Initialize model
num_classes = 2  # hazmat code and background

# Create ResNet-101 backbone with FPN
backbone = resnet_fpn_backbone('resnet101', pretrained=True)

# Define anchor generator for FPN
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5
)

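# Multi-scale RoI pooling for FPN; the backbone names its feature maps '0'-'3' plus 'pool',
# and RoI pooling uses the four pyramid levels
roi_pooler = MultiScaleRoIAlign(
    featmap_names=['0', '1', '2', '3'],
    output_size=7,
    sampling_ratio=2
)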

print("initializing model...")
# Initialize Faster R-CNN with ResNet-101-FPN
model = FasterRCNN(
    backbone=backbone,
    num_classes=num_classes,
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=roi_pooler
)

# Move model to device
model.to(device)

In [None]:
!nvidia-smi
num_gpus = torch.cuda.device_count()
print(f"Number of GPUs available: {num_gpus}")

In [None]:
# !kill -9 7710

## Training

In [None]:
# Initialize optimizer and scheduler
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

# Training loop
num_epochs = 23
train_metrics_map = []
best_val_map = float('-inf')

print("Starting training...")

# Create directory to store models and logs
directory_finetuned_model = create_directory()


for epoch in range(num_epochs):
    best_val_map = train_model(
        directory=directory_finetuned_model, 
        model=model, optimizer=optimizer, train_loader=train_loader, device=device, 
        train_metrics_list=train_metrics_map, best_val_map=best_val_map, lr_scheduler=lr_scheduler, 
        val_loader=val_loader, coco_val=coco_val, scaler=scaler, epoch=epoch
    )



# Evaluation Faster R-CNN

In [None]:
# Load the model
directory_finetuned_model = "data/models"
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model_path = os.path.join(directory_finetuned_model, 'best_model.pth')
checkpoint = torch.load(model_path, map_location=device)
val_map = checkpoint['val_map']
epoch = checkpoint['epoch']
# Latest checkpoint
latest_model_path = os.path.join(directory_finetuned_model, 'latest_checkpoint.pth')
checkpoint_latest = torch.load(latest_model_path, map_location=device)
val_map_latest = checkpoint_latest['val_map']
epoch_latest = checkpoint_latest['epoch']

model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # Set the model to evaluation mode

print(f"Validation mAP best model: {val_map:.4f}")
print(f"Epoch best model: {epoch}")

print(f"Validation mAP latest model: {val_map_latest:.4f}")
print(f"Epoch latest model: {epoch_latest}")


In [None]:
def plot_metrics(checkpoint_path, title="Training and Validation Metrics over Epochs"):
    """
    Plot training and validation metrics from a given model checkpoint.
    
    Parameters:
    - checkpoint_path (str): Path to the model checkpoint file (e.g., 'latest_model.pth').
    - title (str): Title for the plot.
    """
    # Load the checkpoint
    checkpoint = torch.load(checkpoint_path, map_location=device)
    train_metrics_list = checkpoint['train_metrics_list']
    
    # Extract metrics per epoch
    epochs = [data['epoch'] for data in train_metrics_list]
    train_loss_list = [data['train_loss'] for data in train_metrics_list]
    classifier_loss_list = [data['classifier_loss'] for data in train_metrics_list]
    box_reg_loss_list = [data['box_reg_loss'] for data in train_metrics_list]
    objectness_loss_list = [data['objectness_loss'] for data in train_metrics_list]
    rpn_box_reg_loss_list = [data['rpn_box_reg_loss'] for data in train_metrics_list]

    # Extract validation mAP metrics
    val_map_list = [data['val_metrics'][0] for data in train_metrics_list]  # mAP@IoU=0.50:0.95
    val_map_50_list = [data['val_metrics'][1] for data in train_metrics_list]  # mAP@IoU=0.50
    val_map_75_list = [data['val_metrics'][2] for data in train_metrics_list]  # mAP@IoU=0.75
    val_map_small_list = [data['val_metrics'][3] for data in train_metrics_list]  # Small mAP
    val_map_medium_list = [data['val_metrics'][4] for data in train_metrics_list]  # Medium mAP
    val_map_large_list = [data['val_metrics'][5] for data in train_metrics_list]  # Large mAP

    # Initialize the plot
    plt.figure(figsize=(14, 10))

    # Plot training losses
    plt.plot(epochs, train_loss_list, label='Training Loss', marker='o')
    #     plt.plot(epochs, classifier_loss_list, label='Classifier Loss', marker='o')
    #     plt.plot(epochs, box_reg_loss_list, label='Box Regression Loss', marker='o')
    #     plt.plot(epochs, objectness_loss_list, label='Objectness Loss', marker='o')
    #     plt.plot(epochs, rpn_box_reg_loss_list, label='RPN Box Regression Loss', marker='o')

    # Plot validation mAP metrics
    plt.plot(epochs, val_map_list, label='Validation mAP (IoU=0.50:0.95)', linestyle='--', marker='x')
    plt.plot(epochs, val_map_50_list, label='Validation mAP (IoU=0.50)', linestyle='--', marker='x')
    plt.plot(epochs, val_map_75_list, label='Validation mAP (IoU=0.75)', linestyle='--', marker='x')
    plt.plot(epochs, val_map_small_list, label='Validation mAP (Small)', linestyle='--', marker='x')
    plt.plot(epochs, val_map_medium_list, label='Validation mAP (Medium)', linestyle='--', marker='x')
    plt.plot(epochs, val_map_large_list, label='Validation mAP (Large)', linestyle='--', marker='x')

    # Put one x-axis tick on every recorded epoch
    plt.xticks(epochs)

    # Set plot details
    plt.xlabel('Epoch')
    plt.ylabel('Metric Value')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

In [None]:
# Load latest model checkpoint
latest_model_path = os.path.join(directory_finetuned_model, 'latest_model.pth')
plot_metrics(latest_model_path, "Training and validation over epochs")

In [None]:
def load_image(image_path, transforms=None):
    image = Image.open(image_path).convert('RGB')
    if transforms:
        for transform in transforms:
            image, _ = transform(image, target=None)  # No target during inference
    return image

# Define preprocessing transforms
test_transforms = get_transform(train=False)

# Load the image
image_path = 'images/hazard_plate.jpg'  # Replace with your image path
image = load_image(image_path, transforms=test_transforms)
image = image.to(device)
# Wrap the image in a list as the model expects a batch
with torch.no_grad():
    predictions = model([image])

In [None]:
def get_color_with_opacity(score):
    """
    Get a color with opacity based on the confidence score.
    Higher confidence = more red and higher opacity.
    Lower confidence = random color and lower opacity.
    """
    if score > 0.75:
        # High confidence: Red with high opacity
        color = (1, 0, 0, min(1.0, 0.3 + score))  # Red with opacity based on score
    else:
        # Low confidence: Random color with lower opacity
        color = (random.random(), random.random(), random.random(), max(0.3, score))
    return color

def draw_predictions(image, predictions, threshold=0.5, classes=('background', 'hazmat')):
    # Convert image from tensor to numpy array
    image = image.cpu().permute(1, 2, 0).numpy()
    image = np.clip(image * 255, 0, 255).astype(np.uint8)
    
    boxes = predictions[0]['boxes'].cpu().numpy()
    labels = predictions[0]['labels'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()
    
    # Filter predictions based on confidence threshold
    keep = scores >= threshold
    boxes = boxes[keep]
    labels = labels[keep]
    scores = scores[keep]
    
    fig, ax = plt.subplots(1, figsize=(12, 9))
    ax.imshow(image)
    
    for box, label, score in zip(boxes, labels, scores):
        if label == 1:  # Only plot hazmat codes
            x1, y1, x2, y2 = box
            color = get_color_with_opacity(score)
            
            # Draw rectangle with opacity
            rect = plt.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 
                                 edgecolor=color, facecolor='none')
            ax.add_patch(rect)
            
            # Add text label with confidence score
            label_name = classes[label]
            ax.text(x1, y1, f'{label_name}: {score:.2f}', 
                    color='white', 
                    bbox=dict(facecolor=color[:3], alpha=0.6), 
                    fontsize=12)
    
    plt.axis('off')
    plt.show()


In [None]:
def predict_image(image_path, threshold=0.5):
    # List of class names
    classes = ['background', 'hazmat']
    
    # Load the image
    image = load_image(image_path, transforms=test_transforms)
    image = image.to(device)
    
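    # Start timing (wait for pending GPU work so the measurement is accurate)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start_time = time.perf_counter()
    
    # Wrap the image in a list as the model expects a batch
    with torch.no_grad():
        predictions = model([image])
    
    # End timing once the GPU has finished the forward pass
    if device.type == 'cuda':
        torch.cuda.synchronize()
    end_time = time.perf_counter()
    prediction_time = end_time - start_time
    print(f"Prediction time: {prediction_time:.4f} seconds")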
    
    # Filter predictions based on threshold
    boxes = predictions[0]['boxes'].cpu().numpy()
    labels = predictions[0]['labels'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()
    
    # Apply threshold filter
    keep = scores >= threshold
    boxes = boxes[keep]
    labels = labels[keep]
    scores = scores[keep]
    
    # Print the predictions
    if len(boxes) == 0:
        print("No predictions meet the threshold.")
    else:
        print("Predictions:")
        for label, score in zip(labels, scores):
            class_name = classes[label]
            print(f"  {class_name}: {score:.2f}")
        # Display the predictions
        draw_predictions(image, predictions, threshold=threshold, classes=classes)


In [None]:
# predict_image('data/data_faster_rcnn/val/images/1690281365_00595.jpg', threshold=0.29)
predict_image('images/hazard_plate.jpg', threshold=0)
predict_image('images/un_numbers_test/close_up_number.webp', threshold=0)
predict_image('images/un_numbers_test/2.jpg', threshold=0)
predict_image('images/un_numbers_test/3.jpg', threshold=0)
predict_image('images/two_signs_different_distance.jpg', threshold=0)
predict_image('images/un_numbers_test/6.webp', threshold=0)
predict_image('images/no_signs.jpg', threshold=0)
predict_image('images/africalane_closed_off.jpg', threshold=0)
predict_image('images/bikes_get_off.jpg', threshold=0)
predict_image('images/gevaarlijke_stoffen_route.jpg', threshold=0)
predict_image('images/great_britain_nb.jpeg', threshold=0)
predict_image('images/priority-road-sign.webp', threshold=0)
predict_image('images/reflective_un_number_on_truck.jpg', threshold=0)
predict_image('images/traffic signs.jpg', threshold=0)


## Test set evaluation

In [None]:
# Create test dataset
test_dataset = HazmatDataset(
    data_dir='data/data_faster_rcnn/test',
    annotations_file='data/data_faster_rcnn/test/annotations/instances_test.json',
    transforms=get_transform(train=False)
)

# Create test data loader
test_loader = DataLoader(
    test_dataset,
    batch_size=16,
    shuffle=False,
    collate_fn=collate_fn,
    num_workers=workers,
    pin_memory=True
)

# Load the best model checkpoint

model_path = os.path.join(directory_finetuned_model, 'best_model.pth')
checkpoint = torch.load(model_path, map_location=device)

model.load_state_dict(checkpoint['model_state_dict'])
model.to(device)

# Load ground truth annotations for test set
coco_test = COCO('data/data_faster_rcnn/test/annotations/instances_test.json')

# Evaluate on test set
test_metrics = validate(model, test_loader, coco_test, device)

# Print test metrics
print(f"Test Metrics - mAP: {test_metrics[0]:.4f}")
print(f"mAP@0.5: {test_metrics[1]:.4f}, mAP@0.75: {test_metrics[2]:.4f}")
print(f"mAP small: {test_metrics[3]:.4f}, mAP medium: {test_metrics[4]:.4f}, mAP large: {test_metrics[5]:.4f}")

In [None]:
# /tmp/ipykernel_2090903/2616491437.py:20: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
#   checkpoint = torch.load(best_checkpoint_path)
# loading annotations into memory...
# Done (t=0.00s)
# creating index...
# index created!
# Validation: 100%|██████████| 62/62 [02:38<00:00,  2.56s/it, Processed=1095]
# Test Metrics - mAP: 0.5634
# mAP@0.5: 0.9892, mAP@0.75: 0.4738
# mAP small: -1.0000, mAP medium: 0.4648, mAP large: 0.5724

## Data prep for augmented weather evaluation

In [None]:
# Loop through the test images and run predictions
test_images_path = "data/data_faster_rcnn/test/images"

# frames available
test_images_path_list = os.listdir(test_images_path)
random.shuffle(test_images_path_list)

# Predict on the first 20 images
for count, image_name in enumerate(test_images_path_list[:20]):
    image_path = os.path.join(test_images_path, image_name)
    predict_image(image_path, threshold=0.4)

In [None]:

def visualize(image):
    plt.figure(figsize=(20, 10))
    plt.axis('off')
    plt.imshow(image)

test_images_path = "data/data_faster_rcnn/test/images"

# frames available
test_images_path_list = os.listdir(test_images_path)
random.shuffle(test_images_path_list)
path = os.path.join(test_images_path, test_images_path_list[0])
image = cv2.imread(path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

transform = A.Compose(
    [A.RandomRain(brightness_coefficient=0.9, drop_width=1, blur_value=5, p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

In [None]:
transform = A.Compose(
    [A.RandomSunFlare(flare_roi=(0, 0, 1,0.5), angle_lower=1, p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

In [None]:
transform = A.Compose(
    [A.RandomShadow(num_shadows_lower=1, num_shadows_upper=1, shadow_dimension=5, shadow_roi=(0, 0.5, 1, 1), p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

In [None]:
transform = A.Compose(
    [A.RandomFog(fog_coef_lower=0.7, fog_coef_upper=0.8, alpha_coef=0.1, p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])


In [None]:
import os
import random
from PIL import Image
import cv2
import albumentations as A

# frames available
train_images_path = "data/data_faster_rcnn/train/images"
train_list = os.listdir(train_images_path)
random.shuffle(train_list)

def add_to_dataset(image, dataset_name):
    """
    Add an image to the specified dataset.
    
    Parameters:
    - image (numpy.ndarray): The image to add (as a NumPy array).
    - dataset_name (str): The name of the dataset to add the image to.
    """
    # Get the dataset directory
    dataset_dir = os.path.join('data/augmented_images', dataset_name)
    
    # Create the dataset directory if it doesn't exist
    os.makedirs(dataset_dir, exist_ok=True)
    
    # Get the image filename
    image_id = len(os.listdir(dataset_dir)) + 1
    image_filename = f"{image_id}.jpg"
    
    # Convert NumPy array to PIL image
    image_pil = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    
    # Save the image
    image_pil.save(os.path.join(dataset_dir, image_filename))


# Loop over train list
for count, image_name in enumerate(train_list):
    image_path = os.path.join(train_images_path, image_name)
    
    # Load image using OpenCV
    image = cv2.imread(image_path)
    
    # Pick one weather augmentation at random for this image
    random_string = random.choice(["rain", "sunflare", "shadow", "fog"])

    # Apply augmentation based on the random choice
    if random_string == "rain":
        transform = A.Compose(
            [A.RandomRain(brightness_coefficient=0.9, drop_width=1, blur_value=5, p=1)],
        )

    elif random_string == "sunflare":
        transform = A.Compose(
            [A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), angle_lower=1, p=1)],
        )

    elif random_string == "shadow":
        transform = A.Compose(
            [A.RandomShadow(num_shadows_lower=1, num_shadows_upper=1, shadow_dimension=5, shadow_roi=(0, 0.5, 1, 1), p=1)],
        )

    elif random_string == "fog":
        transform = A.Compose(
            [A.RandomFog(fog_coef_lower=0.7, fog_coef_upper=0.8, alpha_coef=0.1, p=1)],
        )

    # Apply transformation
    transformed = transform(image=image)
    
    # Save transformed image
    add_to_dataset(transformed['image'], random_string)


In [None]:
# go over each set rain, sunflare, shadow, fog and predict the images

# base path
base_path = "data/augmented_images"

# loop over each set

for set_name in ["rain", "sunflare", "shadow", "fog"]:
    # Get all images in the set
    set_path = os.path.join(base_path, set_name)

    # loop over each image and use the predict_image function
    for count, image_name in enumerate(os.listdir(set_path)):
        image_path = os.path.join(set_path, image_name)
        predict_image(image_path, threshold=0.4)

## Model metrics

<img src="data/models/validation.png" alt="validation over epochs" width=1000>



In [None]:
# open training log file

log_file_path = os.path.join("data","models", "training_log.txt")

with open(log_file_path, "r") as log_file:
    print(log_file.read())

The fine-tuned Faster R-CNN model was trained for 18 epochs. The best value reached for each metric (highest for the mAP scores, lowest for the losses) was:

- **mAP@IoU=0.50:0.95 (overall mAP)** → Epoch 9, value: 0.5303
- **mAP@IoU=0.50** → Epochs 7 and 12 to 18, value: 0.9792
- **mAP@IoU=0.75** → Epoch 6, value: 0.4170
- **Medium mAP** → Epoch 8, value: 0.4431
- **Large mAP** → Epoch 6, value: 0.5395
- **RPN box regression loss** → Epochs 5 to 18, value: 0.0008
- **Objectness loss** → Epochs 5 to 18, value: 0.0007
- **Box regression loss** → Epoch 18, value: 0.0341
- **Classifier loss** → Epochs 4 and 5, value: 0.0182
- **Train loss** → Epochs 7, 10, 12, and 18, value: 0.0540


The checkpoint chosen as the best model is the one with the highest mAP@IoU=0.50:0.95 (overall mAP), which is the checkpoint at epoch 9.
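
The selection rule can be sketched as follows. This is a minimal illustration, assuming each entry follows the `train_metrics_list` layout used by `plot_metrics` (an `epoch` key and a `val_metrics` sequence whose first element is the overall mAP). The helper `select_best_epoch` and the sample values are hypothetical and are not the project's training log.

```python
def select_best_epoch(train_metrics_list):
    """Return (epoch, overall mAP) for the entry with the highest overall validation mAP."""
    best = max(train_metrics_list, key=lambda entry: entry["val_metrics"][0])
    return best["epoch"], best["val_metrics"][0]


# Made-up history for illustration only (index 0 = mAP@IoU=0.50:0.95)
history = [
    {"epoch": 8, "val_metrics": [0.52, 0.97, 0.40]},
    {"epoch": 9, "val_metrics": [0.53, 0.98, 0.41]},
    {"epoch": 10, "val_metrics": [0.51, 0.98, 0.39]},
]

best_epoch, best_map = select_best_epoch(history)
print(f"Best epoch: {best_epoch}, overall mAP: {best_map:.4f}")
```

When two epochs tie, `max` keeps the earliest one, so the shorter training run wins.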

When evaluating the best model on the test set we get these metrics:
- mAP@IoU=0.50:0.95: 0.5634
- mAP@0.5: 0.9892
- mAP@0.75: 0.4738
- mAP medium: 0.4648
- mAP large: 0.5724

These results are slightly better than the best results on the validation set.
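
The large gap between mAP@0.5 and mAP@0.75 suggests that plates are usually found, but that the predicted boxes often overlap the ground truth only loosely. The sketch below shows how intersection over union (IoU) decides whether a detection counts at each threshold. The function `box_iou` and the two boxes are made-up illustrations, not project data.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give an empty intersection
    inter_w = max(0.0, inter_x2 - inter_x1)
    inter_h = max(0.0, inter_y2 - inter_y1)
    inter_area = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union_area = area_a + area_b - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Hypothetical ground-truth plate and a slightly shifted, slightly larger prediction
ground_truth = (100, 100, 200, 160)
prediction = (110, 105, 215, 170)

iou = box_iou(ground_truth, prediction)
for iou_threshold in (0.50, 0.75):
    verdict = "match" if iou >= iou_threshold else "miss"
    print(f"IoU={iou:.3f} at threshold {iou_threshold:.2f}: {verdict}")
```

A prediction like this one counts as correct at IoU 0.50 but as a miss at IoU 0.75, which is the pattern the two metrics describe.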

## Training analysis
Based on the training and validation metrics, additional training with the current configuration (data and hyperparameters) yields diminishing returns. The training loss stays at approximately 0.0540 from epoch 7 through epoch 18, indicating little improvement with further training. The validation mAP scores likewise plateau between epochs 4 and 18, showing no significant change.

Therefore, it can be inferred that training for around 7 epochs provides near-optimal results while minimizing the time spent on training.
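
One way to act on this observation is to stop training automatically once the validation mAP stops improving. The sketch below is an assumption on our part and not something the training loop in this notebook does. `should_stop`, `patience`, and `min_delta` are hypothetical names, and the history values are illustrative.

```python
def should_stop(val_map_history, patience=3, min_delta=0.001):
    """Return True when the last `patience` epochs did not beat the earlier best by `min_delta`."""
    if len(val_map_history) <= patience:
        return False  # Not enough epochs yet to judge a plateau
    best_before = max(val_map_history[:-patience])
    recent_best = max(val_map_history[-patience:])
    return recent_best < best_before + min_delta


# Illustrative validation mAP values, one per epoch
val_map_history = [0.30, 0.42, 0.50, 0.52, 0.52, 0.51, 0.52]

for epoch in range(1, len(val_map_history) + 1):
    if should_stop(val_map_history[:epoch]):
        print(f"Plateau detected, stopping after epoch {epoch}")
        break
```

The `patience` window guards against stopping on a single noisy epoch, at the cost of a few extra epochs of training.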

## False positives

Predictions:
- hazmat: 0.88
- hazmat: 0.15
- hazmat: 0.14
- hazmat: 0.06

<br>
<img src="images/predictions/bmw_prediction.png" alt="Model prediction on a BMW licence plate" width=600>


Predictions:
- hazmat: 0.82
- hazmat: 0.81
- hazmat: 0.14
- hazmat: 0.11
- hazmat: 0.06
- hazmat: 0.06
- hazmat: 0.06
- hazmat: 0.06
- hazmat: 0.05
- hazmat: 0.05

<img src="images/predictions/trafficsignpred.png" alt="Model prediction on a traffic sign" width=600>


The model appears to produce some false positives, mistakenly identifying certain objects as hazmat placards when they are not. This issue is particularly prevalent with objects that are square and have colors such as yellow, red, or orange. Expanding the training dataset and applying data augmentation techniques should help mitigate this problem by improving the model's ability to differentiate between actual placards and visually similar objects.
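
Many of the detections listed above have very low confidence and disappear once a confidence threshold is applied. The sketch below applies the 0.4 threshold used elsewhere in this notebook to the scores from the two listings. `filter_by_confidence` is a hypothetical helper written for this illustration.

```python
def filter_by_confidence(scores, threshold):
    """Keep only the detection scores at or above the confidence threshold."""
    return [score for score in scores if score >= threshold]


# Scores copied from the two prediction listings in this section
licence_plate_scores = [0.88, 0.15, 0.14, 0.06]
traffic_sign_scores = [0.82, 0.81, 0.14, 0.11, 0.06, 0.06, 0.06, 0.06, 0.05, 0.05]

for name, scores in [("licence plate", licence_plate_scores),
                     ("traffic sign", traffic_sign_scores)]:
    kept = filter_by_confidence(scores, threshold=0.4)
    print(f"{name}: {len(scores)} detections, {len(kept)} kept: {kept}")
```

The high-confidence false positives (0.88, 0.82, 0.81) survive the threshold, which is why thresholding alone is not enough and more varied training data is still needed.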

## False negatives

<img src="images/predictions/hazmatclose.png" width=600>

In certain cases, the model failed to detect UN number placards in high-resolution images. This likely happens because the training dataset consists mainly of images in which the placards were captured from a distance, so the model may have overfitted to placards that appear relatively small in the frame. Targeted data augmentation, such as zooming in on UN numbers, rotating them (e.g., upside down), and varying their orientation, can help the model generalize to different scales and perspectives.

## Weather Conditions

For this project, it was essential to develop a robust model capable of performing well under various weather conditions. However, our dataset primarily consisted of images captured under different lighting conditions, such as day and night, without significant weather variations.  

To address this limitation, we evaluated the model using augmented images generated with the **Albumentations** library, applying weather-related transformations such as **rain, sunflare, shadow, and fog.** The model performed reasonably well on these augmented images, successfully identifying objects in most cases. However, it is important to note that data augmentation does not always provide a fully realistic representation of real-world weather conditions.  

While the results indicate that the model can likely handle different weather conditions to some extent, further improvements are needed to ensure robust performance in real-life scenarios. Incorporating actual weather-diverse data, along with advanced augmentation techniques that simulate real-world complexity more accurately, would enhance the model's generalization capabilities.

### Predictions on Augmented Images:

<h4> Rain </h4>
<img src="images/predictions/rainpred.png" alt="Augmented image with rain effect" width="400"/>
<h4> Flare </h4>
<img src="images/predictions/flare.png" alt="Augmented image with sunflare effect" width="400"/>
<h4> Shadows </h4>
<img src="images/predictions/shadowspred.png" alt="Augmented image with shadow effect" width="400"/>
<h4> Fog </h4>
<img src="images/predictions/fog_pred.png" alt="Augmented image with fog effect" width="400"/>

## Reflection on Data Mining Goals

### **Primary Data Mining Goal:**
Create and train an object detection model capable of identifying and interpreting UN number hazard plates on freight wagons in real-time.

---

### **Specific Data Mining Goals:**

#### **1. Object Detection and Localization:**
Develop a model that achieves a high AP score for accurately detecting and localizing hazard plates on freight wagons within each video frame.

**Approach and Outcome:**  
To accomplish this goal, we fine-tuned a Faster R-CNN model, which showed promising results in localizing hazard placards. However, the model is not yet fully robust and occasionally produces false positives. These false detections often occur when the scene contains objects with similar visual characteristics, such as square shapes and colors resembling hazard placards (e.g., yellow, red, or orange).

**Improvement Strategies:**
- **1.1: Expanding the training dataset** with a greater variety of real-world scenarios to improve generalization.
- **1.2: Advanced data augmentation**, such as applying transformations that simulate real-life conditions (e.g., partial occlusion, varying angles, and different lighting conditions).

---

#### **2. Robustness Across Variable Conditions:**
Enhance the model’s robustness by training it on datasets representing diverse lighting and weather conditions, with a goal to maintain high AP scores across these environments.

**Approach and Outcome:**  
While we did not have access to datasets covering a wide range of weather conditions, we leveraged data captured at different times of the day, covering various lighting conditions such as daytime, nighttime, and low-light scenarios. The model demonstrated strong performance across these lighting variations, indicating a certain level of robustness in this aspect.

To evaluate its performance under different weather conditions, we used data augmentation techniques from the **Albumentations** library. Augmentations like rain, sunflare, shadow, and fog were introduced to simulate adverse weather conditions. Although the model performed reasonably well on these augmented images, it is important to acknowledge that synthetic augmentations do not fully replicate real-world conditions.

**Further Improvements:**
To improve the model’s robustness, it would be beneficial to collect and incorporate real-world data by filming freight wagons across different seasons and weather conditions. This would ensure the model can generalize better to practical scenarios. Additional techniques to enhance robustness, such as advanced augmentation strategies, are discussed in sections **1.1** and **1.2**.

---

#### **3. Optimization for Real-Time Processing:**
Implement real-time object detection and OCR capabilities to ensure the model operates at a frame rate suitable for analyzing images from moving trains.

**Approach and Outcome:**  
Currently, the model does not run in real time. Inference can take up to **2.8 seconds per frame**, and reading the hazard placard (OCR) requires an additional **2 seconds**, for a total of **up to about 4.8 seconds per frame**. This is far from real-time requirements, which typically call for around **30 frames per second (FPS)** or faster, depending on the train's speed and camera setup.
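
To put these numbers in perspective, the short calculation below compares the measured per-frame time with the time budget at 30 FPS. It only restates the figures from the paragraph above; the variable names are ours.

```python
# Figures taken from the text above
detection_seconds = 2.8   # Faster R-CNN inference, worst case
ocr_seconds = 2.0         # Reading the placard
target_fps = 30

total_seconds = detection_seconds + ocr_seconds
achieved_fps = 1.0 / total_seconds
budget_ms = 1000.0 / target_fps
slowdown = (total_seconds * 1000.0) / budget_ms

print(f"Total time per frame: {total_seconds:.1f} s ({achieved_fps:.2f} FPS)")
print(f"Budget per frame at {target_fps} FPS: {budget_ms:.1f} ms")
print(f"Required speed-up: about {slowdown:.0f}x")
```

At these timings the pipeline would need to become roughly two orders of magnitude faster to keep up with a 30 FPS stream.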

**Optimization Strategies:**
To enhance processing speed, several optimization strategies can be considered:
- **3.1: Model Quantization:** Reducing the precision of model parameters (e.g., from 32-bit floating point to 8-bit integers) to speed up computations with minimal accuracy loss.
- **3.2: Efficient Attention Mechanisms:** Using lightweight attention models to focus computational resources on relevant regions, improving both speed and accuracy.
- **3.3: Model Pruning:** Removing redundant weights and layers to reduce computation overhead.
- **3.4: Hardware Acceleration:** Leveraging GPUs, TPUs, or edge AI devices for faster inference.
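
As an illustration of the quantization idea, the sketch below maps a handful of floating-point weights to 8-bit integers with a single shared scale and then restores them. It is a toy example in plain Python, assuming symmetric per-tensor quantization. It is not how the model in this notebook was processed, and a real deployment would rely on the tooling of the chosen framework. `quantize_int8`, `dequantize`, and the weight values are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


# Made-up weights for illustration
weights = [0.5, -1.0, 0.25, 0.0, 0.73]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"Quantized values: {quantized}")
print(f"Largest rounding error: {max_error:.5f} (scale = {scale:.5f})")
```

Each weight moves by at most half a quantization step, which is the source of the small accuracy loss mentioned in item 3.1.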

#### Data Mining Success Criteria evaluation

- **Object Detection AP**: Achieve a Mean Average Precision (mAP) of at least 0.70 for detecting and localizing hazard plates across varied conditions.  
  **Outcome:** Not achieved. The best validation mAP was 0.5303 (0.5634 on the test set), below the 0.70 target.

- **OCR Precision for UN Numbers**: Ensure the Tesseract OCR module achieves high accuracy in reading UN numbers, even under challenging conditions, with a target precision score above 0.95.  
  **Outcome:** Tesseract was unable to consistently recognize codes in difficult conditions, so we switched to using the **idefics2 VLM**, which performed significantly better, even with low-quality images. However, accuracy metrics for idefics2 have not been formally evaluated.

- **Processing Speed**: Ensure the model achieves a processing time per frame under 100 milliseconds to maintain real-time functionality.  
  **Outcome:** Not achieved. Detection alone can take up to 2.8 seconds per frame, and the OCR stage with **idefics2** adds roughly 2 seconds, so the total is far above the 100-millisecond target.

- **Environmental Robustness**: Maintain consistent mAP scores across different lighting and weather conditions.  
  **Outcome:** Partly achieved. Lighting conditions were well represented in the dataset, and the model performed consistently across different lighting variations. Weather conditions, however, were not included in the dataset, but the model performed reasonably well when evaluated with augmented data. The mAP score was not measured for augmented weather conditions.

# Deployment

## Demo for YOLO model

In [None]:
# Imports needed if this cell is run on its own
import cv2
import torch
from ultralytics import YOLO

# Load the trained YOLO model
model = YOLO("./data/yolo/best_augmented_scaled.pt", task="detect")
# Check if CUDA is available and set the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Function to draw bounding boxes on the frame
def draw_boxes(frame, results):
    for result in results:
        boxes = result.boxes
        for box in boxes:
            x_min, y_min, x_max, y_max = map(int, box.xyxy[0])  # Convert to integers
            confidence = float(box.conf[0])  # Confidence score
            class_id = int(box.cls[0])  # Class ID
            label = result.names[class_id]  # Class label

            # Draw the bounding box
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
            # Put the label and confidence score above the box
            cv2.putText(frame, f"{label} {confidence:.2f}", (x_min, y_min - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame

# Open the camera
cap = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        break

    # Predict the frame using the YOLO model
    results = model(frame,stream=True)

    # Draw bounding boxes on the frame
    frame = draw_boxes(frame, results)

    # Display the resulting frame
    cv2.imshow('Frame', frame)

    # Break the loop on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()