<a href="https://www.kaggle.com/code/andriybabiy/03-cnn-applications-indoor-object-detection?scriptVersionId=201094349" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 03-CNN_Applications-Indoor-Object-Detection
Using the indoor-object-detection dataset, my goal is to develop and train a deep learning model that recognises indoor items in an image.

# Importing the required libraries

In [None]:
!pip install --upgrade ultralytics
!pip install --upgrade "ray[tune]"
!pip install -U ipywidgets

In [None]:
import warnings
warnings.filterwarnings("ignore")

import os
import re
import glob
import random
import yaml

import torch
import gc

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import seaborn as sns

from PIL import Image
import cv2

from ultralytics import YOLO

%matplotlib inline

! wandb disabled

# Importing and preparing the dataset of images

In [None]:
class CFG:
    DEBUG = False
    FRACTION = 0.05 if DEBUG else 1.0
    SEED = 42
    
    CLASSES = ['door',
               'cabinetDoor',
               'refrigeratorDoor',
               'window',
               'chair',
               'table',
               'cabinet',
               'couch',
               'openedDoor',
               'pole']
    NUM_CLASSES_TO_TRAIN = len(CLASSES)
    
    EPOCHS = 3 if DEBUG else 100 #70
    BATCH_SIZE = 16
    
    BASE_MODEL = 'yolov9s'
    BASE_MODEL_WEIGHTS = f'{BASE_MODEL}.pt'
    EXP_NAME = f'ppe_css_{EPOCHS}_epochs'
    
    OPTIMIZER = 'auto'
    LR = 1e-5
    LR_FACTOR = 0.01
    WEIGHT_DECAY = 5e-4
    DROPOUT = 0.0
    PATIENCE = 30
    PROFILE = False
    LABEL_SMOOTHING = 0.0
    
    CUSTOM_DATASET_DIR = '/kaggle/input/indoor-object-detection'
    OUTPUT_DIR = '/kaggle/working/'

In [None]:
dict_file = {
    'train': os.path.join(CFG.CUSTOM_DATASET_DIR, 'train'),
    'val': os.path.join(CFG.CUSTOM_DATASET_DIR, 'valid'),
    'test': os.path.join(CFG.CUSTOM_DATASET_DIR, 'test'),
    'nc': CFG.NUM_CLASSES_TO_TRAIN,
    'names': CFG.CLASSES
    }

with open(os.path.join(CFG.OUTPUT_DIR, 'data.yaml'), 'w+') as file:
    yaml.dump(dict_file, file)

In [None]:
### read yaml file created
def read_yaml_file(file_path = os.path.join(CFG.OUTPUT_DIR, 'data.yaml')):
    with open(file_path, 'r') as file:
        try:
            data = yaml.safe_load(file)
            return data
        except yaml.YAMLError as e:
            print("Error reading YAML:", e)
            return None

### print it with newlines
def print_yaml_data(data):
    formatted_yaml = yaml.dump(data, default_style=False)
    print(formatted_yaml)

file_path = os.path.join(CFG.OUTPUT_DIR, 'data.yaml')
yaml_data = read_yaml_file(file_path)

if yaml_data:
    print_yaml_data(yaml_data)

# Looking into the dataset components

In [None]:
def display_image(image, print_info = True, hide_axis = False):
    if isinstance(image, str):  # Check if it's a file path
        img = Image.open(image)
        plt.imshow(img)
    elif isinstance(image, np.ndarray):  # Check if it's a NumPy array
        image = image[..., ::-1]  # BGR to RGB
        img = Image.fromarray(image)
        plt.imshow(img)
    else:
        raise ValueError("Unsupported image format")

    if print_info:
        print('Type: ', type(img), '\n')
        print('Shape: ', np.array(img).shape, '\n')

    if hide_axis:
        plt.axis('off')

    plt.show()

In [None]:
dict_file['train']

In [None]:
example_image_path = f'{CFG.CUSTOM_DATASET_DIR}/train/images/000bf0ddff4c7310.jpg'

display_image(example_image_path)

In [None]:
def plot_random_images_from_folder(folder_path, num_images=20, seed=CFG.SEED):
    random.seed(seed)
    
    # Get a list of image files in the folder
    image_files = [f for f in os.listdir(folder_path) if f.endswith(('.jpg', '.png', '.jpeg', '.gif'))]
    
    # Ensure that we have at least num_images files to choose from 
    if len(image_files) < num_images: 
        raise ValueError("Not enough images in the folder")
    
    # Randomly select num_images image files
    selected_files = random.sample(image_files, num_images)
    
    # Create a subplot grid
    num_cols = 5
    num_rows = (num_images + num_cols - 1) // num_cols
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 8))
    
    for i, file_name in enumerate(selected_files):
        # Open and display the images using PIL
        img = Image.open(os.path.join(folder_path, file_name))
        
        if num_rows == 1:
            ax = axes[i % num_cols]
        else:
            ax = axes[i // num_cols, i % num_cols]
            
        ax.imshow(img)
        ax.axis('off')
        
    # Remove empty subplots
    for i in range(num_images, num_rows * num_cols):
        if num_rows == 1:
            fig.delaxes(axes[i % num_cols])
        else:
            fig.delaxes(axes[i // num_cols, i % num_cols])
    
    plt.tight_layout()
    plt.show()

In [None]:
folder_path = CFG.CUSTOM_DATASET_DIR + '/train/images'
plot_random_images_from_folder(folder_path)

In [None]:
def get_image_properties(image_path):
    # Read the image file
    img = cv2.imread(image_path)
    
    # Check if the image file is read correctly
    if img is None:
        raise ValueError("Could not read image file")
        
    properties = {
        "width": img.shape[1],
        "height": img.shape[0],
        "channels": img.shape[2] if len(img.shape) == 3 else 1,
        "dtype": img.dtype
    }
    
    return properties

In [None]:
img_properties = get_image_properties(example_image_path)
img_properties

In [None]:
class_idx = {str(i): CFG.CLASSES[i] for i in range(CFG.NUM_CLASSES_TO_TRAIN)}

class_stat = {}
data_len = {}
class_info = []

    
for mode in ['train', 'valid', 'test']:
    class_count = {CFG.CLASSES[i]: 0 for i in range(CFG.NUM_CLASSES_TO_TRAIN)}

    path = os.path.join(CFG.CUSTOM_DATASET_DIR, mode, 'labels')

    for file in os.listdir(path):
        with open(os.path.join(path, file)) as f:
            lines = f.readlines()

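            # use the whole first token so multi-digit class ids and blank lines are handled
            for cls in set(line.split()[0] for line in lines if line.strip()):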
                class_count[class_idx[cls]] += 1

    data_len[mode] = len(os.listdir(path))
    class_stat[mode] = class_count

    class_info.append({'Mode': mode, **class_count, 'Data_Volume': data_len[mode]})
            
dataset_stats_df = pd.DataFrame(class_info)
with pd.option_context('display.max_columns', None):
    display(dataset_stats_df)


In [None]:
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for i, mode in enumerate(['train', 'valid', 'test']):
    sns.barplot(
        data=dataset_stats_df[dataset_stats_df['Mode'] == mode].drop(columns="Mode"),
        orient='v',
        ax=axes[i],
        palette='Set2'        
    )
    
    axes[i].set_title(f'{mode.capitalize()} Class Statistics')
    axes[i].set_xlabel('Classes')
    axes[i].set_ylabel('Count')
    axes[i].tick_params(axis='x', rotation=90)
    
    # add annotations on top of each bar
    for p in axes[i].patches:
        axes[i].annotate(f"{int(p.get_height())}", (p.get_x() + p.get_width() / 2., p.get_height()),
                         ha='center', va='center', fontsize=8, color='black', xytext=(0, 5),
                         textcoords='offset points')

plt.tight_layout()
plt.show()

# Training the model

In [None]:
model = YOLO(CFG.BASE_MODEL_WEIGHTS)

results = model.predict(
    source = example_image_path,
    
    classes = [0],
    conf = 0.30,
#     device = [0, 1], # inference with dual GPU
    device = None, # let Ultralytics select the device automatically
    imgsz = (img_properties['height'], img_properties['width']),
    
    save = True,
    save_txt = True,
    save_conf = True,
    exist_ok = True
)

In [None]:
example_image_inference_output = example_image_path.split('/')[-1]
display_image(f'runs/detect/predict/{example_image_inference_output}')

In [None]:
print('Model:', CFG.BASE_MODEL_WEIGHTS)
print('Epochs: ', CFG.EPOCHS)
print('Batch: ', CFG.BATCH_SIZE)

In [None]:
model = YOLO(CFG.BASE_MODEL_WEIGHTS)

In [None]:
# def cleanup_memory_callback(trainer):
#     gc.collect()
#     torch.cuda.empty_cache()
#     print("Memory cleaned after epoch")

def cleanup_before_training():
    gc.collect()
    torch.cuda.empty_cache()
    print("Memory cleaned before training")

In [None]:
%%time

cleanup_before_training()
os.environ['WANDB_MODE'] = 'offline'

### train
model.train(
    data = os.path.join(CFG.OUTPUT_DIR, 'data.yaml'),

    task = 'detect',

    imgsz = (img_properties['height'], img_properties['width']),

    epochs = CFG.EPOCHS,
    batch = CFG.BATCH_SIZE,
    optimizer = CFG.OPTIMIZER,
    lr0 = CFG.LR,
    lrf = CFG.LR_FACTOR,
    weight_decay = CFG.WEIGHT_DECAY,
    dropout = CFG.DROPOUT,
    fraction = CFG.FRACTION,
    patience = CFG.PATIENCE,
    profile = CFG.PROFILE,
    label_smoothing = CFG.LABEL_SMOOTHING,

    name = f'{CFG.BASE_MODEL}_{CFG.EXP_NAME}',
    seed = CFG.SEED,
    
    val = True,
    amp = True,    
    exist_ok = True,
    resume = False,
    device = [0], 
#     device = 'cpu', # CPU run
    verbose = False,
)

In [None]:
model.export(
    format = 'onnx',
    imgsz = (img_properties['height'], img_properties['width']),
    half = False,
    int8 = False,
    simplify = False,
    nms = False
)

In [None]:
results_paths = [
    i for i in 
    glob.glob(f'{CFG.OUTPUT_DIR}runs/detect/{CFG.BASE_MODEL}_{CFG.EXP_NAME}/*.png') +
    glob.glob(f'{CFG.OUTPUT_DIR}runs/detect/{CFG.BASE_MODEL}_{CFG.EXP_NAME}/*.jpg')
    if 'batch' not in i
]

results_paths

In [None]:
for file in sorted(results_paths):
    print(file)
    display_image(file, print_info = False, hide_axis = True)
    print('\n')

In [None]:
df = pd.read_csv(f'{CFG.OUTPUT_DIR}runs/detect/{CFG.BASE_MODEL}_{CFG.EXP_NAME}/results.csv')
df = df.rename(columns=lambda x: x.replace(" ", ""))
df.to_csv(f'{CFG.OUTPUT_DIR}training_log_df.csv', index=False)
df

In [None]:
print('*'*50)
print('\nBest Training Box loss: ', df['train/box_loss'].min(), ', on epoch: ', df['train/box_loss'].argmin() + 1, '\n')
print('\nBest Validation Box loss: ', df['val/box_loss'].min(), ', on epoch: ', df['val/box_loss'].argmin() + 1, '\n')

print('='*50)
print('\nBest Training Cls loss: ', df['train/cls_loss'].min(), ', on epoch: ', df['train/cls_loss'].argmin() + 1, '\n')
print('\nBest Validation Cls loss: ', df['val/cls_loss'].min(), ', on epoch: ', df['val/cls_loss'].argmin() + 1, '\n')

print('='*50)
print('\nBest Training DFL loss: ', df['train/dfl_loss'].min(), ', on epoch: ', df['train/dfl_loss'].argmin() + 1, '\n')
print('\nBest Validation DFL loss: ', df['val/dfl_loss'].min(), ', on epoch: ', df['val/dfl_loss'].argmin() + 1, '\n')

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 15), sharex=True)

### Training and Validation box_loss
ax1.set_title('Box Loss')
ax1.plot(df['epoch'], df['train/box_loss'], label='Training box_loss', marker='o', linestyle='-')
ax1.plot(df['epoch'], df['val/box_loss'], label='Validation box_loss', marker='o', linestyle='-')
ax1.set_ylabel('Box Loss')
ax1.legend()
ax1.grid(True)

### Training and Validation cls_loss
ax2.set_title('Cls Loss')
ax2.plot(df['epoch'], df['train/cls_loss'], label='Training cls_loss', marker='o', linestyle='-')
ax2.plot(df['epoch'], df['val/cls_loss'], label='Validation cls_loss', marker='o', linestyle='-')
ax2.set_ylabel('cls_loss')
ax2.legend()
ax2.grid(True)

### Training and Validation dfl_loss
ax3.set_title('DFL Loss')
ax3.plot(df['epoch'], df['train/dfl_loss'], label='Training dfl_loss', marker='o', linestyle='-')
ax3.plot(df['epoch'], df['val/dfl_loss'], label='Validation dfl_loss', marker='o', linestyle='-')
ax3.set_xlabel('Epochs')
ax3.set_ylabel('dfl_loss')
ax3.legend()
ax3.grid(True)

plt.suptitle('Training Metrics vs. Epochs')
plt.show()

In [None]:
validation_results_paths = [
    i for i in
    glob.glob(f'{CFG.OUTPUT_DIR}runs/detect/{CFG.BASE_MODEL}_{CFG.EXP_NAME}/*.png') +
    glob.glob(f'{CFG.OUTPUT_DIR}runs/detect/{CFG.BASE_MODEL}_{CFG.EXP_NAME}/*.jpg')
    if 'val_batch' in i
]

len(validation_results_paths)

In [None]:
if len(validation_results_paths) >= 1:
    print(validation_results_paths[-1])

In [None]:
### check predictions or labels from a random validation batch
if len(validation_results_paths) >= 1:
    val_img_path = random.choice(validation_results_paths)
    print(val_img_path)
    display_image(val_img_path, print_info = False, hide_axis = True)

# Result analysis

In this analysis, using the indoor-object-detection dataset and the YOLO model, it is evident that a convolutional neural network can be used to detect and classify the elements found in the images.

In the initial run I trained the YOLOv8s model for 100 epochs, which took around 40 minutes. The model worked, but the accuracy was quite low: the F1-confidence curve peaked at an F1 score of 42% at a confidence threshold of around 20%. This earlier version of YOLO also showed that many of the smaller elements were often confused with the background. However the learning curves for the

Following this result I decided to run the analysis once again using a newer version, YOLOv9s. This model was also trained for 100 epochs, but training took 2.2 hours. The newer model showed slightly improved results on the F1-confidence curve, reaching a peak F1 score of 45% at a confidence threshold of 23%. This is an improvement on the previous model, but it shows that there is still a lot of room for improvement. The F1, precision and recall graphs show that some classes (cabinetDoor and refrigeratorDoor) outperform the rest, with F1 scores close to 80%, which suggests that the classes with more instances have better overall performance. This behaviour is also evident in the normalized confusion matrix, where the classes with low representation in the dataset are often confused with the background. Another interesting occurrence is that the openedDoor and door classes are often confused with each other in the normalized confusion matrix. This suggests that the model may be learning the general shape of the door, including the frame, which causes the confusion.
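
For reference, each point on an F1-confidence curve is the harmonic mean of precision and recall at that confidence threshold, and the reported figure is the peak of that curve. The snippet below is a minimal sketch of that calculation. The helper names `f1_score` and `best_f1` are hypothetical, and the precision and recall values are made up for illustration; they are not measurements from this notebook.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_f1(thresholds, precisions, recalls):
    """Return (threshold, f1) for the threshold with the highest F1."""
    scores = [f1_score(p, r) for p, r in zip(precisions, recalls)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return thresholds[best], scores[best]


# Illustrative values only, not results from this run.
thresholds = [0.1, 0.2, 0.3, 0.5]
precisions = [0.2, 0.4, 0.6, 0.9]
recalls = [0.8, 0.6, 0.3, 0.1]

threshold, score = best_f1(thresholds, precisions, recalls)
print(f"peak F1 = {score:.2f} at confidence threshold {threshold}")
```

Raising the threshold trades recall for precision, so the peak marks the threshold where the two are best balanced.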

The loss graphs also show that some overfitting is occurring: the box loss and cls loss flatten on the validation data while they keep decreasing on the training data. The overfitting is most evident in the DFL loss, whose validation curve actually becomes worse over time, showing that the extra training on the training data is not generalizing to unseen data.
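
One way to make this observation less dependent on reading the plots is to find the epoch with the lowest validation loss and check how the gap between validation and training loss behaves afterwards. The following is a minimal sketch with hypothetical helper names and made-up loss values; in this notebook the inputs could come from columns such as `df['train/dfl_loss'].tolist()` and `df['val/dfl_loss'].tolist()`.

```python
def epochs_since_best(val_losses):
    """Return (best_epoch, epochs_without_improvement), epochs are 1-based."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1, len(val_losses) - 1 - best_index


def generalization_gap(train_losses, val_losses):
    """Per-epoch difference between validation and training loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]


# Illustrative loss values only, not taken from results.csv.
train_dfl = [1.60, 1.45, 1.35, 1.28, 1.22, 1.18]
val_dfl = [1.70, 1.58, 1.55, 1.57, 1.61, 1.66]

best_epoch, stale = epochs_since_best(val_dfl)
gap = generalization_gap(train_dfl, val_dfl)

print(f"best validation epoch: {best_epoch}, epochs without improvement: {stale}")
print("gap wider at the end than at the best epoch:", gap[-1] > gap[best_epoch - 1])
```

A validation minimum that lies many epochs in the past, combined with a widening gap, is the pattern described above for the DFL loss.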

Based on these elements, the conclusion is that the training was adequate for some of the classes but not for all of them. To further improve the performance of the model, there is room to decrease the learning rate from the current 1e-5 and to increase the number of epochs. This will increase the training time, but it may bring the results needed. This recommendation is based on the volatile curves seen in the precision and recall graphs. Another possible improvement would be to increase the number of images in the training dataset using image transformations, since the low number of examples for some classes caused issues with the training.
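
As an illustration of the image-transformation idea, a horizontal flip is one of the simplest transformations because the labels can be updated with a single formula. The sketch below assumes the label files use the usual YOLO detection layout (`class x_center y_center width height`, normalized to the range 0 to 1); the function names are hypothetical and the image is represented as a plain list of pixel rows to keep the example self-contained. Ultralytics also applies its own augmentation during training, so this only shows what such a transformation does to the data.

```python
def hflip_yolo_labels(label_lines):
    """Mirror YOLO boxes horizontally: x_center becomes 1 - x_center."""
    flipped = []
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        cls, x, y, w, h = parts[0], *map(float, parts[1:])
        flipped.append(f"{cls} {1.0 - x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return flipped


def hflip_pixels(rows):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in rows]


# Made-up label and image data for illustration.
labels = ["0 0.250000 0.500000 0.200000 0.400000", ""]
image = [[1, 2, 3], [4, 5, 6]]

print(hflip_yolo_labels(labels))
print(hflip_pixels(image))
```

Only the x coordinate of the box centre changes; the class id, the vertical position and the box size stay the same, which is why a flipped copy can be added to under-represented classes without relabelling by hand.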
