# Fine-tuning the YOLOv8 model

After preparing the data in the `preparation.ipynb` notebook, we can fine-tune the computer vision model for our specific needs. Run this notebook on a GPU-enabled machine, as training is computationally expensive. We recommend Google Colab if you don't have access to a paid cloud subscription.

## 1. Connect to your drive and import packages

In [None]:
# # Let Colab access the Google Drive where your files are stored
# from google.colab import drive
# drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import os
import shutil
from sklearn.model_selection import train_test_split

!pip install ultralytics
from ultralytics import YOLO

In [None]:
# Check if GPU is available (non-empty string means GPU available)
tf.test.gpu_device_name()

'/device:GPU:0'
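
Ultralytics runs on PyTorch rather than TensorFlow, so it can also be worth confirming that PyTorch itself sees the GPU. The following is a minimal sketch, assuming `torch` is available (it is installed as a dependency of `ultralytics`); the helper name `describe_torch_device` is hypothetical.

```python
def describe_torch_device():
    """Return a short description of the device PyTorch would use."""
    try:
        import torch  # assumption: installed alongside ultralytics
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Name of the first CUDA device visible to PyTorch
        return f"GPU: {torch.cuda.get_device_name(0)}"
    return "CPU only (no CUDA device visible to PyTorch)"

print(describe_torch_device())
```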

## 2. Training and validation data preparation

In [None]:
# Set the DataFrame to match images and labels

# Set the directories
images_directory = "/content/drive/MyDrive/Masters 24/DeepLearning/DataSets/assignment_initial/split"
labels_directory = "/content/drive/MyDrive/Masters 24/DeepLearning/DataSets/assignment_initial/labels"

# Fetch the file names and directories
image_dir = [images_directory+"/"+file for file in os.listdir(images_directory)]
image_lab = [file.removesuffix('.jpg') for file in os.listdir(images_directory)]
label_dir = [labels_directory+"/"+file for file in os.listdir(labels_directory)]
label_lab = [file.removesuffix('.txt') for file in os.listdir(labels_directory)]

# We define two DataFrames and merge just to make sure that the keys match
image_df = pd.DataFrame({'image_dir':image_dir, 'image_lab':image_lab})
labels_df = pd.DataFrame({'label_dir':label_dir, 'label_lab':label_lab})
image_label_df = pd.merge(image_df, labels_df, left_on='image_lab', right_on='label_lab')
image_label_df.head()

| | image_dir | image_lab | label_dir | label_lab |
|---|---|---|---|---|
| 0 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_0 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_0 |
| 1 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_6 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_6 |
| 2 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_4 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_4 |
| 3 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_3 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_3 |
| 4 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_1 | /content/drive/MyDrive/Masters 24/DeepLearning... | xai_med_1 |
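
Because `pd.merge` performs an inner join by default, any image without a label file (or the reverse) is silently dropped from `image_label_df`. The sketch below shows one way such orphans could be listed before merging. The helper name `unmatched_stems` and the example file names are hypothetical; in the notebook the lists would come from `os.listdir`.

```python
import os

def unmatched_stems(image_files, label_files):
    """Return (images without a label, labels without an image) as sorted stem lists."""
    image_stems = {os.path.splitext(name)[0] for name in image_files}
    label_stems = {os.path.splitext(name)[0] for name in label_files}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)

# Illustrative file names only
images = ["xai_med_0.jpg", "xai_med_1.jpg", "xai_med_2.jpg"]
labels = ["xai_med_0.txt", "xai_med_1.txt", "xai_med_9.txt"]
missing_labels, missing_images = unmatched_stems(images, labels)
print(missing_labels)  # ['xai_med_2']
print(missing_images)  # ['xai_med_9']
```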


Based on this DataFrame, we split the data into training and validation sets (70/30). We also create a `data.yaml` file, which tells the trainer where to find the training and validation data during fine-tuning. It can be found in the `resources` folder.

In [None]:
# Split into training and validation samples
seed = 123
train01, val01 = train_test_split(image_label_df, test_size=0.3, random_state=seed)

We will structure our data according to the `data.yaml` file as follows:

```bash
> data
    > train
        > images
        > labels
    > valid
        > images
        > labels
    data.yaml
```
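
For reference, a `data.yaml` for this layout could be generated along the following lines. This is a sketch under assumptions: the keys `path`, `train`, `val` and `names` follow the Ultralytics dataset YAML format, but the helper name `write_data_yaml` and the class names are placeholders, since the real class list lives in the `data.yaml` in the `resources` folder.

```python
import os
import tempfile

def write_data_yaml(dataset_root, class_names, filename="data.yaml"):
    """Write a minimal YOLO dataset description file and return its path."""
    lines = [
        f"path: {dataset_root}",   # dataset root directory
        "train: train/images",     # training images, relative to path
        "val: valid/images",       # validation images, relative to path
        "names:",
    ]
    # One "index: name" entry per class
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    yaml_path = os.path.join(dataset_root, filename)
    with open(yaml_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return yaml_path

# Demo in a throwaway directory with placeholder class names
with tempfile.TemporaryDirectory() as root:
    yaml_path = write_data_yaml(root, ["class_a", "class_b"])
    with open(yaml_path, encoding="utf-8") as handle:
        print(handle.read())
```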

The `copy_data_to_yolofolders` function is essentially a wrapper around `shutil.copy2`. It was written by [Philippe Baecke](https://www.linkedin.com/in/philippebaecke/), and all credit goes to him.

In [None]:
# Copy the images and labels listed in input_df from their original locations
# to output_path, using a folder structure that YOLO can read.
# Note: use folder="train" for training data, "valid" for validation and "test" for test.

def copy_data_to_yolofolders(input_df, image_dir, label_dir, folder, output_path):
    # Build the path of the output folder for this split
    output_folder = os.path.join(output_path, folder)

    # Delete existing files in the output directory
    if os.path.exists(output_folder):
        shutil.rmtree(output_folder)

    # Create the output directories
    os.makedirs(os.path.join(output_folder, 'images'), exist_ok=True)
    os.makedirs(os.path.join(output_folder, 'labels'), exist_ok=True)

    # Iterate through the dataframe and copy files
    for index, row in input_df.iterrows():
        image_path = row[image_dir]
        label_path = row[label_dir]

        # Extract the filename from the source path
        image_filename = os.path.basename(image_path)
        label_filename = os.path.basename(label_path)

        # Define output destinations
        output_image = os.path.join(output_folder, 'images', image_filename)
        output_label = os.path.join(output_folder, 'labels', label_filename)

        # Copy image and label to the output directory
        shutil.copy2(image_path, output_image)
        shutil.copy2(label_path, output_label)

    print(f"Data copied to {output_folder}")

In [None]:
# copy_data_to_yolofolders(input_df=train01,
#                          image_dir = "image_dir",
#                          label_dir = "label_dir",
#                          folder = "train",
#                          output_path = "/content/drive/MyDrive/Masters 24/DeepLearning/DataSets/assignment" )

# copy_data_to_yolofolders(input_df=val01,
#                          image_dir = 'image_dir',
#                          label_dir = "label_dir",
#                          folder = "valid",
#                          output_path = "/content/drive/MyDrive/Masters 24/DeepLearning/DataSets/assignment" )

Data copied to /content/drive/MyDrive/Masters 24/DeepLearning/DataSets/assignment/train
Data copied to /content/drive/MyDrive/Masters 24/DeepLearning/DataSets/assignment/valid
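
After copying, a quick sanity check is to confirm that each split contains as many labels as images. The following is a minimal sketch using only the standard library; the helper name `count_split_files` is hypothetical, and the demo runs on a temporary folder rather than the Drive paths above.

```python
import os
import tempfile

def count_split_files(output_path, folder):
    """Return (number of images, number of labels) in one YOLO split folder."""
    split_dir = os.path.join(output_path, folder)
    n_images = len(os.listdir(os.path.join(split_dir, "images")))
    n_labels = len(os.listdir(os.path.join(split_dir, "labels")))
    return n_images, n_labels

# Demo on a throwaway directory with two image/label pairs
with tempfile.TemporaryDirectory() as root:
    for sub in ("images", "labels"):
        os.makedirs(os.path.join(root, "train", sub))
    for stem in ("a", "b"):
        open(os.path.join(root, "train", "images", stem + ".jpg"), "w").close()
        open(os.path.join(root, "train", "labels", stem + ".txt"), "w").close()
    print(count_split_files(root, "train"))  # (2, 2)
```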


## 3. Model fine-tuning

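We fine-tune the largest YOLOv8 variant (`yolov8x`) on our data. Training runs for up to 100 epochs with a patience of 5, meaning it stops early if the validation metrics do not improve for 5 consecutive epochs. The training run is then moved from the `runs` folder to a chosen destination path.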

The `yolo_transfer_results` function is a wrapper around `shutil.move` and was also written by [Philippe Baecke](https://www.linkedin.com/in/philippebaecke/).

In [None]:
model = YOLO('yolov8x.pt')
results = model.train(data='/content/drive/MyDrive/Masters 24/DeepLearning/DataSets/assignment/data.yaml', epochs=100, imgsz=640, patience = 5, plots = True)

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...


100%|██████████| 6.23M/6.23M [00:00<00:00, 139MB/s]
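
To make the effect of the `patience` setting in the training call concrete, here is a simplified illustration of patience-based early stopping. It is not Ultralytics' own implementation: the function name `early_stop_epoch` and the fitness values are made up for the example.

```python
def early_stop_epoch(fitness_per_epoch, patience):
    """Return the 1-based epoch at which training would stop early, or None."""
    best = float("-inf")
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch, start=1):
        if fitness > best:
            # New best score: remember it and reset the waiting period
            best, best_epoch = fitness, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row
            return epoch
    return None

# Made-up validation scores: the best value is reached at epoch 3
history = [0.10, 0.20, 0.30, 0.29, 0.28, 0.30, 0.27, 0.26, 0.25]
print(early_stop_epoch(history, patience=5))  # 8
```

With these scores, training would stop at epoch 8, five epochs after the last improvement, instead of running through all scheduled epochs.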


In [None]:
import os
import shutil

def yolo_transfer_results(source_path, destination_path, overwrite=1):
    """Transfer results from the runs folder to a destination path.

    Parameters:
    - source_path (str): The source path to the results.
    - destination_path (str): The destination path to transfer the results to.
    - overwrite (int): 0 raises an error if the destination already exists,
      1 deletes the existing destination first, 2 keeps it (shutil.move then
      places the source folder inside the existing destination).
    """

    # Check if the destination path exists
    if os.path.exists(destination_path):
        if overwrite == 0:
            raise ValueError("Destination path already exists. Set 'overwrite' to 1 to overwrite.")
        elif overwrite == 1:
            # If overwrite is set to 1, clear the destination path
            shutil.rmtree(destination_path, ignore_errors=True)  # Remove any remaining files or subdirectories

    # Move the source directory to the destination
    shutil.move(source_path, destination_path)

In [None]:
yolo_transfer_results(source_path = "runs/detect/train",
                      destination_path = "/content/drive/MyDrive/Masters 24/DeepLearning/Models/assignment/Large Model",
                      overwrite = 0)
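
One detail worth knowing: when the destination folder already exists and is kept, `shutil.move` places the source folder inside it rather than merging the two. If a true merge is wanted, something like the following could be used instead. This is a sketch, not part of the original helper; `merge_results` is a hypothetical name, `dirs_exist_ok` requires Python 3.8 or later, and files with the same name in the destination are overwritten.

```python
import os
import shutil
import tempfile

def merge_results(source_path, destination_path):
    """Copy source_path into destination_path, merging with existing content."""
    shutil.copytree(source_path, destination_path, dirs_exist_ok=True)
    # Remove the source afterwards so the overall effect is a move
    shutil.rmtree(source_path)

# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "runs_train")
    dst = os.path.join(root, "models")
    os.makedirs(os.path.join(src, "weights"))
    os.makedirs(dst)
    open(os.path.join(src, "weights", "best.pt"), "w").close()
    open(os.path.join(dst, "notes.txt"), "w").close()
    merge_results(src, dst)
    print(sorted(os.listdir(dst)))  # ['notes.txt', 'weights']
```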