# Data preparation and training

These scripts are designed to prepare data for YOLO training.

The training folder must contain three elements:
- a 'labels' folder: in which annotation files are stored,
- an 'images' folder: in which image files are stored,
- a 'labels.txt' file: listing the class names used by the YOLO annotations, one entry per line:
    - '0': 'class 0',
    - '1': 'class 1',
    - etc.
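
As a rough illustration of this mapping, the sketch below parses text laid out as `'index': 'name'` pairs into a dictionary. `parse_class_names` is a hypothetical helper written for this example; it is not the `get_labels` function imported later from the `modules` folder, whose actual implementation may differ.

```python
import re


def parse_class_names(text):
    """Parse lines such as "'0': 'class 0'," into {0: 'class 0'} (hypothetical helper)."""
    pairs = re.findall(r"'(\d+)'\s*:\s*'([^']*)'", text)
    return {int(index): name for index, name in pairs}


sample = "'0': 'class 0',\n'1': 'class 1',\n"
print(parse_class_names(sample))
```

The integer keys correspond to the class indices that appear at the start of each annotation line.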
    

The *Image transformation* section doubles the training dataset: starting from the annotated images, it warps each image and deforms its annotations with the same transformation matrix.

For each image and annotation file, a new file is generated with the following characteristics:

+ A new image with a modified perspective: a new image is created by applying a perspective transformation matrix to the original image.
+ A new text file containing annotations adjusted according to the transformation applied to the image: each annotation associated with the original image is also transformed using the same perspective transformation matrix. The new annotation coordinates are then saved in a new text file.

The files generated by perspective transformation are named "filename_PT".
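
To make the shared-matrix idea concrete, here is a minimal sketch in plain Python (no OpenCV) that applies a 3x3 matrix to the corners of one YOLO box. It assumes the rectangle-to-rectangle mapping used in the *Image transformation* code of this notebook, for which the matrix reduces to an independent scaling of x and y. The image size, the new size and all helper names are illustrative assumptions.

```python
def apply_homography(M, point):
    """Apply a 3x3 matrix (nested lists) to an (x, y) point in homogeneous coordinates."""
    x, y = point
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return xh / w, yh / w


def yolo_to_corners(xc, yc, w, h, img_w, img_h):
    """Relative YOLO box -> absolute (upper-left, bottom-right) corners."""
    upper_left = ((xc - w / 2) * img_w, (yc - h / 2) * img_h)
    bottom_right = ((xc + w / 2) * img_w, (yc + h / 2) * img_h)
    return upper_left, bottom_right


def corners_to_yolo(upper_left, bottom_right, img_w, img_h):
    """Absolute corners -> relative YOLO box (x_center, y_center, width, height)."""
    (left, top), (right, bottom) = upper_left, bottom_right
    return ((left + right) / 2 / img_w, (top + bottom) / 2 / img_h,
            (right - left) / img_w, (bottom - top) / img_h)


cols, rows = 1000, 800      # original image size (assumed values)
new_w, new_h = 500, 600     # size of the transformed image (assumed values)

# Mapping the full image rectangle onto a smaller rectangle is a pure scaling.
M = [[new_w / cols, 0, 0], [0, new_h / rows, 0], [0, 0, 1]]

box = (0.5, 0.5, 0.2, 0.4)  # x_center, y_center, width, height (relative)
ul, br = yolo_to_corners(*box, cols, rows)
new_box = corners_to_yolo(apply_homography(M, ul), apply_homography(M, br), new_w, new_h)
print(new_box)
```

Because relative coordinates are normalised by the image size, a pure scaling leaves them unchanged up to rounding. The annotation only changes numerically once the target quadrilateral is no longer an axis-aligned rectangle.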

All these scripts are designed to process image and text data with the same name (except for the extension) contained in the 'labels' and 'images' folders.

You can also include unannotated images in the training session. For such an image you can either create an empty annotation file or provide no file at all; the result is the same: https://github.com/ultralytics/yolov5/discussions/7148#discussioncomment-2440612
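
If you prefer explicit empty files, so that every image has a visible counterpart in 'labels', a small sketch along these lines could create them. `add_empty_labels` is a hypothetical helper and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def add_empty_labels(images_dir, labels_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Create an empty .txt file for every image that has none; return the created names."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    labels_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for image in sorted(images_dir.iterdir()):
        if image.suffix.lower() not in extensions:
            continue
        label = labels_dir / (image.stem + ".txt")
        if not label.exists():
            label.touch()
            created.append(label.name)
    return created


# Demonstration in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    images, labels = Path(tmp, "images"), Path(tmp, "labels")
    images.mkdir()
    labels.mkdir()
    (images / "annotated.jpg").touch()
    (images / "background.jpg").touch()
    (labels / "annotated.txt").write_text("0 0.5 0.5 0.2 0.2\n")
    created = add_empty_labels(images, labels)
    print(created)
```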

**Notice concerning use** 
Any use, even partial, of the content of this notebook must be accompanied by an appropriate citation.

&copy; 2023 Marion Charpier

## Environment


In [3]:
import time
from datetime import datetime
import os
import shutil
import random
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch
from ultralytics import YOLO
import yaml

import sys
sys.path.append(os.path.join('..', 'modules'))

from class_names_functions import get_labels
from corners_functions import get_corners, from_corners_to_relative
from transform_coordinates_functions import from_relative_coordinates_to_absolute
from device_function import which_device

## Cleaning annotation files (.txt)

In [12]:
def clean_comma(training_folder):
    """
    This function removes any commas that may appear in the annotation `.txt` files within the specified training folder.
    This is particularly useful when annotation files are generated or modified from CSV files, 
    as commas can accidentally be included and cause issues during model training.

    :param training_folder: 
        - Type: str
        - Description: The absolute path to the training folder containing the 'labels' subdirectory. 
                       The function will look for `.txt` files in the 'labels' folder and remove commas from them.

    :return: 
        - Type: None
        - Description: This function does not return a value. It modifies the content of the `.txt` files in place, 
                       removing any commas that are found.

    This function ensures that annotation files are formatted correctly, preventing errors during the training process.
    """

    for filename in os.listdir(os.path.join(training_folder, 'labels')):
        if filename.endswith('.txt'):
            file_path = os.path.join(training_folder, 'labels', filename)
            # print(file_path)
            
            # Read the file content
            with open(file_path, 'r') as file:
                content = file.read()
            
            # Remove commas
            content_without_comma = content.replace(',', '')
            
            # Write the modified content in the file
            with open(file_path, 'w') as file:
                file.write(content_without_comma)
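
One caveat worth keeping in mind: deleting commas works when each comma is followed by a space, but it merges neighbouring fields when the values are separated by commas only. The sketch below shows a possible alternative that treats commas as separators; `normalize_annotation_line` is a hypothetical helper, not part of the notebook.

```python
def normalize_annotation_line(line):
    """Turn 'a,b,c' or 'a, b, c' into 'a b c' by treating commas as separators."""
    return " ".join(line.replace(",", " ").split())


for raw in ("0, 0.5, 0.5, 0.2, 0.1", "0,0.5,0.5,0.2,0.1"):
    print(normalize_annotation_line(raw))

# For comparison, plain deletion merges fields when no space follows the comma:
merged = "0,0.5,0.5,0.2,0.1".replace(",", "")
print(merged)
```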

## Increase dataset

### Image transformation

In [13]:
def perspective_transformation(img_file):
    """
    This function applies a perspective transformation to the given image. It generates a new transformed image 
    with a different perspective by adjusting the image dimensions randomly. The function saves the transformed 
    image with the same name as the original image, appending '_PT' to indicate the perspective transformation.

    :param img_file: 
        - Type: str
        - Description: The absolute path to the image file that needs to be transformed.

    :return: 
        - Type: numpy.ndarray
        - Description: Returns the transformation matrix used for the perspective transformation. This matrix 
                       can be used to understand the geometric changes applied to the image.

    This function is useful for data augmentation and generating variations of the original images, which can be 
    beneficial for training machine learning models with diverse perspectives.
    """

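    # Open image and get dimensions (cv2.imread returns None if the file cannot be read)
    img = cv2.imread(img_file)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_file}")
    rows, cols = img.shape[:2]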

    # Define the points of origin for the perspective transformation.
    # These points form a quadrilateral covering the entire original image.
    pts1 = np.float32([[0, 0], [cols, 0], [cols, rows], [0, rows]])

    # Generate a random width and height for the transformed image, each between 30% and 80% of the original dimension.
    new_width = random.randint(int(cols*0.3), int(cols*0.8))
    new_height = random.randint(int(rows*0.3), int(rows*0.8))

    # Define the new points for the perspective transformation
    pts2 = np.float32([[0, 0], [new_width, 0], [new_width, new_height], [0, new_height]])

    # Calculate the perspective transformation matrix
    M = cv2.getPerspectiveTransform(pts1, pts2)

    # Apply the perspective transformation to the original image.
    dst = cv2.warpPerspective(img, M, (new_width, new_height))

    # Save the transformed image next to the original, with '_PT' appended to the file name
    transformed_img = Path(img_file).with_suffix('').as_posix() + '_PT' + Path(img_file).suffix
    cv2.imwrite(transformed_img, dst)

    return M
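
The `_PT` file name is built with the same expression here and again in the annotation function. As a sketch only, a small shared helper could keep the two in sync; `with_pt_suffix` is hypothetical, and `PurePosixPath` is used so that the example behaves identically on any operating system (an assumption about POSIX-style paths).

```python
from pathlib import PurePosixPath


def with_pt_suffix(path, tag="_PT"):
    """Insert `tag` between the file stem and its extension."""
    p = PurePosixPath(path)
    return p.with_name(p.stem + tag + p.suffix).as_posix()


image_out = with_pt_suffix("/data/training/images/folio_12.jpg")
label_out = with_pt_suffix("/data/training/labels/folio_12.txt")
print(image_out)
print(label_out)
```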

### Annotations transformation

In [None]:
def perspective_transformation_annotation(ann_file, img_file, M):
    """ 
    This function applies a perspective transformation matrix to the bounding box annotations of an image 
    and saves the new transformed annotations in a separate file. The transformed annotations correspond to
    the modified perspective and dimensions of the image after applying the perspective transformation.

    :param ann_file: 
        - Type: str
        - Description: The absolute path to the annotation file (`.txt`) associated with the image. 
                       The file should contain bounding box annotations in YOLO format 
                       (label, x_center, y_center, width, height).

    :param img_file: 
        - Type: str
        - Description: The absolute path to the original image file. This is used to retrieve 
                       the original image dimensions and the dimensions of the transformed image.

    :param M: 
        - Type: numpy.ndarray
        - Description: The transformation matrix used for perspective transformation. This matrix 
                       is used to transform the bounding box coordinates to match the new image perspective.

    :return: 
        - Type: list of tuples
        - Description: Returns a list of transformed bounding box annotations. Each tuple contains 
                       the label and new relative coordinates (x_center, y_center, width, height) 
                       after applying the perspective transformation.

    The function ensures that the bounding box annotations remain consistent with the perspective changes applied to the image,
    which is essential for maintaining annotation accuracy after transformations.
    """

    img_height, img_width = cv2.imread(img_file).shape[:2]
    TP_img_height, TP_img_width = cv2.imread(Path(img_file).with_suffix('').as_posix() + '_PT' + Path(img_file).suffix).shape[:2]
    
    # print(f"Original size: {img_height}, {img_width}\nNew size: {TP_img_height}, {TP_img_width}")

    # Initialising a list to store the new bounding box coordinates 
    bb_coordinates = []

    with open(ann_file, 'r') as annotations:
        for annotation in annotations:

            # Extract the label and coordinates from the annotation
            label, x_center, y_center, width, height = annotation.split()
            
            #print(type(label), type(x_center), type(y_center), type(width), type(height))

            # Convert the relative coordinates to absolute corner coordinates
            corners = get_corners(x_center, y_center, width, height, img_width, img_height)

            # Apply the transformation to the corners of the bounding box
            corners = np.array(corners, dtype=np.float32).reshape(-1, 1, 2)
            transformed_corners = cv2.perspectiveTransform(corners, M).reshape(-1, 2)

            # Pick the transformed upper-left and bottom-right corners
            new_upper_left = transformed_corners[0]
            new_bottom_right = transformed_corners[2]

            #print(f'new_upper_left = {new_upper_left}, new_bottom_right = {new_bottom_right}')

            # Convert the new corner coordinates back to relative coordinates
            transformed_x_center, transformed_y_center, transformed_width, transformed_height = from_corners_to_relative(
                new_upper_left, new_bottom_right, TP_img_width, TP_img_height)

            bb_coordinates.append((label, transformed_x_center, transformed_y_center, transformed_width, transformed_height))

    new_annotation = Path(ann_file).with_suffix('').as_posix() + '_PT' + Path(ann_file).suffix

    with open(new_annotation, 'w') as transformed_annotations:
        for bb in bb_coordinates:
            label, x, y, w, h = bb
            transformed_annotations.write(f"{label} {x} {y} {w} {h}\n")


    return bb_coordinates
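
The function above rebuilds each box from the transformed corners at index 0 and 2, which it treats as the upper-left and bottom-right corners. That is sufficient as long as the transformed box stays axis-aligned. As a hedged sketch of a more general variant (not what the notebook does), the helper below takes all four transformed corners and returns the smallest axis-aligned YOLO box that encloses them, clipped to the image. `quad_to_yolo` and the sample numbers are hypothetical.

```python
def quad_to_yolo(corners, img_w, img_h):
    """Smallest axis-aligned relative box enclosing four (x, y) corners, clipped to the image."""
    xs = [min(max(x, 0.0), img_w) for x, _ in corners]
    ys = [min(max(y, 0.0), img_h) for _, y in corners]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    return ((left + right) / 2 / img_w, (top + bottom) / 2 / img_h,
            (right - left) / img_w, (bottom - top) / img_h)


# A slightly skewed quadrilateral, as a true perspective warp could produce
quad = [(110.0, 40.0), (300.0, 60.0), (290.0, 210.0), (100.0, 190.0)]
enclosing = quad_to_yolo(quad, img_w=400, img_h=250)
print(enclosing)
```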

### Generate transformed data

In [15]:
def generate_transformed_data(training_folder):
    """
    This function generates a set of transformed images and their corresponding annotations by applying 
    perspective transformations to each image and adjusting the bounding box annotations accordingly. 
    The new images and annotations are saved in the appropriate folders within the specified training folder.

    :param training_folder: 
        - Type: str
        - Description: The absolute path to the training folder containing the dataset. The folder should 
                       have subdirectories named 'images' for storing image files and 'labels' for storing 
                       the corresponding annotation files.

    :return: 
        - Type: None
        - Description: This function does not return a value. It generates transformed images and annotation files 
                       in place, saving them in the same subdirectories with modified filenames.

    This function automates the data augmentation process by generating new variations of the dataset, 
    which can be used to enhance model robustness during training.
    """
    
    print('Image transformation has started...')
    
    img_folder = os.path.join(training_folder, 'images')
    annotations_folder = os.path.join(training_folder, 'labels')

    images = [img for img in os.listdir(img_folder) if img.endswith(('.jpg', '.jpeg', '.png'))]

    for img in images:
        img_file = os.path.join(img_folder, img)
        ann_file = os.path.join(annotations_folder, Path(img).stem + '.txt')
        
        if os.path.exists(ann_file):
            M = perspective_transformation(img_file)
            perspective_transformation_annotation(ann_file, img_file, M)

    print(f'New images stored in {img_folder}\nNew annotations stored in {annotations_folder}')
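
Since the transformed files are written into the same 'images' and 'labels' folders, running the function a second time would also pick up the `_PT` files and transform them again. The sketch below shows one possible way to select the files to augment, assuming the `_PT` suffix convention; `select_pairs` is a hypothetical helper that works on plain file names.

```python
from pathlib import PurePosixPath

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def select_pairs(image_names, label_names, tag="_PT"):
    """Return (image, label) name pairs for annotated images that are not already transformed."""
    label_stems = {PurePosixPath(name).stem for name in label_names if name.endswith(".txt")}
    pairs = []
    for name in sorted(image_names):
        path = PurePosixPath(name)
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # not an image
        if path.stem.endswith(tag) or path.stem not in label_stems:
            continue  # already transformed, or no annotation file
        pairs.append((name, path.stem + ".txt"))
    return pairs


images = ["a.jpg", "a_PT.jpg", "b.png", "c.jpg", "notes.md"]
labels = ["a.txt", "a_PT.txt", "b.txt"]
pairs = select_pairs(images, labels)
print(pairs)
```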

## Create the training dataset

In [4]:
def create_training_dataset(training_folder, pretrained_model, Preexisting_Distribution):
    """
    This function prepares the training and validation datasets by creating text files that list the images used for each subset.
    It generates three text files in the 'dataset_statistics' subdirectory of the training folder:

    1. `traindata.txt`: Contains the list of images used for training (80% of the data).
    2. `valdata.txt`: Contains the list of images used for validation (20% of the data).
    3. `training_dataset.txt`: Contains the list of all images used (both training and validation).

    If a pre-existing distribution of training and validation data is provided, the function uses it instead 
    of creating new text files. The function then splits the data into subdirectories for training and 
    validation based on the text files.

    :param training_folder: 
        - Type: str
        - Description: The absolute path to the folder containing the dataset. The folder should include 
                       an 'images' subdirectory for image files and a 'labels' subdirectory for corresponding 
                       annotation files.

    :param pretrained_model: 
        - Type: str
        - Description: The absolute path to the folder containing the pre-trained model data. 
                       This parameter is currently unused in the function but may be required for future enhancements.

    :param Preexisting_Distribution: 
        - Type: str
        - Description: The path to a folder containing a 'dataset_statistics' subdirectory from a previous 
                       session. If a non-empty path is provided, the function copies and reuses that 
                       distribution. If the value is empty (e.g. '' or `False`), the function creates new 
                       text files for training, validation, and the entire dataset.

    :return: 
        - Type: None
        - Description: This function does not return a value. It creates and saves the text files 
                       `traindata.txt`, `valdata.txt`, and `training_dataset.txt` and organizes the images 
                       and labels into separate subdirectories for training and validation.

    This function ensures that the training and validation data are correctly organized and ready for model training.
    """

    folder_base = os.path.dirname(training_folder)
    dataset_name = os.path.basename(training_folder)
    
    # Folder in which all training-related files are stored
    stat_folder = os.path.join(training_folder, 'dataset_statistics')
    
    if Preexisting_Distribution:
        print(f'Use pre-existing files from {Preexisting_Distribution}.')
        shutil.copytree(os.path.join(Preexisting_Distribution, 'dataset_statistics'), stat_folder)
   
    else:
        # Get a list of the images
        files = os.listdir(os.path.join(training_folder, 'images'))

        # Filter file names to keep only image files (".jpg", ".jpeg" and ".png" extensions)
        image_files = [f for f in files if f.endswith(('.jpg', '.jpeg', '.png'))]

        # Shuffle file names randomly
        random.shuffle(image_files)


        # TODO: use sklearn here to split with class balance

        # Compute the number of images in each subset
        num_images = len(image_files)
        num_train = int(num_images * 0.8)
        num_val = num_images - num_train

        # Divide file names into two sets: one for training, one for validation
        train_files = image_files[:num_train]
        val_files = image_files[num_train:num_train+num_val]
        
        # Check if the destination folder exists, if not create it
        os.makedirs(stat_folder, exist_ok=True)

        # Create a file with the list of training images
        # (paths point to the 'images' folder, where the files listed above are stored)
        with open(os.path.join(stat_folder, 'traindata.txt'), 'w') as f:
            for image_file in train_files:
                f.write(os.path.join(training_folder, 'images', image_file) + "\n")
        print(f"File created in {os.path.join(stat_folder, 'traindata.txt')}")

        # Create a file with the list of validation images
        with open(os.path.join(stat_folder,'valdata.txt'), 'w') as f:
            for image_file in val_files:
                f.write(os.path.join(training_folder, 'images', image_file) + "\n")
        print(f"File created in {os.path.join(stat_folder, 'valdata.txt')}")


        # Create a file listing the whole dataset
        with open(os.path.join(stat_folder, 'training_dataset.txt'), 'w') as f:
            for image_file in image_files:
                f.write(os.path.join(training_folder, 'images', image_file) + "\n")
        print(f"File created in {os.path.join(stat_folder, 'training_dataset.txt')}")
    

    # Split images and txt files into folders from a .txt file
    split_data_for_training(os.path.join(stat_folder, 'traindata.txt'), 
                            os.path.join(training_folder, 'labels'), 
                            os.path.join(folder_base, 'datasets', dataset_name, 'images/train'), 
                            os.path.join(folder_base, 'datasets', dataset_name, 'labels/train'))
    
    split_data_for_training(os.path.join(stat_folder,'valdata.txt'),
                            os.path.join(training_folder, 'labels'),
                            os.path.join(folder_base, 'datasets', dataset_name, 'images/val'),
                            os.path.join(folder_base, 'datasets', dataset_name, 'labels/val'))
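
The shuffle above is unseeded, so each new distribution differs from the previous one unless the saved text files are reused. If a reproducible split is wanted without keeping those files, a seeded variant is one option. This is only a sketch: `split_train_val` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random


def split_train_val(names, train_fraction=0.8, seed=0):
    """Shuffle names with a fixed seed and split them into (train, val) lists."""
    names = sorted(names)                  # start from a deterministic order
    random.Random(seed).shuffle(names)     # local generator, global state untouched
    num_train = int(len(names) * train_fraction)
    return names[:num_train], names[num_train:]


files = [f"img_{i:02d}.jpg" for i in range(10)]
train, val = split_train_val(files)
print(len(train), len(val))
```

Calling the function again with the same names and seed returns the same two lists.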

## Split the data for training

In [17]:
def split_data_for_training(txt_list, txt_folder, output_img_folder, output_txt_folder):
    """
    This function organizes images and annotation files into the appropriate subdirectories required by YOLOv8 
    for training and validation. It moves the images and annotations listed in the specified `.txt` file 
    into designated 'train' or 'val' folders for both images and labels. This function simplifies the preparation 
    of datasets by automating the process of organizing images and annotations into the required structure for model training.

    :param txt_list: 
        - Type: str
        - Description: The path to the `.txt` file containing the list of image file paths to be used. Each line 
                       in the file should contain the absolute path to an image file.

    :param txt_folder: 
        - Type: str
        - Description: The path to the folder where the corresponding `.txt` annotation files are stored. 
                       These annotation files should have the same base name as the images but with `.txt` extensions.

    :param output_img_folder: 
        - Type: str
        - Description: The path to the folder where the image files should be moved. This folder should be organized 
                       into 'train' or 'val' subdirectories under an 'images' folder as required by YOLO.

    :param output_txt_folder: 
        - Type: str
        - Description: The path to the folder where the annotation files should be moved. This folder should be 
                       organized into 'train' or 'val' subdirectories under a 'labels' folder as required by YOLO.

    :return: 
        - Type: None
        - Description: This function does not return a value. It moves the images and annotations into the specified 
                       directories and organizes them for training and validation.

    The function will move the images and annotations to the respective train or val folders under the `output_img_folder` 
    and `output_txt_folder` paths, organizing them for YOLO training.    
    """


    # Create the output folder if it does not already exist
    os.makedirs(output_img_folder, exist_ok=True)
    os.makedirs(output_txt_folder, exist_ok=True)
    
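    # Derive the training folder from the annotation folder (<training_folder>/labels)
    # instead of relying on a notebook-level `training_folder` variable
    training_folder = os.path.dirname(os.path.normpath(txt_folder))
    folder_base = os.path.dirname(training_folder)
    dataset_name = os.path.basename(training_folder)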
    
    # Open the text file containing the image paths
    with open(txt_list, "r") as f:
        # Browse through each line of the file
        for line in f:
            # Get the image path and text file name
            image_path = line.strip()
            image_name = os.path.basename(image_path)

            # Build the annotation path from the image stem (same base name, '.txt' extension)
            txt_file = os.path.join(txt_folder, Path(image_name).stem + '.txt')
            
            # Move image to output folder
            shutil.move(image_path, os.path.join(output_img_folder, os.path.basename(image_path)))
        
            # Move text file to output folder
            try:
                shutil.move(txt_file, os.path.join(output_txt_folder, os.path.basename(txt_file)))

            except FileNotFoundError:
                print(f'Text file {txt_file} does not exist')
    print(f'Image files moved to {output_img_folder}')
    print(f'Text files moved to {output_txt_folder}')
    
    # Create the yaml file
    write_yaml_file(training_folder, dataset_name, folder_base)
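
When deriving an annotation path from an image name, swapping the extension with `str.replace` touches every occurrence of that substring in the path, whereas working on the file stem does not. A small illustration with made-up paths:

```python
from pathlib import PurePosixPath

# Hypothetical folder and file names that happen to contain the extension string
txt_folder = "/data/jpg_scans/labels"
image_name = "page_jpg_001.jpg"

# Replacing the extension string anywhere in the path
naive = (txt_folder + "/" + image_name).replace(image_name.split(".")[-1], "txt")

# Building the name from the stem only
by_stem = (PurePosixPath(txt_folder) / (PurePosixPath(image_name).stem + ".txt")).as_posix()

print(naive)
print(by_stem)
```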
    

## Create the .yaml file

In [18]:
def write_yaml_file(training_folder, dataset_name, folder_base):
    """
    This function creates a `.yaml` configuration file that specifies the paths to the training and validation 
    datasets, as well as the class labels used for model training. This file is essential for training YOLOv8 models, 
    as it provides information on the dataset structure and class mappings.
    The file will contain paths to the training and validation data, along with class names.

    :param training_folder: 
        - Type: str
        - Description: The absolute path to the training folder containing the `labels.txt` file. 
                       This file is used to retrieve the class labels for the dataset.

    :param dataset_name: 
        - Type: str
        - Description: The name of the training session or dataset. This name is used as the base name 
                       for the `.yaml` file and to construct the paths to the dataset.

    :param folder_base: 
        - Type: str
        - Description: The absolute path to the root directory where the `datasets` folder is located. 
                       This folder will contain the training data organized in the required structure.

    :return: 
        - Type: None
        - Description: This function does not return a value. It creates a `.yaml` file in the dataset directory 
                       with the specified configuration.
                    
    This `.yaml` file can then be used to train a YOLO model by specifying it as the configuration file during training. 
    """

    
    # Get the annotations classes
    annotation_classes = get_labels(os.path.join(training_folder, 'labels.txt'))
    
    # Convert the keys of the annotation_classes dictionary to integers
    annotation_classes_int = {int(key): value for key, value in annotation_classes.items()}

    # Format the string with the elements in the desired order
    yaml_data = f"path: {os.path.join(folder_base, 'datasets', dataset_name)}/\n" \
                f"train: 'images/train'\n" \
                f"val: 'images/val'\n" \
                f"\n" \
                f"#class names\n" \
                f"names:\n"
    for key, value in annotation_classes_int.items():
        yaml_data += f"  {key}: '{value}'\n"
        
    # TODO: add Albumentations

    with open(os.path.join(folder_base, 'datasets', dataset_name, dataset_name + '.yaml'), 'w') as yaml_file:
        yaml_file.write(yaml_data)
    print(f"File written to {os.path.join(folder_base, 'datasets', dataset_name, dataset_name + '.yaml')}")
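
For reference, the sketch below assembles the same kind of text from a small class dictionary, so that the resulting file layout is visible. It is a simplified stand-in rather than the function above: `build_dataset_yaml` is hypothetical, the path and class names are made up, and values are quoted with `json.dumps` (double-quoted strings, which YAML parsers generally accept) so that a class name containing an apostrophe does not break the quoting.

```python
import json


def build_dataset_yaml(dataset_dir, class_names):
    """Return the text of a minimal dataset configuration (hypothetical helper)."""
    lines = [
        f"path: {json.dumps(dataset_dir)}",
        "train: images/train",
        "val: images/val",
        "",
        "# class names",
        "names:",
    ]
    for index in sorted(class_names):
        lines.append(f"  {index}: {json.dumps(class_names[index])}")
    return "\n".join(lines) + "\n"


text = build_dataset_yaml("/data/datasets/demo", {0: "initial", 1: "reader's mark"})
print(text)
```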

## Model training

In [20]:
def yolo_training(training_folder, use_model, img_size, epochs, batch, workers, label_smoothing, pretrained_model):
    """
    This function initiates the training process for a YOLO model using the specified training parameters and dataset.
    The function provides flexibility in selecting the model architecture, adjusting training settings, 
    and optionally using a pre-trained model to fine-tune the results.

    :param training_folder: 
        - Type: str
        - Description: The path to the folder containing the training data. This folder should include 
                       a `.yaml` file specifying the dataset paths and class names.

    :param use_model: 
        - Type: str
        - Description: The YOLO model architecture to use for training (e.g., 'yolo11x.pt'). 
                       If a pre-trained model is provided, this parameter is overridden.

    :param img_size: 
        - Type: int
        - Description: The size of the input images. Larger image sizes can increase model accuracy 
                       but may also increase computational load.

    :param epochs: 
        - Type: int
        - Description: The number of epochs to train the model. More epochs allow the model to learn 
                       better but may result in overfitting if set too high.

    :param batch: 
        - Type: int
        - Description: The batch size to use during training. Larger batch sizes require more memory 
                       but can stabilize gradient updates.

    :param workers: 
        - Type: int
        - Description: The number of workers for data loading. Increasing this number can speed up data 
                       loading but may require more computational resources.

    :param label_smoothing: 
        - Type: float
        - Description: The smoothing factor applied to the labels to prevent overconfidence in predictions. 
                       Typically set between 0 and 1.

    :param pretrained_model: 
        - Type: str
        - Description: The path to a pre-trained model, if any. If provided, the function will use this model 
                       as the starting point for training. If not provided, it uses the `use_model` parameter 
                       to select the model architecture.

    :return: 
        - Type: None
        - Description: This function does not return a value. It trains the YOLO model using the provided 
                       parameters and saves the results to a specified output folder.

    This function automates the YOLO training process, providing flexibility in configuration and managing results storage.
    """
    
    # Derive additional paths and model name
    folder_base = os.path.dirname(training_folder)
    dataset_name = os.path.basename(training_folder)
    common_base = os.path.commonpath([training_folder, os.getcwd()])
    
    date = datetime.now().strftime('%Y%m%d')
    yaml_file = os.path.join(folder_base, 'datasets', dataset_name, dataset_name + '.yaml')

    # Check if yaml_file exists
    if not os.path.exists(yaml_file):
        raise FileNotFoundError(f"YAML file not found: {yaml_file}")

    # Determine which model to use
    if not pretrained_model:
        # No pre-trained model: keep the architecture given in `use_model`
        model_name = f'{dataset_name}_{date}_{use_model[-5:-3]}_i{img_size}_e{epochs}_b{batch}_w{workers}'
    else:
        use_model = pretrained_model
        model_name = f'{dataset_name}_{date}_{use_model[-7:-3]}_i{img_size}_e{epochs}_b{batch}_w{workers}'

    # Check if the GPU is available - if not, use the CPU
    device = which_device()
    
    # Load a YOLO model
    model = YOLO(use_model, task='detect').to(device)

    # Train the model
    results = model.train(
       data = yaml_file, # path to the datasets and classes
       imgsz = img_size, #image size
       epochs = epochs,
       batch = batch,
       label_smoothing = label_smoothing,
       workers = workers, # increases training speed, default setting is 8
       name = model_name, # output folder
       project = os.path.join(common_base, 'output', 'runs', 'train')
    )

    # Evaluate the model's performance on the validation set
    results = model.val(
        name = model_name + '/'+ model_name +'_val')
    
    print(f"Training completed. Training results are saved under {os.path.join(common_base, 'output', 'runs', 'train')}")
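
The run name is rebuilt from the same pieces in `dispatch_data` further down, and both functions read the current date on their own, so a training session that runs past midnight would produce two different names. As a sketch under that observation, a shared helper that receives the date explicitly could look like this; `build_model_name` and the sample values are hypothetical, while the slices are the ones used above.

```python
from datetime import date


def build_model_name(dataset_name, weights, run_date, img_size, epochs, batch, workers,
                     pretrained=False):
    """Assemble the run name from the dataset, date, model tag and training settings."""
    # Same slices as in the notebook: 2 characters for a base model, 4 for a pre-trained one
    tag = weights[-7:-3] if pretrained else weights[-5:-3]
    return f"{dataset_name}_{run_date:%Y%m%d}_{tag}_i{img_size}_e{epochs}_b{batch}_w{workers}"


name = build_model_name("demo", "yolo11x.pt", date(2024, 5, 17), 640, 100, 16, 8)
print(name)
```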

### Resuming interrupted training (optional)

In [21]:
def resume_training(interrupted_model_folder):
    """
    This function resumes an interrupted YOLO model training session from the last saved checkpoint. 
    It retrieves the last trained weights and other session parameters to continue training without 
    losing previously achieved progress.
    This function is useful for resuming training sessions that were stopped due to hardware limitations, 
    time constraints, or other interruptions.

    :param interrupted_model_folder: 
        - Type: str
        - Description: The path to the folder containing the partially trained model's data. 
                       This folder should include a `weights` subdirectory with the `last.pt` file 
                       (the last saved weights).

    :return: 
        - Type: None
        - Description: This function does not return a value. It resumes the training process from 
                       the last checkpoint and evaluates the model's performance after training.

    The results of the resumed training and validation will be stored in the same directory, maintaining 
    the continuity of the training session and its associated metrics.
    """
    
    last_weight = os.path.join(interrupted_model_folder, 'weights/last.pt')
    model_name = os.path.basename(interrupted_model_folder)

    # Check if the GPU is available - if not, use the CPU
    device = which_device()
    
    # Load a model
    model = YOLO(last_weight).to(device)  # load a partially trained model

    # Resume training
    results = model.train(resume=True)

    # Evaluate the model's performance on the validation set
    results = model.val(
        name = model_name + '/'+ model_name +'_val')
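
Before resuming, it can help to check that the checkpoint is really there, so that a wrong folder fails with a clear message. A minimal sketch using only the standard library; `find_last_checkpoint` is a hypothetical helper and the folder layout is the `weights/last.pt` one described above.

```python
import tempfile
from pathlib import Path


def find_last_checkpoint(model_folder):
    """Return the path to weights/last.pt, or raise if it is missing."""
    checkpoint = Path(model_folder) / "weights" / "last.pt"
    if not checkpoint.is_file():
        raise FileNotFoundError(f"No checkpoint to resume from: {checkpoint}")
    return checkpoint


# Demonstration in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp) / "demo_run"
    (run / "weights").mkdir(parents=True)
    try:
        find_last_checkpoint(run)
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
    (run / "weights" / "last.pt").touch()
    found_name = find_last_checkpoint(run).name

print(missing_detected, found_name)
```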

## Restore the dataset to its pristine state

In [22]:
def dispatch_data(training_folder, pretrained_model, interrupted_model_folder):
    """
    This function organizes and finalizes the data used for training by moving relevant files and directories 
    into the model folder. It also restores the original structure of the dataset folder by moving image and 
    annotation files back to their respective subdirectories and deletes the temporary training folder.

    :param training_folder: 
        - Type: str
        - Description: The absolute path to the folder where the dataset used for training is stored. 
                       This folder should contain subdirectories like 'images' and 'labels' used during training.

    :param pretrained_model: 
        - Type: str
        - Description: The path to a pre-trained model, if any. This is used to determine the model folder 
                       structure for storing training data and configurations.

    :param interrupted_model_folder: 
        - Type: str
        - Description: The path to the folder containing data from a previously interrupted training session. 
                       If provided, the function will continue organizing data into this folder.

    :return: 
        - Type: None
        - Description: This function does not return a value. It organizes and moves files into appropriate folders, 
                       restores the original dataset structure, and deletes the temporary training folder.

    This function ensures that all data and configurations used for training are stored in a dedicated model folder, 
    making it easy to track and manage different training sessions.
    """

    folder_base = os.path.dirname(training_folder)
    dataset_name = os.path.basename(training_folder)
    common_base = os.path.commonpath([training_folder, os.getcwd()])
    
    date = datetime.now().strftime('%Y%m%d')
    
    # Determine which model folder to use
    if interrupted_model_folder:
        model_folder = interrupted_model_folder
    else:
        # Rebuild the model name used during training.
        # Note: use_model, img_size, epochs, batch and workers are not parameters of this function;
        # they must be defined at notebook level with the same values as those passed to yolo_training.
        if not pretrained_model:
            model_name = f'{dataset_name}_{date}_{use_model[-5:-3]}_i{img_size}_e{epochs}_b{batch}_w{workers}'
        else:
            model_name = f'{dataset_name}_{date}_{pretrained_model[-7:-3]}_i{img_size}_e{epochs}_b{batch}_w{workers}'
        
        model_folder = os.path.join(common_base, 'output', 'runs/train', model_name)
        
    # Move the data used for the training session into the model folder
    shutil.move(os.path.join(folder_base, 'datasets', dataset_name, dataset_name + '.yaml'), os.path.join(model_folder, dataset_name + '.yaml'))
    print(f'The .yaml file has been moved into {model_folder}')
    
    shutil.copyfile(os.path.join(training_folder, 'labels.txt'), os.path.join(model_folder, 'labels.txt'))
    print(f'The labels.txt file has been copied into {model_folder}')
    
    shutil.move(os.path.join(training_folder, 'dataset_statistics'), model_folder)
    print(f'The dataset_statistics folder has been moved into {model_folder}')
  
    # Put the images and labels back into their original folders inside training_folder
    img_folder_train = os.path.join(folder_base, 'datasets', dataset_name, 'images/train') 
    txt_folder_train = os.path.join(folder_base, 'datasets', dataset_name, 'labels/train')
    img_folder_val = os.path.join(folder_base, 'datasets', dataset_name, 'images/val')
    txt_folder_val = os.path.join(folder_base, 'datasets', dataset_name, 'labels/val')

    os.makedirs(os.path.join(training_folder, 'images'), exist_ok=True)
    for file in os.listdir(img_folder_train):
        shutil.move(os.path.join(img_folder_train, file), os.path.join(training_folder, 'images', file))
    print(f"Files from {img_folder_train} have been moved into {os.path.join(training_folder, 'images')}")
    
    for file in os.listdir(img_folder_val):
        shutil.move(os.path.join(img_folder_val, file), os.path.join(training_folder, 'images', file))
    print(f"Files from {img_folder_val} have been moved into {os.path.join(training_folder, 'images')}")

    os.makedirs(os.path.join(training_folder, 'labels'), exist_ok=True)
    for file in os.listdir(txt_folder_train):
        shutil.move(os.path.join(txt_folder_train, file), os.path.join(training_folder, 'labels', file))
    print(f"Files from {txt_folder_train} have been moved into {os.path.join(training_folder, 'labels')}")
    
    for file in os.listdir(txt_folder_val):
        shutil.move(os.path.join(txt_folder_val, file), os.path.join(training_folder, 'labels', file))
    print(f"Files from {txt_folder_val} have been moved into {os.path.join(training_folder, 'labels')}")

    shutil.rmtree(os.path.join(folder_base, 'datasets', dataset_name))
    print(f"The temporary folder {os.path.join(folder_base, 'datasets', dataset_name)} has been deleted")
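
The four loops in `dispatch_data` all do the same thing: move every file from a split folder back into `images` or `labels`. As a minimal sketch of that step, the helper below (the name `move_all` is hypothetical and not part of the notebook) moves the files of one folder into another and is exercised on a throwaway directory rather than on real data.

```python
import shutil
import tempfile
from pathlib import Path


def move_all(src_folder, dst_folder):
    """Move every file from src_folder into dst_folder and return how many were moved."""
    src, dst = Path(src_folder), Path(dst_folder)
    dst.mkdir(parents=True, exist_ok=True)
    moved = 0
    # sorted() materialises the listing before any file is moved
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.move(str(path), str(dst / path.name))
            moved += 1
    return moved


# Demo on a temporary tree that mimics datasets/<name>/images/{train,val}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split, name in (("train", "a.jpg"), ("val", "b.jpg")):
        (root / "datasets" / "images" / split).mkdir(parents=True)
        (root / "datasets" / "images" / split / name).touch()
    restored = root / "training" / "images"
    total = sum(move_all(root / "datasets" / "images" / s, restored) for s in ("train", "val"))
    restored_names = sorted(p.name for p in restored.iterdir())
    print(total, restored_names)
```

With such a helper, the restore step would reduce to four calls, one per split folder.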

## Processing

In [None]:
training_folder = 'TRAINING_FOLDER' # to be changed: absolute path to the dataset used for the training session

# Absolute path to a pretrained model; leave empty to start from use_model
pretrained_model = ''
# Absolute path to the model folder whose train/val distribution should be reused; leave empty to create a new one
Preexisting_Distribution = ''

# Absolute path to the model folder holding pre-existing files or an interrupted training session to resume; leave empty otherwise
interrupted_model_folder = ''
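
Before running the cells below, it can help to confirm that `training_folder` has the layout described at the top of the notebook (an `images` folder, a `labels` folder and a `labels.txt` file). The following is only a sketch of such a check; the function name `missing_training_items` is an assumption and does not exist in the notebook.

```python
import tempfile
from pathlib import Path


def missing_training_items(training_folder):
    """Return the names of the required items that are absent from training_folder."""
    root = Path(training_folder)
    required = {
        "images": (root / "images").is_dir(),
        "labels": (root / "labels").is_dir(),
        "labels.txt": (root / "labels.txt").is_file(),
    }
    return [name for name, present in required.items() if not present]


# Demo: a folder that only contains 'images' is reported as incomplete
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "images").mkdir()
    demo_missing = missing_training_items(tmp)
    print(demo_missing)
```

An empty list means the three expected elements are present.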

### Clean, increase and split data

In [24]:
# Clean the .txt annotation files if needed
clean_comma(training_folder)

In [None]:
%%prun
# Use the perspective transformation to extend the dataset
generate_transformed_data(training_folder)

In [None]:
# Generate data distribution file for train/val sets
create_training_dataset(training_folder, pretrained_model, Preexisting_Distribution)

### Start a training session

In [31]:
use_model = 'yolo11n.pt' # to be changed as needed, by default use 'yolo11x.pt'
img_size = 640 # to be changed as needed, by default use 640
epochs = 10 # to be changed as needed
batch = -1 # to be changed as needed, by default use 8, or -1 for AutoBatch
workers = 8 # to be changed as needed, by default 24, or 8 (https://docs.ultralytics.com/modes/train/#train-settings)
label_smoothing = 0.1 # to be changed as needed, by default 0. Can improve generalization
dropout = 0.1 # dropout rate: randomly drops 10% of activations during training
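
These settings also determine the name of the model folder that `dispatch_data` targets. The sketch below reproduces the f-string and slices from that function in a standalone helper (the name `build_model_name` and the `date` argument are assumptions added for illustration) to show what name a given configuration yields.

```python
from datetime import datetime


def build_model_name(dataset_name, use_model, pretrained_model,
                     img_size, epochs, batch, workers, date=None):
    """Mirror the naming scheme used in dispatch_data (illustrative helper)."""
    date = date or datetime.now().strftime('%Y%m%d')
    if not pretrained_model:
        # Two characters before '.pt', e.g. '1n' for 'yolo11n.pt'
        tag = use_model[-5:-3]
    else:
        # Four characters before '.pt', e.g. 'best' for '.../best.pt'
        tag = pretrained_model[-7:-3]
    return f'{dataset_name}_{date}_{tag}_i{img_size}_e{epochs}_b{batch}_w{workers}'


# 'mydata', the date and the pretrained path are placeholder values
name_base = build_model_name('mydata', 'yolo11n.pt', '', 640, 10, -1, 8, date='20240101')
name_pre = build_model_name('mydata', 'yolo11n.pt', '/models/best.pt', 640, 10, -1, 8, date='20240101')
print(name_base)
print(name_pre)
```

Because the tag is taken with fixed-width slices, it only reads well when the file name ends in a two-character (base model) or four-character (pretrained model) identifier followed by `.pt`.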

In [None]:
# Start a training session
yolo_training(training_folder, use_model, img_size, epochs, batch, workers, label_smoothing, pretrained_model)

### Resume an uncompleted training session (Optional)

In [None]:
# Resume an interrupted training
#resume_training(interrupted_model_folder)

### Dispatch the data

In [None]:
# Move the .txt files describing the train/val distribution into the model folder,
# then put the images and labels themselves back into their original folders
dispatch_data(training_folder, pretrained_model, interrupted_model_folder)
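
Once the data are back in place, a quick consistency check can confirm that images and labels still pair up by file name. This is a sketch only; `unpaired_stems` is a hypothetical helper. Since unannotated images are allowed, an image without a label is not an error, whereas a label without an image usually is.

```python
import tempfile
from pathlib import Path


def unpaired_stems(training_folder):
    """Return (images without a label, labels without an image), compared by file stem."""
    root = Path(training_folder)
    image_stems = {p.stem for p in (root / 'images').iterdir() if p.is_file()}
    label_stems = {p.stem for p in (root / 'labels').iterdir() if p.is_file()}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Demo on a temporary folder: 'b' has no label, 'c' has no image
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'images').mkdir()
    (root / 'labels').mkdir()
    for name in ('a.jpg', 'b.jpg'):
        (root / 'images' / name).touch()
    for name in ('a.txt', 'c.txt'):
        (root / 'labels' / name).touch()
    images_only, labels_only = unpaired_stems(root)
    print(images_only, labels_only)
```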