# Using and handling corrected files

## Generating ground truth

This notebook lets you use the corrected annotation data generated in the previous notebooks. For the code to work, you must already have run inference and corrected the resulting annotations.

This step concludes a training, prediction and correction session. The corrected files serve as ground truth and can be used to launch a new training session.

**Warning**

The code works only if the project structure is correct. If the tree structure differs from the one shown below, errors will occur.

To make the code as straightforward to apply as possible, let's review the structure of the working folder and its naming constraints. The tree below is not the whole structure, only the parts that affect whether the code runs correctly.

```
working_folder
├─── partage
│    ├─── project_name
│    │    ├─── in
│    │    │    ├─── non_annotated_images
│    │    │    └─── annotated_images
│    │    └─── out
│    │         ├─── annotations
│    │         └─── corrections
└─── output
     └─── runs
          ├─── train
          │    └─── model_folder
          └─── predict
               └─── result_folder
                    └─── correctedLabels
```

The only folder you can name freely is '**project_name**'. The '**model_folder**' folder was named automatically if you ran the previous notebooks.

The same applies to the name of the '**result_folder**' folder. It is generated from the names of the '**project_name**' and '**model_folder**' folders, separated by an underscore. If you rename either folder, make sure you always follow this naming scheme. For instance:

```
project_name = projet01
model_folder = model01

result_folder = projet01_model01
```
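
As a quick illustration of this naming scheme, here is a minimal sketch that rebuilds the expected name from the two other folder names. The helper `expected_result_folder` is hypothetical (it is not part of the project modules), and the values are the ones from the example above.

```python
import os

def expected_result_folder(project_name: str, model_folder: str) -> str:
    """Return the result folder name: '<project_name>_<model_folder>'."""
    return f"{project_name}_{model_folder}"

# Values taken from the example above
name = expected_result_folder("projet01", "model01")
print(name)

# Relative location of that folder, following the tree shown earlier
result_path = os.path.join("output", "runs", "predict", name)
print(result_path)
```

Comparing this string with the actual folder name under `output/runs/predict` is a simple way to check that a renamed folder still follows the scheme.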

This notebook acts mainly on the '**partage**' folder. However, it also needs access to the '**model_folder**' folder to retrieve the '**labels.txt**' file used to process the corrected annotations.

The same applies to the '**result_folder**' folder, whose '**correctedLabels**' subfolder holds the .txt files resulting from the processing.

Consequently, make sure you have the following folders and files:
- A folder containing the unannotated images ('**partage/project_name/in/non_annotated_images**');
- A folder containing the '**labels.txt**' file with annotation labels ('**output/runs/train/model_folder**');
- A folder containing the corrected JSON files ('**partage/project_name/out/corrections**').
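
The checklist above can be verified before running anything else. The following is only a sketch: `missing_prerequisites` is a hypothetical helper, and it assumes the folder layout of the tree shown earlier, with `working_folder`, `projet01` and `model01` as placeholder values.

```python
from pathlib import Path

def missing_prerequisites(working_folder, project_name, model_folder):
    """Return the required paths (as strings) that do not exist yet."""
    root = Path(working_folder)
    required = [
        # Folder with the unannotated images
        root / "partage" / project_name / "in" / "non_annotated_images",
        # labels.txt produced by the training session
        root / "output" / "runs" / "train" / model_folder / "labels.txt",
        # Folder with the corrected JSON files
        root / "partage" / project_name / "out" / "corrections",
    ]
    return [str(path) for path in required if not path.exists()]

# Placeholder values; replace them with your own
for path in missing_prerequisites("working_folder", "projet01", "model01"):
    print("Missing:", path)
```

An empty result means the three prerequisites are in place.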

**Notice concerning use** 

Any use, even partial, of the content of this notebook must be accompanied by an appropriate citation.

&copy; 2024 Marion Charpier 
&copy; 2024 Natacha Grim

## Environment

In [3]:
import os
import json
import shutil
import glob
import random
from pathlib import Path

import pandas as pd
from PIL import Image

import sys
sys.path.append(os.path.join('..', 'modules'))

from transform_coordinates_functions import from_ls_to_yolo
from class_names_functions import get_labels, get_class_code
from folders_path import *
from manipulate_files import open_json_file, change_id

## Functions

### Create a new dataset with the correction files

In [5]:
def create_new_ground_truth(img_dataset_folder:str, yolo_model_folder:str, wanna_create:bool) -> None:
    """
    This function creates a new dataset based on inference corrections, allowing for the creation of an updated 
    ground truth dataset. The new dataset can be used to start a new training session. If the `wanna_create` 
    parameter is set to `False`, no new dataset is generated, and the function exits.
    
    :param img_dataset_folder: 
        - Type: str
        - Description: The absolute path to the folder containing the project images. This folder is used to 
                       locate the images that will be copied to the new dataset.
    :param yolo_model_folder: 
        - Type: str
        - Description: The path to the folder containing the YOLO model and its associated files. This folder 
                       is used to retrieve the `labels.txt` file and results folder for creating the new dataset.
    :param wanna_create: 
        - Type: bool
        - Description: A flag indicating whether to create the new dataset. If `True`, the new dataset is generated. 
                       If `False`, the function prints a message and exits without creating a dataset.
    
    :return: 
        - Type: None
        - Description: This function does not return a value. It creates a new folder containing the corrected 
                       labels and images for further training.

    This function is useful for managing and organizing datasets during iterative training and evaluation processes, 
    ensuring that corrected annotations are properly integrated into new training sessions.
    """

    if not wanna_create:
        print('No new dataset generated')
        return

    # Recompose the path to the model results folder
    results_folder = os.path.join(os.path.dirname(os.path.dirname(yolo_model_folder)), 
        'predict',
        img_dataset_folder.split('/')[-3] + '_' + os.path.basename(yolo_model_folder))
    
    # Create the folder for the new dataset
    new_folder = os.path.join(os.path.commonpath([img_dataset_folder, yolo_model_folder]), 'data', img_dataset_folder.split('/')[-3])

    # Check and create a unique folder
    counter = 1
    original_new_folder = new_folder
    
    while os.path.exists(new_folder):
        new_folder = f"{original_new_folder}_{counter}"
        counter += 1
    
    os.makedirs(new_folder)
    # Display the path of the new folder
    print("New Dataset Folder created:", new_folder)

    # Move corrected files to the new dataset folder
    corrections_folder = os.path.join(results_folder, 'correctedLabels')
    if os.path.exists(corrections_folder):
        shutil.move(corrections_folder, os.path.join(new_folder, 'labels'))
        print(f"Corrections folder found: {corrections_folder}")
    else:
        print(f"Corrections folder not found: {corrections_folder}")

    # Copy images to the new dataset
    os.makedirs(os.path.join(new_folder, 'images'))
    for file in os.listdir(img_dataset_folder):
        # Match extensions case-insensitively and include .jpeg
        if file.lower().endswith(('.jpg', '.jpeg', '.png', '.tiff')):
            shutil.copy(os.path.join(img_dataset_folder, file), os.path.join(new_folder, 'images', file))
    print(f"Images copied to {os.path.join(new_folder, 'images')}")

    # Copy the labels file
    shutil.copyfile(os.path.join(yolo_model_folder, 'labels.txt'), os.path.join(new_folder, 'labels.txt'))
    print(f"Labels file copied to: {os.path.join(new_folder, 'labels.txt')}")

    print(f'New dataset folder created')
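
The function above never overwrites an existing dataset: if the target folder exists, it appends `_1`, `_2`, and so on until it finds a free name. The sketch below isolates that logic in a hypothetical helper, `unique_folder`, and runs it in a temporary directory so that nothing in the project is touched.

```python
import os
import tempfile

def unique_folder(base: str) -> str:
    """Return `base` if it is free, else the first free `base_1`, `base_2`, ..."""
    candidate, counter = base, 1
    while os.path.exists(candidate):
        candidate = f"{base}_{counter}"
        counter += 1
    return candidate

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "projet01")
    first = unique_folder(base)    # free, so returned unchanged
    os.makedirs(first)
    second = unique_folder(base)   # taken, so a numeric suffix is added
    os.makedirs(second)
    third = unique_folder(base)
    names = [os.path.basename(p) for p in (first, second, third)]

print(names)  # ['projet01', 'projet01_1', 'projet01_2']
```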

### Move correction files and annotated images to the proper folders

In [45]:
def move_correction_files_and_images(img_dataset_folder:str) -> None:
    """
    This function moves images and correction JSON files to their respective folders based on a predefined directory structure. 
    It ensures that images are moved to an "annotated_images" folder, and corrected annotations are moved to an "annotations" folder, 
    with unique filenames for each annotation file.
    
    :param img_dataset_folder: 
        - Type: str
        - Description: The absolute path to the folder containing non-annotated images. This folder is used to locate 
                       the images and determine the destination paths for the annotated images and correction files.
    
    :return: 
        - Type: None
        - Description: This function does not return a value. It moves images and correction files to their designated 
                       folders and ensures that annotation filenames are unique.

    This function helps organize project data by structuring folders for annotated images and correction files, making 
    it easier to manage annotations and integrate them into the training workflow.
    """

    
    # Recompose the path of the "annotated_images" folder
    annotated_images_folder = img_dataset_folder.replace('eval_images', 'ground_truth_images')
    
    # Recompose the path of the "corrections" folder
    corrections_folder = img_dataset_folder.replace('image_inputs/eval_images', 'annotations/prediction_corrections')
    
    # Recompose the path of the "annotations" folder
    annotations_folder = img_dataset_folder.replace('image_inputs/eval_images', 'annotations/ground_truth')

    # Move the images (.jpg, .jpeg, .png, .tiff) to the annotated images folder
    for file in os.listdir(img_dataset_folder):
        if file.lower().endswith(('jpg', 'jpeg', 'png', 'tiff')):
            shutil.move(os.path.join(img_dataset_folder, file), os.path.join(annotated_images_folder, file))
    print(f"Images moved to {annotated_images_folder}")

    # Move correction files to the annotations folder
    for file in os.listdir(corrections_folder):
        if not file.startswith('.'):
            # Ensure unique file name
            new_annotation = os.path.join(annotations_folder, file)
            annotation_number = int(os.path.basename(file))

            while os.path.exists(new_annotation):
                annotation_number += 1
                new_annotation = os.path.join(annotations_folder, f"{annotation_number}")
                
            shutil.move(os.path.join(corrections_folder, file), new_annotation)

            # Changes the 'id' field in the JSON file to the basename of the file path
            change_id(new_annotation)
            
    print(f"Annotations files corrected and moved to {annotations_folder}")
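
The renaming step above relies on an assumption worth spelling out: each correction file is expected to have a purely numeric name without extension (for example `12`), since the name is passed to `int()`. Under that assumption, the collision handling can be sketched as follows; `next_free_name` is a hypothetical helper and the demonstration uses a temporary folder.

```python
import os
import tempfile

def next_free_name(folder: str, filename: str) -> str:
    """Return a numeric file name not yet used in `folder`, starting at `filename`."""
    number = int(filename)  # assumes extension-less numeric names such as "12"
    candidate = filename
    while os.path.exists(os.path.join(folder, candidate)):
        number += 1
        candidate = str(number)
    return candidate

with tempfile.TemporaryDirectory() as annotations:
    # Pretend that annotations "12" and "13" already exist
    for existing in ("12", "13"):
        open(os.path.join(annotations, existing), "w").close()
    free_name = next_free_name(annotations, "12")   # 12 and 13 are taken
    untouched = next_free_name(annotations, "20")   # 20 is free

print(free_name, untouched)  # 14 20
```

A file whose name is not a plain integer would raise a `ValueError` at the `int()` call, both in this sketch and in the function above.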


### Add the image data to the pre-existing CSV (or create one)

In [None]:
def add_csv_data(img_dataset_folder:str, yolo_model_folder:str) -> None:
    """
    This function merges the CSV data of non-annotated images with the CSV file of annotated images. 
    It ensures that all relevant image metadata is consolidated into a single CSV file located in the 
    annotated images folder. If no CSV file exists in the annotated folder, the function moves the 
    CSV file from the non-annotated folder and updates the folder paths.
    
    :param img_dataset_folder: 
        - Type: str
        - Description: The absolute path to the folder containing the non-annotated images. 
                       This folder is used to locate the CSV file of non-annotated images.
    :param yolo_model_folder: 
        - Type: str
        - Description: The path to the folder containing the trained YOLO model. 
                       This parameter is currently unused by the function.

    :return: 
        - Type: None
        - Description: This function does not return a value. It merges or moves the CSV file into the 
                       annotated images folder, ensuring data consistency.
    
    This function ensures that all image metadata is consolidated in a single CSV file, making it easier 
    to track and manage annotations and corrections across different stages of the project.
    """

    # 1. Compute paths
    root              = Path(img_dataset_folder)
    project_folder    = root.parent.parent
    project_name      = project_folder.name

    non_annotated_csv = root / f"{project_name}.csv"
    annotated_folder  = root.parent / 'ground_truth_images'
    annotated_csv     = annotated_folder / f"{project_name}_data.csv"

    # 2. Existence checks
    exists_non = non_annotated_csv.exists()
    exists_ann = annotated_csv.exists()
    if not exists_non and not exists_ann:
        print("No CSV found.")
        
        data = []
        images = [
            img for img in annotated_folder.iterdir()
            if img.name.lower().endswith(('.jpg', '.jpeg', '.png', '.tiff'))
        ]

        for file in images:
            img_name = file.name

            with Image.open(file) as img:
                absolute_path = img.filename
                format = img.format
                width, height  = img.size
                img_size = width * height

            img_data = {
                'Image_name'   : img_name,
                'Folder'       : str(annotated_folder),
                'Absolute_path': absolute_path,
                'Format'       : format,
                'Width'        : width,
                'Height'       : height,
                'Image_size'   : img_size
            }

            data.append(img_data)

        # Create a DataFrame from the image data list
        df = pd.DataFrame(data)

        # Save DataFrame to a CSV file
        csv_filename = os.path.join(annotated_folder, os.path.basename(project_folder) + '_data.csv')
        df.to_csv(csv_filename, sep=';', index=False)

        print(f"Image data saved to {csv_filename}")
        # Nothing to merge: stop here so the merge step below does not run on empty data
        return

    # 3. Case: only the annotated CSV exists -> add the non-annotated images
    if exists_ann and not exists_non:
        print("Adding non-annotated images to the existing annotated CSV…")
        # Load the existing CSV to get already included names
        df_annot = pd.read_csv(annotated_csv, sep=';')
        existing_names = set(df_annot['Image_name'])

        # Build the list of images to consider
        patterns = ['*.jpg', '*.jpeg', '*.png', '*.tiff']
        images = [
            file_path
            for pat in patterns
            for file_path in root.glob(pat)
        ]

        new_rows = []
        for img_path in images:
            # Use the full file name (with extension), as when the CSV is first created
            name = img_path.name
            # If this name is already in the annotated CSV, skip it
            if name in existing_names:
                continue

            with Image.open(img_path) as img:
                w, h = img.size
                img_format = img.format  # read while the file is still open

            new_rows.append({
                'Image_name'   : name,
                'Folder'       : str(annotated_folder),
                'Absolute_path': str(img_path).replace('eval_images', 'ground_truth_images'),
                'Format'       : img_format,
                'Width'        : w,
                'Height'       : h,
                'Image_size'   : w * h
            })

        if not new_rows:
            print("No new images to add.")
            return

        df_new = pd.DataFrame(new_rows)
        # Concatenate only the new rows
        pd.concat([df_annot, df_new], ignore_index=True) \
          .to_csv(annotated_csv, sep=';', index=False)
        print(f"{len(df_new)} image(s) added to {annotated_csv}")
        return

    # 4. Case: the non-annotated CSV exists (with or without an annotated CSV) -> merge
    df_annot = pd.read_csv(annotated_csv, sep=';') if exists_ann else pd.DataFrame()
    df_non   = pd.read_csv(non_annotated_csv, sep=';') if exists_non else pd.DataFrame()

    # Update paths from eval_images → ground_truth_images
    if not df_non.empty:
        df_non['Folder'] = df_non['Folder'].str.replace('eval_images', 'ground_truth_images')
        df_non['Absolute_path'] = df_non['Absolute_path'].str.replace(
            'eval_images', 'ground_truth_images'
        )

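    # Remove from df_non the images already present in df_annot
    # (df_annot has no 'Image_name' column when the annotated CSV does not exist yet)
    existing_names    = set(df_annot['Image_name']) if not df_annot.empty else set()
    df_non_filtered   = df_non[~df_non['Image_name'].isin(existing_names)]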

    if df_non_filtered.empty:
        print("No additional non-annotated images to merge.")
    else:
        merged = pd.concat([df_annot, df_non_filtered], ignore_index=True)
        merged.to_csv(annotated_csv, sep=';', index=False)
        print(f"{len(df_non_filtered)} image(s) merged into {annotated_csv}")
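
To make the merge rule easier to follow, here is a dependency-free sketch of the same idea, written with plain dictionaries instead of pandas: rows whose `Image_name` is already listed are skipped, and the remaining rows get their paths rewritten from `eval_images` to `ground_truth_images`. The helper `merge_rows` and the sample paths are hypothetical.

```python
def merge_rows(annotated_rows, non_annotated_rows):
    """Append rows whose Image_name is new, rewriting their folder paths."""
    existing = {row["Image_name"] for row in annotated_rows}
    merged = list(annotated_rows)
    for row in non_annotated_rows:
        if row["Image_name"] in existing:
            continue  # already listed among the annotated images
        updated = dict(row)
        for key in ("Folder", "Absolute_path"):
            updated[key] = updated[key].replace("eval_images", "ground_truth_images")
        merged.append(updated)
    return merged

# Hypothetical sample data
annotated = [
    {"Image_name": "a.jpg", "Folder": "/p/ground_truth_images",
     "Absolute_path": "/p/ground_truth_images/a.jpg"},
]
non_annotated = [
    {"Image_name": "a.jpg", "Folder": "/p/eval_images",
     "Absolute_path": "/p/eval_images/a.jpg"},
    {"Image_name": "b.jpg", "Folder": "/p/eval_images",
     "Absolute_path": "/p/eval_images/b.jpg"},
]

merged = merge_rows(annotated, non_annotated)
print([row["Image_name"] for row in merged])  # ['a.jpg', 'b.jpg']
print(merged[1]["Absolute_path"])             # /p/ground_truth_images/b.jpg
```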

## Processing

### Enter absolute paths for variables

In [5]:
img_dataset_folder = 'ABSPATHTOTHEFOLDER' # To be changed. Absolute path to the folder containing the non-annotated images.
yolo_model_folder = 'ABSPATHTOTHEMODELFOLDER' # To be changed. Absolute path to the trained model folder (the one containing labels.txt).
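
`create_new_ground_truth` derives the project name from the third-to-last component of `img_dataset_folder`, using `split('/')[-3]`. The sketch below shows what that yields for a hypothetical path, and why a trailing slash should probably be avoided. `derived_project_name` is an illustrative helper based on `pathlib`, not the project's own code.

```python
from pathlib import PurePosixPath

def derived_project_name(img_dataset_folder: str) -> str:
    """Return the name of the folder two levels above the image folder."""
    return PurePosixPath(img_dataset_folder).parent.parent.name

# Hypothetical path following the tree described at the top of the notebook
example = "/home/user/working_folder/partage/projet01/in/non_annotated_images"
print(derived_project_name(example))  # projet01

# The split-based expression gives the same result without a trailing slash...
without_slash = example.split("/")[-3]
# ...but is shifted by one component when the path ends with "/"
with_slash = (example + "/").split("/")[-3]
print(without_slash, with_slash)  # projet01 in
```

Entering the paths without a trailing slash keeps the split-based expression and the `pathlib` version in agreement.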

### Create the next dataset

In [None]:
create_new_ground_truth(img_dataset_folder, yolo_model_folder, wanna_create=True)

### Move images and JSON files to the proper folders

In [None]:
move_correction_files_and_images(img_dataset_folder)

### Add the image data to the pre-existing CSV (or create one)

In [None]:
add_csv_data(img_dataset_folder, yolo_model_folder)