# Beet segmentation (optional) data preparation

Date: 11.01.2024  
Authors: Gustav Schimmer & Philipp Friedrich

**This notebook prepares data for training a YOLOv6 model to detect sugar beet plants in images.**  
  
  
The major steps are:
- Downsampling and resizing the images
- Creating a custom dataset with labeled data

Before we train our model, we need to prepare a proper dataset containing images for training, validation and testing at the right resolution and size. This notebook suggests one way to generate such a dataset. If you don't want to generate and use your own training data, you can use the ready-to-use example dataset provided in the data section of this project (example_dataset).

## Import necessary libraries

```python
import os
import cv2
```

## Data preparation: Downsampling & Resizing

Before creating the custom dataset, the images need to be resampled to a lower resolution to reduce the required computing power.

#### Define data paths

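```python
# Input data path (folder with the original images)
# os.path.join builds the path with the correct separator on Windows, Linux and macOS
input_folder = os.path.join('..', 'beet-segmentation', 'data', '20230514', 'field_1')

# Output data path (folder for the resampled image tiles)
output_folder = os.path.join('..', 'beet-segmentation', 'data', '20230514', 'field_1_test_img')
```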

#### Write function to resample images

```python
# Resize an image to the target width and height, then crop it into square tiles
def crop_and_resize_image(input_path, output_folder, square_size, target_width, target_height):
    image = cv2.imread(input_path)
    if image is None:
        print(f"Could not read {input_path}, skipping.")
        return
    # Resize the image to the target size (cv2.resize expects (width, height))
    image = cv2.resize(image, (target_width, target_height))
    # Crop into tiles of square_size x square_size pixels;
    # edge pixels that do not fill a complete tile are dropped
    base_name = os.path.splitext(os.path.basename(input_path))[0]
    tiles_per_row = target_width // square_size
    for y in range(0, target_height - square_size + 1, square_size):
        for x in range(0, target_width - square_size + 1, square_size):
            square = image[y:y + square_size, x:x + square_size]
            # Save the tile with a running (row-major) index
            tile_index = (y // square_size) * tiles_per_row + x // square_size
            output_path = os.path.join(output_folder, f"{base_name}_{tile_index}.jpg")
            cv2.imwrite(output_path, square)
```

#### Image resampling

We tried different image resolutions; for our images, a resolution of 1500x2000 pixels (width x height) gives a good compromise between results and computing power. This corresponds to a Ground Sampling Distance (GSD) of 0.1 cm. To make the images usable for YOLOv6, we additionally crop them into square tiles of 500x500 pixels.
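The numbers above translate into a fixed tile layout. The sketch below, which assumes that the stated GSD of 0.1 cm per pixel applies to the resampled 1500x2000 images, computes how many 500x500 tiles each image yields, how much ground one image and one tile cover, and the index that `crop_and_resize_image` appends to each tile's file name. The helpers `tile_grid` and `tile_index` are hypothetical names used only for illustration.

```python
# Assumption: the GSD of 0.1 cm per pixel refers to the resampled 1500x2000 image.
target_width, target_height = 1500, 2000  # pixels
square_size = 500  # tile edge length in pixels
gsd_cm = 0.1  # ground sampling distance in cm per pixel


def tile_grid(width, height, size):
    """Number of complete tiles per row and per column; leftover edge pixels are dropped."""
    return width // size, height // size


def tile_index(x, y, size, width):
    """Row-major index of the tile whose top-left corner is (x, y)."""
    return (y // size) * (width // size) + x // size


cols, rows = tile_grid(target_width, target_height, square_size)
print(f"{cols} x {rows} = {cols * rows} tiles per image")
# -> 3 x 4 = 12 tiles per image
print(f"Image footprint: {target_width * gsd_cm:.0f} cm x {target_height * gsd_cm:.0f} cm")
# -> Image footprint: 150 cm x 200 cm
print(f"Tile footprint: {square_size * gsd_cm:.0f} cm x {square_size * gsd_cm:.0f} cm")
# -> Tile footprint: 50 cm x 50 cm
print(tile_index(1000, 500, square_size, target_width))
# -> 5 (third tile in the second row)
```

Because 1500 and 2000 are both multiples of 500, no pixels are lost at the image edges with these settings.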

```python
# Define target image width and height (in pixels) and the tile size
target_width, target_height = 1500, 2000
square_size = 500

# Create the output folder if it does not exist
os.makedirs(output_folder, exist_ok=True)

# Loop over all JPEG images in the input folder (also matches upper-case .JPG)
for filename in os.listdir(input_folder):
    if filename.lower().endswith(".jpg"):
        input_path = os.path.join(input_folder, filename)
        crop_and_resize_image(input_path, output_folder, square_size, target_width, target_height)

print("Image resampling done.")
```
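After the loop finishes, a quick sanity check can confirm that every source image produced the expected number of tiles. The helper names below (`count_tiles_per_image`, `incomplete_images`) are hypothetical, and the check assumes the output folder contains only tiles written by `crop_and_resize_image`, named `<source name>_<tile index>.jpg`.

```python
import os
from collections import Counter


def count_tiles_per_image(folder):
    """Count the .jpg tiles in `folder`, grouped by the name of their source image.

    Assumes the naming scheme <source name>_<tile index>.jpg.
    """
    counts = Counter()
    for filename in os.listdir(folder):
        stem, ext = os.path.splitext(filename)
        if ext.lower() == ".jpg" and "_" in stem:
            source_name = stem.rsplit("_", 1)[0]  # drop the trailing tile index
            counts[source_name] += 1
    return counts


def incomplete_images(counts, expected_tiles):
    """Return the source images that did not produce the expected number of tiles."""
    return sorted(name for name, n in counts.items() if n != expected_tiles)
```

With the settings above, `incomplete_images(count_tiles_per_image(output_folder), 12)` should return an empty list, since each image is split into 3 x 4 = 12 tiles (assuming the folder holds no tiles from earlier runs with other settings).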

## Create training labels

Training the model requires annotated training data, and creating these annotations is often costly and time-consuming. Some annotation tools available online are:

- https://roboflow.com/annotate?ref=blog.roboflow.com

- https://blog.roboflow.com/cvat/

- https://blog.roboflow.com/labelimg/

- https://www.makesense.ai/

In our case, we need to label the individual sugar beet plants in the images. If you want to create labels for your own images, you can use one of the tools mentioned above. We recommend Make Sense AI, as it is open source and can export labels in YOLO format.


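Each image corresponds to one label file with the same base name and a `.txt` extension. Each line in a label file describes one bounding box: the class ID followed by the box center and size, normalized by the image width and height (values between 0 and 1). An example of the label format is shown below.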

```text
# class_id center_x center_y bbox_width bbox_height
0 0.300926 0.617063 0.601852 0.765873
1 0.575 0.319531 0.4 0.551562
```
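Annotation tools such as Make Sense AI write these files directly. If bounding boxes are available in pixel coordinates from another source, the conversion is simple arithmetic. The sketch below assumes boxes given by their corner coordinates `(x_min, y_min, x_max, y_max)` in pixels; the helper names `to_yolo_line` and `from_yolo_line` are hypothetical.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_width, img_height):
    """Convert a pixel bounding box (corner coordinates) into one YOLO label line.

    YOLO expects the box center and size, normalized by the image width/height.
    """
    center_x = (x_min + x_max) / 2 / img_width
    center_y = (y_min + y_max) / 2 / img_height
    bbox_width = (x_max - x_min) / img_width
    bbox_height = (y_max - y_min) / img_height
    return f"{class_id} {center_x:.6f} {center_y:.6f} {bbox_width:.6f} {bbox_height:.6f}"


def from_yolo_line(line, img_width, img_height):
    """Parse a YOLO label line back into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_width, float(w) * img_width
    cy, h = float(cy) * img_height, float(h) * img_height
    return int(class_id), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# A sugar beet (class 0) spanning pixels (50, 100) to (350, 400) in a 500x500 tile
print(to_yolo_line(0, 50, 100, 350, 400, 500, 500))
# -> 0 0.400000 0.500000 0.600000 0.600000
```

Normalizing by the tile size keeps the labels valid if the tiles are later rescaled, since the relative box positions do not change.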

## Create custom dataset

The generated images and labels should be divided into training and validation sets (approximately 80:20); a test set can be held out in addition. Organize the directory of your custom dataset as follows:

```text
custom_dataset
├── images
│   ├── train
│   │   ├── train0.jpg
│   │   └── train1.jpg
│   ├── val
│   │   ├── val0.jpg
│   │   └── val1.jpg
│   └── test
│       ├── test0.jpg
│       └── test1.jpg
└── labels
    ├── train
    │   ├── train0.txt
    │   └── train1.txt
    ├── val
    │   ├── val0.txt
    │   └── val1.txt
    └── test
        ├── test0.txt
        └── test1.txt
```
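The split can be scripted. The following is a minimal sketch, not the authors' procedure: it assumes the tiles and their exported label files sit in two flat folders with matching base names, shuffles the pairs with a fixed seed and copies them into the layout above, keeping the original file names. The function name `split_dataset` and its parameters are hypothetical, and images without a label file are skipped.

```python
import os
import random
import shutil


def split_dataset(image_dir, label_dir, dataset_dir, val_fraction=0.2, test_fraction=0.0, seed=42):
    """Copy image/label pairs into <dataset_dir>/{images,labels}/{train,val,test}.

    Assumes each image <name>.jpg in image_dir has its label in label_dir/<name>.txt.
    Images without a label file are skipped.
    """
    pairs = []
    for filename in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(filename)
        label_path = os.path.join(label_dir, stem + ".txt")
        if ext.lower() == ".jpg" and os.path.isfile(label_path):
            pairs.append((os.path.join(image_dir, filename), label_path))

    random.Random(seed).shuffle(pairs)  # reproducible random order
    n_val = round(len(pairs) * val_fraction)
    n_test = round(len(pairs) * test_fraction)
    splits = {
        "val": pairs[:n_val],
        "test": pairs[n_val:n_val + n_test],
        "train": pairs[n_val + n_test:],
    }

    for split, split_pairs in splits.items():
        if not split_pairs:
            continue  # e.g. no test set requested
        image_out = os.path.join(dataset_dir, "images", split)
        label_out = os.path.join(dataset_dir, "labels", split)
        os.makedirs(image_out, exist_ok=True)
        os.makedirs(label_out, exist_ok=True)
        for image_path, label_path in split_pairs:
            shutil.copy2(image_path, image_out)
            shutil.copy2(label_path, label_out)

    return {split: len(split_pairs) for split, split_pairs in splits.items()}
```

For example, `split_dataset(output_folder, label_folder, 'custom_dataset', val_fraction=0.2)` would produce an approximately 80:20 split, where `label_folder` is a hypothetical folder holding the exported `.txt` files. Passing `test_fraction=0.1` additionally fills the `test` folders. The fixed seed keeps the split reproducible between runs.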

Your custom dataset is now ready to use. You can continue with model training in this [Jupyter Notebook](beet_segmentation_model.ipynb).
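Before training, it can be worth checking that images and labels are paired correctly in every split. The sketch below (hypothetical function `find_unpaired_files`) assumes the `custom_dataset` directory layout shown above and only reports mismatches. An image without a label file is not necessarily an error, since it may be an intentional background image without sugar beets.

```python
import os


def find_unpaired_files(dataset_dir, splits=("train", "val", "test")):
    """Report images without a label file and labels without an image, per split.

    Assumes the layout <dataset_dir>/images/<split> and <dataset_dir>/labels/<split>.
    """
    report = {}
    for split in splits:
        image_dir = os.path.join(dataset_dir, "images", split)
        label_dir = os.path.join(dataset_dir, "labels", split)
        if not os.path.isdir(image_dir):
            continue  # split not present (e.g. no test set)
        images = {os.path.splitext(f)[0] for f in os.listdir(image_dir)
                  if f.lower().endswith(".jpg")}
        labels = set()
        if os.path.isdir(label_dir):
            labels = {os.path.splitext(f)[0] for f in os.listdir(label_dir)
                      if f.endswith(".txt")}
        report[split] = {
            "images_without_label": sorted(images - labels),
            "labels_without_image": sorted(labels - images),
        }
    return report
```

Calling `find_unpaired_files('custom_dataset')` returns one entry per existing split; empty lists everywhere mean every image has a matching label file and vice versa.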