# 1. Data Preparation
Before training the model, the images and their masks need to be prepared in the steps below.

# 1.1. ROI Extraction
The raw images are large (over 2000 x 2000 pixels), so we crop each one down to the region of interest (ROI). The same crop must be applied to every image and to its corresponding mask so that the two stay aligned.

<div style="text-align: center;">
    <img src="Images/image-11.png" alt="Alt text" style="display: block; margin: 0 auto;">
</div>


* TASK:

```python
import cv2
import os
from pathlib import Path
from PIL import Image

# Input and output directories
raw_imgs = Path("../images/raw_images").glob("*.jpg")
raw_labels = Path("../images/raw_labels").glob("*.png")
cropped_img_dir = Path("../images/") / "cropped_images"
cropped_label_dir = Path("../images/") / "cropped_labels"
cropped_img_dir.mkdir(parents=True, exist_ok=True)
cropped_label_dir.mkdir(parents=True, exist_ok=True)

# Size of the centered ROI kept from every image and mask
CROP_W, CROP_H = 1400, 1840

def crop_imgs(input_imgs, output_dir) -> None:
    """Crop the central CROP_W x CROP_H region of each file and save it
    under the same file name in output_dir."""
    for img_path in input_imgs:
        with Image.open(img_path) as img:
            width, height = img.size
            left = (width - CROP_W) // 2
            top = (height - CROP_H) // 2
            cropped = img.crop((left, top, left + CROP_W, top + CROP_H))
            cropped.save(output_dir / img_path.name)

# Apply the identical crop to the images and to their masks
crop_imgs(raw_imgs, cropped_img_dir)
crop_imgs(raw_labels, cropped_label_dir)
```
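The cell above crops the image folder and the mask folder in two separate passes, which keeps each pair aligned only if an image and its mask have exactly the same size. As a hedged alternative (not part of the original workflow), the sketch below crops an image together with its mask using one shared box and rejects pairs whose sizes differ. `center_box` and `crop_pair` are hypothetical helper names; the 1400 x 1840 default mirrors the crop size used above.

```python
from PIL import Image


def center_box(width, height, crop_w, crop_h):
    """Return the (left, top, right, bottom) box of a centered crop_w x crop_h region."""
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


def crop_pair(image, mask, crop_w=1400, crop_h=1840):
    """Crop a PIL image and its PIL mask with one shared centered box."""
    if image.size != mask.size:
        # A shared box would select different regions from differently sized inputs
        raise ValueError(f"image size {image.size} != mask size {mask.size}")
    box = center_box(image.width, image.height, crop_w, crop_h)
    return image.crop(box), mask.crop(box)


# Synthetic 2400 x 2200 example standing in for a real image/mask pair
img = Image.new("RGB", (2400, 2200), color=(120, 80, 40))
msk = Image.new("L", (2400, 2200), color=0)
msk.paste(255, (1100, 1000, 1300, 1200))  # white square near the center
img_c, msk_c = crop_pair(img, msk)
print(img_c.size, msk_c.size)  # (1400, 1840) (1400, 1840)
```

Checking sizes before cropping turns a silent misalignment into an explicit error, which is cheaper to debug than a model trained on shifted masks.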

# 1.2. Image Format Conversion
Both the images and the masks should be converted to TIFF format, which suits the recommended convolutional neural network (CNN) model.

* TASK:

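```python
### Code
# Convert images to TIFF
import os

input_folder = "Input Directory"    # replace with the folder holding the images or masks
output_folder = "Output Directory"  # replace with the folder for the TIFF files

for filename in os.listdir(input_folder):
    if filename.endswith(('.jpg', '.jpeg', '.png')):  # add other image formats if needed
        ...  # TODO: convert the file to TIFF and save it in output_folder
```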
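The conversion loop above leaves the actual conversion as a task. One possible sketch, assuming Pillow is used for both reading and writing, follows; `convert_folder_to_tiff` is a hypothetical helper name. It keeps each file's stem and only swaps the extension for `.tif`, so an image and its mask still share a name after conversion. Pillow writes uncompressed TIFF unless a compression is requested, so mask label values are stored unchanged. Converting a JPEG to TIFF cannot undo the loss already introduced by JPEG compression.

```python
import os

from PIL import Image

VALID_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def convert_folder_to_tiff(input_folder, output_folder, extensions=VALID_EXTENSIONS):
    """Save every image in input_folder as <stem>.tif in output_folder.

    Returns the list of written file paths.
    """
    os.makedirs(output_folder, exist_ok=True)
    written = []
    for filename in sorted(os.listdir(input_folder)):
        if not filename.lower().endswith(extensions):
            continue  # skip files that are not images
        stem = os.path.splitext(filename)[0]
        out_path = os.path.join(output_folder, stem + ".tif")
        with Image.open(os.path.join(input_folder, filename)) as img:
            img.save(out_path, format="TIFF")  # uncompressed by default
        written.append(out_path)
    return written
```

It could be called once on the cropped image folder and once on the cropped mask folder from section 1.1, each with its own output folder.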

# 1.3. Image Augmentation

After extracting the ROI and converting the images and masks to TIFF, we enlarge the dataset with augmentation. The procedure is outlined below; throughout, each image and its corresponding mask must keep the same file name:

* Define Augmentation Parameters: Determine the augmentation techniques to apply, such as rotation, flipping, scaling, etc.

* Loop Through Images: Iterate through each image and its corresponding mask.

* Apply Augmentation: Apply each augmentation to both the image and its mask with identical parameters (for example, the same rotation angle), so that the mask stays aligned with the image.

* Save Augmented Images: Save the augmented images and their masks with the same names as the original images and masks.

<div style="text-align: center;">
    <img src="Images/image-12.png" alt="Alt text" style="display: block; margin: 0 auto;">
</div>

* TASK:

```python
### Code
import numpy as np
import random
import os
from scipy.ndimage import rotate

# Define one function per augmentation operation
def rotation(image, seed):
    ...
    return r_img

def h_flip(image, seed):
    ...
    return hflipped_img

def v_flip(image, seed):
    ...
    return vflipped_img

def v_transl(image, seed):
    ...
    return vtranslated_img

def h_transl(image, seed):
    ...
    return htranslated_img
```
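The function stubs above leave each body open. One possible way to fill them in is sketched below; it is an illustration, not the notebook's reference solution. Each seeded function builds its own `random.Random(seed)`, so calling it with the same seed for an image and for its mask yields the same angle or shift. `MAX_ANGLE` (30 degrees) and `MAX_SHIFT` (64 pixels) are assumed values. `rotate` is called with `reshape=False` to keep the array size and with `order=0` (nearest neighbor) so that mask labels are never interpolated into new values.

```python
import random

import numpy as np
from scipy.ndimage import rotate

MAX_ANGLE = 30  # assumed maximum rotation in degrees
MAX_SHIFT = 64  # assumed maximum translation in pixels


def rotation(image, seed):
    # Same seed -> same angle, so an image and its mask rotate identically
    angle = random.Random(seed).uniform(-MAX_ANGLE, MAX_ANGLE)
    # reshape=False keeps the size; order=0 keeps mask label values intact
    r_img = rotate(image, angle, reshape=False, order=0, mode="reflect")
    return r_img


def h_flip(image, seed):
    # Deterministic: the seed is accepted only to keep a uniform signature
    hflipped_img = np.fliplr(image)
    return hflipped_img


def v_flip(image, seed):
    vflipped_img = np.flipud(image)
    return vflipped_img


def v_transl(image, seed):
    n_pixels = random.Random(seed).randint(-MAX_SHIFT, MAX_SHIFT)
    # np.roll wraps rows that leave one edge back in at the opposite edge
    vtranslated_img = np.roll(image, n_pixels, axis=0)
    return vtranslated_img


def h_transl(image, seed):
    n_pixels = random.Random(seed).randint(-MAX_SHIFT, MAX_SHIFT)
    htranslated_img = np.roll(image, n_pixels, axis=1)
    return htranslated_img
```

If the wrap-around of `np.roll` is undesirable, `scipy.ndimage.shift` with a constant fill is an alternative. Using `order=0` for the image as well keeps the code simple at a small cost in image quality.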

* TASK:

```python
### Use the functions to implement the augmentation for both images and masks
import os

transformations = {}      # TODO: map a name to each augmentation function
images_path = ""          # TODO: path to original images
masks_path = ""           # TODO: path to original masks
img_augmented_path = ""   # TODO: path to store augmented images
msk_augmented_path = ""   # TODO: path to store augmented masks
images = []       # paths of the original images
images_name = []  # file names of the original images
masks = []        # paths of the original masks
masks_name = []   # file names of the original masks

# os.listdir returns files in arbitrary order; sorting keeps images[i] paired with masks[i]
for im in sorted(os.listdir(images_path)):  # store each image's full path and name
    images.append(os.path.join(images_path, im))
    images_name.append(im)

for msk in sorted(os.listdir(masks_path)):  # store each mask's full path and name
    masks.append(os.path.join(masks_path, msk))
    masks_name.append(msk)
```
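The bullet steps at the start of this section can be tied together as a loop over image/mask pairs. The sketch below is an assumption about how that loop might look, not the notebook's reference solution: `augment_pairs`, its `n_per_image` and `seed` parameters, and the `<stem>_<transform>_<k>.tif` naming scheme are hypothetical. For each pair it draws one transformation and one seed, applies both to the image and to the mask, and saves the two results under the same file name in their respective output folders. To stay self-contained it uses a small dictionary of NumPy transforms with the same `(image, seed)` signature as the functions defined above, which could be dropped in instead.

```python
import os
import random

import numpy as np
from PIL import Image

# Hypothetical stand-in transforms with the (image, seed) signature used above
TRANSFORMS = {
    "hflip": lambda img, seed: np.fliplr(img),
    "vflip": lambda img, seed: np.flipud(img),
    "rot90": lambda img, seed: np.rot90(img, k=random.Random(seed).randint(1, 3)),
}


def _load(path):
    """Read an image file into a NumPy array and close the file."""
    with Image.open(path) as im:
        return np.array(im)


def _save(array, path):
    """Write a NumPy array as an image; a contiguous copy avoids stride issues."""
    Image.fromarray(np.ascontiguousarray(array)).save(path)


def augment_pairs(images_path, masks_path, img_out, msk_out,
                  transformations=TRANSFORMS, n_per_image=2, seed=0):
    """Write n_per_image augmented copies of each image/mask pair.

    Images and masks are matched by identical file names; an image without a
    mask of the same name is skipped. Returns the list of written file names.
    """
    os.makedirs(img_out, exist_ok=True)
    os.makedirs(msk_out, exist_ok=True)
    rng = random.Random(seed)
    written = []
    for name in sorted(os.listdir(images_path)):
        mask_file = os.path.join(masks_path, name)
        if not os.path.exists(mask_file):
            continue
        image = _load(os.path.join(images_path, name))
        mask = _load(mask_file)
        stem = os.path.splitext(name)[0]
        for k in range(n_per_image):
            key = rng.choice(sorted(transformations))
            t_seed = rng.randint(0, 10_000)
            # Same function and same seed for both keeps the mask aligned with the image
            aug_img = transformations[key](image, t_seed)
            aug_msk = transformations[key](mask, t_seed)
            out_name = f"{stem}_{key}_{k}.tif"  # identical name for image and mask
            _save(aug_img, os.path.join(img_out, out_name))
            _save(aug_msk, os.path.join(msk_out, out_name))
            written.append(out_name)
    return written
```

With the TASK variables filled in, a call could look like `augment_pairs(images_path, masks_path, img_augmented_path, msk_augmented_path, n_per_image=5)`. Matching pairs by file name rather than by list position avoids silently pairing an image with the wrong mask when a file is missing from one folder.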