# TEXT LEVEL MODEL TRAINING

This notebook contains the necessary code to train a YOLOv8 model for detecting the individual text elements in a desktop
screenshot given a raw dataset.

### CONFIG VARS

In [None]:
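import os
import shutil

import cv2
from ultralytics import YOLO

from utils import *  # project helpers for slicing, resizing, augmentation and format conversion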

In [2]:
DATASET_PATH="../Datasets/Dataset_TextLevel"
AUGMENTED_PATH="../Augmented_Datasets/TextLevel"
AUGMENTED_PATH_TRAIN_EX="../Augmented_Datasets/TextLevel_train" # Data exclusively for training, not validating
YOLO_PATH="../YOLO_Datasets/TextLevel"
YOLO_PATH_TRAIN_EX="../YOLO_Datasets/TextLevel_train" # Data exclusively for training, not validating

### AUGMENTATION

For this model we will apply the following augmentation techniques:
- Hue transformations (-100° to +100°)
- Contrast inversion (to simulate dark and light modes)
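
As a rough illustration of what these two transformations do to a single pixel, here is a minimal sketch using only the standard library. The helper names `shift_hue` and `invert_pixel` are hypothetical and are not the functions in `utils`. The sketch also assumes that contrast inversion means taking the negative of each channel, which may differ from the actual implementation.

```python
import colorsys


def shift_hue(rgb, degrees):
    """Rotate the hue of an 8-bit RGB pixel by `degrees` (hypothetical helper)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Hue is stored as a fraction of a full turn, so wrap around with modulo 1
    h = (h + degrees / 360.0) % 1.0
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def invert_pixel(rgb):
    """Return the negative of an 8-bit RGB pixel (assumed meaning of contrast inversion)."""
    return tuple(255 - c for c in rgb)


print(shift_hue((255, 0, 0), 120))    # pure red rotated by +120 degrees
print(invert_pixel((255, 255, 255)))  # a white background becomes black
```

Applied to every pixel, the first function changes the colour theme of a screenshot while keeping the layout, and the second turns a light-mode screenshot into something resembling dark mode.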

In [3]:
create_slices(DATASET_PATH, AUGMENTED_PATH, 3, 3, 0.2, 0.2, "bbox")
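
The signature of `create_slices` is not documented in this notebook. Assuming that `3, 3, 0.2, 0.2` means a 3×3 grid of slices with 20% overlap between neighbouring slices in each direction, the slice boundaries along one axis could be computed as in the sketch below. `slice_windows` is a hypothetical helper, not part of `utils`.

```python
def slice_windows(length, n_slices, overlap):
    """Return (start, end) pixel ranges of `n_slices` equal, overlapping windows.

    `overlap` is the fraction of a window shared with its neighbour (assumed meaning).
    """
    # n windows of size s overlapping by overlap*s cover: s * (n - (n - 1) * overlap)
    size = length / (n_slices - (n_slices - 1) * overlap)
    step = size * (1 - overlap)
    return [
        (round(i * step), min(length, round(i * step + size)))
        for i in range(n_slices)
    ]


# Column ranges for a 1920-pixel-wide screenshot cut into 3 overlapping slices
print(slice_windows(1920, 3, 0.2))
```

The overlap makes it less likely that a text element is cut in half in every slice that contains it.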

In [4]:
resize_dataset_images(AUGMENTED_PATH, AUGMENTED_PATH, 640, 360)

In [5]:
hue_augmentation(AUGMENTED_PATH, AUGMENTED_PATH_TRAIN_EX, 0.15, 100)

In [6]:
contrast_inversion_augmentation(AUGMENTED_PATH, AUGMENTED_PATH_TRAIN_EX, 0.15)

In [7]:
# Copy the sliced and resized images and labels into the train-only folder
for file in os.listdir(AUGMENTED_PATH):
    shutil.copy(os.path.join(AUGMENTED_PATH, file), AUGMENTED_PATH_TRAIN_EX)

### FORMAT CONVERSION

So far we have worked with datasets in labelme format, but to train a model we need to convert them to the YOLOv8
format, which has the following directory structure:

```
YOLOv8_Dataset/
├── data.yaml
├── train/
│   ├── images/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── labels/
│   │   ├── img1.txt
│   │   ├── img2.txt
│   │   └── ...
├── valid/
│   ├── images/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── labels/
│   │   ├── img1.txt
│   │   ├── img2.txt
│   │   └── ...
└── test/ (OPTIONAL)
    ├── images/
    │   ├── img1.jpg
    │   ├── img2.jpg
    │   └── ...
    └── labels/
        ├── img1.txt
        ├── img2.txt
        └── ...
```

The format of the data.yaml file is:
```
path: <path_to_dataset_root_dir>
train: <path_to_train_images>
val: <path_to_validation_images>
test: <path_to_test_images> (OPTIONAL)

names:
    0: class1
    1: class2
    2: class3
...
```
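
For the single-class dataset used here, such a file could be generated with a few lines of Python. This is only a sketch: `build_data_yaml` is a hypothetical helper, and the relative `train/images` and `valid/images` paths are assumed from the directory tree shown earlier.

```python
def build_data_yaml(root, class_names, include_test=False):
    """Build the text of a data.yaml file (hypothetical helper)."""
    lines = [
        f"path: {root}",
        "train: train/images",  # relative to `path`
        "val: valid/images",
    ]
    if include_test:
        lines.append("test: test/images")
    lines += ["", "names:"]
    # Class indices follow the order of `class_names`
    lines += [f"    {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


print(build_data_yaml("../YOLO_Datasets/TextLevel_train", ["Text"]))
```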

The labels for object detection (bounding boxes) have the following format for each annotation, with all coordinates
normalized to the range 0 to 1 relative to the image size:
```
<class-index> <x_center> <y_center> <width> <height>
```
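
As an illustration of the conversion, a labelme rectangle (two corner points in pixels) can be turned into one such label line as sketched below. `labelme_rect_to_yolo` is a hypothetical helper written for this example and is not the `labelme_to_yolo` function from `utils`.

```python
def labelme_rect_to_yolo(points, img_width, img_height, class_index):
    """Convert a labelme rectangle [[x1, y1], [x2, y2]] to a YOLO label line."""
    (x1, y1), (x2, y2) = points
    # The two corners may be stored in any order
    x_min, x_max = sorted((x1, x2))
    y_min, y_max = sorted((y1, y2))
    # Normalize centre and size by the image dimensions
    x_center = (x_min + x_max) / 2 / img_width
    y_center = (y_min + y_max) / 2 / img_height
    width = (x_max - x_min) / img_width
    height = (y_max - y_min) / img_height
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box covering the central half of a 640x360 image, class 0 ("Text")
print(labelme_rect_to_yolo([[160, 90], [480, 270]], 640, 360, 0))
```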

In [8]:
labelme_to_yolo(AUGMENTED_PATH_TRAIN_EX, YOLO_PATH_TRAIN_EX, 0.7,["Text"], "bbox")
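
The value `0.7` is assumed here to be the fraction of samples assigned to the train split, with the remainder going to validation; the notebook does not document this parameter. Under that assumption, a reproducible split could look like the following sketch, where `split_train_val` is a hypothetical helper.

```python
import random


def split_train_val(stems, train_fraction, seed=0):
    """Shuffle file stems deterministically and split them into train and validation lists."""
    shuffled = sorted(stems)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, val = split_train_val([f"img{i}" for i in range(10)], 0.7)
print(len(train), len(val))
```

Splitting by file stem keeps each image together with its label file, since both share the same stem.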

### TRAIN

We will fine-tune the YOLOv8s model using the hyperparameter tuning provided by Ultralytics to get the best results we
can. Since the objects in this dataset have non-standard features, it is not clear which hyperparameter values we should
use.

We will also configure the training to disable the hue, saturation, value, translation and horizontal-flip augmentations
on the train set.

In [1]:
from ultralytics import YOLO
# Initialize the YOLO model from the pretrained YOLOv8s checkpoint
model = YOLO("yolov8s.pt")

In [10]:
import torch
# check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using {device} device")

Using cuda device


In [2]:
# Tune hyperparameters: 20 tuning iterations, each training for 30 epochs
model.tune(data="../YOLO_Datasets/TextLevel_train/data.yaml", workers=1, epochs=30, iterations=20,
           optimizer="AdamW", plots=False, save=True,
           hsv_h=0.0, hsv_s=0.0, hsv_v=0.0, translate=0.0, fliplr=0.0)  # color, translation and flip augmentation set to 0

[34m[1mTuner: [0mInitialized Tuner instance with 'tune_dir=runs\detect\tune6'
[34m[1mTuner: [0m Learn about tuning at https://docs.ultralytics.com/guides/hyperparameter-tuning
[34m[1mTuner: [0mStarting iteration 1/20 with hyperparameters: {'lr0': 0.01, 'lrf': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'box': 7.5, 'cls': 0.5, 'dfl': 1.5, 'hsv_h': 0.0, 'hsv_s': 0.0, 'hsv_v': 0.0, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.0, 'mosaic': 1.0, 'mixup': 0.0, 'copy_paste': 0.0}
Saved runs\detect\tune6\tune_scatter_plots.png
Saved runs\detect\tune6\tune_fitness.png

[34m[1mTuner: [0m1/20 iterations complete  (7427.89s)
[34m[1mTuner: [0mResults saved to [1mruns\detect\tune6[0m
[34m[1mTuner: [0mBest fitness=0.55011 observed at iteration 1
[34m[1mTuner: [0mBest fitness metrics are {'metrics/precision(B)': 0.83999, 'metrics/recall(B)': 0.77318, 'metrics/mAP50(

In [6]:
model = YOLO("runs/detect/tune6/weights/best.pt")

In [None]:
# Validate the model
metrics = model.val(workers=1, device="cpu")  # no arguments needed, dataset and settings remembered

In [11]:
# Mask metrics; `metrics.seg` is only available when the validated model is a segmentation model
metrics.seg.map    # mAP50-95
metrics.seg.map50  # mAP50
metrics.seg.map75  # mAP75
metrics.seg.maps   # array with the mAP50-95 of each class

array([    0.19274,     0.32869,     0.40419,     0.39927,       0.561,     0.28032,     0.10029,     0.16814,     0.26469,     0.19651,     0.35886,     0.31838,     0.15703])

In [12]:
metrics.box.map    # mAP50-95
metrics.box.map50  # mAP50
metrics.box.map75  # mAP75
metrics.box.maps   # array with the mAP50-95 of each class

array([    0.44217,     0.57988,      0.5628,     0.66923,     0.72803,     0.39053,     0.33478,     0.53026,     0.41717,     0.49925,     0.55777,     0.43547,      0.2592])