## Libraries
#### Mandatory
**`torch`**: PyTorch provides the tools needed to build deep learning models. It is used to train the model and make predictions.<br />
**`yolov5`**: The YOLOv5 repository contains the tools required to build the model. Its `utils` package is imported here so that the model loads correctly.<br />
**`IPython`**: Provides the tools to display rich output in the notebook. It is used to display the images.<br />
**`pathlib`**: Standard-library module for object-oriented path manipulation. It is used to get the paths of the images.<br />
**`numpy`**: Used to work with arrays, here to convert the images to NumPy arrays.<br />
**`sklearn`**: scikit-learn provides general machine learning utilities, such as metrics like the mean squared error. The import cell below only pulls in `train_test_split`, which splits a data set into training and test subsets.<br />

#### Optional
**`yaml`**: yaml library is used to read and write the YAML files that contain the configuration of the model. In this notebook it writes `data.yaml`.<br />
**`matplotlib`**: matplotlib is a Python library for creating static, animated, and interactive visualizations. It is also used to display the images.<br />
**`glob`**: glob library is used to retrieve files/pathnames matching a specified pattern.<br />
**`io`**: io library is used to handle various types of I/O (input/output) operations.<br />
**`os`**: os library is used to interact with the operating system. It is used to create directories.<br />
**`cv2`**: cv2 (OpenCV) library is used to read and write images and to perform image processing and computer vision tasks.<br />
**`json`**: json library is used to work with JSON data. It is used to read the JSON file that contains the labels.<br />
**`shutil`**: shutil library is used to perform high-level operations on files and collections of files. It is used to copy the images to the output directory.<br />

In [8]:
# Mandatory libraries.
import torch
from yolov5 import utils
from IPython import display
from IPython.display import clear_output
from pathlib import Path
import numpy as np
from sklearn.model_selection import train_test_split

# Optional libraries.
import yaml
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import glob
import io
import os
import cv2
import json
import shutil

%matplotlib inline

## Written functions
#### (WIP) Model configuration
**`model.conf`**: Sets the confidence threshold of the model. Predictions whose confidence falls below it are discarded.<br />
**`model.iou`**: Sets the intersection-over-union (IoU) threshold used by non-maximum suppression (NMS). When two predicted boxes overlap by more than this value, only the higher-confidence one is kept, which removes duplicate detections of the same object.<br />

In [6]:
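# torch.load() expects a checkpoint file, so pointing it at the repository directory
# raised "IsADirectoryError: [Errno 21] Is a directory: '../PINT/yolov5/'".
# torch.hub.load() with source='local' is the call that takes a repository directory.
model = torch.hub.load('../PINT/yolov5/', 'custom',
                       path='../PINT/yolov5/runs/train/exp/weights/best.pt', source='local')  # custom model

model.conf = 0.6
model.iou = 0.45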
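To make the two thresholds more concrete, here is a minimal sketch of confidence filtering followed by a greedy non-maximum suppression. It is an illustration only, not YOLOv5's actual implementation; the helper names `iou` and `filter_detections` and the example boxes are made up for this purpose.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_thres=0.6, iou_thres=0.45):
    """Keep confident boxes, then drop boxes that overlap a better one.

    detections: list of (x1, y1, x2, y2, confidence) tuples.
    """
    # Step 1: the confidence threshold removes weak predictions.
    confident = [d for d in detections if d[4] >= conf_thres]
    # Step 2: greedy NMS, visiting the boxes from most to least confident.
    confident.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for det in confident:
        if all(iou(det[:4], k[:4]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections: two overlapping boxes on one diver and one weak box.
example = [
    (10, 10, 110, 110, 0.90),
    (20, 20, 120, 120, 0.70),
    (300, 300, 350, 350, 0.30),
]
result = filter_detections(example)
print(result)
```

With these made-up boxes, the weak box fails the confidence threshold and the second box is suppressed because it overlaps the first by more than `iou_thres`.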

#### Constants
**`PROJECT_NAME`**: Name of the project.<br />
**`BASE_MODEL`**: Name of the base model.<br />
**`TRAIN_BATCH`**: Batch size used for training. Training batch is the number of training samples that will be propagated through the network in one forward/backward pass.<br />
**`TRAIN_EPOCHS`**: Number of epochs used for training. One epoch is one full pass of the model over the entire training data set.<br />
**`VAL_BATCH`**: Batch size used for validation. Validation batch is the number of validation samples that will be propagated through the network in one forward/backward pass.<br />

In [9]:
PROJECT_NAME = "divers_ml"
BASE_MODEL = "yolov5m.pt"
TRAIN_BATCH = 5
TRAIN_EPOCHS = 10
VAL_BATCH = 5
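As a quick piece of arithmetic, the batch size and the number of epochs together determine how many forward/backward passes a training run performs. The sketch below assumes a hypothetical data set of 42 training images (the real count is not stated in this notebook), and `training_steps` is a helper name introduced only for this example.

```python
import math

TRAIN_BATCH = 5
TRAIN_EPOCHS = 10


def training_steps(num_images, batch_size, epochs):
    """Return (passes per epoch, passes over the whole run)."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# num_images=42 is a hypothetical value for illustration.
steps_per_epoch, total_steps = training_steps(42, TRAIN_BATCH, TRAIN_EPOCHS)
print(steps_per_epoch, total_steps)
```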

#### Renaming
The cell below is used only to rename the data, so that any new input ends up in the format the model can process.<br />
The folder hierarchy could _in theory_ be changed, but it is best left untouched.

In [3]:
# In case there is new data introduced from same data set, re-name:

# def move_files_to_dir(dirname):
#     # /{dirname}/images
#     for dir in os.listdir(os.path.join(dirname, 'images')):
#         for file in os.listdir(os.path.join(dirname, 'images', dir)):
#             image_file_name = dir + '_' + file
#             shutil.move(os.path.join(dirname, 'images', dir, file), os.path.join(dirname, 'images', image_file_name))
#         os.rmdir(os.path.join(dirname, 'images', dir))


# # Move files outside of train and val and test
# move_files_to_dir('data/test')
# move_files_to_dir('data/train')
# move_files_to_dir('data/valid')
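For reference, the same flattening idea can be sketched with `pathlib` and tried out safely on a temporary directory. This is an assumption-laden rewrite, not the cell that was actually run: `flatten_image_dirs` and the file names are hypothetical, and unlike the commented-out version it skips plain files that already sit directly in `images`.

```python
import shutil
import tempfile
from pathlib import Path


def flatten_image_dirs(dataset_dir):
    """Move <dataset_dir>/images/<sub>/<file> to <dataset_dir>/images/<sub>_<file>."""
    images_dir = Path(dataset_dir) / "images"
    for sub_dir in sorted(p for p in images_dir.iterdir() if p.is_dir()):
        for image_file in sorted(sub_dir.iterdir()):
            target = images_dir / f"{sub_dir.name}_{image_file.name}"
            shutil.move(str(image_file), str(target))
        sub_dir.rmdir()  # succeeds only because the sub-directory is now empty


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "train" / "images" / "dive01"
    sub.mkdir(parents=True)
    (sub / "frame1.png").touch()
    (sub / "frame2.png").touch()
    flatten_image_dirs(Path(tmp) / "train")
    flattened = sorted(p.name for p in (Path(tmp) / "train" / "images").iterdir())

print(flattened)
```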

#### Data paths
**`train_path`**: Leads to the training data (images).<br />
**`test_path`**: Leads to the testing data.<br />
**`valid_path`**: Leads to the validation data.<br />

In [10]:
train_path = "../data/train/images"
test_path = "../data/test/images"
valid_path = "../data/valid/images"

#### YAML-writing
The cell below opens `data.yaml` in write mode with `open()`, assigns the file handle to the variable `file`, and uses `yaml.dump()` to serialise a Python dictionary to YAML and write it to that file.<br/>
The dictionary contains several key-value pairs:<br/>
- The _"train"_ key contains the path to the training data.<br/>
- The _"test"_ key contains the path to the testing data.<br/>
- The _"val"_ key contains the path to the validation data.<br/>
- The _"nc"_ key contains the number of classes.<br/>
- The _"names"_ key contains the names of the objects that should be found.<br/>

In [11]:
with open("data.yaml", "w") as file:
    yaml.dump({
        "train": train_path,
        "test": test_path,
        "val": valid_path,
        "nc": 1,
        "names": ["diver"]
    }, stream=file, default_flow_style=None)
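A small sanity check on the dictionary before it is written can catch the most common mistake, which is an `nc` value that does not match the number of entries in `names`. The sketch below is only an illustration; `check_dataset_config` is a hypothetical helper, and the set of keys it treats as required is an assumption, not something taken from YOLOv5.

```python
def check_dataset_config(config):
    """Return a list of human-readable problems found in the data configuration."""
    problems = []
    # Assumed set of required keys for this notebook's data.yaml.
    for key in ("train", "val", "nc", "names"):
        if key not in config:
            problems.append(f"missing key: {key}")
    if "nc" in config and "names" in config and config["nc"] != len(config["names"]):
        problems.append(f"nc={config['nc']} but {len(config['names'])} class name(s) given")
    return problems


config = {
    "train": "../data/train/images",
    "test": "../data/test/images",
    "val": "../data/valid/images",
    "nc": 1,
    "names": ["diver"],
}
problems = check_dataset_config(config)
print(problems)
```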

## Machine learning
#### Delete old data
The cell below clears the results of previous training runs before the model is trained again.<br/>
It sits in a separate cell so that it can be skipped if you wish to keep the previous results.<br />

In [None]:
# Delete old results -- Training.
wildcard = f"{PROJECT_NAME}/feature_extraction*"
! rm -r $wildcard
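If the shell command is not available (for example on Windows), or if the error message that `rm` prints when nothing matches is unwanted, the same clean-up can be sketched in Python with `glob` and `shutil`. `remove_runs` is a hypothetical helper, and the demonstration runs on a temporary directory rather than on the real project folder.

```python
import glob
import shutil
import tempfile
from pathlib import Path


def remove_runs(project_dir, run_prefix):
    """Delete every directory matching <project_dir>/<run_prefix>* and return their paths."""
    removed = []
    for path in sorted(glob.glob(str(Path(project_dir) / f"{run_prefix}*"))):
        if Path(path).is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Demonstration on a throwaway project directory.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "divers_ml"
    (project / "feature_extraction").mkdir(parents=True)
    (project / "feature_extraction2").mkdir()
    (project / "detect_test").mkdir()
    removed_names = [Path(p).name for p in remove_runs(project, "feature_extraction")]
    remaining = sorted(p.name for p in project.iterdir())
    second_call = remove_runs(project, "feature_extraction")  # nothing left to delete

print(removed_names, remaining, second_call)
```

The same helper would cover the validation and detection clean-up cells by passing a different prefix.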

#### Train the model
The line below runs the training of the model. Each part of it is explained here.<br/>
- `!` at the beginning tells Jupyter Notebook (or a similar environment) to run the rest of the line as a shell command.<br/>
- `python yolov5/train.py` invokes the Python interpreter to execute the `train.py` script located in the `yolov5` directory.<br/>
- `--batch $TRAIN_BATCH` is a command-line argument passed to the `train.py` script. It specifies the batch size for training. `$TRAIN_BATCH` is replaced with the value of the Python variable `TRAIN_BATCH` (defined earlier) before the command runs.<br/>
- `--epochs $TRAIN_EPOCHS` specifies the number of epochs for training. As with the previous argument, `$TRAIN_EPOCHS` is replaced with the value defined earlier.<br/>
- `--data "data.yaml"` specifies the path to the YAML file that contains the data configuration for training. The file `data.yaml` is expected to be present in the current directory.<br/>
- `--weights $BASE_MODEL` specifies the path to the base model, that is, the pre-trained weights training starts from. `$BASE_MODEL` is replaced with the value defined earlier, here `yolov5m.pt`.<br/>
- `--project $PROJECT_NAME` specifies the name of the project. The training outputs and checkpoints are stored in a directory with this name.<br/>
- `--name 'feature_extraction'` specifies the name of the current training run, here `feature_extraction`.<br/>
- `--cache` enables image caching during training (the log below reports `cache=ram`). Caching can improve training performance by reducing data loading time.<br/>
- `--freeze 12` specifies the number of initial layers to freeze during training. The weights of the first 12 layers are not updated, so only the remaining layers are fine-tuned.<br/>
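To give an idea of what freezing amounts to, the sketch below selects parameters by name. It follows, as far as I can tell, the way `train.py` turns the `--freeze` value into name prefixes, but treat it as an illustration rather than the script's exact code. The parameter names are hypothetical examples in the `model.<layer index>.<...>` style, and in a real model the matching parameters would get `requires_grad = False`.

```python
def frozen_prefixes(freeze):
    """Turn the --freeze value(s) into parameter-name prefixes to freeze."""
    # A single value N means "layers 0 .. N-1"; several values name the layers directly.
    layers = freeze if len(freeze) > 1 else range(freeze[0])
    return [f"model.{i}." for i in layers]


def is_frozen(param_name, prefixes):
    """True if the parameter belongs to one of the frozen layers."""
    return any(prefix in param_name for prefix in prefixes)


prefixes = frozen_prefixes([12])  # the training log shows freeze=[12]

# Hypothetical parameter names for illustration.
names = [
    "model.0.conv.weight",
    "model.1.conv.weight",
    "model.11.conv.weight",
    "model.12.conv.weight",
    "model.24.m.0.weight",
]
status = {name: is_frozen(name, prefixes) for name in names}
for name, frozen in status.items():
    print(f"{name}: {'frozen' if frozen else 'trainable'}")
```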

In [8]:
! python yolov5/train.py --batch $TRAIN_BATCH --epochs $TRAIN_EPOCHS --data "data.yaml" --weights $BASE_MODEL --project $PROJECT_NAME --name 'feature_extraction' --cache --freeze 12

[34m[1mtrain: [0mweights=yolov5m.pt, cfg=, data=data.yaml, hyp=yolov5/data/hyps/hyp.scratch-low.yaml, epochs=10, batch_size=5, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=divers_ml, name=feature_extraction, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[12], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
[34m[1mgithub: [0mup to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-174-g5eb7f7d Python-3.10.11 torch-2.0.1+cu117 CPU

[34m[1mhyperparameters: [0mlr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hs

#### Delete old validation results
The cell below clears the results of previous validation runs before the model is validated again.<br/>

In [13]:
# Delete old results - Validation data (Sensible? Feasibly?)
wildcard = f"{PROJECT_NAME}/validation_on_test_data*"
! rm -r $wildcard

#### Validate the model
The command below runs the validation of the model. Each part of it is explained here.<br/>
- `!` at the beginning tells Jupyter Notebook (or a similar environment) to run the rest of the line as a shell command.<br/>
- `python yolov5/val.py` invokes the Python interpreter to execute the `val.py` script located in the `yolov5` directory.<br/>
- `--weights $WEIGHTS_BEST` specifies the path to the weights of the model to be used for validation. The path is relative to the current directory.<br/>
- `--batch $VAL_BATCH` is a command-line argument passed to the `val.py` script. It specifies the batch size for validation.<br/>
- `--data 'data.yaml'` specifies the path to the YAML file that contains the data configuration for validation. The file `data.yaml` is expected to be present in the current directory.<br/>
- `--task test` selects which data split from `data.yaml` is evaluated, here the `test` split.<br/>
- `--project $PROJECT_NAME` specifies the name of the project. The validation outputs are stored in a directory with this name.<br/>
- `--name 'validation_on_test_data'` specifies the name of the current validation run, here `validation_on_test_data`.<br/>
- `--augment` enables augmented inference (test-time augmentation): the model also predicts on transformed copies of each image and merges the results. This can improve accuracy at the cost of slower inference. It does not add validation samples.<br/>


In [12]:
WEIGHTS_BEST = f"{PROJECT_NAME}/feature_extraction/weights/best.pt"
! python yolov5/val.py --weights $WEIGHTS_BEST --batch $VAL_BATCH --data 'data.yaml' --task test --project $PROJECT_NAME --name 'validation_on_test_data' --augment

[34m[1mval: [0mdata=data.yaml, weights=['divers_ml/feature_extraction/weights/best.pt'], batch_size=5, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=test, device=, workers=8, single_cls=False, augment=True, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=divers_ml, name=validation_on_test_data, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v7.0-174-g5eb7f7d Python-3.10.11 torch-2.0.1+cu117 CPU

Fusing layers... 
Model summary: 212 layers, 20852934 parameters, 0 gradients, 47.9 GFLOPs
[34m[1mtest: [0mScanning /home/saritak/Documents/PINT/data/test/labels... 4 images, 34 bac[0m
[34m[1mtest: [0mNew cache created: /home/saritak/Documents/PINT/data/test/labels.cache
                 Class     Images  Instances          P          R      mAP50   
                   all         38         12      0.199      0.833      0.217     0.0904
Speed: 1.2ms pre-process, 469.6ms inference, 1.1ms NMS per image at shape (5, 3, 640, 640
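As a reminder of what the `P` (precision) and `R` (recall) columns express, here is a minimal sketch that computes both from detection counts. The counts are hypothetical and were not extracted from the run above; they are only picked to land in the same ballpark. The validation script derives its reported figures in a more involved way, so its numbers should not be expected to match this simple calculation exactly.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of predictions that are correct. Recall: share of objects found."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 12 divers in total, 10 found, 40 spurious boxes.
precision, recall = precision_recall(true_positives=10, false_positives=40, false_negatives=2)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

A low precision combined with a high recall, as in this made-up example, means that most divers are found but many of the predicted boxes are false alarms.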

#### Delete old test results
The cell below clears the results of previous detection tests. It sits in a separate cell, before testing, so that it can be skipped if you wish to keep the previous results.<br/>

In [16]:
# Delete old results - Detection test.
wildcard = f"{PROJECT_NAME}/detect_test*"
! rm -r $wildcard

#### Testing
Testing is done on the provided test data (mainly images). The command below runs the detection; with the current number of images it should not take long.<br/>
- `--weights $WEIGHTS_BEST` specifies the path to the weights of the model to be used for testing. The path is relative to the current directory.<br/>
- `--conf #` sets the confidence threshold (reported as `conf_thres` in the log). Detections with a lower confidence are discarded.<br/>
- `--source 'data/test/images'` specifies the path to the test data, relative to the current directory. Note that the command passes this literal path rather than `$test_path`.<br/>
- `--project $PROJECT_NAME` specifies the name of the project. The detection outputs are stored in a directory with this name.<br/>
- `--name 'detect_test'` specifies the name of the current testing run, here `detect_test`.<br/>
- `--augment` enables augmented inference (test-time augmentation): the model also predicts on transformed copies of each image and merges the results. This can improve accuracy at the cost of slower inference. It does not add test samples.<br/>
- `--line=3` sets the line thickness of the drawn bounding boxes (reported as `line_thickness` in the log).<br/>
- `--iou #` sets the intersection-over-union threshold used by non-maximum suppression (reported as `iou_thres` in the log). Overlapping boxes above this threshold are treated as duplicates, and only the most confident one is kept.<br/>
- `--save-txt` indicates that the predicted bounding boxes should be saved in a text file.<br/>
- `--save-conf` indicates that the confidence of each predicted bounding box should be saved in the text file as well.<br/>
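The same call can also be assembled in Python as an argument list, which avoids shell quoting issues and makes the values easy to change. The sketch below uses the unabbreviated flag names `--conf-thres`, `--iou-thres` and `--line-thickness`, which correspond to the `conf_thres`, `iou_thres` and `line_thickness` entries the script prints in its log. `build_detect_command` is a hypothetical helper, and the command is only built and printed here, not executed.

```python
import shlex

PROJECT_NAME = "divers_ml"
WEIGHTS_BEST = f"{PROJECT_NAME}/feature_extraction/weights/best.pt"


def build_detect_command(weights, source, project, name,
                         conf_thres, iou_thres, line_thickness):
    """Assemble the detect.py call as a list of arguments."""
    return [
        "python", "yolov5/detect.py",
        "--weights", weights,
        "--conf-thres", str(conf_thres),
        "--iou-thres", str(iou_thres),
        "--source", source,
        "--project", project,
        "--name", name,
        "--line-thickness", str(line_thickness),
        "--augment", "--save-txt", "--save-conf",
    ]


command = build_detect_command(
    WEIGHTS_BEST, "data/test/images", PROJECT_NAME, "detect_test",
    conf_thres=0.6, iou_thres=0.2, line_thickness=3,
)
print(shlex.join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```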

In [17]:
! python yolov5/detect.py --weights $WEIGHTS_BEST --conf 0.6 --source 'data/test/images' --project $PROJECT_NAME --name 'detect_test' --augment --line=3 --iou 0.2 --save-txt --save-conf

[34m[1mdetect: [0mweights=['divers_ml/feature_extraction/weights/best.pt'], source=data/test/images, data=yolov5/data/coco128.yaml, imgsz=[640, 640], conf_thres=0.6, iou_thres=0.2, max_det=1000, device=, view_img=False, save_txt=True, save_conf=True, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=True, visualize=False, update=False, project=divers_ml, name=detect_test, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 v7.0-174-g5eb7f7d Python-3.10.11 torch-2.0.1+cu117 CPU

Fusing layers... 
Model summary: 212 layers, 20852934 parameters, 0 gradients, 47.9 GFLOPs
image 1/38 /home/saritak/Documents/PINT/data/test/images/Pasted image 1.png: 480x640 2 divers, 382.0ms
image 2/38 /home/saritak/Documents/PINT/data/test/images/Pasted image 10.png: 448x640 2 divers, 467.9ms
image 3/38 /home/saritak/Documents/PINT/data/test/images/Pasted image 11.png: 480x640 1 diver, 423.6ms
image 4/38 /home/saritak/Do

In [None]:
# TODO: Algorithm; Take only outside box or highest confidence (I don't know how).

# TODO: Increase data set, improve training. (Optional - Not to be done now)

# TODO: Return boolean if diver is detected.

# TODO: Return image with bounding box. <- We can do this with the coordinates and confidence. (Done?)
    # TODO: Return coordinates of diver. <- Done, returns a txt file with coordinates of the diver within the image processed (like, where he is in that image - pixels).
    # TODO: Return confidence of diver. <- Done, confidence is the last floating point value on the .txt file.
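One possible way to approach the first and third TODO is to read the text files written by `--save-txt --save-conf`. The sketch below assumes each line has the form `class x_center y_center width height confidence`, with the four box values normalised to the image size. That format, the helper names and the sample text are all assumptions for illustration, not the notebook's implementation.

```python
def parse_detections(label_text):
    """Parse the lines of one label file into a list of dictionaries."""
    detections = []
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        cls, xc, yc, w, h, conf = parts
        detections.append({
            "class_id": int(cls),
            "x_center": float(xc), "y_center": float(yc),
            "width": float(w), "height": float(h),
            "confidence": float(conf),
        })
    return detections


def best_detection(detections):
    """Return the detection with the highest confidence, or None if there is none."""
    return max(detections, key=lambda d: d["confidence"], default=None)


def to_pixel_box(det, image_width, image_height):
    """Convert a normalised centre/size box to (x1, y1, x2, y2) in pixels."""
    cx, cy = det["x_center"] * image_width, det["y_center"] * image_height
    half_w, half_h = det["width"] * image_width / 2, det["height"] * image_height / 2
    return (round(cx - half_w), round(cy - half_h), round(cx + half_w), round(cy + half_h))


# Hypothetical file content with two detections of class 0 (diver).
sample = "0 0.5 0.5 0.25 0.5 0.72\n0 0.2 0.3 0.1 0.1 0.91\n"
detections = parse_detections(sample)
diver_detected = len(detections) > 0       # boolean: was any diver found?
best = best_detection(detections)          # highest-confidence detection
box = to_pixel_box(best, image_width=640, image_height=480)
print(diver_detected, best["confidence"], box)
```

Keeping only the outermost box instead of the most confident one would need a different selection rule, for example picking the box with the largest area.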