<a href="https://colab.research.google.com/github/adryduty/computer-vision-cat-project/blob/main/Training_comp_vision_cat__26_June.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font color = red> Training

## <font color = blue> Load all the needed modules

In [None]:
import torch
import os
import shutil
import random

* **torch:** to load the trained model
* **os:** to create folders
* **shutil:** to move files from one folder to another
* **random:** to split the dataset into training, validation and testing sets

## <font color = blue> Load YOLOv5 with all its dependencies. This code also reports whether you are using a GPU and, if so, which one.

In [None]:
# Clone YOLOv5 and install its dependencies
!git clone https://github.com/ultralytics/yolov5  # clone repo
%cd yolov5
%pip install -qr requirements.txt # install dependencies

print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

Cloning into 'yolov5'...
remote: Enumerating objects: 12338, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 12338 (delta 0), reused 1 (delta 0), pack-reused 12335
Receiving objects: 100% (12338/12338), 12.13 MiB | 20.23 MiB/s, done.
Resolving deltas: 100% (8513/8513), done.
/content/yolov5
     |████████████████████████████████| 596 kB 7.9 MB/s 
Setup complete. Using torch 1.11.0+cu113 (Tesla T4)


## <font color = blue> **(1)** Load images and labels. **(2)** Create new folders (training, validation and testing set) and store the images and the labels properly. **(3)** Create the yaml file.

### Load images and labels (note that they are in the same zip file, which you have to upload yourself)

In [None]:
# Move up one level, back into the /content directory
%cd ..

/content


In [None]:
zip_file = "archive.zip"

if os.path.isfile(zip_file):
  shutil.unpack_archive(zip_file, "data")
else:
  print(zip_file + " not found")

### Create new folders

In [None]:
path_tree = ["/content/data/images/training",
             "/content/data/images/validation",
             "/content/data/images/testing",
             "/content/data/labels/training",
             "/content/data/labels/validation",
             "/content/data/labels/testing"]

for path in path_tree:
  os.makedirs(path, exist_ok=True)  # do not fail if the cell is run twice

In [None]:
def images_labels_dict_creator(source):
  '''
  - Input: path of the folder that stores the images (jpg format) and the labels (txt format)

  - Output: a tuple with the list of all image file names and a dictionary
  whose keys are the image file names and whose values are the label file names.
  Images and labels are paired by sorted order, so every image is assumed to
  have exactly one label file with a matching name.
  '''
  images_list = [item for item in os.listdir(source) if item.endswith(".jpg")] # images list
  labels_list = [item for item in os.listdir(source) if item.endswith(".txt")] # labels list
  images_list.sort()
  labels_list.sort()
  images_labels_dict = dict(zip(images_list, labels_list))
  
  return images_list, images_labels_dict
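The function above pairs images and labels by sorted position, which silently misaligns the pairs if a single image has no label file. A minimal sketch of a stricter pairing by file stem follows; `pair_images_with_labels` is a hypothetical helper, not part of the notebook, and it assumes each label shares its image's name with a `.txt` extension.

```python
import os
import tempfile

def pair_images_with_labels(source):
    """Map each .jpg in `source` to the .txt file with the same stem.

    Images without a matching label file are skipped.
    """
    files = set(os.listdir(source))
    pairs = {}
    for name in sorted(files):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".jpg" and stem + ".txt" in files:
            pairs[name] = stem + ".txt"
    return pairs

# Small demonstration in a temporary folder: img2.jpg has no label file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["img1.jpg", "img1.txt", "img2.jpg", "img3.jpg", "img3.txt"]:
        open(os.path.join(tmp, name), "w").close()
    demo_pairs = pair_images_with_labels(tmp)

print(demo_pairs)
```

With sort-based pairing, `img2.jpg` in this demo would have been matched with `img3.txt`; here it is simply left out.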

In [None]:
def file_path_changer(source, dest): 
  '''
  This function moves the files from the path 'source' to the path 'dest'
  '''
  shutil.move(source, dest)


### The following chunk uses the functions we have created above.
**Steps:** <br>
* For reproducibility, we set a random seed.
* Assign the path of the archive to source and pass it to the images_labels_dict_creator function. The output is assigned to images_list and images_labels_dict.
* Set the size of the training set. In this case we choose $70$% of the images.
* Assign to training_set a random sample ($70$%) of the images.
* Move each image of the training set to /content/data/images/training/ .
* images_labels_dict is a dictionary with the image names as keys and the label names as values. For each image already stored in /content/data/images/training/ , look up its label and move it to /content/data/labels/training/ .
* The training set images and labels are now in place. Next, we store the validation set images and labels.
* As before, we call images_labels_dict_creator on the source path to see which images are still stored there (only $30$% of the original images remain, because we moved the other $70$%).
* We set the size of the validation set to $66$% of the remaining $30$% of the images: $66$% * $30$% ≈ $20$% of the whole dataset.
* We assign to validation_set a random sample ($66$%) of the images in images_list.
* We iterate over each image in validation_set and move the image to /content/data/images/validation/ and the corresponding label to /content/data/labels/validation/ .
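Because both sizes are truncated with `int`, the resulting split is only approximately 70/20/10. The sketch below works through the arithmetic of the steps above; `split_sizes` and the dataset size of 100 are illustrative assumptions, not values taken from the notebook.

```python
def split_sizes(n_images, train_frac=0.7, val_frac_of_rest=0.66):
    """Return (training, validation, testing) sizes following the steps above."""
    n_train = int(n_images * train_frac)       # 70% of all images
    remaining = n_images - n_train             # what is left in the source folder
    n_val = int(remaining * val_frac_of_rest)  # 66% of the remaining images
    n_test = remaining - n_val                 # everything else becomes the testing set
    return n_train, n_val, n_test

sizes = split_sizes(100)
print(sizes)  # (70, 19, 11)
```

For a hypothetical dataset of 100 images, the validation set gets 19 images rather than 20, and the leftover 11 images end up in the testing set.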

In [None]:
random.seed(42)

source = "/content/data/archive/"

images_list, images_labels_dict = images_labels_dict_creator(source)

training_set_dim = int(len(images_list)*0.7)

training_set = random.sample(images_list, training_set_dim)

for image_name in training_set: # random.sample draws without replacement, so no image is picked twice
  file_path_changer(source + image_name, "/content/data/images/training/" + image_name)
  label_name = images_labels_dict[image_name]
  file_path_changer(source + label_name, "/content/data/labels/training/" + label_name)


images_list, images_labels_dict = images_labels_dict_creator(source)
validation_set_dim = int(len(images_list)*0.66)

validation_set = random.sample(images_list, validation_set_dim)

for image_name in validation_set:
  file_path_changer(source + image_name, "/content/data/images/validation/" + image_name)
  label_name = images_labels_dict[image_name]
  file_path_changer(source + label_name, "/content/data/labels/validation/" + label_name)

### The following chunk creates a yaml file and stores it in /content/yolov5/dataset.yaml

In [None]:
# The with statement closes the file even if a write fails
with open("/content/yolov5/dataset.yaml", "w") as f:
  f.write("train: ../data/images/training/\n")
  f.write("val: ../data/images/validation/\n")
  f.write("nc: 1\n")
  f.write("names: ['GHIRI']\n")
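As a sketch, the same file content can be produced by a small function so that the class count always matches the list of names. `build_dataset_yaml` is a hypothetical helper, not part of the notebook or of YOLOv5; the optional `test:` entry is included only when a testing folder is passed.

```python
def build_dataset_yaml(train_dir, val_dir, class_names, test_dir=None):
    """Build the text of a YOLOv5 dataset file from the split folders and class names."""
    lines = [f"train: {train_dir}", f"val: {val_dir}"]
    if test_dir is not None:
        lines.append(f"test: {test_dir}")    # optional entry for the testing images
    lines.append(f"nc: {len(class_names)}")  # number of classes derived from the names
    lines.append(f"names: {class_names}")
    return "\n".join(lines) + "\n"

yaml_text = build_dataset_yaml(
    "../data/images/training/",
    "../data/images/validation/",
    ["GHIRI"],
)
print(yaml_text)
```

The returned string could then be written to /content/yolov5/dataset.yaml in place of the individual `f.write` calls.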

## <font color = blue> Train the model

### The following chunk changes the directory to the yolov5 one and tunes the hyperparameters with a genetic algorithm (more information in the appendix). This is enabled by the evolve parameter.
### Moreover, we use resized images ($416 \times 416$), a mini-batch size of 128, 150 epochs, the data indicated in the yaml file and yolov5s.pt as starting weights (the yolov5 documentation suggests starting from these weights rather than from random ones). The cache option just speeds up the computation.

In [None]:
%cd yolov5
!python train.py --img 416 --batch-size 128 --epochs 150 --data /content/yolov5/dataset.yaml --weights yolov5s.pt --cache --evolve 100

Streaming output truncated to the last 5000 lines.
     Epoch   gpu_mem       box       obj       cls    labels  img_size
   143/149       12G   0.01912  0.008287         0        56       416: 100% 2/2 [00:01<00:00,  1.54it/s]

     Epoch   gpu_mem       box       obj       cls    labels  img_size
   144/149       12G   0.02007  0.008227         0        62       416: 100% 2/2 [00:01<00:00,  1.81it/s]

     Epoch   gpu_mem       box       obj       cls    labels  img_size
   145/149       12G   0.01941  0.008141         0        62       416: 100% 2/2 [00:01<00:00,  1.76it/s]

     Epoch   gpu_mem       box       obj       cls    labels  img_size
   146/149       12G   0.01825  0.008427         0        63       416: 100% 2/2 [00:01<00:00,  1.82it/s]

     Epoch   gpu_mem       box       obj       cls    labels  img_size
   147/149       12G   0.01884   0.00813         0        58       416: 100% 2/2 [00:01<00:00,  1.52it/s]

     Epoch   gpu_mem       box       obj      

### Here we train the model with the best hyperparameters found by the previous chunk.

In [None]:
%cd /content/yolov5
!python train.py --img 416 --batch 50 --epochs 150 --data /content/yolov5/dataset.yaml --weights yolov5s.pt --cache --hyp /content/hyp_evolve.yaml

### With the following chunk you can see the plots of the results

In [None]:
# Start TensorBoard
# Launch it after training has started
# Logs are saved in the "runs" folder
%load_ext tensorboard
%tensorboard --logdir runs

## <font color = blue> Test the model

### Finally, the following two cells show the results on images that the model has never seen (testing set).

In [None]:
source = "/content/data/archive/"

images_list, images_labels_dict = images_labels_dict_creator(source)

testing_set = images_list

for image_name in testing_set: # every image still left in source belongs to the testing set
  file_path_changer(source + image_name, "/content/data/images/testing/" + image_name)
  label_name = images_labels_dict[image_name]
  file_path_changer(source + label_name, "/content/data/labels/testing/" + label_name)
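Once all three sets have been moved, a quick count can confirm that every split holds as many labels as images. This is a minimal sketch: `count_split_files` is a hypothetical helper, and the demonstration runs on a throwaway folder tree rather than on /content/data.

```python
import os
import tempfile

def count_split_files(root, splits=("training", "validation", "testing")):
    """Return {split: (number of .jpg images, number of .txt labels)} for a data folder."""
    counts = {}
    for split in splits:
        images = os.listdir(os.path.join(root, "images", split))
        labels = os.listdir(os.path.join(root, "labels", split))
        counts[split] = (
            sum(name.endswith(".jpg") for name in images),
            sum(name.endswith(".txt") for name in labels),
        )
    return counts

# Demonstration on a temporary folder tree with 2/1/1 image-label pairs.
with tempfile.TemporaryDirectory() as root:
    layout = {"training": 2, "validation": 1, "testing": 1}
    for split, n_files in layout.items():
        for kind, ext in (("images", ".jpg"), ("labels", ".txt")):
            folder = os.path.join(root, kind, split)
            os.makedirs(folder)
            for i in range(n_files):
                open(os.path.join(folder, f"img{i}{ext}"), "w").close()
    demo_counts = count_split_files(root)

print(demo_counts)
```

In the notebook, the equivalent call would be `count_split_files("/content/data")`; a mismatch between the two numbers of a split points to an image without a label or the reverse.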

In [None]:
# Model

# Remember first to load the best.pt file in colab
model = torch.hub.load("ultralytics/yolov5", 'custom', path="/content/best.pt")
#model = torch.hub.load("ultralytics/yolov5", 'custom', path="/content/yolov5/runs/train/exp/weights/best.pt")

model.conf = 0.6

# Images: every file in the testing folder
img_lists = os.listdir('/content/data/images/testing/')
path_img_lists = ['/content/data/images/testing/' + name for name in img_lists]

# Inference
results = model(path_img_lists)

# Results
results.save()  # or .show(), .crop(), .pandas(), etc.

#### The following chunk allows you to use a video for testing the network

In [None]:
!python detect.py --source /content/bracciolo.mov --weights /content/best.pt

# <font color = blue> APPENDIX

Genetic algorithms (GAs) are stochastic search algorithms inspired by the basic principles of biological evolution and natural selection. GAs simulate the evolution of living organisms, where the fittest individuals dominate over the weaker ones, by mimicking the biological mechanisms of evolution, such as selection, crossover and mutation.

We used a GA to tune the hyperparameters: it selects the best ones that can be obtained by combining “individuals” (vectors) that carry certain “genes” (parameters).
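To make the three mechanisms concrete, here is a toy sketch of a generic GA. It is not YOLOv5's implementation: the fitness function, population size and mutation scale are all illustrative assumptions, and the fitness simply stands in for a validation metric that would normally come from a training run.

```python
import random

def fitness(genes):
    """Toy objective, highest when every gene equals 0.5 (stands in for a validation metric)."""
    return -sum((g - 0.5) ** 2 for g in genes)

def evolve(pop_size=20, n_genes=2, generations=30, mutation_scale=0.1, seed=42):
    rng = random.Random(seed)
    # Each individual is a vector of genes (hyperparameter values) in [0, 1).
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: each gene is inherited from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: perturb every gene with a small Gaussian noise.
            child = [g + rng.gauss(0, mutation_scale) for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, initial_best, fitness(best)

best_genes, initial_best, final_best = evolve()
print(best_genes, initial_best, final_best)
```

Because the parents survive into the next generation, the best fitness can never decrease from one generation to the next.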