# Setup

Clone the repo, install the dependencies, and check PyTorch and the GPU.

In [None]:
!git clone https://github.com/ultralytics/yolov5  # clone YOLOv5
%cd yolov5
%pip install -qr requirements.txt  # install

import torch
import random
import glob, os, errno
import shutil
from google.colab import files
from PIL import Image
from pathlib import Path
import cv2
import numpy as np
from numpy import savetxt, loadtxt
from sklearn import preprocessing
import utils  # YOLOv5 utils package; the working directory is the cloned repo

display = utils.notebook_init()  # checks

In [None]:
print(torch.__version__)

# 1. Downloading the data

The data used in this project comes from the <a href="https://bozcani.github.io/auairdataset">AU-AIR dataset</a>. Because the original labeling was done automatically and is fairly inaccurate, the images were relabeled manually. This project only looks at humans and whether they wear white or differently colored clothing.<br>
The relabeled data was uploaded to Google Drive, and the next code block downloads it from there. The notebook also offers the option to upload a dataset.yaml that tells the model where to find the dataset later.

In [None]:
%cd data

os.path.abspath(os.getcwd())

!wget -O train_set.zip # Insert Drive Train Dataset Link Here
!unzip train_set.zip
!wget -O validatie_set.zip # Insert Drive Validation Dataset Link Here
!unzip validatie_set.zip
!wget -O test_set.zip # Insert Drive Test Dataset Link Here
!unzip test_set.zip

os.remove("train_set.zip")
os.remove("validatie_set.zip")
os.remove("test_set.zip")

%cd ../

## Import Dataset.yaml
Import the Dataset.yaml file from the local machine. This file specifies the classes to train on and the location of the data.

In [None]:
print('Upload Dataset.yaml')
uploaded = files.upload()
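For orientation, a dataset file for this project could be generated as sketched below. The keys (`path`, `train`, `val`, `test`, `names`) follow the YOLOv5 dataset YAML layout. The helper `build_dataset_yaml` and the class names `white_clothed` and `other_clothed` are assumptions for illustration, and the folder names mirror the ones created in the splitting step of this notebook. Relative paths are assumed to resolve against the YOLOv5 directory.

```python
def build_dataset_yaml(root, class_names):
    """Return the text of a YOLOv5-style dataset file (hypothetical helper)."""
    lines = [
        f"path: {root}",        # dataset root directory
        "train: train/images",  # training images, relative to path
        "val: validate/images", # validation images, relative to path
        "test: test/images",    # test images, relative to path
        "names:",
    ]
    # Class ids must match the first column of the label files
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Assumed class names: 0 = white clothing, 1 = other clothing
yaml_text = build_dataset_yaml("data", ["white_clothed", "other_clothed"])
print(yaml_text)
```

The printed text can be saved as the dataset file and compared with the uploaded one.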

# 2. Preparing the data

In [None]:
labels_human = Path('data/labels/classified_humans')
images_human = Path('data/images/classified_humans')

labels_rest = Path('data/labels/rest')
images_rest = Path('data/images/rest')

## Mirroring data
Another way to improve the model's results is to create extra data through flipping. This augmentation method mirrors each image horizontally and adjusts its annotation to match.

In [None]:
def flip_annotation(filepath, img, name_operation="_flipped", axis=1):
  # Mirror a YOLO label file; axis=1 is the x_center column of
  # "class x_center y_center width height"
  file_data = []
  # Read the annotation rows, one object per line
  with open(filepath + '.txt', 'r') as myfile:
      for line in myfile:
          # Splitting on whitespace also drops the trailing line break
          file_data.append(line.split())

  # Mirror the center coordinate: x' = 1 - x - 1/width
  for row in file_data:
      if len(row) == 5:
          row[axis] = f"{1 - float(row[axis]) - 1 / img.shape[1]:.6f}"

  # Write the mirrored rows to a new file next to the original
  with open(filepath + name_operation + '.txt', 'w') as f:
      for row in file_data:
          f.write(" ".join(row) + "\n")
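As a worked example of the mirroring step, the sketch below flips a single YOLO label row (`class x_center y_center width height`, all normalized to 0..1). Only `x_center` changes. The helper name `flip_row` and the sample values are illustrative. It uses the plain mirror `1 - x`, whereas the function above additionally subtracts one pixel (`1 / image_width`).

```python
def flip_row(row):
    """Mirror one YOLO label row horizontally (hypothetical helper)."""
    cls, x_center, y_center, width, height = row.split()
    flipped_x = 1.0 - float(x_center)  # only the x center moves
    return f"{cls} {flipped_x:.6f} {y_center} {width} {height}"


print(flip_row("0 0.250000 0.400000 0.100000 0.300000"))
# 0 0.750000 0.400000 0.100000 0.300000
```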

In [None]:
def flip_images(filepath, img, name_operation="_flipped", axis=1):
  # Mirror the image array horizontally (flip code 1); `axis` is currently unused
  horz_img = cv2.flip(img, 1)
  # Save the mirrored image next to the original
  cv2.imwrite(filepath + name_operation + '.jpg', horz_img)

In [None]:
# Flip the human and rest images together with their annotations
for label_dir in (labels_human, labels_rest):
    for fil in label_dir.glob("[!classes]*.txt"):
        label_stem = os.path.splitext(fil)[0]
        image_stem = label_stem.replace("labels", "images")
        # Only flip annotations that have a matching image
        if os.path.isfile(image_stem + ".jpg"):
            img = cv2.imread(image_stem + ".jpg")
            flip_images(image_stem, img)
            flip_annotation(label_stem, img)
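One caveat with the glob pattern used above: `[!classes]` is a character class, so it rejects every file whose name starts with `c`, `l`, `a`, `s` or `e`, not only `classes.txt`. The sketch below demonstrates this with the standard `fnmatch` module, which is assumed to follow the same matching rules as `Path.glob` here. The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

names = ["frame_0001.txt", "classes.txt", "scene_0002.txt", "extra_0003.txt"]

# "[!classes]" matches one character that is NOT c, l, a, s or e
matched = [n for n in names if fnmatchcase(n, "[!classes]*.txt")]
print(matched)  # ['frame_0001.txt']

# A more explicit filter: keep every label file except classes.txt
explicit = [n for n in names if n.endswith(".txt") and n != "classes.txt"]
print(explicit)  # ['frame_0001.txt', 'scene_0002.txt', 'extra_0003.txt']
```

If the label files all start with a letter outside that set, the original pattern behaves as intended.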

## Grayscaling (CURRENTLY NOT BEING USED)

One way to improve the results of the YOLOv5s model is to convert the images to grayscale, which replaces the colors in each picture with shades of gray. The model may still be able to tell white clothing from other clothing based on the amount of gray, white or black in a picture.

In [None]:
# for fil in images_human.glob("*.jpg"):
#     image = cv2.imread(str(fil))  # cv2 expects a string path
#     gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # convert to grayscale
#     cv2.imwrite(str(fil), gray_image) # write to location with same name

# for fil in images_rest.glob("*.jpg"):
#     image = cv2.imread(str(fil))  # cv2 expects a string path
#     gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # convert to grayscale
#     cv2.imwrite(str(fil), gray_image) # write to location with same name

# 3. Splitting the data



In [None]:
# Check if annotation for image is present
for fil in images_rest.glob("*.jpg"):
  if not os.path.isfile(str(fil).replace(".jpg",".txt").replace("images", "labels")):
    os.remove(fil)

## Equalizing the amount of data (CURRENTLY NOT BEING USED)
First, the number of instances of each class in the dataset is counted. Then images are picked at random: an image that contains only the larger class is deleted, while an image that also contains the smaller class is kept. This continues until both classes are close to equal in size.

In [None]:
# white_clothed = 0
# other_clothed = 0

# for f in labels_human.glob("[!classes]*.txt"):
#     with open(f) as text:
#         inhoud=text.readlines()
#     for line in inhoud:
#         print(line[0])
#         first_char = line[0]
#         if first_char == "0":
#             white_clothed = white_clothed + 1
#         elif first_char == "1":
#             other_clothed = other_clothed + 1

# difference_clothing = other_clothed - white_clothed

In [None]:
# total_amount_of_other_clothes = 0

# while total_amount_of_other_clothes < round(difference_clothing*0.60289):

#     delete = True
#     file_amount_of_other_clothes = 0

#     f = random.choice(list(labels_human.glob('[!classes][frame]*.txt')))

#     with open(f) as text:
#         inhoud=text.readlines()
#     for line in inhoud:
#         first_char = line[0]
#         if first_char == "0":
#             delete = False
#             continue
#         if first_char == "1":
#             file_amount_of_other_clothes = file_amount_of_other_clothes + 1
            
#     if delete:
#         os.remove(f)
#         os.remove(str(f).replace('.txt', '.jpg').replace('labels', 'images'))
#         total_amount_of_other_clothes = total_amount_of_other_clothes + file_amount_of_other_clothes

#     print(f'{total_amount_of_other_clothes} out of {round(difference_clothing*0.60289)}')
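The counting step can also be written with `collections.Counter`, as in the sketch below. It reads the class id as the first whitespace-separated token instead of the first character, which also works for ids of 10 and higher. The helper name `count_classes` and the demo files are illustrative, not part of the notebook.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_classes(label_dir):
    """Count annotated objects per class id in a folder of YOLO label files."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        if label_file.name == "classes.txt":
            continue  # skip the class list, it holds no annotations
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if len(parts) == 5:  # class x_center y_center width height
                counts[parts[0]] += 1
    return counts


# Small demo on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "frame_1.txt").write_text("0 0.5 0.5 0.1 0.2\n1 0.3 0.3 0.1 0.2\n")
    Path(tmp, "frame_2.txt").write_text("1 0.6 0.4 0.1 0.2\n")
    Path(tmp, "classes.txt").write_text("white\nother\n")
    demo_counts = count_classes(tmp)

print(demo_counts["0"], demo_counts["1"])  # 1 2
```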

## Splitting
After the data has been downloaded, unzipped and optionally equalized, it has to be split into a train, validation and test set to train and evaluate the model with.
<br><br>
The train set consists of roughly 80% of the human images (the code below uses a factor of 0.815), plus rest images without humans amounting to 10% of that number. This follows the YOLOv5 documentation, which recommends that up to about 10% of the dataset consist of images without the objects to be detected.<br>
The validation set consists of roughly 10% of the human images (a factor of 0.08 in the code), plus rest images amounting to 10% of that number.<br>
The test set is built the same way as the validation set.

In [None]:
os.makedirs('data/train/labels/', exist_ok=True)
os.makedirs('data/train/images/', exist_ok=True)
os.makedirs('data/validate/labels/', exist_ok=True)
os.makedirs('data/validate/images/', exist_ok=True)
os.makedirs('data/test/labels/', exist_ok=True)
os.makedirs('data/test/images/', exist_ok=True)

amount_of_imgs = len(next(os.walk('data/images/classified_humans'))[2])

print(amount_of_imgs)

train_amount = int(round(amount_of_imgs * 0.815, 0))-1
print(train_amount)
val_test_amount = int(round(amount_of_imgs * 0.08, 0))-1
print(val_test_amount)

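# Remove the class lists so they are not treated as label files
for class_file in (labels_human / 'classes.txt', labels_rest / 'classes.txt'):
  try:
    os.remove(class_file)
  except FileNotFoundError:
    print(f"{class_file} already removed.")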

for i in range(0,train_amount):
  file = random.choice(os.listdir(labels_human))
  shutil.move(str(labels_human) + "/" + file, f"data/train/labels/{file}")
  shutil.move(str(images_human) + "/" + file.replace(file[- 4 :], '.jpg'), f"data/train/images/{file.replace(file[- 4 :], '.jpg')}")

for i in range(0,int(round((train_amount / 10),0))):
  file = random.choice(os.listdir(images_rest))
  shutil.move(str(images_rest) + "/" + file, f"data/train/images/{file}")
  shutil.move(str(labels_rest) + "/" + file.replace(file[- 4 :], '.txt'), f"data/train/labels/{file.replace(file[- 4 :], '.txt')}")

for i in range(0,val_test_amount):
  file = random.choice(os.listdir(labels_human))
  shutil.move(str(labels_human) + "/" + file, f"data/validate/labels/{file}")
  shutil.move(str(images_human) + "/" + file.replace(file[- 4 :], '.jpg'), f"data/validate/images/{file.replace(file[- 4 :], '.jpg')}")

for i in range(0,int(round((val_test_amount / 10),0))):
  file = random.choice(os.listdir(images_rest))
  shutil.move(str(images_rest) + "/" + file, f"data/validate/images/{file}")
  shutil.move(str(labels_rest) + "/" + file.replace(file[- 4 :], '.txt'), f"data/validate/labels/{file.replace(file[- 4 :], '.txt')}")

for i in range(0,val_test_amount):
  file = random.choice(os.listdir(labels_human))
  shutil.move(str(labels_human) + "/" + file, f"data/test/labels/{file}")
  shutil.move(str(images_human) + "/" + file.replace(file[- 4 :], '.jpg'), f"data/test/images/{file.replace(file[- 4 :], '.jpg')}")

for i in range(0,int(round((val_test_amount / 10),0))):
  file = random.choice(os.listdir(images_rest))
  shutil.move(str(images_rest) + "/" + file, f"data/test/images/{file}")
  shutil.move(str(labels_rest) + "/" + file.replace(file[- 4 :], '.txt'), f"data/test/labels/{file.replace(file[- 4 :], '.txt')}")
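The six loops above repeat one pattern: pick random label files and move each one together with its image. A compact, reproducible variant could look like the sketch below. The helper `move_random_pairs`, the fixed seed and the demo folders are assumptions for illustration. It uses `random.sample`, so it never picks the same file twice, and it caps the amount at the number of files available.

```python
import random
import shutil
import tempfile
from pathlib import Path


def move_random_pairs(label_dir, image_dir, out_dir, amount, seed=0):
    """Move `amount` random label/image pairs into out_dir/labels and out_dir/images."""
    label_dir, image_dir, out_dir = Path(label_dir), Path(image_dir), Path(out_dir)
    (out_dir / "labels").mkdir(parents=True, exist_ok=True)
    (out_dir / "images").mkdir(parents=True, exist_ok=True)

    labels = sorted(label_dir.glob("*.txt"))  # sorted for a reproducible order
    rng = random.Random(seed)
    chosen = rng.sample(labels, min(amount, len(labels)))
    for label in chosen:
        image = image_dir / (label.stem + ".jpg")
        shutil.move(str(label), str(out_dir / "labels" / label.name))
        if image.is_file():  # tolerate a missing image instead of crashing
            shutil.move(str(image), str(out_dir / "images" / image.name))
    return len(chosen)


# Demo on throwaway folders with ten empty label/image pairs
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "labels").mkdir()
    (root / "images").mkdir()
    for i in range(10):
        (root / "labels" / f"frame_{i}.txt").touch()
        (root / "images" / f"frame_{i}.jpg").touch()
    moved = move_random_pairs(root / "labels", root / "images", root / "train", 8)
    left = len(list((root / "labels").glob("*.txt")))

print(moved, left)  # 8 2
```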

# 4. Train

<p align=""><a href="https://roboflow.com/?ref=ultralytics"><img width="1000" src="https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png"/></a></p>
During the training phase, the YOLOv5s model is trained to recognize people wearing white or differently colored clothes, based on the data prepared above.
<br>

- **Pretrained [Models](https://github.com/ultralytics/yolov5/tree/master/models)** are downloaded automatically from the [latest YOLOv5 release](https://github.com/ultralytics/yolov5/releases)
- **Training Results** are saved to `runs/train/` with incrementing run directories, e.g. `runs/train/exp2`, `runs/train/exp3`, etc.

## Preparing WandB

Setting up this library makes the statistics of the training process available both during and after training. The results are stored in the W&B account, so they remain available after the Google Colab environment resets.

In [None]:
%pip install -q wandb
import wandb
wandb.login()

## Training the model

The following code block trains the model. By default it trains on 920x920 images with a batch size of 16, starting from the small YOLOv5 weights (`yolov5s.pt`), for 50 epochs. The file passed to `--data` must match the name of the uploaded dataset file.<br>
Feel free to vary the parameters to get better results.

In [None]:
# Train YOLOv5s on custom dataset for 50 epochs
!python train.py --img 920 --batch 16 --epochs 50 --data dataset.yml --weights yolov5s.pt
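A note on the image size: YOLOv5 expects `--img` to be a multiple of the model's maximum stride (32 for the standard models) and rounds other values up with a warning, so 920 is expected to become 928. The sketch below reproduces that rounding. `round_to_stride` is a hypothetical helper, not a YOLOv5 function.

```python
import math


def round_to_stride(img_size, stride=32):
    """Round an image size up to the next multiple of the stride."""
    return math.ceil(img_size / stride) * stride


print(round_to_stride(920))  # 928
print(round_to_stride(640))  # 640
```

Passing a multiple of 32 directly, such as 928, avoids the adjustment.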

# 5. Download results locally

In [None]:
!zip -r /content/exp.zip /content/yolov5/runs/train/exp

from google.colab import files
files.download("/content/exp.zip")
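Because each training run gets its own directory (`exp`, `exp2`, `exp3`, ...), the hard-coded `exp` path above only matches the first run. A sketch for archiving the most recently modified run with the standard library is shown below. `zip_latest_run` is a hypothetical helper and the demo folders stand in for `runs/train`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def zip_latest_run(runs_dir, archive_base):
    """Zip the most recently modified run directory and return the archive path."""
    run_dirs = [d for d in Path(runs_dir).iterdir() if d.is_dir()]
    if not run_dirs:
        raise FileNotFoundError(f"no run directories in {runs_dir}")
    latest = max(run_dirs, key=lambda d: d.stat().st_mtime)
    # make_archive appends ".zip" to archive_base itself
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(latest))


# Demo on throwaway folders
with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp, "runs")
    for i, name in enumerate(["exp", "exp2"]):
        (runs / name).mkdir(parents=True)
        (runs / name / "results.csv").write_text("epoch\n")
        os.utime(runs / name, (1000 + i, 1000 + i))  # make exp2 the newest
    archive = zip_latest_run(runs, Path(tmp, "latest_run"))
    archive_name = Path(archive).name
    archive_exists = Path(archive).is_file()

print(archive_name, archive_exists)  # latest_run.zip True
```

In Colab, the returned path could then be handed to `files.download`.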

# 6. Visualize

## Weights & Biases Logging

[Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_notebook) (W&B) is now integrated with YOLOv5 for real-time visualization and cloud logging of training runs. This allows for better run comparison and introspection, as well as improved visibility and collaboration for teams. To enable W&B, run `pip install wandb` and then train normally (you will be guided through setup on first use).

During training you will see live updates at [https://wandb.ai/home](https://wandb.ai/home?utm_campaign=repo_yolo_notebook), and you can create and share detailed [Reports](https://wandb.ai/glenn-jocher/yolov5_tutorial/reports/YOLOv5-COCO128-Tutorial-Results--VmlldzozMDI5OTY) of your results. For more information see the [YOLOv5 Weights & Biases Tutorial](https://github.com/ultralytics/yolov5/issues/1289). 

<p align="left"><img width="900" alt="Weights & Biases dashboard" src="https://user-images.githubusercontent.com/26833433/135390767-c28b050f-8455-4004-adb0-3b730386e2b2.png"></p>