# YOLOv5 implementation

Dataset link: [hand-gesture-recongition-yolo-v3](https://www.kaggle.com/abdullahmujahidali/hand-gesture-recongition-yolo-v3)

Ref: [Do_Thuan.pdf](https://www.theseus.fi/bitstream/handle/10024/452552/Do_Thuan.pdf)

## 1. Check GPU usage status:
To use a GPU in Colab, go to Edit -> Notebook settings and set Hardware accelerator to GPU.
If we don't use a GPU, we can skip this step.

In [None]:
!nvidia-smi

Sat Jan  1 13:04:25 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   64C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## 2. Clone yolov5 repository:

In [None]:
%%writefile requirements.txt

# pip install -r requirements.txt

# Base ----------------------------------------
matplotlib>=3.2.2
numpy>=1.18.5
opencv-python>=4.1.2
Pillow>=7.1.2
PyYAML>=5.3.1
requests>=2.23.0
scipy>=1.4.1
# torch>=1.7.0
# torchvision>=0.8.1
tqdm>=4.41.0

# Logging -------------------------------------
tensorboard>=2.4.1
# wandb

# Plotting ------------------------------------
pandas>=1.1.4
seaborn>=0.11.0

# Export --------------------------------------
# coremltools>=4.1  # CoreML export
# onnx>=1.9.0  # ONNX export
# onnx-simplifier>=0.3.6  # ONNX simplifier
# scikit-learn==0.19.2  # CoreML quantization
# tensorflow>=2.4.1  # TFLite export
# tensorflowjs>=3.9.0  # TF.js export
# openvino-dev  # OpenVINO export

# Extras --------------------------------------
# albumentations>=1.0.3
# Cython  # for pycocotools https://github.com/cocodataset/cocoapi/issues/172
# pycocotools>=2.0  # COCO mAP
# roboflow
thop  # FLOPs computation

Writing requirements.txt


In [None]:
!pip3 install torch==1.9.1+cu102 torchvision==0.10.1+cu102 torchaudio===0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

> Install a pinned stable version of torch so the code can also run on CPU. Training on CPU is VERY slow, though 😭.

> After we use the GPU for a while, we will exceed the usage limit and then have to wait a long time for the limit to be reset.

In [None]:
!git clone https://github.com/ultralytics/yolov5.git

Cloning into 'yolov5'...
remote: Enumerating objects: 10367, done.
remote: Total 10367 (delta 0), reused 0 (delta 0), pack-reused 10367
Receiving objects: 100% (10367/10367), 10.54 MiB | 8.61 MiB/s, done.
Resolving deltas: 100% (7166/7166), done.


In [None]:
# With a GPU, install the repository's default requirements file
!pip3 install -r yolov5/requirements.txt
# Without a GPU, install the custom requirements.txt written above instead
# !pip3 install -r requirements.txt
%cd yolov5

Collecting PyYAML>=5.3.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting thop
  Downloading thop-0.0.31.post2005241907-py3-none-any.whl (8.7 kB)
Installing collected packages: thop, PyYAML
  Attempting uninstall: PyYAML
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully uninstalled PyYAML-3.13
Successfully installed PyYAML-6.0 thop-0.0.31.post2005241907
/content/yolov5
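
The choice between the two requirements files can also be made in code. The sketch below is only a heuristic: it assumes a GPU runtime is one where `nvidia-smi` is on the `PATH`, and the helper names `gpu_tools_present` and `pick_requirements` are hypothetical. Paths are relative to `/content`, where the custom `requirements.txt` was written.

```python
import shutil


def gpu_tools_present() -> bool:
    """Heuristic (assumption): a GPU runtime has `nvidia-smi` on PATH."""
    return shutil.which("nvidia-smi") is not None


def pick_requirements(use_gpu: bool) -> str:
    """Return the requirements file to install, relative to /content."""
    # GPU runtime: the repository's default file.
    # CPU runtime: the custom file, which keeps torch commented out so the
    # pinned torch install from the earlier cell is not replaced.
    return "yolov5/requirements.txt" if use_gpu else "requirements.txt"


print(pick_requirements(gpu_tools_present()))
```

The printed path can then be passed to `pip3 install -r` before changing into the `yolov5` folder.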


## 3. Import necessary libraries:

In [None]:
import torch
import os
import shutil
import numpy as np
from IPython.display import Image, clear_output

clear_output()
print('Setup complete')

Setup complete


> If you have cloned the repo twice, you can't delete the nested folder directly. Instead, uncomment the code below to delete it.

> ⚠️ Note: `shutil.rmtree` deletes the folder and everything in it permanently, so double-check the path first.

In [None]:
import shutil

# shutil.rmtree('/content/yolov5/yolov5')
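
A slightly safer variant is sketched below. It deletes the nested folder only if it looks like a clone, using the presence of `train.py` as a heuristic. The helper name `remove_nested_clone` is hypothetical and not part of YOLOv5.

```python
import shutil
from pathlib import Path


def remove_nested_clone(repo_dir: str) -> bool:
    """Delete `<repo_dir>/yolov5` only if it looks like a nested clone."""
    nested = Path(repo_dir) / "yolov5"
    # Heuristic (assumption): a clone of the repository has train.py at its top level.
    if nested.is_dir() and (nested / "train.py").is_file():
        shutil.rmtree(nested)
        return True
    # Nothing matched, so nothing was deleted.
    return False
```

Calling `remove_nested_clone('/content/yolov5')` returns `True` when a nested clone was found and removed, and `False` otherwise.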

## 4. Mount your drive for uploaded data:

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## 5. Create necessary folders for YOLOv5:

In [None]:
root_dir = '/content/drive/MyDrive/hand_gestures/'
# exist_ok=True lets this cell be re-run without raising FileExistsError.
# Creating the images/labels subfolders also creates the parent split folder.
for split in ('train', 'valid'):
  os.makedirs(root_dir + split + '/images', exist_ok=True)
  os.makedirs(root_dir + split + '/labels', exist_ok=True)

## 6. List all images in the uploaded folder:

The data folder stores a .txt label file next to each image, so we keep only the image files when building the list of names. Some images use the UPPERCASE extension .JPG, so we strip both .JPG and .jpg to get the bare file names.

In [None]:
root_dir = '/content/drive/MyDrive/hand_gestures/'
all_images = os.listdir(root_dir + 'data')

all_images = [img.replace('.JPG', '').replace('.jpg', '') for img in all_images if '.txt' not in img]
print(all_images)

['112', '211', '233', '217', '140', '226', '220', '138', '243', '123', '120', '137', '218', '128', '122', '100', '214', '203', '204', '121', '111', '225', '110', '115', '126', '134', '231', '124', '237', '209', '242', '239', '235', '107', '113', '136', '208', '119', '234', '118', '114', '135', '102', '132', '232', '101', '127', '206', '219', '230', '227', '200', '108', '131', '213', '117', '103', '130', '222', '212', '224', '229', '129', '240', '221', '141', '139', '215', '133', '207', '216', '241', '238', '106', '236', '205', '116', '223', '228', '105', '109', '125', '104', '202', '417', '429', '403', '348', '325', '415', '334', '336', '332', '304', '438', '323', '524', '400', '505', '331', '422', '307', '407', '534', '525', '309', '441', '322', '437', '419', '346', '405', '500', '324', '513', '412', '536', '320', '537', '302', '436', '442', '445', '501', '404', '535', '421', '518', '424', '315', '314', '317', '502', '533', '306', '514', '512', '504', '305', '528', '311', '532', '440'
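
The same listing can be written with `os.path.splitext`, which avoids matching `.jpg` or `.txt` in the middle of a file name. This is a minimal sketch: `list_image_stems` and the `IMAGE_EXTS` set are hypothetical names, and the set of extensions is an assumption about the dataset.

```python
import os

# Assumed set of image extensions; extend it if the dataset has other formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def list_image_stems(folder: str) -> list:
    """Return the sorted file names (without extension) of all images in `folder`."""
    stems = []
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        # Compare in lowercase so .JPG and .jpg are treated alike.
        if ext.lower() in IMAGE_EXTS:
            stems.append(stem)
    return sorted(stems)
```

With this helper, `all_images = list_image_stems(root_dir + 'data')` would produce the same kind of list, sorted by name.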

## 7. Randomly select data for train set and validation set:

In [None]:
np.random.shuffle(all_images)

portions = 0.8
num_train = round(len(all_images) * portions)
num_valid = len(all_images) - num_train
print(f'Number of train set: {num_train}, valid set: {num_valid}')

train_set = all_images[:num_train]
valid_set = all_images[num_train:]
print(train_set)
print(valid_set)


Number of train set: 171, valid set: 43
['131', '237', '116', '521', '220', '118', '346', '202', '531', '440', '532', '135', '231', '319', '423', '534', '337', '503', '106', '302', '326', '308', '442', '125', '108', '524', '309', '311', '306', '331', '127', '316', '522', '313', '527', '416', '211', '516', '216', '519', '330', '536', '530', '539', '225', '203', '214', '428', '228', '315', '541', '338', '238', '312', '217', '422', '114', '529', '213', '317', '103', '100', '111', '327', '303', '325', '132', '226', '518', '323', '418', '241', '339', '345', '415', '304', '336', '514', '307', '101', '500', '321', '332', '218', '123', '538', '406', '222', '322', '542', '335', '200', '204', '417', '109', '117', '421', '115', '236', '310', '208', '227', '314', '234', '138', '136', '437', '128', '348', '320', '334', '301', '300', '209', '110', '439', '329', '438', '444', '427', '305', '520', '434', '240', '229', '124', '543', '412', '318', '140', '224', '537', '107', '445', '206', '501', '219', 
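
The shuffle above is unseeded, so each run produces a different split. If a repeatable split is wanted, one option is sketched below. It uses the standard library's `random.Random` in place of `np.random.shuffle`; the function name `split_names` and the default seed are hypothetical choices.

```python
import random


def split_names(names, train_fraction=0.8, seed=0):
    """Shuffle a copy of `names` with a fixed seed and split it into train/valid."""
    shuffled = list(names)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    num_train = round(len(shuffled) * train_fraction)
    return shuffled[:num_train], shuffled[num_train:]


# Demo with 214 placeholder names, the dataset size implied by the output above.
train_demo, valid_demo = split_names([str(i) for i in range(214)])
print(len(train_demo), len(valid_demo))  # 171 43
```

Running the function twice with the same seed returns the same two lists, which makes results easier to compare across training runs.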

## 8. Copy image and label files to the appropriate folders:

In [None]:
data_dir = root_dir + 'data/'
for split, names in (('train', train_set), ('valid', valid_set)):
  for img_name in names:
    # Copy the label file
    shutil.copy(data_dir + img_name + '.txt', root_dir + split + '/labels/' + img_name + '.txt')
    # Copy the image. Some images have the UPPERCASE extension .JPG,
    # so the copy is always saved with a lowercase .jpg extension.
    img_path = data_dir + img_name + '.jpg'
    if not os.path.exists(img_path):
      img_path = data_dir + img_name + '.JPG'
    shutil.copy(img_path, root_dir + split + '/images/' + img_name + '.jpg')
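
After copying, it can be worth checking that every image has a label file and vice versa. The sketch below assumes the `images/` and `labels/` layout created in step 5; `unpaired_files` is a hypothetical helper name.

```python
import os


def unpaired_files(split_dir: str):
    """Return (images without labels, labels without images) for one split folder."""
    images = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(split_dir, "images"))}
    labels = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(split_dir, "labels"))}
    # Set differences give the names present on one side only.
    return sorted(images - labels), sorted(labels - images)
```

For example, `unpaired_files(root_dir + 'train')` should return two empty lists if the copy step completed for every file.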

## 9. Create a data.yaml file:

In [None]:
num_classes = 5
# Use a separate name so this path is not confused with the data folder from step 8
data_yaml = root_dir + 'data.yaml'
with open(data_yaml, "w") as dataFile:
  dataFile.write(f"train: {root_dir + 'train'}\n")
  dataFile.write(f"val: {root_dir + 'valid'}\n")
  dataFile.write(f"nc: {num_classes}\n")
  dataFile.write("names: ['one', 'two', 'three', 'four', 'five']\n")

In [None]:
%cat /content/drive/MyDrive/hand_gestures/data.yaml

train: /content/drive/MyDrive/hand_gestures/train
val: /content/drive/MyDrive/hand_gestures/valid
nc: 5
names: ['one', 'two', 'three', 'four', 'five']
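
The `nc` value has to agree with the class indices used in the label files. Assuming the labels follow the usual YOLO text format (one `class x_center y_center width height` row per box, with coordinates normalized to the range 0 to 1), a single row can be checked as sketched below. `check_label_line` is a hypothetical helper, not part of YOLOv5.

```python
def check_label_line(line: str, num_classes: int) -> bool:
    """Return True if `line` is a valid YOLO label row for `num_classes` classes."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        cls = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        # Non-numeric fields, or a class index that is not an integer.
        return False
    # The class index must be below nc and all box values must be normalized.
    return 0 <= cls < num_classes and all(0.0 <= c <= 1.0 for c in coords)
```

Running this over every row of the files in `train/labels` and `valid/labels` with `num_classes=5` would flag labels that do not fit the five classes listed in `data.yaml`.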


## 10. Create a config file:

In [None]:
config_dir = root_dir + 'custom_config.yaml'
with open(config_dir, "w") as dataFile:
  dataFile.write(f"""
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: {num_classes}  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

""")

In [None]:
%cat /content/drive/MyDrive/hand_gestures/custom_config.yaml


# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 5  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]]
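
To make the `depth_multiple` and `width_multiple` values concrete, the sketch below reproduces, in simplified form, how the model parser is understood to apply them: repeat counts greater than 1 are scaled and rounded, and channel counts are scaled and rounded up to a multiple of 8. The function names are hypothetical and the details are an assumption based on the v6.0 code, not a copy of it.

```python
import math


def scaled_repeats(n: int, depth_multiple: float) -> int:
    """Scale a module repeat count; counts of 1 are left unchanged."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scaled_channels(c: int, width_multiple: float, divisor: int = 8) -> int:
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(c * width_multiple / divisor) * divisor


def detect_outputs(nc: int, anchors_per_level: int = 3) -> int:
    """Outputs per grid cell at one detection level: (box 4 + objectness 1 + nc) per anchor."""
    return anchors_per_level * (nc + 5)


print([scaled_repeats(n, 0.33) for n in (3, 6, 9)])                   # [1, 2, 3]
print([scaled_channels(c, 0.50) for c in (64, 128, 256, 512, 1024)])  # [32, 64, 128, 256, 512]
print(detect_outputs(5))                                              # 30
```

So with the values in this config, the `C3` blocks written as 3, 6 and 9 repeats run 1, 2 and 3 times, and every layer is half as wide as written.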

## 11. Train YOLOv5 on the dataset:

> Since we train from scratch (`--weights ''`) instead of starting from pre-trained weights, the training process is very slow.

In [None]:
%%time
%cd /content/yolov5/

!python train.py --img 416 --batch 32 --epochs 100 --data /content/drive/MyDrive/hand_gestures/data.yaml \
  --cfg /content/drive/MyDrive/hand_gestures/custom_config.yaml --weights '' --name hand_gestures --cache

/content/yolov5
train: weights=, cfg=/content/drive/MyDrive/hand_gestures/custom_config.yaml, data=/content/drive/MyDrive/hand_gestures/data.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=100, batch_size=32, imgsz=416, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=hand_gestures, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v6.0-163-gd95978a torch 1.10.0+cu111 CUDA:0 (Tesla K80, 11441MiB)

hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls

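One way to shorten training would be to start from pre-trained weights instead of an empty `--weights` value. The sketch below only assembles the argument list for the `train.py` call in this step. `build_train_args` is a hypothetical helper, and using the `yolov5s.pt` checkpoint is an assumption that was not tried in this notebook; it is suggested because the custom config uses the same depth and width multiples as the small model.

```python
def build_train_args(data, cfg, weights="", name="hand_gestures",
                     img=416, batch=32, epochs=100):
    """Assemble the train.py argument list used in this step."""
    return [
        "python", "train.py",
        "--img", str(img), "--batch", str(batch), "--epochs", str(epochs),
        "--data", data, "--cfg", cfg,
        # An empty string trains from scratch; a checkpoint name starts from its weights.
        "--weights", weights, "--name", name, "--cache",
    ]


args = build_train_args(
    data="/content/drive/MyDrive/hand_gestures/data.yaml",
    cfg="/content/drive/MyDrive/hand_gestures/custom_config.yaml",
    weights="yolov5s.pt",  # assumption: pre-trained small checkpoint
)
print(" ".join(args))
```

The list could be run from `/content/yolov5` with `subprocess.run(args, check=True)`, or the printed command pasted after `!` in a cell.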
## 12. Inference with trained weights:


> ⚠️ Warning: Don't use a YouTube link as the detection source. YouTube recently removed the public dislike count, which broke the library YOLOv5 uses to read YouTube videos.

In [None]:
!python detect.py --weights runs/train/hand_gestures6/weights/best.pt --img 416 --data /content/drive/MyDrive/hand_gestures/data.yaml --conf 0.01 --source /content/drive/MyDrive/hand_gestures/valid/images/104.jpg

detect: weights=['runs/train/hand_gestures6/weights/best.pt'], source=/content/drive/MyDrive/hand_gestures/valid/images/104.jpg, data=/content/drive/MyDrive/hand_gestures/data.yaml, imgsz=[416, 416], conf_thres=0.01, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.0-163-gd95978a torch 1.10.0+cu111 CUDA:0 (Tesla K80, 11441MiB)

Fusing layers... 
Model Summary: 213 layers, 7023610 parameters, 0 gradients, 15.8 GFLOPs
image 1/1 /content/drive/MyDrive/hand_gestures/valid/images/104.jpg: 416x256 1 two, 2 threes, 1 five, Done. (0.022s)
Speed: 0.3ms pre-process, 22.2ms inference, 1.7ms NMS per image at shape (1, 3, 416, 416)
Results saved to runs/detect/exp16
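
Because each training run with the same `--name` gets a numbered folder (`hand_gestures`, `hand_gestures2`, ...), the weights path above changes from run to run. The sketch below picks the `best.pt` from the highest-numbered run. `latest_best_weights` is a hypothetical helper, and the numbering scheme is inferred from the `hand_gestures6` and `exp16` folder names seen in this notebook.

```python
import re
from pathlib import Path


def latest_best_weights(train_runs: str, name: str):
    """Return best.pt from the highest-numbered `<name><N>` run folder, or None."""
    pattern = re.compile(re.escape(name) + r"(\d*)$")
    candidates = []
    for run in Path(train_runs).iterdir():
        match = pattern.match(run.name)
        weights = run / "weights" / "best.pt"
        # Skip unrelated folders and runs that never saved best.pt.
        if match and weights.is_file():
            # The bare name counts as run 1; numbered suffixes start at 2.
            candidates.append((int(match.group(1) or 1), weights))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

From `/content/yolov5`, `latest_best_weights('runs/train', 'hand_gestures')` would give the path to pass to `--weights`.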
