<img src="https://www.inf.elte.hu/images/department/ik/elte-logo-ik-big-hu-v2.svg?v202011191744" style="max-width:400px;">

### **Subject:** Deep Network Development
### **Assignment 2:** Object Detection
### **Date:** 2022/12/07
### **Created by:** Cordero Pedro 
### **Neptun ID:** LVZWWZ
### **Email:** lvzwwz@student.elte.hu

## **0. Setup**

0.1 Connect Google Drive to the current Google Colab notebook.

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


0.2 Move to the Colab Notebooks folder

In [None]:
%cd /content/gdrive/MyDrive/Colab Notebooks

/content/gdrive/MyDrive/Colab Notebooks


## **1. Import libraries**

Importing all libraries/packages that I believe will help fulfil the task.

1.1 Create a new folder for the project

In [None]:
import os

# Create the project folder (no error if it already exists)
os.makedirs("CP_A2_DND_OD", exist_ok=True)

1.2 Move into the newly created folder

In [None]:
%cd CP_A2_DND_OD

/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD


## **2. Dataset**

2.1 Assigned dataset for object detection: **Synthetic Fruit**



![example_dataset](https://drive.google.com/uc?export=view&id=1CVR73q6B88KmVH82rEhK-PEmByAQV-z4)



2.2 Annotation tool

**LabelImg** is a graphical image annotation tool. It is written in Python and uses Qt for its graphical interface.

![01_AT](https://drive.google.com/uc?export=view&id=1CXLXX7FbjfVrnvb49NKGcZfMlT9neVl5)

Labelling Synthetic Fruit images (example):

![02_AT](https://drive.google.com/uc?export=view&id=1CdGUbAvrdV5t1WpBy5p_bOT_5DoCErH_)

Annotations are saved in YOLO format (one line per bounding box, with coordinates normalized to the image size).

In [7]:
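example1 = "/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7/03AT.txt"
# Read the annotation file; the context manager closes it automatically
with open(example1, "r") as file1:
    annotations = file1.read()
annotations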

'0.6478365384615384 0.4390909090909091 0.18509615384615385 0.14\n0.6622596153846154 0.13 0.18990384615384615 0.14363636363636365\n0.5997596153846154 0.5918181818181818 0.3389423076923077 0.25636363636363635\n0.11298076923076923 0.7981818181818182 0.17307692307692307 0.13090909090909092\n0.2620192307692308 0.2927272727272727 0.3125 0.23636363636363636\n0.8617788461538461 0.5809090909090909 0.2620192307692308 0.19818181818181818\n0.5432692307692307 0.68 0.14903846153846154 0.11272727272727273\n0.29086538461538464 0.5763636363636364 0.3557692307692308 0.2690909090909091\n0.5204326923076923 0.34454545454545454 0.31971153846153844 0.24181818181818182'
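To make the normalized values above easier to interpret, here is a minimal sketch that converts YOLO label lines into pixel corner coordinates. A YOLO line normally holds `class x_center y_center width height`; the file printed above shows only the four box values per line, so the sketch assumes (hypothetically) that a missing class index means the single class 0 (`Synthetic_Fruit`). The function names are illustrative, not part of YOLOv7.

```python
from typing import List, Tuple

Box = Tuple[int, float, float, float, float]  # (class_id, x_min, y_min, x_max, y_max)


def yolo_to_pixels(line: str, img_w: int, img_h: int) -> Box:
    """Convert one YOLO label line to absolute pixel corners."""
    values = line.split()
    if len(values) == 5:
        class_id = int(values[0])
        xc, yc, w, h = map(float, values[1:])
    elif len(values) == 4:
        # Assumption: no class index -> the only class in this dataset (0)
        class_id = 0
        xc, yc, w, h = map(float, values)
    else:
        raise ValueError(f"Unexpected YOLO line: {line!r}")
    # Normalized centre/size -> pixel corners
    return (class_id,
            (xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)


def parse_label_file(text: str, img_w: int, img_h: int) -> List[Box]:
    """Parse every non-empty line of a label file."""
    return [yolo_to_pixels(line, img_w, img_h) for line in text.splitlines() if line.strip()]
```

For example, `parse_label_file(annotations, img_w, img_h)` would return one tuple per fruit, given the real width and height of the labelled image.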

## **3. Fine-tune YOLO**

3.1 Clone the official YOLO v7 repository and install its dependencies

In [None]:
!git clone https://github.com/WongKinYiu/yolov7.git # clone repo

Cloning into 'yolov7'...
remote: Enumerating objects: 998, done.
remote: Total 998 (delta 0), reused 0 (delta 0), pack-reused 998
Receiving objects: 100% (998/998), 69.77 MiB | 16.57 MiB/s, done.
Resolving deltas: 100% (466/466), done.


3.2 Prepare the YAML file that describes the dataset

In [8]:
!pwd

/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD


In [9]:
%cd yolov7/data

/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7/data


In [10]:
# Dataset description read by train.py; paths are relative to the yolov7 folder
with open('custom_data.yaml', 'w') as fp:
    fp.write("train: ./data/train  # 80 images\n"
             "val: ./data/valid  # 20 images\n"
             "nc: 1  # number of classes\n"
             "names: [ 'Synthetic_Fruit' ]  # class names\n")

Our dataset has two folders:


1.   train: 80 images
2.   valid (validation): 20 images
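Before training, it can help to check that every image has a matching label file. The sketch below assumes the common YOLOv5/YOLOv7 layout `<split>/images/*.jpg` with `<split>/labels/*.txt` sharing the same file stem; this layout and the helper names `find_missing_labels` and `check_split` are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_missing_labels(image_names: Iterable[str], label_names: Iterable[str]) -> List[str]:
    """Return image file names that have no .txt label with the same stem."""
    label_stems = {Path(n).stem for n in label_names if Path(n).suffix == ".txt"}
    return sorted(n for n in image_names
                  if Path(n).suffix.lower() in IMAGE_EXTS and Path(n).stem not in label_stems)


def check_split(split_dir: str) -> List[str]:
    """Assumed layout: <split_dir>/images/* and <split_dir>/labels/*.txt."""
    root = Path(split_dir)
    images = [p.name for p in (root / "images").iterdir()]
    labels = [p.name for p in (root / "labels").iterdir()]
    return find_missing_labels(images, labels)
```

Run from the `yolov7` folder, `check_split("data/train")` and `check_split("data/valid")` should both return empty lists if the annotations are complete.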



3.3 Change the number of classes in the model configuration (yolov7.yaml)

In [11]:
!pwd

/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7/data


In [12]:
%cd ..
%cd cfg/training

/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7
/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7/cfg/training


Create a copy of the original YOLOv7 configuration file:

In [None]:
import shutil
shutil.copyfile("yolov7.yaml", "yolov7-custom.yaml")  # keep the original config untouched

'yolov7-custom.yaml'

In [None]:
with open('yolov7-custom.yaml', 'r', encoding='utf-8') as file:
    data = file.readlines()

print(data)
# Line index 1 holds the class count ('nc: 80' for COCO); set it to our single class
data[1] = "nc: 1  # number of classes\n"

with open('yolov7-custom.yaml', 'w', encoding='utf-8') as file:
    file.writelines(data)

['# parameters\n', 'nc: 80  # number of classes\n', 'depth_multiple: 1.0  # model depth multiple\n', 'width_multiple: 1.0  # layer channel multiple\n', '\n', '# anchors\n', 'anchors:\n', '  - [12,16, 19,36, 40,28]  # P3/8\n', '  - [36,75, 76,55, 72,146]  # P4/16\n', '  - [142,110, 192,243, 459,401]  # P5/32\n', '\n', '# yolov7 backbone\n', 'backbone:\n', '  # [from, number, module, args]\n', '  [[-1, 1, Conv, [32, 3, 1]],  # 0\n', '  \n', '   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      \n', '   [-1, 1, Conv, [64, 3, 1]],\n', '   \n', '   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  \n', '   [-1, 1, Conv, [64, 1, 1]],\n', '   [-2, 1, Conv, [64, 1, 1]],\n', '   [-1, 1, Conv, [64, 3, 1]],\n', '   [-1, 1, Conv, [64, 3, 1]],\n', '   [-1, 1, Conv, [64, 3, 1]],\n', '   [-1, 1, Conv, [64, 3, 1]],\n', '   [[-1, -3, -5, -6], 1, Concat, [1]],\n', '   [-1, 1, Conv, [256, 1, 1]],  # 11\n', '         \n', '   [-1, 1, MP, []],\n', '   [-1, 1, Conv, [128, 1, 1]],\n', '   [-3, 1, Conv, [128, 1, 1]],\n', '   [-
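Overwriting `data[1]` works only while `nc:` stays on the second line. A slightly more robust sketch, using only the standard `re` module, replaces the value of the top-level `nc:` key wherever it appears and keeps the trailing comment; the helper name `set_num_classes` is hypothetical.

```python
import re


def set_num_classes(yaml_text: str, nc: int) -> str:
    """Replace the value of the top-level 'nc:' key, keeping any trailing comment."""
    new_text, count = re.subn(r"^nc:\s*\d+", f"nc: {nc}", yaml_text,
                              count=1, flags=re.MULTILINE)
    if count != 1:
        raise ValueError("no top-level 'nc:' line found")
    return new_text
```

Usage would be to read `yolov7-custom.yaml` as one string, call `set_num_classes(text, 1)` and write the result back. A regex is used instead of a YAML library so that the comments and layout of the config file are preserved.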

3.4 Choosing the YOLOv7 model variant

We choose the pre-trained **yolov7** model for training:

![yolov7_model](https://drive.google.com/uc?export=view&id=1ChNKzXHlUgGOzQswbk1xe_IZvNbRVbKI)


In [12]:
%cd /content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7

/content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7


Let's download the pre-trained model:

In [None]:
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

--2022-11-23 22:12:43--  https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/b0243edf-9fb0-4337-95e1-42555f1b37cf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20221123%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221123T221243Z&X-Amz-Expires=300&X-Amz-Signature=cf6c45aa2fe6fadcd0db0481b3d718b8ea8c336ad0a4825927cf519cf97bb5fb&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=511187726&response-content-disposition=attachment%3B%20filename%3Dyolov7.pt&response-content-type=application%2Foctet-stream [following]
--2022-11-23 22:12:43--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/b0243edf-9fb0-4337-95e1-42555f1b37cf?X-Amz-Algorithm=A

## **4. Training**

4.1 The following recommended hyperparameters are defined in `data/hyp.scratch.custom.yaml` (passed to `train.py` with `--hyp`):

*   lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
*   lrf: 0.1  # final OneCycleLR learning rate (lr0 * lrf)
*   momentum: 0.937  # SGD momentum/Adam beta1
*   weight_decay: 0.0005  # optimizer weight decay 5e-4
*   warmup_epochs: 3.0  # warmup epochs (fractions ok)
*   warmup_momentum: 0.8  # warmup initial momentum
*   warmup_bias_lr: 0.1  # warmup initial bias lr
*   box: 0.05  # box loss gain
*   cls: 0.3  # cls loss gain
*   cls_pw: 1.0  # cls BCELoss positive_weight
*   obj: 0.7  # obj loss gain (scale with pixels)
*   obj_pw: 1.0  # obj BCELoss positive_weight
*   iou_t: 0.20  # IoU training threshold
*   anchor_t: 4.0  # anchor-multiple threshold
*   fl_gamma: 0.0  # focal loss gamma (EfficientDet default gamma=1.5)
*   hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
*   hsv_s: 0.7  # image HSV-Saturation augmentation (fraction)
*   hsv_v: 0.4  # image HSV-Value augmentation (fraction)
*   degrees: 0.0  # image rotation (+/- deg)
*   translate: 0.2  # image translation (+/- fraction)
*   scale: 0.5  # image scale (+/- gain)
*   shear: 0.0  # image shear (+/- deg)
*   perspective: 0.0  # image perspective (+/- fraction), range 0-0.001
*   flipud: 0.0  # image flip up-down (probability)
*   fliplr: 0.5  # image flip left-right (probability)
*   mosaic: 1.0  # image mosaic (probability)
*   mixup: 0.0  # image mixup (probability)
*   copy_paste: 0.0  # image copy paste (probability)
*   paste_in: 0.0  # image copy paste (probability), use 0 for faster training
*   loss_ota: 1 # use ComputeLossOTA, use 0 for faster training
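The `lr0` and `lrf` values define the learning-rate schedule: the rate starts at `lr0` and ends at `lr0 * lrf`. The sketch below assumes the cosine "one-cycle" form used in the YOLOv5/YOLOv7 codebases and ignores the `warmup_*` phase; `one_cycle` and `lr_at` are written here for illustration.

```python
import math


def one_cycle(y1: float, y2: float, steps: int):
    """Cosine ramp from y1 (epoch 0) to y2 (epoch `steps`)."""
    return lambda x: ((1 - math.cos(x * math.pi / steps)) / 2) * (y2 - y1) + y1


def lr_at(epoch: int, lr0: float = 0.01, lrf: float = 0.1, epochs: int = 100) -> float:
    """Learning rate at a given epoch, warmup not included."""
    return lr0 * one_cycle(1.0, lrf, epochs)(epoch)


for epoch in (0, 50, 100):
    print(epoch, lr_at(epoch))
```

With the values above, the rate decays smoothly from about 0.01 to about 0.0055 at the midpoint and about 0.001 at epoch 100.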

4.2 Train the model

In [None]:
!python train.py --device 0 --batch-size 16 --epochs 100 --img 640 640 --data data/custom_data.yaml --hyp data/hyp.scratch.custom.yaml --cfg cfg/training/yolov7-custom.yaml --weights yolov7.pt --name yolov7-custom

YOLOR 🚀 2022-11-23 torch 1.12.1+cu113 CUDA:0 (Tesla T4, 15109.75MB)

Namespace(adam=False, artifact_alias='latest', batch_size=16, bbox_interval=-1, bucket='', cache_images=False, cfg='cfg/training/yolov7-custom.yaml', data='data/custom_data.yaml', device='0', entity=None, epochs=100, evolve=False, exist_ok=False, freeze=[0], global_rank=-1, hyp='data/hyp.scratch.custom.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='yolov7-custom', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs/train/yolov7-custom4', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=16, upload_dataset=False, v5_metric=False, weights='yolov7.pt', workers=8, world_size=1)
[34m[1mtensorboard: [0mStart with 'tensorboard --logdir runs/train', view at http://localhost:6006/
[34m[1mhyperparameters: [0mlr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.

## **5. Results**

5.1 Recall versus confidence threshold (both on a 0 to 1 scale) for the Synthetic_Fruit class.

![r_curve](https://drive.google.com/uc?export=view&id=1FXylIFX2Hxt-tNuRfLoB9GC7A2y1iPnz)


5.2 Precision versus confidence threshold (both on a 0 to 1 scale) for the Synthetic_Fruit class.

![p_curve](https://drive.google.com/uc?export=view&id=1Fm1YAPDDWRVWv4JGxIkKHO5t-7Y_LoS_)


5.3 F1 score versus confidence threshold (both on a 0 to 1 scale) for the Synthetic_Fruit class.

![f1_curve](https://drive.google.com/uc?export=view&id=1FBkLmPRzofcP2vnGcCjXHUVTG9wcJwZa)
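The three curves above come from sweeping the confidence threshold: at each threshold, precision is TP / (TP + FP), recall is TP / (number of ground-truth boxes), and F1 = 2PR / (P + R). The sketch below reproduces that idea for one class, assuming each prediction has already been marked as a true positive (for example, IoU ≥ 0.5 with an unmatched ground-truth box). Returning precision 1.0 when no prediction survives is a convention chosen here, not necessarily the one used by YOLOv7's plotting code.

```python
from typing import List, Tuple


def pr_f1_at_thresholds(scores: List[float], is_tp: List[bool], n_gt: int,
                        thresholds: List[float]) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, precision, recall, F1) for each confidence threshold."""
    results = []
    for t in thresholds:
        kept = [hit for s, hit in zip(scores, is_tp) if s >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        precision = tp / (tp + fp) if kept else 1.0  # convention for "no detections"
        recall = tp / n_gt if n_gt else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((t, precision, recall, f1))
    return results
```

Evaluating it on a dense grid of thresholds (e.g. 0.00 to 1.00 in steps of 0.01) gives points for precision, recall and F1 curves like those shown above.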


5.4 Confusion matrix generated by the trained model.

![conf](https://drive.google.com/uc?export=view&id=1FoLff26kFz8epcn69xJhnxXEY8m8x9Ef)


5.5 Training and validation curves recorded at every epoch (100 epochs in total).

![all_curve](https://drive.google.com/uc?export=view&id=1FrOA8vOM1D58lYZsrdVTk76IsjvzgLG6)

5.6 Example of a training batch

![train_batch](https://drive.google.com/uc?export=view&id=1Dz7LQbprL_7IEcUxxk6HM3mfZJ1qmPHm)

5.7 Example of a test batch with ground-truth labels

![test_batch_labels](https://drive.google.com/uc?export=view&id=1F7qGwxS3JcVhc5SyiohKD6dPAxheVVXt)

5.8 Example of test batch predictions

![test_batch](https://drive.google.com/uc?export=view&id=1FUM8rqJsOlQfOS7htkoTyouY2z2TbeUd)

## **6. Predictions**

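`detect.py` can be run on both images and videos. Its three most important parameters are the image size (`--img-size`, which can differ from the training size), the confidence threshold (`--conf`, short for `--conf-thres`: the minimum estimated probability for a detection to be kept) and the IoU threshold (`--iou-thres`, which controls how strongly overlapping predictions of the same object are suppressed).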
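To illustrate how the confidence and IoU thresholds interact, here is a simplified, plain-Python sketch of intersection over union and greedy non-maximum suppression (NMS). It is not the exact routine used inside `detect.py`, which relies on an optimized implementation; the defaults mirror the `conf_thres=0.5` and `iou_thres=0.45` values printed in the logs below.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        conf_thres: float = 0.5, iou_thres: float = 0.45) -> List[int]:
    """Drop low-confidence boxes, then keep the best box of each overlapping group."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Suppress box i if it overlaps an already kept (higher-scoring) box too much
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep
```

Raising `iou_thres` keeps more overlapping boxes (useful when fruits touch each other), while raising `conf_thres` removes uncertain detections before suppression even starts.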

6.1 Example 1 of image prediction

In [None]:
!python detect.py --weights runs/train/yolov7-custom4/weights/best.pt --conf 0.5 --img-size 640 --source 1.jpg --no-trace

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.5, device='', exist_ok=False, img_size=640, iou_thres=0.45, name='exp', no_trace=True, nosave=False, project='runs/detect', save_conf=False, save_txt=False, source='1.jpg', update=False, view_img=False, weights=['runs/train/yolov7-custom4/weights/best.pt'])
YOLOR 🚀 2022-11-23 torch 1.12.1+cu113 CUDA:0 (Tesla T4, 15109.75MB)

Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
IDetect.fuse
Model Summary: 314 layers, 36481772 parameters, 6194944 gradients
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
1 Synthetic_Fruit, Done. (25.8ms) Inference, (1.7ms) NMS
 The image with the result is saved in: runs/detect/exp4/1.jpg
Done. (1.172s)


![01predic](https://drive.google.com/uc?export=view&id=1G7lu5M5IxI6K847SP3eyxe73iKQWfaLE)

6.2 Example 2 of image prediction

In [None]:
!python detect.py --weights runs/train/yolov7-custom4/weights/best.pt --conf 0.5 --img-size 640 --source 2.jpg --no-trace

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.5, device='', exist_ok=False, img_size=640, iou_thres=0.45, name='exp', no_trace=True, nosave=False, project='runs/detect', save_conf=False, save_txt=False, source='2.jpg', update=False, view_img=False, weights=['runs/train/yolov7-custom4/weights/best.pt'])
YOLOR 🚀 2022-11-23 torch 1.12.1+cu113 CUDA:0 (Tesla T4, 15109.75MB)

Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
IDetect.fuse
Model Summary: 314 layers, 36481772 parameters, 6194944 gradients
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2 Synthetic_Fruits, Done. (18.5ms) Inference, (1.4ms) NMS
 The image with the result is saved in: runs/detect/exp/2.jpg
Done. (1.033s)


![02predic](https://drive.google.com/uc?export=view&id=1GAmuPhikE2J1LAE1heSFEftDLsqbgdzG)

6.3 Example 3 of video prediction

In [15]:
!python detect.py --weights runs/train/yolov7-custom4/weights/best.pt --conf 0.5 --img-size 640 --source 04video.mp4 --no-trace

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.5, device='', exist_ok=False, img_size=640, iou_thres=0.45, name='exp', no_trace=True, nosave=False, project='runs/detect', save_conf=False, save_txt=False, source='04video.mp4', update=False, view_img=False, weights=['runs/train/yolov7-custom4/weights/best.pt'])
YOLOR 🚀 2022-11-23 torch 1.12.1+cu113 CUDA:0 (Tesla T4, 15109.75MB)

Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
IDetect.fuse
Model Summary: 314 layers, 36481772 parameters, 6194944 gradients
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
video 1/1 (1/597) /content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7/04video.mp4: 2 Synthetic_Fruits, Done. (16.0ms) Inference, (1.3ms) NMS
video 1/1 (2/597) /content/gdrive/MyDrive/Colab Notebooks/CP_A2_DND_OD/yolov7/04video.mp4: 2 Synthetic_Fruits, Done. (16.0ms) Inference, (0.9ms) NMS
video 1/1 (3/597) /content/gdrive/MyDrive/Colab Not

![04predic](https://drive.google.com/uc?export=view&id=1CzGyZF4o4ntQciSaA-yS9F-_oL1Bocxy)
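The per-frame timings printed by `detect.py` can be summarized to estimate throughput on the Tesla T4. The sketch below parses lines such as `Done. (16.0ms) Inference, (1.3ms) NMS` with a regular expression; the function name is hypothetical, and the resulting frames-per-second figure only covers model inference plus NMS, not video decoding, drawing or writing.

```python
import re
from statistics import mean
from typing import Tuple

TIMING = re.compile(r"\(([\d.]+)ms\) Inference, \(([\d.]+)ms\) NMS")


def average_timings(log_text: str) -> Tuple[float, float, float]:
    """Return (mean inference ms, mean NMS ms, approximate frames per second)."""
    pairs = [(float(a), float(b)) for a, b in TIMING.findall(log_text)]
    if not pairs:
        raise ValueError("no timing lines found")
    inf_ms = mean(p[0] for p in pairs)
    nms_ms = mean(p[1] for p in pairs)
    return inf_ms, nms_ms, 1000.0 / (inf_ms + nms_ms)
```

Applied to the full captured output of the video run, this would give the average cost per frame over all 597 frames.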



## **7. Conclusions**

Explanation of the results (what went wrong, why a certain method was chosen, ...) and how they could be improved:

*   I consider YOLOv7 the best choice among the current YOLO versions, because it infers faster and with greater accuracy than its predecessors.

*   Among the YOLOv7 variants, I first chose YOLOv7-X, but the base YOLOv7 model predicted bounding boxes more accurately (section 3.4).

*   The results meet the object-detection objective: over the course of training, the training and validation losses decrease while the accuracy metrics increase (section 5.5).

*   The chosen hyperparameters produce a successful training run; however, properly applying a technique to prevent underfitting/overfitting would be very important.

*   I am confident that increasing the number of images, splitting the synthetic fruits into separate classes, and applying early stopping would allow the pre-trained YOLOv7 model to produce excellent results and improve this project.
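Early stopping, mentioned above as a possible improvement, can be sketched as a small patience-based helper that monitors a validation metric (for example mAP@0.5) after each epoch. This is a generic sketch independent of `train.py`; the class name and the `patience`/`min_delta` parameters are illustrative assumptions.

```python
class EarlyStopping:
    """Signal a stop when the monitored validation metric stops improving."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum gain that counts as an improvement
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, metric: float) -> bool:
        """Record this epoch's metric; return True when training should stop."""
        if metric > self.best + self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = metric, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, one would call `stopper.step(epoch, val_map)` after each validation pass, break when it returns True, and keep the checkpoint from `stopper.best_epoch` (YOLOv7 already saves the best weights as `best.pt`).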

<a align="left" href="https://github.com/WongKinYiu/yolov7/blob/072f76c72c641c7a1ee482e39f604f6f8ef7ee92/figure/performance.png" target="_blank">YOLOv7 performance figure</a>

Source: the official **YOLOv7** implementation, which is freely available for redistribution under the [GPL-3.0 license](https://choosealicense.com/licenses/gpl-3.0/).