# How to Train YOLOv5 to Count Fruit on Trees with Synthetic Data



This notebook is based on Jacob Solawetz's tutorial on training YOLOv5 on custom objects.

His tutorial uses the [YOLOv5 repository](https://github.com/ultralytics/yolov5) by [Ultralytics](https://www.ultralytics.com/).



I hope it helps others who need to detect specific features but do not have a proper dataset for them.

The notebook is geared towards training on a dataset of synthetic data in order to recognize real objects.
You can find the accompanying blog post [here](https://medium.com/p/dab5728f6411).

In this notebook I will skim over the issues discussed in Jacob Solawetz's blog post and simply add the needed cells as they are. If you want further information on how to use them, I suggest [reading through his blog post](https://blog.roboflow.ai/how-to-train-yolov5-on-a-custom-dataset/).





# Preparation


In [None]:
# clone YOLOv5 repository
!git clone https://github.com/ultralytics/yolov5

In [None]:
%cd yolov5
# install dependencies
!pip install -qr requirements.txt  # as suggested in the tutorial you can ignore errors
import torch

from IPython.display import Image, clear_output  # to display images
from utils.google_utils import gdrive_download  # to download models/datasets

# clear_output()
print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

Make sure the output shows a CUDA device. In my case I got 'Tesla P100-PCIE-16GB', but you may get another GPU such as 'Tesla K80'.

If it shows 'CPU' instead, go to Runtime -> Change runtime type and change the hardware accelerator to GPU.



# Downloading the dataset
If you are using Roboflow, set the url and run the next 2 cells.

If you decide to get the dataset in a different way, this is where you should set it up.

Either way, make sure you end up with 2 directories:

- train: ../train/images
- val: ../valid/images

as well as a data.yaml file.
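
A quick way to confirm the layout before training is a small check like the one below. It is only a sketch: `missing_dataset_paths` is a hypothetical helper, and it assumes the dataset was unpacked into a single root folder (`/content` in this notebook) containing the `train/images`, `valid/images` and `data.yaml` entries listed above.

```python
from pathlib import Path

# Entries this notebook expects after the dataset is unpacked.
EXPECTED_ENTRIES = ("train/images", "valid/images", "data.yaml")


def missing_dataset_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # "/content" is the Colab working directory used in this notebook (assumption).
    missing = missing_dataset_paths("/content")
    print("Dataset layout looks fine" if not missing else f"Missing: {missing}")
```

An empty list means all three entries were found; anything else names what still needs to be set up.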

In [5]:
#@title Set your Roboflow url here
url = 'https://app.roboflow.com/ds/nisrp51sC0?key=1FIctqMRYN' #@param {type:"string"}

In [None]:
# Export code snippet and paste here
%cd /content
print(url)
!curl -L $url > roboflow.zip; unzip roboflow.zip; rm roboflow.zip

If you want to use my dataset instead, run the following cell.



In [None]:
# work from /content so the clone and the unzip below use the same location
%cd /content
!git clone https://github.com/Amizorach/FruitSyntheticDataset
!unzip FruitSyntheticDataset/dataset/ds.zip

# Define Model Configuration and Architecture

This is the default setup suggested in Jacob Solawetz's tutorial.
I did not change it, but as he points out, you can.
I intend to change it and test different configurations later on.

In [12]:
import yaml
%cd /content
with open("data.yaml", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['nc'])

/content


In [13]:
# customize IPython's writefile magic so we can write variables into the file
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
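
The magic above relies on plain `str.format`: every `{name}` placeholder in the cell is replaced by the global variable of that name. The sketch below shows the same substitution outside IPython. `fill_template` is a hypothetical helper, and the values are passed explicitly instead of being taken from `globals()`.

```python
def fill_template(template, **values):
    """Replace each {name} placeholder with the matching keyword value."""
    return template.format(**values)


template = "nc: {num_classes}  # number of classes\n"
filled = fill_template(template, num_classes="2")
print(filled, end="")

# Literal braces must be doubled, otherwise str.format treats them as placeholders.
escaped = fill_template("dict: {{a: {value}}}", value=1)
print(escaped)
```

The model template in this notebook uses only square brackets apart from `{num_classes}`, so it formats without any escaping.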

In [14]:
%%writetemplate /content/yolov5/models/custom_yolov5s.yaml

# parameters
nc: {num_classes}  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

Before training, let's check the options.




# Training
You can easily check which parameters the Ultralytics training script accepts by running the following cell.




In [6]:
%cd /content/yolov5/
!python train.py -h

/content/yolov5
usage: train.py [-h] [--weights WEIGHTS] [--cfg CFG] [--data DATA] [--hyp HYP]
                [--epochs EPOCHS] [--batch-size BATCH_SIZE]
                [--img-size IMG_SIZE [IMG_SIZE ...]] [--rect]
                [--resume [RESUME]] [--nosave] [--notest] [--noautoanchor]
                [--evolve] [--bucket BUCKET] [--cache-images]
                [--image-weights] [--device DEVICE] [--multi-scale]
                [--single-cls] [--adam] [--sync-bn] [--local_rank LOCAL_RANK]
                [--workers WORKERS] [--project PROJECT] [--entity ENTITY]
                [--name NAME] [--exist-ok] [--quad] [--linear-lr]
                [--label-smoothing LABEL_SMOOTHING] [--upload_dataset]
                [--bbox_interval BBOX_INTERVAL] [--save_period SAVE_PERIOD]
                [--artifact_alias ARTIFACT_ALIAS]

optional arguments:
  -h, --help            show this help message and exit
  --weights WEIGHTS     initial weights path
  --cfg CFG             model.yaml path
 

Let's collect the information we need from what we have done so far.

- **data:** the data.yaml file, downloaded in our case from Roboflow. It should be in the main directory (../data.yaml).

- **weights:** leaving this empty ('') trains from randomly initialized weights using the architecture given in cfg. To start from pretrained weights, pass a checkpoint such as yolov5s.pt instead.

- **img:** the image size, which depends on how you created the dataset. In my case I created images of size (250, 250) but resized them with Roboflow to (416, 416), so 416 is what I supply.

- **batch:** the batch size. Higher batch sizes can lead to lower asymptotic test accuracy; you can read more about batch sizes in [Kevin Shen's article](https://medium.com/mini-distill/effect-of-batch-size-on-training-dynamics-21c14f7a716e#:~:text=Training%20loss%20and%20accuracy%20when,trained%20using%20different%20batch%20sizes.&text=Finding%3A%20higher%20batch%20sizes%20leads,number%20of%20epochs%20of%20training.).

- **epochs:** you should not need many. I ran it for 100 epochs, but it seemed to have converged after around 40.

- **cfg:** the model configuration. I chose yolov5s because of its speed; you can find more information [here](https://github.com/ultralytics/yolov5#pretrained-checkpoints). In any case, all the configurations can be found in yolov5/models.

- **name:** the name of the directory your results are saved under.
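
To keep these choices in one place, the command line can be assembled from a dictionary. This is a minimal sketch and not part of the original notebook: `build_train_command` is a hypothetical helper, the values mirror the ones discussed in the list, and the flag spellings are the short forms used in this notebook's training command.

```python
def build_train_command(options, flags=()):
    """Turn a dict of train.py options into an argument list for subprocess."""
    command = ["python", "train.py"]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    # Boolean switches take no value.
    command += [f"--{flag}" for flag in flags]
    return command


train_options = {
    "img": 416,            # image size the dataset was exported at
    "batch": 64,
    "epochs": 100,
    "data": "../data.yaml",
    "cfg": "./models/custom_yolov5s.yaml",
    "weights": "",         # empty string: no pretrained checkpoint
    "name": "yolov5s_results",
}
command = build_train_command(train_options, flags=["cache-images"])
# Show the empty weights argument as '' so it stays visible when printed.
print(" ".join(repr(part) if part == "" else part for part in command))
```

The resulting list can be handed to `subprocess.run(command, cwd="/content/yolov5")` in place of the `!python` line.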


My final command follows the original notebook and looks like the following cell:


In [None]:
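%%time
# %%time must be the first line of the cell; it times the whole training run
%cd /content/yolov5/
# --name sets the output directory that the evaluation and inference steps below expect
!python train.py --img 416 --batch 64 --epochs 100 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --cache-images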

# Evaluate Custom YOLOv5 Detector Performance

After training, you can find the results in results.txt inside the run directory under runs/train/ (the directory name is whatever you passed as name).

You can also view the results using TensorBoard.

In [None]:

%load_ext tensorboard
%tensorboard --logdir runs

As a result you should find 2 sets of weights in runs/train/yolov5s_results/weights/: best.pt and last.pt, which hold the best-scoring and the most recent checkpoint respectively.

We can run inference using the following cell. Simply set the following:

- **weights:** choose runs/train/yolov5s_results/weights/best.pt

- **img:** the image size of the training dataset (not necessarily the same as the size of the images you are testing on)

- **conf:** the confidence threshold for detection. The higher it is, the more certain the model needs to be before a detection is reported.

- **source:** a path to an image (if you supply a directory, it will collect all the images in that directory). For starters you can use the test directory.
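
To illustrate what the confidence threshold does, the sketch below filters a handful of made-up detections the way a setting such as `--conf 0.4` would. The detections and the `filter_by_confidence` helper are hypothetical; `detect.py` applies its threshold internally, so this only illustrates the idea and how the number of kept boxes turns into a fruit count.

```python
def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence is above the threshold."""
    return [d for d in detections if d["confidence"] > threshold]


# Made-up detections for a single image (illustrative values only).
detections = [
    {"label": "fruit", "confidence": 0.91},
    {"label": "fruit", "confidence": 0.55},
    {"label": "fruit", "confidence": 0.32},
    {"label": "fruit", "confidence": 0.12},
]

for threshold in (0.25, 0.4, 0.6):
    kept = filter_by_confidence(detections, threshold)
    print(f"conf={threshold}: {len(kept)} fruit counted")
```

Raising the threshold trades missed fruit for fewer false detections, so it is worth trying a few values on images where you know the true count.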




In [None]:
# use the best weights!
%cd /content/yolov5/
!python detect.py --weights runs/train/yolov5s_results/weights/best.pt --img 416 --conf 0.4 --source ../test

After running, the script prints where it saved the images. The message should look similar to
"Results saved to runs/detect/exp5"

You can simply browse to the files saved in this directory and double-click an image to see the results.
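
Because each run of `detect.py` writes to a new numbered directory (the `exp5` in the message above is one example), it can be handy to pick the most recent one programmatically instead of hard-coding the name. This is a sketch under assumptions: `latest_run_dir` is a hypothetical helper, and it assumes the newest run is the directory with the latest modification time.

```python
from pathlib import Path


def latest_run_dir(base="runs/detect", prefix="exp"):
    """Return the most recently modified `prefix*` directory under `base`, or None."""
    candidates = [p for p in Path(base).glob(f"{prefix}*") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Path used by this notebook on Colab (assumption).
    run_dir = latest_run_dir("/content/yolov5/runs/detect")
    if run_dir is None:
        print("No detection runs found yet")
    else:
        jpg_count = len(list(run_dir.glob("*.jpg")))
        print(f"Latest results: {run_dir} ({jpg_count} JPG images)")
```

The returned path can replace the fixed `exp` directory when displaying the images.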


In [None]:
#display inference on ALL test images
#this looks much better with longer training above

import glob
from IPython.display import Image, display

for imageName in glob.glob('/content/yolov5/runs/detect/exp/*.jpg'): #assuming JPG
    display(Image(filename=imageName))
    print("\n")

## Before I finish

I can't stress enough that this is a simplified version of [Roboflow's notebook](https://colab.research.google.com/drive/1gDZ2xcTOgR39tGGs-EZ6i3RTs16wmzZQ#scrollTo=dOPn9wjOAwwK).

It is not as complete and is intended mainly as a companion to my blog post on creating synthetic data.
I would like to thank them for the original notebook and encourage you to check it out.