# YOLOv5 on the Udacity Dataset

18 Nov 2021, Alex Denton, AE4824

## Before You Start 
Tutorial: https://blog.roboflow.com/how-to-train-yolov5-on-a-custom-dataset/

Also, notes on scripts and syntax:

- I did <i>not</i> use Jupyter Notebook for this task. I used the PyCharm IDE to write my Python code and executed <i>train.py</i> from the terminal.

- "!" means that Jupyter will run that command in a new shell. This is IPython syntax and is not valid in a plain .py file. The shell runs inside your virtual environment, but each new "!" starts a fresh shell, so state such as the working directory does not carry over. If you need to run multiple commands in one instance, put them on one line with ";" separators.

- "python" vs. "python3" depends on your machine's aliasing. If you want "python" to run "python3" instead of your machine's default Python release, you can look up how to edit your shell profile. <i>Do this at your own risk!</i> I have set mine up this way and tend to write "python". You can safely replace that with "python3" if you're having issues.

### Setup (needed before making the venv and launching Jupyter Notebook)

Clone the YOLOv5 GitHub repo and install requirements.txt in a Python>=3.6.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release. I'm successfully using Python 3.8 and 3.9 on different machines.

NOTE: PyTorch>=1.9 with new torch.distributed.run is recommended (replaces older torch.distributed.launch commands below). See https://pytorch.org/docs/stable/distributed.html for details.

## Check the local CUDA version

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


## Check PyTorch version & GPU Compatibility
- torch >= 1.9
- The _CudaDeviceProperties output should show a device under 'name'; this means the GPU is compatible. If the cell prints 'CPU' instead, PyTorch cannot see a CUDA device.

In [1]:
import torch
print('torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))


torch 1.10.1+cu102 _CudaDeviceProperties(name='Tesla V100-DGXS-32GB', major=7, minor=0, total_memory=32485MB, multi_processor_count=80)
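
A caveat when checking the `torch >= 1.9` requirement by hand: comparing version strings as text gives the wrong answer, because "1.10" sorts before "1.9" alphabetically. The sketch below compares the numeric components instead. The helper names `parse_version` and `meets_minimum` are my own and are not part of PyTorch; in a notebook you would pass `torch.__version__` as the argument.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    core = version.split("+", 1)[0]  # drop a local build tag such as "+cu102"
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum=(1, 9)):
    """True if `version` is at least `minimum`, compared numerically."""
    return parse_version(version) >= minimum


print(meets_minimum("1.10.1+cu102"))  # True: (1, 10, 1) >= (1, 9)
print("1.10.1" >= "1.9")              # False: plain string comparison is misleading
```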


## Configure Multi-GPU DistributedDataParallel Mode
https://github.com/ultralytics/yolov5/issues/475

Before specifying GPUs, <a href="https://hsf-training.github.io/hsf-training-ml-gpu-webpage/02-whichgpu/index.html">determine the parameters</a>:




In [2]:
import torch
use_cuda = torch.cuda.is_available()

if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:',torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:',torch.cuda.get_device_properties(0).total_memory/1e9)

__CUDNN VERSION: 7605
__Number CUDA Devices: 4
__CUDA Device Name: Tesla V100-DGXS-32GB
__CUDA Device Total Memory [GB]: 34.063712256


Note: DGX1 and DGX4 both report "Number CUDA Devices: 5", but the correct number to specify is 4.
<br>There are only 4 GPUs. The script might be counting the CPU as an additional CUDA device (I have not confirmed this), <i>but train.py won't run</i> if you say 5.

To train on multiple GPUs, launch <i>train.py</i> through torch.distributed.run and pass the following along with the usual arguments:

<i>--nproc_per_node</i> specifies how many GPUs you would like to use. For the 4-GPU machine above, it is 4.<br>

<i>--batch</i> is the total batch size. It is divided evenly across the GPUs. For example, --batch 64 on 4 GPUs gives 64/4=16 per GPU.<br>

Launched this way, training uses GPUs 0 ... (N-1).

Notes<br>
- Windows support is untested, Linux is recommended.
- '--batch' must be a multiple of the number of GPUs.
- GPU 0 will take slightly more memory than the other GPUs as it maintains EMA and is responsible for checkpointing etc.

If you get "RuntimeError: Address already in use", it could be because you are running multiple trainings at the same time. To fix this, pick a different port number by adding --master_port with an unused port to the launch command.
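
The multi-GPU launch described above can be sketched as a small helper that assembles the command line and enforces the "batch must be a multiple of the GPU count" rule before anything runs. This is an illustration under assumptions, not the exact command used for these notes: `build_ddp_command` is a hypothetical helper, port 1234 is an arbitrary example, and `--device` is YOLOv5's train.py option for listing GPU ids as I understand it.

```python
def build_ddp_command(num_gpus, total_batch, train_args=(), master_port=None, python="python"):
    """Assemble a torch.distributed.run launch command for train.py as a list of strings."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if total_batch % num_gpus != 0:
        raise ValueError(
            f"--batch {total_batch} is not a multiple of {num_gpus} GPUs"
        )
    cmd = [python, "-m", "torch.distributed.run", "--nproc_per_node", str(num_gpus)]
    if master_port is not None:
        # Only needed when another training run already occupies the default port.
        cmd += ["--master_port", str(master_port)]
    device_ids = ",".join(str(i) for i in range(num_gpus))  # GPUs 0 ... (N-1)
    cmd += ["train.py", "--batch", str(total_batch), "--device", device_ids]
    cmd += list(train_args)
    return cmd


cmd = build_ddp_command(4, 64, ["--data", "./data_car/data.yaml"], master_port=1234)
print(" ".join(cmd))
print("per-GPU batch:", 64 // 4)  # 16
```

The returned list can be pasted into a terminal as a single line or handed to `subprocess.run(cmd)`. Asking for 5 processes with `--batch 64` raises a `ValueError` here, which catches the mismatch before train.py starts.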

## Download the Udacity Self Driving Car Dataset in YOLOv5 PyTorch Format:

https://public.roboflow.com/ds/h0zYn5zFuK?key=tRsZIfO1Cg

In [None]:
!curl -L "https://public.roboflow.com/ds/h0zYn5zFuK?key=tRsZIfO1Cg" > roboflow.zip; unzip roboflow.zip; rm roboflow.zip

NOTE: The folder architecture is very important!!<br>

top level<br>
|<br>
|_ yolov5 (contains all py code and this file)<br>
|&emsp;|_ venv()<br>
|&emsp;|_ data.yaml * otherwise specify location<br>
|<br>
|_ train<br>
|&emsp;   |_ images()<br>
|&emsp;   |_ labels()<br>
|<br>
|_ test<br>
|&emsp;   |_ images()<br>
|&emsp;   |_ labels()<br>
|<br>
|_ valid<br>
 &emsp;   |_ images()<br>
 &emsp;   |_ labels()<br>



Alternatively, make sure that 'data.yaml' points to the correct directories. For instance:

train: ./data_car/train/images <br>
val: ./data_car/valid/images <br>

Also make sure that the '[your name]_yolov5x.yaml' has: <br>

nc:[your number of categories] <br>
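
Because a wrong folder layout only surfaces once training starts, it can help to check it up front. The sketch below lists which of the expected split folders are missing under a dataset root. The function name `missing_dataset_dirs` is hypothetical, and the split names (`train`, `valid`, `test`) and subfolders (`images`, `labels`) are taken from the tree above; adjust them if your export differs.

```python
from pathlib import Path


def missing_dataset_dirs(root, splits=("train", "valid", "test"), parts=("images", "labels")):
    """Return the split/part folders (e.g. 'valid/labels') that do not exist under root."""
    root = Path(root)
    return [
        f"{split}/{part}"
        for split in splits
        for part in parts
        if not (root / split / part).is_dir()
    ]


# Example: check the dataset folder referenced by data.yaml (path is an assumption).
missing = missing_dataset_dirs("./data_car")
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```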


<hr style="border-top: 24px solid #bbb; border-radius: 10px">

#  * * * Execution * * *

## Define model configuration and architecture (needed in runtime):

## Define number of classes based on YAML

In [None]:
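import yaml

# Read the class count ('nc') from the dataset's data.yaml.
# The path matches the --data argument used for the car dataset below.
with open("./data_car/data.yaml", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['nc'])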

## Customize IPython writefile so we can write variables


In [None]:
# customize iPython writefile so we can write variables
from IPython.core.magic import register_line_cell_magic


@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
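
The magic above relies on `str.format`, so every `{name}` in the cell is replaced by the global variable of that name, and any literal brace must be doubled (`{{` and `}}`). A minimal sketch of the same substitution outside IPython follows; `render_template` is a hypothetical stand-in for the magic, and the class count of 3 is a made-up value for illustration.

```python
def render_template(template, variables):
    """Fill {name} placeholders the same way the writetemplate magic does."""
    return template.format(**variables)


template = (
    "# parameters\n"
    "nc: {num_classes}  # number of classes\n"
    "note: {{kept as literal braces}}\n"
)

print(render_template(template, {"num_classes": "3"}))
```

A placeholder with no matching variable raises `KeyError`, so a typo in the template fails loudly instead of writing a broken YAML file.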

## Execute Training on RBC Dataset (worked 15 Nov)

Train Custom YOLOv5 Detector - Next, we'll fire off training!
Here, we are able to pass a number of arguments:<br>

img: define input image size<br>
batch: determine batch size<br>
epochs: define the number of training epochs. (Note: often, 3000+ are common here!)<br>
data: set the path to our yaml file<br>
cfg: specify our model configuration<br>
weights: specify a custom path to weights. (Note: you can download weights from the Ultralytics Google Drive folder)<br>
name: result names<br>
nosave: only save the final checkpoint<br>
cache: cache images for faster training<br>

## RBC code:

In [2]:
!python train.py --img 416 --rect --batch 16 --epochs 100 --data ./data_rbc/data.yaml --cfg ./models/customRBC_yolov5x.yaml --weights yolov5x.pt  --cache


  if len(key) is 1:
  if len(key) is 1:
Traceback (most recent call last):
  File "train.py", line 34, in <module>
    import val  # for end-of-epoch mAP
  File "/home/st1/PycharmProjects/yolov5/val.py", line 26, in <module>
    from models.common import DetectMultiBackend
  File "/home/st1/PycharmProjects/yolov5/models/common.py", line 22, in <module>
    from utils.datasets import exif_transpose, letterbox
  File "/home/st1/PycharmProjects/yolov5/utils/datasets.py", line 28, in <module>
    from utils.augmentations import Albumentations, augment_hsv, copy_paste, letterbox, mixup, random_perspective
  File "/home/st1/PycharmProjects/yolov5/utils/augmentations.py", line 12, in <module>
    from utils.general import LOGGER, check_version, colorstr, resample_segments, segment2box
  File "/home/st1/PycharmProjects/yolov5/utils/general.py", line 33, in <module>
    from utils.metrics import box_iou, fitness
  File "/home/st1/PycharmProjects/yolov5/utils/metrics.py", line 10, in <module>
  

## Car Dataset Code

FAILED --batch 16  memory overflowed "There appear to be 6 leaked semaphore objects" (in PyCharm)

FAILED could not run in Jupyter Notebook - transferred verbatim to PyCharm where it worked

FAILED --batch 8 overflowed memory in the same way as before (PyCharm)

FAILED tried running --batch 8, yolov5l.yaml (instead of yolov5x.yaml)

FAILED tried adding multiple GPUs

In [11]:
!python train.py --img 1920 --rect --batch 8 --epochs 10 --data ./data_car/data.yaml --cfg ./models/customCAR_yolov5l.yaml --weights yolov5l.pt  --cache


  if len(key) is 1:
  if len(key) is 1:
Traceback (most recent call last):
  File "train.py", line 34, in <module>
    import val  # for end-of-epoch mAP
  File "/home/st1/PycharmProjects/yolov5/val.py", line 26, in <module>
    from models.common import DetectMultiBackend
  File "/home/st1/PycharmProjects/yolov5/models/common.py", line 22, in <module>
    from utils.datasets import exif_transpose, letterbox
  File "/home/st1/PycharmProjects/yolov5/utils/datasets.py", line 28, in <module>
    from utils.augmentations import Albumentations, augment_hsv, copy_paste, letterbox, mixup, random_perspective
  File "/home/st1/PycharmProjects/yolov5/utils/augmentations.py", line 12, in <module>
    from utils.general import LOGGER, check_version, colorstr, resample_segments, segment2box
  File "/home/st1/PycharmProjects/yolov5/utils/general.py", line 33, in <module>
    from utils.metrics import box_iou, fitness
  File "/home/st1/PycharmProjects/yolov5/utils/metrics.py", line 10, in <module>
  