# Train and Deploy a Custom YOLOv5 Model Locally on Amazon SageMaker Studio

In this notebook, we train a custom YOLOv5 object detection model inside Amazon SageMaker Studio and then run inference with it.

**Steps:**

0. Initial configuration.
1. Create a labeling job in Amazon SageMaker Ground Truth.
2. Download images and labels from the labeling job.
3. Train the custom YOLOv5 model.
4. Make inferences with the created model. 


## 0. Initial Configuration

In [None]:
!git clone --quiet https://github.com/ultralytics/yolov5
!pip install -r yolov5/requirements.txt

In [3]:
import json
import numpy
import torch 
import os
import boto3
from sklearn.model_selection import train_test_split
s3 = boto3.resource('s3')



In [4]:
dirs = ["training_data/images/train", "training_data/labels/train",
            "training_data/images/validation", "training_data/labels/validation"]

In [5]:
for directory in dirs:
    !mkdir -p {directory}

## 1. Create a labeling job in Amazon SageMaker Ground Truth.

In [6]:
##TODO

## 2. Download images and labels from the labeling job.

#### First, download the annotation manifest generated by Amazon SageMaker Ground Truth

In [6]:
gt_job_name = "OB-Test-1"
gt_output_manifest_bucket = "buzecd-aiml-demos" # name of the bucket
gt_output_manifest_file = "obb-uc2/training-images/OB-Test-1/manifests/output/output.manifest" # include prefix in the path
labels = ["stop", "pedestrian"]
s3.meta.client.download_file(gt_output_manifest_bucket, gt_output_manifest_file, 'gt_manifest.txt') # download the manifest to your local environment
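Each line of the downloaded manifest is a standalone JSON object. The sketch below shows the fields the later cells rely on (`source-ref`, and a label attribute keyed by the labeling job name that holds `image_size` and `annotations`). The bucket, key, and box values are invented for illustration, and `split_s3_uri` is a hypothetical helper, not part of boto3.

```python
import json

# Hypothetical manifest line: bucket, key, and box values are invented.
# The label attribute key is assumed to match the labeling job name.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/frame_0001.jpg",
    "OB-Test-1": {
        "image_size": [{"width": 1280, "height": 720, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 100, "top": 50, "width": 200, "height": 100}
        ],
    },
})


def split_s3_uri(uri):
    """Split 's3://bucket/some/key.jpg' into ('bucket', 'some/key.jpg')."""
    if not uri.startswith("s3://"):
        raise ValueError("Not an S3 URI: {}".format(uri))
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


record = json.loads(sample_line)
bucket, key = split_s3_uri(record["source-ref"])
boxes = record["OB-Test-1"]["annotations"]
print(bucket, key, len(boxes))
```

If your labeling job used a custom label attribute name, the key holding the annotations will differ from the job name, so it is worth printing one real manifest line before running the conversion.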

#### Next, split the images into two sets: training and validation

In [7]:
with open('gt_manifest.txt') as file:
    lines = file.readlines()
    data = numpy.array(lines)
    train_data, validation_data = train_test_split(data, test_size=0.2)
print("The manifest contains {} annotations.".format(len(data)))
print("{} will be used for training.".format(len(train_data)))
print("{} will be used for validation.".format(len(validation_data)))

The manifest contains 211 annotations.
168 will be used for training.
43 will be used for validation.
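The split above is not seeded, so rerunning the cell assigns different images to each set. Passing `random_state` to `train_test_split` makes it repeatable. As a dependency-free illustration of the same idea, here is a minimal sketch that uses Python's `random` module instead of scikit-learn; `split_manifest` and its `seed` parameter are hypothetical names, and the shuffle order will not match scikit-learn's.

```python
import math
import random


def split_manifest(lines, test_fraction=0.2, seed=42):
    """Shuffle a copy of `lines` with a fixed seed and split it in two."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    # Round the validation share up, so 20% of 211 lines becomes 43.
    n_validation = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in for the 211 manifest lines read from gt_manifest.txt.
example_lines = ["line-{}".format(i) for i in range(211)]
train_lines, validation_lines = split_manifest(example_lines)
print(len(train_lines), len(validation_lines))
```

With 211 lines this produces 168 training and 43 validation entries, the same sizes reported above, and calling it again with the same seed returns the same split.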


#### Now that we have our two datasets, let's download the images and create the annotation files in a YOLO-friendly format

In [8]:
def ground_truth_to_yolo(gt_manifest_data, dataset_category):
    print("Downloading images and labels for the {} dataset".format(dataset_category))
    for line in gt_manifest_data:
        line = json.loads(line)
        uri = line["source-ref"]
        file_bucket = uri.split("/")[2]
        file_image_name = uri.split("/")[-1]
        file_txt_name = '.'.join(file_image_name.split(".")[:-1]) + ".txt"
        file_txt_path = "training_data/labels/{}/{}".format(dataset_category, file_txt_name)
        file_path = '/'.join(uri.split("/")[3:])
        # Download image
        s3.meta.client.download_file(file_bucket, file_path, "training_data/images/{}/{}".format(dataset_category,file_image_name))
        # Create txt with annotations
        with open(file_txt_path, 'w') as target:
            for annotation in line[gt_job_name]["annotations"]:
                class_id = annotation["class_id"]
                center_x = (annotation["left"] + (annotation["width"]/2)) / line[gt_job_name]["image_size"][0]["width"]
                center_y = (annotation["top"] + (annotation["height"]/2)) / line[gt_job_name]["image_size"][0]["height"]
                w = annotation["width"] / line[gt_job_name]["image_size"][0]["width"]
                h = annotation["height"] / line[gt_job_name]["image_size"][0]["height"]
                data = "{} {} {} {} {}\n".format(class_id, center_x, center_y, w, h)
                target.write(data)
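The arithmetic inside the function converts a pixel box given by its top-left corner and size into the normalized center-based values YOLO expects. The sketch below isolates that step so it can be checked by hand; `gt_box_to_yolo` is a hypothetical helper name and the box values are invented.

```python
def gt_box_to_yolo(annotation, image_width, image_height):
    """Convert a pixel box (left, top, width, height) to normalized YOLO values."""
    center_x = (annotation["left"] + annotation["width"] / 2) / image_width
    center_y = (annotation["top"] + annotation["height"] / 2) / image_height
    width = annotation["width"] / image_width
    height = annotation["height"] / image_height
    return annotation["class_id"], center_x, center_y, width, height


# Invented example: a 200x100 box whose top-left corner is at (100, 50)
# inside a 1000x500 image.
box = {"class_id": 1, "left": 100, "top": 50, "width": 200, "height": 100}
yolo_row = gt_box_to_yolo(box, image_width=1000, image_height=500)
print("{} {} {} {} {}".format(*yolo_row))
```

Here the box center is at pixel (200, 100), so both normalized center coordinates are 0.2, and the normalized width and height are also 0.2.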

In [9]:
ground_truth_to_yolo(train_data, "train")
ground_truth_to_yolo(validation_data, "validation")

Downloading images and labels for the train dataset
Downloading images and labels for the validation dataset


#### Let's make sure the image and label directories contain the same number of files

In [10]:
def count_files(dirs):
    for directory in dirs:
        # Count regular files only, ignoring any subdirectories
        number = sum(1 for entry in os.scandir(directory) if entry.is_file())
        print("There are {} elements in {}".format(number, directory))


count_files(dirs)

There are 168 elements in training_data/images/train
There are 168 elements in training_data/labels/train
There are 43 elements in training_data/images/validation
There are 43 elements in training_data/labels/validation
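Matching counts do not guarantee that every image has a label file with the same base name. A stricter check, sketched below, compares file stems between an image directory and a label directory. `find_unpaired` is a hypothetical helper, and the demo runs against a temporary directory with invented file names rather than the real `training_data` folders.

```python
import os
import tempfile


def find_unpaired(images_dir, labels_dir):
    """Return image stems with no label file, and label stems with no image."""
    image_stems = {
        os.path.splitext(entry.name)[0]
        for entry in os.scandir(images_dir)
        if entry.is_file()
    }
    label_stems = {
        os.path.splitext(entry.name)[0]
        for entry in os.scandir(labels_dir)
        if entry.is_file() and entry.name.endswith(".txt")
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Demo on throwaway files: "b.jpg" has no label and "c.txt" has no image.
with tempfile.TemporaryDirectory() as root:
    images_dir = os.path.join(root, "images")
    labels_dir = os.path.join(root, "labels")
    os.makedirs(images_dir)
    os.makedirs(labels_dir)
    for name in ["a.jpg", "b.jpg"]:
        open(os.path.join(images_dir, name), "w").close()
    for name in ["a.txt", "c.txt"]:
        open(os.path.join(labels_dir, name), "w").close()
    missing_labels, missing_images = find_unpaired(images_dir, labels_dir)

print(missing_labels, missing_images)
```

In the notebook, the same function could be called with `training_data/images/train` and `training_data/labels/train`; two empty lists would mean every file is paired.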


#### Now let's write a dataset configuration file in the yolov5/data folder so the training script can find our images and class names

In [11]:
with open("yolov5/data/custom-model.yaml", 'w') as target:
    target.write("path: ../training_data\n")
    target.write("train: images/train\n")
    target.write("val: images/validation\n")
    target.write("names:\n")
    for i, label in enumerate(labels):
        target.write("  {}: {}\n".format(i, label))
        
# Print the file once; each line already ends with a newline
with open('yolov5/data/custom-model.yaml') as file:
    print(file.read())

path: ../training_data
train: images/train
val: images/validation
names:
  0: stop
  1: pedestrian



## 3. Train the custom YOLOv5 model.

In [12]:
!python yolov5/train.py --workers 4 --device 0 --img 640 --batch 16 --epochs 50 --data yolov5/data/custom-model.yaml --weights yolov5s.pt --cache

  warn(f"Failed to load image Python extension: {e}")
train: weights=yolov5s.pt, cfg=, data=yolov5/data/custom-model.yaml, hyp=yolov5/data/hyps/hyp.scratch-low.yaml, epochs=50, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=4, project=yolov5/runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
fatal: cannot change to '/root/Object': No such file or directory
YOLOv5 🚀 2022-9-21 Python-3.8.10 torch-1.10.2+cu113 CUDA:0 (Tesla T4, 15110MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, war

## 4. Make inferences with the created model.

In [13]:
!python yolov5/detect.py --weights yolov5/runs/train/exp/weights/best.pt --img 640 --conf 0.5 --source street3.mp4

  warn(f"Failed to load image Python extension: {e}")
detect: weights=['yolov5/runs/train/exp/weights/best.pt'], source=street3.mp4, data=yolov5/data/coco128.yaml, imgsz=[640, 640], conf_thres=0.5, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=yolov5/runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
fatal: cannot change to '/root/Object': No such file or directory
YOLOv5 🚀 2022-9-21 Python-3.8.10 torch-1.10.2+cu113 CUDA:0 (Tesla T4, 15110MiB)

Fusing layers... 
[2022-09-21 12:30:48.268 pytorch-1-10-gpu-py-ml-g4dn-xlarge-5086b554a12da40ba14f4b244605:2425 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2022-09-21 12:30:48.403 pytorch-1-10-gpu-py-ml-g4dn-xlarge-5086b554a12da40ba14f4b244605:2425 INFO profiler_config_parser.py:111] Una
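The `detect.py` settings logged above include `save_txt=False`. If detections are saved as text instead (the `--save-txt` flag), each row is assumed here to use the same normalized `class x_center y_center width height` layout as the training labels. Under that assumption, the sketch below maps one row back to pixel coordinates; `yolo_row_to_pixel_box` is a hypothetical helper, and the row and frame size are invented.

```python
def yolo_row_to_pixel_box(row, image_width, image_height):
    """Convert 'class cx cy w h' (normalized) to (class_id, left, top, width, height) in pixels."""
    fields = row.split()
    class_id = int(fields[0])
    # Only the first four values after the class are used, so a trailing
    # confidence column, if present, is ignored.
    center_x, center_y, width, height = (float(value) for value in fields[1:5])
    box_width = width * image_width
    box_height = height * image_height
    left = center_x * image_width - box_width / 2
    top = center_y * image_height - box_height / 2
    return class_id, left, top, box_width, box_height


labels = ["stop", "pedestrian"]
# Invented detection row for a 1280x720 frame.
class_id, left, top, box_width, box_height = yolo_row_to_pixel_box(
    "1 0.5 0.5 0.25 0.5", 1280, 720
)
print(labels[class_id], left, top, box_width, box_height)
```

This is the inverse of the Ground Truth to YOLO conversion used when preparing the training labels, which makes it a convenient round-trip check for the label files as well.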