# Transfer Learning and Hyperparameter Optimization for YOLOv5 using Amazon SageMaker

---

In Computer Vision (CV), object detection is a technique where a model predicts the presence of objects and locates them in an image using bounding boxes. YOLO (You Only Look Once) belongs to the family of models used for object detection.

There are two approaches that are commonly adopted for this task: a two-step approach and a single shot approach. [YOLO](https://arxiv.org/abs/1506.02640) is a single shot approach to object detection.

This notebook shows how to use Amazon SageMaker to apply transfer learning to a YOLOv5 model on custom data. You use AWS [Deep Learning Containers (DLC)](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) as the base for your own training container, and you train a YOLOv5 model to detect household objects (the custom data), mostly from bathroom, kitchen and living-room environments. If you run this notebook on an Amazon SageMaker notebook instance, use the **Conda Python 3** kernel; in a SageMaker Studio environment, use the **Python 3 Data Science** kernel.

You also use Amazon SageMaker's Automated Model Tuning feature to perform hyperparameter optimization to arrive at the best possible model.

In this notebook you use the Caltech Home Objects dataset [[1]Moreels and Perona, “Caltech Home Objects 2006”. CaltechDATA, Apr. 07, 2022. doi: 10.22002/D1.20089.](https://data.caltech.edu/records/bckkv-8my10). 

Broadly, you cover the following topics in this notebook:

1. [Pre-requisites if you are in a SageMaker Studio environment](#Pre-requisites-if-you-are-in-a-SageMaker-Studio-environment)

2. [Data Preparation](#Data-Preparation)
    
    a. [Converting from MATLAB to .txt](#Converting-from-MATLAB-to-.txt)
    
    b. [Converting from Amazon SageMaker Ground Truth bounding box to .txt [OPTIONAL]](#Converting-from-Amazon-SageMaker-Ground-Truth-bounding-box-to-.txt-[OPTIONAL])
    
    c. [Training and validation sets](#Training-and-validation-sets)
    
    
3. [Set up the PyTorch environment for training](#Set-up-the-PyTorch-environment-for-training)

    a. [Customizing a DLC for training](#Customizing-a-DLC-for-training)
    
    b. [Get and then upload weights to S3 location](#Get,-and-then-upload-weights-to-a-S3-location)
    

4. [Training](#Training)


5. [Hyperparameter Optimization with Amazon SageMaker Automated Model Tuning](#Hyperparameter-Optimization-with-Amazon-SageMaker-Automated-Model-Tuning)


6. [Conclusion : Deploying YOLOv5 on Amazon SageMaker](#Conclusion-:-Deploying-YOLOv5-on-Amazon-SageMaker)



You can download the dataset used in this notebook in the [Data Preparation](#Data-Preparation) section below. Alternatively, you can download it from [this page](https://data.caltech.edu/records/bckkv-8my10). If you download the dataset yourself, upload the file ```Home_Objects_06.zip``` to your working directory on the notebook instance.

## Pre-requisites if you are in a SageMaker Studio environment

**Check your SageMaker execution role** and ensure that it has the following trust policy for AWS CodeBuild in place:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "codebuild.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
Use this guidance on [editing trust relationship for an existing role](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/edit_trust.html) to edit the trust policy for your SageMaker execution role.
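If you prefer to script this change instead of using the console, the sketch below shows one way to do it with boto3. It reads the role's current trust policy, adds the CodeBuild statement only if it is missing, and writes the result back with the IAM `UpdateAssumeRolePolicy` API. The helper names and the `role_name` argument are illustrative assumptions, not part of this notebook. Because the API call replaces the whole trust policy, the sketch extends the existing document rather than overwriting it.

```python
import copy
import json

CODEBUILD_SERVICE = "codebuild.amazonaws.com"


def _trusted_services(statement):
    """Return the service principals of one statement as a list."""
    principal = statement.get("Principal")
    if not isinstance(principal, dict):
        return []
    service = principal.get("Service", [])
    return [service] if isinstance(service, str) else list(service)


def with_codebuild_trust(trust_policy):
    """Return a copy of a trust policy document that also trusts AWS CodeBuild."""
    updated = copy.deepcopy(trust_policy)
    statements = updated.setdefault("Statement", [])
    if not any(CODEBUILD_SERVICE in _trusted_services(s) for s in statements):
        statements.append(
            {
                "Effect": "Allow",
                "Principal": {"Service": [CODEBUILD_SERVICE]},
                "Action": "sts:AssumeRole",
            }
        )
    return updated


def apply_codebuild_trust(role_name):
    """Hypothetical helper: extend the trust policy of an existing IAM role."""
    import boto3  # imported here so the pure helpers above run without AWS access

    iam = boto3.client("iam")
    current = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    iam.update_assume_role_policy(
        RoleName=role_name, PolicyDocument=json.dumps(with_codebuild_trust(current))
    )
```

Calling `apply_codebuild_trust` requires IAM permissions that a SageMaker execution role often does not have, so you may need to run it with administrator credentials.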

Next, add the inline policy below to your SageMaker execution role. The AWS IAM documentation explains how to [add a permission policy inline](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "codebuild:DeleteProject",
                "codebuild:CreateProject",
                "codebuild:BatchGetBuilds",
                "codebuild:StartBuild"
            ],
            "Resource": "arn:aws:codebuild:*:*:project/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogStream",
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:GetLogEvents",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*:log-stream:*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CreateRepository",
                "ecr:BatchGetImage",
                "ecr:CompleteLayerUpload",
                "ecr:DescribeImages",
                "ecr:DescribeRepositories",
                "ecr:UploadLayerPart",
                "ecr:ListImages",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage"
            ],
            "Resource": "arn:aws:ecr:*:*:repository/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
              "s3:GetObject",
              "s3:DeleteObject",
              "s3:PutObject"
              ],
            "Resource": "arn:aws:s3:::sagemaker-*/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket"
            ],
            "Resource": "arn:aws:s3:::sagemaker*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:ListRoles"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringLikeIfExists": {
                    "iam:PassedToService": "codebuild.amazonaws.com"
                }
            }
        }
    ]
}
```

You are now ready to proceed with executing the rest of this notebook.

## Data Preparation

In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/Sample-pytorch-YOLOv5-HO-C3-Detection"

role = sagemaker.get_execution_role()
print("Bucket Name: {} and the role is {}".format(bucket, role))

In [None]:
import scipy.io
import os
import sys
import json
import boto3
import random
import matplotlib.pyplot as plt
import numpy as np
from botocore.exceptions import ClientError
from sagemaker import image_uris
from zipfile import ZipFile
from sklearn.model_selection import train_test_split
from PIL import Image, ImageDraw

In [None]:
SAGEMAKERWDIR = os.getcwd() + "/"
DATASETLOCALBASE = SAGEMAKERWDIR + "data"
ANNOTATIONSPATH = DATASETLOCALBASE + "/" + "Home_Objects_06/Train/Gtruth"
YOLO5ANNOTATIONS = ANNOTATIONSPATH + "/YOLOv5"
IMAGESPATH = DATASETLOCALBASE + "/" + "Home_Objects_06/Train"

TESTANNOTATIONSPATH = DATASETLOCALBASE + "/" + "Home_Objects_06/Test/Gtruth"
TESTYOLO5ANNOTATIONS = TESTANNOTATIONSPATH + "/YOLOv5"
TESTIMAGESPATH = DATASETLOCALBASE + "/" + "Home_Objects_06/Test"

In [None]:
## Some utility functions to get the data ready for training.


def unzipdataset(wrkdir, archname):
    with ZipFile(archname, "r") as arch:
        if not os.path.exists(DATASETLOCALBASE):
            os.makedirs(DATASETLOCALBASE)
        arch.extractall(path=DATASETLOCALBASE)
        print("Done extracting zip archive!")


def createClassMap(annots=ANNOTATIONSPATH):
    # Use the `annots` argument and sort the listing so that
    # class indices are stable across runs and machines.
    fllist = sorted(
        file for file in os.listdir(annots) if os.path.isfile(os.path.join(annots, file))
    )
    return [{"name": fl, "clas": i} for i, fl in enumerate(fllist)]


def readannotfile(givenfile, silent=False):
    """
    This function is good only for the training dataset,
    because the annotation files of the test dataset are
    structured slightly differently.
    givenfile: full path to <filename>.JPG.mat
    """
    if os.path.isdir(givenfile):
        return False, 0.0, 0.0, 0.0, 0.0
    annotations = scipy.io.loadmat(givenfile)
    filesplit = givenfile.split("/")[-1]
    ante, post = filesplit.split(".")[0], filesplit.split(".")[1]
    doesimageexist = os.path.exists(IMAGESPATH + "/" + ante + "." + post)
    coords = []
    for coord in annotations["outline"][0][0][0]:
        coords.append(coord)
    if not silent:
        print(coords)
        print("Checking if we have the corresponding image file")
        if doesimageexist:
            print("Corresponding image file exists.")
        else:
            print("Please check, corresponding image file does not exist.")
    topleft, topright, bottomright, bottomleft = coords[0], coords[1], coords[2], coords[3]
    return doesimageexist, topleft, topright, bottomright, bottomleft


def bounding_box_old_annotations(imagefile, annotationfile):
    """
    Again, good for the training dataset.
    imagefile: full or relative path to image file (.JPG)
    annotationfile: full or relative path to the annotation file i.e. the *.mat file
    """
    ## Get size of the image file
    img = Image.open(imagefile)
    ## Get annotations
    die, tl, _, br, _ = readannotfile(annotationfile, silent=True)
    width, height = img.size
    plotted_image = ImageDraw.Draw(img)
    plotted_image.rectangle(((tl[0], tl[1]), (br[0], br[1])), outline="red", width=2)
    plt.imshow(np.array(img))
    plt.show()


def plot_bounding_yolov5_annotations(imagefile, annotationfile):
    """
    imagefile: full path to the image file *.JPG
    annotationfile: full path to the YOLOv5 annotation file *.txt
    """
    assert os.path.exists(imagefile)
    image = Image.open(imagefile)

    assert os.path.exists(annotationfile)
    bx = ""
    with open(annotationfile, "r") as f:
        for line in f:
            bx = line  # Keep the last line: we want to test a single box in any image

    annotation_list = bx.strip().split(" ")

    annotations = np.array(annotation_list, dtype=np.float64)

    w, h = image.size

    plotted_image = ImageDraw.Draw(image)

    chged_annots = np.array(annotations)
    chged_annots[1] = annotations[1] * w
    chged_annots[2] = annotations[2] * h
    chged_annots[3] = annotations[3] * w
    chged_annots[4] = annotations[4] * h
    chged_annots[1] = chged_annots[1] - chged_annots[3] / 2  # xmin
    chged_annots[2] = chged_annots[2] - chged_annots[4] / 2  # ymin
    chged_annots[3] = chged_annots[1] + chged_annots[3]  # xmax
    chged_annots[4] = chged_annots[2] + chged_annots[4]  # ymax

    xmin, ymin, xmax, ymax = chged_annots[1], chged_annots[2], chged_annots[3], chged_annots[4]

    plotted_image.rectangle(((xmin, ymin), (xmax, ymax)), outline="red", width=2)

    plt.imshow(np.array(image))
    plt.show()


def convertMATLABToYolov5(annotationfile, imagefile):
    """
    annotationfile: full or relative path to the annotation file i.e. the *.mat file
    imagefile: full or relative path to the image file
    """
    width, height = Image.open(imagefile).size
    ## Get the class index
    cl = next((item for item in CLASSENUM if item["name"] == annotationfile.split("/")[-1]), None)
    if cl is not None:
        clas = cl["clas"]
        yolofilename = cl["name"].split(".")[0] + ".txt"
        ## Read the annotation once; only the top-left and bottom-right corners are needed
        _, tl, _, br, _ = readannotfile(annotationfile, silent=True)
        xmin, ymin = tl[0], tl[1]
        xmax, ymax = br[0], br[1]
        xcenter = (xmin + (xmax - xmin) / 2.0) / width
        ycenter = (ymin + (ymax - ymin) / 2.0) / height
        box_width = (xmax - xmin) / width
        box_height = (ymax - ymin) / height

        ## writing to the new annotation file
        os.makedirs(YOLO5ANNOTATIONS, exist_ok=True)
        print("Writing to {}".format(YOLO5ANNOTATIONS + "/" + yolofilename))
        with open(YOLO5ANNOTATIONS + "/" + yolofilename, "w") as f:
            f.write("{} {} {} {} {}\n".format(clas, xcenter, ycenter, box_width, box_height))

To get the dataset for this notebook, run the cell that follows. You can also download the dataset directly, as stated at the beginning of this notebook.

In [None]:
dataset_client = boto3.client("s3")
dataset_client.download_file(
    "sagemaker-sample-files",
    "datasets/image/caltech-home-objects-2006/Home_Objects_06.zip",
    "Home_Objects_06.zip",
)

After you download the dataset, unzip the archive.

In [None]:
unzipdataset(SAGEMAKERWDIR, "Home_Objects_06.zip")

You will now read one of the annotation files. These files contain coordinates stored in MATLAB .mat files. You have to change these files to the .txt format that YOLOv5 uses for training.

In [None]:
## Here is an example of an annotations file
readannotfile(ANNOTATIONSPATH + "/" + "P1020595.JPG.mat")
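For reference, a YOLOv5 label file has one line per object: a class index followed by the box center and the box size, each normalized by the image dimensions. The sketch below shows that conversion from pixel corner coordinates. The helper name and the numbers are made up for illustration.

```python
def corners_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert pixel corners to normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2.0 / image_width
    y_center = (ymin + ymax) / 2.0 / image_height
    box_width = (xmax - xmin) / image_width
    box_height = (ymax - ymin) / image_height
    return x_center, y_center, box_width, box_height


# A box from (100, 40) to (300, 120) in a 400 x 200 pixel image
print(corners_to_yolo(100, 40, 300, 120, 400, 200))
# → (0.5, 0.4, 0.5, 0.4)
```

Because every value is a fraction of the image size, the labels stay valid when the training pipeline resizes the images.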

Make sure that the data is good. One thing you can check is that every MATLAB annotation file has a matching JPG image. You should not get any output from the cell below.

In [None]:
for fyle in os.listdir(ANNOTATIONSPATH):
    valid, _, _, _, _ = readannotfile(ANNOTATIONSPATH + "/" + fyle, silent=True)
    if not valid:
        print("Filename: {}, invalid data: {}".format(fyle, str(valid)))
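The check above walks the annotation files and looks for the matching image. A two-way check can also catch images that have no annotation. The following is a minimal sketch using only the standard library; the helper name is hypothetical, and it assumes the naming used in this dataset, where the annotation for `<name>.JPG` is `<name>.JPG.mat`.

```python
from pathlib import Path


def find_unpaired(images_dir, annotations_dir, image_suffix=".JPG"):
    """Return (images without a .mat file, .mat files without an image)."""
    images = {
        p.name
        for p in Path(images_dir).iterdir()
        if p.is_file() and p.suffix == image_suffix
    }
    # Strip the trailing ".mat" to recover the image name, e.g. P1020595.JPG
    annotated = {
        p.name[: -len(".mat")]
        for p in Path(annotations_dir).iterdir()
        if p.is_file() and p.suffix == ".mat"
    }
    return sorted(images - annotated), sorted(annotated - images)
```

With the variables defined earlier, `find_unpaired(IMAGESPATH, ANNOTATIONSPATH)` would return two empty lists when the training data is consistent.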

### Converting from MATLAB to .txt

In [None]:
CLASSENUM = createClassMap()

for annots in os.listdir(ANNOTATIONSPATH):
    if os.path.isdir(ANNOTATIONSPATH + "/" + annots):
        continue
    parts = annots.split(".")
    baseimage = parts[0] + "." + parts[1]
    imagefile = IMAGESPATH + "/" + baseimage
    convertMATLABToYolov5(ANNOTATIONSPATH + "/" + annots, imagefile)
print("\n\n***Conversion to YOLOv5 TXT format complete!***")

Check if you have got the converted annotations right.

In [None]:
plot_bounding_yolov5_annotations(
    IMAGESPATH + "/" + "P1020595.JPG", YOLO5ANNOTATIONS + "/P1020595.txt"
)

In [None]:
# You are done with the training dataset. Getting the validation dataset ready


def getClassNum(givenfile):
    cl = next(
        (item for item in CLASSENUM if item["name"] == givenfile.split("/")[-1] + ".mat"), None
    )
    return cl["clas"]


def createValidationData(givenfile, imagefile):
    """
    givenfile: full path to the test <filename>.JPG.mat
    imagefile: full path to the test <filename>.JPG
    """
    width, height = Image.open(imagefile).size
    # Lets get all the bounding boxes
    bandcs = scipy.io.loadmat(givenfile)
    for vertice in bandcs["outline"][0]:
        tl, tr, br, bl = vertice[0]
        cls = getClassNum(vertice[1][0])
        xmin, ymin, xmax, ymax = tl[0], tl[1], br[0], br[1]
        xcenter = (xmin + (xmax - xmin) / 2.0) / width
        ycenter = (ymin + (ymax - ymin) / 2.0) / height
        box_width = (xmax - xmin) / width
        box_height = (ymax - ymin) / height

        ## appending to the annotation file, one line per bounding box
        os.makedirs(TESTYOLO5ANNOTATIONS, exist_ok=True)
        yolofilename = givenfile.split("/")[-1].split(".")[0] + ".txt"
        print("Writing to {}".format(TESTYOLO5ANNOTATIONS + "/" + yolofilename))
        with open(TESTYOLO5ANNOTATIONS + "/" + yolofilename, "a") as f:
            f.write("{} {} {} {} {}\n".format(cls, xcenter, ycenter, box_width, box_height))


def plot_bounding_yolov5_annotations_testimage(imagefile, annotationfile, num):
    """
    imagefile: full path to image file
    annotationfile: full path to annotation file
    num: the annotation to display (zero-based line index)
    """
    assert os.path.exists(imagefile)
    image = Image.open(imagefile)

    assert os.path.exists(annotationfile)
    bx = ""
    with open(annotationfile, "r") as f:
        for number, line in enumerate(f):
            if number == num:
                bx = line  # We want to test a single box in any image

    annotation_list = bx.strip().split(" ")

    annotations = np.array(annotation_list, dtype=np.float64)

    w, h = image.size

    plotted_image = ImageDraw.Draw(image)

    chged_annots = np.array(annotations)
    chged_annots[1] = annotations[1] * w
    chged_annots[2] = annotations[2] * h
    chged_annots[3] = annotations[3] * w
    chged_annots[4] = annotations[4] * h
    chged_annots[1] = chged_annots[1] - chged_annots[3] / 2  # xmin
    chged_annots[2] = chged_annots[2] - chged_annots[4] / 2  # ymin
    chged_annots[3] = chged_annots[1] + chged_annots[3]  # xmax
    chged_annots[4] = chged_annots[2] + chged_annots[4]  # ymax

    xmin, ymin, xmax, ymax = chged_annots[1], chged_annots[2], chged_annots[3], chged_annots[4]

    plotted_image.rectangle(((xmin, ymin), (xmax, ymax)), outline="red", width=2)

    plt.imshow(np.array(image))
    plt.show()

In [None]:
for annots in os.listdir(TESTANNOTATIONSPATH):
    if os.path.isdir(TESTANNOTATIONSPATH + "/" + annots):
        continue
    parts = annots.split(".")
    baseimage = parts[0] + "." + parts[1]
    imagefile = TESTIMAGESPATH + "/" + baseimage
    createValidationData(TESTANNOTATIONSPATH + "/" + annots, imagefile)
print("\n\n***Conversion to YOLOv5 TXT format complete!***")

In [None]:
# Check if you have the test annotations right as well
plot_bounding_yolov5_annotations_testimage(
    TESTIMAGESPATH + "/P1020827.JPG", TESTYOLO5ANNOTATIONS + "/P1020827.txt", 2
)

### Converting from Amazon SageMaker Ground Truth bounding box to .txt [OPTIONAL]

You can use [Amazon SageMaker Ground Truth](https://aws.amazon.com/sagemaker/data-labeling/) to build your data labeling workflows. It is a data labeling service that lets you use [Amazon Mechanical Turk](https://www.mturk.com/), third-party vendors, or your own private workforce for your data labeling tasks. SageMaker Ground Truth can be used to label images, text, videos and video frames, as well as 3D point clouds. It can also generate labeled synthetic data.

If you are using SageMaker Ground Truth, you can create a [bounding box labeling job](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-bounding-box.html). This section demonstrates how to transform the output of a Ground Truth labeling job into an input format suitable for your YOLOv5 training job. Refer to the [SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-bounding-box.html) for guidance on creating a bounding box labeling job.

Below is a sample output line from such a labeling job, formatted for easy reading. This is one line from a  ```output.manifest``` file. This file is in the [JSON lines format](https://jsonlines.org/). Along with this notebook you can find a sample output manifest file named ```sample_output.manifest```. This sample Ground Truth labeling job output file will be used to demonstrate how you can transform annotations to a format appropriate for YOLOv5 training.

```
{
    "source-ref": "s3://my-groundtruth-bucket/ground-truth-od-full-demo/images/000062a39995e348.jpg",
    "category": {
        "image_size": [
            {
                "width": 680,
                "height": 1024,
                "depth": 3
            }
        ],
        "annotations": [
            {
                "class_id": 0,
                "top": 164.60000000000002,
                "left": 138.8,
                "height": 859,
                "width": 443.8
            }
        ]
    },
    "category-metadata": {
        "objects": [
            {
                "confidence": 0.93
            }
        ],
        "class-map": {
            "0": "Bird"
        },
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
        "creation-date": "2022-09-12T09:51:00.148118",
        "job-name": "labeling-job/ground-truth-od-demo-1662974840"
    }
}
```

The above output shows the result for one image from a single-class bounding box labeling job; it corresponds to a single JSON line in the output manifest. It was extracted from the output manifest generated by running [this notebook](https://github.com/aws/amazon-sagemaker-examples/blob/6ac5bb28dcbe29e16d3cb8fe7169cabe1c6f34eb/ground_truth_labeling_jobs/ground_truth_object_detection_tutorial/object_detection_tutorial.ipynb) on a SageMaker notebook instance. The function below, ```convertGT2TXT```, transforms an output.manifest file into the .txt files that the YOLOv5 training job expects.

In [None]:
def convertGT2TXT(manifestfile, topannotationloc):
    """
    Converts an output manifest file
    from its existing JSONlines format
    to the TXT format for YOLOv5
    training.
    manifestfile: The name of the output manifest file from the Ground Truth labeling job.
    topannotationloc: The parent directory below which the TXT annotations files are placed.
    """
    annotobj = []
    with open(manifestfile) as mf:
        for line in mf:
            annotobj.append(json.loads(line))
    for ob in annotobj:
        ## Get the TXT filename
        txtflname = (ob["source-ref"].split("/")[-1]).split(".")[-2] + ".txt"
        print("Writing to {}".format(topannotationloc + "/" + txtflname))
        input_width = ob["category"]["image_size"][0]["width"]
        input_height = ob["category"]["image_size"][0]["height"]
        ## Writing to the new annotation file, one line per bounding box
        with open(topannotationloc + "/" + txtflname, "w") as annotf:
            for annots in ob["category"]["annotations"]:
                clas = annots["class_id"]
                xcenter = (annots["left"] + annots["width"] / 2.0) / input_width
                ycenter = (annots["top"] + annots["height"] / 2.0) / input_height
                width = annots["width"] / input_width
                height = annots["height"] / input_height
                annotf.write("{} {} {} {} {}\n".format(clas, xcenter, ycenter, width, height))

In [None]:
## Generating TXT files; -p creates the directory only if it does not exist yet
!mkdir -p gt2txt
convertGT2TXT("./sample_output.manifest", "gt2txt")
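To make the arithmetic concrete, here is the same transformation applied by hand to the single annotation from the sample manifest line shown earlier. Ground Truth reports the top-left corner plus the width and height in pixels, while YOLOv5 expects the box center and size as fractions of the image size. The helper name is illustrative; the numbers come from the sample line.

```python
def gt_box_to_yolo(annotation, image_size):
    """Convert one Ground Truth box to (class, x_center, y_center, width, height)."""
    x_center = (annotation["left"] + annotation["width"] / 2.0) / image_size["width"]
    y_center = (annotation["top"] + annotation["height"] / 2.0) / image_size["height"]
    width = annotation["width"] / image_size["width"]
    height = annotation["height"] / image_size["height"]
    return annotation["class_id"], x_center, y_center, width, height


sample_annotation = {"class_id": 0, "top": 164.6, "left": 138.8, "height": 859, "width": 443.8}
sample_image_size = {"width": 680, "height": 1024, "depth": 3}

label = gt_box_to_yolo(sample_annotation, sample_image_size)
print("{} {:.4f} {:.4f} {:.4f} {:.4f}".format(*label))
# → 0 0.5304 0.5802 0.6526 0.8389
```

The rounding to four decimals is only for display; the generated .txt files keep full precision.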

Eventually, you need to build the same file structure with the TXT files generated from the above function execution, and image files, as demonstrated below.

### Training and validation sets

Here, you create the training and validation sets. The training set is the full Train split of the dataset, and the validation set is a random 50% of the Test split.

In [None]:
def getImageList(direc):
    fyls = []
    for fyl in os.listdir(direc):
        if os.path.isdir(direc + "/" + fyl):
            continue
        fyls.append(fyl)
    random.shuffle(fyls)
    return fyls


fyls = getImageList(TESTIMAGESPATH)
annots = [fyl.split(".")[0] + ".txt" for fyl in fyls]

## We will use only 50% of this test dataset for validation
X_val, _, y_val, _ = train_test_split(fyls, annots, test_size=0.50, random_state=42)

print("A set of 5 validation image files : ")
X_val[:5]
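The cell above relies on scikit-learn to pick the validation files and keep each image paired with its label file. If you want the same idea without the extra dependency, a seeded sample from the standard library is enough. This is a sketch with a hypothetical helper name; it will not select the same files as `train_test_split`, it only illustrates a reproducible, paired selection.

```python
import random


def sample_validation_files(image_names, fraction, seed=42):
    """Pick a reproducible fraction of images and derive their label file names."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on directory listing order
    candidates = sorted(image_names)
    count = int(len(candidates) * fraction)
    images = rng.sample(candidates, count)
    labels = [name.split(".")[0] + ".txt" for name in images]
    return images, labels
```

Sorting before sampling matters because `os.listdir` returns files in arbitrary order, which would otherwise make the selection differ between machines even with a fixed seed.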

In [None]:
imagefiles = getImageList(IMAGESPATH)
annotationfiles = [fyl.split(".")[0] + ".txt" for fyl in imagefiles]

X_train = imagefiles
y_train = annotationfiles

print("A set of 5 training image files : ")
X_train[:5]

Right now, the data is organized like so:

**Images**: ```IMAGESPATH/*.JPG```

**Annotations**: ```YOLO5ANNOTATIONS/*.txt```

You need to separate these into training and validation datasets. This is what it should look like on S3, and eventually, on the training container.

**Training**: ```<base path>/[images|labels]/train```

**Validation**: ```<base path>/[images|labels]/val```

In [None]:
import logging  # used by the error handlers below

S3TRAINIMAGESLOC = prefix + "/data/train/"
S3TRAINANNOTSLOC = prefix + "/data/train/"
S3OTHERDATA = prefix + "/classes/"
S3VALIMAGESLOC = prefix + "/data/val/"
S3VALANNOTSLOC = prefix + "/data/val/"

s3_client = boto3.client("s3")

for i in X_train:
    try:
        response = s3_client.upload_file(IMAGESPATH + "/" + i, bucket, S3TRAINIMAGESLOC + i)
    except ClientError as e:
        logging.error(e)

for i in X_val:
    try:
        response = s3_client.upload_file(TESTIMAGESPATH + "/" + i, bucket, S3VALIMAGESLOC + i)
    except ClientError as e:
        logging.error(e)

for i in y_train:
    try:
        response = s3_client.upload_file(YOLO5ANNOTATIONS + "/" + i, bucket, S3TRAINANNOTSLOC + i)
    except ClientError as e:
        logging.error(e)

for i in y_val:
    try:
        response = s3_client.upload_file(TESTYOLO5ANNOTATIONS + "/" + i, bucket, S3VALANNOTSLOC + i)
    except ClientError as e:
        logging.error(e)

TRAIN_CHANNEL = "s3://" + bucket + "/" + prefix + "/data/train/"
VAL_CHANNEL = "s3://" + bucket + "/" + prefix + "/data/val/"

## Upload the CLASSENUM list
jsonclasses = json.dumps(CLASSENUM)
with open("classenum.json", "w") as outfile:
    outfile.write(jsonclasses)
try:
    response = s3_client.upload_file("classenum.json", bucket, S3OTHERDATA + "classenum.json")
except ClientError as e:
    logging.error(e)

## Set up the PyTorch environment for training

#### Customizing a DLC for training

You create your own container, using one of the Deep Learning Containers as the base image. Look for an image suitable for your use case: you can use a GPU or CPU image, and you specify the framework, the framework version, the Region, and the image scope.

Make sure that the image you choose here is the same image that the ```FROM``` instruction in your Dockerfile names as its base. Check ```docker/Dockerfile``` and modify its first line if required.

In [None]:
image_uris.retrieve(
    framework="pytorch",
    region="us-east-1",
    version="1.11.0",
    py_version="py38",
    image_scope="training",
    instance_type="ml.c5.4xlarge",
)

In [None]:
!pygmentize docker/Dockerfile
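Comparing the base image by eye is error-prone because the URIs are long. The sketch below reads the first `FROM` instruction from a Dockerfile's text and compares it with an expected URI. The helper names are hypothetical, and the parsing is deliberately simple: it skips `--platform` style flags but does not resolve `ARG` substitutions.

```python
def dockerfile_base_image(dockerfile_text):
    """Return the image named by the first FROM instruction, or None."""
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Ignore flags such as --platform=linux/amd64
            images = [part for part in parts[1:] if not part.startswith("--")]
            return images[0] if images else None
    return None


def base_image_matches(dockerfile_text, expected_uri):
    """True when the Dockerfile's base image equals the expected URI exactly."""
    return dockerfile_base_image(dockerfile_text) == expected_uri
```

In this notebook you could pass the contents of `docker/Dockerfile` as `dockerfile_text` and the value returned by `image_uris.retrieve(...)` as `expected_uri`.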

#### If you are in a SageMaker Studio notebook environment...

**DO NOT** run the next three cells below. Instead, follow the instructions on [building and pushing your docker container in the Studio notebook environment](#In-the-SageMaker-Studio-notebook-environment-:-Building-your-docker-image-and-pushing-it-to-Amazon-EC2-Container-Registry-(ECR)) below.

In [None]:
!pygmentize docker/build_and_push.sh

In [None]:
!chmod +x docker/build_and_push.sh && docker/build_and_push.sh

# check that your image exists (docker ps would list running containers, not images)
!docker images

In [None]:
client = boto3.client("sts")
account = client.get_caller_identity()["Account"]

my_session = boto3.session.Session()
region = my_session.region_name

algorithm_name = "pytorch-training-container-extension-yolov5-cpu"

image_uri = "{}.dkr.ecr.{}.amazonaws.com/{}:latest".format(account, region, algorithm_name)

print(image_uri)

#### In the SageMaker Studio notebook environment : Building your docker image and pushing it to Amazon EC2 Container Registry (ECR)

1. Give your SageMaker execution role all the required permissions, go through the [pre-requisites](#Pre-requisites-if-you-are-in-a-SageMaker-Studio-environment) section of this notebook above before you proceed.

2. Next, install the ```sagemaker-studio-image-build``` package using pip.

3. Finally, build and register the container image using the following command:

   ```sm-docker build . --file /path/to/Dockerfile```
   
**Uncomment and run the next three cells ONLY if you are in a SageMaker Studio notebook environment**

In [None]:
# !pip install sagemaker-studio-image-build

In [None]:
# !sm-docker build . --file docker/Dockerfile

If you are in a Studio environment, copy the **Image URI** from the output of the previous cell, assign it to the ```image_uri``` variable in the next cell, and uncomment that line.

In [None]:
# Expected format of the image URI : <<AWS_ACCOUNT_ID>>.dkr.ecr.<<REGION>>.amazonaws.com/<<ECR_REPO_NAME>>:<<SAGEMAKER_STUDIO_USER>>
# Assign the value you get for Image URI from the execution of the previous cell to the ```image_uri``` variable below before
# proceeding further. Uncomment the code below before proceeding.

# image_uri = <<Image URI value copy/pasted from the execution of the previous cell>>

__At this point__ in the notebook, irrespective of the environment you are running in i.e. SageMaker Studio or on a SageMaker notebook instance, your ```image_uri``` variable should be populated.
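A quick format check can catch a copy and paste mistake before you start a training job. The pattern below is an assumption derived from the expected URI format described above (account ID, Region, repository, tag); it does not confirm that the image actually exists in ECR.

```python
import re

# Assumed shape: <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_IMAGE_URI = re.compile(
    r"\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com(\.cn)?/[^\s:]+:[^\s:]+"
)


def looks_like_ecr_image_uri(uri):
    """Return True when the string has the general shape of a tagged ECR image URI."""
    return ECR_IMAGE_URI.fullmatch(uri) is not None
```

For example, `looks_like_ecr_image_uri(image_uri)` should return `True` for both the notebook instance and the Studio code paths.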

Because you are performing transfer learning on a custom dataset, you need pretrained weights for initialization. Pretrained weights (checkpoints) are available on the [YOLOv5 release page](https://github.com/ultralytics/yolov5/releases). Store the downloaded weights at an S3 location. This notebook uses ```yolov5s.pt```.

In [None]:
WEIGHTSLOC = prefix + "/data/weights/"

#### Get, and then upload weights to a S3 location

In [None]:
!wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt
!zip yolov5s.pt yolov5s.pt

##### Upload weights to S3 location for use during training later

In [None]:
import logging  # used by the error handler below

s3_client = boto3.client("s3")

try:
    response = s3_client.upload_file("yolov5s.pt", bucket, WEIGHTSLOC + "yolov5s.pt")
except ClientError as e:
    logging.error(e)

#### Almost ready to initiate training

Here is a short primer on how to use hyperparameters to run training and automated model tuning for YOLOv5 on Amazon SageMaker.

```freeze``` freezes the weights of the backbone layers. The number of backbone layers depends on the model you choose; here you freeze 10 layers, which serve as feature extractors. The head layers, which are **not** frozen, compute the output predictions. To decide how many layers to freeze for transfer learning, see this guide on [Transfer Learning with Frozen Layers](https://github.com/ultralytics/yolov5/issues/1314).
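To illustrate what freezing the first N layers means in practice, the sketch below selects parameters by layer index from a list of parameter names. It assumes names of the form `model.<layer index>.<rest>`, which is an assumption about how the model's parameters are named; the function and the example names are hypothetical. In a real PyTorch training script you would set `requires_grad = False` on the matching parameters.

```python
def frozen_parameter_names(parameter_names, freeze):
    """Return the names that belong to layers 0 .. freeze-1."""
    prefixes = tuple("model.{}.".format(index) for index in range(freeze))
    if not prefixes:
        return []
    return [name for name in parameter_names if name.startswith(prefixes)]


example_names = [
    "model.0.conv.weight",
    "model.9.cv1.conv.weight",
    "model.10.conv.weight",
    "model.24.m.0.weight",
]
print(frozen_parameter_names(example_names, freeze=10))
# → ['model.0.conv.weight', 'model.9.cv1.conv.weight']
```

The trailing dot in each prefix is what keeps `model.1.` from also matching `model.10.` or `model.11.`.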

## Training

In [None]:
from sagemaker.pytorch import PyTorch
import json
import uuid

# JSON encode hyperparameters.
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


# The values for hyperparameters are just examples and you are advised to change them to what suits your use case.
hyperparameters = json_encode_hyperparameters(
    {
        "epochs": 10,
        "batchsize": 8,
        "freeze": 10,
        "patience": 10,
        "lr0": 0.01,
        "lrf": 0.01,
        "weights": "s3://" + bucket + "/" + prefix + "/data/weights/" + "yolov5s.pt",
        "classes": "s3://" + bucket + "/" + prefix + "/classes/classenum.json",
    }
)

training_uuid = uuid.uuid1()
training_job_name = "yolov5-project-" + str(training_uuid)

print("Starting training job : {}".format(training_job_name))

pt_estimator = PyTorch(
    entry_point="yolov5/training-wrapper.py",
    role=role,
    instance_type="ml.c5.4xlarge",
    volume_size=100,
    instance_count=1,
    framework_version="1.11.0",
    py_version="py3",
    hyperparameters=hyperparameters,
    image_uri=image_uri,
    debugger_hook_config=False,
    output_path="s3://" + bucket + "/" + prefix + "/output",
)

pt_estimator.fit(
    {
        "train": "s3://" + bucket + "/" + prefix + "/data/train",
        "val": "s3://" + bucket + "/" + prefix + "/data/val",
    },
    job_name=training_job_name,
)
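
The ```json_encode_hyperparameters``` helper above turns every value into a JSON string before it is passed to the estimator. The sketch below shows what that encoding produces and how an entry point could decode the values again. The ```json_decode_hyperparameters``` function and the example bucket name are assumptions for illustration; the actual ```training-wrapper.py``` may parse its arguments differently.

```python
import json


def json_encode_hyperparameters(hyperparameters):
    """Encode every value as a JSON string, as done in the training cell."""
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


def json_decode_hyperparameters(encoded):
    """Hypothetical inverse: parse each JSON string back into a Python value."""
    return {k: json.loads(v) for k, v in encoded.items()}


# Example values only; the S3 URI is a placeholder.
original = {
    "epochs": 10,
    "lr0": 0.01,
    "weights": "s3://example-bucket/weights/yolov5s.pt",
}

encoded = json_encode_hyperparameters(original)
decoded = json_decode_hyperparameters(encoded)

print(encoded)
print(decoded == original)
```

Note that string values gain an extra pair of double quotes when encoded, so code that reads them must decode or strip the quotes before using them as paths.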

The following hyperparameters are used during training:

**lr0**: Initial learning rate

**lrf**: Final OneCycleLR learning rate

**momentum**: SGD momentum/Adam beta1

**weight_decay**: Optimizer weight decay

**warmup_epochs**: Warmup epochs 

**warmup_momentum**: Warmup initial momentum

**warmup_bias_lr**: Warmup initial bias lr

**box**: Box loss gain

**cls**: cls loss gain

**cls_pw**: cls BCELoss positive_weight

**obj**: obj loss gain 

**obj_pw**: obj BCELoss positive_weight

**iou_t**: IoU training threshold

**anchor_t**: anchor-multiple threshold

**anchors**: anchors per output grid 

**fl_gamma**: focal loss gamma 

**hsv_h**: image HSV-Hue augmentation 

**hsv_s**: image HSV-Saturation augmentation

**hsv_v**: image HSV-Value augmentation 

**degrees**: image rotation 

**translate**: image translation 

**scale**: image scale 

**shear**: image shear 

**perspective**: image perspective

**flipud**: image flip up-down 

**fliplr**: image flip left-right 

**mosaic**: image mosaic 

**mixup**: image mixup 
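
As a worked example of how ```lr0``` and ```lrf``` interact, the final learning rate is the product ```lr0 * lrf```. The sketch below uses the values from the training cell and an assumed linear decay between the two end points. The ```linear_lr``` function is hypothetical, and the schedule that the training script actually applies may differ.

```python
import math

lr0 = 0.01  # initial learning rate, as in the training example
lrf = 0.01  # final learning rate fraction, as in the training example
epochs = 10

# The final learning rate is the initial rate scaled by lrf.
final_lr = lr0 * lrf


def linear_lr(epoch, epochs, lr0, lrf):
    """Assumed linear decay from lr0 at epoch 0 to lr0 * lrf at the last epoch."""
    multiplier = (1 - epoch / epochs) * (1.0 - lrf) + lrf
    return lr0 * multiplier


schedule = [linear_lr(e, epochs, lr0, lrf) for e in range(epochs + 1)]
print(f"start: {schedule[0]:.6f}, end: {schedule[-1]:.6f}")
```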

**How did the training job go?**

Extract ```results.png``` from the ```model.tar.gz``` file. First, download ```model.tar.gz``` from the S3 location you specified as the output path when you created the estimator.

In [None]:
s3 = boto3.client("s3")
s3.download_file(
    bucket, prefix + "/output/" + str(training_job_name) + "/output/model.tar.gz", "model.tar.gz"
)

#### Contents of the ```model.tar.gz```

In [None]:
!tar -xvzf model.tar.gz
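
If you prefer to inspect the archive from Python instead of the shell, the standard library ```tarfile``` module can list its members. This is a minimal sketch; the ```list_archive_members``` helper is a hypothetical name, and the listing only runs if ```model.tar.gz``` exists in the working directory.

```python
import os
import tarfile


def list_archive_members(archive_path):
    """Return the names of all members in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


archive_path = "model.tar.gz"  # downloaded in the previous cell
if os.path.exists(archive_path):
    for member in list_archive_members(archive_path):
        print(member)
```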

Go through the visualizations of metrics and losses in ```results.png```. Other relevant outputs, such as the Precision-Recall curve, the F1 curve, and the weights, are also available in the archive.

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt

plt.figure(figsize=(16, 12), dpi=120)
image = plt.imread("./exp/results.png")
plt.imshow(image)
plt.show()

## Hyperparameter Optimization with Amazon SageMaker Automated Model Tuning

Here, you define the metrics that SageMaker uses to find the best model; you can define as many as 20 metrics. For guidance on how to set up the objective metrics, refer to the documentation on [Defining Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html) for Automated Model Tuning (AMT).

YOLOv5 uses hyperparameter evolution to optimize hyperparameters. A detailed guide on how this can be achieved is [available here](https://docs.ultralytics.com/tutorials/hyperparameter-evolution/). In this notebook, you address HPO using Amazon SageMaker AMT instead. The hyperparameters used by YOLOv5 are sourced from the ```hyp.scratch.yaml``` file. Below is a list of the hyperparameters that can be tuned using evolution during training, along with their default values:

```
lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay 5e-4
warmup_epochs: 3.0  # warmup epochs (fractions ok)
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias lr
box: 0.05  # box loss gain
cls: 0.5  # cls loss gain
cls_pw: 1.0  # cls BCELoss positive_weight
obj: 1.0  # obj loss gain (scale with pixels)
obj_pw: 1.0  # obj BCELoss positive_weight
iou_t: 0.20  # IoU training threshold
anchor_t: 4.0  # anchor-multiple threshold
anchors: 0  # anchors per output grid (0 to ignore)
fl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
hsv_s: 0.7  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4  # image HSV-Value augmentation (fraction)
degrees: 0.0  # image rotation (+/- deg)
translate: 0.1  # image translation (+/- fraction)
scale: 0.5  # image scale (+/- gain)
shear: 0.0  # image shear (+/- deg)
perspective: 0.0  # image perspective (+/- fraction), range 0-0.001
flipud: 0.0  # image flip up-down (probability)
fliplr: 0.5  # image flip left-right (probability)
mosaic: 1.0  # image mosaic (probability)
mixup: 0.0  # image mixup (probability)
```

Fitness is the criterion YOLOv5 uses to arrive at the best model, and it is the value that evolution maximizes. The challenge with evolution is that at least 300 generations are recommended, which can be time-consuming and expensive because it may need hundreds or thousands of GPU hours. Nevertheless, if you do want to use evolution, add the ```--evolve``` flag when ```train.py``` is executed.
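
To illustrate what a fitness value looks like, the sketch below computes a weighted sum of precision, recall, mAP@.5 and mAP@.5:.95. The weights (0, 0, 0.1, 0.9) reflect how the upstream YOLOv5 repository has defined fitness, but treat them as an assumption and check the YOLOv5 version you use. The metric values are made up for the example.

```python
# Assumed weights for [precision, recall, mAP@.5, mAP@.5:.95].
FITNESS_WEIGHTS = (0.0, 0.0, 0.1, 0.9)


def fitness(precision, recall, map50, map50_95, weights=FITNESS_WEIGHTS):
    """Weighted combination of the four validation metrics."""
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(weights, metrics))


# Made-up metric values, for illustration only.
example = fitness(precision=0.5, recall=0.5, map50=0.6, map50_95=0.4)
print(f"fitness: {example:.3f}")
```

Under these weights, precision and recall do not contribute directly, and mAP@.5:.95 dominates the score.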

These are the hyperparameters you can tune using Amazon SageMaker AMT:

```
lr0: continuous
lrf: continuous
momentum: continuous
weight_decay: continuous
warmup_epochs: integer
warmup_momentum: continuous
warmup_bias_lr: continuous
box: continuous
cls: continuous
cls_pw: integer
obj: integer
obj_pw: integer
iou_t: continuous
anchor_t: integer
anchors: integer
fl_gamma: continuous
hsv_h: continuous
hsv_s: continuous
hsv_v: continuous
degrees: integer
translate: continuous
scale: continuous
shear: integer
perspective: continuous
flipud: continuous
fliplr: continuous
mosaic: continuous
mixup: continuous
```

Identify the metric you will use to evaluate. In this case you will use the objective metric of mAP@.5. You will be maximizing this metric. This is a very common metric used in object detection. You can learn more about mAP and other advanced metrics in this Stanford CS230 page on [Advanced Evaluation Metrics](https://cs230.stanford.edu/section/8/). The code below demonstrates how you can perform AMT in SageMaker for tuning the YOLOv5 model for your dataset.

In [None]:
from time import gmtime, strftime
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

hyperparameters = json_encode_hyperparameters(
    {
        "freeze": 10,
        "patience": 10,
        "lr0": 0.01,
        "lrf": 0.01,
        "weights": "s3://" + bucket + "/" + prefix + "/data/weights/" + "yolov5s.pt",
        "classes": "s3://" + bucket + "/" + prefix + "/classes/classenum.json",
    }
)

hyperparameter_ranges = {
    "epochs": IntegerParameter(10, 20),
    "batchsize": IntegerParameter(8, 64),
    "iou_t": ContinuousParameter(0.15, 0.25),
}

objective_metric_name = "mAP.5"
objective_type = "Maximize"
# Raw string so that the backslash in the regex is not treated as a Python escape sequence.
metric_definitions = [{"Name": "mAP.5", "Regex": r"(0\.[0-9]{1,6}).{7,12}$"}]

tuning_uuid = uuid.uuid1()
tuning_job_name = "yolov5-project-hpo-" + str(tuning_uuid)

tuner = HyperparameterTuner(
    pt_estimator,
    objective_metric_name=objective_metric_name,
    objective_type=objective_type,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metric_definitions,
    max_jobs=2,  # Change this to a suitable number that makes sense in your case
    max_parallel_jobs=1,  # Change this to a suitable number that makes sense in your case
    base_tuning_job_name=tuning_job_name,
)

tuner.fit(
    {
        "train": "s3://" + bucket + "/" + prefix + "/data/train",
        "val": "s3://" + bucket + "/" + prefix + "/data/val",
    },
    wait=True,
)
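
The regex in ```metric_definitions``` captures a value between 0 and 1 that is followed by 7 to 12 more characters before the end of the line, which is intended to pick the second-to-last column of a validation summary line. The sketch below checks this against a hypothetical log line with right-aligned columns of width 11. The line layout and the numbers are assumptions for illustration, so test the regex against the logs of your own training job.

```python
import re

METRIC_REGEX = r"(0\.[0-9]{1,6}).{7,12}$"

# Hypothetical summary line: class, images, labels, P, R, mAP@.5, mAP@.5:.95
log_line = "{:>22}{:>11}{:>11}{:>11}{:>11}{:>11}{:>11}".format(
    "all", 128, 929, 0.577, 0.414, 0.46, 0.279
)

# Search for the first position where the pattern matches through to the end of the line.
match = re.search(METRIC_REGEX, log_line)
captured = match.group(1) if match else None
print(captured)
```

Earlier columns are skipped because more than 12 characters follow them, and the last column is skipped because nothing follows it.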

Now that the tuning job is complete, identify the training job that produced the best results.

In [None]:
tuner.best_training_job()

Download the best model.

In [None]:
s3 = boto3.client("s3")
s3.download_file(
    bucket,
    prefix + "/output/" + tuner.best_training_job() + "/output/model.tar.gz",
    "hpo.model.tar.gz",
)

## Conclusion: Deploying YOLOv5 on Amazon SageMaker

Whether you run a training job or perform automated model tuning (AMT), deployment requires a ```model.tar.gz``` file that follows a specific structure. The archive must contain the model, in this case the ```best.pt``` file, which is available in the ```weights``` directory of the ```model.tar.gz``` file produced by training. For a deeper understanding of how to create the model directory structure, refer to [Use PyTorch with SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models).
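
As a rough sketch of the repackaging step, the code below writes the weights file to the root of a new archive and, optionally, an inference script under ```code/inference.py```, which is the layout described in the SageMaker PyTorch documentation linked above. The ```package_model``` helper, the ```./exp/weights/best.pt``` path and the ```deploy.model.tar.gz``` file name are assumptions for illustration; adapt them to the contents of your own extracted archive.

```python
import os
import tarfile


def package_model(weights_path, archive_path, inference_script=None):
    """Create a .tar.gz with the weights at the archive root.

    If an inference script is given, it is stored as code/inference.py.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(weights_path, arcname=os.path.basename(weights_path))
        if inference_script is not None:
            tar.add(inference_script, arcname="code/inference.py")
    return archive_path


weights_path = "./exp/weights/best.pt"  # assumed location after extracting model.tar.gz
if os.path.exists(weights_path):
    package_model(weights_path, "deploy.model.tar.gz")
```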

Further guidance on deploying a YOLOv5 model is available in this AWS Machine Learning blog post about scaling [YOLOv5 inference with Amazon SageMaker endpoints and AWS Lambda](https://aws.amazon.com/blogs/machine-learning/scale-yolov5-inference-with-amazon-sagemaker-endpoints-and-aws-lambda/).

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/advanced_functionality|pytorch_yolov5_training_and_hpo|transfer_learning_and_hpo_yolov5.ipynb)
