# Transfer Learning and Hyperparameter Optimization for YOLOv5 using Amazon SageMaker

YOLO (You Only Look Once) is a family of models used for object detection. Object detection is a computer vision (CV) task in which a neural network predicts the presence of objects of certain classes within an image and marks their locations with bounding boxes.

There are broadly two approaches to this task: a two-step approach and a single-shot approach. [YOLO](https://arxiv.org/abs/1506.02640) belongs to the family of single-shot approaches. Another single-shot model is the [Single Shot Multibox Detector](https://arxiv.org/pdf/1512.02325.pdf) (SSD), [available built-in with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html).

This notebook takes you through using Amazon SageMaker to apply transfer learning to a YOLOv5 model on custom data. You will use AWS [Deep Learning Containers (DLC)](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) to customise the training container so that you can train a YOLOv5 model to detect car licence plates (the custom data). If you are using an Amazon SageMaker Notebook instance, use the **Python 3** kernel; if you are in a SageMaker Studio environment, use the **Python 3 Data Science** kernel.

You will also use Amazon SageMaker's Automated Model Tuning feature to perform hyperparameter optimisation to arrive at the best possible model.

You will be using this [car licence plate detection dataset from Kaggle](https://www.kaggle.com/datasets/andrewmvd/car-plate-detection). It applies to fairly common use cases such as keeping track of cars in a parking lot or tracking cars as they pass through toll booths. The YOLOv5 model fits this use case well.

Broadly, you will be covering the following topics in this notebook:

1. [Pre-requisites if you are in a SageMaker Studio environment](#Pre-requisites-if-you-are-in-a-SageMaker-Studio-environment)

2. [Data Preparation](#Data-Preparation)
    
    a. [Converting from VOC Pascal to .txt](#Converting-from-VOC-Pascal-to-.txt)
    
    b. [Converting from Amazon SageMaker Ground Truth bounding box to .txt [OPTIONAL]](#Converting-from-Amazon-SageMaker-Ground-Truth-bounding-box-to-.txt-[OPTIONAL])
    
    c. [Training and validation sets](#Training-and-validation-sets)
    
    
3. [Set up the PyTorch environment for training](#Set-up-the-PyTorch-environment-for-training)

    a. [Customising a DLC for training](#Customising-a-DLC-for-training)
    
    b. [Get and then upload weights to S3 location](#Get-and-then-upload-weights-to-S3-location)
    

4. [Training](#Training)


5. [Hyperparameter Optimization with Amazon SageMaker Automated Model Tuning](#Hyperparameter-Optimization-with-Amazon-SageMaker-Automated-Model-Tuning)


6. [Conclusion : Deploying YOLOv5 on Amazon SageMaker](#Conclusion-:-Deploying-YOLOv5-on-Amazon-SageMaker)



To begin, download the dataset by clicking the **Download** button at the top right of [this page](https://www.kaggle.com/datasets/andrewmvd/car-plate-detection). Alternatively, you can use [the Kaggle API](https://github.com/Kaggle/kaggle-api). Upload the downloaded file ```archive.zip``` to your working directory on this SageMaker notebook instance, then follow the instructions below.

## Pre-requisites if you are in a SageMaker Studio environment

**Check your SageMaker execution role.** If it is not already in place, add the following trust policy for AWS CodeBuild:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "codebuild.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
You can use this guidance on [editing trust relationship for an existing role](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/edit_trust.html) to edit the trust policy for your SageMaker execution role.

Next, add the following inline policy to your SageMaker execution role. The AWS IAM documentation explains how to [add a permission policy inline](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "codebuild:DeleteProject",
                "codebuild:CreateProject",
                "codebuild:BatchGetBuilds",
                "codebuild:StartBuild"
            ],
            "Resource": "arn:aws:codebuild:*:*:project/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogStream",
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:GetLogEvents",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*:log-stream:*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CreateRepository",
                "ecr:BatchGetImage",
                "ecr:CompleteLayerUpload",
                "ecr:DescribeImages",
                "ecr:DescribeRepositories",
                "ecr:UploadLayerPart",
                "ecr:ListImages",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage"
            ],
            "Resource": "arn:aws:ecr:*:*:repository/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
              "s3:GetObject",
              "s3:DeleteObject",
              "s3:PutObject"
              ],
            "Resource": "arn:aws:s3:::sagemaker-*/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket"
            ],
            "Resource": "arn:aws:s3:::sagemaker*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:ListRoles"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringLikeIfExists": {
                    "iam:PassedToService": "codebuild.amazonaws.com"
                }
            }
        }
    ]
}
```

You can now proceed with executing the rest of the notebook.

## Data Preparation

In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/Sample-pytorch-YOLOv5-LP-Detection"

role = sagemaker.get_execution_role()
print("Bucket Name: {} and the role is {}".format(bucket, role))

In [None]:
import os
import boto3
import xml.etree.ElementTree as ET
import sys
import json
import logging
import random
import matplotlib.pyplot as plt
import numpy as np
from sagemaker import image_uris
from zipfile import ZipFile
from botocore.exceptions import ClientError
from sklearn.model_selection import train_test_split
from PIL import Image, ImageDraw


SAGEMAKERWDIR = os.getcwd() + "/"
DATASETLOCALBASE = SAGEMAKERWDIR + "data"
ANNOTATIONSPATH = DATASETLOCALBASE + "/annotations"
YOLO5ANNOTATIONS = ANNOTATIONSPATH + "/YOLOv5"
IMAGESPATH = DATASETLOCALBASE + "/images"


def unzipdataset(wrkdir, archname):
    with ZipFile(archname, "r") as arch:
        if not os.path.exists(DATASETLOCALBASE):
            os.makedirs(DATASETLOCALBASE)
        arch.extractall(path=DATASETLOCALBASE)
        print("Done extracting zip archive!")


def readannotfile(givenfile, silent=False):
    # Accept either a bare file name or a full path to the annotation file
    givenfile = os.path.join(ANNOTATIONSPATH, givenfile)
    if not silent:
        try:
            with open(givenfile, "r") as fle:
                for line in fle:
                    print(line)
        except FileNotFoundError as e:
            print(e)
            return False
    # The annotation is valid only if an image with the same base name exists
    stem = os.path.splitext(os.path.basename(givenfile))[0]
    return os.path.exists(IMAGESPATH + "/" + stem + ".png")


def parseVOCPascalFile(xmlfilename):
    # Skip annotation files that have no matching image
    if not readannotfile(xmlfilename, silent=True):
        return
    root = ET.parse(ANNOTATIONSPATH + "/" + xmlfilename).getroot()
    ## We are only interested in filename, object class, xmin, ymin, xmax, ymax, width, height
    flname = os.path.splitext(root.find("filename").text)[0] + ".txt"
    os.makedirs(YOLO5ANNOTATIONS, exist_ok=True)
    completeflname = YOLO5ANNOTATIONS + "/" + flname
    size = root.find("size")
    input_width = float(size.find("width").text)
    input_height = float(size.find("height").text)

    ## writing to the new annotation file, one line per annotated object
    print("Writing to {}".format(completeflname))
    with open(completeflname, "w") as f:
        for obj in root.findall("object"):
            clas = 0  # this dataset has a single class: "licence"
            box = obj.find("bndbox")
            xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
            xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
            # YOLO expects the box centre and size, normalised by the image size
            xcenter = (xmin + (xmax - xmin) / 2.0) / input_width
            ycenter = (ymin + (ymax - ymin) / 2.0) / input_height
            width = (xmax - xmin) / input_width
            height = (ymax - ymin) / input_height
            f.write("{} {} {} {} {}\n".format(clas, xcenter, ycenter, width, height))


def plot_bounding_box(imagefile, annotationfile):
    assert os.path.exists(imagefile)
    image = Image.open(imagefile)

    assert os.path.exists(annotationfile)
    bx = ""
    with open(annotationfile, "r") as f:
        for line in f:
            bx = line  # We want to test a single box in any image

    annotation_list = bx.strip().split(" ")

    annotations = np.array(annotation_list, dtype=np.float64)

    w, h = image.size

    plotted_image = ImageDraw.Draw(image)

    chged_annots = np.array(annotations)
    chged_annots[1] = annotations[1] * w
    chged_annots[2] = annotations[2] * h
    chged_annots[3] = annotations[3] * w
    chged_annots[4] = annotations[4] * h
    chged_annots[1] = chged_annots[1] - chged_annots[3] / 2  # xmin
    chged_annots[2] = chged_annots[2] - chged_annots[4] / 2  # ymin
    chged_annots[3] = chged_annots[1] + chged_annots[3]  # xmax
    chged_annots[4] = chged_annots[2] + chged_annots[4]  # ymax

    xmin, ymin, xmax, ymax = chged_annots[1], chged_annots[2], chged_annots[3], chged_annots[4]

    plotted_image.rectangle(((xmin, ymin), (xmax, ymax)), outline="red", width=2)

    plt.imshow(np.array(image))
    plt.show()

In [None]:
unzipdataset(SAGEMAKERWDIR, "archive.zip")

Let's read one of the annotation files. These files are in the PASCAL VOC format, so let's see what we need to do to process them. We have to change them to the .txt format that we will use for the YOLOv5 model training.
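
As a minimal sketch of the arithmetic involved (the function name `voc_to_yolo` and the sample numbers are illustrative, not part of this notebook's code), a PASCAL VOC box given as pixel corners `(xmin, ymin, xmax, ymax)` becomes a YOLOv5 label line of the form `class x_center y_center width height`, with the four box values normalised by the image width and height:

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    """Convert one pixel-space VOC box to a normalised YOLO label line."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return "{} {} {} {} {}".format(class_id, x_center, y_center, width, height)


# Hypothetical 400x200 image with a plate spanning (100, 50) to (300, 150)
print(voc_to_yolo(100, 50, 300, 150, 400, 200))  # prints "0 0.5 0.5 0.5 0.5"
```

Because every value is a fraction of the image size, the label stays valid when the image is later resized with its aspect ratio preserved.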

In [None]:
## Here is an example of a file that we have read
readannotfile("Cars234.xml")

Also, let us make sure that our data is good. One check is to confirm that every XML annotation file has a matching PNG image file. You should not get any output from this.

In [None]:
for fyle in os.listdir(ANNOTATIONSPATH):
    valid = readannotfile(fyle, silent=True)
    if not valid:
        print("Filename: {}, invalid data: {}".format(fyle, str(valid)))

### Converting from VOC Pascal to .txt

In [None]:
for annots in os.listdir(ANNOTATIONSPATH):
    parseVOCPascalFile(annots)

Let us see if we have got the converted annotations right.

In [None]:
## A test
plot_bounding_box(IMAGESPATH + "/Cars250.png", YOLO5ANNOTATIONS + "/Cars250.txt")

One thing we noticed about this dataset is that its images vary in size. Fortunately, YOLOv5 accommodates this by resizing images, with a default size of 640. In fact, when we train on our custom dataset, the image size will default to 640.
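
To build some intuition for what that resizing means, here is a simplified sketch of aspect-preserving scaling. It is not YOLOv5's actual data loader code, which also handles padding and augmentation, and the name `scaled_dims` is illustrative. The longer side is scaled to the target size and the shorter side follows proportionally:

```python
def scaled_dims(width, height, target=640):
    """Scale (width, height) so the longer side equals `target`, keeping the aspect ratio."""
    ratio = target / max(width, height)
    return round(width * ratio), round(height * ratio)


print(scaled_dims(1280, 720))  # landscape image, prints (640, 360)
print(scaled_dims(300, 500))   # portrait image, prints (384, 640)
```

Since YOLO labels are normalised to the image size, they do not need to be recomputed for the scaled (unpadded) image.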

### Converting from Amazon SageMaker Ground Truth bounding box to .txt  [OPTIONAL]

You can use [Amazon SageMaker Ground Truth](https://aws.amazon.com/sagemaker/data-labeling/) to build your data labeling workflows, it is a data labeling service that enables you to use [Amazon Mechanical Turk](https://www.mturk.com/), third party vendors, or your own private workforce for your data labeling tasks. SageMaker Ground Truth can be used to label images, text, videos and video frames, as well as 3D point clouds. It can also generate labeled synthetic data.

If you are using SageMaker Ground Truth, you can create a [bounding box labeling job](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-bounding-box.html). This notebook demonstrates how to transform the output of such a job into input suitable for your YOLOv5 training job. Please refer to the [SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-bounding-box.html) for guidance on how to create a bounding box labeling job.

Below is a sample output line from such a labeling job, formatted for easy reading. This would be one line in the ```output.manifest``` file at the end of the labeling job as the output manifest is generated in the [JSONlines format](https://jsonlines.org/). Along with this notebook you have been provided a sample output manifest file named ```sample_output.manifest```. This sample Ground Truth labeling job output file will be used to demonstrate how you can transform annotations to a format appropriate for YOLOv5 training.

```
{
    "source-ref": "s3://rns-groundtruth-bucket/ground-truth-od-full-demo/images/000062a39995e348.jpg",
    "category": {
        "image_size": [
            {
                "width": 680,
                "height": 1024,
                "depth": 3
            }
        ],
        "annotations": [
            {
                "class_id": 0,
                "top": 164.60000000000002,
                "left": 138.8,
                "height": 859,
                "width": 443.8
            }
        ]
    },
    "category-metadata": {
        "objects": [
            {
                "confidence": 0.93
            }
        ],
        "class-map": {
            "0": "Bird"
        },
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
        "creation-date": "2022-09-12T09:51:00.148118",
        "job-name": "labeling-job/ground-truth-od-demo-1662974840"
    }
}
```

The above output demonstrates a single-class bounding box labeling job and corresponds to one JSON line in the output manifest. It was extracted from the output manifest generated after running [this notebook](https://github.com/aws/amazon-sagemaker-examples/blob/6ac5bb28dcbe29e16d3cb8fe7169cabe1c6f34eb/ground_truth_labeling_jobs/ground_truth_object_detection_tutorial/object_detection_tutorial.ipynb) using a SageMaker Notebook instance. Below is a function that transforms an output.manifest file into the .txt files that the YOLOv5 training job expects.

In [None]:
def convertGT2TXT(manifestfile, topannotationloc):
    """
    Converts an output manifest file
    from its existing JSONlines format
    to the TXT format for YOLOv5
    training.
    manifestfile: The name of the output manifest file from the Ground Truth labeling job.
    topannotationloc: The parent directory below which the TXT annotations files are placed.
    """
    annotobj = []
    with open(manifestfile) as mf:
        for line in mf:
            annotobj.append(json.loads(line))
    for ob in annotobj:
        ## Get the TXT filename
        txtflname = (ob["source-ref"].split("/")[-1]).split(".")[-2] + ".txt"
        print("Writing to {}".format(topannotationloc + "/" + txtflname))
        input_width = ob["category"]["image_size"][0]["width"]
        input_height = ob["category"]["image_size"][0]["height"]
        ## Writing to the new annotation file, one line per bounding box
        with open(topannotationloc + "/" + txtflname, "w") as annotf:
            for annots in ob["category"]["annotations"]:
                clas = annots["class_id"]
                # Ground Truth stores top/left/width/height in pixels
                xcenter = (annots["left"] + annots["width"] / 2.0) / input_width
                ycenter = (annots["top"] + annots["height"] / 2.0) / input_height
                width = annots["width"] / input_width
                height = annots["height"] / input_height
                annotf.write("{} {} {} {} {}\n".format(clas, xcenter, ycenter, width, height))

In [None]:
## Generating TXT files; -p avoids an error if the directory already exists
!mkdir -p gt2txt
convertGT2TXT("./sample_output.manifest", "gt2txt")

Eventually, you will need to build the same file structure with these TXT files and image files as is demonstrated below.
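
As a quick sanity check on the Ground Truth arithmetic, the sketch below applies the same conversion to the annotation values from the sample manifest line shown earlier. The helper name `gt_box_to_yolo` is illustrative and not part of this notebook's code:

```python
def gt_box_to_yolo(box, img_w, img_h):
    """Convert a Ground Truth box (top, left, width, height in pixels) to YOLO values."""
    x_center = (box["left"] + box["width"] / 2.0) / img_w
    y_center = (box["top"] + box["height"] / 2.0) / img_h
    return (box["class_id"], x_center, y_center, box["width"] / img_w, box["height"] / img_h)


# Values copied from the sample manifest line (the image is 680 x 1024 pixels)
sample = {"class_id": 0, "top": 164.6, "left": 138.8, "height": 859, "width": 443.8}
label = gt_box_to_yolo(sample, 680, 1024)
print(" ".join(str(round(v, 4)) for v in label))

# Every normalised value should fall inside [0, 1]
assert all(0.0 <= v <= 1.0 for v in label[1:])
```

A value outside that range would usually point to a mismatch between the box and the recorded image size.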

### Training and validation sets

Here we create the training and validation sets, with 20% set aside for validation.

In [None]:
# Sort the listing so that the split below is reproducible with random_state=42
fyls = sorted(os.listdir(IMAGESPATH))

annots = [fyl.split(".")[-2] + ".txt" for fyl in fyls]
X_train, X_val, y_train, y_val = train_test_split(fyls, annots, test_size=0.20, random_state=42)

print("A set of 5 training image files : ")
X_train[:5]

Right now, the data is organized like so:

**Images**: ```IMAGESPATH/*.png```

**Annotations**: ```YOLO5ANNOTATIONS/*.txt```
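
Before splitting, it can be worth confirming that every image has a label file with the same base name. The following is a minimal sketch with hypothetical file listings; in this notebook the two lists would come from `os.listdir` on the image and annotation directories, and the helper name `unpaired_images` is illustrative:

```python
import os


def unpaired_images(image_names, label_names):
    """Return the image files whose base name has no matching label file."""
    label_stems = {os.path.splitext(name)[0] for name in label_names}
    return sorted(n for n in image_names if os.path.splitext(n)[0] not in label_stems)


# Hypothetical listings for illustration
images = ["Cars0.png", "Cars1.png", "Cars2.png"]
labels = ["Cars0.txt", "Cars2.txt"]
print(unpaired_images(images, labels))  # prints ['Cars1.png']
```

An empty result means the image and label sets line up one to one.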

We need to separate these into training and validation datasets. This is what they should look like on S3, and eventually, on the training container.

**Training**: ```<base path>/train/[images|labels]```

**Validation**: ```<base path>/val/[images|labels]```

In [None]:
S3TRAINIMAGESLOC = prefix + "/data/train/"
S3TRAINANNOTSLOC = prefix + "/data/train/"
S3VALIMAGESLOC = prefix + "/data/val/"
S3VALANNOTSLOC = prefix + "/data/val/"

s3_client = boto3.client("s3")

for i in X_train:
    try:
        response = s3_client.upload_file(IMAGESPATH + "/" + i, bucket, S3TRAINIMAGESLOC + i)
    except ClientError as e:
        logging.error(e)

for i in X_val:
    try:
        response = s3_client.upload_file(IMAGESPATH + "/" + i, bucket, S3VALIMAGESLOC + i)
    except ClientError as e:
        logging.error(e)

for i in y_train:
    try:
        response = s3_client.upload_file(YOLO5ANNOTATIONS + "/" + i, bucket, S3TRAINANNOTSLOC + i)
    except ClientError as e:
        logging.error(e)

for i in y_val:
    try:
        response = s3_client.upload_file(YOLO5ANNOTATIONS + "/" + i, bucket, S3VALANNOTSLOC + i)
    except ClientError as e:
        logging.error(e)

TRAIN_CHANNEL = "s3://" + bucket + "/" + prefix + "/data/train/"
VAL_CHANNEL = "s3://" + bucket + "/" + prefix + "/data/val/"

## Set up the PyTorch environment for training

#### Customising a DLC for training

You will create your own container, using one of the Deep Learning Containers as a base image. Look for a suitable image: the cell below retrieves a CPU training image, but you can alternatively use a GPU image by specifying a GPU instance type. Think of your use case when deciding on the kind of DLC image you want to use.

In [None]:
image_uris.retrieve(
    framework="pytorch",
    region="us-east-1",
    version="1.11.0",
    py_version="py38",
    image_scope="training",
    instance_type="ml.c5.4xlarge",
)

In [None]:
!pygmentize docker/Dockerfile

#### If you are in a studio notebook environment

**Do not** run the next three cells. Instead, follow the instructions on [building and pushing your docker container in the studio notebook environment](#Building-your-docker-image-and-pushing-it-to-Amazon-EC2-Container-Registry-(ECR)-in-a-SageMaker-Studio-notebook-environment) below.

In [None]:
!pygmentize docker/build_and_push.sh

In [None]:
!chmod +x docker/build_and_push.sh && docker/build_and_push.sh

In [None]:
client = boto3.client("sts")
account = client.get_caller_identity()["Account"]

my_session = boto3.session.Session()
region = my_session.region_name

algorithm_name = "pytorch-training-container-extension-yolov5-cpu"

image_uri = "{}.dkr.ecr.{}.amazonaws.com/{}:latest".format(account, region, algorithm_name)

print(image_uri)

#### Building your docker image and pushing it to Amazon EC2 Container Registry (ECR) in a SageMaker Studio notebook environment

1. If you are experimenting, give your SageMaker execution role all the required permissions. You can find detailed information about the permissions required in this AWS blogpost about [building container images from your studio notebooks](https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/).

2. Next, install the ```sagemaker-studio-image-build``` package using pip.

3. Finally, build and register the container image using the following command:

   ```sm-docker build . --file /path/to/Dockerfile```
   
**Run the next three cells for the SageMaker Studio environment only**

In [None]:
!pip install sagemaker-studio-image-build

In [None]:
!sm-docker build . --file docker/Dockerfile

If you are in a Studio environment, copy the **Image URI** printed by the previous cell and use it to populate the ```image_uri``` variable in the next cell.

In [None]:
# Expected format of the image URI : <<AWS_ACCOUNT_ID>>.dkr.ecr.<<REGION>>.amazonaws.com/<<ECR_REPO_NAME>>:<<SAGEMAKER_STUDIO_USER>>
image_uri = <<Image URI value copy/pasted from the execution of the previous cell>>

__At this point__ in the notebook, irrespective of the environment you are running in i.e. Studio or Notebook instance, your ```image_uri``` variable should be populated.
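
A malformed image URI tends to surface only when the training job fails to start, so a quick format check can save time. The sketch below is an illustration only: the account ID and region are made up, the helper name `parse_image_uri` is hypothetical, and the pattern assumes the standard `amazonaws.com` domain rather than every AWS partition.

```python
import re

ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[^:]+):(?P<tag>.+)$"
)


def parse_image_uri(uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError("Not a recognisable ECR image URI: {}".format(uri))
    return match.groupdict()


# Hypothetical account ID and region, for illustration only
example = (
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/"
    "pytorch-training-container-extension-yolov5-cpu:latest"
)
print(parse_image_uri(example))
```

Calling `parse_image_uri(image_uri)` on your own value should return the four parts rather than raising an error.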

Since you are tuning the model for a custom dataset, you will need weights for initialization. Pretrained weights (checkpoints) can be found on the [YOLOv5 release page](https://github.com/ultralytics/yolov5/releases). Store the downloaded weights at an S3 location. For this notebook you will use ```yolov5s.pt```.

In [None]:
WEIGHTSLOC = prefix + "/data/weights/"

#### Get and then upload weights to S3 location

In [None]:
!wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt

##### Upload weights to S3 location for use during training later

In [None]:
s3_client = boto3.client("s3")

try:
    response = s3_client.upload_file("yolov5s.pt", bucket, WEIGHTSLOC + "yolov5s.pt")
except ClientError as e:
    logging.error(e)

#### Almost ready to initiate training

Here is a short primer on how hyperparameters are used to run training and automated model tuning for YOLOv5 on Amazon SageMaker.

```freeze``` freezes the weights of the backbone layers. The number of backbone layers depends on the model you choose; in this case, we freeze 10 layers, which serve as feature extractors. The head layers, which are **not** frozen, compute the output predictions. To decide how many layers to freeze for transfer learning, you may want to go through this guide on [Transfer Learning with Frozen Layers](https://github.com/ultralytics/yolov5/issues/1314).
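
To make the effect of ```freeze``` more concrete, the sketch below selects parameters by layer-index prefix, which is how YOLOv5's ```train.py``` appears to pick the layers to freeze. This is a simplified, framework-free illustration: the parameter names are hypothetical, and in real training the selected parameters would have `requires_grad` set to `False` rather than simply being listed.

```python
def frozen_parameter_names(parameter_names, freeze):
    """Return the names that belong to layers model.0 through model.<freeze - 1>."""
    prefixes = tuple("model.{}.".format(i) for i in range(freeze))
    return [name for name in parameter_names if name.startswith(prefixes)]


# Hypothetical parameter names in the style of a PyTorch state dict
names = [
    "model.0.conv.weight",
    "model.9.cv1.conv.weight",
    "model.10.conv.weight",
    "model.24.m.0.bias",
]
print(frozen_parameter_names(names, freeze=10))
# prints ['model.0.conv.weight', 'model.9.cv1.conv.weight']
```

With `freeze=10`, layers 0 to 9 are selected, while layer 10 onwards (the head) stays trainable.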

## Training

In [None]:
from sagemaker.pytorch import PyTorch
import json
import uuid

# JSON encode hyperparameters.
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


# The values for hyperparameters are just examples and you are advised to change them to what suits your use case.
hyperparameters = json_encode_hyperparameters(
    {
        "epochs": 5,
        "batchsize": 8,
        "freeze": 10,
        "patience": 10,
        "lr0": 0.01,
        "lrf": 0.01,
        "weights": "s3://" + bucket + "/" + prefix + "/data/weights/" + "yolov5s.pt",
    }
)

training_uuid = uuid.uuid1()
training_job_name = "yolov5-project-" + str(training_uuid)

print("Starting training job : {}".format(training_job_name))

pt_estimator = PyTorch(
    entry_point="yolov5/training-wrapper.py",
    role=role,
    instance_type="ml.c5.4xlarge",
    volume_size=100,
    instance_count=1,
    framework_version="1.11.0",
    py_version="py3",
    hyperparameters=hyperparameters,
    image_uri=image_uri,
    debugger_hook_config=False,
    output_path="s3://" + bucket + "/" + prefix + "/output",
)

pt_estimator.fit(
    {
        "train": "s3://" + bucket + "/" + prefix + "/data/train",
        "val": "s3://" + bucket + "/" + prefix + "/data/val",
    },
    job_name=training_job_name,
)

The following hyperparameters can be used during training :

**lr0**: Initial learning rate

**lrf**: Final OneCycleLR learning rate

**momentum**: SGD momentum/Adam beta1

**weight_decay**: Optimizer weight decay

**warmup_epochs**: Warmup epochs 

**warmup_momentum**: Warmup initial momentum

**warmup_bias_lr**: Warmup initial bias lr

**box**: Box loss gain

**cls**: cls loss gain

**cls_pw**: cls BCELoss positive_weight

**obj**: obj loss gain 

**obj_pw**: obj BCELoss positive_weight

**iou_t**: IoU training threshold

**anchor_t**: anchor-multiple threshold

**anchors**: anchors per output grid 

**fl_gamma**: focal loss gamma 

**hsv_h**: image HSV-Hue augmentation 

**hsv_s**: image HSV-Saturation augmentation

**hsv_v**: image HSV-Value augmentation 

**degrees**: image rotation 

**translate**: image translation 

**scale**: image scale 

**shear**: image shear 

**perspective**: image perspective

**flipud**: image flip up-down 

**fliplr**: image flip left-right 

**mosaic**: image mosaic 

**mixup**: image mixup 

**How did the training job go?**

You can find ```results.png``` in the ```model.tar.gz``` file stored at the S3 location that you specified as the output path when you created the estimator.

In [None]:
s3 = boto3.client("s3")
s3.download_file(
    bucket, prefix + "/output/" + str(training_job_name) + "/output/model.tar.gz", "model.tar.gz"
)

#### Contents of the ```model.tar.gz```

In [None]:
!tar -xvzf model.tar.gz

We can go through the metrics and losses saved to the ```results.png```. Other relevant information can also be examined like the Precision-Recall curve, weights, F1 curve and more. Below, we show the results.png.

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt

plt.figure(figsize=(16, 12), dpi=120)
image = plt.imread("./exp/results.png")
plt.imshow(image)
plt.show()

## Hyperparameter Optimization with Amazon SageMaker Automated Model Tuning

In this case, we look at only a few metrics as a way for SageMaker to find the best model for the job, although we could consider as many as 20. For guidance on how to set up the objective metrics, please refer to this documentation on [Defining Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html) for Automated Model Tuning (AMT).

YOLOv5 uses hyperparameter evolution to optimize hyperparameters; a detailed guide is [available here](https://docs.ultralytics.com/tutorials/hyperparameter-evolution/). In this notebook, we address HPO using Amazon SageMaker instead. The hyperparameters used by YOLOv5 are sourced from the ```hyp.scratch.yaml``` file. Below is a list of the hyperparameters that can be tuned using evolution during training, along with their default values:

```
lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay 5e-4
warmup_epochs: 3.0  # warmup epochs (fractions ok)
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias lr
box: 0.05  # box loss gain
cls: 0.5  # cls loss gain
cls_pw: 1.0  # cls BCELoss positive_weight
obj: 1.0  # obj loss gain (scale with pixels)
obj_pw: 1.0  # obj BCELoss positive_weight
iou_t: 0.20  # IoU training threshold
anchor_t: 4.0  # anchor-multiple threshold
anchors: 0  # anchors per output grid (0 to ignore)
fl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
hsv_s: 0.7  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4  # image HSV-Value augmentation (fraction)
degrees: 0.0  # image rotation (+/- deg)
translate: 0.1  # image translation (+/- fraction)
scale: 0.5  # image scale (+/- gain)
shear: 0.0  # image shear (+/- deg)
perspective: 0.0  # image perspective (+/- fraction), range 0-0.001
flipud: 0.0  # image flip up-down (probability)
fliplr: 0.5  # image flip left-right (probability)
mosaic: 1.0  # image mosaic (probability)
mixup: 0.0  # image mixup (probability)
```

Fitness is the criterion YOLOv5 uses to select the best model, and it is the value that evolution maximises. The challenge with evolution is that a minimum of 300 generations is recommended, which can be time consuming and expensive because it may need hundreds or thousands of GPU hours. Nevertheless, if you do want to use evolution, add ```--evolve``` when ```train.py``` is executed.
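
As far as we can tell from the YOLOv5 source (```utils/metrics.py```), the default fitness is a weighted sum of precision, recall, mAP@0.5 and mAP@0.5:0.95, with weights of 0.1 and 0.9 on the two mAP values and zero on the others. Treat those weights as an assumption and check them against the version of the repository you use. A minimal sketch with hypothetical metric values:

```python
def fitness(precision, recall, map50, map50_95, weights=(0.0, 0.0, 0.1, 0.9)):
    """Weighted sum of [P, R, mAP@0.5, mAP@0.5:0.95]; the weights are an assumption."""
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(weights, metrics))


# Hypothetical validation metrics
print(round(fitness(0.80, 0.70, 0.60, 0.40), 4))  # prints 0.42
```

With these weights, fitness is dominated by mAP@0.5:0.95, which is a stricter measure than the mAP@.5 metric used as the tuning objective later in this notebook.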

These are the parameters you can tune:

```
lr0: continuous
lrf: continuous
momentum: continuous
weight_decay: continuous
warmup_epochs: integer
warmup_momentum: continuous
warmup_bias_lr: continuous
box: continuous
cls: continuous
cls_pw: integer
obj: integer
obj_pw: integer
iou_t: continuous
anchor_t: integer
anchors: integer
fl_gamma: continuous
hsv_h: continuous
hsv_s: continuous
hsv_v: continuous
degrees: integer
translate: continuous
scale: continuous
shear: integer
perspective: continuous
flipud: continuous
fliplr: continuous
mosaic: continuous
mixup: continuous
```

Identify the metric you will use to evaluate. In this case you will use the objective metric of mAP@.5. You will be maximising this metric. This is a very common metric used in object detection. You can learn more about mAP and other advanced metrics in this Stanford CS230 page on [Advanced Evaluation Metrics](https://cs230.stanford.edu/section/8/). The code below demonstrates how you can perform AMT in SageMaker for tuning the YOLOv5 model for your dataset.
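
SageMaker extracts the objective metric by applying a regular expression to the training logs, so it is worth checking how the pattern behaves before launching a tuning job. The sketch below applies the metric regex used in this notebook to a hypothetical validation summary line. The exact log layout is an assumption, so test the pattern against your own training logs.

```python
import re

# Same pattern as the metric definition used for the tuning job in this notebook
pattern = re.compile(r"(0\.[0-9]{1,6}).{7,12}$")

# Hypothetical summary line: class, images, labels, P, R, mAP@.5, mAP@.5:.95
log_line = "                 all         87         94      0.912      0.887      0.921      0.601"

match = pattern.search(log_line)
print(match.group(1))  # prints 0.921
```

The pattern captures the second-to-last column because it requires 7 to 12 further characters before the end of the line. A value that does not start with `0.` (for example an mAP of exactly 1) would not be captured.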

In [None]:
from time import gmtime, strftime
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

hyperparameters = json_encode_hyperparameters(
    {
        "freeze": 10,
        "patience": 10,
        "lr0": 0.01,
        "lrf": 0.01,
        "weights": "s3://" + bucket + "/" + prefix + "/data/weights/" + "yolov5s.pt",
    }
)

hyperparameter_ranges = {
    "epochs": IntegerParameter(10, 20),
    "batchsize": IntegerParameter(8, 64),
    "iou_t": ContinuousParameter(0.15, 0.25),
}

objective_metric_name = "mAP.5"
objective_type = "Maximize"
metric_definitions = [{"Name": "mAP.5", "Regex": r"(0\.[0-9]{1,6}).{7,12}$"}]

tuning_uuid = uuid.uuid1()
tuning_job_name = "yolov5-project-hpo-" + str(tuning_uuid)

tuner = HyperparameterTuner(
    pt_estimator,
    objective_metric_name=objective_metric_name,
    objective_type=objective_type,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metric_definitions,
    max_jobs=3,  # Change this to a suitable number that makes sense in your case
    max_parallel_jobs=1,  # Change this to a suitable number that makes sense in your case
    base_tuning_job_name=tuning_job_name,
)

tuner.fit(
    {
        "train": "s3://" + bucket + "/" + prefix + "/data/train",
        "val": "s3://" + bucket + "/" + prefix + "/data/val",
    },
    wait=True,
)

Now that the tuning job is complete, find the best model from the training job that gave us the best results.

In [None]:
tuner.best_training_job()

Download the best model.

In [None]:
s3 = boto3.client("s3")
s3.download_file(
    bucket,
    prefix + "/output/" + tuner.best_training_job() + "/output/model.tar.gz",
    "hpo.model.tar.gz",
)

## Conclusion : Deploying YOLOv5 on Amazon SageMaker

Whether you simply train or perform automated model tuning (AMT), you will need to create a ```model.tar.gz``` file that follows a specific structure in order to deploy the model. This archive must contain the model, in this case the ```best.pt``` file found in the ```weights``` directory of the ```model.tar.gz``` file generated by the training and AMT jobs earlier in this notebook. For details on creating this model directory structure, refer to [Use PyTorch with SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models).
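
As a rough sketch of that packaging step, the snippet below builds an archive with the weights at the root and an inference script under ```code/```, which is the layout described in the SageMaker PyTorch documentation linked above. The file names and the helper `build_model_archive` are illustrative, and placeholder files stand in for the real ```best.pt``` and inference script so that the sketch runs on its own.

```python
import os
import tarfile
import tempfile


def build_model_archive(weights_path, inference_script, archive_path):
    """Package the weights at the archive root and the inference script under code/."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(weights_path, arcname=os.path.basename(weights_path))
        tar.add(inference_script, arcname="code/" + os.path.basename(inference_script))
    return archive_path


# Placeholder files so the sketch runs without a trained model
workdir = tempfile.mkdtemp()
weights = os.path.join(workdir, "best.pt")
script = os.path.join(workdir, "inference.py")
for path in (weights, script):
    with open(path, "w") as handle:
        handle.write("placeholder")

archive = build_model_archive(weights, script, os.path.join(workdir, "model.tar.gz"))
with tarfile.open(archive) as tar:
    print(sorted(tar.getnames()))  # prints ['best.pt', 'code/inference.py']
```

In practice you would point `weights_path` at the extracted ```best.pt``` and supply your own inference script before uploading the archive to S3.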

Further guidance on deploying a YOLOv5 model can be found in this AWS Machine Learning blogpost about scaling [YOLOv5 inference with Amazon SageMaker endpoints and AWS Lambda](https://aws.amazon.com/blogs/machine-learning/scale-yolov5-inference-with-amazon-sagemaker-endpoints-and-aws-lambda/).