# Train and deploy a Custom YOLOv5 model on Amazon SageMaker Pipelines

In this notebook we will train and deploy a custom YOLOv5 object detection model with Amazon SageMaker Pipelines.

**Steps:**

0. Initial configuration.
1. Locate a labeled dataset in the format YOLOv5 expects.
2. Configure SM Pipeline Parameters
3. Configure SM Pipeline Steps
4. Execute the pipeline
5. Deploy the model after approval

## 0. Initial Configuration

In [None]:
import sys
!{sys.executable} -m pip install -qU pip
!{sys.executable} -m pip install -qU sagemaker

In [None]:
import boto3
import sagemaker
import json
import time
import uuid

from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)

from sagemaker.workflow.model_step import ModelStep
from sagemaker.pytorch.estimator import PyTorch
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.pytorch import PyTorchModel


sagemaker_session = sagemaker.session.Session()
pipeline_session = PipelineSession()
role = sagemaker.get_execution_role()

## 1. Locate a labeled dataset in the format YOLOv5 expects

Before we train a custom YOLOv5 model, we need a labeled dataset. The previous notebook, "0 - Label your dataset with Amazon SageMaker GroundTruth", shows how to label your own dataset and convert it to the format YOLOv5 expects, or how to use an example custom dataset. After completing either option you will have the S3 location of the dataset and the list of labels used. Set both values in the cell below.

In [None]:
dataset_s3_uri = ""
labels = [""]

## 2. Configure SM Pipeline Parameters

Configure the different parameters the pipeline needs to run.

In [None]:
MAP_threshold = ParameterFloat(
    name="MAPThreshold", 
    default_value=0.8
)

processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", 
                                             default_value=1)

instance_type = ParameterString(name="TrainingInstanceType", 
                                default_value="ml.m5.xlarge")

model_approval_status = ParameterString(
    name="ModelApprovalStatus", 
    default_value="PendingManualApproval"
)

labeled_dataset_uri = ParameterString(
    name="labeled_dataset_uri",
    default_value=dataset_s3_uri,
)

default_bucket = sagemaker_session.default_bucket()

model_package_group_name = "Yolov5-PL"

cache_config = CacheConfig(
    enable_caching=True, 
    expire_after="PT1H")

## 3. Configure SM Pipeline Steps

In [None]:
!git clone --quiet https://github.com/ultralytics/yolov5 yolov5
!wget -q https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt -P yolov5

### Train Step

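We patch the YOLOv5 validation script (`yolov5/val.py`) so that it logs mean precision (MP), mean recall (MR), and mAP@0.5 (MAP50) in a `NAME=value;` format. The training job's metric definitions parse these log lines, and the condition step later uses MAP50 to evaluate the trained model.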

In [None]:
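# Log statements to inject into val.py (named to avoid shadowing the built-in `list`).
log_statements = ["f'MP={mp};'", "f'MR={mr};'", "f'MAP50={map50};'"]

with open('yolov5/val.py', 'r') as f:
    original_lines = f.readlines()

# Rewrite val.py, adding the LOGGER calls right after the "# Return results" comment.
# Note: re-running this cell appends the statements again.
with open('yolov5/val.py', 'w') as f:
    for line in original_lines:
        f.write(line)
        if '    # Return results' in line:
            for statement in log_statements:
                f.write("    LOGGER.info(%s)\n" % statement)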
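Because the patch appends lines every time it runs, executing the cell twice leaves duplicate log statements in `val.py`. The following is a minimal sketch of an idempotent variant that works on a list of lines instead of the file itself. The function name `add_metric_logging` and the sample lines are illustrative assumptions, not part of the YOLOv5 repository.

```python
# Statements to inject, identical to the ones used in the notebook cell.
LOG_STATEMENTS = ["f'MP={mp};'", "f'MR={mr};'", "f'MAP50={map50};'"]
MARKER = "    # Return results"


def add_metric_logging(lines):
    """Return a copy of `lines` with LOGGER.info calls added after the marker.

    If the MAP50 log statement is already present, the lines are returned
    unchanged, so applying the patch twice has no further effect.
    """
    if any("LOGGER.info(f'MAP50=" in line for line in lines):
        return list(lines)
    patched = []
    for line in lines:
        patched.append(line)
        if MARKER in line:
            patched.extend(f"    LOGGER.info({s})\n" for s in LOG_STATEMENTS)
    return patched


# Hypothetical stand-in for the contents of val.py.
sample = ["def run():\n", "    # Return results\n", "    return mp, mr, map50\n"]
once = add_metric_logging(sample)
twice = add_metric_logging(once)
print(len(sample), len(once), len(twice))  # 3 6 6
```

To apply it to the real file, read `yolov5/val.py` with `readlines()`, pass the result through `add_metric_logging`, and write the returned lines back.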

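Next, we create the dataset configuration file in the `yolov5/data` folder. It tells YOLOv5 where to find the training and validation images inside the training container and which class names to use.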

In [None]:
def create_datafile(labels):
    """Write the YOLOv5 dataset config, mapping each class index to its label."""
    with open('yolov5/data/custom-coco.yaml', 'w') as write_file:
        # SageMaker mounts the "train" channel at this path inside the container.
        write_file.write('path: /opt/ml/input/data/train\n')
        write_file.write('train: images/train\n')
        write_file.write('val: images/validation\n')
        write_file.write('names:\n')
        for idx, label in enumerate(labels):
            write_file.write("  {}: {}\n".format(idx, label))

create_datafile(labels)
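To make the resulting file concrete, here is a small sketch that renders the same YAML content as a string for a hypothetical two-class dataset. The helper `render_data_yaml` and the labels `cat` and `dog` are assumptions for illustration only.

```python
def render_data_yaml(labels):
    """Build the text of the dataset config without touching the file system."""
    lines = [
        "path: /opt/ml/input/data/train",
        "train: images/train",
        "val: images/validation",
        "names:",
    ]
    # One "index: label" entry per class, in list order.
    lines += [f"  {idx}: {label}" for idx, label in enumerate(labels)]
    return "\n".join(lines) + "\n"


example_labels = ["cat", "dog"]  # hypothetical labels
yaml_text = render_data_yaml(example_labels)
print(yaml_text)
```

The class index is the position of the label in the list, so the order of `labels` should match the class ids used in the annotation files.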

In [None]:
from sagemaker.inputs import TrainingInput

metric_definitions=[
    {
        "Name": "MP",
        "Regex": "MP=(.*?);",
    },
    {
        "Name": "MR",
        "Regex": "MR=(.*?);",
    },
    {
        "Name": "MAP50",
        "Regex": "MAP50=(.*?);",
    }
]

hyperparameters={
    "workers":"8",
    "device": "0",
    "batch-size": "8",
    "epochs": "30",
    "data": "custom-coco.yaml",
    "weights": "yolov5s.pt",
    "project": "/opt/ml/model"
}

estimator = PyTorch(
    framework_version='1.11.0',
    py_version='py38',
    entry_point='train.py',
    source_dir='yolov5',
    hyperparameters=hyperparameters,
    instance_count=1,
    instance_type='ml.g5.2xlarge',
    role=role,
    disable_profiler=True, 
    debugger_hook_config=False,
    sagemaker_session=pipeline_session,
    metric_definitions=metric_definitions
)

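# With a PipelineSession, fit() does not start a job; it returns the step arguments.
train_args = estimator.fit(
    inputs={
        "train": TrainingInput(
            # Use the pipeline parameter so the dataset can be overridden per execution.
            s3_data=labeled_dataset_uri
        )
    },
    job_name='yolov5-train-{}'.format(str(uuid.uuid4()))
)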
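The metric definitions rely on the log lines that the patched `val.py` emits. As a rough local check, the sketch below applies the same three regular expressions to a sample log. The sample values are made up, and taking the last match per metric is an illustrative choice here, not a statement about how SageMaker aggregates metrics.

```python
import re

# Same patterns as in metric_definitions.
METRIC_REGEXES = {
    "MP": r"MP=(.*?);",
    "MR": r"MR=(.*?);",
    "MAP50": r"MAP50=(.*?);",
}


def extract_metrics(log_text):
    """Return the last value found for each metric in `log_text`."""
    found = {}
    for name, pattern in METRIC_REGEXES.items():
        matches = re.findall(pattern, log_text)
        if matches:
            found[name] = float(matches[-1])
    return found


# Hypothetical log output from one validation run.
sample_log = "MP=0.71;\nMR=0.65;\nMAP50=0.82;\n"
print(extract_metrics(sample_log))
```

The trailing `;` in each log statement matters: it gives the non-greedy group `(.*?)` a clear end, so only the number is captured.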

In [None]:
from sagemaker.workflow.steps import TrainingStep

step_train = TrainingStep(
    name="TrainModel",
    step_args=train_args,
    cache_config=cache_config
)

### Create Model Step

In [None]:
model = PyTorchModel(
    entry_point='detect.py',
    source_dir='helper-code',
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    framework_version='1.11.0',
    py_version='py38',
    role=role,
    sagemaker_session=pipeline_session
)

step_create_model = ModelStep(
    name="CreateModel",
    step_args=model.create(instance_type="ml.c5.large"),
)

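# No explicit dependency on the condition step is set here: `step_cond` is not
# defined yet, and steps listed in its `if_steps` only run when the condition holds.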

### Register Model Step

In [None]:
register_model_step_args = model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.c5.large"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status
)

step_register = ModelStep(
    name="RegisterModel",
    step_args=register_model_step_args,
    # Pass the dependency at construction time so it applies to the generated sub-steps.
    depends_on=[step_create_model],
)

### Condition Step

In [None]:
# The condition holds when left <= right, i.e. when MAP50 reaches the threshold.
cond_lte = ConditionLessThanOrEqualTo(
    left=MAP_threshold,
    right=step_train.properties.FinalMetricDataList['MAP50'].Value,
)

step_cond = ConditionStep(
    name="EvaluateMetrics",
    conditions=[cond_lte],
    if_steps=[step_create_model,step_register],
    else_steps=[],
)

step_cond.add_depends_on([step_train])
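`ConditionLessThanOrEqualTo` evaluates to true when its `left` operand is less than or equal to its `right` operand. With the threshold on the left and the MAP50 metric on the right, the model is created and registered only when MAP50 is at least the threshold. The plain-Python sketch below mirrors that comparison; `passes_threshold` is a hypothetical helper and the metric values are made up.

```python
def passes_threshold(map50, threshold=0.8):
    """Mirror of the pipeline condition: threshold <= MAP50."""
    return threshold <= map50


# Hypothetical MAP50 values from three training runs.
for value in (0.5, 0.8, 0.82):
    print(value, passes_threshold(value))
```

With the default threshold of 0.8, a run that reaches exactly 0.8 still passes, because the comparison is inclusive.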

## 4. Execute the pipeline

In [None]:
from sagemaker.workflow.pipeline import Pipeline

pipeline_name = "YOLOv5-Pipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        labeled_dataset_uri,
        processing_instance_count,
        instance_type,
        model_approval_status,
        MAP_threshold
    ],
    steps=[step_train, step_cond]
)

Verify that the pipeline JSON definition is well formed.

In [None]:
json.loads(pipeline.definition())

Upsert the pipeline: this creates it, or updates it if a pipeline with the same name already exists.

In [None]:
pipeline.upsert(role_arn=role)

Execute the pipeline. 

In [None]:
execution = pipeline.start()
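Once the execution has started, it can be useful to see how far each step got. The sketch below assumes that `execution.list_steps()` returns a list of dictionaries with `StepName` and `StepStatus` keys, as in the SageMaker API response. The class `FakeExecution` is a hypothetical stand-in so the example runs without AWS access.

```python
def summarize_steps(execution):
    """Map each step name to its current status for a pipeline execution."""
    return {step["StepName"]: step["StepStatus"] for step in execution.list_steps()}


class FakeExecution:
    """Hypothetical stand-in that mimics the shape of list_steps() output."""

    def list_steps(self):
        return [
            {"StepName": "EvaluateMetrics", "StepStatus": "Succeeded"},
            {"StepName": "TrainModel", "StepStatus": "Succeeded"},
        ]


print(summarize_steps(FakeExecution()))
```

In the notebook, calling `execution.wait()` before `summarize_steps(execution)` blocks until the run finishes. `wait()` accepts `delay` and `max_attempts` arguments, which may need to be raised for long training jobs.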

## 5. Deploy the model after approval

In [None]:
#TODO
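The deployment cell is left as a TODO in this notebook. As one possible approach, and not necessarily the one the authors intended, the sketch below looks up the most recent approved model package in the model package group and deploys it to a real-time endpoint. The helper names, the endpoint name `yolov5-endpoint`, and the choice of `ml.c5.large` (taken from the register step) are assumptions.

```python
def latest_approved_package_arn(sm_client, group_name):
    """Return the ARN of the newest approved model package, or None if there is none."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["ModelPackageSummaryList"]
    return summaries[0]["ModelPackageArn"] if summaries else None


def deploy_latest_approved(group_name, role, endpoint_name, instance_type="ml.c5.large"):
    """Deploy the newest approved model package to a real-time endpoint."""
    # Imported here so the lookup helper above can be used without the SDKs installed.
    import boto3
    import sagemaker
    from sagemaker import ModelPackage

    arn = latest_approved_package_arn(boto3.client("sagemaker"), group_name)
    if arn is None:
        raise RuntimeError(f"No approved model package found in group {group_name!r}")

    model = ModelPackage(
        role=role,
        model_package_arn=arn,
        sagemaker_session=sagemaker.Session(),
    )
    model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        endpoint_name=endpoint_name,
    )
    return endpoint_name


# Example call (hypothetical endpoint name; creates billable resources):
# deploy_latest_approved(model_package_group_name, role, "yolov5-endpoint")
```

Model packages registered by the pipeline start as `PendingManualApproval`, so one has to be approved first, either in SageMaker Studio or with the boto3 call `update_model_package(ModelPackageArn=..., ModelApprovalStatus="Approved")`.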