# Amazon SageMaker Object Detection using the Image and JSON format

1. [Introduction](#Introduction)
2. [Setup](#Setup)
3. [Specifying input Dataset](#Specifying-input-Dataset)
4. [Training](#Training)

## Introduction

Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes an image as input and returns a bounding box around each object of interest, along with the class of the object inside the box. Before we have such a solution, we need to process a training dataset, create and set up a training job so that the algorithm can learn from the dataset, and then host the trained model as an endpoint to which we can send query images.

This notebook focuses on using the built-in SageMaker Single Shot MultiBox Detector ([SSD](https://arxiv.org/abs/1512.02325)) object detection algorithm to train a model on your custom dataset. For dataset preparation or using the model for inference, please see the other scripts in this folder.

## Setup

To train the Object Detection algorithm on Amazon SageMaker, we need to set up and authenticate the use of AWS services. To begin with, we need an AWS account role with SageMaker access. This role is used to give SageMaker access to your data in S3. In this example, we will use the same role that was used to start this SageMaker notebook.

In [1]:
%%time
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
print(role)
sess = sagemaker.Session()

arn:aws:iam::735324722473:role/service-role/AmazonSageMaker-ExecutionRole-20181211T153495
CPU times: user 829 ms, sys: 84.4 ms, total: 913 ms
Wall time: 983 ms


We also need the S3 bucket that holds the training manifests and will be used to store the trained model artifacts.

In [2]:
bucket = 'greengrass-object-detection-blog' # custom bucket name.
prefix = 'demo'

In [3]:
from sagemaker.amazon.amazon_estimator import get_image_uri

# Retrieve the URI of the Docker image that contains the built-in object detection (SSD) algorithm.
training_image = get_image_uri(sess.boto_region_name, 'object-detection', repo_version="latest")
print(training_image)

825641698319.dkr.ecr.us-east-2.amazonaws.com/object-detection:latest


## Specifying input Dataset

This notebook assumes you have already prepared two [augmented manifest files](https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html), one for training and one for validation, as input data for the object detection model.

There are many advantages to using **augmented manifest files** for your training input:

* No format conversion is required if you are using SageMaker Ground Truth to generate the data labels.
* Unlike the traditional approach of providing paths to the input images separately from their labels, an augmented manifest file combines both into one entry for each input image, which reduces the complexity of the code that matches each image with its labels. (Read [this blog post](https://aws.amazon.com/blogs/machine-learning/easily-train-models-using-datasets-labeled-by-amazon-sagemaker-ground-truth/) for more explanation.)
* When splitting your dataset for train/validation/test, you don't need to rearrange and re-upload image files to different S3 prefixes for training and validation. Once you upload your image files to S3, you never need to move them again. You can just place pointers to these images in your augmented manifest files for training and validation.
* When you use an augmented manifest file, the training images are loaded onto the training instance in *Pipe mode*, which means the input data is streamed directly to the training algorithm while it is running (as opposed to File mode, where all input files need to be downloaded to disk before training starts). This results in faster training and less disk usage. Read more in [this blog post](https://aws.amazon.com/blogs/machine-learning/accelerate-model-training-using-faster-pipe-mode-on-amazon-sagemaker/) on the benefits of Pipe mode.
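
To make the format concrete, here is a minimal sketch of what one line of an augmented manifest might look like and how it can be parsed. The bucket name, image key, and box coordinates are made up for illustration. The `source-ref` and `bb` keys mirror the attribute names used later in this notebook, and the layout of the `bb` value is an assumption based on the Ground Truth bounding box output format.

```python
import json

# One hypothetical manifest line: a JSON object holding the image location and its labels.
# All values here are illustrative and are not taken from the real dataset.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/frame_0001.jpg",
    "bb": {
        "annotations": [
            {"class_id": 0, "left": 120, "top": 80, "width": 200, "height": 150}
        ],
        "image_size": [{"width": 1280, "height": 720, "depth": 3}],
    },
})

# Each line of the manifest is parsed independently of the others.
entry = json.loads(sample_line)
image_uri = entry["source-ref"]
boxes = entry["bb"]["annotations"]

print(image_uri)
print(len(boxes), "box(es); first box class_id =", boxes[0]["class_id"])
```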


In [32]:
train_data_prefix = "simple-train"
s3_train_data = "s3://{}/training-manifest/{}/augmented.manifest".format(bucket, train_data_prefix)
s3_validation_data = "s3://{}/training-manifest/{}/validation.manifest".format(bucket, train_data_prefix)
print("Train data: {}".format(s3_train_data))
print("Validation data: {}".format(s3_validation_data))

Train data: s3://tanmcrae-greengrass-blog/training-manifest/simple-train/augmented.manifest
Validation data: s3://tanmcrae-greengrass-blog/training-manifest/simple-train/validation.manifest


In [33]:
train_data = sagemaker.session.s3_input(s3_train_data, 
                                        distribution='FullyReplicated', 
                                        content_type='application/x-recordio', 
                                        record_wrapping='RecordIO',
                                        s3_data_type='AugmentedManifestFile', 
                                        attribute_names=['source-ref', 'bb'])

validation_data = sagemaker.session.s3_input(s3_validation_data, 
                                             distribution='FullyReplicated', 
                                             content_type='application/x-recordio', 
                                             record_wrapping='RecordIO',
                                             s3_data_type='AugmentedManifestFile', 
                                             attribute_names=['source-ref', 'bb'])
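
The `attribute_names` passed above need to match keys that are present in every manifest line; a line lacking one of them is likely to cause problems once the job starts. A small local check along these lines can catch a mismatch before a job is launched. The helper name `find_missing_attributes` and the sample entries are hypothetical.

```python
def find_missing_attributes(entries, attribute_names):
    """Return (line_index, missing_keys) for every entry lacking a required key."""
    problems = []
    for index, entry in enumerate(entries):
        missing = [name for name in attribute_names if name not in entry]
        if missing:
            problems.append((index, missing))
    return problems


# Illustrative entries; the second one lacks the 'bb' label attribute.
sample_entries = [
    {"source-ref": "s3://example-bucket/a.jpg", "bb": {"annotations": []}},
    {"source-ref": "s3://example-bucket/b.jpg"},
]

problems = find_missing_attributes(sample_entries, ["source-ref", "bb"])
print(problems)
```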

In [34]:
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
s3_output_location

's3://tanmcrae-greengrass-blog/blog-demo/output'

In [35]:
!aws s3 cp $s3_train_data .

download: s3://tanmcrae-greengrass-blog/training-manifest/simple-train/augmented.manifest to ./augmented.manifest


In [36]:
import json
import os 

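def read_manifest_file(file_path):
    """Read a JSON Lines manifest and return one dict per non-empty line."""
    with open(file_path, 'r') as f:
        return [json.loads(line) for line in f if line.strip()]


# Use a separate name so the `train_data` channel defined above is not overwritten.
manifest_entries = read_manifest_file(os.path.split(s3_train_data)[1])
num_training_samples = len(manifest_entries)
num_training_samples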

6130
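
Because each manifest line is self-contained, a train/validation split only needs to partition the lines; the images stay where they are in S3. The following sketch shows one way to do that with a seeded shuffle. The function names, the 80/20 ratio, and the sample entries are assumptions for illustration, not a description of how the manifests used in this notebook were produced.

```python
import json
import random


def split_manifest(entries, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the entries and split them into (train, validation)."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def to_manifest_text(entries):
    """Serialize entries as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(entry) for entry in entries) + "\n"


# Ten hypothetical entries pointing at images that never need to move.
sample_entries = [{"source-ref": "s3://example-bucket/img_{}.jpg".format(i)} for i in range(10)]
train_entries, validation_entries = split_manifest(sample_entries)

print(len(train_entries), len(validation_entries))
```

The text returned by `to_manifest_text` can then be written to a local file and uploaded to S3 as a separate manifest for each split.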

## Training
Now that we are done with all the setup that is needed, we are ready to train our object detector. To begin, let us create a `sagemaker.estimator.Estimator` object. This estimator will launch the training job.

In [17]:
od_model = sagemaker.estimator.Estimator(training_image,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.p3.2xlarge',
                                         train_volume_size=50,
                                         train_max_run=360000,
                                         input_mode='Pipe',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)

The object detection algorithm at its core is the [Single-Shot Multi-Box detection algorithm (SSD)](https://arxiv.org/abs/1512.02325). This algorithm uses a `base_network`, which is typically a [VGG](https://arxiv.org/abs/1409.1556) or a [ResNet](https://arxiv.org/abs/1512.03385). The Amazon SageMaker object detection algorithm currently supports VGG-16 and ResNet-50. ResNet is typically faster, so it is the recommended base network for inference at the edge. The algorithm also has many hyperparameters that configure the training job. The next step is to set up these hyperparameters and the data channels for training the model. Consider the following example definition of hyperparameters. See the SageMaker Object Detection [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) for more details on the hyperparameters.

To find the values that work best for your data, run a hyperparameter tuning job. There are example notebooks at [https://github.com/awslabs/amazon-sagemaker-examples](https://github.com/awslabs/amazon-sagemaker-examples) that you can use for reference.

In [37]:
# This is where transfer learning happens. We start from the pre-trained model and replace its output layer
# to match the num_classes value. You can also run a hyperparameter tuning job to find the best values.
od_model.set_hyperparameters(base_network='resnet-50',
                             use_pretrained_model=1,
                             num_classes=2,
                             mini_batch_size=16,
                             epochs=30,
                             learning_rate=0.001,
                             lr_scheduler_step='10,20',
                             lr_scheduler_factor=0.1,
                             optimizer='sgd',
                             momentum=0.9,
                             weight_decay=0.0005,
                             overlap_threshold=0.5,
                             nms_threshold=0.45,
                             image_shape=512,
                             label_width=150,
                             num_training_samples=num_training_samples)
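
To see what `lr_scheduler_step` and `lr_scheduler_factor` imply, the sketch below computes the learning rate at each epoch. It assumes that the rate is multiplied by the factor once training reaches each listed epoch and that epochs are counted from zero. It also estimates the number of mini-batches per epoch from the sample count and `mini_batch_size`. The helper name and the indexing convention are assumptions; the algorithm's internal handling may differ.

```python
import math


def learning_rate_at(epoch, base_lr, steps, factor):
    """Learning rate at a zero-based epoch under a step schedule (assumed semantics)."""
    step_epochs = [int(s) for s in steps.split(",")]
    # Count how many scheduled reductions have already taken effect at this epoch.
    reductions = sum(1 for step in step_epochs if epoch >= step)
    return base_lr * factor ** reductions


for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, learning_rate_at(epoch, 0.001, "10,20", 0.1))

# Approximate mini-batches per epoch for 6130 samples and a batch size of 16.
batches_per_epoch = math.ceil(6130 / 16)
print(batches_per_epoch)
```

Under these assumptions the rate stays at its initial value for the first ten epochs, drops by a factor of ten for the next ten, and drops by another factor of ten for the last ten.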

Now that the hyperparameters are set up, let us prepare the handshake between our data channels and the algorithm. The `sagemaker.session.s3_input` objects for the training and validation manifests were created earlier; here they are put in a simple dictionary, which the algorithm consumes. Because the input is an augmented manifest streamed in Pipe mode, each channel uses the `content_type` `application/x-recordio` with `RecordIO` record wrapping. As with the [RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb), only two channels are needed here, `train` and `validation`, rather than the four channels (images and annotations for each split) that the image and JSON format requires.

In [22]:
data_channels = {'train': train_data, 'validation': validation_data}

We have our `Estimator` object, we have set its hyperparameters, and we have our data channels linked with the algorithm. The only remaining step is to train the model, which the following cell does. Training involves a few steps. First, the instances that we requested when creating the `Estimator` are provisioned and set up with the appropriate libraries. Then, the Docker container with the training code is downloaded to the instance. Once this is done, the training job begins. With `Pipe` input mode, the input data is streamed to the instance as training happens. The training logs also print the mean average precision (mAP) on the validation data, among other losses, after every epoch (one full pass over the dataset). This metric is a proxy for the quality of the model.
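
mAP, like the `overlap_threshold` and `nms_threshold` hyperparameters, rests on how much two boxes overlap, which is usually measured as intersection over union (IoU). The sketch below computes IoU for boxes given as `left`, `top`, `width`, and `height`, a box layout assumed here to match Ground Truth style annotations. It illustrates the measure and is not the algorithm's own implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as dicts with left, top, width, height."""
    a_right = box_a["left"] + box_a["width"]
    a_bottom = box_a["top"] + box_a["height"]
    b_right = box_b["left"] + box_b["width"]
    b_bottom = box_b["top"] + box_b["height"]

    # Width and height of the overlapping region; zero when the boxes do not intersect.
    inter_w = max(0, min(a_right, b_right) - max(box_a["left"], box_b["left"]))
    inter_h = max(0, min(a_bottom, b_bottom) - max(box_a["top"], box_b["top"]))
    intersection = inter_w * inter_h

    union = box_a["width"] * box_a["height"] + box_b["width"] * box_b["height"] - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: the prediction is shifted right by half the box width.
ground_truth = {"left": 0, "top": 0, "width": 100, "height": 100}
prediction = {"left": 50, "top": 0, "width": 100, "height": 100}
print(iou(ground_truth, prediction))
```

Here the boxes share half of their area, which gives an IoU of one third, so this prediction would fall below a threshold of 0.5.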

Once the job has finished, a "Job complete" message will be printed. The trained model can be found in the S3 location that was set as `output_path` in the estimator.

In [None]:
od_model.fit(inputs=data_channels, logs=True)
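
SageMaker training jobs conventionally write the model archive to `<output_path>/<job-name>/output/model.tar.gz`. The helper below builds that URI for a given job name as a rough guide to where to look. The helper name, bucket, and job name are hypothetical; in the notebook, the actual location should be available from the estimator itself (for example through `od_model.model_data`) once `fit` has returned.

```python
def expected_model_artifact_uri(output_path, job_name):
    """Build the conventional S3 URI of a training job's model archive."""
    return "{}/{}/output/model.tar.gz".format(output_path.rstrip("/"), job_name)


# Hypothetical output path and job name, used only to show the shape of the resulting URI.
uri = expected_model_artifact_uri("s3://example-bucket/demo/output", "object-detection-example-job")
print(uri)
```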