# Training Object Detection Models in SageMaker with Augmented Manifests

This notebook demonstrates how to use an "augmented manifest" to train an object detection machine learning model with Amazon SageMaker.

## Setup

Here we define S3 file paths for input and output data, retrieve the training image containing the built-in object detection algorithm, and instantiate a SageMaker session.

In [None]:
! date

In [None]:
import boto3
import re
import sagemaker
from sagemaker.image_uris import retrieve
from sagemaker import get_execution_role
import time
from time import gmtime, strftime
import json

role = get_execution_role()
sess = sagemaker.Session()
s3 = boto3.resource("s3")

training_image = retrieve(region=sess.boto_region_name, framework="object-detection", version="latest")

# S3 bucket and prefix where the training job writes its output (model artifacts).
bucket = "gov-agm-temp"
prefix = "obj_detect"
s3_output_location = "s3://{}/{}/output".format(bucket, prefix)

### Required Inputs

*Be sure to edit the file names and paths below for your own use!*

In [None]:

augmented_manifest_filename_train = (
    "training_manifest_with_validation_mod.json"  # Replace with the filename for your training data.
)
augmented_manifest_filename_validation = (
    "testing_manifest_with_validation_mod.json"  # Replace with the filename for your validation data.
)
bucket_name = "ml-materials"  # Replace with your bucket name.
s3_prefix = "object_detection_dataset/labeled_data/LabelJob1/manifests/output"  # Replace with the S3 prefix where your data files reside.
s3_output_path = "s3://{}/output".format(bucket_name)  # Replace with your desired output directory.

The setup section concludes with a few more definitions and constants.

In [None]:
# Defines paths for use in the training job request.
s3_train_data_path = "s3://{}/{}/{}".format(
    bucket_name, s3_prefix, augmented_manifest_filename_train
)
s3_validation_data_path = "s3://{}/{}/{}".format(
    bucket_name, s3_prefix, augmented_manifest_filename_validation
)

print("Augmented manifest for training data: {}".format(s3_train_data_path))
print("Augmented manifest for validation data: {}".format(s3_validation_data_path))

### Understanding the Augmented Manifest format

Augmented manifests provide two key benefits. First, the format is consistent with that of a labeling job output manifest. This means you can take the output manifests from a Ground Truth labeling job, whether the dataset objects were entirely human-labeled, entirely machine-labeled, or anything in between, and use them as inputs to SageMaker training jobs without any additional translation or reformatting. Second, the dataset objects and their corresponding ground truth labels/annotations are captured *inline*. This halves the required number of channels, since you no longer need one channel for the dataset objects alone and another for the associated ground truth labels/annotations.

The augmented manifest format is essentially the [json-lines format](http://jsonlines.org/), also called the new-line delimited JSON format. This format consists of an arbitrary number of well-formed, fully-defined JSON objects, each on a separate line. Augmented manifests must contain a field that defines a dataset object, and a field that defines the corresponding annotation. Let's look at an example for an object detection problem.

The Ground Truth output format for the various types of labeling jobs is discussed more fully in the [official documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html).

{<span style="color:blue">"source-ref"</span>: "s3://bucket_name/path_to_a_dataset_object.jpeg", <span style="color:blue">"labeling-job-name"</span>: {"annotations":[{"class_id":"0",`<bounding box dimensions>`}],"image_size":[{`<image size dimensions>`}]}

The first field will always be either `source` or `source-ref`. This defines an individual dataset object. The name of the second field depends on whether the labeling job was created from the SageMaker console or through the Ground Truth API. If the job was created through the console, the name of the field is the labeling job name. If the job was created through the API, this field maps to the `LabelAttributeName` parameter in the API.
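Because each line is a self-contained JSON object, an augmented manifest can be read with the standard library alone. The sketch below parses a small, made-up two-line manifest; the bucket, object keys, and the `labeling-job-name` attribute are placeholders, not values from this notebook's data.

```python
import json

# A hypothetical two-line augmented manifest in JSON Lines format.
# Bucket, keys, and the "labeling-job-name" attribute are placeholders.
manifest_text = (
    '{"source-ref": "s3://bucket_name/images/a.jpeg", '
    '"labeling-job-name": {"annotations": [{"class_id": 0, "left": 10, "top": 20, "width": 30, "height": 40}], '
    '"image_size": [{"width": 640, "height": 480, "depth": 3}]}}\n'
    '{"source-ref": "s3://bucket_name/images/b.jpeg", '
    '"labeling-job-name": {"annotations": [], '
    '"image_size": [{"width": 640, "height": 480, "depth": 3}]}}\n'
)


def parse_json_lines(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines(manifest_text)
for record in records:
    # Each record pairs the dataset object with its inline annotation.
    print(record["source-ref"], len(record["labeling-job-name"]["annotations"]))
```

Skipping blank lines matters because a file that ends with a newline would otherwise yield an empty, unparseable final line.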

The training job request requires a parameter called `AttributeNames`. This should be a two-element list of strings, where the first string is "source-ref", and the second string is the label attribute name from the augmented manifest. This corresponds to the <span style="color:blue">blue text</span> in the example above. In this case, we would define `attribute_names = ["source-ref", "labeling-job-name"]`.

*Be sure to carefully inspect your augmented manifest so that you can define the `attribute_names` variable below.*
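If you are unsure of the label attribute name, one way to find it is to look for the key that has a matching `-metadata` companion, as in Ground Truth output manifests. The helper below, `guess_attribute_names`, is hypothetical and relies on that naming heuristic; it is not a SageMaker API, so verify its answer against the manifest preview.

```python
import json


def guess_attribute_names(manifest_line):
    """Guess the AttributeNames list from one augmented-manifest line.

    Heuristic (an assumption, not a SageMaker API): the label is stored under
    some attribute name and its metadata under "<attribute name>-metadata",
    so we look for the key that has such a companion key.
    """
    record = json.loads(manifest_line)
    labels = [key for key in record if f"{key}-metadata" in record]
    if len(labels) != 1:
        raise ValueError(f"Expected exactly one label attribute, found {labels}")
    source_key = "source-ref" if "source-ref" in record else "source"
    return [source_key, labels[0]]


# Placeholder record shaped like the example above.
line = json.dumps({
    "source-ref": "s3://bucket_name/path_to_a_dataset_object.jpeg",
    "labeling-job-name": {"annotations": [], "image_size": []},
    "labeling-job-name-metadata": {"type": "groundtruth/object-detection"},
})
print(guess_attribute_names(line))  # ['source-ref', 'labeling-job-name']
```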

### Preview Input Data

Let's read the augmented manifest so we can inspect its contents to better understand the format.

In [None]:
# Build the object key directly from the prefix and filename.
augmented_manifest_s3_key = "{}/{}".format(s3_prefix, augmented_manifest_filename_train)
s3_obj = s3.Object(bucket_name, augmented_manifest_s3_key)
augmented_manifest = s3_obj.get()["Body"].read().decode("utf-8")
# Drop blank lines (e.g., a trailing newline) so they are not counted as samples.
augmented_manifest_lines = [line for line in augmented_manifest.splitlines() if line.strip()]

# Compute number of training samples for use in training job request.
num_training_samples = len(augmented_manifest_lines)

def json_pretty_print(jsonline):
    return json.dumps(json.loads(jsonline), indent=2)

print("Preview of Augmented Manifest File Contents")
print("-------------------------------------------")
print("\n")

for i in range(2):
    print("Line {}".format(i + 1))
    print(json_pretty_print(augmented_manifest_lines[i]))
    print("\n")

The key feature of the augmented manifest is that it contains both the data object itself (i.e., the image) and its annotation in-line, in a single JSON object. Note that the `annotations` field contains the dimensions and coordinates (left, top, width, height) of each bounding box. The augmented manifest can contain an arbitrary number of lines, as long as each line adheres to this format.

Let's discuss this format in more detail by describing each field of this JSON object.

* The `source-ref` field defines a single dataset object, which in this case is an image over which bounding boxes should be drawn. Note that the name of this field is arbitrary. 
* The label attribute field (`labeling-job-name` in the example above) defines the ground truth bounding box annotations that pertain to the image identified in the `source-ref` field. As mentioned above, the name of this field is arbitrary. You must take care to define this field in the `AttributeNames` parameter of the training job request, as shown later on in this notebook.
* Because this example augmented manifest was generated through a Ground Truth labeling job, it also contains an additional field named after the label attribute with a `-metadata` suffix (e.g., `labeling-job-name-metadata`). This field contains various pieces of metadata from the labeling job that produced the bounding box annotation(s) for the associated image, e.g., the creation date, confidence scores for the annotations, etc. This field is ignored during the training job, so it is safe to leave it in the augmented manifest you supply; this makes it as easy as possible to go from a Ground Truth labeling job to a trained SageMaker model.
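To sanity-check the annotations, you can convert each Ground Truth box from pixel `left`/`top`/`width`/`height` into corner coordinates normalized by `image_size`. This is a minimal sketch assuming the annotation layout shown in the preview; `annotations_to_corners` is a hypothetical helper, not the algorithm's own internal conversion.

```python
def annotations_to_corners(label):
    """Convert Ground Truth-style boxes to (class_id, xmin, ymin, xmax, ymax).

    `label` is the value stored under the label attribute name. Boxes are
    given in pixels as left/top/width/height and are normalized by image_size.
    """
    size = label["image_size"][0]
    img_w, img_h = size["width"], size["height"]
    corners = []
    for box in label["annotations"]:
        xmin = box["left"] / img_w
        ymin = box["top"] / img_h
        xmax = (box["left"] + box["width"]) / img_w
        ymax = (box["top"] + box["height"]) / img_h
        corners.append((int(box["class_id"]), xmin, ymin, xmax, ymax))
    return corners


label = {
    "annotations": [{"class_id": 0, "left": 64, "top": 48, "width": 128, "height": 96}],
    "image_size": [{"width": 640, "height": 480, "depth": 3}],
}
print(annotations_to_corners(label))  # [(0, 0.1, 0.1, 0.3, 0.3)]
```

Normalized values outside the range [0, 1] would indicate a box that extends past the image edge.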

In [None]:
attribute_names = ["source-ref", "nugdms3-train_BB"]  # Replace as appropriate for your augmented manifest.

## Create Training Job

First, we'll check that `attribute_names` has been set to match your augmented manifest before constructing the training job request.

In [None]:
try:
    if attribute_names == ["source-ref", "XXXX"]:
        raise Exception(
            "The 'attribute_names' variable is set to default values. Please check your augmented manifest file for the label attribute name and set the 'attribute_names' variable accordingly."
        )
except NameError:
    raise Exception(
        "The attribute_names variable is not defined. Please check your augmented manifest file for the label attribute name and set the 'attribute_names' variable accordingly."
    )


Now we define the Amazon SageMaker estimator that will run the training job.

In [None]:
od_model = sagemaker.estimator.Estimator(
    training_image,
    role,
    instance_count=1,
    instance_type="ml.p3.8xlarge",
    volume_size=50,
    max_run=360000,
    input_mode="Pipe",
    output_path=s3_output_location,
    sagemaker_session=sess,
)

In [None]:

od_model.set_hyperparameters(
    base_network="resnet-50",
    use_pretrained_model=1,
    num_classes=3,
    mini_batch_size=32,
    epochs=200,
    learning_rate=0.01,
    optimizer="sgd",
    image_shape=512,
    num_training_samples=str(num_training_samples),
)

In [None]:
from sagemaker.inputs import TrainingInput

# Both channels stream augmented manifests (Pipe mode); each record's
# attributes, as named in attribute_names, are wrapped in RecordIO.
train_data = TrainingInput(
    s3_train_data_path,
    distribution="FullyReplicated",
    content_type="application/x-recordio",
    s3_data_type="AugmentedManifestFile",
    record_wrapping="RecordIO",
    attribute_names=attribute_names,
)

validation_data = TrainingInput(
    s3_validation_data_path,
    distribution="FullyReplicated",
    content_type="application/x-recordio",
    s3_data_type="AugmentedManifestFile",
    record_wrapping="RecordIO",
    attribute_names=attribute_names,
)

data_channels = {"train": train_data, "validation": validation_data}

In [None]:
%%time
od_model.fit(inputs=data_channels, logs=False)

## Deploy

In [None]:
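from sagemaker.serializers import IdentitySerializer
from sagemaker.deserializers import JSONDeserializer

# Host the trained model on a real-time endpoint: requests are raw PNG bytes, responses are JSON.
object_detector = od_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    serializer=IdentitySerializer("image/png"),
    deserializer=JSONDeserializer(),
)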

## Realtime Predictions

Let's first set up some helpers to visualize the results. The model works with numbers, not names: each prediction carries the numeric class ID that was assigned when we provided the label data, i.e.:

* 0 - Person
* 1 - StackedBoxes
* 2 - Forklift

In [None]:
# List of the mappings from label number to label
object_categories = ['Person', 'StackedBoxes', 'Forklift']

We have added a set of utilities to help visualize the results; they are all contained in the included `visualize_utils.py` file.

In [None]:
# Load the utilities that help us visualize the prediction results.
import visualize_utils

Now let's run a prediction against the endpoint and look at the results (JSON output).

In [None]:
# Read one source image from S3 as raw bytes and send it to the endpoint.
image_bucket = s3.Bucket("ml-materials")
img = image_bucket.Object("object_detection_dataset/source_images/image_0000240.5.png").get()["Body"].read()
object_detector.predict(img)

The results are returned as a JSON object whose `prediction` key holds a list of lists. Each inner list contains six numbers:

1. **Class label** (the numeric class ID)
2. **Confidence** score
3. **xmin**: left edge of the bounding box (fraction of image width)
4. **ymin**: top edge of the bounding box (fraction of image height)
5. **xmax**: right edge of the bounding box (fraction of image width)
6. **ymax**: bottom edge of the bounding box (fraction of image height)

As you can see, you get many results with a wide range of confidence scores, so we need to filter for the ones we are interested in. We do this by defining a threshold (e.g., only show entries with confidence > 0.2).

Let's view a few images and their predictions with a threshold of 0.2 (>20% confidence).
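Filtering by confidence amounts to keeping the rows whose second element exceeds the threshold. The sketch below assumes the deserialized response is a dict with a `prediction` key holding `[class_id, confidence, xmin, ymin, xmax, ymax]` rows; `filter_detections` is a hypothetical helper, and the sample numbers are invented for illustration.

```python
def filter_detections(response, threshold=0.2):
    """Keep detections whose confidence (second element) exceeds `threshold`.

    `response` is assumed to be the deserialized endpoint output: a dict whose
    "prediction" key holds [class_id, confidence, xmin, ymin, xmax, ymax] rows.
    """
    return [det for det in response["prediction"] if det[1] > threshold]


sample_response = {  # Made-up numbers for illustration only.
    "prediction": [
        [0.0, 0.91, 0.10, 0.20, 0.35, 0.80],
        [2.0, 0.45, 0.50, 0.40, 0.90, 0.95],
        [1.0, 0.05, 0.00, 0.00, 0.10, 0.10],
    ]
}
print(len(filter_detections(sample_response, 0.2)))  # 2
```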

In [None]:
#test_photos = ['image_0000112.0.png','image_0000130.0.png','image_0000140.5.png','image_0000231.5.png']
test_photos = ['image_0000212.0.png','image_0000230.0.png','image_0000240.5.png','image_0000241.5.png']

In [None]:
visualize_utils.predictions_image_list(object_detector, test_photos, 0.2)

Let's use a different function to view ranges of image frames with the same threshold.

In [None]:
video_seq_list = ["197-217","235-249"]

In [None]:
visualize_utils.predictions_image_sequences(object_detector, video_seq_list, 0.2)

Let's now view the raw output predictions, which you would analyze across time sequences to assess the action taking place.

We will look at the video frame sequence list we defined earlier. Before running any predictions, let's make sure each image exists, or find one nearby.
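The naming scheme used by the existence check can be written as a pure function, which makes it easy to test without touching S3. `candidate_frame_names` is a hypothetical helper that assumes this notebook's `image_<7-digit frame>.0.png` / `.5.png` convention and a step of 2, matching the loop in the next cell.

```python
def candidate_frame_names(seq, step=2):
    """Yield (primary, fallback) filenames for each frame in a "start-end" range.

    Assumes the naming used in this notebook (e.g., image_0000240.0.png and
    image_0000240.5.png). As with range(), the end of the range is exclusive.
    """
    start, end = (int(part) for part in seq.split("-"))
    for v in range(start, end, step):
        yield f"image_{v:07}.0.png", f"image_{v:07}.5.png"


print(list(candidate_frame_names("235-239")))
```

Like `range()`, the end of each range is exclusive, so frame 249 in `"235-249"` is not checked.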

In [None]:
na = []  # Frame filenames that were found in S3.
for times in video_seq_list:
    tstart, tend = times.split("-")
    for v in range(int(tstart), int(tend), 2):
        # Try the ".0" frame first, then fall back to the ".5" frame.
        fname = f"image_{v:07}.0.png"
        checkfile = visualize_utils.s3_objexist("ml-materials", f"object_detection_dataset/source_images/{fname}")
        if not checkfile:
            fname = f"image_{v:07}.5.png"
            checkfile = visualize_utils.s3_objexist("ml-materials", f"object_detection_dataset/source_images/{fname}")
        print(f"s3://ml-materials/object_detection_dataset/source_images/{fname} {checkfile}")
        # Only keep frames that actually exist, so the prediction loop does not fail.
        if checkfile:
            na.append(fname)

Now that we have a list of images that exist, let's run predictions and view the results.<br>
The `only_predict` function runs the prediction, and the `verbalize_results` function outputs the interpreted label name, the absolute coordinates of the bounding box, and the confidence of each prediction.

In [None]:
for image in na:
    results, detection_filtered = visualize_utils.only_predict(image, object_detector, threshold=0.2)
    print(image)
    visualize_utils.verbalize_results(image, results, 0.3)
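The source of `visualize_utils.verbalize_results` is not shown here, but turning a single detection into readable text amounts to looking up the class name and scaling the normalized corners by the image size. `describe_detection` is a hypothetical stand-in written under that assumption, using the same category order as `object_categories`; it is not the implementation in `visualize_utils.py`.

```python
# Same order as object_categories: the list index is the numeric class ID.
OBJECT_CATEGORIES = ["Person", "StackedBoxes", "Forklift"]


def describe_detection(det, img_width, img_height, categories=OBJECT_CATEGORIES):
    """Turn one [class_id, confidence, xmin, ymin, xmax, ymax] row into text.

    Assumes normalized corner coordinates and scales them to pixels.
    """
    class_id, score, xmin, ymin, xmax, ymax = det
    box = (
        round(xmin * img_width),
        round(ymin * img_height),
        round(xmax * img_width),
        round(ymax * img_height),
    )
    return f"{categories[int(class_id)]}: box={box} confidence={score:.2f}"


print(describe_detection([2.0, 0.45, 0.5, 0.25, 0.75, 1.0], 640, 480))
# Forklift: box=(320, 120, 480, 480) confidence=0.45
```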

## Delete Endpoint

In [None]:
object_detector.delete_endpoint()

# Conclusion

That's it! Let's review what we've learned.
* Augmented manifests are a new format that provides a seamless interface between Ground Truth labeling jobs and SageMaker training jobs.
* In augmented manifests, you specify the dataset objects and the associated annotations in-line.
* Be sure to pay close attention to the `AttributeNames` parameter in the training job request. The strings you specify in this field must correspond to those that are present in your augmented manifest.

In [None]:
! date