# Object Detection with AWS: A Demo with Boots and Cats

This series of notebooks demonstrates tackling a sample computer vision problem on AWS - building a two-class object detector for [boots and cats](https://www.youtube.com/watch?v=Nni0rTLg5B8).

**This notebook** walks through using the [SageMaker Ground Truth](https://aws.amazon.com/sagemaker/groundtruth/) tool to annotate training and validation data sets.

**Follow-on** notebooks show how to train a range of models from the created dataset, including:

* [Amazon Rekognition](https://aws.amazon.com/rekognition/)'s new [custom labels](https://aws.amazon.com/rekognition/custom-labels-features/) functionality, announced at re:Invent 2019
* SageMaker's [built-in object detection algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html)
* Custom algorithms implemented on SageMaker's **framework containers**, such as MXNet and TensorFlow.

# Boots 'n' Cats 1: Introduction and Data Preparation

## Acknowledgements

We use the [**Open Images Dataset v4**](https://storage.googleapis.com/openimages/web/download_v4.html) as a convenient source of pre-curated images. The Open Images Dataset V4 is created by Google Inc. We have not modified the images or the accompanying annotations. You can obtain the images and the annotations [here](https://storage.googleapis.com/openimages/web/download_v4.html). The annotations are licensed by Google Inc. under CC BY 4.0 license. The images are listed as having a CC BY 2.0 license. The following paper describes Open Images V4 in depth: from the data collection and annotation to detailed statistics about the data and evaluation of models trained on it.

A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, T. Duerig, and V. Ferrari. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982, 2018. ([link to PDF](https://arxiv.org/abs/1811.00982))

## Pre-requisites

This notebook is designed to be run in Amazon SageMaker. To complete this workshop (and understand what's going on), you'll need:

* Basic familiarity with Python, [Amazon S3](https://docs.aws.amazon.com/s3/index.html), [Amazon SageMaker](https://aws.amazon.com/sagemaker/), and the [AWS Command Line Interface (CLI)](https://aws.amazon.com/cli/).
* To run in **a region where [Rekognition Custom Labels](https://aws.amazon.com/rekognition/custom-labels-features/) is available** (currently US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland)), if you plan to explore this feature.
* Sufficient [SageMaker quota limits](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_sagemaker) set on your account to run GPU-accelerated training jobs.


## Cost and runtime

Depending on your configuration, this demo may consume resources outside of the free tier - but should not generally be expensive, as we'll be training on a small number of images. You might wish to review the following for your region:

* [Amazon SageMaker pricing](https://aws.amazon.com/sagemaker/pricing/)
* [SageMaker Ground Truth pricing](https://aws.amazon.com/sagemaker/groundtruth/pricing/)
* [Amazon Rekognition pricing](https://aws.amazon.com/rekognition/pricing/)

The standard `ml.t2.medium` instance should be sufficient to run the notebooks.

We will use GPU-accelerated instance types for training and hyperparameter optimization, and use spot instances where appropriate to optimize these costs.

As noted in the step-by-step guidance, you should take particular care to delete any created SageMaker real-time prediction endpoints when finishing the demo.


## Step 0: Dependencies and configuration

As usual we'll start by loading libraries, defining configuration, and connecting to the AWS SDKs.


In [None]:
%load_ext autoreload
%autoreload 1

# Built-Ins:
import os
import json
import warnings

# External Dependencies:
import boto3
import sagemaker
from IPython.display import display, HTML

# Local Dependencies:
# We define some functions in the `util` folder to simplify data preparation and
# visualization for the notebook.
%aimport util


Next we configure the name and layout of your bucket, and the annotation job to set up.

**If you're following this demo in a group:** you can pool your annotations for better accuracy without spending hours annotating:

* Have each group member set a different `BATCH_OFFSET` (from 0, in increments of `N_EXAMPLES_PER_CLASS`), and you'll be allocated different images to annotate.
* Later, you can *import* the other members' output manifest files to your own S3 data set.

**If not: don't worry** - we already provide a 100-image set in this repository to augment your annotations!
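
As a rough illustration of the offset scheme described above, each group member's `BATCH_OFFSET` could be worked out from their position in the group. The `member_offsets` helper is hypothetical (it is not part of this repository's `util` package), and the group size of 3 is just an example:

```python
def member_offsets(n_members, n_examples_per_class):
    """Return one BATCH_OFFSET per group member: 0, N, 2N, ..."""
    return [i * n_examples_per_class for i in range(n_members)]


# Example: a group of three, each annotating 20 examples per class
offsets = member_offsets(n_members=3, n_examples_per_class=20)
print(offsets)  # [0, 20, 40]
```

Each member would then set `BATCH_OFFSET` to their own entry in the list before running the configuration cell.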


In [None]:
## Overall S3 bucket layout:
BUCKET_NAME = sagemaker.Session().default_bucket()  # (Or an existing bucket's name, if you prefer)
%store BUCKET_NAME
DATA_PREFIX = "data"  # The folder in the bucket (and locally) where we will store data
%store DATA_PREFIX
MODELS_PREFIX = "models"  # The folder in the bucket where we will store models
%store MODELS_PREFIX
CHECKPOINTS_PREFIX = "models/checkpoints"  # Model checkpoints can go in a subfolder of models
%store CHECKPOINTS_PREFIX

## Annotation job:
CLASS_NAMES = ["Boot", "Cat"]
%store CLASS_NAMES
N_EXAMPLES_PER_CLASS = 20
BATCH_OFFSET = 0
BATCH_NAME = "my-annotations"

# Note that some paths are reserved, restricting your choice of BATCH_NAME:
data_raw_prefix = f"{DATA_PREFIX}/raw"
data_augment_prefix = f"{DATA_PREFIX}/augmentation"
data_batch_prefix = f"{DATA_PREFIX}/{BATCH_NAME}"
test_image_folder = f"{DATA_PREFIX}/test"
%store test_image_folder


Here we just connect to the AWS SDKs we'll use, and validate the choice of S3 bucket:

In [None]:
role = sagemaker.get_execution_role()
session = boto3.session.Session()
region = session.region_name
s3 = session.resource("s3")
bucket = s3.Bucket(BUCKET_NAME)
smclient = session.client("sagemaker")

bucket_region = \
    session.client("s3").head_bucket(Bucket=BUCKET_NAME) \
    ["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]
assert (
    bucket_region == region
), f"Your S3 bucket {BUCKET_NAME} and this notebook need to be in the same AWS region."

if region not in ("eu-west-1", "us-east-1", "us-east-2", "us-west-2"):
    warnings.warn(
        f"**WARNING:**\nCurrent region {region} is not yet supported by Rekognition Custom Labels!"
    )


## Step 1: Set the goalposts with some unlabelled target data

Let's start out by collecting a handful of images from around the web to illustrate what we'd like to detect.

These images are not licensed, and the links may break in different regions or over time. Feel free to add your own or replace them with any other images of boots and cats! Model evaluations in the following notebooks will loop through all images in the `test_image_folder`.

In [None]:
os.makedirs(test_image_folder, exist_ok=True)
!wget -O $test_image_folder/tabby.jpg https://images.fineartamerica.com/images-medium-large-5/1990s-ginger-and-white-tabby-cat-animal-images.jpg
!wget -O $test_image_folder/beatbox.jpg https://midnightmusic.com.au/wp-content/uploads/2014/08/How-to-beatbox-5001.png
!wget -O $test_image_folder/ampersand.jpg https://i.ytimg.com/vi/DsC5hNYpP9Y/maxresdefault.jpg
!wget -O $test_image_folder/boots.jpg https://d28m5bx785ox17.cloudfront.net/v1/img/w4r1gr5IKcC9tTcJG_vsJVbyjZ_SVKuFf3YBxtrGdFs=/d/l
!wget -O $test_image_folder/cats.jpg https://www.dw.com/image/42582511_401.jpg

for test_image in os.listdir(test_image_folder):
    display(HTML(f"<h4>{test_image}</h4>"))
    util.visualize_detection(f"{test_image_folder}/{test_image}", [], [])


## Step 2: Map our class names to OpenImages class IDs

OpenImages defines a hierarchy of object types (e.g. "swan" is a subtype of "bird"), and references each with a class ID instead of the human-readable name.

Since we want to find images containing boots and cats, our first job is to figure out which OpenImages class IDs they correspond to.
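
To make the idea of "a class plus all its subtypes" concrete, here is a minimal sketch of walking a nested hierarchy. It is not the implementation in `util.openimages`; the `collect_class_ids` helper, the toy ontology, and the made-up IDs are all illustrative, and the `LabelName` / `Subcategory` keys are an assumption about how the hierarchy file is structured:

```python
def collect_class_ids(node, target_id, inside=False):
    """Return target_id plus the IDs of all its subtypes in a nested ontology."""
    inside = inside or node["LabelName"] == target_id
    found = {node["LabelName"]} if inside else set()
    for child in node.get("Subcategory", []):
        found |= collect_class_ids(child, target_id, inside)
    return found


# Toy ontology with made-up IDs (illustrative only, not real Open Images IDs)
toy_ontology = {
    "LabelName": "id-entity",
    "Subcategory": [
        {
            "LabelName": "id-bird",
            "Subcategory": [{"LabelName": "id-swan"}, {"LabelName": "id-duck"}],
        },
        {"LabelName": "id-cat"},
    ],
}
# Human-readable name -> class ID lookup (again, made up for this sketch)
toy_descriptions = {"Bird": "id-bird", "Swan": "id-swan", "Cat": "id-cat"}

bird_ids = collect_class_ids(toy_ontology, toy_descriptions["Bird"])
print(sorted(bird_ids))  # ['id-bird', 'id-duck', 'id-swan']
```

Searching for "Bird" returns the bird ID together with its swan and duck subtypes, which is why each of our class names maps to a *set* of IDs rather than a single one.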

In [None]:
# Download and process the Open Images metadata:
annotations, class_descriptions, ontology = util.openimages.download_openimages_metadata(data_raw_prefix)

# Map our configured CLASS_NAMES to sets of Open Images class IDs:
class_id_sets = util.openimages.class_names_to_openimages_ids(CLASS_NAMES, class_descriptions, ontology)
print(class_id_sets)


## Step 3: Find suitable example images

Now we've looked up the full range of applicable label IDs, we can use the OpenImages annotations to extract which image IDs will be interesting for us to train on (i.e. they contain boots and/or cats).
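
The selection step can be pictured roughly as follows. This is a simplified sketch for a single class, assuming annotations arrive as `(image_id, label_id)` pairs; the `select_image_ids` helper and the toy data are hypothetical, and the real `util.openimages.list_images_containing` function may work differently:

```python
def select_image_ids(annotations, class_ids, n_examples, skip=frozenset(), offset=0):
    """Pick up to n_examples distinct image IDs labelled with any of class_ids.

    Images in `skip` are ignored, and the first `offset` matches are passed over
    so that different annotators can be given different images.
    """
    matches = []
    for image_id, label_id in annotations:
        if label_id in class_ids and image_id not in skip and image_id not in matches:
            matches.append(image_id)
    return matches[offset:offset + n_examples]


# Toy (image_id, label_id) pairs with made-up IDs
toy_annotations = [
    ("img1", "id-cat"), ("img1", "id-cat"), ("img2", "id-boot"),
    ("img3", "id-cat"), ("img4", "id-cat"), ("img5", "id-dog"),
]

print(select_image_ids(toy_annotations, {"id-cat"}, n_examples=2, skip={"img3"}))
print(select_image_ids(toy_annotations, {"id-cat"}, n_examples=2, skip={"img3"}, offset=1))
```

With the same inputs, a different `offset` yields a different slice of the matching images, which is the behaviour the group annotation setup relies on.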


In [None]:
# Skip these images with known bad quality content:
SKIP_IMAGES = { "251d4c429f6f9c39", "065ad49f98157c8d" }

image_ids = util.openimages.list_images_containing(
    class_id_sets,
    annotations,
    N_EXAMPLES_PER_CLASS,
    SKIP_IMAGES,
    BATCH_OFFSET
)
print(f"Found {len(image_ids)} images")
print(image_ids)


## Step 4: Upload images and manifest file to S3

We need our training image data in an accessible S3 bucket, plus a [manifest](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-input.html) file that tells SageMaker Ground Truth (and later our model) which images are in the data set and where to find them.

In the following cell, we:

* Copy each identified image directly from the OpenImages repository to our bucket
* Build up a local manifest file listing all the images
* Upload the manifest file to the bucket

This process should only take a few seconds with small data sets like we're dealing with here.
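
For reference, an input manifest is just one JSON object per line, each with a `source-ref` pointing at an image in S3. The sketch below builds such lines in memory; the `build_manifest_lines` helper, the bucket name `example-bucket`, and the image keys are placeholders for illustration only:

```python
import json


def build_manifest_lines(bucket_name, image_keys):
    """Return one JSON Lines record per image, each holding a `source-ref` S3 URI."""
    return [
        json.dumps({"source-ref": f"s3://{bucket_name}/{key}"})
        for key in image_keys
    ]


# Placeholder bucket and keys, for illustration only
lines = build_manifest_lines(
    "example-bucket",
    ["data/my-annotations/images/img1.jpg", "data/my-annotations/images/img2.jpg"],
)
manifest_text = "\n".join(lines) + "\n"
print(manifest_text, end="")
```

Because every line is an independent JSON document, the file can be written (and later read) one record at a time.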

In [None]:
os.makedirs(f"{data_batch_prefix}/manifests", exist_ok=True)
input_manifest_loc = f"{data_batch_prefix}/manifests/input.manifest"

with open(input_manifest_loc, "w") as f:
    print("Copying images", end="")
    # TODO: Delete existing folder contents?
    for image_id in image_ids:
        print(".", end="")
        dest_key = f"{data_batch_prefix}/images/{image_id}.jpg"
        bucket.copy(
            {
                "Bucket": "open-images-dataset",
                "Key": f"test/{image_id}.jpg"
            },
            dest_key
        )
        f.write(json.dumps({ "source-ref": f"s3://{BUCKET_NAME}/{dest_key}" }) + "\n")
    print("")
    print(f"Images copied to s3://{BUCKET_NAME}/{data_batch_prefix}/images/")

bucket.upload_file(input_manifest_loc, input_manifest_loc)
print(f"Manifest uploaded to s3://{BUCKET_NAME}/{input_manifest_loc}")


## Step 5: Set up the SageMaker Ground Truth labelling job

Now that our images and a manifest file listing them are ready in S3, we'll set up the Ground Truth labelling job **in the [AWS console](https://console.aws.amazon.com)**.

Under *Services* go to *Amazon SageMaker*, and select *Ground Truth > Labeling Jobs* from the side-bar menu on the left.

**Note:** These steps assume you've either never used SageMaker Ground Truth before, or have already set up a Private Workforce that will be suitable for this task. If you have one or more private workforces configured already, but none of them are appropriate for this task, you'll need to go to *Ground Truth > Labeling workforces* **first** to create a new one.

### Job Details

Click the **Create labeling job** button, and you'll be asked to specify job details as follows:

* **Job name:** Choose a name to identify this labelling job, e.g. `boots-and-cats-batch-0`
* **Input data location:** The path to the input manifest file in S3 (see output above)
* **Output data location:** Set this just to the parent folder of the input manifests (e.g. *s3://gt-object-detect-thewsey-us-east-1/data/my-annotations*)
* **IAM role:** If you're not sure whether your existing roles have sufficient permissions for Ground Truth, select the options to create a new role
* **Task type:** Image > Bounding box

<img src="BlogImages/JobDetailsIntro.png"/>

All other settings can be left as default. Record your choices for the label name and output data location below, because we'll need these later:

In [None]:
my_groundtruth_job_name = "boots-and-cats-batch-0"  # TODO: Replace with the job name you chose
my_groundtruth_labels = my_groundtruth_job_name  # Shouldn't need to change, if you left this as default
my_groundtruth_output = f"s3://{BUCKET_NAME}/data/my-annotations"  # TODO: **No trailing slash!**


### Workers

On the next screen, we'll configure **who** will annotate our data: Ground Truth allows you to define your own in-house *Private Workforces*; use *Vendor Managed Workforces* for specialist tasks; or use the public workforce provided by *Amazon Mechanical Turk*.

Select **Private** worker type, and you'll be prompted either to select from your existing private workforces, or create a new one if none exist.

If you need to create a new private workforce, simply follow the UI workflow with default settings. It doesn't matter what you call the workforce, and you can create a new Cognito User Group to define it. **Add yourself** to the user pool by adding your email address. You should shortly receive a confirmation email with a temporary password and a link to access the annotation portal.

Automatic data labeling is applicable only for data sets over 1000 samples, so leave this turned **off** for now.

<img src="BlogImages/SelectPrivateWorkforce.png"/>

### Labeling Tool

Since you'll be labelling the data yourself, a brief description of the task should be fine in this case. When using real workforces, it's important to be really clear in this section about the task requirements and best practices - to ensure consistency of annotations between human workers.

For example: In the common case where we see a *pair* of boots from the side and one is almost entirely obscured, how should the image be annotated? Should *model* cats count, or only real ones?

The most important configuration here is to set the *options* to be the same as our `CLASS_NAMES` and in the same order: **Boot, Cat**

<img src="BlogImages/LabellingToolSetup.png"/>

Take some time to explore the other options for configuring the annotation tool; and when you're ready click "Create" to launch the labeling job.

## Step 6: Label those images!

Follow the link you received in your workforce invitation email to the workforce's **labelling portal**, and log in with the default password given in the email (which you'll be asked to change).

If you lose the portal link, you can always retrieve it from the *Ground Truth > Labeling workforces* menu in the SageMaker console: it's shown near the top of the private workforces summary.

New jobs can sometimes take a minute or two to appear for workers, but you should soon see a screen like the one below. Select the job and click "Start working" to enter the labelling tool.

<img src="BlogImages/LabellingJobsReady.png"/>

Note that you can check on the progress of labelling jobs through the APIs as well as in the AWS console:

In [None]:
smclient.describe_labeling_job(LabelingJobName=my_groundtruth_job_name)["LabelingJobStatus"]


Label all the images in the tool by selecting the class and drawing boxes around the objects. When you're done, you will be brought back to the (now empty) jobs list screen above.

It may take a few seconds after completing for the job status to update in the AWS console.
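
If you'd rather not refresh the console by hand, a simple polling loop can wait for the status to settle. This is only a sketch: `wait_for_labeling_job` is a hypothetical helper, the poll interval and limit are arbitrary, and a stand-in function replaces the real API call so the example runs without AWS access:

```python
import time

# Statuses after which a labeling job will not change any further
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_labeling_job(describe_fn, job_name, poll_seconds=30, max_polls=120):
    """Poll describe_fn until the job reaches a terminal status, then return it."""
    for _ in range(max_polls):
        status = describe_fn(LabelingJobName=job_name)["LabelingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Labeling job {job_name} not finished after {max_polls} polls")


# Stand-in for smclient.describe_labeling_job, so the sketch runs offline:
_fake_statuses = iter(["InProgress", "InProgress", "Completed"])


def fake_describe(LabelingJobName):
    return {"LabelingJobStatus": next(_fake_statuses)}


print(wait_for_labeling_job(fake_describe, "boots-and-cats-batch-0", poll_seconds=0))
```

In this notebook, the equivalent call would pass `smclient.describe_labeling_job` and `my_groundtruth_job_name` in place of the stand-ins.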

When the job shows as complete, run the code below to **download your results:**


In [None]:
# SageMaker Ground Truth will save your actual output manifest to here:
smgt_output_manifest_uri = f"{my_groundtruth_output}/{my_groundtruth_job_name}/manifests/output/output.manifest"
smgt_output_bucket, my_smgt_output_key = util.smgt.s3_uri_to_bucket_and_key(smgt_output_manifest_uri)

# That path's bonkers convoluted, so let's have our local copy somewhere a little more concise:
my_smgt_output_path = f"{data_batch_prefix}/manifests/output.manifest"

print(f"Downloading output manifest:\n{smgt_output_manifest_uri}")
bucket.download_file(my_smgt_output_key, my_smgt_output_path)
print(f"\nGot: {my_smgt_output_path}")

print(f"\nContents:")
with open(my_smgt_output_path, "r") as f:
    print(f.readline()[:-1]) # (Strip trailing newline)
print("...etc.")


## Step 7: Import an additional pre-labelled dataset

This repository contains an example output manifest (100 images) which we can use to augment our data set and improve our model's accuracy (or in case you couldn't finish your labelling job!).

Of course, somebody else's manifest will reference files in their bucket - that we probably don't have access to... So here we **import** these refs (openimages JPEGs) to our own bucket, and create a new manifest file with the updated links.
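
The manifest-rewriting half of that import can be sketched in a few lines. This illustration only updates the `source-ref` links and does **not** copy any files (the `util.smgt` helpers used in the next cell handle that); the `translate_refs` function, bucket names, and sample record are hypothetical:

```python
import json


def translate_refs(manifest_lines, new_prefix):
    """Point each record's source-ref at new_prefix, keeping the original filename."""
    translated = []
    for line in manifest_lines:
        record = json.loads(line)
        filename = record["source-ref"].rpartition("/")[2]
        record["source-ref"] = new_prefix + filename
        translated.append(json.dumps(record))
    return translated


# A made-up record referencing somebody else's bucket
source_lines = [
    json.dumps({
        "source-ref": "s3://someone-elses-bucket/data/images/img1.jpg",
        "labels": {"annotations": []},
    })
]

updated = translate_refs(source_lines, "s3://example-bucket/data/augmentation/images/")
print(updated[0])
```

All other fields in each record (such as the annotations) pass through untouched, so the labels stay attached to the relocated images.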


In [None]:
# Filename to output the translated manifest to (we'll need this later):
augment_manifest_path = f"{data_augment_prefix}/manifests/output/output.updated.manifest"

# Function which, given a ref, will upload the file to our bucket and return the new ref:
import_fn = util.smgt.ManifestRefImporter(
    source=lambda s: s.rpartition("/")[2],
    target=f"s3://{BUCKET_NAME}/{data_augment_prefix}/images/",
    repository="s3://open-images-dataset/test/",
    session=session,
)

# Translate the manifest:
print("Importing augmentation manifest refs...")
util.smgt.translate_manifest_refs(
    source_manifest=f"{data_augment_prefix}/manifests/output/output.manifest",
    target_manifest=augment_manifest_path,
    translator_fn=import_fn,
)
print(f"Augmentation manifest saved to {augment_manifest_path}")

print(f"\nContents:")
with open(augment_manifest_path, "r") as f:
    print(f.readline()[:-1]) # (Strip trailing newline)
print("...etc.")


In [None]:
# If you wanted to import some other manifest file too:
# util.smgt.translate_manifest_refs(
#     source_manifest="data/[SUBFOLDER?]/manifests/output/output.manifest", # Existing manifest file
#     target_manifest="data/[SUBFOLDER?]/manifests/output/output.updated.manifest", # New manifest file
#     translator_fn=import_fn,
# )


## Step 8: Merge the annotated datasets

Now that our own labelling job is complete and we've imported other data to augment our data-set, we'll consolidate the batches together into a single combined manifest file.

Since each annotation job may have had different names, and stored its labels to different fields in the output manifest, our merge will standardize the data to the `"labels"` field:
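
Conceptually, the standardization amounts to renaming each batch's label field (and its matching `-metadata` field, if present) to a common name before concatenating. The sketch below shows that idea with hypothetical helpers and toy records; it is not the code behind `util.smgt.merge_manifests`:

```python
import random


def standardize_label_field(record, source_field, target_field="labels"):
    """Copy a record, renaming its label field (and metadata) to target_field."""
    out = {"source-ref": record["source-ref"], target_field: record[source_field]}
    meta_key = f"{source_field}-metadata"
    if meta_key in record:
        out[f"{target_field}-metadata"] = record[meta_key]
    return out


def merge_records(target_field, *batches, shuffle=False, seed=None):
    """Merge (records, label_field) batches into one list using target_field."""
    merged = [
        standardize_label_field(record, field, target_field)
        for records, field in batches
        for record in records
    ]
    if shuffle:
        random.Random(seed).shuffle(merged)
    return merged


# Toy batches whose labels were stored under different field names
batch_a = [{"source-ref": "s3://example-bucket/a.jpg", "labels": {"annotations": []}}]
batch_b = [{
    "source-ref": "s3://example-bucket/b.jpg",
    "boots-and-cats-batch-0": {
        "annotations": [{"class_id": 1, "left": 0, "top": 0, "width": 10, "height": 10}]
    },
    "boots-and-cats-batch-0-metadata": {"class-map": {"1": "Cat"}},
}]

merged = merge_records("labels", (batch_a, "labels"), (batch_b, "boots-and-cats-batch-0"))
for record in merged:
    print(sorted(record))
```

After merging, every record exposes its annotations under the same key, so downstream code only needs to know about `"source-ref"` and `"labels"`.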


In [None]:
merged_manifest_data = util.smgt.merge_manifests(
    "labels",
    { "file": augment_manifest_path, "field": "labels" },
    # If you couldn't finish your annotations, comment out the line below:
    { "file": my_smgt_output_path, "field": my_groundtruth_labels },
    shuffle=True,
)

print(f"Got {len(merged_manifest_data)} total samples")

# The standardization above means these are always the attributes training will care about:
attribute_names = ["source-ref", "labels"]
%store attribute_names

# For illustration, this is what an entry in our combined manifest looks like:
print(f"\nMerged manifest contents:")
print(merged_manifest_data[0])
print("...etc.")


In object detection, we don't typically need to provide "negative examples" (images without the target objects), because the regions of each image that **aren't** the target object are already fulfilling that purpose.

Some implementations of some algorithms may accept (and even benefit from) off-target images, but the GluonCV library we use in the final notebook doesn't get along with them - so we will cut them from our dataset:


In [None]:
merged_manifest_data = list(filter(
    lambda d: len(d["labels"]["annotations"]) > 0,
    merged_manifest_data
))

print(f"Got {len(merged_manifest_data)} total samples after filtering out off-target images")


## Step 9: Split training vs validation and upload final manifests

Now we have all our consolidated label sets (and all the referenced images uploaded in our S3 bucket), the final step is to split training vs validation data and upload a manifest for each:

In [None]:
train_test_split_index = round(len(merged_manifest_data)*0.8)
train_data = merged_manifest_data[:train_test_split_index]
validation_data = merged_manifest_data[train_test_split_index:]

n_samples_training = len(train_data)
%store n_samples_training
n_samples_validation = len(validation_data)
%store n_samples_validation

with open(f"{DATA_PREFIX}/train.manifest", "w") as f:
    for line in train_data:
        f.write(json.dumps(line))
        f.write("\n")

with open(f"{DATA_PREFIX}/validation.manifest", "w") as f:
    for line in validation_data:
        f.write(json.dumps(line))
        f.write("\n")

bucket.upload_file(f"{DATA_PREFIX}/train.manifest", f"{DATA_PREFIX}/train.manifest")
print(f"Training manifest uploaded to:\ns3://{BUCKET_NAME}/{DATA_PREFIX}/train.manifest")
bucket.upload_file(f"{DATA_PREFIX}/validation.manifest", f"{DATA_PREFIX}/validation.manifest")
print(f"Validation manifest uploaded to:\ns3://{BUCKET_NAME}/{DATA_PREFIX}/validation.manifest")


## Review

Phew! That felt like a lot of work, but a lot of the steps were hacks for our example:

* To find raw image data for our targets (boots and cats), we mapped our class names to the public OpenImages dataset and used their existing annotations to find relevant images.
* To get a decent data volume without spending forever annotating in the workshop, we merged our Ground Truth annotation results with other augmentation sets.

The useful points to remember are:

* SageMaker Ground Truth (and as we'll see later, many of the built-in algorithms as well) uses **augmented manifests** to define annotated image datasets.
* These manifests are just plain text [JSON Lines](http://jsonlines.org/) files that we can also edit in our own code to do whatever we like from importing/exporting annotations, to stitching together datasets as we did here.
* Once the input manifest is prepared, it only takes a few clicks to define workforce teams and annotation jobs in SageMaker Ground Truth, which also supports other built-in and even custom annotation workflows for a variety of data types and tasks.
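
As a small example of working with these manifests in our own code, the sketch below counts bounding boxes per class from JSON Lines records. The `count_boxes_per_class` helper and the sample record are made up for illustration, and the use of a `class-map` entry in the `-metadata` field is an assumption about the output format that may not hold for every job:

```python
import json
from collections import Counter


def count_boxes_per_class(manifest_lines, label_field="labels"):
    """Count annotated boxes per class name across JSON Lines manifest records."""
    counts = Counter()
    for line in manifest_lines:
        record = json.loads(line)
        # Fall back to the raw class ID if no class-map is available
        class_map = record.get(f"{label_field}-metadata", {}).get("class-map", {})
        for box in record[label_field]["annotations"]:
            class_id = str(box["class_id"])
            counts[class_map.get(class_id, class_id)] += 1
    return counts


# A made-up record with two boots and one cat
sample_lines = [json.dumps({
    "source-ref": "s3://example-bucket/data/images/img1.jpg",
    "labels": {"annotations": [
        {"class_id": 0, "left": 5, "top": 5, "width": 20, "height": 30},
        {"class_id": 0, "left": 40, "top": 5, "width": 20, "height": 30},
        {"class_id": 1, "left": 80, "top": 10, "width": 50, "height": 40},
    ]},
    "labels-metadata": {"class-map": {"0": "Boot", "1": "Cat"}},
})]

print(count_boxes_per_class(sample_lines))
```

A quick tally like this is a handy sanity check for class balance before training.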

Although we didn't use it here due to the dataset size, the [automated labelling](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-automated-labeling.html) feature can drastically cut annotation costs and time on bigger data-sets for the tasks where it's supported (including object detection).

Ground Truth supports validation workflows (typically much faster for humans) as well as labelling. Because good-quality ground truth is so important to effective machine learning, these can be combined with automated labelling.

In the follow-on notebooks, we'll use the composite training and validation datasets we created here to fit a variety of models and compare their performance. Let's move on to notebook 2(a)!