# Amazon SageMaker Object Detection using the RecordIO format


## Introduction

Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes an image as input, provides a bounding box around each object of interest, and identifies what object the box contains. Before we can have such a solution, we need to acquire and process a training dataset, create and set up a training job so that the algorithm can learn from the dataset, and then host the trained model as an endpoint to which we can supply query images.

This notebook is an end-to-end example introducing the Amazon SageMaker Object Detection algorithm. We will demonstrate how to train and host an object detection model on the [Pascal VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) using the Single Shot MultiBox Detector ([SSD](https://arxiv.org/abs/1512.02325)) algorithm. In doing so, we will also show how to construct a training dataset in the RecordIO format, which is the format the training job will consume, and how to host and validate the trained model. Amazon SageMaker Object Detection also allows training with the image and JSON format, which is illustrated in the [image and JSON Notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_image_json_format.ipynb).

## Setup

To train the Object Detection algorithm on Amazon SageMaker, we need to set up and authenticate the use of AWS services. To begin with, we need an AWS account role with SageMaker access. This role, which gives SageMaker access to your data in S3, is obtained automatically from the role used to start the notebook.

In [1]:
%%time
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
print(role)
sess = sagemaker.Session()

arn:aws:iam::507922848584:role/service-role/AmazonSageMaker-ExecutionRole-20191025T081132
CPU times: user 870 ms, sys: 165 ms, total: 1.03 s
Wall time: 13.7 s


We also need the S3 bucket that you want to use for training and for storing the trained model artifacts. In this notebook, we use a custom bucket that already exists, so as to keep the naming clean. You can also use the default bucket that SageMaker provides.

In [2]:
bucket = "aws-ml-demo-2020" # custom bucket name.
prefix = "DEMO-ObjectDetection"
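
If you would rather fall back to the default bucket mentioned above, a minimal sketch could look like the following. The helper name `resolve_bucket` is hypothetical; it only assumes that the object passed in exposes `default_bucket()`, as `sagemaker.Session` does.

```python
def resolve_bucket(session, custom_bucket=None):
    """Return the custom bucket name if one is given, else the session default.

    `session` is expected to behave like `sagemaker.Session`: its
    `default_bucket()` method returns the name of the default bucket
    (creating it if it does not exist yet).
    """
    if custom_bucket:
        return custom_bucket
    return session.default_bucket()


# Example usage inside the notebook (assumes `sess = sagemaker.Session()`):
# bucket = resolve_bucket(sess)                      # default bucket
# bucket = resolve_bucket(sess, "aws-ml-demo-2020")  # custom bucket
```

Passing the session in as an argument keeps the helper easy to try out without touching AWS.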

Lastly, we need the Amazon SageMaker Object Detection Docker image, which is static and need not be changed.

In [4]:
import boto3
from sagemaker import image_uris

# Look up the Object Detection training image for the current region.
region_name = boto3.Session().region_name
training_image = image_uris.retrieve(framework="object-detection", region=region_name)
print(training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/object-detection:1


## Data Preparation
[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) was a popular computer vision challenge that released annual datasets for object detection from 2005 to 2012. In this notebook, we will use the datasets from 2007 and 2012, referred to as VOC07 and VOC12 respectively. Together, they contain more than 20,000 images with about 50,000 annotated objects. These annotated objects are grouped into 20 categories.
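
For reference, the 20 Pascal VOC categories can be written down as a simple lookup table. The alphabetical index order below is an assumption about how class IDs are assigned by the conversion tools (it is the order commonly used with this dataset); check the tool output if exact IDs matter.

```python
# The 20 Pascal VOC object categories, in alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Assumed mapping from class name to integer class ID (0-based).
CLASS_TO_INDEX = {name: idx for idx, name in enumerate(VOC_CLASSES)}

print(len(VOC_CLASSES), "categories")
```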

While using the Pascal VOC dataset, please be aware of the database usage rights:
"The VOC data includes images obtained from the "flickr" website. Use of these images must respect the corresponding terms of use: 
* "flickr" terms of use (https://www.flickr.com/help/terms)"

### Download data
Let us download the Pascal VOC datasets from 2007 and 2012.

In [None]:
%%time

# Download the dataset
!wget -P /tmp http://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
!wget -P /tmp http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
!wget -P /tmp http://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar

# Extract the data.
!tar -xf /tmp/VOCtrainval_11-May-2012.tar && rm /tmp/VOCtrainval_11-May-2012.tar
!tar -xf /tmp/VOCtrainval_06-Nov-2007.tar && rm /tmp/VOCtrainval_06-Nov-2007.tar
!tar -xf /tmp/VOCtest_06-Nov-2007.tar && rm /tmp/VOCtest_06-Nov-2007.tar

In [15]:
!ls VOCdevkit/

VOC2007  VOC2012
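
Each image in the extracted folders comes with an XML annotation file listing its objects and bounding boxes. As a rough sketch of what those files contain, the snippet below parses one annotation with the standard library. The function name `parse_voc_annotation` and the sample XML are made up for illustration; real annotation files carry more fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down annotation in the Pascal VOC XML layout.
SAMPLE_ANNOTATION = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <difficult>0</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return image size and a list of (class name, pixel bounding box)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)

    objects = []
    # Only top-level <object> elements describe annotated objects.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        bbox = tuple(
            int(float(box.find(tag).text))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({"name": obj.find("name").text.strip(), "bbox": bbox})
    return {"width": width, "height": height, "objects": objects}


annotation = parse_voc_annotation(SAMPLE_ANNOTATION)
for item in annotation["objects"]:
    print(item["name"], item["bbox"])
```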


### Convert data into RecordIO
[RecordIO](https://mxnet.incubator.apache.org/architecture/note_data_loading.html) is a highly efficient binary data format from [MXNet](https://mxnet.incubator.apache.org/) that makes it easy to prepare the dataset and transfer it to the instance that will run the training job. To generate a RecordIO file, we will use the tools from MXNet. The provided tools first generate a list file and then use the [im2rec tool](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py) to create the [RecordIO](https://mxnet.incubator.apache.org/architecture/note_data_loading.html) file. For more details on how to generate a RecordIO file for an object detection task, see the [MXNet example](https://github.com/apache/incubator-mxnet/tree/master/example/ssd).
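
The list file is a tab-separated text file with one line per image. The sketch below reads one such line, assuming the detection layout used by the MXNet tooling: an image index, a header width, a per-object label width, the flattened object labels, and finally the image path. The helper name `parse_lst_line` and the sample line are hypothetical, and the label width actually written by `tools/prepare_dataset.py` may differ from the 5 values (class, xmin, ymin, xmax, ymax) assumed in the sample.

```python
def parse_lst_line(line):
    """Split one detection .lst line into index, per-object labels, and path."""
    fields = line.rstrip("\n").split("\t")
    index = int(fields[0])
    header_width = int(float(fields[1]))  # number of header values, incl. itself
    label_width = int(float(fields[2]))   # number of values per object
    path = fields[-1]

    # Object labels sit between the header and the trailing image path.
    values = [float(v) for v in fields[1 + header_width:-1]]
    if label_width <= 0 or len(values) % label_width != 0:
        raise ValueError("label values do not match the declared label width")

    objects = [
        values[i:i + label_width] for i in range(0, len(values), label_width)
    ]
    return index, objects, path


# Hypothetical line: one object of class 11 with normalized box coordinates.
sample = "0\t2\t5\t11.0000\t0.1360\t0.4800\t0.5524\t0.7420\tVOC2007/JPEGImages/000001.jpg\n"
idx, objs, img_path = parse_lst_line(sample)
print(idx, objs, img_path)
```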

We will combine the training and validation sets from both 2007 and 2012 as the training data set, and use the test set from 2007 as our validation set.

In [16]:
!python tools/prepare_dataset.py --dataset pascal --year 2007,2012 --set trainval --target VOCdevkit/train.lst
# !rm -rf VOCdevkit/VOC2012
!python tools/prepare_dataset.py --dataset pascal --year 2007 --set test --target VOCdevkit/val.lst --no-shuffle
# !rm -rf VOCdevkit/VOC2007

saving list to disk...
List file VOCdevkit/train.lst generated...
Creating .rec file from /home/ec2-user/SageMaker/object_detection_pascalvoc_coco_2020-11-24/VOCdevkit/train.lst in /home/ec2-user/SageMaker/object_detection_pascalvoc_coco_2020-11-24/VOCdevkit
time: 0.048779964447021484  count: 0
time: 2.0392513275146484  count: 1000
time: 2.0881295204162598  count: 2000
time: 1.9781060218811035  count: 3000
time: 2.033107042312622  count: 4000
time: 2.014026403427124  count: 5000
time: 2.0233044624328613  count: 6000
time: 2.0468826293945312  count: 7000
time: 2.016981601715088  count: 8000
time: 2.021055221557617  count: 9000
time: 2.0359718799591064  count: 10000
time: 2.0449962615966797  count: 11000
time: 1.9941303730010986  count: 12000
time: 2.031057834625244  count: 13000
time: 2.030531644821167  count: 14000
time: 2.0220155715942383  count: 15000
time: 2.164947748184204  count: 16000
Record file VOCdevkit/train.rec generated...
saving list to disk...
List file VOCdevkit/val.lst 

### Upload data to S3
Upload the data to the S3 bucket. We do this in multiple channels. Channels are simply directories in the bucket that differentiate between training and validation data. Let us simply call these directories `train` and `validation`.

In [17]:
bucket = "aws-ml-demo-2020" # custom bucket name.
prefix = "DEMO-ObjectDetection"

In [18]:
%%time

# Upload the RecordIO files to train and validation channels
train_channel = prefix + '/train'
validation_channel = prefix + '/validation'

sess.upload_data(path='VOCdevkit/train.rec', bucket=bucket, key_prefix=train_channel)
sess.upload_data(path='VOCdevkit/val.rec', bucket=bucket, key_prefix=validation_channel)

s3_train_data = 's3://{}/{}'.format(bucket, train_channel)
s3_validation_data = 's3://{}/{}'.format(bucket, validation_channel)
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)

CPU times: user 31.1 s, sys: 15.3 s, total: 46.4 s
Wall time: 35.8 s
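
To double-check that both channels contain the uploaded files, one option is to list the objects under each prefix. This is a minimal sketch: the helper name `list_channel_objects` is hypothetical, and it assumes the client passed in behaves like `boto3.client("s3")`, whose `list_objects_v2` call returns at most 1,000 keys per request.

```python
def list_channel_objects(s3_client, bucket, channel_prefix):
    """Return (key, size in bytes) for each object under a channel prefix."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=channel_prefix)
    # "Contents" is absent when no object matches the prefix.
    return [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]


# Example usage inside the notebook (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# print(list_channel_objects(s3, bucket, train_channel))
# print(list_channel_objects(s3, bucket, validation_channel))
```

An empty list for either channel would suggest the corresponding upload did not land where the training job will look for it.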
