<div style="text-align: right"> &uarr;   Ensure Kernel is set to  &uarr;  </div><br><div style="text-align: right"> 
conda_mxnet_latest_p37  </div>

# SageMaker Image Classification Built-In Algorithm

## Introduction 
The Amazon SageMaker image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network (ResNet) that can be trained from scratch, or trained using transfer learning when a large number of training images is not available.

The outline of this notebook is:

1. Prepare images into RecordIO format

2. Train the SageMaker Image Classification built-in algorithm 

3. Create and deploy the model to an endpoint for doing inference 

4. Test realtime inference with the endpoint

5. Do batch inference using SageMaker Batch Transform

Let's start by importing some base libraries and setting some initial variables.

In the cell below, replace **your-unique-bucket-name** with the name of the bucket you created in the data-prep notebook.

In [None]:
%%time
import boto3
import os
import re
import time
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()

sess = sagemaker.Session()

bucket = 'your-unique-bucket-name'

training_image = sagemaker.image_uris.retrieve(region=boto3.Session().region_name, framework='image-classification')

Locate the MXNet installation so we can use its tools to create RecordIO format datasets

In [None]:
imrec = ! find $CONDA_PREFIX -name im2rec.py | grep -v gpu

We now store the location of the MXNet tool im2rec.py

In [None]:
imrec_loc = imrec[0]

## Data Preparation

Let's first list the folders in our data folder

In [None]:
! ls -1 ../data

Now we create a folder to store our RecordIO files

In [None]:
! mkdir -p recordio_dataset

We will now build our train and validation datasets in RecordIO format.<br>
First we generate list files using im2rec.py from MXNet.<br>
The output shows each class label and its assigned number (implied from the folder structure),<br>
i.e.<br>
Priority 0<br>
Roundabout 1<br>
Signal 2
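
The way folder names turn into class numbers can be sketched in plain Python. This is a simplified stand-in for the list-generation step, not the im2rec.py source: it assumes that sub-folders are visited in alphabetical order and that each folder containing images receives the next integer label. The function name `build_list` and the file name `img1.png` are hypothetical, and the real tool also shuffles the rows before writing the list file.

```python
import os
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_list(root):
    """Return (class_to_label, rows) for an image folder tree.

    Each row is (image_index, label, relative_path), mirroring the three
    columns of a .lst file.
    """
    class_to_label = {}
    rows = []
    for path, dirs, files in os.walk(root):
        dirs.sort()  # sorting in place makes os.walk descend alphabetically
        for fname in sorted(files):
            if os.path.splitext(fname)[1].lower() not in IMAGE_EXTS:
                continue
            rel_dir = os.path.relpath(path, root)
            # A folder gets the next free label the first time it is seen
            label = class_to_label.setdefault(rel_dir, len(class_to_label))
            rows.append((len(rows), label, os.path.join(rel_dir, fname)))
    return class_to_label, rows


def demo():
    """Build a throwaway folder tree with one empty image per class."""
    with tempfile.TemporaryDirectory() as root:
        for cls in ("Signal", "Priority", "Roundabout"):
            os.makedirs(os.path.join(root, cls))
            open(os.path.join(root, cls, "img1.png"), "wb").close()
        return build_list(root)


labels, rows = demo()
for name, idx in labels.items():
    print(name, idx)
```

The folders are created out of order on purpose, to show that the label depends on the sorted folder name rather than on creation order.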

In [None]:
! python {imrec_loc} recordio_dataset/train ../data/train --recursive --list --num-thread 8

In [None]:
! python {imrec_loc} recordio_dataset/validation ../data/val --recursive --list --num-thread 8

Now that we have generated the list files, we will use them to generate the respective training and validation RecordIO files

In [None]:
! python {imrec_loc} recordio_dataset/train.lst ../data/train 

In [None]:
! python {imrec_loc} recordio_dataset/validation.lst ../data/val

Now that we have the train and validation datasets in RecordIO format, we will copy them to our S3 bucket

In [None]:
s3_train_key = "recordio_dataset/train"
s3_validation_key = "recordio_dataset/validation"

s3_train = 's3://{}/{}/'.format(bucket, s3_train_key)
s3_validation = 's3://{}/{}/'.format(bucket, s3_validation_key)

s3_train_lst = 's3://{}/{}/'.format(bucket, "recordio_dataset/lst/train.lst")
s3_validation_lst = 's3://{}/{}/'.format(bucket, "recordio_dataset/lst/validation.lst")

In [None]:
! aws s3 cp recordio_dataset/train.rec {s3_train}
! aws s3 cp recordio_dataset/train.idx {s3_train}

! aws s3 cp recordio_dataset/validation.rec {s3_validation}
! aws s3 cp recordio_dataset/validation.idx {s3_validation}

! aws s3 cp recordio_dataset/train.lst {s3_train_lst}
! aws s3 cp recordio_dataset/validation.lst {s3_validation_lst}

### Training parameters
There are two kinds of parameters that need to be set for training. The first kind are the parameters for the training job. These include:

* **Training instance count**: This is the number of instances on which to run the training. When the number of instances is greater than one, the image classification algorithm runs in a distributed setting.
* **Training instance type**: This indicates the type of machine on which to run the training. Typically, we use GPU instances for this kind of training.
* **Output path**: This is the S3 folder in which the training output is stored.

In [None]:
job_name_prefix = 'traffic-image-classification'
# The strftime pattern already starts with '-', so no extra separator is needed
job_name = job_name_prefix + time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())

s3_output_location = 's3://{}/{}/output'.format(bucket, job_name_prefix)
sm_ic_estimator = sagemaker.estimator.Estimator(
    training_image,
    role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    volume_size=50,
    max_run=360000,
    input_mode="File",
    output_path=s3_output_location,
    sagemaker_session=sess,
)

### Algorithm parameters

Apart from the above set of parameters, there are hyperparameters that are specific to the algorithm. These are:

* **num_layers**: The number of layers (depth) for the network. We use 18 in this sample, but other values such as 50 or 152 can be used.
* **use_pretrained_model**: Set to 1 to use a pretrained model for transfer learning.
* **image_shape**: The input image dimensions, 'num_channels, height, width', for the network. It should be no larger than the actual image size. The number of channels should be the same as in the actual image.
* **num_classes**: This is the number of output classes for the dataset. We have 3 classes, so we set this value to 3.
* **mini_batch_size**: The number of training samples used for each mini batch. In distributed training, the number of training samples used per batch will be N * mini_batch_size, where N is the number of hosts on which training is run.
* **resize**: Resize the image before using it for training. The images are resized so that the shortest side has the length given by this parameter. If the parameter is not set, the training data is used as is, without resizing.
* **epochs**: Number of training epochs.
* **learning_rate**: Learning rate for training.
* **num_training_samples**: This is the total number of training samples. It is set to 1334 for this dataset.

You can find a detailed description of all the algorithm parameters at https://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html
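
To get a feel for how **num_training_samples**, **mini_batch_size** and **epochs** interact, the arithmetic can be worked through in a few lines. This is a rough sketch: the helper name `steps_per_epoch` is hypothetical, and it assumes the last partial batch of an epoch counts as a full step, which may differ from what the algorithm does internally.

```python
import math


def steps_per_epoch(num_training_samples, mini_batch_size, num_hosts=1):
    """Number of mini-batch updates needed to see every sample once.

    With N hosts the effective batch is N * mini_batch_size, as described
    for distributed training above.
    """
    effective_batch = mini_batch_size * num_hosts
    return math.ceil(num_training_samples / effective_batch)


# Values used in this notebook: 1334 samples, batches of 64, 50 epochs
steps = steps_per_epoch(1334, 64)
total_steps = steps * 50
print(steps, total_steps)  # 21 1050

# Same dataset on two hosts: each step now consumes 128 samples
print(steps_per_epoch(1334, 64, num_hosts=2))  # 11
```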

In [None]:
sm_ic_estimator.set_hyperparameters(
    num_layers=18,
    use_pretrained_model=1,
    image_shape="3,640,640",
    num_classes=3,
    mini_batch_size=64,
    epochs=50,
    learning_rate=0.01,
    num_training_samples=1334,
)

### Input data specification
Set the data type and channels used for training. In this training, we use the application/x-recordio content type, which requires the dataset to be in RecordIO format, and we also pass the lst files as data input. In addition, the SageMaker image classification algorithm supports the application/x-image format.
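
As a quick aid for choosing channels, the sketch below encodes which channels each content type needs and reports any that are missing. The mapping reflects our reading of the algorithm's documented input interface and should be treated as an assumption to verify. The names `REQUIRED_CHANNELS` and `missing_channels` are hypothetical helpers, not part of the SageMaker SDK.

```python
# Channels each content type is assumed to require (check against the AWS docs)
REQUIRED_CHANNELS = {
    "application/x-recordio": {"train", "validation"},
    "application/x-image": {"train", "validation", "train_lst", "validation_lst"},
}


def missing_channels(content_type, channels):
    """Return the sorted list of required channels absent from `channels`."""
    required = REQUIRED_CHANNELS[content_type]
    return sorted(required - set(channels))


# RecordIO training with all four channels supplied: nothing is missing
print(missing_channels(
    "application/x-recordio",
    ["train", "validation", "train_lst", "validation_lst"],
))

# Raw-image training without the list files: both lst channels are reported
print(missing_channels("application/x-image", ["train", "validation"]))
```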

In [None]:
train_data = sagemaker.inputs.TrainingInput(
    s3_train,
    distribution="FullyReplicated",
    content_type="application/x-recordio",
    s3_data_type="S3Prefix",
)

validation_data = sagemaker.inputs.TrainingInput(
    s3_validation,
    distribution="FullyReplicated",
    content_type="application/x-recordio",
    s3_data_type="S3Prefix",
)

train_data_lst = sagemaker.inputs.TrainingInput(
    s3_train_lst,
    distribution="FullyReplicated",
    content_type="text/plain",
    s3_data_type="S3Prefix",
)

validation_data_lst = sagemaker.inputs.TrainingInput(
    s3_validation_lst,
    distribution="FullyReplicated",
    content_type="text/plain",
    s3_data_type="S3Prefix",
)

data_channels = {
    "train": train_data,
    "validation": validation_data,
    "train_lst": train_data_lst,
    "validation_lst": validation_data_lst,
}

We can now call the fit method on the estimator with the input channels to start the training.<br>
**NOTE** This cell takes **16 mins** to run

In [None]:
%%time
sm_ic_estimator.fit(inputs=data_channels, logs=True)

## **NOTE:** <br>
If at this point your kernel disconnects from the server (you can tell because the kernel in the top right hand corner will say **No Kernel**),<br>you can reattach to the training job (so you don't have to start the training job again).<br>Follow the steps below
1. Scroll your notebook to the top and set the kernel to the recommended kernel specified in the top right hand corner of the notebook
2. Go to your SageMaker console, go to Training Jobs and copy the name of the training job you were disconnected from
3. Scroll to the bottom of this notebook and paste your training job name to replace **your-training-job-name** in the cell
4. Replace **your-unique-bucket-name** with the name of the bucket you created in the data-prep notebook
5. Run the edited cell
6. Return to this cell and continue executing the rest of this notebook

## Inference

***

A trained model does nothing on its own. We now want to use the model to perform inference. For this example, that means predicting the class of the image.<br>Normally you can deploy the trained model by using the deploy method of the estimator (see the commented line in the cell below for using the model from your training job).<br>
Since we are going to use a pretrained model, we create a SageMaker model from the training container image and the S3 location of the model artifacts.<br>
We will then deploy an endpoint using the created model.
<br>
**NOTE** This cell takes **5 mins** to run

In [None]:
%%time
from datetime import datetime
from sagemaker.serializers import IdentitySerializer
from sagemaker.model import Model

model_data = 's3://ml-materials/sm_image_class/model.tar.gz'
# model_data is set to the pretrained model.
# uncomment the following line to get the model URI from the training job
#model_data = sm_ic_estimator.model_data

endpoint_name = f"sm-image-classification-{datetime.utcnow():%Y-%m-%d-%H%M}"

sm_client = boto3.Session().client(service_name='sagemaker-runtime') 

sm_model = Model(image_uri=training_image, 
              model_data=model_data, 
              role=role)

ic_classifier = sm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name=endpoint_name
)

Now we get the endpoint name and use boto3 to call the endpoint with our test image<br>


In [None]:
%%time
import json

im_name="../data/test/Roundabout/R2.png"

client = boto3.client('sagemaker-runtime')

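# Read the test image as raw bytes; the context manager closes the file afterwards
with open(im_name, 'rb') as f:
    payload = f.read()
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/x-image',
    Body=payload)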

json.loads(response['Body'].read().decode("utf-8"))
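
The endpoint returns a JSON list with one probability per class. A minimal sketch of turning that list into a class name is shown below. It assumes the class order printed by im2rec.py earlier (Priority 0, Roundabout 1, Signal 2). The helper `top_prediction` and the probabilities in `example_body` are made up for illustration and are not real output from the model.

```python
import json

# Class order as assigned when the list files were generated
CLASS_NAMES = ["Priority", "Roundabout", "Signal"]


def top_prediction(body_text, class_names=CLASS_NAMES):
    """Return (class_name, probability) for the most likely class."""
    probabilities = json.loads(body_text)
    # Index of the largest probability (a dependency-free argmax)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


# Hypothetical response body, standing in for response['Body'].read().decode("utf-8")
example_body = "[0.02, 0.95, 0.03]"
label, probability = top_prediction(example_body)
print(label, probability)
```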

## Clean up
You can use the following command to delete the endpoint. The endpoint created above is persistent and consumes resources until it is deleted.<br>It is good practice to delete the endpoint when it is no longer in use.

In [None]:
sagemaker_client = boto3.client('sagemaker')
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

## Batch Inference
We are going to use SageMaker Batch Transform to run batch inference on the test dataset provided.

We will start by creating a model in SageMaker. In the request, you name the model and describe a primary container.<br>For the primary container, you specify the Docker image that contains the inference code and the model artifacts (from prior training).<br>You can optionally add a custom environment map that the inference code uses when you deploy the model for predictions.<br>
In our case the Docker image is provided by SageMaker, so we will provide the model name and the location of the model artifacts

In [None]:
%%time
from datetime import datetime
from sagemaker.serializers import IdentitySerializer
from sagemaker.model import Model

model_data = 's3://ml-materials/sm_image_class/model.tar.gz'
# model_data is set to the pretrained model.
# uncomment the following line to get the model URI from the training job
#model_data = sm_ic_estimator.model_data

model_name="traffic-full-image-classification-model" + time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())

sm_model = boto3.Session().client(service_name='sagemaker') 

primary_container = {
    'Image': training_image,
    'ModelDataUrl': model_data,
}

create_model_response = sm_model.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = primary_container)

We now populate the Transformer class and provide the instance count, instance type, the model we created and the output path for the results 

In [None]:
from sagemaker.transformer import Transformer

batch_output_path = f's3://{bucket}/batch_output'

transformer = Transformer(model_name=model_name,
                          instance_count=1,
                          instance_type='ml.m4.xlarge',
                          output_path=batch_output_path)

Finally we call the transform method with the input dataset for the batch inference
<br>
**NOTE** This cell takes **8 mins** to run

In [None]:
%%time
transformer.transform(f's3://{bucket}/test/')

### Viewing the results of the batch inference

In [None]:
! aws s3 sync {batch_output_path} batch_output

You should see a **batch_output** folder. Feel free to navigate to it and double-click the result **.out** files
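
Instead of opening each file by hand, the results can be summarised with a short script. The sketch below assumes each **.out** file holds a JSON object with a `prediction` list of class probabilities. Check one of your own files first, since that layout is an assumption here. The helper `summarize_outputs` and the file name `R2.png.out` are hypothetical, and the demo writes a fake result file so it runs without the real folder.

```python
import glob
import json
import os
import tempfile

CLASS_NAMES = ["Priority", "Roundabout", "Signal"]


def summarize_outputs(folder, class_names=CLASS_NAMES):
    """Map each .out file (path relative to folder) to its predicted class."""
    results = {}
    pattern = os.path.join(folder, "**", "*.out")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path) as f:
            probabilities = json.load(f)["prediction"]  # assumed file layout
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        results[os.path.relpath(path, folder)] = class_names[best]
    return results


# Demo with a made-up result file in place of the synced batch_output folder
with tempfile.TemporaryDirectory() as folder:
    os.makedirs(os.path.join(folder, "Roundabout"))
    fake_file = os.path.join(folder, "Roundabout", "R2.png.out")
    with open(fake_file, "w") as f:
        json.dump({"prediction": [0.1, 0.8, 0.1]}, f)
    summary = summarize_outputs(folder)

print(summary)
```

To run it on the real results, call `summarize_outputs("batch_output")` after the sync command above has finished.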

### Attach to a training job that has been left to run 

If your kernel becomes disconnected and your training has already started, you can reattach to the training job.<br>
In the cell below, replace **your-unique-bucket-name** with the name of the bucket you created in the data-prep notebook.<br>
Then look up the training job name, replace **your-training-job-name** with it, and run the cell below.<br>
Once the training job is finished, you can continue with the cells after the training cell.

In [None]:
import sagemaker
import boto3

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

bucket = "your-unique-bucket-name"

training_job_name = 'your-training-job-name'

if 'your-training' not in training_job_name:
    sm_ic_estimator = sagemaker.estimator.Estimator.attach(training_job_name=training_job_name, sagemaker_session=sess)