# End-to-End Multiclass Image Classification Example
1. [Introduction](#Introduction)
2. [Prerequisites and Preprocessing](#Prequisites-and-Preprocessing)
  1. [Permissions and environment variables](#Permissions-and-environment-variables)
  2. [Prepare the data](#Prepare-the-data)
3. [Training the model](#Training-the-model)
  1. [Training parameters](#Training-parameters)
  2. [Start the training](#Start-the-training)
4. [Compile](#Compile)

## Introduction

Welcome to our end-to-end example of the distributed image classification algorithm. In this exercise, we will use the Amazon SageMaker image classification algorithm to train on the [Caltech-256 dataset](http://www.vision.caltech.edu/Image_Datasets/Caltech256/).

Before getting started, please note that you can incur up to $15 in charges (or more) for using SageMaker instances to train the model here, depending on how long the instances run in your own AWS account. If you are cost sensitive, you have the following options:

* AWS provides pre-trained models for common use cases such as image classification and object detection out of the box. These models can be deployed to the edge directly without additional training. Choose the public component `variant.DLR.ImageClassification.ModelStore`, keep the default options, and follow the Greengrass deployment steps provided in the hands-on section of the chapter.

* There is no cost to read through the instructions below to understand the ML workflow.
 
If you decide to proceed, let's get started. We first need to set up the environment with a few prerequisite steps for permissions, configurations, and so on.

## Prequisites and Preprocessing

### Permissions and environment variables

Here we set up the linkage and authentication to AWS services. There are three parts to this:

* The role used to give training and hosting access to your data. This is obtained automatically from the role used to start the notebook.
* The S3 bucket that you want to use for training and model data.
* The Amazon SageMaker image classification Docker image, which need not be changed.

In [None]:
import sagemaker
from sagemaker import get_execution_role
import boto3


role = get_execution_role()

sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = "ic-fulltraining"
client = boto3.client('sagemaker')

In [None]:
from sagemaker import image_uris

training_image = image_uris.retrieve(
    region=sess.boto_region_name, framework="image-classification", version="latest"
)
print(training_image)

### Prepare the data
Download the data and transfer it to S3 for use in training. In this demo, we use the [Caltech-256](http://www.vision.caltech.edu/Image_Datasets/Caltech256/) dataset, which contains 30608 images of 256 objects. For the training and validation data, we follow the splitting scheme in this MXNet [example](https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/data/caltech256.sh). In particular, it randomly selects 60 images per class for training and uses the remaining data for validation. The algorithm takes a `RecordIO` file as input. The user can also provide the image files as input, which will be converted into `RecordIO` format using MXNet's [im2rec](https://mxnet.incubator.apache.org/how_to/recordio.html?highlight=im2rec) tool. It takes around 50 seconds to convert the entire Caltech-256 dataset (~1.2GB) on a p2.xlarge instance. For this demo, however, we will use files that are already in `RecordIO` format.
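The per-class split described above can be illustrated with a small sketch. This is not the MXNet script itself; `split_per_class` is a hypothetical helper, and the toy file names are made up purely for the demonstration.

```python
import random
from collections import defaultdict


def split_per_class(samples, train_per_class=60, seed=0):
    """Randomly pick `train_per_class` items per label for training.

    `samples` is an iterable of (path, label) pairs. Everything that is
    not selected for training goes to the validation set.
    Hypothetical helper, not part of MXNet or SageMaker.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        train += [(p, label) for p in paths[:train_per_class]]
        validation += [(p, label) for p in paths[train_per_class:]]
    return train, validation


# Toy data: 3 classes with 80 images each (file names are invented).
toy = [(f"class{c}/img{i}.jpg", c) for c in range(3) for i in range(80)]
train, validation = split_per_class(toy)
print(len(train), len(validation))  # 180 60
```

With 60 training images per class, the size of the training set depends only on the number of classes, which is why the training-set size can be stated up front.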

In [None]:
import os
import urllib.request
import boto3


def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)


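def upload_to_s3(channel, file):
    # Upload a local file to s3://<bucket>/<channel>/<file>
    s3 = boto3.resource("s3")
    key = channel + "/" + file
    # Use a context manager so the file handle is closed after the upload
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)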


# caltech-256
download("http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec")
download("http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec")

In [None]:
# Two channels: train and validation
s3train = "s3://{}/{}/train/".format(bucket, prefix)
s3validation = "s3://{}/{}/validation/".format(bucket, prefix)

# upload the RecordIO files to the train and validation channels
!aws s3 cp caltech-256-60-train.rec $s3train --quiet
!aws s3 cp caltech-256-60-val.rec $s3validation --quiet



Once we have the data available in the correct format for training, the next step is to actually train the model using the data. After setting training parameters, we kick off training, and poll for status until training is completed.


## Training the model

Now that we are done with all the setup that is needed, we are ready to train our image classifier. To begin, let us create a ``sagemaker.estimator.Estimator`` object. This estimator will launch the training job.

### Training parameters
There are two kinds of parameters that need to be set for training. The first kind are the parameters for the training job. These include:

* **Training instance count**: This is the number of instances on which to run the training. When the number of instances is greater than one, the image classification algorithm runs in a distributed setting.
* **Training instance type**: This indicates the type of machine on which to run the training. Typically, we use GPU instances for this kind of training.
* **Output path**: This is the S3 folder in which the training output is stored.


In [None]:
s3_output_location = "s3://{}/{}/output".format(bucket, prefix)
ic = sagemaker.estimator.Estimator(
    training_image,
    role,
    instance_count=1,
    instance_type="ml.p2.xlarge",
    volume_size=50,
    max_run=360000,
    input_mode="File",
    output_path=s3_output_location,
    sagemaker_session=sess,
)

In [None]:
s3_output_location

Apart from the above set of parameters, there are hyperparameters that are specific to the algorithm. These are:

* **num_layers**: The number of layers (depth) for the network. We use 18 in this sample, but other values such as 50 or 152 can be used.
* **image_shape**: The input image dimensions, 'num_channels, height, width', for the network. It should be no larger than the actual image size. The number of channels should be the same as in the actual image.
* **num_classes**: This is the number of output classes for the new dataset. ImageNet was trained with 1000 output classes, but the number of output classes can be changed for fine-tuning. For Caltech-256, we use 257 because it has 256 object categories + 1 clutter class.
* **num_training_samples**: This is the total number of training samples. It is set to 15420 for the Caltech-256 dataset with the current split (257 classes x 60 images per class).
* **mini_batch_size**: The number of training samples used for each mini batch. In distributed training, the number of training samples used per batch will be N * mini_batch_size where N is the number of hosts on which training is run.
* **epochs**: Number of training epochs.
* **learning_rate**: Learning rate for training.
* **top_k**: Report the top-k accuracy during training.
* **precision_dtype**: Training datatype precision (default: float32). If set to 'float16', the training is done in mixed-precision mode and is faster than in float32 mode.
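The relationship between these values can be checked with a little arithmetic. The sketch below assumes a single host and that a final partial batch counts as one step; `steps_per_epoch` is an illustrative quantity, not a hyperparameter of the algorithm.

```python
import math

num_classes = 257        # 256 object categories + 1 clutter class
train_per_class = 60     # images per class selected for training
num_training_samples = num_classes * train_per_class

mini_batch_size = 128
num_hosts = 1            # corresponds to instance_count on the estimator

# In distributed training, each step consumes N * mini_batch_size samples.
effective_batch = num_hosts * mini_batch_size
# Assumption: a final partial batch still counts as a step.
steps_per_epoch = math.ceil(num_training_samples / effective_batch)

print(num_training_samples, effective_batch, steps_per_epoch)  # 15420 128 121
```

The computed sample count matches the `num_training_samples` value passed to `set_hyperparameters` in the next cell.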


In [None]:
ic.set_hyperparameters(
    num_layers=18,
    image_shape="3,224,224",
    num_classes=257,
    num_training_samples=15420,
    mini_batch_size=128,
    epochs=5,
    learning_rate=0.01,
    top_k=2,
    precision_dtype="float32",
)

### Input data specification
Set the data type and channels used for training.

In [None]:
train_data = sagemaker.inputs.TrainingInput(
    s3train,
    distribution="FullyReplicated",
    content_type="application/x-recordio",
    s3_data_type="S3Prefix",
)
validation_data = sagemaker.inputs.TrainingInput(
    s3validation,
    distribution="FullyReplicated",
    content_type="application/x-recordio",
    s3_data_type="S3Prefix",
)

data_channels = {"train": train_data, "validation": validation_data}

### Start the training
Start training by calling the `fit` method of the estimator.

In [None]:
ic.fit(inputs=data_channels, logs=True)

## Compile

***

[Amazon SageMaker Neo](https://aws.amazon.com/sagemaker/neo/) optimizes models to run up to twice as fast, with no loss in accuracy. When calling the `create_compilation_job()` function, we specify the target platform as well as the S3 location in which the compiled model will be stored.

Change the `TargetPlatform` values below to match the hardware and environment you are using, along with the default S3 output location. The list of all supported target platforms is available [here](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputConfig.html).

Also change the `S3Uri` and `S3OutputLocation` attributes below.
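To avoid hand-editing the placeholders, the `InputConfig` values can be assembled programmatically. The following is a minimal sketch: `build_input_config` is a hypothetical helper, and the bucket and training job names in the example are invented. It assumes the model artifact sits under `<prefix>/output/<training-job-name>/output/model.tar.gz`, as in the placeholder URI used in this notebook.

```python
import json


def build_input_config(bucket, prefix, training_job_name, image_shape="3,224,224"):
    """Build the InputConfig dict for a Neo compilation job.

    Hypothetical helper. The batch dimension is fixed to 1, and the
    remaining dimensions come from the `image_shape` hyperparameter.
    """
    channels, height, width = (int(v) for v in image_shape.split(","))
    return {
        "S3Uri": f"s3://{bucket}/{prefix}/output/{training_job_name}/output/model.tar.gz",
        "DataInputConfig": json.dumps({"data": [1, channels, height, width]}),
        "Framework": "MXNET",
    }


# Example with made-up names; substitute your own bucket and job name.
example = build_input_config(
    "my-bucket", "ic-fulltraining", "image-classification-2024-01-01-00-00-00-000"
)
print(example["S3Uri"])
print(example["DataInputConfig"])
```

After training has finished, the estimator's `model_data` attribute should hold the same artifact URI, which may be simpler than rebuilding it from its parts.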

In [None]:
response = client.create_compilation_job(
    CompilationJobName='<update-job-name-here>',
    RoleArn=role,
    InputConfig={
        'S3Uri': 's3://sagemaker-<region>-<account-id>/ic-fulltraining/output/image-classification-<timestamp>/output/model.tar.gz',
        'DataInputConfig': '{"data": [1, 3, 224, 224]}',
        'Framework': 'MXNET'
    },
    OutputConfig={
        'S3OutputLocation': 's3://sagemaker-<region>-<account-id>/ic-fulltraining/',
        'TargetPlatform': {
            'Os': 'LINUX',
            'Arch': 'ARM64'
        },
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 900,
        'MaxWaitTimeInSeconds': 900
    }
)
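The compilation job runs asynchronously, so the compiled model is not available in S3 until the job has finished. A minimal polling sketch follows; `wait_for_status` is a hypothetical helper that takes any callable returning the current status, and the demonstration uses a fake status sequence so it runs without AWS access.

```python
import time


def wait_for_status(get_status, terminal=("COMPLETED", "FAILED", "STOPPED"),
                    interval=30, timeout=900, sleep=time.sleep):
    """Call `get_status` until it returns a terminal status or `timeout` elapses.

    Hypothetical helper; `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    waited = 0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status} after {waited} seconds")
        sleep(interval)
        waited += interval


# Demonstration with a fake status sequence instead of a real AWS call.
fake_statuses = iter(["STARTING", "INPROGRESS", "COMPLETED"])
final = wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)  # COMPLETED
```

In the notebook, the callable could be `lambda: client.describe_compilation_job(CompilationJobName=job_name)["CompilationJobStatus"]`, where `job_name` is the name you passed to `create_compilation_job`.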

Now that the model is optimized by SageMaker Neo for edge deployment, let's zip it up and upload it to the S3 bucket. The AWS IoT Greengrass V2 service will retrieve the model from that bucket and deploy it to the edge gateway.

* Note: At the time of authoring, the Greengrass V2 service supports only zip as the compression method, so we need to change the compression format from a tarball to a zipped archive prior to deployment.
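The tarball-to-zip conversion can also be sketched in pure Python with the standard library, which avoids installing the `zip` tool. `tarball_to_zip` is a hypothetical helper, and the demonstration builds a throwaway tarball rather than using the real compiled model. Note that this sketch repacks every regular file and does not preserve file permissions.

```python
import os
import tarfile
import tempfile
import zipfile


def tarball_to_zip(tar_path, zip_path):
    """Repack the regular files of a .tar.gz archive into a .zip archive."""
    with tarfile.open(tar_path, "r:gz") as tar, \
            zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for member in tar.getmembers():
            if member.isfile():  # skip directories and links
                zf.writestr(member.name, tar.extractfile(member).read())
    return zip_path


# Demonstration on a throwaway tarball containing one dummy file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "compiled.params")
    with open(src, "wb") as f:
        f.write(b"demo")
    tar_path = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src, arcname="compiled.params")
    zip_path = tarball_to_zip(tar_path, os.path.join(tmp, "model.zip"))
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

print(names)  # ['compiled.params']
```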

In [None]:
!apt-get update
!apt-get install -y zip

In [None]:
# Archive the S3 model in .zip format
from sagemaker import s3, session
bucket = session.Session().default_bucket()

# Must match the S3OutputLocation passed to create_compilation_job
output_path = "s3://{}/{}".format(bucket, prefix)

# The model file name differs for other target platforms; change it accordingly
inputs = s3.S3Downloader.download(output_path + "/model-LINUX_ARM64.tar.gz", "./")


In [None]:
#Extract the tar ball 
!tar -xvf  model-LINUX_ARM64.tar.gz

In [None]:
#Archive in Zip format
!zip DLR-resnet50-aarch64-cpu-ImageClassification.zip libdlr.so dlr.h compiled_model.json compiled.meta compiled.params compiled.so manifest synset.txt 

In [None]:
#Upload to S3 bucket
inputs = s3.S3Uploader.upload("./DLR-resnet50-aarch64-cpu-ImageClassification.zip",output_path)

Congratulations, the ML model is now trained and compiled for your target platform and ready for deployment. Please navigate back to the hands-on section of your book to complete this chapter.