# Computer Vision with SageMaker's Image Classification Algorithm

1. [Introduction](#Introduction)
2. [Prerequisites and Preprocessing](#Prequisites-and-Preprocessing)
3. [Fine-tuning the Image classification model](#Fine-tuning-the-Image-classification-model)
4. [Training parameters](#Training-parameters)
5. [Start the training](#Start-the-training)
6. [Inference](#Inference)


## Introduction

Image classification is a fundamental computer vision task: predicting the overall label (class) of an image. Modern computer vision techniques use neural network models for this kind of task. Although neural networks can achieve high accuracy for image classification, they can be difficult to use directly. Amazon SageMaker's built-in image classification algorithm makes such networks much easier to use: provide your dataset, specify a few parameters, and you can train and deploy a custom model.

This notebook is an end-to-end example of using image classification to "fine-tune" a pretrained model. Fine-tuning, a form of "transfer learning" from one classification task to another, typically results in substantial time and cost savings compared to training from scratch. We'll use SageMaker's built-in image classification algorithm in transfer learning mode to fine-tune a model previously trained on the well-known public ImageNet dataset. The fine-tuned model will then be used to classify a new dataset that differs from ImageNet. In particular, the pretrained model will be fine-tuned with the [Caltech-256 dataset](http://www.vision.caltech.edu/Image_Datasets/Caltech256/).

SageMaker's built-in image classification algorithm has an option for training from scratch as well as transfer learning.  Using the built-in algorithm's transfer learning mode frees you from having to modify the underlying neural net architecture, which otherwise would be necessary if you used the neural net directly from a code base.  There are many other conveniences provided by this built-in algorithm, such as the ability to automatically train faster on a cluster of many instances without requiring you to manage cluster setup and teardown.  

To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on.

## Prequisites and Preprocessing

### Permissions and environment variables

Here we set up the linkage and authentication for AWS services. There are three parts to this:

* The IAM role used to give learning and hosting access to your data. This will be obtained from the role used to start the notebook.
* The S3 bucket for training and model data.
* The Amazon SageMaker image classification algorithm Docker image, which you can use out of the box without modifications.

In [None]:
%%time
import boto3
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
print(role)

sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = 'ic-transfer-learning'

In [None]:
region = sess.boto_region_name
# Reuse the region of the SageMaker session so the image URI matches where the job will run
image_name = sagemaker.image_uris.retrieve(region=region, framework='image-classification')
print(image_name)

## Fine-tuning the Image classification model

The Caltech-256 dataset consists of images from 256 object categories plus a clutter category. It has roughly 30,000 images in total, with a minimum of 80 images and a maximum of about 800 images per category.

The image classification algorithm accepts two input formats. The first is the [RecordIO format](https://mxnet.incubator.apache.org/faq/recordio.html), and the other is the [lst format](https://mxnet.incubator.apache.org/faq/recordio.html?highlight=im2rec), a list file that references individual image files. In this example, we will use the RecordIO format.
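
To make the lst format more concrete, here is a minimal sketch of how such a list file could be assembled. Each line holds three tab-separated columns: an image index, a numeric class label, and a relative image path. The helper name `make_lst_lines` and the image paths are hypothetical and used only for illustration; this notebook itself uses the prebuilt RecordIO files and does not need this step.

```python
def make_lst_lines(samples):
    """Build .lst lines from (class_index, relative_path) pairs.

    Each line has three tab-separated columns:
    image index, class label index, relative image path.
    """
    return [
        "{}\t{}\t{}".format(index, label, path)
        for index, (label, path) in enumerate(samples)
    ]


# Hypothetical sample entries; the paths are placeholders, not real dataset files.
samples = [
    (0, "ak47/image_0001.jpg"),
    (7, "bathtub/image_0001.jpg"),
]

lst_lines = make_lst_lines(samples)
print("\n".join(lst_lines))
```

Class label indices are expected to be consecutive integers starting at 0, which is why the labels above are numbers rather than category names.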

In [None]:
import os
import urllib.request
import boto3

def download(url):
    # Download the file at url into the working directory unless it already exists
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)


def upload_to_s3(channel, file):
    # Upload a local file to s3://<bucket>/<channel>/<file>
    s3 = boto3.resource('s3')
    key = channel + '/' + file
    # Use a context manager so the file handle is closed after the upload
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)


# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
upload_to_s3('validation', 'caltech-256-60-val.rec')
upload_to_s3('train', 'caltech-256-60-train.rec')

Next, we'll upload the data to Amazon S3 so it can be accessed by SageMaker for model training.

In [None]:
# Two channels: train and validation
s3train = 's3://{}/{}/train/'.format(bucket, prefix)
s3validation = 's3://{}/{}/validation/'.format(bucket, prefix)

# upload the RecordIO files to the train and validation channels
!aws s3 cp caltech-256-60-train.rec $s3train --quiet
!aws s3 cp caltech-256-60-val.rec $s3validation --quiet

Once the data is available in S3 in the correct format, the next step is to train the model with it. Before training, we need to set up the training parameters. The next section explains the parameters in detail and shows how to set up the training job.

## Training

Now that we are done with the data setup, we are almost ready to train our image classification model. To begin, let's create a ``sagemaker.estimator.Estimator`` object. This Estimator will launch the training job.

### Training parameters

There are two kinds of parameters to set for training. The first kind is the parameters for the training job itself, such as amount and type of hardware to use, and S3 location. For this example, these include:

* **Instance count**: This is the number of instances on which to run the training. When the number of instances is greater than one, then the image classification algorithm will run in a distributed cluster automatically without requiring you to manage cluster setup. 
* **Instance type**: This indicates the type of machine on which to run the training. Typically, we use GPU instances for computer vision models such as this one.
* **Output path**: This is the S3 folder in which the training output will be stored.

In [None]:
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)

ic = sagemaker.estimator.Estimator(
    image_uri=image_name,
    role=role,
    instance_count=1,
    instance_type='ml.p3.8xlarge',
    volume_size=50,
    sagemaker_session=sess,
    output_path=s3_output_location,
)

Apart from the above set of training job parameters, the second set of parameters are hyperparameters that are specific to the algorithm. These include:

* **num_layers**: The number of layers (depth) for the network. We use 18 in this example, but other values such as 50, 152 can be used to achieve greater accuracy at the cost of longer training time.
* **use_pretrained_model**: Set to 1 to use a pretrained model for transfer learning.
* **image_shape**: The input image dimensions, 'num_channels, height, width', for the network. It should be no larger than the actual image size. The number of channels should be the same as in the actual images.
* **num_classes**: This is the number of output classes for the new dataset. ImageNet has 1000 classes, but the number of output classes for our pretrained network can be changed with fine-tuning. For this Caltech dataset, we use 257 because it has 256 object categories + 1 clutter class.
* **num_training_samples**: This is the total number of training samples. It is set to 15420 for the Caltech dataset due to the current split between training and validation data.
* **mini_batch_size**: The number of training samples used for each mini batch. In distributed training for multiple training instances (we just use one here), the number of training samples used per batch would be N * mini_batch_size, where N is the number of hosts on which training is run.
* **epochs**: Number of training epochs, i.e. passes over the complete training data.
* **learning_rate**: Learning rate for training.
* **precision_dtype**: Training datatype precision (default: float32). If set to 'float16', the training will be done in mixed_precision mode and will be faster than float32 mode, at the cost of slightly less accuracy.  
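
The relationship between these numbers can be checked with a little arithmetic. The sketch below assumes that the "60" in the training file name means 60 training images per class, which reproduces the value of `num_training_samples` used in the hyperparameter cell. It also assumes that a final partial mini batch still counts as one step; how the built-in algorithm actually treats the last batch is not stated here.

```python
import math

num_classes = 257          # 256 object categories + 1 clutter class
images_per_class = 60      # assumption: inferred from "caltech-256-60-train.rec"
mini_batch_size = 128
epochs = 2
num_hosts = 1              # this example trains on a single instance

num_training_samples = num_classes * images_per_class

# In distributed training, each step consumes num_hosts * mini_batch_size samples.
samples_per_step = num_hosts * mini_batch_size

# Assumption: a trailing partial batch counts as one more step.
steps_per_epoch = math.ceil(num_training_samples / samples_per_step)
total_steps = steps_per_epoch * epochs

print("training samples:", num_training_samples)
print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
```

Increasing `num_hosts` in this calculation shows why adding instances reduces the number of steps per epoch for a fixed `mini_batch_size`.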

In [None]:
ic.set_hyperparameters(
    num_layers=18,
    use_pretrained_model=1,
    image_shape="3,224,224",
    num_classes=257,
    num_training_samples=15420,
    mini_batch_size=128,
    epochs=2,
    learning_rate=0.01,
    precision_dtype='float32',
)

## Input data specification

The next step is to set the data type and channels used for training.  The channel definitions inform SageMaker about where to find both the training and validation datasets in S3.

In [None]:
train_data = sagemaker.inputs.TrainingInput(s3_data=s3train, content_type='application/x-recordio')
validation_data = sagemaker.inputs.TrainingInput(s3_data=s3validation, content_type='application/x-recordio')

data_channels = {'train': train_data, 'validation': validation_data}

## Start the training job

Now we can start the training job by calling the `fit` method of the Estimator object.

In [None]:
ic.fit(inputs=data_channels, logs=True)

## Inference

***

A trained model does nothing on its own. We now want to use the model to perform inference, i.e. get predictions from the model. For this example, that means predicting the Caltech-256 class of a given image. To deploy the trained model, we simply use the `deploy` method of the Estimator. This creates a SageMaker endpoint that can return predictions in real time, for example for a consumer-facing app that must respond to user requests with low latency. SageMaker can also perform offline, asynchronous batch inference with its Batch Transform feature.

In [None]:
ic_classifier = ic.deploy(initial_instance_count = 1,
                          instance_type = 'ml.m5.xlarge')
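
For comparison with the real-time endpoint, the following is a rough sketch of how the Batch Transform feature might be invoked from the same Estimator. The helper name `run_batch_transform` and the S3 locations in the commented usage line are hypothetical; the sketch assumes the input prefix contains individual image files and that `application/x-image` is the appropriate content type for them. It is not part of this notebook's workflow and is not executed here.

```python
def run_batch_transform(estimator, input_s3_uri, output_s3_uri,
                        instance_type="ml.m5.xlarge"):
    """Run an offline batch inference job over images stored under an S3 prefix.

    `estimator` is expected to be an already trained SageMaker Estimator.
    """
    # Create a Transformer from the trained model
    transformer = estimator.transformer(
        instance_count=1,
        instance_type=instance_type,
        output_path=output_s3_uri,
    )
    # Start the job: every object under the input prefix is sent as one image
    transformer.transform(
        data=input_s3_uri,
        data_type="S3Prefix",
        content_type="application/x-image",
    )
    # Block until the job has finished
    transformer.wait()
    return transformer.output_path


# Hypothetical usage (the S3 prefixes below are placeholders):
# run_batch_transform(ic, 's3://<bucket>/<prefix>/batch-in/', 's3://<bucket>/<prefix>/batch-out/')
```

Unlike an endpoint, a batch transform job releases its instances when it completes, so there is nothing to delete afterwards.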

### Download a test image

In [None]:
!wget -O test.jpg https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/master/images/clawfoot_bathtub.jpg
file_name = 'test.jpg'
# test image
from IPython.display import Image
Image(file_name)  

### Evaluation

Let's now use the SageMaker endpoint hosting the trained model to predict the Caltech-256 class of the test image. The model outputs class probabilities.  Typically, one selects the class with the maximum probability as the final predicted class output.

**Note:** The network is likely to predict the correct class (bathtub), but this is not guaranteed because model training is a stochastic process. To limit training time and the related cost, we trained the model for only a couple of epochs. Training for more epochs (say 20) is likely to produce more accurate predictions.
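
Because the model may be uncertain after so few epochs, it can help to look at the few most probable classes rather than only the single best one. The sketch below shows one way to rank a list of class probabilities. The helper name `top_k_classes` and the example probabilities are made up for illustration and are not output from the trained model.

```python
def top_k_classes(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    # Sort class indices by descending probability
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: probabilities[i],
        reverse=True,
    )
    return [(labels[i], probabilities[i]) for i in ranked[:k]]


# Made-up probabilities for four classes, for illustration only
example_probs = [0.05, 0.10, 0.70, 0.15]
example_labels = ['bat', 'bear', 'bathtub', 'hot-tub']

for label, prob in top_k_classes(example_probs, example_labels, k=3):
    print("{}: {:.2f}".format(label, prob))
```

With the endpoint response, the same helper could be called with the parsed probability list and the full list of category names.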

In [None]:
import json
import numpy as np
from sagemaker.serializers import IdentitySerializer

with open(file_name, 'rb') as f:
    payload = f.read()
    payload = bytearray(payload)
    
ic_classifier.serializer = IdentitySerializer(content_type='application/x-image')

result = json.loads(ic_classifier.predict(payload))
# output the probabilities for all classes, then find the class with maximum probability and print its index
index = np.argmax(result)
object_categories = ['ak47', 'american-flag', 'backpack', 'baseball-bat', 'baseball-glove', 'basketball-hoop', 'bat', 'bathtub', 'bear', 'beer-mug', 'billiards', 'binoculars', 'birdbath', 'blimp', 'bonsai-101', 'boom-box', 'bowling-ball', 'bowling-pin', 'boxing-glove', 'brain-101', 'breadmaker', 'buddha-101', 'bulldozer', 'butterfly', 'cactus', 'cake', 'calculator', 'camel', 'cannon', 'canoe', 'car-tire', 'cartman', 'cd', 'centipede', 'cereal-box', 'chandelier-101', 'chess-board', 'chimp', 'chopsticks', 'cockroach', 'coffee-mug', 'coffin', 'coin', 'comet', 'computer-keyboard', 'computer-monitor', 'computer-mouse', 'conch', 'cormorant', 'covered-wagon', 'cowboy-hat', 'crab-101', 'desk-globe', 'diamond-ring', 'dice', 'dog', 'dolphin-101', 'doorknob', 'drinking-straw', 'duck', 'dumb-bell', 'eiffel-tower', 'electric-guitar-101', 'elephant-101', 'elk', 'ewer-101', 'eyeglasses', 'fern', 'fighter-jet', 'fire-extinguisher', 'fire-hydrant', 'fire-truck', 'fireworks', 'flashlight', 'floppy-disk', 'football-helmet', 'french-horn', 'fried-egg', 'frisbee', 'frog', 'frying-pan', 'galaxy', 'gas-pump', 'giraffe', 'goat', 'golden-gate-bridge', 'goldfish', 'golf-ball', 'goose', 'gorilla', 'grand-piano-101', 'grapes', 'grasshopper', 'guitar-pick', 'hamburger', 'hammock', 'harmonica', 'harp', 'harpsichord', 'hawksbill-101', 'head-phones', 'helicopter-101', 'hibiscus', 'homer-simpson', 'horse', 'horseshoe-crab', 'hot-air-balloon', 'hot-dog', 'hot-tub', 'hourglass', 'house-fly', 'human-skeleton', 'hummingbird', 'ibis-101', 'ice-cream-cone', 'iguana', 'ipod', 'iris', 'jesus-christ', 'joy-stick', 'kangaroo-101', 'kayak', 'ketch-101', 'killer-whale', 'knife', 'ladder', 'laptop-101', 'lathe', 'leopards-101', 'license-plate', 'lightbulb', 'light-house', 'lightning', 'llama-101', 'mailbox', 'mandolin', 'mars', 'mattress', 'megaphone', 'menorah-101', 'microscope', 'microwave', 'minaret', 'minotaur', 'motorbikes-101', 'mountain-bike', 'mushroom', 'mussels', 'necktie', 'octopus', 'ostrich', 
'owl', 'palm-pilot', 'palm-tree', 'paperclip', 'paper-shredder', 'pci-card', 'penguin', 'people', 'pez-dispenser', 'photocopier', 'picnic-table', 'playing-card', 'porcupine', 'pram', 'praying-mantis', 'pyramid', 'raccoon', 'radio-telescope', 'rainbow', 'refrigerator', 'revolver-101', 'rifle', 'rotary-phone', 'roulette-wheel', 'saddle', 'saturn', 'school-bus', 'scorpion-101', 'screwdriver', 'segway', 'self-propelled-lawn-mower', 'sextant', 'sheet-music', 'skateboard', 'skunk', 'skyscraper', 'smokestack', 'snail', 'snake', 'sneaker', 'snowmobile', 'soccer-ball', 'socks', 'soda-can', 'spaghetti', 'speed-boat', 'spider', 'spoon', 'stained-glass', 'starfish-101', 'steering-wheel', 'stirrups', 'sunflower-101', 'superman', 'sushi', 'swan', 'swiss-army-knife', 'sword', 'syringe', 'tambourine', 'teapot', 'teddy-bear', 'teepee', 'telephone-box', 'tennis-ball', 'tennis-court', 'tennis-racket', 'theodolite', 'toaster', 'tomato', 'tombstone', 'top-hat', 'touring-bike', 'tower-pisa', 'traffic-light', 'treadmill', 'triceratops', 'tricycle', 'trilobite-101', 'tripod', 't-shirt', 'tuning-fork', 'tweezer', 'umbrella-101', 'unicorn', 'vcr', 'video-projector', 'washing-machine', 'watch-101', 'waterfall', 'watermelon', 'welding-mask', 'wheelbarrow', 'windmill', 'wine-bottle', 'xylophone', 'yarmulke', 'yo-yo', 'zebra', 'airplanes-101', 'car-side-101', 'faces-easy-101', 'greyhound', 'tennis-shoes', 'toad', 'clutter']
print("Result: label - " + object_categories[index] + ", probability - " + str(result[index]))

### Clean up

When we're done with the endpoint, we can just delete it and the backing instance will be released.  Run the following cell to delete the endpoint.

In [None]:
sess.delete_endpoint(ic_classifier.endpoint_name)