# Overview

* This notebook covers how to use SageMaker Neo and SageMaker Elastic Inference (EI)
* In this example, we build a ResNet transfer learning model to predict hot dog/not hot dog [a la Silicon Valley](https://www.youtube.com/watch?v=ACmydtFDTGs)
* Note: running this notebook requires access to SageMaker GPU instances (training uses `ml.p3.2xlarge`; endpoints are deployed on P2, P3, C5, and G4 instances) and a notebook instance with at least 10 GB of disk space
* We use the Food101 dataset to create the hot dog/not hot dog dataset
* By the end of the notebook, we show how inference speed can be weighed against cost across deployment options for a ResNet model


# Data Prep

* Download and extract the Food101 dataset in a terminal using the commands below

`cd Sagemaker`

 `wget http://data.vision.ee.ethz.ch/cvl/food-101.tar.gz`
 
 `tar -zxvf food-101.tar.gz`
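If you would rather run the extraction step from a notebook cell than from a terminal, the standard-library `tarfile` module can do the same job as `tar -zxvf`. The helper below is a minimal sketch; the function name `extract_archive` is made up for illustration and the archive is assumed to have been downloaded already.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the first path component of every member, e.g. "food-101".
        top_level = sorted({member.name.split("/")[0] for member in tar.getmembers()})
        tar.extractall(dest)
    return top_level
```

For example, `extract_archive("food-101.tar.gz")` would unpack the dataset into the current directory.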

In [1]:
# load necessary packages
import json
from glob import glob
import shutil
import os
import numpy as np
from mxnet import gluon
import sys

In [21]:
!{sys.executable} -m pip install tqdm

Collecting tqdm
  Downloading https://files.pythonhosted.org/packages/7f/32/5144caf0478b1f26bd9d97f510a47336cf4ac0f96c6bc3b5af20d4173920/tqdm-4.40.2-py2.py3-none-any.whl (55kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.40.2
You are using pip version 10.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
# the food-101 folder must already exist (see the download step above)
os.chdir('food-101')

In [3]:
train_json = json.load(open('meta/train.json'))
test_json = json.load(open('meta/test.json'))
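Before building the new dataset it can help to check what the metadata contains. The sketch below assumes each split file maps a class name to a list of image names, which is the structure `move_and_rename` relies on. The helper names `load_split` and `summarize_split` are hypothetical.

```python
import json


def load_split(path):
    """Load a split file, closing the file handle when done."""
    with open(path) as f:
        return json.load(f)


def summarize_split(split):
    """Return (number of classes, total images, images in the 'hot_dog' class)."""
    n_images = sum(len(images) for images in split.values())
    return len(split), n_images, len(split.get("hot_dog", []))
```

Calling `summarize_split(load_split('meta/train.json'))` gives a quick sanity check on the class and image counts before any files are copied.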

In [87]:
# make directories for the hot dog/not hot dog dataset
os.makedirs('../hotdog_not_hotdog/train/hot_dog/', exist_ok=True)
os.makedirs('../hotdog_not_hotdog/test/hot_dog/', exist_ok=True)

os.makedirs('../hotdog_not_hotdog/train/not_hotdog/', exist_ok=True)
os.makedirs('../hotdog_not_hotdog/test/not_hotdog/', exist_ok=True)

In [5]:
import random
import copy
from tqdm import tqdm

def move_and_rename(split_json, dest, n_images):
    '''
    Copy every hot dog image, plus a random sample of other images, into the new dataset folders.
    Hot dog images keep their names; sampled images are renamed 0.jpg, 1.jpg, ... under not_hotdog/.
    split_json : dict, maps each food class to a list of image names
    dest : str, split folder to write to ('train' or 'test')
    n_images : int, number of not hot dog images to sample without replacement
    '''
    # deep copy so popping entries below does not modify the caller's dict
    json_copy = copy.deepcopy(split_json)
    hotdog_images = json_copy.pop('hot_dog')
    for i in hotdog_images:
        shutil.copyfile('images/{}.jpg'.format(i), '../hotdog_not_hotdog/{}/{}.jpg'.format(dest, i))
    other_foods = list(json_copy.keys())
    for cnt in tqdm(range(n_images)):
        # pick a random class, then a random image within that class
        other_class_imgs = json_copy[random.choice(other_foods)]
        # pop removes the image from the pool so it cannot be picked twice
        selected_image = other_class_imgs.pop(random.randrange(len(other_class_imgs)))
        destination_name = 'not_hotdog/{}'.format(cnt)
        shutil.copyfile('images/{}.jpg'.format(selected_image), '../hotdog_not_hotdog/{}/{}.jpg'.format(dest, destination_name))
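The sampling logic can be separated from the file copying, which makes it easier to test and to reproduce. The sketch below is one possible version, not the notebook's own code: `sample_negatives` and its `seed` parameter are hypothetical, and it adds a guard for exhausted classes that the function above does not need for Food101.

```python
import random


def sample_negatives(split, n_images, positive_class="hot_dog", seed=0):
    """Pick n_images distinct images from classes other than positive_class.

    A class is drawn uniformly at random first, then an image within it,
    mirroring the two-step draw in move_and_rename.
    """
    rng = random.Random(seed)
    # Copy the lists so the caller's dict is left untouched.
    pools = {k: list(v) for k, v in split.items() if k != positive_class and v}
    picked = []
    while len(picked) < n_images:
        if not pools:
            raise ValueError("not enough images to sample from")
        cls = rng.choice(sorted(pools))
        picked.append(pools[cls].pop(rng.randrange(len(pools[cls]))))
        if not pools[cls]:
            del pools[cls]  # drop exhausted classes so they are not drawn again
    return picked
```

Because the generator is seeded, two calls with the same arguments return the same list, so the train and test folders could be rebuilt identically.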

In [6]:
# create dataset folders
move_and_rename(train_json, 'train', 750)
move_and_rename(test_json, 'test', 250)

100%|██████████| 750/750 [00:01<00:00, 739.61it/s]
100%|██████████| 250/250 [00:00<00:00, 915.15it/s]


In [7]:
#validate the number of images in the folders
len(glob('../hotdog_not_hotdog/train/hot_dog/*'))

750

In [8]:
len(glob('../hotdog_not_hotdog/test/not_hotdog/*'))

250
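The two checks above cover one folder each. A small helper can count all four folders at once; this is a sketch that assumes the `split/class` layout created earlier, and `count_images` is a hypothetical name.

```python
from pathlib import Path


def count_images(root):
    """Return {(split, class_name): number of files} for a split/class folder tree."""
    counts = {}
    for class_dir in sorted(Path(root).glob("*/*")):
        if class_dir.is_dir():
            n_files = sum(1 for p in class_dir.iterdir() if p.is_file())
            counts[(class_dir.parent.name, class_dir.name)] = n_files
    return counts
```

Running `count_images('../hotdog_not_hotdog')` should then list the `hot_dog` and `not_hotdog` counts for both `train` and `test`.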

# Model Training

## Setup

Create a SageMaker session and get the execution role

In [9]:
import sagemaker
from sagemaker.mxnet import MXNet

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

## Upload Data

* SageMaker expects the training data to be in an S3 path

In [10]:
inputs = sagemaker_session.upload_data(path='../hotdog_not_hotdog', key_prefix='data/DEMO-hotdog_not_hotdog')
print('input spec (in this case, just an S3 path): {}'.format(inputs))

input spec (in this case, just an S3 path): s3://sagemaker-us-east-1-178197730631/data/DEMO-hotdog_not_hotdog


## Notes on the MXNet Script

* The 'hotdog-not-hotdog-mxnet.py' file has functions for training and deploying the model
    * The `model_fn` and `transform_fn` functions automatically select the correct context based on the environment
    * For Elastic Inference, both the data and the model must be placed on the `mxnet.eia()` context
* Note that the model has the following hyperparameters for training
    * batch_size, int, batch size
    * epochs, int, number of epochs to run training
    * learning_rate, float, the learning rate for the model
    * momentum, float, momentum for the SGD algorithm
    * wd, float, weight decay parameter for model params
    * resnet_size, str, size of the ResNet to use, one of 18, 34, 50, 101, 152


* As opposed to a standard MXNet script, special functions need to be added to use SageMaker Neo
    * These are seen at the bottom of the script (neo_postprocess and neo_preprocess)
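In SageMaker script mode, the hyperparameters passed to the estimator reach the training script as command-line arguments. The script itself is not shown here, so the following is only a sketch of how the hyperparameters listed above might be parsed. The function name and every default value are placeholders, not the script's real defaults.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters listed above from command-line style arguments."""
    parser = argparse.ArgumentParser()
    # Default values are placeholders for illustration only.
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--momentum", type=float, default=0.9)
    parser.add_argument("--wd", type=float, default=0.0)
    parser.add_argument("--resnet_size", type=str, default="50",
                        choices=["18", "34", "50", "101", "152"])
    # Ignore any extra arguments the caller may append.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Restricting `resnet_size` with `choices` makes an unsupported size fail at start-up rather than partway through a training job.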

## Training Job

* Instantiate the SageMaker MXNet estimator with the role, instance type, number of instances and hyperparameters

In [11]:
m = MXNet('../hotdog-not-hotdog-mxnet.py',
          role=role, 
          framework_version='1.4.1',
          train_instance_count=1,
          train_instance_type='ml.p3.2xlarge',
          py_version='py3',
          hyperparameters={'batch_size': 32,
                           'epochs': 6,
                           'learning_rate': 0.01,
                           'momentum': 0.9,
                           'resnet_size':'101'})

* Fit the model against the S3 path returned by the upload step earlier

In [12]:
m.fit(inputs)

2019-12-12 02:34:27 Starting - Starting the training job...
2019-12-12 02:34:30 Starting - Launching requested ML instances......
2019-12-12 02:35:35 Starting - Preparing the instances for training......
2019-12-12 02:36:40 Downloading - Downloading input data...
2019-12-12 02:37:26 Training - Training image download completed. Training in progress..
2019-12-12 02:37:27,498 sagemaker-containers INFO     Imported framework sagemaker_mxnet_container.training
2019-12-12 02:37:27,524 sagemaker_mxnet_container.training INFO     MXNet training environment: {'SM_HOSTS': '["algo-1"]', 'SM_NETWORK_INTERFACE_NAME': 'eth0', 'SM_HPS': '{"batch_size":32,"epochs":6,"learning_rate":0.01,"momentum":0.9,"resnet_size":"101"}', 'SM_USER_ENTRY_POINT': 'hotdog-not-hotdog-mxnet.py', 'SM_FRAMEWORK_PARAMS': '{}', 'SM_RESOURCE_CONFIG': '{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"}', 'SM_INPUT_DATA_CONFIG': '{"training":{"RecordWrapperType":"None","S3DistributionType

# Optimize the Models through SageMaker Neo

* SageMaker Neo compiles models to optimize them for specific ML instance types in SageMaker
* Here we create two GPU-optimized models (for P2 and P3) and one CPU-optimized model (for C5)

In [13]:
output_path = '/'.join(m.output_path.split('/')[:-1]) 
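The expression above drops everything after the final `/`. That gives the bucket root only when `m.output_path` ends with a slash, which the f-string used later for `model_output_location` suggests it does. The sketch below shows the behaviour on two hypothetical URIs (the bucket name is invented) and compares it with `rstrip`.

```python
def strip_last_segment(s3_uri):
    """Same expression as the cell above: drop everything after the final '/'."""
    return "/".join(s3_uri.split("/")[:-1])


# Hypothetical bucket name, used only to show the behaviour.
with_slash = "s3://example-bucket/"
without_slash = "s3://example-bucket/models"

print(strip_last_segment(with_slash))     # s3://example-bucket
print(strip_last_segment(without_slash))  # s3://example-bucket
print(with_slash.rstrip("/"))             # s3://example-bucket
print(without_slash.rstrip("/"))          # s3://example-bucket/models
```

If the path might carry a prefix without a trailing slash, `rstrip("/")` keeps the prefix while the split-based version silently removes it.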

In [14]:
compiled_model_gpu = m.compile_model(target_instance_family='ml_p2', input_shape={'data':[1,3,512,512]}, output_path=output_path)

??........!

In [53]:
compiled_model_p3 = m.compile_model(target_instance_family='ml_p3', input_shape={'data':[1,3,512,512]}, output_path=output_path)

?........!

In [15]:
compiled_model_cpu = m.compile_model(target_instance_family='ml_c5', input_shape={'data':[1,3,512,512]}, output_path=output_path)

?...............................................!

# Model Deployment

* We deploy the models with SageMaker's one-click deployment, with a few modifications to the input and output serialization

# Prepare Different Model Deployments

* Different SageMaker models need to be created to deploy on different container configurations

In [16]:
from sagemaker.mxnet import MXNetModel

In [17]:
model_output_location = f"{m.output_path}{m.latest_training_job.job_name}/output/model.tar.gz"

In [18]:
model_cpu = MXNetModel(model_data=model_output_location, entry_point='../hotdog-not-hotdog-mxnet.py', role=role,
                      py_version='py3', framework_version='1.4.1')

In [19]:
model_p2 = MXNetModel(model_data=model_output_location, entry_point='../hotdog-not-hotdog-mxnet.py', role=role,
                      py_version='py3', framework_version='1.4.1')

In [66]:
model_p3 = MXNetModel(model_data=model_output_location, entry_point='../hotdog-not-hotdog-mxnet.py', role=role,
                      py_version='py3', framework_version='1.4.1')

In [20]:
model_g4 = MXNetModel(model_data=model_output_location, entry_point='../hotdog-not-hotdog-mxnet.py', role=role,
                      py_version='py3', framework_version='1.4.1')

In [21]:
model_eia = MXNetModel(model_data=model_output_location, entry_point='../hotdog-not-hotdog-mxnet.py', role=role,
                      py_version='py3', framework_version='1.4.1')

# Model Inference Code

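* Images are resized to 512x512 (the `input_shape` used for Neo compilation) and reordered to channels-first layout before being sent to the endpoint
* Normalization to the ImageNet mean and standard deviation is not done in the client code below, so it is assumed to happen in the inference script
* We define this code and a selection of images for use with our models
* Requires the opencv package
* If this is not installed, run the following code in a notebook cell

`import sys`

`!{sys.executable} -m pip install opencv-python`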

In [22]:
filenames = glob('../hotdog_not_hotdog/test/*/*')

In [62]:
# sample 50 test images (with replacement) without hard-coding the list length
random_selection = [random.choice(filenames) for _ in range(50)]
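Since the same selection is reused to time every endpoint, it may be worth making it reproducible across kernel restarts. The sketch below uses a private seeded generator. The function name and the seed value are arbitrary choices for illustration, not part of the original notebook.

```python
import random


def pick_benchmark_images(paths, k=50, seed=42):
    """Draw k paths with replacement using a private, seeded generator."""
    rng = random.Random(seed)
    ordered = sorted(paths)  # glob order is not guaranteed, so sort first
    return [rng.choice(ordered) for _ in range(k)]
```

With this helper, `random_selection = pick_benchmark_images(filenames)` would return the same 50 paths on every run.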

In [24]:
import io
import cv2 

def predict_hotdog(endpoint, filenames):
    '''
    Preprocess a list of images and send each one to the endpoint
    endpoint : SageMaker predictor returned by deploy()
    filenames : list, list of images (local file locations)
    '''
    resps = []
    for img in filenames:
        img_np = cv2.imread(img)                     # HWC array in BGR order
        img_np = cv2.resize(img_np, (512, 512))      # match the compiled input shape
        img_np = img_np.transpose(2, 0, 1)           # HWC -> CHW
        output_img = np.expand_dims(img_np, axis=0)  # add batch dimension -> (1, 3, 512, 512)
        resp = endpoint.predict(output_img)
        resps.append(resp)
    return resps

def numpy_bytes_serializer(data):
    '''
    function to serialize data for sagemaker neo endpoints
    '''
    f = io.BytesIO()
    np.save(f, data)
    f.seek(0)
    return f.read()

# Evaluating Inference on a Variety of SageMaker Deployments

* We showcase how regular endpoints, EI, Neo, and different ML EC2 instance types can impact endpoint latency

## P2 Instances

P2 instances are intended for general-purpose GPU compute applications.

In [25]:
predictor_p2 = model_p2.deploy(initial_instance_count=1,
                        instance_type='ml.p2.xlarge')

---------------------------------------------------------------------------------------------------------------------------!

In [26]:
# load model onto instance
predict_hotdog(predictor_p2, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.8860937'}]

In [27]:
import time

t1 = time.time()
%timeit -n 1 predict_hotdog(predictor_p2, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

20.1 s ± 305 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 2.345171244939168
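A note on reading these numbers: `%timeit -n 1` repeats the call seven times by default, so "Total Time" (in minutes) covers all seven passes: 7 × 20.1 s ≈ 141 s ≈ 2.35 min. Each pass sends 50 images, so one pass of 20.1 s corresponds to roughly 0.40 s per image, including local image loading and resizing. To look at per-request latency directly, a helper along the following lines could be used. It is a sketch with a hypothetical name and takes any callable, so it does not depend on a live endpoint.

```python
import statistics
import time


def measure_latency(predict_one, items):
    """Time each call to predict_one(item) and summarize the latencies in seconds."""
    latencies = []
    for item in items:
        start = time.perf_counter()
        predict_one(item)
        latencies.append(time.perf_counter() - start)
    ordered = sorted(latencies)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }
```

For example, `measure_latency(lambda path: predict_hotdog(predictor_p2, [path]), random_selection)` would report the spread of single-image latencies rather than only the total per pass.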


In [28]:
predictor_p2.delete_endpoint()
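An endpoint keeps accruing charges until it is deleted, so it can help to tie the deletion to the benchmark itself. The context manager below is a minimal sketch. `temporary_endpoint` is a hypothetical helper and only assumes the object it receives has a `delete_endpoint()` method.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor and always call its delete_endpoint() afterwards."""
    try:
        yield predictor
    finally:
        # Runs even if the benchmark inside the with-block raises.
        predictor.delete_endpoint()
```

Used as `with temporary_endpoint(model_p2.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge')) as predictor_p2:`, the endpoint is removed as soon as the block exits.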

## P3 Instances

P3 instances are a newer generation of EC2 GPU instances than P2, intended for general-purpose GPU compute applications.

In [67]:
predictor_p3 = model_p3.deploy(initial_instance_count=1,
                        instance_type='ml.p3.2xlarge')

---------------------------------------------------------------------------------------------------------------------------!

In [68]:
# load model onto instance
predict_hotdog(predictor_p3, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.704241'}]

In [70]:
import time

t1 = time.time()
%timeit -n 1 predict_hotdog(predictor_p3, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

16.4 s ± 170 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 1.9120223760604858


In [71]:
predictor_p3.delete_endpoint()

## C5 Instances

C5 instances are optimized for compute-intensive workloads and deliver cost-effective high performance at a low price per compute ratio.

In [29]:
predictor_c5 = model_cpu.deploy(initial_instance_count=1,
                        instance_type='ml.c5.xlarge')

---------------------------------------------------------------------------------------!

In [30]:
# load model onto instance
predict_hotdog(predictor_c5, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.88609374'}]

In [31]:
t1 = time.time()
%timeit -n 1 predict_hotdog(predictor_c5, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

51.8 s ± 410 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 6.041615438461304


In [32]:
predictor_c5.delete_endpoint()

## G4 Instances 

G4 instances are designed to help accelerate machine learning inference and graphics-intensive workloads.

In [33]:
predictor_g4 = model_g4.deploy(initial_instance_count=1,
                        instance_type='ml.g4dn.xlarge')

---------------------------------------------------------------------------------------------------!

In [34]:
# load model onto instance
predict_hotdog(predictor_g4, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.88609356'}]

In [35]:
t1 = time.time()
%timeit -n 1 predict_hotdog(predictor_g4, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

15.9 s ± 217 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 1.8551597436269125


In [36]:
predictor_g4.delete_endpoint()

## Elastic Inference

Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances or Amazon ECS tasks to reduce the cost of running deep learning inference by up to 75%.

In [37]:
predictor_ei = model_eia.deploy(initial_instance_count=1, 
                        instance_type='ml.c5.large',
                            accelerator_type='ml.eia2.medium')

---------------------------------------------------------------------------------------------------------------!

In [38]:
predict_hotdog(predictor_ei, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.88609356'}]

In [39]:
t1 = time.time()
%timeit -n 1 predict_hotdog(predictor_ei, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

20.8 s ± 620 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 2.4239836931228638


In [40]:
predictor_ei.delete_endpoint()

## SageMaker Neo

Amazon SageMaker Neo enables developers to train machine learning models once and run them anywhere in the cloud and at the edge. Amazon SageMaker Neo optimizes models to run up to twice as fast, with less than a tenth of the memory footprint, with no loss in accuracy.

## Neo Optimized C5

In [41]:
compiled_predictor = compiled_model_cpu.deploy(initial_instance_count=1, 
                                               instance_type='ml.c5.xlarge')

compiled_predictor.content_type = 'application/vnd+python.numpy+binary'
compiled_predictor.serializer = numpy_bytes_serializer

---------------------------------------------------------------------------------------!

In [42]:
predict_hotdog(compiled_predictor, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.8860937'}]

In [43]:
t1 = time.time()
%timeit -n 1 predict_hotdog(compiled_predictor, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

21.7 s ± 1.97 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 2.5276662866274515


In [44]:
compiled_predictor.delete_endpoint()

## Neo Optimized G4

In [45]:
from sagemaker.predictor import npy_serializer, json_deserializer, json_serializer
# compiled_model_gpu was compiled above with target_instance_family='ml_p2'
compiled_predictor = compiled_model_gpu.deploy(initial_instance_count=1, 
                                               instance_type='ml.g4dn.xlarge')

compiled_predictor.content_type = 'application/vnd+python.numpy+binary'
compiled_predictor.serializer = numpy_bytes_serializer

----------------------------------------------------------------------------------------------------------------!

In [46]:
predict_hotdog(compiled_predictor, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.8860937'}]

In [47]:
import time
t1 = time.time()
%timeit -n 1 predict_hotdog(compiled_predictor, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

2.67 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 0.3110075076421102


In [48]:
compiled_predictor.delete_endpoint()

## Neo Optimized P2 

In [49]:
from sagemaker.predictor import npy_serializer, json_deserializer, json_serializer
compiled_predictor = compiled_model_gpu.deploy(initial_instance_count=1, 
                                               instance_type='ml.p2.xlarge')

compiled_predictor.content_type = 'application/vnd+python.numpy+binary'
compiled_predictor.serializer = numpy_bytes_serializer

---------------------------------------------------------------------------------------------------------------------------------------------------!

In [50]:
predict_hotdog(compiled_predictor, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.8860937'}]

In [51]:
import time
t1 = time.time()
%timeit -n 1 predict_hotdog(compiled_predictor, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

5.19 s ± 149 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 0.6050745129585267


In [52]:
compiled_predictor.delete_endpoint()

## Neo Compiled P3 Instances

P3 instances are intended for general-purpose GPU compute applications.

In [60]:
from sagemaker.predictor import npy_serializer, json_deserializer, json_serializer
compiled_predictor = compiled_model_p3.deploy(initial_instance_count=1, 
                                               instance_type='ml.p3.2xlarge')

compiled_predictor.content_type = 'application/vnd+python.numpy+binary'
compiled_predictor.serializer = numpy_bytes_serializer

---------------------------------------------------------------------------------------------------------------------------------------!

In [63]:
predict_hotdog(compiled_predictor, random_selection[:1])

[{'predicted_class': 'not_hot_dog', 'confidence': '0.70424116'}]

In [64]:
import time
t1 = time.time()
%timeit -n 1 predict_hotdog(compiled_predictor, random_selection)
print(f"Total Time {(time.time()-t1)/60}")

2.24 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Total Time 0.2618632833162943


In [65]:
# delete the Neo-compiled P3 endpoint created in this section
compiled_predictor.delete_endpoint()
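To compare the deployments side by side, the `%timeit` results above can be gathered in one place. The sketch below copies the measured seconds per pass from the notebook outputs and derives seconds per image and relative speedups. Hourly prices are not given in this notebook, so `price_per_hour` is left as a value the reader supplies. Note that the "Neo G4" run used the model compiled for the `ml_p2` target.

```python
# Seconds per pass over 50 images, copied from the %timeit outputs above.
LOOP_SECONDS = {
    "P2": 20.1,
    "P3": 16.4,
    "C5": 51.8,
    "G4": 15.9,
    "C5 + EI": 20.8,
    "Neo C5": 21.7,
    "Neo G4": 2.67,
    "Neo P2": 5.19,
    "Neo P3": 2.24,
}
IMAGES_PER_PASS = 50


def seconds_per_image(name):
    """Average wall-clock seconds per image for one deployment."""
    return LOOP_SECONDS[name] / IMAGES_PER_PASS


def speedup(baseline, candidate):
    """How many times faster candidate is than baseline."""
    return LOOP_SECONDS[baseline] / LOOP_SECONDS[candidate]


def cost_per_1000_images(name, price_per_hour):
    """Cost of 1000 sequential predictions; price_per_hour is supplied by the caller."""
    return price_per_hour * seconds_per_image(name) * 1000 / 3600


# Print the deployments from fastest to slowest.
for name in sorted(LOOP_SECONDS, key=LOOP_SECONDS.get):
    print(f"{name:8s} {seconds_per_image(name):.3f} s/image")
```

On these measurements, Neo compilation gives a speedup of roughly 2.4x on C5 (`speedup("C5", "Neo C5")`) and about 7.3x on P3 (`speedup("P3", "Neo P3")`). Plugging current instance prices into `cost_per_1000_images` turns the latencies into a cost comparison.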