resources:  
[AWS docs - Build a multi model container](https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html)  
[Multi Model Server on GitHub](https://github.com/awslabs/multi-model-server)

In [1]:
import pandas as pd

# Amazon SageMaker Multi-Model Endpoints using your own algorithm container
With [Amazon SageMaker multi-model endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html), customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of a large number of models, which can be served from a common inference container, needs to be invokable on-demand and where it is acceptable for infrequently invoked models to incur some additional latency. For applications which require consistently low inference latency, a traditional endpoint is still the best choice.

At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. 
  
When an invocation request is made for a particular model, Amazon SageMaker  
- routes the request to an instance assigned to that model, 
- downloads the model artifacts from S3 onto that instance, and  
- initiates loading of the model into the memory of the container. 
  
As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. 
  
If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped and the invocation is performed immediately.
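The request flow above can be pictured with a small toy simulation. This is only an illustrative sketch: `ToyInstance` and its attributes are hypothetical names, and the real placement, download, and caching logic is managed internally by SageMaker.

```python
class ToyInstance:
    """Toy stand-in for one endpoint instance; not SageMaker's real implementation."""

    def __init__(self):
        self.on_disk = set()    # artifacts already downloaded from S3
        self.in_memory = set()  # models currently loaded in the container

    def invoke(self, target_model):
        """Return the list of steps performed to serve one invocation."""
        steps = []
        if target_model not in self.in_memory:
            if target_model not in self.on_disk:
                steps.append("download")
                self.on_disk.add(target_model)
            steps.append("load")
            self.in_memory.add(target_model)
        steps.append("invoke")
        return steps


instance = ToyInstance()
first = instance.invoke("resnet_18.tar.gz")   # cold call: download, load, invoke
second = instance.invoke("resnet_18.tar.gz")  # warm call: invoke only
print(first)
print(second)
```

The first call pays for the download and load steps, while the second call skips straight to the invocation, which is why repeated invocations of the same model are faster.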

For the inference container to serve multiple models in a multi-model endpoint, it must implement [additional APIs](https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html) in order to load, list, get, unload and invoke specific models. 
  
This notebook demonstrates how to build your own inference container that implements these APIs.
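As a quick orientation, the sketch below lists the HTTP routes for those operations as I understand them from the AWS documentation linked above. Treat the table as a summary to verify against that page; `MME_CONTAINER_API` and `route_for` are hypothetical helper names used only for illustration.

```python
# Operation -> (HTTP method, path), summarized from the linked AWS documentation.
# Verify against the documentation before relying on these routes.
MME_CONTAINER_API = {
    "load": ("POST", "/models"),
    "list": ("GET", "/models"),
    "get": ("GET", "/models/{model_name}"),
    "unload": ("DELETE", "/models/{model_name}"),
    "invoke": ("POST", "/models/{model_name}/invoke"),
}


def route_for(operation, model_name=None):
    """Return (method, path) for an operation, filling in the model name if needed."""
    method, path = MME_CONTAINER_API[operation]
    return method, path.format(model_name=model_name)


print(route_for("list"))
print(route_for("invoke", "resnet_18"))
```

In this notebook the container does not implement these routes by hand; Multi Model Server provides them.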

---

### Contents

1. [Introduction to Multi Model Server (MMS)](#Introduction-to-Multi-Model-Server-(MMS))
  1. [Handling Out Of Memory conditions](#Handling-Out-Of-Memory-conditions)
  1. [SageMaker Inference Toolkit](#SageMaker-Inference-Toolkit)
1. [Building and registering a container using MMS](#Building-and-registering-a-container-using-MMS)
1. [Set up the environment](#Set-up-the-environment)
1. [Upload model artifacts to S3](#Upload-model-artifacts-to-S3)
1. [Create a multi-model endpoint](#Create-a-multi-model-endpoint)
  1. [Import models into hosting](#Import-models-into-hosting)
  1. [Create endpoint configuration](#Create-endpoint-configuration)
  1. [Create endpoint](#Create-endpoint)
1. [Invoke models](#Invoke-models)
  1. [Invoke another model](#Invoke-another-model)
  1. [Add models to the endpoint](#Add-models-to-the-endpoint)
  1. [Updating a model](#Updating-a-model)
1. [(Optional) Delete the hosting resources](#(Optional)-Delete-the-hosting-resources)

## Introduction to Multi Model Server (MMS)

[Multi Model Server](https://github.com/awslabs/multi-model-server) is an open source framework for serving machine learning models. It provides the HTTP frontend and model management capabilities required by multi-model endpoints to host multiple models within a single container, load models into and unload models out of the container dynamically, and perform inference on a specified loaded model.

MMS supports a pluggable custom backend handler where you can implement your own algorithm. This example uses a handler that supports loading and inference for MXNet models, which we will inspect below.

In [2]:
! cat container/model_handler.py

"""
ModelHandler defines an example model handler for load and inference requests for MXNet CPU models
"""
from collections import namedtuple
import glob
import json
import logging
import os
import re

import mxnet as mx
import numpy as np

class ModelHandler(object):
    """
    A sample Model handler implementation.
    """

    def __init__(self):
        self.initialized = False
        self.mx_model = None
        self.shapes = None

    def get_model_files_prefix(self, model_dir):
        """
        Get the model prefix name for the model artifacts (symbol and parameter file).
        This assume model artifact directory contains a symbol file, parameter file, 
        model shapes file and a synset file defining the labels

        :param model_dir: Path to the directory with model artifacts
        :return: prefix string for model artifact files
        """
        sym_file_suffix = "-symbol.json"
        checkpoint_prefix_regex = "{}/*{}".format(model_dir, sym_file_suffix) # 

Of note are the `handle(data, context)` and `initialize(self, context)` methods.

The `initialize` method will be called when a model is loaded into memory. In this example, it loads the model artifacts at `model_dir` into MXNet.

The `handle` method will be called when invoking the model. In this example, it validates the input payload and then forwards the input to MXNet, returning the output.

This handler class is instantiated for every model loaded into the container, so state in the handler is not shared across models.
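To make the division of labour between the two methods concrete, here is a stripped-down sketch of a handler with no MXNet dependency. `SketchHandler` and `fake_context` are hypothetical names, the "inference" is a placeholder, and the fake context only mimics the `system_properties` dictionary that MMS passes to a handler.

```python
from types import SimpleNamespace


class SketchHandler:
    """Minimal handler sketch: initialize() loads, handle() serves."""

    def __init__(self):
        self.initialized = False
        self.model_dir = None

    def initialize(self, context):
        # Called once when the model is loaded; a real handler would
        # read the model artifacts from model_dir here.
        self.model_dir = context.system_properties.get("model_dir")
        self.initialized = True

    def handle(self, data, context):
        # Called for every invocation of the model.
        if not self.initialized:
            self.initialize(context)
        if data is None:
            return None
        # Placeholder "inference": report the size of each payload item.
        return [len(item) for item in data]


def fake_context(model_dir):
    """Hypothetical stand-in for the context object supplied by MMS."""
    return SimpleNamespace(system_properties={"model_dir": model_dir})


# One handler instance per loaded model, so state is not shared.
handler_a, handler_b = SketchHandler(), SketchHandler()
handler_a.initialize(fake_context("/models/a"))
handler_b.initialize(fake_context("/models/b"))
print(handler_a.model_dir, handler_b.model_dir)
print(handler_a.handle([b"abc"], fake_context("/models/a")))
```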

### Handling Out Of Memory conditions
If MXNet fails to load the model due to lack of memory, a `MemoryError` is raised. Any time a model cannot be loaded due to lack of memory or any other resource constraint, a `MemoryError` must be raised. MMS will interpret the `MemoryError`, and return a 507 HTTP status code to SageMaker, where SageMaker will initiate unloading unused models to reclaim resources so the requested model can be loaded.
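One way to satisfy this requirement is to translate framework-specific allocation failures into `MemoryError` at load time. The sketch below is an assumption-laden illustration: `load_or_raise_memory_error` and `fake_loader` are hypothetical names, and the error-message pattern is an assumed example rather than a guaranteed MXNet message.

```python
import re


def load_or_raise_memory_error(load_fn):
    """Run load_fn and re-raise allocation failures as MemoryError.

    load_fn is a hypothetical zero-argument callable that loads a model.
    The message pattern below is an assumption; adapt it to your framework.
    """
    try:
        return load_fn()
    except RuntimeError as err:
        if re.search(r"failed to allocate .* memory", str(err), re.IGNORECASE):
            raise MemoryError(str(err)) from err
        raise  # unrelated errors propagate unchanged


def fake_loader():
    # Simulates a framework running out of memory during model load.
    raise RuntimeError("Failed to allocate 512 MB of CPU Memory")


try:
    load_or_raise_memory_error(fake_loader)
    outcome = "loaded"
except MemoryError:
    outcome = "MemoryError"
print(outcome)
```

Errors that are not memory related are re-raised as they are, so they are not mistaken for a signal to unload other models.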

### SageMaker Inference Toolkit
MMS supports [various settings](https://github.com/awslabs/multi-model-server/blob/master/docker/advanced_settings.md#description-of-config-file-settings) for the frontend server it starts.

[SageMaker Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) is a library that bootstraps MMS in a way that is compatible with SageMaker multi-model endpoints, while still allowing you to tweak important performance parameters, such as the number of workers per model. The inference container in this example uses the Inference Toolkit to start MMS which can be seen in the __`container/dockerd-entrypoint.py`__ file.

## Building and registering a container using MMS

The shell script below will build a Docker image which uses MMS as the front end (configured through SageMaker Inference Toolkit), and `container/model_handler.py` that we inspected above as the backend handler. It will then upload the image to an ECR repository in your account.

In [14]:
%%sh

# The name of our algorithm
algorithm_name=demo-sagemaker-multimodel

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration
region=$(aws configure get region)
# Uncomment to fall back to us-west-2 if no region is configured
# region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
sha256:e4646af98fa279d3ff01acfc2c44ec7a49bdc01e58d03500b25f0f4ace7f6b85
The push refers to repository [868024899531.dkr.ecr.us-east-2.amazonaws.com/demo-sagemaker-multimodel]
2bdff9425f8f: Preparing
9b8e4227eb2e: Preparing
c9dd188bbf3d: Preparing
ff19f4f3267a: Preparing
ced76b2cd145: Preparing
068b2e491227: Preparing
ef5afc19235f: Preparing
92f15a626238: Preparing
fa1693d66d0b: Preparing
293b479c17a5: Preparing
bd95983a8d99: Preparing
96eda0f553ba: Preparing
293b479c17a5: Waiting
bd95983a8d99: Waiting
96eda0f553ba: Waiting
fa1693d66d0b: Waiting
ced76b2cd145: Waiting
068b2e491227: Waiting
ef5afc19235f: Waiting
92f15a626238: Waiting
2bdff9425f8f: Layer already exists
ff19f4f3267a: Layer already exists
9b8e4227eb2e: Layer already exists
ced76b2cd145: Layer already exists
068b2e491227: Layer already exists
92f15a626238: Layer already exists
fa1693d66d0b: Layer already exists
c9dd188bbf3d: Layer already exists
bd95983a8d99: Layer already exists
96eda0f553ba: Layer already ex

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



## Set up the environment
Define the S3 bucket and prefix that will hold the model artifacts your multi-model endpoint can invoke.

Also define the IAM role that will give SageMaker access to the model artifacts and ECR image that was created above.

In [16]:
# !pip install -qU awscli boto3 sagemaker

In [17]:
import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

In [18]:
account_id = boto3.client('sts').get_caller_identity()['Account']
account_id

'868024899531'

In [19]:
region = boto3.Session().region_name
region

'us-east-2'

In [20]:
bucket = 'sagemaker-{}-{}'.format(region, account_id)
bucket

'sagemaker-us-east-2-868024899531'

In [21]:
bucket = 'md-ml-labs-bucket'

In [22]:
prefix = 'DEMO-multimodel-endpoint'

In [23]:
# role = get_execution_role()

In [24]:
# # execute this on aws sagemaker
# role = get_execution_role()

# use this if running sagemaker locally
def resolve_sm_role():
    client = boto3.client('iam', region_name='us-east-2')
    response_roles = client.list_roles(
        PathPrefix='/',
        # Marker='string',
        MaxItems=999
    )
    for role in response_roles['Roles']:
        if role['RoleName'].startswith('AmazonSageMaker-ExecutionRole-'):
            print('Resolved SageMaker IAM Role to: ' + str(role))
            return role['Arn']
    raise Exception('Could not resolve what should be the SageMaker role to be used')

# this is the role created by sagemaker notebook on aws
role_arn = resolve_sm_role()
print(role_arn)
role=role_arn

Resolved SageMaker IAM Role to: {'Path': '/service-role/', 'RoleName': 'AmazonSageMaker-ExecutionRole-20200208T092301', 'RoleId': 'AROA4UGSQ27FVTPNSTGPW', 'Arn': 'arn:aws:iam::868024899531:role/service-role/AmazonSageMaker-ExecutionRole-20200208T092301', 'CreateDate': datetime.datetime(2020, 2, 8, 15, 23, 34, tzinfo=tzlocal()), 'AssumeRolePolicyDocument': {'Version': '2012-10-17', 'Statement': [{'Effect': 'Allow', 'Principal': {'Service': 'sagemaker.amazonaws.com'}, 'Action': 'sts:AssumeRole'}]}, 'Description': 'SageMaker execution role created from the SageMaker AWS Management Console.', 'MaxSessionDuration': 3600}
arn:aws:iam::868024899531:role/service-role/AmazonSageMaker-ExecutionRole-20200208T092301


## Upload model artifacts to S3
In this example we will use pre-trained ResNet 18 and ResNet 152 models, both trained on the ImageNet dataset. First we will download the models from MXNet's model zoo, and then upload them to S3.

In [25]:
import mxnet as mx
import os
import tarfile

In [26]:
model_path = 'http://data.mxnet.io/models/imagenet/'

In [27]:
mx.test_utils.download(model_path+'resnet/18-layers/resnet-18-0000.params', None, 'data/resnet_18')
mx.test_utils.download(model_path+'resnet/18-layers/resnet-18-symbol.json', None, 'data/resnet_18')
mx.test_utils.download(model_path+'synset.txt', None, 'data/resnet_18')

'data/resnet_18/synset.txt'

In [28]:
with open('data/resnet_18/resnet-18-shapes.json', 'w') as file:
    file.write('[{"shape": [1, 3, 224, 224], "name": "data"}]')


In [29]:
# create a resnet_18.tar.gz with the content of data/resnet_18 folder
with tarfile.open('data/resnet_18.tar.gz', 'w:gz') as tar:
    tar.add('data/resnet_18', arcname='.')

In [30]:
mx.test_utils.download(model_path+'resnet/152-layers/resnet-152-0000.params', None, 'data/resnet_152')
mx.test_utils.download(model_path+'resnet/152-layers/resnet-152-symbol.json', None, 'data/resnet_152')
mx.test_utils.download(model_path+'synset.txt', None, 'data/resnet_152')

'data/resnet_152/synset.txt'

In [31]:
with open('data/resnet_152/resnet-152-shapes.json', 'w') as file:
    file.write('[{"shape": [1, 3, 224, 224], "name": "data"}]')

In [32]:
# create resnet_152.tar.gz with the content of data/resnet_152 folder    
with tarfile.open('data/resnet_152.tar.gz', 'w:gz') as tar:
    tar.add('data/resnet_152', arcname='.')

In [33]:

# upload the two model artifacts created above to our bucket
# this takes a while; the artifacts total about 300 MB
from botocore.exceptions import ClientError
import os

s3 = boto3.resource('s3')
try:
    s3.meta.client.head_bucket(Bucket=bucket)
except ClientError:
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={
            'LocationConstraint': region
        })

models = {'resnet_18.tar.gz', 'resnet_152.tar.gz'}

for model in models:
    key = os.path.join(prefix, model)
    with open('data/'+model, 'rb') as file_obj:
        s3.Bucket(bucket).Object(key).upload_fileobj(file_obj)

## Create a multi-model endpoint
### Import models into hosting
When creating the Model entity for multi-model endpoints, the container's `ModelDataUrl` is the S3 prefix where the model artifacts that are invokable by the endpoint are located. The rest of the S3 path will be specified when invoking the model.

The `Mode` of the container is specified as `MultiModel` to signify that the container will host multiple models.

In [44]:
from time import gmtime, strftime

model_name = 'DEMO-MultiModelModel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = 'https://s3-{}.amazonaws.com/{}/{}/'.format(region, bucket, prefix)
container = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, 'demo-sagemaker-multimodel')

print('Model name: ' + model_name)
print('Model data Url: ' + model_url)
print('Container image: ' + container)

Model name: DEMO-MultiModelModel-2020-02-09-18-11-42
Model data Url: https://s3-us-east-2.amazonaws.com/md-ml-labs-bucket/DEMO-multimodel-endpoint/
Container image: 868024899531.dkr.ecr.us-east-2.amazonaws.com/demo-sagemaker-multimodel:latest


In [45]:
container_param = {
    'Image': container,
    'ModelDataUrl': model_url,
    'Mode': 'MultiModel'
}
container_param

{'Image': '868024899531.dkr.ecr.us-east-2.amazonaws.com/demo-sagemaker-multimodel:latest',
 'ModelDataUrl': 'https://s3-us-east-2.amazonaws.com/md-ml-labs-bucket/DEMO-multimodel-endpoint/',
 'Mode': 'MultiModel'}

reference:  
[Create Model in AWS docs](https://docs.aws.amazon.com/goto/WebAPI/sagemaker-2017-07-24/CreateModel)

In [46]:
import json

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container_param])

In [47]:
print("Model Arn: " + create_model_response['ModelArn'])

Model Arn: arn:aws:sagemaker:us-east-2:868024899531:model/demo-multimodelmodel-2020-02-09-18-11-42


### Create endpoint configuration
Endpoint config creation works the same way as it does for single-model endpoints.

In [48]:
endpoint_config_name = 'DEMO-MultiModelEndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint config name: ' + endpoint_config_name)

Endpoint config name: DEMO-MultiModelEndpointConfig-2020-02-09-18-14-02


In [49]:
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.m5.xlarge',
        'InitialInstanceCount': 2,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

Endpoint config Arn: arn:aws:sagemaker:us-east-2:868024899531:endpoint-config/demo-multimodelendpointconfig-2020-02-09-18-14-02


### Create endpoint
Similarly, endpoint creation works the same way as for single model endpoints.

In [50]:
import time

endpoint_name = 'DEMO-MultiModelEndpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint name: ' + endpoint_name)

Endpoint name: DEMO-MultiModelEndpoint-2020-02-09-18-16-40


In [51]:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

Endpoint Arn: arn:aws:sagemaker:us-east-2:868024899531:endpoint/demo-multimodelendpoint-2020-02-09-18-16-40


In [52]:
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

Endpoint Status: Creating


In [53]:
print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

Waiting for DEMO-MultiModelEndpoint-2020-02-09-18-16-40 endpoint to be in service...


## Invoke models
Now we invoke the models that we uploaded to S3 previously. The first invocation of a model may be slow because, behind the scenes, SageMaker downloads the model artifacts from S3 to the instance and loads them into the container.

First we will download an image of a cat as the payload to invoke the model, then call InvokeEndpoint to invoke the ResNet 18 model. The `TargetModel` field is concatenated with the S3 prefix specified in `ModelDataUrl` when creating the model, to generate the location of the model in S3.
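The concatenation of `ModelDataUrl` and `TargetModel` can be sketched as a simple string join. `model_artifact_location` is a hypothetical helper written for illustration; SageMaker performs the equivalent resolution on the service side.

```python
def model_artifact_location(model_data_url, target_model):
    """Join the ModelDataUrl prefix and the TargetModel value with a single slash."""
    return model_data_url.rstrip("/") + "/" + target_model.lstrip("/")


location = model_artifact_location(
    "https://s3-us-east-2.amazonaws.com/md-ml-labs-bucket/DEMO-multimodel-endpoint/",
    "resnet_18.tar.gz",
)
print(location)
```

Because the join is a plain path concatenation, a `TargetModel` value may also include a subfolder relative to the prefix.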

In [54]:
fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true', 'cat.jpg')

In [55]:
fname

'cat.jpg'

In [57]:
with open(fname, 'rb') as f:
    payload = f.read()


In [58]:
%%time

import json

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/x-image',
    TargetModel='resnet_18.tar.gz', # this is the rest of the S3 path where the model artifacts are located
    Body=payload)

print(*json.loads(response['Body'].read()), sep = '\n')

probability=0.244390, class=n02119022 red fox, Vulpes vulpes
probability=0.170341, class=n02119789 kit fox, Vulpes macrotis
probability=0.145019, class=n02113023 Pembroke, Pembroke Welsh corgi
probability=0.059833, class=n02356798 fox squirrel, eastern fox squirrel, Sciurus niger
probability=0.051555, class=n02123159 tiger cat
CPU times: user 13.1 ms, sys: 1.73 ms, total: 14.8 ms
Wall time: 4.89 s


When we invoke the same ResNet 18 model a 2nd time, it is already downloaded to the instance and loaded in the container, so inference is faster.

In [59]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/x-image',
    TargetModel='resnet_18.tar.gz',
    Body=payload)

print(*json.loads(response['Body'].read()), sep = '\n')

probability=0.244390, class=n02119022 red fox, Vulpes vulpes
probability=0.170341, class=n02119789 kit fox, Vulpes macrotis
probability=0.145019, class=n02113023 Pembroke, Pembroke Welsh corgi
probability=0.059833, class=n02356798 fox squirrel, eastern fox squirrel, Sciurus niger
probability=0.051555, class=n02123159 tiger cat
CPU times: user 4.12 ms, sys: 2.18 ms, total: 6.3 ms
Wall time: 2.44 s


In [60]:
response

{'ResponseMetadata': {'RequestId': '08ad9834-f417-43fa-b97a-9bb3cf22f7df',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '08ad9834-f417-43fa-b97a-9bb3cf22f7df',
   'x-amzn-invoked-production-variant': 'AllTraffic',
   'date': 'Sun, 9 Feb 2020 18:27:07 GMT',
   'content-type': 'application/json',
   'content-length': '356'},
  'RetryAttempts': 0},
 'ContentType': 'application/json',
 'InvokedProductionVariant': 'AllTraffic',
 'Body': <botocore.response.StreamingBody at 0x7f94e53e9ac8>}

### Invoke another model
Exercising the power of a multi-model endpoint, we can specify a different model (resnet_152.tar.gz) as `TargetModel` and perform inference on it using the same endpoint.

In [61]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/x-image',
    TargetModel='resnet_152.tar.gz',
    Body=payload)

print(*json.loads(response['Body'].read()), sep = '\n')

probability=0.386026, class=n02119022 red fox, Vulpes vulpes
probability=0.300927, class=n02119789 kit fox, Vulpes macrotis
probability=0.029575, class=n02123045 tabby, tabby cat
probability=0.026005, class=n02123159 tiger cat
probability=0.023201, class=n02113023 Pembroke, Pembroke Welsh corgi
CPU times: user 4.9 ms, sys: 1.73 ms, total: 6.63 ms
Wall time: 8.71 s


### Add models to the endpoint
We can add more models to the endpoint without having to update the endpoint. Below we are adding a 3rd model, `squeezenet_v1.0`. To demonstrate hosting multiple models behind the endpoint, this model is duplicated 10 times with a slightly different name in S3. In a more realistic scenario, these could be 10 new different models.

In [62]:
mx.test_utils.download(model_path+'squeezenet/squeezenet_v1.0-0000.params', None, 'data/squeezenet_v1.0')
mx.test_utils.download(model_path+'squeezenet/squeezenet_v1.0-symbol.json', None, 'data/squeezenet_v1.0')
mx.test_utils.download(model_path+'synset.txt', None, 'data/squeezenet_v1.0')

with open('data/squeezenet_v1.0/squeezenet_v1.0-shapes.json', 'w') as file:
    file.write('[{"shape": [1, 3, 224, 224], "name": "data"}]')
    
with tarfile.open('data/squeezenet_v1.0.tar.gz', 'w:gz') as tar:
    tar.add('data/squeezenet_v1.0', arcname='.')

In [63]:
models

{'resnet_152.tar.gz', 'resnet_18.tar.gz'}

In [64]:
file = 'data/squeezenet_v1.0.tar.gz'

for x in range(0, 10):
    s3_file_name = 'demo-subfolder/squeezenet_v1.0_{}.tar.gz'.format(x)
    key = os.path.join(prefix, s3_file_name)
    with open(file, 'rb') as file_obj:
        s3.Bucket(bucket).Object(key).upload_fileobj(file_obj)
    models.add(s3_file_name)

print('Number of models: {}'.format(len(models)))
print('Models: {}'.format(models))

Number of models: 12
Models: {'demo-subfolder/squeezenet_v1.0_3.tar.gz', 'demo-subfolder/squeezenet_v1.0_5.tar.gz', 'demo-subfolder/squeezenet_v1.0_8.tar.gz', 'demo-subfolder/squeezenet_v1.0_2.tar.gz', 'demo-subfolder/squeezenet_v1.0_7.tar.gz', 'resnet_152.tar.gz', 'resnet_18.tar.gz', 'demo-subfolder/squeezenet_v1.0_6.tar.gz', 'demo-subfolder/squeezenet_v1.0_9.tar.gz', 'demo-subfolder/squeezenet_v1.0_0.tar.gz', 'demo-subfolder/squeezenet_v1.0_1.tar.gz', 'demo-subfolder/squeezenet_v1.0_4.tar.gz'}


After uploading the SqueezeNet models to S3, we will invoke the endpoint 100 times, randomly choosing from one of the 12 models behind the S3 prefix for each invocation, and keeping a count of the label with the highest probability on each invoke response.

In [65]:
%%time

import random
from collections import defaultdict

results = defaultdict(int)
results

CPU times: user 7 µs, sys: 2 µs, total: 9 µs
Wall time: 12.6 µs


In [68]:
for x in range(0, 100):
    target_model = random.choice(tuple(models))
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/x-image',
        TargetModel=target_model,
        Body=payload)

    results[json.loads(response['Body'].read())[0]] += 1
    
print(*results.items(), sep = '\n')

('probability=0.294885, class=n02326432 hare', 247)
('probability=0.244390, class=n02119022 red fox, Vulpes vulpes', 30)
('probability=0.386026, class=n02119022 red fox, Vulpes vulpes', 23)


### Updating a model
To update a model, you would follow the same approach as above and add it as a new model. For example, if you have retrained the `resnet_18.tar.gz` model and wanted to start invoking it, you would upload the updated model artifacts behind the S3 prefix with a new name such as `resnet_18_v2.tar.gz`, and then change the `TargetModel` field to invoke `resnet_18_v2.tar.gz` instead of `resnet_18.tar.gz`. You do not want to overwrite the model artifacts in Amazon S3, because the old version of the model might still be loaded in the containers or on the storage volume of the instances on the endpoint. Invocations to the new model could then invoke the old version of the model.
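A small helper can keep the versioned names consistent. This is a minimal sketch of one possible naming convention; `versioned_artifact_name` is a hypothetical function and the `_v<N>` suffix is simply the pattern used in the example above.

```python
def versioned_artifact_name(name, version):
    """Insert a _v<version> suffix before the .tar.gz extension."""
    suffix = ".tar.gz"
    if not name.endswith(suffix):
        raise ValueError("expected a .tar.gz artifact name, got: " + name)
    return "{}_v{}{}".format(name[:-len(suffix)], version, suffix)


new_name = versioned_artifact_name("resnet_18.tar.gz", 2)
print(new_name)
```

The retrained artifact would then be uploaded to the same S3 prefix under `new_name`, and that value passed as `TargetModel` in subsequent `invoke_endpoint` calls.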

## (Optional) Delete the hosting resources

In [69]:
# sm_client.delete_endpoint(EndpointName=endpoint_name)
# sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
# sm_client.delete_model(ModelName=model_name)

{'ResponseMetadata': {'RequestId': 'b2ffb3e9-ec53-4d0e-8326-3f8469c5dd95',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'b2ffb3e9-ec53-4d0e-8326-3f8469c5dd95',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Sun, 09 Feb 2020 18:48:26 GMT'},
  'RetryAttempts': 0}}