# Amazon SageMaker Multi-Model Endpoints using Scikit Learn
With [Amazon SageMaker multi-model endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html), customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of a large number of models, which can be served from a common inference container, needs to be invokable on-demand and where it is acceptable for infrequently invoked models to incur some additional latency. For applications which require consistently low inference latency, a traditional endpoint is still the best choice.

At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. When an invocation request is made for a particular model, Amazon SageMaker routes the request to an instance assigned to that model, downloads the model artifacts from S3 onto that instance, and initiates loading of the model into the memory of the container. As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped and the invocation is performed immediately.
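
The load-on-demand behavior described above can be pictured with a toy cache. This is only a sketch for intuition: the `ToyModelCache` class, its fixed `capacity`, and the least-recently-used eviction rule are assumptions made for illustration, not a description of how Amazon SageMaker decides which models to unload.

```python
from collections import OrderedDict


class ToyModelCache:
    """Toy stand-in for one instance's in-memory model cache (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity     # assumed limit on models held in memory
        self.loaded = OrderedDict()  # model name -> loaded model, least recent first

    def invoke(self, model_name):
        if model_name in self.loaded:
            # Warm path: the model is already in memory, so download and load are skipped.
            self.loaded.move_to_end(model_name)
            return 'warm'
        # Cold path: make room if needed, then "download" and "load" the model.
        if len(self.loaded) >= self.capacity:
            self.loaded.popitem(last=False)  # evict least recently used (assumed policy)
        self.loaded[model_name] = object()
        return 'cold'


cache = ToyModelCache(capacity=2)
calls = ['NewYork_NY.tar.gz', 'NewYork_NY.tar.gz', 'Chicago_IL.tar.gz',
         'Houston_TX.tar.gz', 'NewYork_NY.tar.gz']
results = [cache.invoke(name) for name in calls]
print(results)
```

The second call to the same model is warm; once the toy cache is full, bringing in a new model pushes out an older one, so a later call to the evicted model is cold again.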

To demonstrate how multi-model endpoints are created and used, this notebook provides an example using a set of Scikit Learn models that each predict housing prices for a single location. This domain is used as a simple example to easily experiment with multi-model endpoints.

The Amazon SageMaker multi-model endpoint capability is designed to work across all machine learning frameworks and algorithms including those where you bring your own container.

### Contents

1. [Build and register a Scikit Learn container that can serve multiple models](#Build-and-register-a-Scikit-Learn-container-that-can-serve-multiple-models)
1. [Generate synthetic data for housing models](#Generate-synthetic-data-for-housing-models)
1. [Train multiple house value prediction models](#Train-multiple-house-value-prediction-models)
1. [Import models into hosting](#Import-models-into-hosting)
  1. [Deploy model artifacts to be found by the endpoint](#Deploy-model-artifacts-to-be-found-by-the-endpoint)
  1. [Create the Amazon SageMaker model entity](#Create-the-Amazon-SageMaker-model-entity)
  1. [Create the multi-model endpoint](#Create-the-multi-model-endpoint)
1. [Exercise the multi-model endpoint](#Exercise-the-multi-model-endpoint)
  1. [Dynamically deploy another model](#Dynamically-deploy-another-model)
  1. [Invoke the newly deployed model](#Invoke-the-newly-deployed-model)
  1. [Updating a model](#Updating-a-model)
1. [Clean up](#Clean-up)

## Build and register a Scikit Learn container that can serve multiple models

In [None]:
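# The code below uses SageMaker Python SDK v1 names (train_instance_count, sagemaker.s3_input),
# so keep the SDK below version 2
!pip install -qU awscli boto3 "sagemaker<2"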

For the inference container to serve multiple models in a multi-model endpoint, it must implement [additional APIs](https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html) in order to load, list, get, unload and invoke specific models.
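
As a rough orientation, the five operations can be written down as a small route table. The methods and paths below reflect my reading of the linked documentation and should be checked against it; the `MME_CONTAINER_ROUTES` dictionary and the `route_for` helper are hypothetical names used only for this sketch.

```python
# Operation -> (HTTP method, path template); verify against the linked documentation.
MME_CONTAINER_ROUTES = {
    'load':   ('POST',   '/models'),
    'list':   ('GET',    '/models'),
    'get':    ('GET',    '/models/{model_name}'),
    'unload': ('DELETE', '/models/{model_name}'),
    'invoke': ('POST',   '/models/{model_name}/invoke'),
}


def route_for(operation, model_name=None):
    """Return the (method, path) pair the container is expected to handle."""
    method, template = MME_CONTAINER_ROUTES[operation]
    if '{model_name}' in template and model_name is None:
        raise ValueError('operation {!r} needs a model name'.format(operation))
    return method, template.format(model_name=model_name)


print(route_for('list'))
print(route_for('invoke', 'NewYork_NY'))
```

In this notebook these routes are not implemented by hand; Multi Model Server, introduced next, provides them.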

The ['mme' branch of the SageMaker Scikit Learn Container repository](https://github.com/aws/sagemaker-scikit-learn-container/tree/mme) is an example implementation of how to adapt SageMaker's Scikit Learn framework container to use [Multi Model Server](https://github.com/awslabs/multi-model-server). Multi Model Server provides an HTTP frontend that implements the additional container APIs required by multi-model endpoints, along with a pluggable backend handler for serving models with a custom framework, in this case Scikit Learn.

Using this branch, we build a Scikit Learn container image below that fulfills all of the multi-model endpoint container requirements, and then upload that image to Amazon Elastic Container Registry (ECR). Because uploading the image to ECR may create a new ECR repository, this notebook requires permissions in addition to the regular `AmazonSageMakerFullAccess` permissions. The easiest way to add them is to attach the managed policy `AmazonEC2ContainerRegistryFullAccess` to the role that you used to start your notebook instance. There is no need to restart your notebook instance afterwards; the new permissions are available immediately.

In [None]:
ALGORITHM_NAME = 'multi-model-sklearn'

In [None]:
%%sh -s $ALGORITHM_NAME

algorithm_name=$1

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

ecr_image="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email --registry-ids ${account})

# Build the docker image locally with the image name and then push it to ECR
# with the full image name.

# First clear out any prior version of the cloned repo
rm -rf sagemaker-scikit-learn-container/

# Clone the sklearn container repo
git clone --single-branch --branch mme https://github.com/aws/sagemaker-scikit-learn-container.git
cd sagemaker-scikit-learn-container/

# Build the "base" container image that encompasses the installation of the
# scikit-learn framework and all of the dependencies needed.
docker build -q -t sklearn-base:0.20-2-cpu-py3 -f docker/0.20-2/base/Dockerfile.cpu --build-arg py_version=3 .

# Create the SageMaker Scikit-learn Container Python package.
python setup.py bdist_wheel --universal

# Build the "final" container image that encompasses the installation of the
# code that implements the SageMaker multi-model container requirements.
docker build -q -t ${algorithm_name} -f docker/0.20-2/final/Dockerfile.cpu .

docker tag ${algorithm_name} ${ecr_image}

docker push ${ecr_image}

## Generate synthetic data for housing models

In [None]:
import numpy as np
import pandas as pd
import json
import datetime
import time
from time import gmtime, strftime
import matplotlib.pyplot as plt

In [None]:
NUM_HOUSES_PER_LOCATION = 1000
LOCATIONS  = ['NewYork_NY',    'LosAngeles_CA',   'Chicago_IL',    'Houston_TX',   'Dallas_TX',
              'Phoenix_AZ',    'Philadelphia_PA', 'SanAntonio_TX', 'SanDiego_CA',  'SanFrancisco_CA']
PARALLEL_TRAINING_JOBS = 4 # len(LOCATIONS) if your account limits can handle it
MAX_YEAR = 2019

In [None]:
def gen_price(house):
    _base_price = int(house['SQUARE_FEET'] * 150)
    _price = int(_base_price + (10000 * house['NUM_BEDROOMS']) + \
                               (15000 * house['NUM_BATHROOMS']) + \
                               (15000 * house['LOT_ACRES']) + \
                               (15000 * house['GARAGE_SPACES']) - \
                               (5000 * (MAX_YEAR - house['YEAR_BUILT'])))
    return _price
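
To make the pricing formula concrete, here is the same arithmetic worked through for one hand-picked house. The feature values are made up for illustration; the coefficients are the ones used in `gen_price`.

```python
MAX_YEAR = 2019  # same value as the notebook constant

# Hypothetical house chosen so that the arithmetic is easy to follow
house = {'SQUARE_FEET': 2000, 'NUM_BEDROOMS': 3, 'NUM_BATHROOMS': 2.0,
         'LOT_ACRES': 1.0, 'GARAGE_SPACES': 2, 'YEAR_BUILT': 2009}

base_price  = house['SQUARE_FEET'] * 150                  # 300,000
bedrooms    = 10000 * house['NUM_BEDROOMS']               #  30,000
bathrooms   = 15000 * house['NUM_BATHROOMS']              #  30,000
lot         = 15000 * house['LOT_ACRES']                  #  15,000
garage      = 15000 * house['GARAGE_SPACES']              #  30,000
age_penalty = 5000 * (MAX_YEAR - house['YEAR_BUILT'])     #  50,000 (10 years old)

price = int(base_price + bedrooms + bathrooms + lot + garage - age_penalty)
print(price)  # 355000
```

Because the price is a deterministic function of the features, the models trained later have a clean signal to learn, which keeps the focus on the endpoint mechanics rather than on model quality.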

In [None]:
def gen_random_house():
    _house = {'SQUARE_FEET':   int(np.random.normal(3000, 750)),
              'NUM_BEDROOMS':  np.random.randint(2, 7),
              'NUM_BATHROOMS': np.random.randint(2, 7) / 2,
              'LOT_ACRES':     round(np.random.normal(1.0, 0.25), 2),
              'GARAGE_SPACES': np.random.randint(0, 4),
              'YEAR_BUILT':    min(MAX_YEAR, int(np.random.normal(1995, 10)))}
    _price = gen_price(_house)
    return [_price, _house['YEAR_BUILT'],   _house['SQUARE_FEET'], 
                    _house['NUM_BEDROOMS'], _house['NUM_BATHROOMS'], 
                    _house['LOT_ACRES'],    _house['GARAGE_SPACES']]

In [None]:
COLUMNS = ['PRICE', 'YEAR_BUILT', 'SQUARE_FEET', 'NUM_BEDROOMS',
           'NUM_BATHROOMS', 'LOT_ACRES', 'GARAGE_SPACES']
def gen_houses(num_houses):
    _house_list = []
    for i in range(num_houses):
        _house_list.append(gen_random_house())
    _df = pd.DataFrame(_house_list, 
                       columns=COLUMNS)
    return _df

## Train multiple house value prediction models

In [None]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer
import boto3

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

sagemaker_session = sagemaker.Session()
role = get_execution_role()

ACCOUNT_ID  = boto3.client('sts').get_caller_identity()['Account']
REGION      = boto3.Session().region_name
BUCKET      = sagemaker_session.default_bucket()
SCRIPT_FILENAME     = 'script.py'
USER_CODE_ARTIFACTS = 'user_code.tar.gz'

MULTI_MODEL_SKLEARN_IMAGE = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(ACCOUNT_ID, REGION, 
                                                                           ALGORITHM_NAME)

DATA_PREFIX            = 'DEMO_MME_SCIKIT'
HOUSING_MODEL_NAME     = 'housing'
MULTI_MODEL_ARTIFACTS  = 'multi_model_artifacts'

TRAIN_INSTANCE_TYPE    = 'ml.m4.xlarge'
ENDPOINT_INSTANCE_TYPE = 'ml.m4.xlarge'

### Split a given dataset into train, validation, and test

In [None]:
from sklearn.model_selection import train_test_split
SEED = 7
SPLIT_RATIOS = [0.6, 0.3, 0.1]

def split_data(df):
    # split data into train, validation, and test sets
    seed      = SEED
    val_size  = SPLIT_RATIOS[1]
    test_size = SPLIT_RATIOS[2]
    
    num_samples = df.shape[0]
    X1 = df.values[:num_samples, 1:] # keep only the features, skip the target, all rows
    Y1 = df.values[:num_samples, :1] # keep only the target, all rows

    # Use split ratios to divide up into train/val/test
    X_train, X_val, y_train, y_val = \
        train_test_split(X1, Y1, test_size=(test_size + val_size), random_state=seed)
    # Of the remaining non-training samples, give proper ratio to validation and to test
    X_val, X_test, y_val, y_test = \
        train_test_split(X_val, y_val, test_size=(test_size / (test_size + val_size)), 
                         random_state=seed)
    # reassemble the datasets with target in first column and features after that
    _train = np.concatenate([y_train, X_train], axis=1)
    _val   = np.concatenate([y_val,   X_val],   axis=1)
    _test  = np.concatenate([y_test,  X_test],  axis=1)

    return _train, _val, _test
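
The two-stage split is easier to follow with numbers. The sketch below computes the approximate set sizes implied by the 0.6 / 0.3 / 0.1 ratios; `split_sizes` is a hypothetical helper, and it uses plain rounding, so the counts from `train_test_split` may differ by a sample for dataset sizes that do not divide evenly.

```python
def split_sizes(num_samples, ratios=(0.6, 0.3, 0.1)):
    """Approximate train/val/test sizes for a two-stage split."""
    _, val_ratio, test_ratio = ratios
    # Stage 1: hold out everything that is not training data (0.3 + 0.1 = 0.4).
    holdout_ratio = val_ratio + test_ratio
    num_holdout = round(num_samples * holdout_ratio)
    # Stage 2: within the holdout, test gets 0.1 / 0.4 = 0.25 and validation gets the rest.
    num_test = round(num_holdout * test_ratio / holdout_ratio)
    num_val = num_holdout - num_test
    num_train = num_samples - num_holdout
    return num_train, num_val, num_test


print(split_sizes(1000))  # (600, 300, 100)
```

With the 1,000 houses generated per location, each model therefore trains on about 600 rows and validates on about 300.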

### Launch a single training job for a given housing location
Nothing about the models themselves is specific to multi-model endpoints. They are trained in the same way as all other SageMaker models. Here we use the Scikit Learn estimator and do not wait for the job to complete.

In [None]:
%%writefile $SCRIPT_FILENAME

import argparse
import os
import glob

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.externals import joblib

# inference functions ---------------
def model_fn(model_dir):
    print('loading model.joblib from: {}'.format(model_dir))
    _loaded_model = joblib.load(os.path.join(model_dir, 'model.joblib'))
    return _loaded_model


if __name__ =='__main__':

    print('extracting arguments')
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    # to simplify the demo we don't use all sklearn RandomForest hyperparameters
    parser.add_argument('--n-estimators', type=int, default=10)
    parser.add_argument('--min-samples-leaf', type=int, default=3)

    # Data, model, and output directories
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION'))
    parser.add_argument('--model-name', type=str)

    args, _ = parser.parse_known_args()

    print('reading data')
    print('model_name: {}'.format(args.model_name))

    train_file = os.path.join(args.train, args.model_name + '_train.csv')    
    train_df = pd.read_csv(train_file)

    val_file = os.path.join(args.validation, args.model_name + '_val.csv')
    test_df = pd.read_csv(val_file)  # the validation channel serves as the held-out set here

    print('building training and testing datasets')
    X_train = train_df[train_df.columns[1:train_df.shape[1]]] 
    X_test = test_df[test_df.columns[1:test_df.shape[1]]]
    y_train = train_df[train_df.columns[0]]
    y_test = test_df[test_df.columns[0]]

    # train
    print('training model')
    model = RandomForestRegressor(
        n_estimators=args.n_estimators,
        min_samples_leaf=args.min_samples_leaf,
        n_jobs=-1)
    
    model.fit(X_train, y_train)

    # print abs error
    print('validating model')
    abs_err = np.abs(model.predict(X_test) - y_test)

    # print couple perf metrics
    for q in [10, 50, 90]:
        print('AE-at-' + str(q) + 'th-percentile: '
              + str(np.percentile(a=abs_err, q=q)))
        
    # persist model
    path = os.path.join(args.model_dir, 'model.joblib')
    joblib.dump(model, path)
    print('model persisted at ' + path)

In [None]:
#  can test the model locally 
# ! python script.py --n-estimators 100 \
#                    --min-samples-leaf 2 \
#                    --model-dir ./ \
#                    --model-name 'NewYork_NY' \
#                    --train ./data/NewYork_NY/train/ \
#                    --validation ./data/NewYork_NY/val/
# from sklearn.externals import joblib
# regr = joblib.load('./model.joblib')
# _start_time = time.time()
# regr.predict([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
# _duration = time.time() - _start_time
# print('took {:,d} ms'.format(int(_duration * 1000)))

In [None]:
from sagemaker.sklearn.estimator import SKLearn

def launch_training_job(location):
    # clear out old versions of the data
    _s3_bucket = s3.Bucket(BUCKET)
    _full_input_prefix = '{}/model_prep/{}'.format(DATA_PREFIX, location)
    _s3_bucket.objects.filter(Prefix=_full_input_prefix + '/').delete()

    # upload the entire set of data for all three channels
    _local_folder = 'data/{}'.format(location)
    _inputs = sagemaker_session.upload_data(path=_local_folder, 
                                            key_prefix=_full_input_prefix)
    print('Training data uploaded: {}'.format(_inputs))
    
    _job = 'mme-{}'.format(location.replace('_', '-'))
    _full_output_prefix = '{}/model_artifacts/{}'.format(DATA_PREFIX, 
                                                        location)
    _s3_output_path = 's3://{}/{}'.format(BUCKET, _full_output_prefix)

    _estimator = SKLearn(
         entry_point=SCRIPT_FILENAME, role=role,
         train_instance_count=1, train_instance_type=TRAIN_INSTANCE_TYPE,
         framework_version='0.20.0',
         output_path=_s3_output_path,
         base_job_name=_job,
         metric_definitions=[
             {'Name' : 'median-AE',
              'Regex': 'AE-at-50th-percentile: ([0-9.]+).*$'}],
         hyperparameters = {'n-estimators'    : 100,
                            'min-samples-leaf': 3,
                            'model-name'      : location})
    
    DISTRIBUTION_MODE = 'FullyReplicated'
    _train_input = sagemaker.s3_input(s3_data=_inputs+'/train', 
                                      distribution=DISTRIBUTION_MODE, content_type='csv')
    _val_input   = sagemaker.s3_input(s3_data=_inputs+'/val', 
                                      distribution=DISTRIBUTION_MODE, content_type='csv')
    _remote_inputs = {'train': _train_input, 'validation': _val_input}

    _estimator.fit(_remote_inputs, wait=False)
    
    return _estimator.latest_training_job.name

### Kick off a model training job for each housing location

In [None]:
def save_data_locally(location, train, val, test):
    # Write a header row so that pd.read_csv in the training script does not
    # consume the first data row as column names
    _header = ','.join(COLUMNS)
    
    os.makedirs('data/{}/train'.format(location))
    np.savetxt( 'data/{0}/train/{0}_train.csv'.format(location), train, delimiter=',', fmt='%.2f',
                header=_header, comments='')
    
    os.makedirs('data/{}/val'.format(location))
    np.savetxt( 'data/{0}/val/{0}_val.csv'.format(location),     val,   delimiter=',', fmt='%.2f',
                header=_header, comments='')
    
    os.makedirs('data/{}/test'.format(location))
    np.savetxt( 'data/{0}/test/{0}_test.csv'.format(location),   test,  delimiter=',', fmt='%.2f',
                header=_header, comments='')

In [None]:
import shutil
import os

training_jobs = []

shutil.rmtree('data', ignore_errors=True)

for loc in LOCATIONS[:PARALLEL_TRAINING_JOBS]:
    _houses = gen_houses(NUM_HOUSES_PER_LOCATION)
    _train, _val, _test = split_data(_houses)
    save_data_locally(loc, _train, _val, _test)
    _job = launch_training_job(loc)
    training_jobs.append(_job)
print('{} training jobs launched: {}'.format(len(training_jobs), training_jobs))

### Wait for all model training to finish

In [None]:
def wait_for_training_job_to_complete(job_name):
    print('Waiting for job {} to complete...'.format(job_name))
    _resp   = sm_client.describe_training_job(TrainingJobName=job_name)
    _status = _resp['TrainingJobStatus']
    while _status=='InProgress':
        time.sleep(60)
        _resp   = sm_client.describe_training_job(TrainingJobName=job_name)
        _status = _resp['TrainingJobStatus']
        if _status == 'InProgress':
            print('{} job status: {}'.format(job_name, _status))
    print('DONE. Status for {} is {}\n'.format(job_name, _status))

In [None]:
# wait for the jobs to finish
for j in training_jobs:
    wait_for_training_job_to_complete(j)
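
The polling loop above keeps going for as long as a job reports `InProgress`. A variation that gives up after a time limit and returns the final status, so the caller can check for failure, might look like the sketch below. The `wait_until_done` helper and its parameters are hypothetical, and the status source is injected so the example runs without AWS access.

```python
import time


def wait_until_done(get_status, poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it stops returning 'InProgress' or the timeout elapses."""
    waited = 0
    status = get_status()
    while status == 'InProgress':
        if waited >= timeout_seconds:
            raise TimeoutError('still InProgress after {} seconds'.format(waited))
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Demonstration with a fake status source instead of a real training job
_fake_statuses = iter(['InProgress', 'InProgress', 'Completed'])
final_status = wait_until_done(lambda: next(_fake_statuses), poll_seconds=0)
print(final_status)  # Completed
```

In this notebook the status source would be something like `lambda: sm_client.describe_training_job(TrainingJobName=j)['TrainingJobStatus']`, and a returned value other than `Completed` signals that the artifacts for that job should not be deployed.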

## Import models into hosting
When creating the Model entity for a multi-model endpoint, the `ModelDataUrl` of the ContainerDefinition is treated as an S3 prefix for the model artifacts that will be loaded on-demand. The rest of the S3 path will be specified in the `InvokeEndpoint` request. Remember to close the location with a trailing slash.

The `Mode` of the container is specified as `MultiModel` to signify that the container will host multiple models.

### Deploy model artifacts to be found by the endpoint
As described above, the multi-model endpoint is configured to find its model artifacts in a specific location in S3. For each trained model, we make a copy of its model artifacts into that location.

In our example, we are storing all the models within a single folder. The implementation of multi-model endpoints is flexible enough to permit an arbitrary folder structure. For a set of housing models for example, you could have a top level folder for each region, and the model artifacts would be copied to those regional folders. The target model referenced when invoking such a model would include the folder path. For example, `northeast/Boston_MA.tar.gz`.
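
Conceptually, the S3 prefix and the target model name combine into the full artifact location. The sketch below shows that composition for both a flat and a nested layout; `resolve_model_artifact` is a hypothetical helper and the bucket name is a placeholder, so treat it as a picture of the naming scheme rather than of SageMaker's internal logic.

```python
def resolve_model_artifact(model_data_url, target_model):
    """Join the ModelDataUrl prefix with the TargetModel path from the request."""
    if not model_data_url.endswith('/'):
        raise ValueError('the ModelDataUrl prefix must end with a trailing slash')
    return model_data_url + target_model


# Placeholder bucket; the rest follows the prefixes used in this notebook
prefix = 's3://example-bucket/DEMO_MME_SCIKIT/multi_model_artifacts/'

flat = resolve_model_artifact(prefix, 'NewYork_NY.tar.gz')
nested = resolve_model_artifact(prefix, 'northeast/Boston_MA.tar.gz')
print(flat)
print(nested)
```

The trailing-slash check mirrors the reminder above: without it, the prefix and the model name would run together into a single, wrong key.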

In [None]:
import re
def parse_model_artifacts(model_data_url):
    # extract the s3 key from the full url to the model artifacts
    _s3_key = model_data_url.split('s3://{}/'.format(BUCKET))[1]
    # get the part of the key that identifies the model within the model artifacts folder
    _model_name_plus = _s3_key[_s3_key.find('model_artifacts') + len('model_artifacts') + 1:]
    # finally, get the unique model name (e.g., "NewYork_NY")
    _model_name = re.findall('^(.*?)/', _model_name_plus)[0]
    return _s3_key, _model_name 
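
To see what the parsing does, here is the same logic applied to a sample artifact URL. The bucket and job name are made up, and the function takes the bucket as a parameter so that the sketch is self-contained; the URL shape assumes the training output lands under `<output_path>/<job name>/output/model.tar.gz`.

```python
import re


def parse_key_and_model_name(model_data_url, bucket):
    """Standalone copy of the parsing logic, with the bucket passed in explicitly."""
    # Everything after 's3://<bucket>/' is the object key
    s3_key = model_data_url.split('s3://{}/'.format(bucket))[1]
    # Keep what follows 'model_artifacts/' and take the first path segment
    remainder = s3_key[s3_key.find('model_artifacts') + len('model_artifacts') + 1:]
    model_name = re.findall('^(.*?)/', remainder)[0]
    return s3_key, model_name


# Hypothetical URL in the shape produced by the training jobs above
sample_url = ('s3://example-bucket/DEMO_MME_SCIKIT/model_artifacts/NewYork_NY/'
              'mme-NewYork-NY-2020-01-01-00-00-00-000/output/model.tar.gz')
key, name = parse_key_and_model_name(sample_url, 'example-bucket')
print(key)
print(name)  # NewYork_NY
```

The model name comes from the folder that `launch_training_job` placed in the output path, which is why the location string, not the job name, ends up as the artifact name.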

In [None]:
# make a copy of the model artifacts from the original output of the training job to the place in
# s3 where the multi model endpoint will dynamically load individual models
def deploy_artifacts_to_mme(job_name):
    _resp = sm_client.describe_training_job(TrainingJobName=job_name)
    _source_s3_key, _model_name = parse_model_artifacts(_resp['ModelArtifacts']['S3ModelArtifacts'])
    _copy_source = {'Bucket': BUCKET, 'Key': _source_s3_key}
    _key = '{}/{}/{}.tar.gz'.format(DATA_PREFIX, MULTI_MODEL_ARTIFACTS, _model_name)
    
    print('Copying {} model\n   from: {}\n     to: {}...'.format(_model_name, _source_s3_key, _key))
    s3_client.copy_object(Bucket=BUCKET, CopySource=_copy_source, Key=_key)
    return _key

Note that we are purposely *not* copying the first model. This will be copied later in the notebook to demonstrate how to dynamically add new models to an already running endpoint.

In [None]:
# First, clear out old versions of the model artifacts from previous runs of this notebook
s3 = boto3.resource('s3')
s3_bucket = s3.Bucket(BUCKET)
full_input_prefix = '{}/multi_model_artifacts'.format(DATA_PREFIX)
print('Removing old model artifacts from {}'.format(full_input_prefix))
filter_resp = s3_bucket.objects.filter(Prefix=full_input_prefix + '/').delete()

In [None]:
# copy every model except the first one
for job in training_jobs[1:]:
    deploy_artifacts_to_mme(job)

### Create the Amazon SageMaker model entity
Here we use `boto3` to create the model entity. Instead of describing a single model, it will indicate the use of multi-model semantics and will identify the source location of all specific model artifacts.

In [None]:
# When using multi-model endpoints with the Scikit Learn container, we need to provide an entry point for
# inference that will at least load the saved model. This function uploads a tar.gz archive containing such a
# script. The S3 location of that archive is passed to the container through the
# SAGEMAKER_SUBMIT_DIRECTORY environment variable when the multi-model entity is created.

def upload_inference_code(script_file_name, prefix):
    _tmp_folder = 'inference-code'
    if not os.path.exists(_tmp_folder):
        os.makedirs(_tmp_folder)
    !tar -czvf $_tmp_folder/$USER_CODE_ARTIFACTS $script_file_name > /dev/null
    _loc = sagemaker_session.upload_data(_tmp_folder, 
                                         key_prefix='{}/{}'.format(prefix, _tmp_folder))
    return _loc + '/' + USER_CODE_ARTIFACTS

In [None]:
def create_multi_model_entity(multi_model_name, role):
    # establish the place in S3 from which the endpoint will pull individual models
    _model_url  = 's3://{}/{}/{}/'.format(BUCKET, DATA_PREFIX, MULTI_MODEL_ARTIFACTS)
    _container = {
        'Image':        MULTI_MODEL_SKLEARN_IMAGE,
        'ModelDataUrl': _model_url,
        'Mode':         'MultiModel',
        'Environment': {
            'SAGEMAKER_PROGRAM' : SCRIPT_FILENAME,
            'SAGEMAKER_SUBMIT_DIRECTORY' : upload_inference_code(SCRIPT_FILENAME, DATA_PREFIX)
        }
    }
    create_model_response = sm_client.create_model(
        ModelName = multi_model_name,
        ExecutionRoleArn = role,
        Containers = [_container])
    
    return _model_url

In [None]:
multi_model_name = '{}-{}'.format(HOUSING_MODEL_NAME, strftime('%Y-%m-%d-%H-%M-%S', gmtime()))
model_url = create_multi_model_entity(multi_model_name, role)
print('Multi model name: {}'.format(multi_model_name))

In [None]:
print('Here are the models that the endpoint has at its disposal:')
!aws s3 ls --human-readable --summarize $model_url

### Create the multi-model endpoint
There is nothing special about the SageMaker endpoint config for a multi-model endpoint. You need to consider the appropriate instance type and number of instances for the projected prediction workload. The number and size of the individual models will drive memory requirements.

Once the endpoint config is in place, the endpoint creation is straightforward.

In [None]:
endpoint_config_name = multi_model_name
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': ENDPOINT_INSTANCE_TYPE,
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName'   : multi_model_name,
        'VariantName' : 'AllTraffic'}])

endpoint_name = multi_model_name
print('Endpoint name: ' + endpoint_name)

In [None]:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

In [None]:
print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

## Exercise the multi-model endpoint

### Invoke multiple individual models hosted behind a single endpoint
Here we iterate through a set of housing predictions, choosing the specific location-based housing model at random. Notice the cold start price paid for the first invocation of any given model. Subsequent invocations of the same model take advantage of the model already being loaded into memory.

In [None]:
def predict_one_house_value(features, full_model_name):
    print('Using model {} to predict price of this house: {}'.format(full_model_name,
                                                                     features))

    _float_features = [float(i) for i in features]
    _body = ','.join(map(str, _float_features)) + '\n'
    
    _start_time = time.time()

    _response = runtime_sm_client.invoke_endpoint(
                        EndpointName=endpoint_name,
                        ContentType='text/csv',
                        TargetModel=full_model_name,
                        Body=_body)
    _predicted_value = json.loads(_response['Body'].read())[0]

    _duration = time.time() - _start_time
    
    print('${:,.2f}, took {:,d} ms\n'.format(_predicted_value, int(_duration * 1000)))

In [None]:
# iterate through invocations with random inputs against a random model showing results and latency
for i in range(10):
    model_name = LOCATIONS[np.random.randint(1, len(LOCATIONS[:PARALLEL_TRAINING_JOBS]))]
    full_model_name = '{}.tar.gz'.format(model_name)
    predict_one_house_value(gen_random_house()[1:], full_model_name)

### Dynamically deploy another model
Here we demonstrate the power of dynamic loading of new models. We purposely did not copy the first model when deploying models earlier. Now we deploy an additional model and can immediately invoke it through the multi-model endpoint. As with the earlier models, the first invocation to the new model takes longer, as the endpoint takes time to download the model and load it into memory.

In [None]:
# add another model to the endpoint and exercise it
deploy_artifacts_to_mme(training_jobs[0])

### Invoke the newly deployed model
Exercise the newly deployed model without the need for any endpoint update or restart.

In [None]:
print('Here are the models that the endpoint has at its disposal:')
!aws s3 ls $model_url

In [None]:
model_name = LOCATIONS[0]
full_model_name = '{}.tar.gz'.format(model_name)
for i in range(5):
    features = gen_random_house()[1:]  # drop the price, keep only the features
    predict_one_house_value(features, full_model_name)

### Updating a model
To update a model, you would follow the same approach as above and add it as a new model. For example, if you have retrained the `NewYork_NY.tar.gz` model and wanted to start invoking it, you would upload the updated model artifacts behind the S3 prefix with a new name such as `NewYork_NY_v2.tar.gz`, and then change the `TargetModel` field to invoke `NewYork_NY_v2.tar.gz` instead of `NewYork_NY.tar.gz`. You do not want to overwrite the model artifacts in Amazon S3, because the old version of the model might still be loaded in the containers or on the storage volume of the instances on the endpoint. Invocations to the new model could then invoke the old version of the model.

Alternatively, you could stop the endpoint and re-deploy a fresh set of models.
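
The renaming approach can be automated with a small helper that derives the next artifact name. This is a sketch under the assumption that versions follow the `_v<N>` suffix pattern from the example above; `next_version_name` is a hypothetical function, not part of any SDK.

```python
import re

SUFFIX = '.tar.gz'


def next_version_name(target_model):
    """Return the artifact name for the next version of a model."""
    if not target_model.endswith(SUFFIX):
        raise ValueError('expected a name ending in ' + SUFFIX)
    stem = target_model[:-len(SUFFIX)]
    match = re.match(r'^(.*)_v(\d+)$', stem)
    if match:
        # Already versioned: bump the number
        return '{}_v{}{}'.format(match.group(1), int(match.group(2)) + 1, SUFFIX)
    # Unversioned names are treated as version 1
    return '{}_v2{}'.format(stem, SUFFIX)


print(next_version_name('NewYork_NY.tar.gz'))     # NewYork_NY_v2.tar.gz
print(next_version_name('NewYork_NY_v2.tar.gz'))  # NewYork_NY_v3.tar.gz
```

The returned name would be used both as the S3 key suffix when uploading the retrained artifacts and as the `TargetModel` value in subsequent invocations.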

## Clean up
Here, to be sure we are not billed for endpoints we are no longer using, we clean up.

In [None]:
# shut down the endpoint
sm_client.delete_endpoint(EndpointName=endpoint_name)

In [None]:
# and the endpoint config
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

In [None]:
# delete model too
sm_client.delete_model(ModelName=multi_model_name)
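
If one of the deletions above fails, the remaining cells still need to be run by hand. A sketch of a helper that attempts all three deletions and reports which ones failed is shown below. The `clean_up` function and the `FakeSageMakerClient` class are hypothetical; the fake client exists only so the example runs without AWS access, and catching every exception is a simplification for the sketch.

```python
def clean_up(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Attempt every deletion, collecting failures instead of stopping at the first."""
    steps = [
        ('endpoint',        lambda: sm_client.delete_endpoint(EndpointName=endpoint_name)),
        ('endpoint config', lambda: sm_client.delete_endpoint_config(
                                        EndpointConfigName=endpoint_config_name)),
        ('model',           lambda: sm_client.delete_model(ModelName=model_name)),
    ]
    failures = []
    for label, delete in steps:
        try:
            delete()
        except Exception as err:  # broad on purpose for this sketch
            failures.append((label, str(err)))
    return failures


class FakeSageMakerClient:
    """Hypothetical stand-in that records calls and simulates one failure."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(('delete_endpoint', EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        raise RuntimeError('simulated failure')

    def delete_model(self, ModelName):
        self.calls.append(('delete_model', ModelName))


fake = FakeSageMakerClient()
failures = clean_up(fake, 'housing-demo', 'housing-demo', 'housing-demo')
print(failures)
print(fake.calls)
```

With the real client from this notebook, the call would be `clean_up(sm_client, endpoint_name, endpoint_config_name, multi_model_name)`, and an empty list means everything was removed.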