# Amazon SageMaker Multi-Model Endpoints using XGBoost
With [Amazon SageMaker multi-model endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html), customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of a large number of models, all of which can be served from a common inference container, needs to be invokable on demand, and where it is acceptable for infrequently invoked models to incur some additional latency. For applications that require consistently low inference latency, a traditional endpoint is still the best choice.

At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. When an invocation request is made for a particular model, Amazon SageMaker routes the request to an instance assigned to that model, downloads the model artifacts from S3 onto that instance, and initiates loading of the model into the memory of the container. As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped and the invocation is performed immediately.
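
The load-on-first-use behavior described above can be pictured with a toy in-memory cache. This is only an illustrative sketch: `ToyModelHost` and `fake_fetch` are hypothetical names, and instance routing and unloading, which the real service also manages, are not modeled here.

```python
class ToyModelHost:
    """Toy stand-in for one instance behind a multi-model endpoint (hypothetical)."""

    def __init__(self, fetch_model):
        self._fetch = fetch_model   # callable: model name -> callable model
        self._loaded = {}           # models currently held in memory
        self.downloads = 0          # how many times artifacts were fetched

    def invoke(self, target_model, payload):
        if target_model not in self._loaded:
            # Cold path: download the artifact and load it into memory first.
            self._loaded[target_model] = self._fetch(target_model)
            self.downloads += 1
        # Warm path: the model is already in memory, so invoke it immediately.
        return self._loaded[target_model](payload)


def fake_fetch(name):
    # Pretend "model": a function that tags the payload with the model name.
    return lambda payload: '{}:{}'.format(name, payload)


host = ToyModelHost(fake_fetch)
first = host.invoke('Boston_MA.tar.gz', 'x')    # cold: triggers a download
second = host.invoke('Boston_MA.tar.gz', 'y')   # warm: served from memory
print(first, second, host.downloads)
```

Only the first call for a given model pays the download-and-load cost; later calls reuse the copy in memory.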

To demonstrate how multi-model endpoints are created and used, this notebook provides an example using a set of XGBoost models that each predict housing prices for a single location. The housing domain serves as a simple way to experiment with multi-model endpoints.

The Amazon SageMaker multi-model endpoint capability is designed to work across all machine learning frameworks and algorithms including those where you bring your own container.

### Contents

1. [Build and register an XGBoost container that can serve multiple models](#Build-and-register-an-XGBoost-container-that-can-serve-multiple-models)
1. [Generate synthetic data for housing models](#Generate-synthetic-data-for-housing-models)
1. [Train multiple house value prediction models](#Train-multiple-house-value-prediction-models)
1. [Import models into hosting](#Import-models-into-hosting)
  1. [Deploy model artifacts to be found by the endpoint](#Deploy-model-artifacts-to-be-found-by-the-endpoint)
  1. [Create the Amazon SageMaker model entity](#Create-the-Amazon-SageMaker-model-entity)
  1. [Create the multi-model endpoint](#Create-the-multi-model-endpoint)
1. [Exercise the multi-model endpoint](#Exercise-the-multi-model-endpoint)
  1. [Dynamically deploy another model](#Dynamically-deploy-another-model)
  1. [Invoke the newly deployed model](#Invoke-the-newly-deployed-model)
  1. [Updating a model](#Updating-a-model)
1. [Clean up](#Clean-up)

## Build and register an XGBoost container that can serve multiple models

In [115]:
!pip install -qU awscli boto3 sagemaker



For the inference container to serve multiple models in a multi-model endpoint, it must implement [additional APIs](https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html) in order to load, list, get, unload and invoke specific models.
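
To make the five operations named above concrete, here is a minimal in-memory sketch. `ToyModelRegistry` and its method names are hypothetical and do not reflect the actual container contract or any real serving framework; they only mirror the load, list, get, unload, and invoke verbs.

```python
class ToyModelRegistry:
    """Hypothetical in-memory stand-in for a container's model management API."""

    def __init__(self):
        self._models = {}

    def load_model(self, name, predict_fn):
        # Register a callable under the given model name.
        self._models[name] = predict_fn

    def list_models(self):
        # Names of all currently loaded models, in sorted order.
        return sorted(self._models)

    def get_model(self, name):
        # Report whether a particular model is loaded.
        return {'name': name, 'loaded': name in self._models}

    def unload_model(self, name):
        # Drop the model from memory; unknown names are ignored.
        self._models.pop(name, None)

    def invoke(self, name, payload):
        if name not in self._models:
            raise KeyError('model not loaded: {}'.format(name))
        return self._models[name](payload)


registry = ToyModelRegistry()
registry.load_model('a', lambda x: x * 2)
registry.load_model('b', lambda x: x + 1)
result = registry.invoke('a', 21)
registry.unload_model('b')
remaining = registry.list_models()
print(result, remaining)
```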

The ['mme' branch of the SageMaker XGBoost Container repository](https://github.com/aws/sagemaker-xgboost-container/tree/mme) is an example implementation of how to adapt SageMaker's XGBoost framework container to use [Multi Model Server](https://github.com/awslabs/multi-model-server). Multi Model Server is a framework that provides an HTTP frontend implementing the additional container APIs required by multi-model endpoints. It also provides a pluggable backend handler for serving models using a custom framework, in this case the XGBoost framework.

Below, we use this branch to build an XGBoost container that fulfills all of the multi-model endpoint container requirements, and then upload that image to Amazon Elastic Container Registry (ECR). Because uploading the image to ECR may create a new ECR repository, this notebook requires permissions in addition to the regular SageMakerFullAccess permissions. The easiest way to add these permissions is to add the managed policy AmazonEC2ContainerRegistryFullAccess to the role that you used to start your notebook instance. There is no need to restart your notebook instance when you do this; the new permissions are available immediately.

In [116]:
ALGORITHM_NAME = 'multi-model-xgboost'

In [117]:
%%sh -s $ALGORITHM_NAME

algorithm_name=$1

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration
region=$(aws configure get region)

ecr_image="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email --registry-ids ${account})

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

# First clear out any prior version of the cloned repo
rm -rf sagemaker-xgboost-container/

# Clone the xgboost container repo
git clone --single-branch --branch mme https://github.com/aws/sagemaker-xgboost-container.git
cd sagemaker-xgboost-container/

# Build the "base" container image that encompasses the installation of the
# XGBoost framework and all of the dependencies needed.
docker build -q -t xgboost-container-base:0.90-2-cpu-py3 -f docker/0.90-2/base/Dockerfile.cpu .

# Create the SageMaker XGBoost Container Python package.
python setup.py bdist_wheel --universal

# Build the "final" container image that encompasses the installation of the
# code that implements the SageMaker multi-model container requirements.
docker build -q -t ${algorithm_name} -f docker/0.90-2/final/Dockerfile.cpu .

docker tag ${algorithm_name} ${ecr_image}

docker push ${ecr_image}

Login Succeeded
sha256:fcaf910662066f7d118d6d7f50ff6d7399d189611a5f5a8772123ce74c15cdff
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/distributed.py -> build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/encoder.py -> build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/serving.py -> build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/__init__.py -> build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/checkpointing.py -> build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/training.py -> build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/handler_service.py -> build/lib/sagemaker_xgboost_container
copying src/sagemaker_xgboost_container/data_utils.py -> build/lib/sagemaker_xgboost_container
creating build/lib/sa

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Cloning into 'sagemaker-xgboost-container'...


## Generate synthetic data for housing models

In [118]:
import numpy as np
import pandas as pd
import json
import datetime
import time
from time import gmtime, strftime
import matplotlib.pyplot as plt

In [119]:
!pip install datapackage



In [120]:
from datapackage import Package
import random

package = Package('https://datahub.io/core/world-cities/datapackage.json')

# print list of all resources:
print(package.resource_names)

limit = 10
usa_list = []
city_list = []

# print processed tabular data (if exists any)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        for each in resource.read():
            if each[1] == 'United States':
                usa_list.append((each[0]+'_'+each[2]).replace(' ','').replace("'",''))
                
city_list = random.choices(usa_list, k=limit)  
print(city_list)



['validation_report', 'world-cities_csv', 'world-cities_json', 'world-cities_zip', 'world-cities_csv_preview', 'world-cities']
['Boardman_Ohio', 'Melrose_Massachusetts', 'Belmont_California', 'Bedford_Texas', 'Louisville_Kentucky', 'Bellingham_Washington', 'Kailua_Hawaii', 'MiddleRiver_Maryland', 'Hawthorne_NewJersey', 'EastPeoria_Illinois']


In [121]:
NUM_HOUSES_PER_LOCATION = 1000
#LOCATIONS  = ['NewYork_NY',    'LosAngeles_CA',   'Chicago_IL',    'Houston_TX',   'Dallas_TX',
#              'Phoenix_AZ',    'Philadelphia_PA', 'SanAntonio_TX', 'SanDiego_CA',  'SanFrancisco_CA']

LOCATIONS = city_list
PARALLEL_TRAINING_JOBS = 10 # len(LOCATIONS) if your account limits can handle it
MAX_YEAR = 2019



In [122]:
def gen_price(house):
    _base_price = int(house['SQUARE_FEET'] * 150)
    _price = int(_base_price + (10000 * house['NUM_BEDROOMS']) + \
                               (15000 * house['NUM_BATHROOMS']) + \
                               (15000 * house['LOT_ACRES']) + \
                               (15000 * house['GARAGE_SPACES']) - \
                               (5000 * (MAX_YEAR - house['YEAR_BUILT'])))
    return _price
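
As a worked example of the pricing formula, the sketch below restates it in a standalone function and applies it to one sample house. `price_from_features` and the sample values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
MAX_YEAR = 2019  # same constant as in the notebook


def price_from_features(house):
    # Mirrors gen_price: 150 per square foot plus fixed amounts per feature,
    # minus 5000 for each year of age relative to MAX_YEAR.
    base = int(house['SQUARE_FEET'] * 150)
    return int(base
               + 10000 * house['NUM_BEDROOMS']
               + 15000 * house['NUM_BATHROOMS']
               + 15000 * house['LOT_ACRES']
               + 15000 * house['GARAGE_SPACES']
               - 5000 * (MAX_YEAR - house['YEAR_BUILT']))


# Hypothetical sample house.
sample = {'SQUARE_FEET': 2000, 'NUM_BEDROOMS': 3, 'NUM_BATHROOMS': 2.0,
          'LOT_ACRES': 1.0, 'GARAGE_SPACES': 2, 'YEAR_BUILT': 2009}

# 300000 + 30000 + 30000 + 15000 + 30000 - 50000
sample_price = price_from_features(sample)
print(sample_price)  # 355000
```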

In [123]:
def gen_random_house():
    _house = {'SQUARE_FEET':   int(np.random.normal(3000, 750)),
              'NUM_BEDROOMS':  np.random.randint(2, 7),
              'NUM_BATHROOMS': np.random.randint(2, 7) / 2,
              'LOT_ACRES':     round(np.random.normal(1.0, 0.25), 2),
              'GARAGE_SPACES': np.random.randint(0, 4),
              'YEAR_BUILT':    min(MAX_YEAR, int(np.random.normal(1995, 10)))}
    _price = gen_price(_house)
    return [_price, _house['YEAR_BUILT'],   _house['SQUARE_FEET'], 
                    _house['NUM_BEDROOMS'], _house['NUM_BATHROOMS'], 
                    _house['LOT_ACRES'],    _house['GARAGE_SPACES']]

In [124]:
def gen_houses(num_houses):
    _house_list = []
    for i in range(num_houses):
        _house_list.append(gen_random_house())
    _df = pd.DataFrame(_house_list, 
                       columns=['PRICE',        'YEAR_BUILT',    'SQUARE_FEET',  'NUM_BEDROOMS',
                                'NUM_BATHROOMS','LOT_ACRES',     'GARAGE_SPACES'])
    return _df

## Train multiple house value prediction models

In [125]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer
import boto3

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

sagemaker_session = sagemaker.Session()
role = get_execution_role()

ACCOUNT_ID = boto3.client('sts').get_caller_identity()['Account']
REGION     = boto3.Session().region_name
BUCKET     = sagemaker_session.default_bucket()

MULTI_MODEL_XGBOOST_IMAGE = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(ACCOUNT_ID, REGION, 
                                                                           ALGORITHM_NAME)

DATA_PREFIX            = 'DEMO_MME_XGBOOST'
HOUSING_MODEL_NAME     = 'housing'
MULTI_MODEL_ARTIFACTS  = 'multi_model_artifacts'

TRAIN_INSTANCE_TYPE    = 'ml.m4.xlarge'
ENDPOINT_INSTANCE_TYPE = 'ml.m4.xlarge'

### Split a given dataset into train, validation, and test

In [126]:
from sklearn.model_selection import train_test_split
SEED = 7
SPLIT_RATIOS = [0.6, 0.3, 0.1]

def split_data(df):
    # split data into train, validation, and test sets
    seed      = SEED
    val_size  = SPLIT_RATIOS[1]
    test_size = SPLIT_RATIOS[2]
    
    num_samples = df.shape[0]
    X1 = df.values[:num_samples, 1:] # keep only the features, skip the target, all rows
    Y1 = df.values[:num_samples, :1] # keep only the target, all rows

    # Use split ratios to divide up into train/val/test
    X_train, X_val, y_train, y_val = \
        train_test_split(X1, Y1, test_size=(test_size + val_size), random_state=seed)
    # Of the remaining non-training samples, give proper ratio to validation and to test
    X_val, X_test, y_val, y_test = \
        train_test_split(X_val, y_val, test_size=(test_size / (test_size + val_size)), 
                         random_state=seed)
    # reassemble the datasets with target in first column and features after that
    _train = np.concatenate([y_train, X_train], axis=1)
    _val   = np.concatenate([y_val,   X_val],   axis=1)
    _test  = np.concatenate([y_test,  X_test],  axis=1)

    return _train, _val, _test
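
The two-stage split relies on a small piece of arithmetic: the second split must take `test / (val + test)` of the held-out rows so that the overall ratios come out as intended. The sketch below checks that arithmetic with plain Python. `two_stage_split_sizes` is a hypothetical helper, and its use of `round` is an assumption that may differ from scikit-learn's exact rounding rule.

```python
def two_stage_split_sizes(num_rows, ratios):
    """Return (train, val, test) row counts for a hold-out split done in two stages."""
    _, val_ratio, test_ratio = ratios
    holdout_fraction = val_ratio + test_ratio        # carved off in stage 1
    test_fraction = test_ratio / holdout_fraction    # share of the hold-out, stage 2
    holdout_rows = round(num_rows * holdout_fraction)
    test_rows = round(holdout_rows * test_fraction)
    return num_rows - holdout_rows, holdout_rows - test_rows, test_rows


sizes = two_stage_split_sizes(1000, [0.6, 0.3, 0.1])
print(sizes)  # (600, 300, 100)
```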

### Launch a single training job for a given housing location
There is nothing specific to multi-model endpoints about the models they host. The models are trained in the same way as all other SageMaker models. Here we use the XGBoost estimator and do not wait for the job to complete.

In [127]:
def launch_training_job(location):
    # clear out old versions of the data
    s3_bucket = s3.Bucket(BUCKET)
    full_input_prefix = '{}/model_prep/{}'.format(DATA_PREFIX, location)
    s3_bucket.objects.filter(Prefix=full_input_prefix + '/').delete()

    # upload the entire set of data for all three channels
    local_folder = 'data/{}'.format(location)
    inputs = sagemaker_session.upload_data(path=local_folder, key_prefix=full_input_prefix)
    print('Training data uploaded: {}'.format(inputs))
    
    _job = 'xgb-{}'.format(location.replace('_', '-'))
    full_output_prefix = '{}/model_artifacts/{}'.format(DATA_PREFIX, location)
    s3_output_path = 's3://{}/{}'.format(BUCKET, full_output_prefix)

    xgb = sagemaker.estimator.Estimator(MULTI_MODEL_XGBOOST_IMAGE, role, 
                                        train_instance_count=1, train_instance_type=TRAIN_INSTANCE_TYPE,
                                        output_path=s3_output_path, base_job_name=_job,
                                        sagemaker_session=sagemaker_session)
    xgb.set_hyperparameters(max_depth=5, eta=0.2, gamma=4, min_child_weight=6, subsample=0.8, silent=0, 
                            early_stopping_rounds=5, objective='reg:linear', num_round=25) 
    
    DISTRIBUTION_MODE = 'FullyReplicated'
    train_input = sagemaker.s3_input(s3_data=inputs+'/train', 
                                     distribution=DISTRIBUTION_MODE, content_type='csv')
    val_input   = sagemaker.s3_input(s3_data=inputs+'/val', 
                                     distribution=DISTRIBUTION_MODE, content_type='csv')
    remote_inputs = {'train': train_input, 'validation': val_input}

    xgb.fit(remote_inputs, wait=False)
    
    return xgb.latest_training_job.name

### Kick off a model training job for each housing location

In [128]:
def save_data_locally(location, train, val, test):
    os.makedirs('data/{}/train'.format(location))
    np.savetxt( 'data/{0}/train/{0}_train.csv'.format(location), train, delimiter=',', fmt='%.2f')
    
    os.makedirs('data/{}/val'.format(location))
    np.savetxt( 'data/{0}/val/{0}_val.csv'.format(location),     val, delimiter=',', fmt='%.2f')
    
    os.makedirs('data/{}/test'.format(location))
    np.savetxt( 'data/{0}/test/{0}_test.csv'.format(location),   test, delimiter=',', fmt='%.2f')

In [129]:
import shutil
import os

# Yield successive n-sized 
# chunks from l. 
def divide_chunks(l, n): 
      
    # looping till length l 
    for i in range(0, len(l), n):  
        yield l[i:i + n] 
  
# How many elements each 
# list should have 
n = 10
  
city_list_break = list(divide_chunks(LOCATIONS, n)) 

training_jobs_all = []


def wait_for_training_job_to_complete(job_name):
    print('Waiting for job {} to complete...'.format(job_name))
    resp = sm_client.describe_training_job(TrainingJobName=job_name)
    status = resp['TrainingJobStatus']
    while status=='InProgress':
        time.sleep(60)
        resp = sm_client.describe_training_job(TrainingJobName=job_name)
        status = resp['TrainingJobStatus']
        if status == 'InProgress':
            print('{} job status: {}'.format(job_name, status))
    print('DONE. Status for {} is {}\n'.format(job_name, status))


# launch and wait for the training jobs one chunk at a time
for each in city_list_break:

    training_jobs = []

    shutil.rmtree('data', ignore_errors=True)

    for loc in each[:PARALLEL_TRAINING_JOBS]:
        _houses = gen_houses(NUM_HOUSES_PER_LOCATION)
        _train, _val, _test = split_data(_houses)
        save_data_locally(loc, _train, _val, _test)
        _job = launch_training_job(loc)
        training_jobs.append(_job)
        training_jobs_all.append(_job)
    print('{} training jobs launched: {}'.format(len(training_jobs), training_jobs))


    # wait for the jobs to finish
    for j in training_jobs:
        wait_for_training_job_to_complete(j)

Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Boardman_Ohio
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Melrose_Massachusetts
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Belmont_California
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Bedford_Texas
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Louisville_Kentucky
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Bellingham_Washington
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Kailua_Hawaii
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/MiddleRiver_Maryland
Training data uploaded: s3://sagemaker-us-east-2-111016121260/DEMO_MME_XGBOOST/model_prep/Hawthorne_NewJersey
Training data uploaded: 

### Wait for all model training to finish

In [130]:
def wait_for_training_job_to_complete(job_name):
    print('Waiting for job {} to complete...'.format(job_name))
    resp = sm_client.describe_training_job(TrainingJobName=job_name)
    status = resp['TrainingJobStatus']
    while status=='InProgress':
        time.sleep(60)
        resp = sm_client.describe_training_job(TrainingJobName=job_name)
        status = resp['TrainingJobStatus']
        if status == 'InProgress':
            print('{} job status: {}'.format(job_name, status))
    print('DONE. Status for {} is {}\n'.format(job_name, status))

In [131]:
# wait for the jobs to finish
for j in training_jobs_all:
    wait_for_training_job_to_complete(j)

Waiting for job xgb-Boardman-Ohio-2020-04-16-01-18-06-150 to complete...
DONE. Status for xgb-Boardman-Ohio-2020-04-16-01-18-06-150 is Completed

Waiting for job xgb-Melrose-Massachusetts-2020-04-16-01-18-06-529 to complete...
DONE. Status for xgb-Melrose-Massachusetts-2020-04-16-01-18-06-529 is Completed

Waiting for job xgb-Belmont-California-2020-04-16-01-18-08-182 to complete...
DONE. Status for xgb-Belmont-California-2020-04-16-01-18-08-182 is Completed

Waiting for job xgb-Bedford-Texas-2020-04-16-01-18-09-087 to complete...
DONE. Status for xgb-Bedford-Texas-2020-04-16-01-18-09-087 is Completed

Waiting for job xgb-Louisville-Kentucky-2020-04-16-01-18-09-984 to complete...
DONE. Status for xgb-Louisville-Kentucky-2020-04-16-01-18-09-984 is Completed

Waiting for job xgb-Bellingham-Washington-2020-04-16-01-18-13-518 to complete...
DONE. Status for xgb-Bellingham-Washington-2020-04-16-01-18-13-518 is Completed

Waiting for job xgb-Kailua-Hawaii-2020-04-16-01-18-16-349 to complete.

## Import models into hosting
A big difference for multi-model endpoints is that when creating the Model entity, the container's `ModelDataUrl` is the S3 prefix where the model artifacts that are invokable by the endpoint are located. The rest of the S3 path will be specified when actually invoking the model. Remember to close the location with a trailing slash.

The `Mode` of the container is specified as `MultiModel` to signify that the container will host multiple models.

### Deploy model artifacts to be found by the endpoint
As described above, the multi-model endpoint is configured to find its model artifacts in a specific location in S3. For each trained model, we make a copy of its model artifacts into that location.

In our example, we are storing all the models within a single folder. The implementation of multi-model endpoints is flexible enough to permit an arbitrary folder structure. For a set of housing models for example, you could have a top level folder for each region, and the model artifacts would be copied to those regional folders. The target model referenced when invoking such a model would include the folder path. For example, `northeast/Boston_MA.tar.gz`.
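
The relationship between the S3 prefix and the target model can be sketched as simple string joining. `resolve_artifact_uri` is a hypothetical helper and the bucket name is a placeholder; how the service resolves paths internally is not shown in this notebook.

```python
def resolve_artifact_uri(model_data_url, target_model):
    """Join the model's S3 prefix with the TargetModel value from an invocation."""
    if not model_data_url.endswith('/'):
        raise ValueError('ModelDataUrl must end with a trailing slash')
    return model_data_url + target_model.lstrip('/')


# Placeholder bucket name, for illustration only.
prefix = 's3://example-bucket/DEMO_MME_XGBOOST/multi_model_artifacts/'
flat_uri = resolve_artifact_uri(prefix, 'Boston_MA.tar.gz')
nested_uri = resolve_artifact_uri(prefix, 'northeast/Boston_MA.tar.gz')
print(flat_uri)
print(nested_uri)
```

With a regional folder layout, only the `TargetModel` value changes; the prefix configured on the model entity stays the same.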

In [132]:
import re
def parse_model_artifacts(model_data_url):
    # extract the s3 key from the full url to the model artifacts
    _s3_key = model_data_url.split('s3://{}/'.format(BUCKET))[1]
    # get the part of the key that identifies the model within the model artifacts folder
    _model_name_plus = _s3_key[_s3_key.find('model_artifacts') + len('model_artifacts') + 1:]
    # finally, get the unique model name (e.g., "NewYork_NY")
    _model_name = re.findall('^(.*?)/', _model_name_plus)[0]
    return _s3_key, _model_name 
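
To see what the parsing step returns, the sketch below repeats the same logic with the bucket passed in explicitly and runs it on one sample URL. `parse_artifacts`, the bucket name, and the job name are placeholders shaped like a training job's output path.

```python
import re


def parse_artifacts(model_data_url, bucket):
    # Same steps as parse_model_artifacts, with the bucket passed in explicitly.
    s3_key = model_data_url.split('s3://{}/'.format(bucket))[1]
    tail = s3_key[s3_key.find('model_artifacts') + len('model_artifacts') + 1:]
    model_name = re.findall('^(.*?)/', tail)[0]
    return s3_key, model_name


# Placeholder bucket and job name.
url = ('s3://example-bucket/DEMO_MME_XGBOOST/model_artifacts/'
       'NewYork_NY/xgb-NewYork-NY-0001/output/model.tar.gz')
key, name = parse_artifacts(url, 'example-bucket')
print(key)
print(name)  # NewYork_NY
```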

In [133]:
# make a copy of the model artifacts from the original output of the training job to the place in
# s3 where the multi model endpoint will dynamically load individual models
def deploy_artifacts_to_mme(job_name):
    _resp = sm_client.describe_training_job(TrainingJobName=job_name)
    _source_s3_key, _model_name = parse_model_artifacts(_resp['ModelArtifacts']['S3ModelArtifacts'])
    _copy_source = {'Bucket': BUCKET, 'Key': _source_s3_key}
    _key = '{}/{}/{}.tar.gz'.format(DATA_PREFIX, MULTI_MODEL_ARTIFACTS, _model_name)
    
    print('Copying {} model\n   from: {}\n     to: {}...'.format(_model_name, _source_s3_key, _key))
    s3_client.copy_object(Bucket=BUCKET, CopySource=_copy_source, Key=_key)
    return _key

Note that we are purposely *not* copying the first model. It will be copied later in the notebook to demonstrate how to dynamically add new models to an already running endpoint.

In [134]:
# First, clear out old versions of the model artifacts from previous runs of this notebook
s3 = boto3.resource('s3')
s3_bucket = s3.Bucket(BUCKET)
full_input_prefix = '{}/multi_model_artifacts'.format(DATA_PREFIX)
print('Removing old model artifacts from {}'.format(full_input_prefix))
s3_bucket.objects.filter(Prefix=full_input_prefix + '/').delete()

Removing old model artifacts from DEMO_MME_XGBOOST/multi_model_artifacts


[{'ResponseMetadata': {'RequestId': '982D83D244B70CFE',
   'HostId': '8k9lc+z1aSGNk9Z3CdnkLJa/9BVkDOH0DVnJ6H1gONq66+c+9mhsJaI9yWatRGJ6+UoMNBYIxHw=',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': '8k9lc+z1aSGNk9Z3CdnkLJa/9BVkDOH0DVnJ6H1gONq66+c+9mhsJaI9yWatRGJ6+UoMNBYIxHw=',
    'x-amz-request-id': '982D83D244B70CFE',
    'date': 'Thu, 16 Apr 2020 01:23:34 GMT',
    'connection': 'close',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3'},
   'RetryAttempts': 0},
  'Deleted': [{'Key': 'DEMO_MME_XGBOOST/multi_model_artifacts/Edgewater_Florida.tar.gz'},
   {'Key': 'DEMO_MME_XGBOOST/multi_model_artifacts/Akron_Ohio.tar.gz'},
   {'Key': 'DEMO_MME_XGBOOST/multi_model_artifacts/Riverside_Ohio.tar.gz'},
   {'Key': 'DEMO_MME_XGBOOST/multi_model_artifacts/BayshoreGardens_Florida.tar.gz'}]}]

In [135]:
# copy every model except the first one
for job in training_jobs_all[1:]:
    deploy_artifacts_to_mme(job)

Copying Melrose_Massachusetts model
   from: DEMO_MME_XGBOOST/model_artifacts/Melrose_Massachusetts/xgb-Melrose-Massachusetts-2020-04-16-01-18-06-529/output/model.tar.gz
     to: DEMO_MME_XGBOOST/multi_model_artifacts/Melrose_Massachusetts.tar.gz...
Copying Belmont_California model
   from: DEMO_MME_XGBOOST/model_artifacts/Belmont_California/xgb-Belmont-California-2020-04-16-01-18-08-182/output/model.tar.gz
     to: DEMO_MME_XGBOOST/multi_model_artifacts/Belmont_California.tar.gz...
Copying Bedford_Texas model
   from: DEMO_MME_XGBOOST/model_artifacts/Bedford_Texas/xgb-Bedford-Texas-2020-04-16-01-18-09-087/output/model.tar.gz
     to: DEMO_MME_XGBOOST/multi_model_artifacts/Bedford_Texas.tar.gz...
Copying Louisville_Kentucky model
   from: DEMO_MME_XGBOOST/model_artifacts/Louisville_Kentucky/xgb-Louisville-Kentucky-2020-04-16-01-18-09-984/output/model.tar.gz
     to: DEMO_MME_XGBOOST/multi_model_artifacts/Louisville_Kentucky.tar.gz...
Copying Bellingham_Washington model
   from: DEMO_MM

### Create the Amazon SageMaker model entity
Here we use `boto3` to create the model entity. Instead of describing a single model, it will indicate the use of multi-model semantics and will identify the source location of all specific model artifacts.

In [136]:
def create_multi_model_entity(multi_model_name, role):
    # establish the place in S3 from which the endpoint will pull individual models
    _model_url  = 's3://{}/{}/{}/'.format(BUCKET, DATA_PREFIX, MULTI_MODEL_ARTIFACTS)
    _container = {
        'Image':        MULTI_MODEL_XGBOOST_IMAGE,
        'ModelDataUrl': _model_url,
        'Mode':         'MultiModel'
    }
    create_model_response = sm_client.create_model(
        ModelName = multi_model_name,
        ExecutionRoleArn = role,
        Containers = [_container])
    
    return _model_url

In [137]:
multi_model_name = '{}-{}'.format(HOUSING_MODEL_NAME, strftime('%Y-%m-%d-%H-%M-%S', gmtime()))
model_url = create_multi_model_entity(multi_model_name, role)
print('Multi model name: {}'.format(multi_model_name))

Multi model name: housing-2020-04-16-01-23-41


### Create the multi-model endpoint
There is nothing special about the SageMaker endpoint config for a multi-model endpoint. You need to consider the appropriate instance type and number of instances for the projected prediction workload. The number and size of the individual models will drive memory requirements.

Once the endpoint config is in place, the endpoint creation is straightforward.

In [138]:
endpoint_config_name = multi_model_name
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': ENDPOINT_INSTANCE_TYPE,
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': multi_model_name,
        'VariantName': 'AllTraffic'}])

endpoint_name = multi_model_name
print('Endpoint name: ' + endpoint_name)

Endpoint config name: housing-2020-04-16-01-23-41
Endpoint name: housing-2020-04-16-01-23-41


In [139]:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

Endpoint Arn: arn:aws:sagemaker:us-east-2:111016121260:endpoint/housing-2020-04-16-01-23-41


In [140]:
print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

Waiting for housing-2020-04-16-01-23-41 endpoint to be in service...


## Exercise the multi-model endpoint

### Invoke multiple individual models hosted behind a single endpoint
Here we iterate through a set of housing predictions, choosing the specific location-based housing model at random. Notice the cold start price paid for the first invocation of any given model. Subsequent invocations of the same model take advantage of the model already being loaded into memory.

In [141]:
def predict_one_house_value(features, model_name):
    # model_name is the TargetModel value, e.g. 'Boardman_Ohio.tar.gz'
    print('Using model {} to predict price of this house: {}'.format(model_name,
                                                                     features))
    body = ','.join(map(str, features)) + '\n'
    start_time = time.time()

    response = runtime_sm_client.invoke_endpoint(
                        EndpointName=endpoint_name,
                        ContentType='text/csv',
                        TargetModel=model_name,
                        Body=body)
    predicted_value = json.loads(response['Body'].read())[0]

    duration = time.time() - start_time
    
    print('${:,.2f}, took {:,d} ms\n'.format(predicted_value, int(duration * 1000)))

In [142]:
print('Here are the models that the endpoint has at its disposal:')
!aws s3 ls --human-readable --summarize $model_url

Here are the models that the endpoint has at its disposal:
2020-04-16 01:23:35   11.5 KiB Bedford_Texas.tar.gz
2020-04-16 01:23:35   11.1 KiB Bellingham_Washington.tar.gz
2020-04-16 01:23:35   11.2 KiB Belmont_California.tar.gz
2020-04-16 01:23:36   11.3 KiB EastPeoria_Illinois.tar.gz
2020-04-16 01:23:36   11.5 KiB Hawthorne_NewJersey.tar.gz
2020-04-16 01:23:35   10.8 KiB Kailua_Hawaii.tar.gz
2020-04-16 01:23:35   10.7 KiB Louisville_Kentucky.tar.gz
2020-04-16 01:23:35   11.5 KiB Melrose_Massachusetts.tar.gz
2020-04-16 01:23:35   10.8 KiB MiddleRiver_Maryland.tar.gz

Total Objects: 9
   Total Size: 100.5 KiB


In [143]:
# iterate through invocations with random inputs against a random model showing results and latency
for i in range(10):
    model_name = LOCATIONS[np.random.randint(1, len(LOCATIONS[:PARALLEL_TRAINING_JOBS]))]
    full_model_name = '{}.tar.gz'.format(model_name)
    predict_one_house_value(gen_random_house()[1:], full_model_name)

Using model MiddleRiver_Maryland.tar.gz to predict price of this house: [1995, 2923, 6, 3.0, 0.94, 2]
$449,660.94, took 1,182 ms

Using model Kailua_Hawaii.tar.gz to predict price of this house: [1984, 2470, 3, 2.5, 1.05, 2]
$317,114.06, took 670 ms

Using model Kailua_Hawaii.tar.gz to predict price of this house: [1981, 1799, 4, 1.0, 1.14, 2]
$217,920.53, took 17 ms

Using model MiddleRiver_Maryland.tar.gz to predict price of this house: [2007, 2929, 5, 2.0, 1.21, 1]
$484,644.28, took 16 ms

Using model Melrose_Massachusetts.tar.gz to predict price of this house: [1970, 4033, 4, 2.0, 1.57, 2]
$533,263.12, took 690 ms

Using model Hawthorne_NewJersey.tar.gz to predict price of this house: [1983, 2535, 5, 3.0, 0.71, 2]
$326,593.00, took 647 ms

Using model Kailua_Hawaii.tar.gz to predict price of this house: [1994, 3351, 2, 2.0, 0.9, 2]
$493,389.38, took 15 ms

Using model MiddleRiver_Maryland.tar.gz to predict price of this house: [2005, 3897, 5, 1.0, 1.31, 3]
$652,086.56, took 15 ms
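
The cold-start effect in the sample output above can be quantified with a few lines of Python. This sketch copies the latencies from that output and treats the first call to each model as cold, which assumes none of the models had been loaded before this run. `split_cold_warm` is a hypothetical helper.

```python
# (model, latency in ms) pairs copied from the sample output, in order.
observed = [
    ('MiddleRiver_Maryland', 1182), ('Kailua_Hawaii', 670),
    ('Kailua_Hawaii', 17), ('MiddleRiver_Maryland', 16),
    ('Melrose_Massachusetts', 690), ('Hawthorne_NewJersey', 647),
    ('Kailua_Hawaii', 15), ('MiddleRiver_Maryland', 15),
]


def split_cold_warm(samples):
    """Treat the first call to each model as cold and every later call as warm."""
    seen, cold, warm = set(), [], []
    for model, ms in samples:
        (warm if model in seen else cold).append(ms)
        seen.add(model)
    return cold, warm


cold, warm = split_cold_warm(observed)
cold_mean = sum(cold) / len(cold)
warm_mean = sum(warm) / len(warm)
print(cold_mean, warm_mean)  # 797.25 15.75
```

In this particular run, first invocations took roughly 800 ms on average, while repeat invocations took about 16 ms.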



### Dynamically deploy another model
Here we demonstrate the power of dynamic loading of new models. We purposely did not copy the first model when deploying models earlier. Now we deploy an additional model and can immediately invoke it through the multi-model endpoint. As with the earlier models, the first invocation to the new model takes longer, as the endpoint takes time to download the model and load it into memory.

In [144]:
# add another model to the endpoint and exercise it
deploy_artifacts_to_mme(training_jobs_all[0])

Copying Boardman_Ohio model
   from: DEMO_MME_XGBOOST/model_artifacts/Boardman_Ohio/xgb-Boardman-Ohio-2020-04-16-01-18-06-150/output/model.tar.gz
     to: DEMO_MME_XGBOOST/multi_model_artifacts/Boardman_Ohio.tar.gz...


'DEMO_MME_XGBOOST/multi_model_artifacts/Boardman_Ohio.tar.gz'

### Invoke the newly deployed model
Exercise the newly deployed model without the need for any endpoint update or restart.

In [145]:
print('Here are the models that the endpoint has at its disposal:')
!aws s3 ls $model_url

Here are the models that the endpoint has at its disposal:
2020-04-16 01:23:35      11785 Bedford_Texas.tar.gz
2020-04-16 01:23:35      11340 Bellingham_Washington.tar.gz
2020-04-16 01:23:35      11494 Belmont_California.tar.gz
2020-04-16 01:30:49      11453 Boardman_Ohio.tar.gz
2020-04-16 01:23:36      11615 EastPeoria_Illinois.tar.gz
2020-04-16 01:23:36      11768 Hawthorne_NewJersey.tar.gz
2020-04-16 01:23:35      11028 Kailua_Hawaii.tar.gz
2020-04-16 01:23:35      10990 Louisville_Kentucky.tar.gz
2020-04-16 01:23:35      11812 Melrose_Massachusetts.tar.gz
2020-04-16 01:23:35      11043 MiddleRiver_Maryland.tar.gz


In [146]:
model_name = LOCATIONS[0]
full_model_name = '{}.tar.gz'.format(model_name)
for i in range(5):
    features = gen_random_house()[1:]  # drop the price, keep only the features
    predict_one_house_value(features, full_model_name)

Using model Boardman_Ohio.tar.gz to predict price of this house: [1991, 4135, 5, 3.0, 1.15, 0]
$584,386.31, took 679 ms

Using model Boardman_Ohio.tar.gz to predict price of this house: [1993, 3172, 5, 2.0, 0.96, 3]
$467,577.41, took 17 ms

Using model Boardman_Ohio.tar.gz to predict price of this house: [2003, 2420, 2, 1.0, 0.89, 2]
$371,897.81, took 12 ms

Using model Boardman_Ohio.tar.gz to predict price of this house: [1999, 3874, 4, 2.0, 0.6, 0]
$566,508.06, took 12 ms

Using model Boardman_Ohio.tar.gz to predict price of this house: [1981, 3454, 6, 1.5, 1.45, 3]
$487,151.31, took 13 ms



### Updating a model
To update a model, follow the same approach as above and add it as a new model. For example, if you have retrained the `NewYork_NY.tar.gz` model and want to start invoking it, upload the updated model artifacts behind the S3 prefix under a new name such as `NewYork_NY_v2.tar.gz`, and then change the `TargetModel` field to invoke `NewYork_NY_v2.tar.gz` instead of `NewYork_NY.tar.gz`. Do not overwrite the existing model artifacts in Amazon S3, because the old version of the model might still be loaded in the containers or on the storage volume of the instances on the endpoint. Invocations of the overwritten name could then still be served by the old version of the model.

Alternatively, you could stop the endpoint and re-deploy a fresh set of models.
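
The versioned-name approach described above can be sketched with a small helper that derives the next artifact name. `next_version_name` and the `_vN` suffix convention are assumptions made for illustration; any naming scheme that yields a new, unique key works equally well.

```python
import re

SUFFIX = '.tar.gz'


def next_version_name(current_name):
    """Return the next versioned artifact name, e.g. X.tar.gz -> X_v2.tar.gz."""
    if not current_name.endswith(SUFFIX):
        raise ValueError('expected a name ending in ' + SUFFIX)
    stem = current_name[:-len(SUFFIX)]
    match = re.match(r'^(.*)_v(\d+)$', stem)
    if match:
        # Already versioned: bump the number.
        base, version = match.group(1), int(match.group(2)) + 1
    else:
        # An unversioned name is treated as version 1.
        base, version = stem, 2
    return '{}_v{}{}'.format(base, version, SUFFIX)


v2 = next_version_name('NewYork_NY.tar.gz')
v3 = next_version_name(v2)
print(v2, v3)
```

The caller would then upload the retrained artifacts under the returned name and pass that name as `TargetModel`.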

## Clean up
To avoid being billed for resources that are no longer in use, delete the endpoint, the endpoint configuration, and the model.

In [147]:
# shut down the endpoint
sm_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': 'ca16203f-36f7-4cf0-81fc-0b25fe2c6fb6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ca16203f-36f7-4cf0-81fc-0b25fe2c6fb6',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Thu, 16 Apr 2020 01:31:10 GMT'},
  'RetryAttempts': 0}}

In [148]:
# and the endpoint config
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

{'ResponseMetadata': {'RequestId': 'ea9dc9c7-6412-4346-a8e9-79032387dd4c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ea9dc9c7-6412-4346-a8e9-79032387dd4c',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Thu, 16 Apr 2020 01:31:10 GMT'},
  'RetryAttempts': 0}}

In [149]:
# delete model too
sm_client.delete_model(ModelName=multi_model_name)

{'ResponseMetadata': {'RequestId': 'c8c48dbd-2040-4376-8d78-cf420db90829',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'c8c48dbd-2040-4376-8d78-cf420db90829',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Thu, 16 Apr 2020 01:31:11 GMT'},
  'RetryAttempts': 0}}