# Amazon SageMaker Multi-Model Endpoints using Linear Learner and Inference Pipeline
With [Amazon SageMaker multi-model endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html), customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of a large number of models, which can be served from a common inference container, needs to be invokable on-demand and where it is acceptable for infrequently invoked models to incur some additional latency. For applications which require consistently low inference latency, a traditional endpoint is still the best choice.

At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. When an invocation request is made for a particular model, Amazon SageMaker routes the request to an instance assigned to that model, downloads the model artifacts from S3 onto that instance, and initiates loading of the model into the memory of the container. As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped and the invocation is performed immediately.
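
The load-on-demand behavior described above can be illustrated with a toy simulation. This is a minimal sketch and not how SageMaker is implemented: `ToyMultiModelHost`, its dictionary-based artifact store (standing in for S3), and the numeric "models" are all hypothetical.

```python
class ToyMultiModelHost:
    """Toy stand-in for one instance behind a multi-model endpoint."""

    def __init__(self, artifact_store):
        # artifact_store maps a model name to its "artifact"; it stands in for S3.
        self.artifact_store = artifact_store
        self.loaded = {}  # models currently held in memory

    def invoke(self, model_name, x):
        cold = model_name not in self.loaded
        if cold:
            # First request for this model: "download" the artifact and "load" it.
            self.loaded[model_name] = self.artifact_store[model_name]
        weight = self.loaded[model_name]
        # Return the prediction and whether this call had to load the model.
        return weight * x, cold


host = ToyMultiModelHost({"NewYork_NY": 3.0, "Houston_TX": 2.0})
first = host.invoke("Houston_TX", 10)   # cold call: the model is loaded first
second = host.invoke("Houston_TX", 10)  # warm call: the model is already in memory
print(first, second)
```

Only the model that was actually requested ends up in memory, which is why the first call to a given model is slower than later calls.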

An Amazon SageMaker inference pipeline model consists of a sequence of containers that serve inference requests by combining preprocessing, prediction, and post-processing steps. An inference pipeline lets you apply the same preprocessing code used during model training to the data in each inference request.
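
Conceptually, a pipeline passes the output of each stage to the next. The sketch below shows that idea with plain Python functions. The stage functions, the weights, and the sample record are hypothetical and only illustrate the chaining; they are not the containers used in this notebook.

```python
def preprocess(record):
    """Map y/n flags to 1.0/0.0 and convert everything else to float."""
    return [1.0 if v == "y" else 0.0 if v == "n" else float(v) for v in record]


def predict(features):
    """Hypothetical model: a weighted sum with made-up weights."""
    weights = [0.0, 150.0, 10000.0, 15000.0, 15000.0, 15000.0, 20000.0, 0.0]
    return sum(w * f for w, f in zip(weights, features))


def run_pipeline(record, stages):
    """Feed the output of each stage into the next one."""
    data = record
    for stage in stages:
        data = stage(data)
    return data


raw = [2013, 1000.0, 2, 2.0, 1.0, 2, "y", "n"]
price = run_pipeline(raw, [preprocess, predict])
print(price)
```

The caller sends raw data once and receives the final prediction; the intermediate representation never leaves the pipeline.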


![](notebook_images/Hosting_RealTime_InfPipeline_MME.png)

![](notebook_images/Hosting_RealTime_InfPipeline.png)

![](notebook_images/Hosting_RealTime_MME-1.png)

![](notebook_images/Hosting_RealTime_MME-2.png)



To demonstrate how multi-model endpoints are created and used with an inference pipeline, this notebook provides an example using a set of Linear Learner models that each predict housing prices for a single location. This domain serves as a simple example for experimenting with multi-model endpoints.

This notebook showcases two multi-model endpoint (MME) capabilities:
* Native MME support in the Amazon SageMaker Linear Learner algorithm. Because of the native support, you do not need to create a custom container.
* Native MME support in Amazon SageMaker inference pipelines.


To demonstrate these capabilities, the notebook uses the example of predicting house prices in multiple cities with linear regression. House prices are predicted from features such as the number of bedrooms, number of garages, and square footage. The effect of each feature on price depends on the city. For example, a small change in square footage changes the price far more in New York than in Houston. For accurate predictions, we train multiple linear regression models, one location-specific model per city.

It is also possible to grant granular InvokeModel access to individual models hosted on the MME using an IAM condition key. For details, refer to the full version of this notebook:

https://github.com/aws/amazon-sagemaker-examples/blob/master/advanced_functionality/multi_model_linear_learner_home_value/linear_learner_multi_model_endpoint_inf_pipeline.ipynb

https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/multi_model_linear_learner_home_value/linear_learner_multi_model_endpoint_inf_pipeline.html
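
As a rough illustration of the IAM approach mentioned above, the sketch below builds a policy document that allows `sagemaker:InvokeEndpoint` only for selected target models through the `sagemaker:TargetModel` condition key. The helper name `build_target_model_policy`, the endpoint ARN, and the model patterns are hypothetical placeholders; the linked notebook remains the reference for the full procedure.

```python
import json


def build_target_model_policy(endpoint_arn, allowed_models):
    """Return an IAM policy document limiting which models a caller may invoke."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": endpoint_arn,
                "Condition": {
                    "StringLike": {"sagemaker:TargetModel": list(allowed_models)}
                },
            }
        ],
    }


# Placeholder ARN: the region, account ID, and endpoint name are hypothetical.
policy = build_target_model_policy(
    "arn:aws:sagemaker:us-west-2:111122223333:endpoint/my-endpoint",
    ["NewYork_NY/*", "Chicago_IL/*"],
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```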


### Contents

1. [Generate synthetic data for housing models](#Generate-synthetic-data-for-housing-models)
1. [Preprocess synthetic housing data using scikit-learn](#Preprocess-synthetic-housing-data-using-scikit-learn)
1. [Create model entity with multi model support](#Create-sagemaker-multi-model-support)
1. [Create an inference pipeline with sklearn model and MME linear learner model](#Create-inference-pipeline)
1. [Exercise the inference pipeline - Get predictions from the different  linear learner models](#Exercise-inference-pipeline)
1. [Update Multi Model Endpoint with new models](#update-models)
1. [Clean up](#CleanUp)


## Section 1 - Generate synthetic data for housing models <a id='Generate-synthetic-data-for-housing-models'></a>

In this section, you will generate synthetic data that will be used to train the Linear Learner models. The generated data consists of six numerical features (the year the house was built, house size in square feet, number of bedrooms, number of bathrooms, lot size, and number of garage spaces) and two categorical features (deck and front_porch).

In [1]:
import numpy as np
import pandas as pd
import json
import datetime
import time
import boto3
import sagemaker
import os

from time import gmtime, strftime
from random import choice

from sagemaker import get_execution_role

from sagemaker.multidatamodel import MULTI_MODEL_CONTAINER_MODE
from sagemaker.multidatamodel import MultiDataModel

from sklearn.model_selection import train_test_split

In [2]:
NUM_HOUSES_PER_LOCATION = 1000
LOCATIONS  = ['NewYork_NY',    'LosAngeles_CA',   'Chicago_IL',    'Houston_TX',   'Dallas_TX',
              'Phoenix_AZ',    'Philadelphia_PA', 'SanAntonio_TX', 'SanDiego_CA',  'SanFrancisco_CA']
MAX_YEAR = 2019

In [3]:
# Helper functions to generate random house price dataset.
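def gen_price(house):
    """Generate price based on features of the house"""

    # NOTE: `garage` is derived from FRONT_PORCH rather than from a separate feature,
    # so a front porch adds 20,000 in total and DECK does not affect the price.
    # The sample output shown below is consistent with this behavior.
    if house['FRONT_PORCH'] == 'y':
        garage = 1
    else:
        garage = 0

    if house['FRONT_PORCH'] == 'y':
        front_porch = 1
    else:
        front_porch = 0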

    price = int(150 * house['SQUARE_FEET'] + \
                10000 * house['NUM_BEDROOMS'] + \
                15000 * house['NUM_BATHROOMS'] + \
                15000 * house['LOT_ACRES'] + \
                10000 * garage + \
                10000 * front_porch + \
                15000 * house['GARAGE_SPACES'] - \
                5000 * (MAX_YEAR - house['YEAR_BUILT']))
    return price

def gen_yes_no():
    """Generate values (y/n) for categorical features"""
    answer = choice(['y', 'n'])
    return answer

def gen_random_house():
    """Generate a row of data (single house information)"""
    house = {'SQUARE_FEET':    np.random.normal(3000, 750),
             'NUM_BEDROOMS':  np.random.randint(2, 7),
             'NUM_BATHROOMS': np.random.randint(2, 7) / 2,
             'LOT_ACRES':     round(np.random.normal(1.0, 0.25), 2),
             'GARAGE_SPACES': np.random.randint(0, 4),
             'YEAR_BUILT':    min(MAX_YEAR, int(np.random.normal(1995, 10))),
             'FRONT_PORCH':   gen_yes_no(),
             'DECK':          gen_yes_no()
            }
    
    price = gen_price(house)
    
    return [house['YEAR_BUILT'],   
            house['SQUARE_FEET'], 
            house['NUM_BEDROOMS'], 
            house['NUM_BATHROOMS'], 
            house['LOT_ACRES'],    
            house['GARAGE_SPACES'],
            house['FRONT_PORCH'],    
            house['DECK'], 
            price]

In [4]:
def gen_houses(num_houses):
    """Generate housing dataset"""
    house_list = []

    for _ in range(num_houses):
        house_list.append(gen_random_house())

    df = pd.DataFrame(
        house_list,
        columns=[
            'YEAR_BUILT',
            'SQUARE_FEET',
            'NUM_BEDROOMS',
            'NUM_BATHROOMS',
            'LOT_ACRES',
            'GARAGE_SPACES',
            'FRONT_PORCH',
            'DECK',
            'PRICE']
    )
    return df

In [5]:
# Generate housing data for each location.
# Each iteration overwrites `houses`, so only the last location's data is kept here.

for loc in LOCATIONS[:10]:
    houses = gen_houses(NUM_HOUSES_PER_LOCATION)

# Show the first few rows of the data.
houses.head()

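| | YEAR_BUILT | SQUARE_FEET | NUM_BEDROOMS | NUM_BATHROOMS | LOT_ACRES | GARAGE_SPACES | FRONT_PORCH | DECK | PRICE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | 4054.084574 | 2 | 2.0 | 0.82 | 2 | n | n | 670412 |
| 1 | 1989 | 2290.361147 | 5 | 2.5 | 1.01 | 1 | n | n | 311204 |
| 2 | 1994 | 3615.299424 | 2 | 2.5 | 1.17 | 2 | y | n | 542344 |
| 3 | 1993 | 3489.692094 | 4 | 2.5 | 1.36 | 2 | n | y | 521353 |
| 4 | 2007 | 2830.288837 | 5 | 1.5 | 0.93 | 0 | y | n | 470993 |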
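
As a worked check, the PRICE values in the sample rows above can be recomputed from the other columns with the formula in `gen_price`. The sketch below repeats that formula as a standalone function; the name `price_from_features` is introduced here for illustration only. Rows 0 and 2 of the sample output come out to 670412 and 542344, matching the PRICE column.

```python
MAX_YEAR = 2019


def price_from_features(year_built, square_feet, bedrooms, bathrooms,
                        lot_acres, garage_spaces, front_porch):
    """Recompute PRICE for one row using the same terms as gen_price."""
    # As written in gen_price, both 10,000 terms are driven by FRONT_PORCH.
    porch_flag = 1 if front_porch == "y" else 0
    return int(150 * square_feet
               + 10000 * bedrooms
               + 15000 * bathrooms
               + 15000 * lot_acres
               + 10000 * porch_flag
               + 10000 * porch_flag
               + 15000 * garage_spaces
               - 5000 * (MAX_YEAR - year_built))


# Feature values copied from rows 0 and 2 of the sample output.
row0 = price_from_features(2013, 4054.084574, 2, 2.0, 0.82, 2, "n")
row2 = price_from_features(1994, 3615.299424, 2, 2.5, 1.17, 2, "y")
print(row0, row2)
```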


## Section 2 - Preprocess the raw housing data using Scikit Learn <a id='Preprocess-synthetic-housing-data-using-scikit-learn'></a>

In this section, the categorical features of the data (deck and front porch) are preprocessed with scikit-learn to convert them to a one-hot encoded representation.
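
To make the one-hot encoding step concrete, here is a minimal pure-Python sketch of what such an encoding produces for the two y/n features. It is a stand-in for illustration only: the notebook's actual preprocessing is done by the scikit-learn model, and the helper names `one_hot` and `encode_house` and the column order are assumptions.

```python
def one_hot(value, categories=("n", "y")):
    """Return a list with a 1 in the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError("unknown category: {!r}".format(value))
    return [1 if value == c else 0 for c in categories]


def encode_house(row):
    """Keep the six numeric features and one-hot encode FRONT_PORCH and DECK."""
    *numeric, front_porch, deck = row
    return list(numeric) + one_hot(front_porch) + one_hot(deck)


row = [2013, 4054.084574, 2, 2.0, 0.82, 2, "n", "y"]
encoded = encode_house(row)
print(encoded)
```

Each categorical column becomes two numeric columns, so a row with eight raw features ends up with ten numeric values.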

In [6]:
sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')
sagemaker_session = sagemaker.Session()

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

BUCKET  = sagemaker_session.default_bucket()
print("BUCKET : ", BUCKET)

role = get_execution_role()
print("ROLE : ", role)

ACCOUNT_ID = boto3.client('sts').get_caller_identity()['Account']
REGION = boto3.Session().region_name

PREFIX = 'Pipeline_MME_demo'
DATA_PREFIX = 'DEMO_MME_LINEAR_LEARNER'
HOUSING_MODEL_NAME = 'housing'
MULTI_MODEL_ARTIFACTS = 'multi_model_artifacts'

BUCKET :  sagemaker-us-west-2-328296961357
ROLE :  arn:aws:iam::328296961357:role/service-role/AmazonSageMaker-ExecutionRole-20191125T182032


In [7]:
container = sagemaker.image_uris.retrieve(region=boto3.Session().region_name, framework='linear-learner')
container

'174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1'

In [8]:
## upload model data to S3
SKLearn_model_uri = sagemaker_session.upload_data(
    path='model/SKLearn/model.tar.gz',
    bucket=BUCKET,
    key_prefix='{}/model/SKLearn'.format(PREFIX)
)
SKLearn_model_uri

's3://sagemaker-us-west-2-328296961357/Pipeline_MME_demo/model/SKLearn/model.tar.gz'

## Section 3 - Create SageMaker model with multi-model support <a id='Create-sagemaker-multi-model-support'></a>

In [9]:
import re
def parse_model_artifacts(model_data_url):
    # extract the s3 key from the full url to the model artifacts
    s3_key = model_data_url.split('s3://{}/'.format(BUCKET))[1]
    # get the part of the key that identifies the model within the model artifacts folder
    model_name_plus = s3_key[s3_key.find('model_artifacts') + len('model_artifacts') + 1:]
    # finally, get the unique model name (e.g., "NewYork_NY")
    model_name = re.findall('^(.*?)/', model_name_plus)[0]
    return s3_key, model_name 
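
The following usage sketch shows what `parse_model_artifacts` returns. The bucket name and the S3 path are hypothetical; the function only assumes that the URL contains a `model_artifacts/<model name>/` segment.

```python
import re

BUCKET = "example-bucket"  # hypothetical bucket name


def parse_model_artifacts(model_data_url):
    # Extract the S3 key from the full URL to the model artifacts.
    s3_key = model_data_url.split('s3://{}/'.format(BUCKET))[1]
    # Keep the part of the key that follows the model artifacts folder.
    model_name_plus = s3_key[s3_key.find('model_artifacts') + len('model_artifacts') + 1:]
    # The model name is everything up to the next slash.
    model_name = re.findall('^(.*?)/', model_name_plus)[0]
    return s3_key, model_name


# Hypothetical training output location for the New York model.
s3_key, model_name = parse_model_artifacts(
    "s3://example-bucket/DEMO/model_artifacts/NewYork_NY/output/model.tar.gz"
)
print(s3_key)
print(model_name)
```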

In [10]:
# make a copy of the model artifacts from the original output of the training job to the place in
# s3 where the multi model endpoint will dynamically load individual models
def deploy_artifacts_to_mme(model_name):
    print("model_name :", model_name)
    key = '{}/{}/{}'.format(DATA_PREFIX, MULTI_MODEL_ARTIFACTS, model_name)    
    copy_source = 'model/LL/{}'.format(model_name)  
    print('Copying {} model\n   from: {}\n     to: {}...'.format(model_name, copy_source, key))
    sagemaker_session.upload_data(path=copy_source, bucket=BUCKET, key_prefix=key)


In [11]:
from os import listdir
from os.path import isfile, join
mypath='model/LL/'
model_list=listdir(mypath)

In [12]:
# First, clear out old versions of the model artifacts from previous runs of this notebook
s3_bucket = s3.Bucket(BUCKET)
full_input_prefix = '{}/multi_model_artifacts'.format(DATA_PREFIX)
print('Removing old model artifacts from {}'.format(full_input_prefix))
s3_bucket.objects.filter(Prefix=full_input_prefix + '/').delete()

Removing old model artifacts from DEMO_MME_LINEAR_LEARNER/multi_model_artifacts


[{'ResponseMetadata': {'RequestId': 'J0FXNGWX3V0XTYET',
   'HostId': 'KoJILHi4NWvAOFBK7ba9aG3dai3zUMKnPVDwijloJ7N/GxkzQx7B0Jej9PRVhBVjKHOsU19jCiU=',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'KoJILHi4NWvAOFBK7ba9aG3dai3zUMKnPVDwijloJ7N/GxkzQx7B0Jej9PRVhBVjKHOsU19jCiU=',
    'x-amz-request-id': 'J0FXNGWX3V0XTYET',
    'date': 'Sun, 30 Jan 2022 23:26:28 GMT',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3',
    'connection': 'close'},
   'RetryAttempts': 0},
  'Deleted': [{'Key': 'DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/NewYork_NY/NewYork_NY.tar.gz'},
   {'Key': 'DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/Houston_TX/Houston_TX.tar.gz'},
   {'Key': 'DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/LosAngeles_CA/LosAngeles_CA.tar.gz'},
   {'Key': 'DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/Chicago_IL/Chicago_IL.tar.gz'}]}]

In [13]:
## Deploy all but the last model trained to MME
## We will use the last model to show how to update an existing MME in Section 6
for model in model_list[:-1]:
    deploy_artifacts_to_mme(model)

model_name : Houston_TX
Copying Houston_TX model
   from: model/LL/Houston_TX
     to: DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/Houston_TX...
model_name : Chicago_IL
Copying Chicago_IL model
   from: model/LL/Chicago_IL
     to: DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/Chicago_IL...
model_name : LosAngeles_CA
Copying LosAngeles_CA model
   from: model/LL/LosAngeles_CA
     to: DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/LosAngeles_CA...


In [14]:
MODEL_NAME = '{}-{}'.format(HOUSING_MODEL_NAME, strftime('%Y-%m-%d-%H-%M-%S', gmtime()))

_model_url  = 's3://{}/{}/{}/'.format(BUCKET, DATA_PREFIX, MULTI_MODEL_ARTIFACTS)

ll_multi_model = MultiDataModel(
        name=MODEL_NAME,
        model_data_prefix=_model_url,
        image_uri=container,
        role=role,
        sagemaker_session=sagemaker_session
    )
_model_url
container

'174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1'
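
When a model hosted on the endpoint is invoked, it is identified by its path relative to the model data prefix, for example `Houston_TX/Houston_TX.tar.gz`. The sketch below derives that relative name from a full S3 URI. The helper `target_model_for` and the bucket name are hypothetical and only illustrate the naming relationship.

```python
def target_model_for(s3_uri, model_data_prefix):
    """Return the part of s3_uri that follows model_data_prefix."""
    if not model_data_prefix.endswith("/"):
        model_data_prefix += "/"
    if not s3_uri.startswith(model_data_prefix):
        raise ValueError("artifact is not stored under the model data prefix")
    return s3_uri[len(model_data_prefix):]


# Hypothetical bucket; the rest of the path follows the layout used in this notebook.
prefix = "s3://example-bucket/DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/"
artifact = prefix + "Houston_TX/Houston_TX.tar.gz"
target = target_model_for(artifact, prefix)
print(target)
```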

## Section 4 - Create an inference pipeline with sklearn model and MME linear learner model <a id='Create-inference-pipeline'></a>

Set up the inference pipeline using the PipelineModel API. This places a list of models behind a single endpoint. In this example, the pipeline model is configured with the fitted scikit-learn inference model and the multi-model Linear Learner model.

In [15]:
from sagemaker.sklearn.model import SKLearnModel
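# NOTE: this overrides the SKLearn_model_uri uploaded in cell 8 with a hard-coded
# path from the run recorded here; replace it with your own model artifact location.
SKLearn_model_uri = 's3://sagemaker-us-west-2-328296961357/scikit-learn-preprocessor-2022-01-29-00-05-36/output/model.tar.gz' 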
scikit_learn_inference_model = SKLearnModel(
    role=role,
    model_data=SKLearn_model_uri,
    framework_version="0.20.0",    
    py_version="py3",
#    source_dir="code",
    entry_point="sklearn_preprocessor.py",
    sagemaker_session=sagemaker_session,
)

In [20]:
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel
import boto3
from time import gmtime, strftime

timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

model_name = '{}-{}'.format('inference-pipeline', timestamp_prefix)
endpoint_name = '{}-{}'.format('inference-pipeline-ep', timestamp_prefix)

sm_model = PipelineModel(
    name=model_name, 
    role=role, 
    sagemaker_session=sagemaker_session,
    models=[
        scikit_learn_inference_model, 
        ll_multi_model])

In [21]:
sm_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge', endpoint_name=endpoint_name)

-------------------!

## Section 5 - Exercise the inference pipeline - Get predictions from different linear learner models <a id='Exercise-inference-pipeline'></a>

In [22]:
#Create Predictor
from sagemaker.predictor import Predictor

csv_serializer = sagemaker.serializers.CSVSerializer()

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=csv_serializer)

In [23]:
def predict_one_house_value(features, model_name, predictor_to_use):
    print('Using model {} to predict price of this house: {}'.format(model_name,
                                                                     features))
    start_time = time.time()

    # target_model selects which artifact under the model data prefix serves the request;
    # the predictor's CSV serializer converts the feature list to a CSV payload
    response = predictor_to_use.predict(features, target_model=model_name)

    response_json = json.loads(response)

    predicted_value = response_json['predictions'][0]['score']

    duration = time.time() - start_time

    print('${:,.2f}, took {:,d} ms\n'.format(predicted_value, int(duration * 1000)))

In [24]:
for _ in range(5):
    # randint's upper bound is exclusive, so the last model (not yet deployed) is never picked
    model_name = model_list[np.random.randint(0, len(model_list) - 1)]
    full_model_name = '{}/{}.tar.gz'.format(model_name, model_name)
    predict_one_house_value(gen_random_house()[:-1], full_model_name, predictor)

Using model Houston_TX/Houston_TX.tar.gz to predict price of this house: [1972, 2109.3014242250642, 5, 1.0, 1.4, 2, 'y', 'n']
$221,570.69, took 2,568 ms

Using model Houston_TX/Houston_TX.tar.gz to predict price of this house: [2009, 2719.5126905496927, 6, 1.5, 0.88, 2, 'n', 'y']
$481,461.19, took 56 ms

Using model Chicago_IL/Chicago_IL.tar.gz to predict price of this house: [1987, 3251.018458718614, 3, 2.0, 1.01, 0, 'n', 'y']
$407,258.41, took 1,618 ms

Using model Houston_TX/Houston_TX.tar.gz to predict price of this house: [2004, 2665.6648461374516, 3, 2.5, 0.97, 2, 'y', 'n']
$452,122.09, took 48 ms

Using model Chicago_IL/Chicago_IL.tar.gz to predict price of this house: [1978, 2685.346721775181, 2, 1.5, 0.7, 3, 'y', 'y']
$316,825.41, took 43 ms
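
The same request can also be expressed with the low-level `sagemaker-runtime` client, where the model is selected through the `TargetModel` parameter of `invoke_endpoint`. The sketch below only builds the request arguments and parses a response body, so it runs without AWS access. The helper names, the endpoint name, and the sample response body are hypothetical; the response shape follows the `predictions`/`score` structure read in `predict_one_house_value`.

```python
import json


def build_invoke_request(endpoint_name, target_model, features):
    """Build keyword arguments for the sagemaker-runtime invoke_endpoint call."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "TargetModel": target_model,
        "Body": ",".join(str(f) for f in features),
    }


def extract_score(response_body):
    """Pull the first predicted value out of a Linear Learner JSON response."""
    return json.loads(response_body)["predictions"][0]["score"]


request = build_invoke_request(
    "my-endpoint",  # hypothetical endpoint name
    "Houston_TX/Houston_TX.tar.gz",
    [2009, 2719.5, 6, 1.5, 0.88, 2, "n", "y"],
)
# With AWS credentials, the request could be sent like this:
#   response = boto3.client("sagemaker-runtime").invoke_endpoint(**request)
#   score = extract_score(response["Body"].read())

sample_body = b'{"predictions": [{"score": 481461.19}]}'  # made-up response body
score = extract_score(sample_body)
print(request["Body"], score)
```

In the output above, the first call to each model takes well over a second while later calls take tens of milliseconds, which reflects the model being downloaded and loaded on first use.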



## Section 6 - Add a new model to the endpoint by simply copying the model artifact to the S3 location and invoking it
<a id='update-models'></a>

In [25]:
## Copy the last model
deploy_artifacts_to_mme(model_list[-1])

model_name : NewYork_NY
Copying NewYork_NY model
   from: model/LL/NewYork_NY
     to: DEMO_MME_LINEAR_LEARNER/multi_model_artifacts/NewYork_NY...


In [26]:
model_name = model_list[-1]
full_model_name = '{}/{}.tar.gz'.format(model_name,model_name)
predict_one_house_value(gen_random_house()[:-1], full_model_name, predictor)

Using model NewYork_NY/NewYork_NY.tar.gz to predict price of this house: [1996, 2742.3434898920777, 4, 2.5, 0.88, 0, 'y', 'n']
$404,989.22, took 1,611 ms



## Clean up <a id='CleanUp'></a>
Delete the model and the endpoint to avoid unnecessary costs.



In [27]:
#Delete the model and the endpoint
predictor.delete_model() 
predictor.delete_endpoint() 