# Deploying multiple models (A/B Testing)

In [2]:
from datetime import datetime, timedelta
import time
import os
import boto3
import re
import json
from sagemaker import get_execution_role, session
from sagemaker.s3 import S3Downloader, S3Uploader

region = boto3.Session().region_name
role = get_execution_role()
sm_session = session.Session(boto3.Session())
sm = boto3.Session().client("sagemaker")
sm_runtime = boto3.Session().client("sagemaker-runtime")

# You can use a different bucket, but make sure the role you chose for this notebook
# has the s3:PutObject permission. The model artifacts will be uploaded to this bucket.
bucket = sm_session.default_bucket()
prefix = "boston-dataset"

In this lab we deploy a single endpoint that hosts two models, and we choose how traffic is split between them.
This shows both how to deploy a model that was trained somewhere else and how to A/B test models.

In [3]:
from sagemaker.image_uris import retrieve

model_name = None  # Choose unique names
model_name2 = None    # Choose unique names
model_url = None   # Here it will be the output of your training runs! You could store it in the other notebook if needed with %store
model_url2 = None
image_uri = retrieve("linear-learner", boto3.Session().region_name, "latest")
image_uri2 = retrieve("linear-learner", boto3.Session().region_name, "latest")

Defaulting to the only supported framework/algorithm version: 1. Ignoring framework/algorithm version: latest.
Defaulting to the only supported framework/algorithm version: 1. Ignoring framework/algorithm version: latest.


'hpo-boston-2021-09-07-12-34-55'

The first step in deploying a model is to register it with the create_model method. Registering a model tells SageMaker which container image and model artifacts to serve; it is roughly equivalent to promoting the model to staging.

In [None]:
# Exercise: create both models to deploy
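One possible way to approach this exercise is sketched below, assuming the low-level boto3 `create_model` call is used. The helper names `build_create_model_request` and `register_models` are hypothetical and not part of the lab; the client is passed in as an argument so the request shape can be inspected without touching AWS.

```python
def build_create_model_request(model_name, image_uri, model_url, role_arn):
    """Build the keyword arguments for the boto3 SageMaker create_model call."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,         # inference container image
            "ModelDataUrl": model_url,  # S3 URI of the trained model artifact
        },
    }


def register_models(sm_client, specs, role_arn):
    """Register every (model_name, image_uri, model_url) tuple; return the names."""
    names = []
    for model_name, image_uri, model_url in specs:
        request = build_create_model_request(model_name, image_uri, model_url, role_arn)
        sm_client.create_model(**request)
        names.append(model_name)
    return names


# In the notebook this could be called as (assumes the variables defined above are filled in):
# register_models(
#     sm,
#     [(model_name, image_uri, model_url), (model_name2, image_uri2, model_url2)],
#     role,
# )
```

Keeping the request construction separate from the API call makes it easy to print and check the request before anything is created.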


Next, we need to create a production variant. A variant describes a model deployed in a container image on one or more instances. Each variant maps 1:1 to the autoscaling group for that model.

In [6]:
from sagemaker.session import production_variant

# create the production variants. You can check the docs here: https://sagemaker.readthedocs.io/en/stable/api/utility/session.html
variant1 = None  # FILLME
variant2 = None

(variant1, variant2)

({'ModelName': 'linear-boston-2021-09-07-12-34-55',
  'InstanceType': 'ml.m5.large',
  'InitialInstanceCount': 1,
  'VariantName': 'Variant1',
  'InitialVariantWeight': 1},
 {'ModelName': 'hpo-boston-2021-09-07-12-34-55',
  'InstanceType': 'ml.m5.large',
  'InitialInstanceCount': 1,
  'VariantName': 'Variant2',
  'InitialVariantWeight': 1})
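To make the traffic allocation concrete: each variant receives a share of requests equal to its weight divided by the sum of all variant weights. The sketch below is a plain-Python stand-in that builds dictionaries with the same keys as the output above and computes those shares. `make_variant` and `traffic_shares` are hypothetical helpers, not SageMaker functions; in the notebook the dictionaries would come from `production_variant`.

```python
def make_variant(model_name, variant_name, instance_type="ml.m5.large",
                 initial_instance_count=1, initial_weight=1):
    """Build a variant description with the same keys as the output above."""
    return {
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": initial_instance_count,
        "VariantName": variant_name,
        "InitialVariantWeight": initial_weight,
    }


def traffic_shares(variants):
    """Return the fraction of requests each variant receives, keyed by variant name."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Equal weights, as in the output above, split traffic evenly.
even = traffic_shares([
    make_variant("model-a", "Variant1"),
    make_variant("model-b", "Variant2"),
])

# A 9:1 weighting sends most traffic to the first variant.
skewed = traffic_shares([
    make_variant("model-a", "Variant1", initial_weight=9),
    make_variant("model-b", "Variant2", initial_weight=1),
])
```

Weights are relative, so 1 and 1 gives the same split as 50 and 50.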

In [None]:
Once we have the models and their variants, we are ready to create the endpoint from those variants.
The endpoint name must be unique within your AWS account and region.

In [7]:
endpoint_name = f"linear-boston-{datetime.now():%Y-%m-%d-%H-%M-%S}"
print(f"EndpointName={endpoint_name}")

sm_session.endpoint_from_production_variants(
    name=endpoint_name, production_variants=[variant1, variant2]
)

EndpointName=linear-boston-2021-09-07-12-38-12
-------------!

'linear-boston-2021-09-07-12-38-12'
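Once the endpoint is in service, the traffic split does not have to stay at its initial value. The sketch below assumes the boto3 SageMaker client method `update_endpoint_weights_and_capacities` is used to change the weights; `build_weight_update` and `shift_traffic` are hypothetical helper names introduced only for illustration.

```python
def build_weight_update(endpoint_name, weights):
    """Build the request that sets a new relative weight for each variant."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": name, "DesiredWeight": float(weight)}
            for name, weight in weights.items()
        ],
    }


def shift_traffic(sm_client, endpoint_name, weights):
    """Send the weight update through the given SageMaker client and return the request."""
    request = build_weight_update(endpoint_name, weights)
    sm_client.update_endpoint_weights_and_capacities(**request)
    return request


# In the notebook this could be called as (assumes the endpoint above is in service):
# shift_traffic(sm, endpoint_name, {"Variant1": 1, "Variant2": 3})
```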

In [None]:
To call the endpoint without the predictor interface, we can use the invoke_endpoint method. It expects the request body as bytes or a file-like object, serialized in a format the model accepts, such as CSV or RecordIO.
invoke_endpoint is part of the low-level API, which is what you need once you lose the reference to the predictor object.

In [43]:
%store -r X_test
%store -r Y_test

import pandas as pd
import numpy as np
import json

test_set = pd.concat([X_test, Y_test], axis=1)
payload = None # FILLME

# Exercise, find out how to configure the payload to get a result from invoke_endpoint!
response = sm_runtime.invoke_endpoint(EndpointName=endpoint_name, ContentType="text/csv", Body=payload)

# Exercise: Configure the response into a predictions pandas series.
predictions = None #FILLME

Unnamed: 0,score
0,-0.297590
1,1.813267
2,0.255625
3,-0.010827
4,-0.591371
...,...
138,0.370642
139,0.670967
140,-1.053767
141,-0.031928
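One possible shape for the payload and response handling is sketched below using only the standard library. It assumes the request carries feature columns only (no header, no label) as CSV, and that the endpoint answers with JSON of the form `{"predictions": [{"score": ...}, ...]}`; check the actual response of your endpoint before relying on this. `rows_to_csv_payload` and `parse_scores` are hypothetical helper names.

```python
import csv
import io
import json


def rows_to_csv_payload(rows):
    """Serialize rows of feature values as header-less CSV bytes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


def parse_scores(body):
    """Extract the list of scores from a JSON response body (bytes or str)."""
    document = json.loads(body)
    return [float(item["score"]) for item in document["predictions"]]


# With pandas, roughly the same steps could look like this (assumption, not the lab's solution):
# payload = X_test.to_csv(header=False, index=False)
# predictions = pd.Series(parse_scores(response["Body"].read()), name="score")
```

The response `Body` returned by `invoke_endpoint` is a stream, so it has to be read once and then parsed.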


In [49]:
comparison = pd.concat([predictions, Y_test.reset_index(drop=True)], axis=1)

In [52]:
np.sum((comparison['score'] - comparison['PRICE'])**2)


53.99994625993155

In [None]:
We can choose which model behind the endpoint handles a request by setting the variant name. This is extremely useful for comparing models directly.

In [57]:
# Exercise: Use invoke_endpoint as before, setting TargetVariant to the variant name. Get the predictions for both models behind the endpoint
response_model1 = None # FILLME
response_model2 = None # FILLME
predictions_model1 =  None # FILLME
predictions_model2 =  None # FILLME

comparison = pd.concat([predictions_model1, predictions_model2, Y_test.reset_index(drop=True)], axis=1)


In [58]:
comparison

Unnamed: 0,score model 1,score model 2,PRICE
0,-0.057428,-0.297590,-0.731324
1,1.473565,1.813267,2.985812
2,0.194408,0.255625,0.116048
3,-0.038954,-0.010827,-0.166409
4,-1.058822,-0.591371,-1.058974
...,...,...,...
138,0.500220,0.370642,0.161241
139,0.724937,0.670967,0.952121
140,-1.102085,-1.053767,-1.047676
141,0.039267,-0.031928,0.036960
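A minimal sketch of per-variant invocation follows. It assumes the same CSV payload is sent to each variant and that the response is JSON with a `predictions` list of `score` entries. `invoke_variant` and `invoke_all_variants` are hypothetical helpers; the runtime client is passed in so the logic can be exercised without a live endpoint.

```python
import json


def invoke_variant(runtime_client, endpoint_name, payload, variant_name):
    """Send the payload to one specific variant and return its scores."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
        TargetVariant=variant_name,  # bypasses the weighted traffic split
    )
    body = json.loads(response["Body"].read())
    return [item["score"] for item in body["predictions"]]


def invoke_all_variants(runtime_client, endpoint_name, payload, variant_names):
    """Return a dict mapping each variant name to its list of scores."""
    return {
        name: invoke_variant(runtime_client, endpoint_name, payload, name)
        for name in variant_names
    }


# In the notebook this could be called as (assumes payload and endpoint_name exist):
# scores = invoke_all_variants(sm_runtime, endpoint_name, payload, ["Variant1", "Variant2"])
```

Because every variant sees the same rows, the resulting score lists line up row by row and can be placed side by side with the true prices.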


In [None]:
You can verify that the model produced by HPO has a lower error than the one trained with default parameters!

In [59]:
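# Calculate the RMSE for both models and compare them!
# RMSE = sqrt(mean((prediction - PRICE)**2)). Note that the sum of squared errors computed earlier
# (np.sum of the squared differences) is a different quantity and is not the RMSE.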
error_1 = None # Fillme
error_2 = None # Fillme

print(f'RMSE for model 1 is {error_1} and for model 2 is {error_2}')

RMSE for model 1 is 60.6976674528303 and for model 2 is 53.99994625993155
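For reference, the sketch below shows the difference between a sum of squared errors and a root mean squared error in plain Python. The function names are hypothetical and the numbers are made up for illustration; they are not the lab's data.

```python
import math


def sum_squared_error(predictions, targets):
    """Sum of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


def rmse(predictions, targets):
    """Root mean squared error: square root of the mean squared difference."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equally long")
    return math.sqrt(sum_squared_error(predictions, targets) / len(predictions))


# Toy values for illustration only.
toy_predictions = [1.0, 2.0, 3.0]
toy_targets = [1.0, 2.0, 5.0]
toy_sse = sum_squared_error(toy_predictions, toy_targets)
toy_rmse = rmse(toy_predictions, toy_targets)

# With the comparison DataFrame, an equivalent expression could be (assumes the column names above):
# np.sqrt(np.mean((comparison["score model 1"] - comparison["PRICE"]) ** 2))
```

Either metric ranks two models the same way on the same test set, but only the RMSE is expressed in the units of the target and is independent of the number of test rows.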
