## Chapter 10 : SageMaker Endpoint Production Variants and Deployment Strategies

This notebook demonstrates how to update a deployed model using SageMaker endpoint production variants. Specifically, it demonstrates the A/B deployment strategy. You can use this notebook as a starting point for the other strategies discussed in Chapter 10, since the APIs used to deploy a new endpoint or update an existing one remain the same.

### Overview

1. Set up
2. Prepare (Reuse or Train) models to deploy and update
3. Create an endpoint (with single production variant)
4. Invoke the endpoint
5. Update endpoint (with two production variants)
6. CloudWatch Analysis
7. Update endpoint
8. Clean up

### 1. Set up

#### 1.1 Imports

In [None]:
##Imports
import sagemaker
import boto3
import time
from datetime import datetime, timedelta
from sagemaker import image_uris
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput
from sagemaker.session import production_variant
from botocore.response import StreamingBody

#### 1.2 Setup variables

In [None]:
s3_bucket = 'datascience-environment-notebookinstance--06dc7a0224df'
s3_prefix = 'prepared'
m_prefix = 'xgboost-sample'

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name

#### 1.3 Setup service clients

In [None]:
sm = boto3.Session().client("sagemaker")
smrt = boto3.Session().client("sagemaker-runtime")
s3 = boto3.client("s3")

In [None]:
### Toggle between reusing models trained in previous chapters and training new models in this notebook.

### Set use_trained_models to 'True' to reuse XGBoost models trained in previous chapters, which saves training time and cost.
### Set use_trained_models to 'False' to train the models in this notebook.
#use_trained_models = 'False'
use_trained_models = 'True'

if use_trained_models == 'True':
    print("Using models trained before")
else:
    print("Train the model")

### 2. Prepare (Reuse or Train) models to deploy and update

In [None]:
### Use the XGBoost models previously trained
### Note: Update to use the models available in your datascience account
if use_trained_models == 'True':
    model_name_1='sagemaker-xgboost-2021-06-24-02-34-20-510'
    model_name_2='sagemaker-xgboost-2021-06-24-02-47-08-912'

In [None]:
if use_trained_models == 'False':

    # set an output path where the trained model will be saved
    output_path = 's3://{}/{}/{}/output'.format(s3_bucket, m_prefix, 'xgboost')
    
    # retrieve the URI of the managed XGBoost container image for this region.
    # the third argument is the container version; change it to suit your preference.
    xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.2-1")
    
    # define the data type and paths to the training and validation datasets
    content_type = "csv"
    train_input = TrainingInput("s3://{}/{}/{}/".format(s3_bucket, s3_prefix, 'train'), content_type=content_type)
    validation_input = TrainingInput("s3://{}/{}/{}/".format(s3_bucket, s3_prefix, 'validation'), content_type=content_type)

    #### Train and get the name of the first model 
    # initialize hyperparameters
    hyperparameters_1 = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"reg:squarederror",
        "num_round":"5"}

    # construct a SageMaker estimator that calls the xgboost-container
    estimator_1 = sagemaker.estimator.Estimator(image_uri=xgboost_container, 
                                          hyperparameters=hyperparameters_1,
                                          role=sagemaker.get_execution_role(),
                                          instance_count=1, 
                                          instance_type='ml.m5.12xlarge', 
                                          volume_size=200, # training volume size in GB
                                          output_path=output_path)


    # execute the XGBoost training job
    estimator_1.fit({'train': train_input, 'validation': validation_input})
    
    training_job_name_1 = estimator_1.latest_training_job.name
    
    model_name_1 = sagemaker_session.create_model_from_job(training_job_name_1)
    
    
    #### Train and get the name of the second model 
    # initialize hyperparameters
    hyperparameters_2 = {
        "max_depth":"10",  ##Different value of the hyperparameter
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"reg:squarederror",
        "num_round":"5"}

    # construct a SageMaker estimator that calls the xgboost-container
    estimator_2 = sagemaker.estimator.Estimator(image_uri=xgboost_container, 
                                          hyperparameters=hyperparameters_2,
                                          role=sagemaker.get_execution_role(),
                                          instance_count=1, 
                                          instance_type='ml.m5.12xlarge', 
                                          volume_size=200, # training volume size in GB
                                          output_path=output_path)


    # execute the XGBoost training job
    estimator_2.fit({'train': train_input, 'validation': validation_input})
    
    training_job_name_2 = estimator_2.latest_training_job.name
    
    model_name_2 = sagemaker_session.create_model_from_job(training_job_name_2)


In [None]:
print("Model 1 : " , model_name_1)
print("Model 2 : " , model_name_2)

### 3. Create an endpoint (with single production variant)

In [None]:
#Create production variant A
variantA = production_variant(model_name=model_name_1,
                              instance_type="ml.m5.xlarge",
                              initial_instance_count=1,
                              variant_name='VariantA',
                              initial_weight=1)

In [None]:
#Variable for endpoint name
endpoint_name=f"abtest-{datetime.now():%Y-%m-%d-%H-%M-%S}"

In [None]:
##First create an endpoint with single variant
##Note this step automatically creates an endpointconfig with same name as the endpoint, that you can update later

#Create an endpoint with a single production variant
sagemaker_session.endpoint_from_production_variants(
    name=endpoint_name,
    production_variants=[variantA]
)

### 4. Invoke the endpoint

In [None]:
##Get the file name at index from the 'prefix' folder
def get_file_in_bucket(prefix,index):
    response = s3.list_objects(
        Bucket=s3_bucket,
        Prefix=s3_prefix + "/" + prefix
    )
    ## The object at index 0 is the SUCCESS/FAILURE marker for the file uploads to S3. The first data file is at index 1
    file_name = response['Contents'][index]['Key']
    print("Returning file name : " + file_name)
    return file_name
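
The fixed index above depends on the marker object being listed first. A minimal sketch of a less position-dependent approach is to pick the first key with a data-file suffix. The helper name `first_data_key`, the `.csv` suffix, and the sample keys are assumptions for illustration; the dictionary only mimics the `Contents`/`Key` shape of a `list_objects` response.

```python
def first_data_key(response, suffix=".csv"):
    """Return the first object key ending with `suffix`, or None if there is none."""
    for obj in response.get("Contents", []):
        if obj["Key"].endswith(suffix):
            return obj["Key"]
    return None


# Hypothetical listing: a marker object followed by one data file
sample_response = {
    "Contents": [
        {"Key": "prepared/test/_SUCCESS"},
        {"Key": "prepared/test/part-00000.csv"},
    ]
}
print(first_data_key(sample_response))
```

Returning `None` for an empty listing lets the caller decide how to handle a missing test file instead of failing with an `IndexError`.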

In [None]:
##Download the test files to execute inferences
s3.download_file(s3_bucket, get_file_in_bucket('test',1), 't_file.csv')

with open('t_file.csv', 'r') as TF:
    t_lines = TF.readlines()

In [None]:
### Define a method to run inferences against the endpoint
def get_predictions():
    #Skip the first line since it has column headers
    for tl in t_lines[1:50]:
        #Remove the first column since it is the label
        test_list = tl.split(",")
        test_list.pop(0)
        test_string = ','.join([str(elem) for elem in test_list])
    
        result = smrt.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType="text/csv",
                                   Body=test_string)
        #print(result)                              
        # The response body is already a botocore StreamingBody, so it can be read directly
        prediction = result['Body'].read().decode('utf-8')
        print(f"Result from {result['InvokedProductionVariant']} = {prediction}")

In [None]:
#Get predictions
get_predictions()
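
The payload preparation inside `get_predictions` (drop the label column, rejoin the remaining fields) can be isolated and checked offline. The sketch below uses a hypothetical helper name, `strip_label`, and a made-up sample row. Unlike the loop above, it also trims the trailing line ending, which is a design choice here rather than something the notebook requires.

```python
def strip_label(csv_line):
    """Drop the first (label) column and the trailing line ending from one CSV row."""
    fields = csv_line.rstrip("\r\n").split(",")
    return ",".join(fields[1:])


# Made-up row: the first value is the label, the rest are features
print(strip_label("12.5,0.1,0.2,0.3\n"))
```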

### 5. Update endpoint with two production variants

In [None]:
#Create production variant B
variantB = production_variant(model_name=model_name_2,
                              instance_type="ml.m5.xlarge",
                              initial_instance_count=1,
                              variant_name='VariantB',
                              initial_weight=1)
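
SageMaker routes traffic to each production variant in proportion to its weight divided by the sum of all variant weights on the endpoint. With both variants at `initial_weight=1`, each should receive roughly half of the requests. The sketch below only illustrates that arithmetic; the function name `traffic_shares` is an assumption and not part of the SageMaker SDK.

```python
def traffic_shares(weights):
    """Map each variant name to its expected fraction of traffic."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("At least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


# Weights used for VariantA and VariantB in this notebook
print(traffic_shares({"VariantA": 1, "VariantB": 1}))
```

Because only the ratio matters, weights of 1 and 1 describe the same split as 5 and 5.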

In [None]:
##Next update the endpoint to include both production variants
endpoint_config_new =f"abtest-new-config-{datetime.now():%Y-%m-%d-%H-%M-%S}"

sagemaker_session.create_endpoint_config_from_existing(
    existing_config_name=endpoint_name,
    new_config_name=endpoint_config_new,
    new_production_variants=[variantA,variantB]  ## Two production variants
)

In [None]:
##Update the endpoint
sagemaker_session.update_endpoint(endpoint_name=endpoint_name, endpoint_config_name=endpoint_config_new, wait=False)

In [None]:
#Show that you can still get inferences while the endpoint is being updated
#Get predictions
get_predictions()

### 6. CloudWatch Analysis

Observe the CloudWatch metrics generated for the two variants to understand the endpoint behavior.  Here we are plotting the number of invocations of each variant.
You can use the same pattern to plot other metrics.
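
Besides plotting, the per-minute `Sum` datapoints can be totalled to compare the observed traffic split with the configured weights. The sketch below runs offline on made-up datapoints that mimic only the `Sum` field of a `get_metric_statistics` response; the helper names `total_invocations` and `observed_shares` are assumptions for illustration.

```python
def total_invocations(datapoints):
    """Add up the per-period "Sum" statistic across CloudWatch datapoints."""
    return sum(point["Sum"] for point in datapoints)


def observed_shares(datapoints_by_variant):
    """Return each variant's fraction of all invocations in the window."""
    totals = {
        name: total_invocations(points)
        for name, points in datapoints_by_variant.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No traffic recorded yet, so report zero for every variant
        return {name: 0.0 for name in totals}
    return {name: count / grand_total for name, count in totals.items()}


# Made-up datapoints, not real measurements
sample = {
    "VariantA": [{"Sum": 30.0}, {"Sum": 20.0}],
    "VariantB": [{"Sum": 25.0}, {"Sum": 25.0}],
}
print(observed_shares(sample))
```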

In [None]:
##Define utility methods to retrieve and plot cloudwatch metrics
import pandas as pd

cw = boto3.Session().client("cloudwatch")

def get_invocation_metrics_for_endpoint_variant(endpoint_name, variant_name, start_time, end_time):
    metrics = cw.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="Invocations",
        StartTime=start_time,
        EndTime=end_time,
        Period=60,
        Statistics=["Sum"],
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
    )
    return (
        pd.DataFrame(metrics["Datapoints"])
        .sort_values("Timestamp")
        .set_index("Timestamp")
        .drop("Unit", axis=1)
        .rename(columns={"Sum": variant_name})
    )


def plot_endpoint_metrics(start_time=None):
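    # Use UTC so the query window matches CloudWatch timestamps regardless of the local time zone
    start_time = start_time or datetime.utcnow() - timedelta(minutes=60)
    end_time = datetime.utcnow()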
    metrics_variant1 = get_invocation_metrics_for_endpoint_variant(
        endpoint_name, variantA["VariantName"], start_time, end_time
    )
    metrics_variant2 = get_invocation_metrics_for_endpoint_variant(
        endpoint_name, variantB["VariantName"], start_time, end_time
    )
    metrics_variants = metrics_variant1.join(metrics_variant2, how="outer")
    metrics_variants.plot()
    return metrics_variants

In [None]:
##Send traffic to endpoint for about 2 minutes.  
##You should see both the variants serving traffic, after the endpoint is updated.
print(f"Sending test traffic to the endpoint {endpoint_name}. \nPlease wait...")
#Skip the first line since it has column headers
for tl in t_lines[1:200]:
    #print(".", end="", flush=True)
    #Remove the first column since it is the label
    test_list = tl.split(",")
    test_list.pop(0)
    test_string = ','.join([str(elem) for elem in test_list])
    
    result = smrt.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType="text/csv",
                                   Body=test_string)
    #print(result)                              
    # The response body is already a botocore StreamingBody, so it can be read directly
    prediction = result['Body'].read().decode('utf-8')
    print(f"Result from {result['InvokedProductionVariant']} = {prediction}")
    time.sleep(0.5)  
print("Done!")

In [None]:
print("Waiting a minute for initial metric creation...")
time.sleep(60)
plot_endpoint_metrics()

### 7. Update endpoint to contain just VariantB

#### 7.1 - Gradually update the weights of the production variants

In [None]:
#Update the production variant weights to route 60% of traffic to VariantB and 40% to VariantA
sm.update_endpoint_weights_and_capacities(
    EndpointName=endpoint_name,
    DesiredWeightsAndCapacities=[
        {"DesiredWeight": 4, "VariantName": variantA["VariantName"]},
        {"DesiredWeight": 6, "VariantName": variantB["VariantName"]},
    ],
)
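
A gradual shift usually repeats the call above with a larger share for VariantB at each step. The sketch below builds the `DesiredWeightsAndCapacities` payload for each step without calling AWS. The helper name `desired_weights` and the 60/80/100 schedule are assumptions for illustration, not a recommended rollout plan.

```python
def desired_weights(share_b_percent, variant_a="VariantA", variant_b="VariantB"):
    """Build a weights payload that sends share_b_percent of traffic to variant_b."""
    if not 0 <= share_b_percent <= 100:
        raise ValueError("share_b_percent must be between 0 and 100")
    return [
        {"DesiredWeight": 100 - share_b_percent, "VariantName": variant_a},
        {"DesiredWeight": share_b_percent, "VariantName": variant_b},
    ]


# Hypothetical schedule: move traffic to VariantB in three steps
for share_b in (60, 80, 100):
    print(desired_weights(share_b))
```

In the notebook, each payload could be passed as `DesiredWeightsAndCapacities` to `sm.update_endpoint_weights_and_capacities`, waiting for the endpoint to return to `InService` and reviewing the CloudWatch metrics before moving to the next step.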

#### 7.2 - Alternatively, update the endpoint to route all live traffic to VariantB in a single step

In [None]:
##Update the endpoint to point to VariantB
endpoint_config_new =f"abtest-b-config-{datetime.now():%Y-%m-%d-%H-%M-%S}"

sagemaker_session.create_endpoint_config_from_existing(
    existing_config_name=endpoint_name,
    new_config_name=endpoint_config_new,
    new_production_variants=[variantB]
)

In [None]:
##Update the endpoint
##Note : This step will fail if the endpoint is still updating
sagemaker_session.update_endpoint(endpoint_name=endpoint_name, endpoint_config_name=endpoint_config_new, wait=False)
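
Because the update above fails while the endpoint is still in the `Updating` state, it can help to poll the endpoint status before issuing it. The sketch below is a generic polling helper, demonstrated offline with a fake status source. The name `wait_for_status` and its parameters are assumptions for illustration.

```python
import time


def wait_for_status(get_status, target="InService", failed=("Failed",),
                    poll_seconds=30, max_polls=60):
    """Call get_status() until it returns `target`; raise on failure or timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Endpoint entered status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Status did not reach {target} after {max_polls} polls")


# Offline demonstration: a fake status source instead of a real endpoint
fake_statuses = iter(["Updating", "Updating", "InService"])
print(wait_for_status(lambda: next(fake_statuses), poll_seconds=0))
```

In the notebook, the status source could be `lambda: sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]`. The boto3 waiter `sm.get_waiter("endpoint_in_service")` is an alternative that avoids writing the loop by hand.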

### 8. Clean up

In [None]:
# If you do not plan to use this endpoint further, you should delete the endpoint to avoid incurring additional charges.
sagemaker_session.delete_endpoint(endpoint_name)