# Perform A/B Test using REST Endpoints

You can test and deploy new models behind a single SageMaker Endpoint using a concept called “production variants.” These variants can differ by hardware (CPU/GPU), by data (comedy/drama movies), or by region (US West or Germany North). You can shift traffic between the variants in your endpoint for canary rollouts and blue/green deployments, and you can split traffic for A/B tests. You can also configure the endpoint to scale out or in automatically based on a metric such as requests per second: as more requests come in, SageMaker scales the model prediction API to meet the demand.
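
Automatic scaling is configured per production variant through Application Auto Scaling. The following is a minimal sketch of the two request payloads involved, not a tuned configuration: the helper `build_autoscaling_requests`, the endpoint name `my-endpoint`, the policy name, and all numeric defaults are assumptions made for illustration.

```python
def build_autoscaling_requests(endpoint_name, variant_name,
                               min_instances=1, max_instances=4,
                               invocations_per_instance=1000.0):
    """Build Application Auto Scaling payloads for one production variant.

    The default values are illustrative assumptions, not recommendations.
    """
    common = {
        'ServiceNamespace': 'sagemaker',
        'ResourceId': 'endpoint/{}/variant/{}'.format(endpoint_name, variant_name),
        'ScalableDimension': 'sagemaker:variant:DesiredInstanceCount',
    }
    # Bounds for the number of instances behind the variant.
    scalable_target = dict(common,
                           MinCapacity=min_instances,
                           MaxCapacity=max_instances)
    # Target-tracking policy on invocations per instance.
    scaling_policy = dict(
        common,
        PolicyName='{}-invocations-target'.format(variant_name),  # hypothetical name
        PolicyType='TargetTrackingScaling',
        TargetTrackingScalingPolicyConfiguration={
            'TargetValue': invocations_per_instance,
            'PredefinedMetricSpecification': {
                'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
            },
            'ScaleInCooldown': 300,   # seconds, assumed
            'ScaleOutCooldown': 60,   # seconds, assumed
        },
    )
    return scalable_target, scaling_policy


target, policy = build_autoscaling_requests('my-endpoint', 'VariantA')
print(target['ResourceId'])
print(policy['PolicyType'])
```

The two dictionaries are shaped so that they could be passed as keyword arguments to the `register_scalable_target` and `put_scaling_policy` calls of the boto3 `application-autoscaling` client.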

<img src="img/model_ab.png" width="80%" align="left">

We can use traffic splitting to direct subsets of users to different model variants so that we can compare the models in live production. The goal is to see which variant performs better. These tests often need to run for weeks before the difference between variants is statistically significant. The figure shows two recommendation models deployed behind one endpoint with a random 50-50 traffic split between the two variants.
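
To make "statistically significant" concrete, here is a small sketch of a two-proportion z-test, one common way to compare two variants on a yes/no outcome such as "the user clicked the recommendation". The function name and the counts are hypothetical, and this is only one of several tests that could be used.

```python
import math


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:
        raise ValueError('rates are degenerate (all successes or all failures)')
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 requests per variant.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print('z = {:.2f}, p = {:.4f}'.format(z, p_value))
```

With small samples the p-value stays large even when the observed rates differ, which is why a test often has to run for a long time before a winner can be declared.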

In [None]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

sm = boto3.Session().client(service_name='sagemaker', region_name=region)
cw = boto3.Session().client(service_name='cloudwatch', region_name=region)

# Clean Up Previous Endpoints to Save Resources

In [None]:
%store -r autopilot_endpoint_name

In [None]:
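try:
    # Raises NameError if no Autopilot endpoint name was stored by an earlier notebook.
    autopilot_endpoint_name
    sm.delete_endpoint(
        EndpointName=autopilot_endpoint_name
    )
    print('Autopilot Endpoint has been deleted to save resources.  This is good.')
except Exception:
    # Either no name was stored or the endpoint no longer exists.
    print('Endpoints are cleaned up.  This is good.  Keep moving forward!')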

In [None]:
%store -r training_job_name

In [None]:
print(training_job_name)

In [None]:
%store -r pytorch_model_name

In [None]:
print(pytorch_model_name)

# Set the Docker Image URI Built in a Previous Notebook

In [None]:
# docker_repo = 'torchserve'
# docker_tag = 'torch-1.5.0-1.0.0'

# image_uri = f'{account_id}.dkr.ecr.{region}.amazonaws.com/{docker_repo}:{docker_tag}'

# Set the S3 Location of the Trained PyTorch Model `model.tar.gz`

In [None]:
# tmp_torchserve_model_name = 'reviews-distilbert-pytorch'

# print(tmp_torchserve_model_name)

In [None]:
# tmp_torchserve_tar_s3_uri = 's3://{}/models/torchserve/model.tar.gz'.format(bucket, tmp_torchserve_model_name)

# print(tmp_torchserve_tar_s3_uri)

# Prepare Model VariantA

In [None]:
# import time
# timestamp = int(time.time())

# pytorch_model_a_name = '{}-{}-{}-{}'.format(training_job_name, 'pt', 'a', timestamp)

# print(pytorch_model_a_name)

In [None]:
# from sagemaker.model import Model
# from sagemaker.predictor import RealTimePredictor

# pytorch_model_variant_a = Model(model_data=tmp_torchserve_tar_s3_uri, 
#                                 image=image_uri,
#                                 role=role,
#                                 predictor_cls=RealTimePredictor,
#                                 name=pytorch_model_a_name)

# Prepare Model VariantB

In [None]:
# tmp_torchserve_model_name = 'reviews-distilbert-pytorch'

# print(tmp_torchserve_model_name)

In [None]:
# import time
# timestamp = int(time.time())

# pytorch_model_b_name = '{}-{}-{}-{}'.format(training_job_name, 'pt', 'b', timestamp)

# print(pytorch_model_b_name)

In [None]:
# from sagemaker.model import Model
# from sagemaker.predictor import RealTimePredictor

# pytorch_model_variant_b = Model(model_data=tmp_torchserve_tar_s3_uri, 
#                                 image=image_uri,
#                                 role=role,
#                                 predictor_cls=RealTimePredictor,
#                                 name=pytorch_model_b_name)

# Canary Rollouts and A/B Testing

Canary rollouts release a new model safely to a small subset of users, such as 5%. They are useful when you want to test in live production without affecting the entire user base. Because the majority of traffic still goes to the existing model, the cluster serving the canary model can be relatively small: it only receives 5% of the traffic.

Instead of `deploy()`, we can create an `Endpoint Configuration` with multiple variants for canary rollouts and A/B testing.
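
SageMaker routes traffic to each variant in proportion to its weight divided by the sum of all variant weights, so the weights do not have to add up to 100. A minimal sketch with a hypothetical 95/5 canary split; `traffic_shares` is an illustrative helper, not an SDK function.

```python
def traffic_shares(variants):
    """Map each variant name to its expected fraction of requests."""
    total = sum(v['InitialVariantWeight'] for v in variants)
    return {v['VariantName']: v['InitialVariantWeight'] / total
            for v in variants}


# Hypothetical canary rollout: the new model (VariantB) receives 5% of requests.
canary_variants = [
    {'VariantName': 'VariantA', 'InitialVariantWeight': 95},
    {'VariantName': 'VariantB', 'InitialVariantWeight': 5},
]

shares = traffic_shares(canary_variants)
for name, share in shares.items():
    print('{}: {:.0%}'.format(name, share))
```

The dictionaries use the same `VariantName` and `InitialVariantWeight` keys as the entries in `ProductionVariants`.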

In [None]:
import time

from sagemaker.session import production_variant

timestamp = '{}'.format(int(time.time()))

endpoint_config_name = '{}-{}-{}-{}'.format(training_job_name, 'pt', 'ab', timestamp)

# Both variants serve the model restored from the previous notebook (pytorch_model_name),
# so the 50/50 split exercises traffic routing rather than comparing two different models.
variantA = production_variant(model_name=pytorch_model_name,
                              instance_type='ml.m5.large',
                              initial_instance_count=1,
                              variant_name='VariantA',
                              initial_weight=50)

variantB = production_variant(model_name=pytorch_model_name,
                              instance_type='ml.m5.large',
                              initial_instance_count=1,
                              variant_name='VariantB',
                              initial_weight=50)

endpoint_config = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[variantA, variantB]
)

In [None]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpointConfig/{}">REST Endpoint Configuration</a></b>'.format(region, endpoint_config_name)))


In [None]:
pytorch_model_ab_endpoint_name = '{}-{}-{}-{}'.format(training_job_name, 'pt', 'ab', timestamp)

endpoint_response = sm.create_endpoint(
    EndpointName=pytorch_model_ab_endpoint_name,
    EndpointConfigName=endpoint_config_name)

# Store Endpoint Name for Next Notebook(s)

In [None]:
# %store pytorch_model_ab_endpoint_name

# Track the Deployment Within our Experiment

In [None]:
%store -r experiment_name

In [None]:
print(experiment_name)

In [None]:
%store -r trial_name

In [None]:
print(trial_name)

In [None]:
from smexperiments.trial import Trial

timestamp = '{}'.format(int(time.time()))

trial = Trial.load(trial_name=trial_name)
print(trial)

In [None]:
from smexperiments.tracker import Tracker

tracker_deploy = Tracker.create(display_name='deploy', 
                                sagemaker_boto_client=sm)

deploy_trial_component_name = tracker_deploy.trial_component.trial_component_name
print('Deploy trial component name {}'.format(deploy_trial_component_name))

# Attach the `deploy` Trial Component and Tracker as a Component to the Trial

In [None]:
trial.add_trial_component(tracker_deploy.trial_component)

# Track the Endpoint Name

In [None]:
tracker_deploy.log_parameters({
    'endpoint_name': pytorch_model_ab_endpoint_name,
})

# must save after logging
tracker_deploy.trial_component.save()

In [None]:
from sagemaker.analytics import ExperimentAnalytics

lineage_table = ExperimentAnalytics(
    sagemaker_session=sess,
    experiment_name=experiment_name,
    metric_names=['validation:accuracy'],
    sort_by="CreationTime",
    sort_order="Ascending",
)

lineage_df = lineage_table.dataframe()
lineage_df.shape

In [None]:
lineage_df

In [None]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">REST Endpoint</a></b>'.format(region, pytorch_model_ab_endpoint_name)))


# _Wait Until the ^^ Endpoint ^^ is Deployed_

In [None]:
waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=pytorch_model_ab_endpoint_name)
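
The waiter polls the endpoint status until it reports `InService`. The sketch below shows roughly the same loop written by hand, which can be useful when you want custom timeouts or logging. `wait_for_endpoint` is a hypothetical helper; the `describe` argument is expected to behave like `sm.describe_endpoint`, and the delay and attempt limits are assumed values.

```python
import time


def wait_for_endpoint(describe, endpoint_name,
                      delay_seconds=30, max_attempts=60, sleep=time.sleep):
    """Poll until the endpoint is InService, fails, or the attempts run out."""
    for _ in range(max_attempts):
        status = describe(EndpointName=endpoint_name)['EndpointStatus']
        if status == 'InService':
            return status
        if status == 'Failed':
            raise RuntimeError('Endpoint {} failed to deploy'.format(endpoint_name))
        # Still Creating or Updating: wait before asking again.
        sleep(delay_seconds)
    raise TimeoutError('Endpoint {} not InService after {} attempts'.format(
        endpoint_name, max_attempts))
```

In this notebook it could be called as `wait_for_endpoint(sm.describe_endpoint, pytorch_model_ab_endpoint_name)`.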

# Simulate a Prediction from an Application

In [106]:
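# Assumes SageMaker Python SDK v1, where the generic predictor class is RealTimePredictor.
from sagemaker.predictor import RealTimePredictor

# With no serializer or deserializer configured, predict() sends the review text as-is
# and returns the raw response bytes, which the cells below decode as UTF-8.
predictor = RealTimePredictor(endpoint=pytorch_model_ab_endpoint_name,
                              sagemaker_session=sess,
                              content_type='application/json')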

# Predict the `star_rating` with `review_body` Samples from our TSV's

In [107]:
import csv

df_reviews = pd.read_csv('./data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz', 
                                delimiter='\t', 
                                quoting=csv.QUOTE_NONE,
                                compression='gzip')
df_sample_reviews = df_reviews[['review_body', 'star_rating']].sample(n=50)
df_sample_reviews = df_sample_reviews.reset_index()
df_sample_reviews.shape

(50, 3)

In [108]:
import pandas as pd

def predict(review_body):
    return predictor.predict(review_body).decode('utf-8')

df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)
df_sample_reviews.head(5)

# Predict the `star_rating` with Ad Hoc `review_body` Samples

In [101]:
predicted_classes = predictor.predict("This is a wonderful product!")

print(predicted_classes.decode('utf-8'))

# Review the REST Endpoint Performance Metrics in CloudWatch

In [None]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">REST Endpoint Performance Metrics</a></b>'.format(region, pytorch_model_ab_endpoint_name)))


# Review the REST Endpoint Performance Metrics in a Dataframe

Amazon SageMaker emits metrics such as `ModelLatency` and `Invocations` for each variant in Amazon CloudWatch (full list of metrics [here](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html)). Let’s query CloudWatch for the invocations of each variant to show how requests are split across variants.
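
Once the per-variant datapoints are available, the observed split can be checked against the configured weights. A small sketch, assuming datapoints in the shape CloudWatch returns when `Statistics=['Sum']` is requested; the numbers are made up and `invocation_split` is a hypothetical helper.

```python
def invocation_split(datapoints_by_variant):
    """Return each variant's fraction of the total invocations."""
    totals = {name: sum(point['Sum'] for point in points)
              for name, points in datapoints_by_variant.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No traffic recorded yet.
        return {name: 0.0 for name in totals}
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical one-minute sums for two variants.
sample_datapoints = {
    'VariantA': [{'Sum': 12.0}, {'Sum': 13.0}],
    'VariantB': [{'Sum': 14.0}, {'Sum': 11.0}],
}

split = invocation_split(sample_datapoints)
print(split)
```

With a random 50-50 split and only a few requests, the observed fractions will rarely be exactly equal; they converge toward the configured weights as traffic accumulates.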

In [None]:
from datetime import datetime, timedelta

import boto3
import pandas as pd

def get_invocation_metrics_for_endpoint_variant(endpoint_name,
                                                namespace_name,
                                                metric_name,
                                                variant_name,
                                                start_time,
                                                end_time):
    # Query one metric for one variant of the endpoint, summed per minute.
    metrics = cw.get_metric_statistics(
        Namespace=namespace_name,
        MetricName=metric_name,
        StartTime=start_time,
        EndTime=end_time,
        Period=60,
        Statistics=["Sum"],
        Dimensions=[
            {
                "Name": "EndpointName",
                "Value": endpoint_name
            },
            {
                "Name": "VariantName",
                "Value": variant_name
            }
        ]
    )

    # Return a time-indexed DataFrame with one column named after the variant.
    if metrics['Datapoints']:
        return pd.DataFrame(metrics["Datapoints"])\
                .sort_values("Timestamp")\
                .set_index("Timestamp")\
                .drop("Unit", axis=1)\
                .rename(columns={"Sum": variant_name})
    else:
        return pd.DataFrame()


def plot_endpoint_metrics_for_variants(endpoint_name,
                                       namespace_name,
                                       metric_name,
                                       start_time=None):
    try:
        # boto3 treats naive datetimes as UTC, so build the time window in UTC.
        start_time = start_time or datetime.utcnow() - timedelta(minutes=60)
        end_time = datetime.utcnow()

        metrics_variantA = get_invocation_metrics_for_endpoint_variant(endpoint_name=endpoint_name,
                                                                       namespace_name=namespace_name,
                                                                       metric_name=metric_name,
                                                                       variant_name=variantA["VariantName"],
                                                                       start_time=start_time,
                                                                       end_time=end_time)

        metrics_variantB = get_invocation_metrics_for_endpoint_variant(endpoint_name=endpoint_name,
                                                                       namespace_name=namespace_name,
                                                                       metric_name=metric_name,
                                                                       variant_name=variantB["VariantName"],
                                                                       start_time=start_time,
                                                                       end_time=end_time)

        # Align both variants on the timestamp index and plot them together.
        metrics_variants = metrics_variantA.join(metrics_variantB, how="outer")
        metrics_variants.plot()
    except Exception:
        # Plotting fails when CloudWatch has not returned any datapoints yet.
        print('Metrics not yet available')
    

# Show the Metrics for Each Variant
If you see `Metrics not yet available`, please be patient: metrics may take a few minutes to appear in CloudWatch.

Also, make sure the predictions ran successfully above.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(20)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='/aws/sagemaker/Endpoints',
                                   metric_name='CPUUtilization')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='Invocations')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='InvocationsPerInstance')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='ModelLatency')

# Shift All Traffic to Variant B
_**No downtime** occurs during this traffic-shift activity._

This may take a few minutes.  Please be patient.
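
Traffic does not have to move in a single jump. The sketch below generates a sequence of weight updates that shifts traffic in increments, which is one way to approximate a gradual rollout. `weight_shift_steps`, the starting weight, and the step size are assumptions for illustration.

```python
def weight_shift_steps(from_variant, to_variant, start_weight=50, step=25):
    """Build successive weight updates that move all traffic to `to_variant`."""
    if step <= 0:
        raise ValueError('step must be positive')
    steps = []
    to_weight = start_weight
    while to_weight < 100:
        to_weight = min(100, to_weight + step)
        steps.append([
            {'VariantName': from_variant, 'DesiredWeight': 100 - to_weight},
            {'VariantName': to_variant, 'DesiredWeight': to_weight},
        ])
    return steps


shift_plan = weight_shift_steps('VariantA', 'VariantB')
for update in shift_plan:
    print(update)
```

Each entry in `shift_plan` has the shape expected by the `DesiredWeightsAndCapacities` argument of `update_endpoint_weights_and_capacities`. The endpoint would need to return to `InService` between updates.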

In [None]:
updated_endpoint_config = [
    {
        'VariantName': variantA['VariantName'],
        'DesiredWeight': 0,
    },
    {
        'VariantName': variantB['VariantName'],
        'DesiredWeight': 100,
    }
]

In [None]:
sm.update_endpoint_weights_and_capacities(
    EndpointName=pytorch_model_ab_endpoint_name,
    DesiredWeightsAndCapacities=updated_endpoint_config
)

In [None]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">REST Endpoint</a></b>'.format(region, pytorch_model_ab_endpoint_name)))


# _Wait for the ^^ Endpoint Update ^^ to Complete Above_
This may take a few minutes.  Please be patient.

In [None]:
waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=pytorch_model_ab_endpoint_name)

# Run Some More Predictions

In [None]:
import pandas as pd

def predict(review_body):
    return predictor.predict(review_body).decode('utf-8')

df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)
df_sample_reviews.head(5)

# Show the Metrics for Each Variant
If you see `Metrics not yet available`, please be patient: metrics may take a few minutes to appear in CloudWatch.

Also, make sure the predictions ran successfully above.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(20)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='/aws/sagemaker/Endpoints',
                                   metric_name='CPUUtilization')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='Invocations')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='InvocationsPerInstance')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='ModelLatency')

# Remove Variant A to Reduce Cost
Modify the Endpoint Configuration to only use variant B.

_**No downtime** occurs during this scale-down activity._

This may take a few minutes.  Please be patient.

In [None]:
import time
timestamp = '{}'.format(int(time.time()))

updated_endpoint_config_name = '{}-{}'.format(training_job_name, timestamp)

updated_endpoint_config = sm.create_endpoint_config(
    EndpointConfigName=updated_endpoint_config_name,
    ProductionVariants=[
        {
         'VariantName': variantB['VariantName'],  # Only specify variant B to remove variant A
         'ModelName': pytorch_model_name,
         'InstanceType':'ml.m5.large',
         'InitialInstanceCount': 1,
         'InitialVariantWeight': 100
        }
    ])

In [None]:
sm.update_endpoint(
    EndpointName=pytorch_model_ab_endpoint_name,
    EndpointConfigName=updated_endpoint_config_name
)

# _If You See An ^^ Error ^^ Above, Please Wait Until the Endpoint is Updated_

In [None]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">REST Endpoint</a></b>'.format(region, pytorch_model_ab_endpoint_name)))


# _Wait for the ^^ Endpoint Update ^^ to Complete Above_
This may take a few minutes.  Please be patient.

In [None]:
waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=pytorch_model_ab_endpoint_name)

# Run Some More Predictions

In [None]:
import pandas as pd

def predict(review_body):
    return predictor.predict(review_body).decode('utf-8')

df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)
df_sample_reviews.head(5)

# Show the Metrics for Each Variant
If you see `Metrics not yet available`, please be patient: metrics may take a few minutes to appear in CloudWatch.

Also, make sure the predictions ran successfully above.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(20)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='/aws/sagemaker/Endpoints',
                                   metric_name='CPUUtilization')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='Invocations')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='InvocationsPerInstance')

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'

time.sleep(5)
plot_endpoint_metrics_for_variants(endpoint_name=pytorch_model_ab_endpoint_name,
                                   namespace_name='AWS/SageMaker',                                   
                                   metric_name='ModelLatency')

# Delete Endpoint
To save money, we should delete the endpoint.

In [None]:
sm.delete_endpoint(
     EndpointName=pytorch_model_ab_endpoint_name
)

In [None]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();

# More Links
* Optimize Cost with TensorFlow and Elastic Inference
https://aws.amazon.com/blogs/machine-learning/optimizing-costs-in-amazon-elastic-inference-with-amazon-tensorflow/

* Using API Gateway with SageMaker Endpoints
https://aws.amazon.com/blogs/machine-learning/creating-a-machine-learning-powered-rest-api-with-amazon-api-gateway-mapping-templates-and-amazon-sagemaker/