# AWS SageMaker Inference Immersion Day
This notebook shows how to:
* __Lab 1:__ Deploy a real-time endpoint with a prebuilt container and invoke it.
* __Lab 2:__ Deploy a real-time endpoint with a custom container.
* __Lab 3:__ Host an endpoint with multiple production variants that receive different shares of the traffic.

## General Setup (~20 - 30 min)

### Account Configuration with Event Engine

Please follow the instructions to configure the AWS account that you will be using for this workshop. Go to https://dashboard.eventengine.run/ and paste the `Event hash` provided.

### SageMaker Studio Configuration

Create a SageMaker Studio user by clicking `Add user`. Once it has been created, click the user name and copy the IAM role associated with it.

![config](../assets/sagemaker-studio-2.png)

![config](../assets/sagemaker-studio-3.png)

Go to IAM, search for the role, and click on it.

![config](../assets/sagemaker-studio-4.png)

Click `Trust Relationships`, paste the following JSON into the Edit Trust Relationship text area, and then click `Update Trust Policy`.

![config](../assets/sagemaker-studio-5.png)

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
              "Service": "sagemaker.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
              "Service": "codebuild.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```


The `Trust Relationships` tab should now show 2 trusted entities.

![config](../assets/sagemaker-studio-6.png)
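If you prefer scripting this step over clicking through the console, the same trust policy can be built as a Python dict and applied with boto3's IAM `update_assume_role_policy` call. This is a sketch that assumes you know the role *name* (not the ARN) and that your own credentials allow `iam:UpdateAssumeRolePolicy`; `build_trust_policy` and `apply_trust_policy` are hypothetical helper names.

```python
import json


def build_trust_policy(services):
    """Return an IAM trust policy letting each service principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
            for service in services
        ],
    }


def apply_trust_policy(iam_client, role_name, services):
    """Replace the role's whole trust policy (needs iam:UpdateAssumeRolePolicy)."""
    document = json.dumps(build_trust_policy(services))
    iam_client.update_assume_role_policy(RoleName=role_name, PolicyDocument=document)


trust_policy = build_trust_policy(["sagemaker.amazonaws.com", "codebuild.amazonaws.com"])
print(json.dumps(trust_policy, indent=4))
```

Note that `update_assume_role_policy` overwrites the existing trust policy, so the list of services must include every principal the role should keep trusting.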

To build Docker images from a Studio notebook, you also need to add an inline policy to this role. First, click the `Permissions` tab, then click `Add inline policy`.

![config](../assets/sagemaker-studio-7.png)

On the `Create policy` page, click the `JSON` tab and paste the following policy into the text area, replacing the default text. Then click `Review Policy` and follow the remaining steps to create the inline policy.

![config](../assets/sagemaker-studio-8.png)

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "codebuild:DeleteProject",
                "codebuild:CreateProject",
                "codebuild:BatchGetBuilds",
                "codebuild:StartBuild"
            ],
            "Resource": "arn:aws:codebuild:*:*:project/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogStream",
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:GetLogEvents",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*:log-stream:*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CreateRepository",
                "ecr:BatchGetImage",
                "ecr:CompleteLayerUpload",
                "ecr:DescribeImages",
                "ecr:DescribeRepositories",
                "ecr:UploadLayerPart",
                "ecr:ListImages",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage"
            ],
            "Resource": "arn:aws:ecr:*:*:repository/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::sagemaker-*/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket"
            ],
            "Resource": "arn:aws:s3:::sagemaker*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:ListRoles"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringLikeIfExists": {
                    "iam:PassedToService": "codebuild.amazonaws.com"
                }
            }
        }
    ]
}
```
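`get_execution_role()` returns a role ARN, while IAM calls such as `put_role_policy` expect the role name. A minimal sketch, assuming the standard `arn:aws:iam::<account>:role/<path>/<name>` ARN layout, that extracts the name and attaches a policy like the one above as an inline policy; `role_name_from_arn`, `attach_inline_policy`, and whatever policy name you pass are hypothetical.

```python
import json


def role_name_from_arn(role_arn):
    """Extract the IAM role name (the last path segment) from a role ARN."""
    resource = role_arn.split(":", 5)[5]  # e.g. "role/service-role/MyRole"
    if not resource.startswith("role/"):
        raise ValueError(f"Not an IAM role ARN: {role_arn}")
    return resource.rsplit("/", 1)[-1]


def attach_inline_policy(iam_client, role_arn, policy_name, policy_document):
    """Create or overwrite an inline policy on the role identified by role_arn."""
    iam_client.put_role_policy(
        RoleName=role_name_from_arn(role_arn),
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
```

For example, `attach_inline_policy(boto3.client("iam"), role, "sagemaker-studio-image-build", policy)`, where `policy` is the JSON above loaded as a dict and the policy name is a placeholder.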

### Imports

Import the APIs used by this notebook. For almost all of the tasks presented here, we'll be using the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).

In [None]:
import copy
import json
import random
import time
import boto3
import pandas as pd
import numpy as np

from datetime import datetime, timedelta
from sagemaker.session import production_variant
from sagemaker import get_execution_role, image_uris, Session
from sagemaker.serializers import CSVSerializer
from sagemaker.clarify import (
    BiasConfig,
    DataConfig,
    ModelConfig,
    ModelPredictedLabelConfig,
    SHAPConfig,
)
from sagemaker.model import Model
from sagemaker.model_monitor import (
    BiasAnalysisConfig,
    CronExpressionGenerator,
    DataCaptureConfig,
    EndpointInput,
    ExplainabilityAnalysisConfig,
    ModelBiasMonitor,
    ModelExplainabilityMonitor,
)
from sagemaker.s3 import S3Downloader, S3Uploader

### Basic configuration
Here, we configure the execution role used to deploy everything, the S3 bucket where we'll save artifacts and data, and some prefixes that keep data in separate folders.

In [None]:
role = get_execution_role()
print(f"RoleArn: {role}")

boto_session = boto3.session.Session()

sagemaker_session = Session(boto_session)
sagemaker_client = sagemaker_session.sagemaker_client
sagemaker_runtime_client = sagemaker_session.sagemaker_runtime_client

region = sagemaker_session.boto_region_name
print(f"AWS region: {region}")

In [None]:
# A different bucket can be used, but make sure the role for this notebook has
# the s3:PutObject permissions. This is the bucket into which the data is captured
bucket = sagemaker_session.default_bucket()
print(f"Demo Bucket: {bucket}")
prefix = "sagemaker/DEMO-ClarifyModelMonitor-20200901"
s3_key = f"s3://{bucket}/{prefix}"
print(f"S3 key: {s3_key}")

s3_capture_upload_path = f"{s3_key}/datacapture"
ground_truth_upload_path = f"{s3_key}/ground_truth_data/{datetime.now():%Y-%m-%d-%H-%M-%S}"
s3_report_path = f"{s3_key}/reports"

print(f"Capture path: {s3_capture_upload_path}")
print(f"Ground truth path: {ground_truth_upload_path}")
print(f"Report path: {s3_report_path}")

baseline_results_uri = f"{s3_key}/baselining"
print(f"Baseline results uri: {baseline_results_uri}")

endpoint_instance_count = 1
endpoint_instance_type = "ml.m5.large"
schedule_expression = CronExpressionGenerator.hourly()

### Model files and data files

In [None]:
model_file = "model/xgb-churn-prediction-model.tar.gz"
test_file = "test_data/test-file.txt"
test_dataset = "test_data/test.csv"
validation_dataset = "test_data/validation-dataset-with-header.csv"
dataset_type = "text/csv"

In [None]:
with open(validation_dataset) as f:
    headers_line = f.readline().rstrip()
all_headers = headers_line.split(",")
label_header = all_headers[0]

To verify that the execution role for this notebook has the necessary permissions, put a simple test object into the S3 bucket specified above. If this command fails, update the role to have the `s3:PutObject` permission on the bucket and try again.

In [None]:
# Upload a test file
S3Uploader.upload(test_file, f"s3://{bucket}/test_upload", sagemaker_session=sagemaker_session)
print("Success! We are all set to proceed.")

# LAB 1: Deploying a real-time endpoint on Amazon SageMaker (~20-25 min)

We trained a model for you beforehand, so to save time we'll use the resulting artifact here. If you want to know how this model was trained, refer to the [training notebook](../training/xgboost_customer_churn.ipynb).

## Upload the pre-trained model to Amazon S3
As an example, this code uploads a pre-trained XGBoost model that is ready to be deployed. The model was trained in SageMaker using the [code in the training folder](../training/xgboost_customer_churn.ipynb). To deploy an endpoint, we first need to upload the model artifact (the serialized model) to S3.

In [None]:
model_url = S3Uploader.upload(local_path=model_file, desired_s3_uri=s3_key, sagemaker_session=sagemaker_session)
print(f"Model file has been uploaded to {model_url}")
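The uploaded artifact is a gzipped tarball. If you need to package your own model, a minimal sketch with the standard `tarfile` module looks like this. It assumes the serving container expects the model file at the archive root; the file name `xgboost-model` and the helper names are placeholders.

```python
import os
import tarfile
import tempfile


def package_model(model_path, output_path):
    """Write a single model file into a gzipped tarball at the archive root."""
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname drops the local directory so the file sits at the root of the archive
        tar.add(model_path, arcname=os.path.basename(model_path))
    return output_path


def list_archive(path):
    """Return the member names of a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()


# Usage with a placeholder model file in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "xgboost-model")  # hypothetical file name
    with open(model_path, "wb") as f:
        f.write(b"placeholder model bytes")
    archive = package_model(model_path, os.path.join(tmp_dir, "model.tar.gz"))
    names = list_archive(archive)
    print(names)
```

The resulting `model.tar.gz` could then be uploaded with `S3Uploader.upload`, as in the cell above.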

## Deploy the model with Amazon SageMaker

Start by deploying a pre-trained churn prediction model. First, generate unique names for the model and the endpoint; the SageMaker `Model` object is then created from the image and the model data.

In [None]:
model_name = f"DEMO-xgb-churn-pred-model-monitor-{datetime.utcnow():%Y-%m-%d-%H%M%S}"
print("Model name: ", model_name)
endpoint_name = f"DEMO-xgb-churn-model-monitor-{datetime.utcnow():%Y-%m-%d-%H%M%S}"
print("Endpoint name: ", endpoint_name)

Once you have a trained model, you can include it in a Docker container that runs your inference code. A container provides an effectively isolated environment, ensuring a consistent runtime regardless of where the container is deployed. Containerizing your model and code enables fast and reliable deployment of your model.

We'll use the prebuilt SageMaker Docker image for XGBoost, version `0.90-1`. To do this, we look up the image's Amazon ECR URI for the current region with `image_uris.retrieve`.

In [None]:
image_uri = image_uris.retrieve("xgboost", region, "0.90-1")
print(f"XGBoost image uri: {image_uri}")

Amazon SageMaker provides multiple prebuilt Docker images that you can use or extend. For more information, refer to these links:

- [Deep Learning Docker Images](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html)
- [Sklearn and Spark ML](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-docker-containers-scikit-learn-spark.html)

The SageMaker Python SDK also provides a high-level `Estimator` interface that handles end-to-end training and deployment for the most common ML and deep learning frameworks. If __you already have a model that you trained somewhere else__, you can use the `Model` interface to deploy it as a SageMaker endpoint. Refer to these links to go deeper on your framework of interest:

- [Scikit-learn](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/index.html)
- [SparkML](https://sagemaker.readthedocs.io/en/stable/frameworks/sparkml/index.html)
- [Tensorflow](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/index.html)
- [PyTorch](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/index.html)
- [HuggingFace](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html)
- [MXNet](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/index.html)


To create the model, we pass the Docker image's ECR URI, an IAM role that SageMaker uses to access the data on S3 and create the endpoint, and the model's S3 URL. To deploy the endpoint, we also configure the instances (count and type) and a serializer that defines how the request data is encoded.

In [None]:
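model = Model(
    # CODE STARTS HERE
    image_uri=image_uri,  # prebuilt XGBoost container retrieved above
    model_data=model_url,  # S3 URI of the model.tar.gz uploaded earlier
    role=role,
    name=model_name,
    sagemaker_session=sagemaker_session,
    # CODE ENDS HERE
)
print(f"Deploying model {model_name} to endpoint {endpoint_name}")
model.deploy(
    # CODE STARTS HERE
    initial_instance_count=endpoint_instance_count,
    instance_type=endpoint_instance_type,
    endpoint_name=endpoint_name,
    serializer=CSVSerializer(),  # encode request data as text/csv
    # CODE ENDS HERE
)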
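Conceptually, a CSV serializer turns a Python record into a `text/csv` request body before it is sent to the endpoint. The sketch below shows that idea with the standard `csv` module; `to_csv_payload` is a hypothetical helper for illustration, not the implementation of the SDK's `CSVSerializer`.

```python
import csv
import io


def to_csv_payload(rows):
    """Encode one record (flat list) or several records (list of lists) as CSV text."""
    if rows and not isinstance(rows[0], (list, tuple)):
        rows = [rows]  # treat a flat list as a single record
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().rstrip("\n")  # no trailing newline after the last record


print(to_csv_payload([1, 0.5, "a"]))
print(to_csv_payload([[1, 2], [3, 4]]))
```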

## Invoke the endpoint

Here, we use the SageMaker runtime client to send some test data to our real-time endpoint.

In [None]:
print(f"Sending test traffic to the endpoint {endpoint_name}. \nPlease wait", end="")
test_dataset_size = 0  # number of rows read; only the first 10 are sent for inference
with open(test_dataset, "r") as f:
    for row in f:
        if test_dataset_size < 10:
            payload = row.rstrip("\n")
            response = sagemaker_runtime_client.invoke_endpoint(
                # CODE STARTS HERE
                EndpointName=endpoint_name,
                Body=payload[2:],  # skip the leading "<label>," column; send features only
                ContentType=dataset_type,
                # CODE ENDS HERE
            )
            prediction = response["Body"].read()
            print(prediction)
            
            time.sleep(0.5)
        test_dataset_size += 1

print()
print("Done!")

Each prediction is the model's estimated probability, between 0 and 1, that the customer will churn.
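The response body comes back as raw bytes. A sketch of turning it into a probability and then a 0/1 churn label, assuming the body holds a single CSV-formatted number (as when one row is sent) and an illustrative threshold of 0.5; both helper names are hypothetical.

```python
def parse_probability(body):
    """Parse a response body holding one CSV-formatted number (bytes) into a float."""
    return float(body.decode("utf-8").strip())


def to_churn_label(probability, threshold=0.5):
    """Map a churn probability to 1 (churn) or 0 (no churn) using a decision threshold."""
    return int(probability >= threshold)


probability = parse_probability(b"0.8123\n")
print(probability, to_churn_label(probability))
```

The `round()` calls used in Lab 3 behave like a 0.5 threshold, except at exactly 0.5, which `round()` maps to 0.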

# LAB 2: Deploy a model with a custom docker image (~15-20 min)

In some cases, you want to deploy your models in a custom Docker environment. For those cases, you can build your own Docker image and deploy your model with it on SageMaker. If you want to know how this model was trained, refer to the [training notebook](../training/scikitlearn_churn_prediction.py.ipynb).

This will cover:
- Building and pushing a docker image to Amazon Elastic Container Registry (Amazon ECR).
- Using that image for deploying the model with SageMaker.
- Invoking the endpoint using the SageMaker runtime client.

The docker image that we will be using has the following files:

- __Dockerfile__: Specification for building your docker image.
- __nginx.conf__: Nginx configuration file.
- __wsgi.py__: Wrapper for gunicorn.
- __serve__: Entrypoint for sagemaker to start the gunicorn server and nginx proxy.
- __sklearn_model.joblib__: Model artifact result of [training notebook](../training/scikitlearn_churn_prediction.py.ipynb).
- __predictor.py__: Inference code: a simple Flask REST API with two endpoints, `/ping` and `/invocations`.
- __requirements.txt__: Python requirements.
- __build_and_push.sh__: Utility script for building your image and pushing it to ECR from a local machine. It can be used instead of `sm-docker`.

These files can be found in the [custom_container folder](custom_container/).
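`predictor.py` implements the serving contract SageMaker expects from a custom container: the container listens on port 8080, answers `GET /ping` health checks, and handles inference requests on `POST /invocations`. To make that contract concrete without Flask, here is a minimal sketch written as a plain WSGI callable (the interface gunicorn serves) using only the standard library. The constant `predict` stub and the `{"pred": ...}` response shape, chosen to mirror the `['pred']` lookup in Lab 3, are assumptions rather than the contents of the actual `predictor.py`.

```python
import io
import json


def predict(features):
    """Hypothetical stand-in for the real model: always returns 0.5."""
    return 0.5


def app(environ, start_response):
    """Minimal WSGI app implementing the /ping and /invocations contract."""
    path = environ.get("PATH_INFO", "")
    method = environ.get("REQUEST_METHOD", "GET")

    if path == "/ping" and method == "GET":
        # Health check: a 200 response tells SageMaker the container is ready
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b"{}"]

    if path == "/invocations" and method == "POST":
        # Inference: parse one CSV row of features and return a JSON prediction
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        features = [float(value) for value in body.strip().split(",")]
        payload = json.dumps({"pred": predict(features)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [payload]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call_app(path, method="GET", body=b""):
    """Call the WSGI app in-process and return (status, response body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "PATH_INFO": path,
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    response = b"".join(app(environ, start_response))
    return captured["status"], response
```

To serve it locally you could run `wsgiref.simple_server.make_server("", 8080, app).serve_forever()`; in the real container, gunicorn (via `wsgi.py`) behind nginx plays that role.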

You may come across other examples that use a newer, recommended way of building custom serving containers: the __SageMaker Inference Toolkit__. Both approaches work. For more information, refer to this [link](https://github.com/aws/sagemaker-inference-toolkit).

## Build and push your docker image to ECR

We will package a new model in a Docker image. It was trained on the same dataset as the first model, but with a different algorithm, implemented with scikit-learn.

The following will:
- Create an ECR repository if it does not exist.
- Build a Docker image and tag it accordingly.
- Push the built image to the ECR repository.

To do this from a SageMaker Studio notebook, we use `sm-docker` (installed with the `sagemaker-studio-image-build` package), which builds the image with AWS CodeBuild. If you want to replicate this locally, refer to the [sagemaker-inference-immersion-day-local-ver.ipynb](local_notebook/sagemaker-inference-immersion-day-local-ver.ipynb) notebook.

In [None]:
!conda update setuptools -y && pip install sagemaker-studio-image-build

In [None]:
!cd custom_container && sm-docker build . --repository sagemaker-studio-sklearn-custom:latest

## Create the endpoint using the custom image

__Note__: Before running the next cells, assign the image URI printed by the previous cell to the `image_uri_custom` variable below.

In [None]:
image_uri_custom = ""  # paste the image URI printed by the previous cell here
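Instead of copying the URI by hand, you can build it from your account ID and region, since private ECR image URIs follow the `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` pattern. This sketch assumes the standard commercial AWS partition (China regions use a different domain), and `ecr_image_uri` is a hypothetical helper.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build a private ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# In the notebook, the variable could be filled in like this:
# account_id = boto3.client("sts").get_caller_identity()["Account"]
# image_uri_custom = ecr_image_uri(account_id, region, "sagemaker-studio-sklearn-custom")
```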

In [None]:
model_name_custom = f"DEMO-sklearn-churn-predictor-{datetime.utcnow():%Y-%m-%d-%H%M%S}"
print("Model name: ", model_name_custom)
endpoint_name_custom = f"DEMO-sklearn-churn-predictor-{datetime.utcnow():%Y-%m-%d-%H%M%S}"
print("Endpoint name: ", endpoint_name_custom)

In [None]:
model = Model(
    role=role,
    name=model_name_custom,
    image_uri=image_uri_custom,
    sagemaker_session=sagemaker_session,
)
print(f"Deploying model {model_name_custom} to endpoint {endpoint_name_custom}")
model.deploy(
    initial_instance_count=endpoint_instance_count,
    instance_type=endpoint_instance_type,
    endpoint_name=endpoint_name_custom
)

## Invoke the endpoint

As before, we invoke the endpoint with the SageMaker runtime client.

In [None]:
print(f"Sending test traffic to the endpoint {endpoint_name_custom}. \nPlease wait", end="")
test_dataset_size = 0  # record the number of rows in data we're sending for inference
count = 0
with open(test_dataset, "r") as f:
    for row in f:
        if test_dataset_size < 10:
            payload = row.rstrip("\n")
            response = sagemaker_runtime_client.invoke_endpoint(
                EndpointName=endpoint_name_custom,
                Body=payload[2:],
                ContentType=dataset_type,
            )
            prediction = response["Body"].read()
            print(prediction)
            
            time.sleep(0.5)
        test_dataset_size += 1

print()
print("Done!")
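The Lab 2 and Lab 3 invoke loops send `payload[2:]`, which drops the first two characters and therefore assumes the label is a single digit followed by a comma. A slightly more robust sketch, assuming (as the Lab 3 cells do) that the first column of `test.csv` is the integer churn label, splits on the first comma instead; `split_label` is a hypothetical helper.

```python
def split_label(row):
    """Split a CSV row whose first column is the label into (label, feature string)."""
    label, features = row.rstrip("\n").split(",", 1)
    return int(label), features


label, features = split_label("1,0.5,3\n")
print(label, features)
```

In the loops, `Body=split_label(row)[1]` would then replace `Body=payload[2:]`.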

# LAB 3: Production Variants and A/B Testing (~15-20 min)

Amazon SageMaker enables you to test multiple models or model versions behind the same endpoint using production variants. Each production variant identifies a machine learning (ML) model and the resources deployed for hosting the model. You can distribute endpoint invocation requests across multiple production variants by providing the traffic distribution for each variant, or you can invoke a specific variant directly for each request.

## Deploy a real-time endpoint with 2 production variants

For this case, we'll use both models created in the previous labs, each deployed as a production variant of the endpoint:
- XGBoost model with 60% of the traffic.
- Scikit-learn model with the remaining 40%.
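SageMaker sends each variant a share of requests equal to its weight divided by the sum of all variant weights, so weights of 60 and 40 give a 60/40 split. The toy simulation below only illustrates those expected proportions with `random.choices`; it is a model of the arithmetic, not of how SageMaker routes requests internally, and `simulate_traffic` is a hypothetical helper.

```python
import random
from collections import Counter


def simulate_traffic(weights, n_requests, seed=0):
    """Pick a variant per request with probability weight / sum(weights)."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n_requests)
    return Counter(picks)


weights = {"xgboost-variant": 60, "sklearn-variant": 40}
n_requests = 10_000
counts = simulate_traffic(weights, n_requests)
shares = {name: counts[name] / n_requests for name in weights}
print(shares)  # close to 0.6 and 0.4, but not exact
```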

In [None]:
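production_variants = [
    production_variant(
        # CODE STARTS HERE
        model_name=model_name,  # XGBoost model from Lab 1
        instance_type=endpoint_instance_type,
        initial_instance_count=endpoint_instance_count,
        variant_name="xgboost-variant",
        initial_weight=60,
        # CODE ENDS HERE
    ),
    production_variant(
        # CODE STARTS HERE
        model_name=model_name_custom,  # scikit-learn model from Lab 2
        instance_type=endpoint_instance_type,
        initial_instance_count=endpoint_instance_count,
        variant_name="sklearn-variant",
        initial_weight=40,
        # CODE ENDS HERE
    )
]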

In [None]:
production_variant_endpoint = sagemaker_session.endpoint_from_production_variants(
    name=f"DEMO-production-variant-endpoint-{datetime.utcnow():%Y-%m-%d-%H%M%S}",
    production_variants=production_variants,
    wait=True,
)

## Invoke the endpoint

In [None]:
# Each endpoint variant will receive a proportion of the calls, defined by the weight. 
# A specific variant can be called by passing the 'TargetVariant' parameter

def invoke_endpoint(payload, **kwargs):
    response = sagemaker_runtime_client.invoke_endpoint(Body=payload, **kwargs)
    prediction = response["Body"].read()
    variant = response['ResponseMetadata']['HTTPHeaders']['x-amzn-invoked-production-variant']
    return prediction, variant

print(f"Sending test traffic to the endpoint {production_variant_endpoint}. \nPlease wait\n", end="")
#params = {'EndpointName':production_variant_endpoint, 'ContentType':dataset_type, 'TargetVariant':"sklearn-variant",} # You can pass the endpoint variant
params = {'EndpointName':production_variant_endpoint, 'ContentType':dataset_type,}
with open(test_dataset, "r") as f:
    for i, row in enumerate(f):
        if i < 15:
            response, variant = invoke_endpoint(row.rstrip("\n")[2:], **params)
            print('Received prediction :' + str(response) + ' from variant ' + variant)
            time.sleep(0.1)


## A/B Testing

In many cases, such as e-commerce applications, offline model evaluation isn’t sufficient, and you need to A/B test models in production before making the decision of updating models. With Amazon SageMaker, you can easily perform A/B testing on ML models by running multiple production variants on an endpoint. You can use production variants to test ML models that have been trained using different training datasets, algorithms, and ML frameworks; test how they perform on different instance types; or a combination of all of the above.

In A/B testing, you test different variants of your models and compare how each variant performs relative to the others. You then choose the best-performing model, replacing the existing model if a new version delivers better performance than the existing version.
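With only around 100 test rows, a small accuracy gap between variants may be noise. One common check, which SageMaker does not compute for you, is a two-proportion z-test on the number of correct predictions per variant. A minimal sketch with the standard library, where `two_proportion_z_test` is a hypothetical helper; it assumes the two samples are independent, as with live traffic split between variants. When both variants score the same rows, as in the evaluation cell later in this lab, a paired test such as McNemar's is more appropriate.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both proportions are 0 or both are 1: no evidence of a difference
    z = (p_a - p_b) / se
    p_value = 1 - math.erf(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 850/1000 correct for variant A vs 800/1000 for variant B
z, p_value = two_proportion_z_test(850, 1000, 800, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```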

In [None]:
# Let's see how often each variant is called.

cw = boto_session.client("cloudwatch")

def get_invocation_metrics_for_endpoint_variant(endpoint_name, variant_name, start_time, end_time):
    metrics = cw.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="Invocations",
        StartTime=start_time,
        EndTime=end_time,
        Period=60,
        Statistics=["Sum"],
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
    )
    return (
        pd.DataFrame(metrics["Datapoints"])
        .sort_values("Timestamp")
        .set_index("Timestamp")
        .drop("Unit", axis=1)
        .rename(columns={"Sum": variant_name})
    )


def plot_endpoint_metrics(start_time=None):
    # boto3 treats naive datetimes as UTC, so build the query window in UTC
    start_time = start_time or datetime.utcnow() - timedelta(minutes=60)
    end_time = datetime.utcnow()
    metrics_variant1 = get_invocation_metrics_for_endpoint_variant(
        production_variant_endpoint, "xgboost-variant", start_time, end_time
    )
    metrics_variant2 = get_invocation_metrics_for_endpoint_variant(
        production_variant_endpoint, "sklearn-variant", start_time, end_time
    )
    metrics_variants = metrics_variant1.join(metrics_variant2, how="outer")
    metrics_variants.plot()
    return metrics_variants

In [None]:
print("Waiting a few seconds for initial metric creation...")
time.sleep(30)
m = plot_endpoint_metrics()

In [None]:
# Make a few calls to a specific variant and re-check the stats

params = {'EndpointName':production_variant_endpoint, 'ContentType':dataset_type, 'TargetVariant':"sklearn-variant",} # You can pass the endpoint variant
with open(test_dataset, "r") as f:
    for i, row in enumerate(f):
        if i < 15:
            response, _ = invoke_endpoint(row.rstrip("\n")[2:], **params)
            time.sleep(0.1)
time.sleep(10)
m = plot_endpoint_metrics()

In [None]:
# Let's evaluate the performance of each variant by calling it with test data
import ast

params_sklearn = {'EndpointName':production_variant_endpoint, 'ContentType':dataset_type, 'TargetVariant':"sklearn-variant",}
params_xgboost = {'EndpointName':production_variant_endpoint, 'ContentType':dataset_type, 'TargetVariant':"xgboost-variant",}
label = []
predict_sklearn = []
predict_xgboost = []
with open(test_dataset, "r") as f:
    for i, row in enumerate(f):
        if i < 100:
            row = row.rstrip("\n")
            label.append(int(row[0]))  # first column is the 0/1 churn label
            # Parse the response bytes with ast.literal_eval instead of eval(), which could run arbitrary code
            response, _ = invoke_endpoint(row[2:], **params_sklearn)
            predict_sklearn.append(round(ast.literal_eval(response.decode("utf-8"))['pred']))
            response, _ = invoke_endpoint(row[2:], **params_xgboost)
            predict_xgboost.append(round(ast.literal_eval(response.decode("utf-8"))))

Next, we compute some metrics from the predictions returned by both variants to compare the two models.

In [None]:
label = np.array(label)
predict_sklearn = np.array(predict_sklearn)
predict_xgboost = np.array(predict_xgboost)


sklearn_accuracy = sum(predict_sklearn == label) / len(label)
xgboost_accuracy = sum(predict_xgboost == label) / len(label)
print('Accuracy -> sklearn: {}, xgboost: {}'.format(sklearn_accuracy, xgboost_accuracy))

# Calculate precision
sklearn_precision = round(sum(predict_sklearn[predict_sklearn == 1] == label[predict_sklearn == 1]) / len(predict_sklearn[predict_sklearn == 1]), 2)
xgboost_precision = round(sum(predict_xgboost[predict_xgboost == 1] == label[predict_xgboost == 1]) / len(predict_xgboost[predict_xgboost == 1]), 2)
print('Precision -> sklearn: {}, xgboost: {}'.format(sklearn_precision, xgboost_precision))

# Calculate recall
sklearn_recall = round(sum(predict_sklearn[predict_sklearn == 1] == label[predict_sklearn == 1]) / len(label[label == 1]), 2)
xgboost_recall = round(sum(predict_xgboost[predict_xgboost == 1] == label[predict_xgboost == 1]) / len(label[label == 1]), 2)
print('Recall -> sklearn: {}, xgboost: {}'.format(sklearn_recall, xgboost_recall))

# Calculate F1 score
sklearn_f1_score = round(2 * (sklearn_precision * sklearn_recall) / (sklearn_precision + sklearn_recall), 2)
xgboost_f1_score = round(2 * (xgboost_precision * xgboost_recall) / (xgboost_precision + xgboost_recall), 2)
print('F1 Score -> sklearn: {}, xgboost: {}'.format(sklearn_f1_score, xgboost_f1_score))
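The cell above divides by the number of predicted positives, which has no guard if a model never predicts churn, and rounds precision and recall before computing F1. A sketch of the same metrics computed from confusion-matrix counts, with zero-division guards and unrounded intermediate values; `classification_metrics` is a hypothetical helper in plain Python that also accepts the NumPy arrays above.

```python
def classification_metrics(labels, predictions):
    """Accuracy, precision, recall and F1 for binary 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics([1, 0, 1, 1], [1, 0, 0, 1])
print(metrics)
```

In the notebook this would be called as `classification_metrics(label, predict_xgboost)` and `classification_metrics(label, predict_sklearn)`.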


Now let's update the traffic weights accordingly in our endpoint.

In [None]:
# Assuming the xgboost variant performed better above, let's increase the weight given to it
sagemaker_client.update_endpoint_weights_and_capacities(
    EndpointName=production_variant_endpoint,
    DesiredWeightsAndCapacities=[
        {"DesiredWeight": 20, "VariantName": "sklearn-variant"},
        {"DesiredWeight": 80, "VariantName": "xgboost-variant"},
    ],
)

In [None]:
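# Weight updates are applied asynchronously: wait until the endpoint is InService again,
# then confirm that the change was successful by describing the endpoint
sagemaker_client.get_waiter("endpoint_in_service").wait(EndpointName=production_variant_endpoint)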
{
    variant["VariantName"]: variant["CurrentWeight"]
    for variant in sagemaker_client.describe_endpoint(EndpointName=production_variant_endpoint)["ProductionVariants"]
}