# Module 5: Deploy the model


Now that you have built and trained the models for feature engineering (using Amazon SageMaker Processing and SKLearn) and binary classification (using the XGBoost open-source container for Amazon SageMaker), you deploy them as an Amazon SageMaker Inference Pipeline endpoint. The endpoint consists of a feature transformer step and an XGBoost step, deployed as a serial [inference pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html) behind a single endpoint for real-time inference.

Import the modules and define session variables.

In [None]:
import sagemaker
import boto3

role = sagemaker.get_execution_role()
region = boto3.Session().region_name
sagemaker_session = sagemaker.Session()
bucket_name = sagemaker_session.default_bucket()
prefix = 'end-to-end-ml'

print(region)
print(role)
print(bucket_name)

## Retrieve model artifacts

First, you need to create two Amazon SageMaker **Model** objects, which associate the serialized training artifacts to the Docker container used for inference. To do that, you need to provide the paths to the serialized models in Amazon S3:
<ul>
    <li>For the SKLearn transform model, in Step 02 (Feature Engineering), you defined the path where the model artifacts are saved.</li>
    <li>For the XGBoost model, you need to find the path using Amazon SageMaker's naming convention, so you use a utility function to get the model artifacts of the last training job matching a specific base job name.</li>
</ul>
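
The lookup of the latest training job can be sketched as follows. This is a minimal illustration and not the code in `notebook_utilities`: the function name `latest_model_artifacts` is hypothetical, and it assumes that a boto3 SageMaker client is passed in and queried through its `list_training_jobs` and `describe_training_job` calls.

```python
def latest_model_artifacts(sm_client, base_job_name):
    """Return (job name, S3 model artifact path) of the newest matching training job.

    `sm_client` is assumed to be a boto3 SageMaker client,
    for example boto3.client('sagemaker').
    """
    # Ask for the single most recently created job whose name contains the base name.
    response = sm_client.list_training_jobs(
        NameContains=base_job_name,
        SortBy='CreationTime',
        SortOrder='Descending',
        MaxResults=1,
    )
    summaries = response.get('TrainingJobSummaries', [])
    if not summaries:
        raise ValueError('No training job found matching ' + base_job_name)

    job_name = summaries[0]['TrainingJobName']
    # The job description holds the S3 location of the serialized model.
    description = sm_client.describe_training_job(TrainingJobName=job_name)
    return job_name, description['ModelArtifacts']['S3ModelArtifacts']
```

If failed or stopped jobs should be skipped, passing `StatusEquals='Completed'` to `list_training_jobs` is one option.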

In [None]:
from notebook_utilities import get_latest_training_job_name, get_training_job_s3_model_artifacts

# SKLearn model artifacts path.
sklearn_model_path = 's3://{0}/{1}/output/sklearn/model.tar.gz'.format(bucket_name, prefix)

# XGBoost model artifacts path.
training_base_job_name = 'end-to-end-ml-sm-xgb'
latest_training_job_name = get_latest_training_job_name(training_base_job_name)
xgboost_model_path = get_training_job_s3_model_artifacts(latest_training_job_name)

print('SKLearn model path: ' + sklearn_model_path)
print('XGBoost model path: ' + xgboost_model_path)

## SKLearn Featurizer Model

Let's build the model object for the SKLearn model. When building this model object, you provide a custom inference script that processes the inputs and outputs and executes the transform.

The custom inference script, `sklearn_source_dir/inference.py`, defines:

- a custom `input_fn` for pre-processing inference requests. The input function accepts CSV input, loads the input in a Pandas dataframe, and assigns feature column names to the dataframe
- a custom `predict_fn` for running the transform over the inputs
- a custom `output_fn` for returning either JSON or CSV
- a custom `model_fn` for deserializing the model
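
To make the role of these handlers concrete, here is a simplified sketch of three of them. It is not the content of `sklearn_source_dir/inference.py`: it uses plain dictionaries in place of the Pandas dataframe to stay dependency-free, it omits `model_fn`, the JSON shape (`instances` / `features`) is an assumption, and how the response content type is attached is left out.

```python
import csv
import io
import json

# Column names taken from the inference example later in this notebook.
FEATURE_COLUMNS = ['Type', 'Air temperature [K]', 'Process temperature [K]',
                   'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']


def input_fn(input_data, content_type):
    """Parse a CSV request into a list of {column name: value} records.

    Values are kept as strings; the real script loads them into a dataframe.
    """
    if content_type != 'text/csv':
        raise ValueError('Unsupported content type: ' + content_type)
    rows = csv.reader(io.StringIO(input_data))
    return [dict(zip(FEATURE_COLUMNS, row)) for row in rows if row]


def predict_fn(records, model):
    """Run the fitted transformer over the parsed records."""
    return model.transform(records)


def output_fn(features, accept):
    """Serialize the transformed rows as JSON or CSV, depending on the accept type."""
    if accept == 'application/json':
        # Assumed JSON layout, one entry per input row.
        return json.dumps({'instances': [{'features': list(row)} for row in features]})
    if accept == 'text/csv':
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator='\n').writerows(features)
        return buffer.getvalue()
    raise ValueError('Unsupported accept type: ' + accept)
```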

In [None]:
!pygmentize sklearn_source_dir/inference.py

Now, let's create the `SKLearnModel` object by providing the custom script and the path to S3 model artifacts as input.

In [None]:
import time
from sagemaker.sklearn import SKLearnModel

code_location = 's3://{0}/{1}/code'.format(bucket_name, prefix)

sklearn_model = SKLearnModel(name='end-to-end-ml-sm-skl-model-{0}'.format(str(int(time.time()))),
                             model_data=sklearn_model_path,
                             entry_point='inference.py',
                             source_dir='sklearn_source_dir/',
                             code_location=code_location,
                             role=role,
                             sagemaker_session=sagemaker_session,
                             framework_version='0.20.0',
                             py_version='py3')

## XGBoost Model

Like the previous step, create an `XGBoost` model object and provide a custom inference script.

The inference script, `xgboost_source_dir/inference.py`, defines:

- a custom `input_fn` for pre-processing inference requests. This input function can handle JSON requests plus all content types supported by the default XGBoost container. For additional information please visit: https://github.com/aws/sagemaker-xgboost-container/blob/master/src/sagemaker_xgboost_container/encoder.py. The reason for adding the JSON content type is that the container-to-container default request content type in an inference pipeline is JSON.
- a custom `model_fn` for deserializing the model
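
The content-type dispatch described above can be sketched as follows. This is an illustration under assumptions, not the content of `xgboost_source_dir/inference.py`: the JSON payload shape (`instances` / `features`) is assumed, only CSV stands in for the content types handled by the default container, and rows of floats are returned where the real script would build the data structure XGBoost expects.

```python
import csv
import io
import json


def input_fn(request_body, content_type):
    """Decode a request body into rows of floats."""
    if content_type == 'application/json':
        # Assumed payload shape: {"instances": [{"features": [...]}, ...]}
        payload = json.loads(request_body)
        return [[float(value) for value in instance['features']]
                for instance in payload['instances']]
    if content_type == 'text/csv':
        # Stand-in for the content types supported by the default container.
        rows = csv.reader(io.StringIO(request_body))
        return [[float(value) for value in row] for row in rows if row]
    raise ValueError('Unsupported content type: ' + content_type)
```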

In [None]:
!pygmentize xgboost_source_dir/inference.py

Now, let's create the `XGBoostModel` object by providing the custom script and the path to the S3 model artifacts as input.

In [None]:
import time
from sagemaker.xgboost import XGBoostModel

code_location = 's3://{0}/{1}/code'.format(bucket_name, prefix)

xgboost_model = XGBoostModel(name='end-to-end-ml-sm-xgb-model-{0}'.format(str(int(time.time()))),
                             model_data=xgboost_model_path,
                             entry_point='inference.py',
                             source_dir='xgboost_source_dir/',
                             code_location=code_location,
                             framework_version='0.90-2',
                             py_version='py3',
                             role=role, 
                             sagemaker_session=sagemaker_session)

## Pipeline Model

After creating the model objects for the two models, you deploy them in a pipeline by building a `PipelineModel` object and calling the `deploy()` method. The data capture configuration instructs the pipeline to collect the input to the endpoint and the output from the endpoint for every inference and store it in S3. You will need the collected data in the optional model monitoring section in this notebook.

In [None]:
import sagemaker
import time
from sagemaker.pipeline import PipelineModel
from sagemaker.model_monitor import DataCaptureConfig

s3_capture_upload_path = 's3://{}/{}/monitoring/datacapture'.format(bucket_name, prefix)
print(s3_capture_upload_path)

pipeline_model_name = 'end-to-end-ml-sm-xgb-skl-pipeline-{0}'.format(str(int(time.time())))

pipeline_model = PipelineModel(
    name=pipeline_model_name, 
    role=role,
    models=[
        sklearn_model, 
        xgboost_model],
    sagemaker_session=sagemaker_session)

endpoint_name = 'end-to-end-ml-sm-pipeline-endpoint-{0}'.format(str(int(time.time())))
print(endpoint_name)

pipeline_model.deploy(initial_instance_count=1, 
                      instance_type='ml.m5.xlarge', 
                      endpoint_name=endpoint_name,
                      data_capture_config=DataCaptureConfig(
                          enable_capture=True,
                          sampling_percentage=100,
                          destination_s3_uri=s3_capture_upload_path))

<span style="color: red; font-weight:bold">Please take note of the endpoint name, since it will be used in the next workshop module.</span>

## Inference

You can now invoke the pipeline to perform inference on example input values:

In [None]:
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
from sagemaker.predictor import Predictor

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer())

#'Type', 'Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]'
payload = "L,298.4,308.2,1582,70.7,216"
print(predictor.predict(payload))

payload = "M,298.4,308.2,1582,30.2,214"
print(predictor.predict(payload))

payload = "L,298.4,308.2,30,70.7,216"
print(predictor.predict(payload))

### View captured data

The delivery of capture data to Amazon S3 can take a couple of minutes, so the next cell waits for two minutes. If an error occurs in the subsequent cell, retry after a minute or so.

In [None]:
import time
time.sleep(120)

The data capture files from different time periods are organized based on the hour in which the invocation occurred. List the captured data files stored in S3.
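
As a sketch of this organization, the helper below groups object keys by their hour folder. The function name is hypothetical, and the key layout `<prefix>/<endpoint name>/<variant name>/yyyy/mm/dd/hh/<file>.jsonl` is an assumption that you should check against the keys listed for your endpoint.

```python
from collections import defaultdict


def group_capture_keys_by_hour(keys):
    """Group S3 object keys by the 'yyyy/mm/dd/hh' folders that precede the file name."""
    grouped = defaultdict(list)
    for key in keys:
        parts = key.split('/')
        if len(parts) < 5:
            continue  # too short to sit under a date/hour folder
        hour_folder = '/'.join(parts[-5:-1])
        grouped[hour_folder].append(key)
    return dict(grouped)
```

Called with the `Key` values returned by `list_objects`, it returns a dictionary that maps each hour folder to the capture files written during that hour.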

In [None]:
s3_client = boto3.Session().client('s3')
current_endpoint_capture_prefix = '{}/monitoring/datacapture/{}'.format(prefix, endpoint_name)

result = s3_client.list_objects(Bucket=bucket_name, Prefix=current_endpoint_capture_prefix)
capture_files = ['s3://{0}/{1}'.format(bucket_name, capture_file.get("Key")) for capture_file in result.get('Contents')]

print("Capture Files: ")
print("\n ".join(capture_files))

Read the contents of one of these files and see how captured records are organized in JSON lines format.

In [None]:
!aws s3 cp {capture_files[0]} datacapture/captured_data_example.jsonl
!head datacapture/captured_data_example.jsonl

Inspect a single JSON line to better understand its content. For each inference request, the record contains the input data, the output data, and metadata such as the inference time.
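
The following sketch pulls those three pieces out of one record. The sample record is illustrative only: its values are made up, and the field names (`captureData`, `endpointInput`, `endpointOutput`, `eventMetadata`, `inferenceTime`) are assumptions to verify against the file you downloaded.

```python
import json

# Illustrative record: values are made up, the layout is an assumption.
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "L,298.4,308.2,1582,70.7,216", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv", "mode": "OUTPUT",
                           "data": "0.12", "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "example-event-id",
                      "inferenceTime": "2024-01-15T13:05:00Z"},
    "eventVersion": "0",
})


def summarize_capture_record(line):
    """Extract the request, the response, and the inference time from one JSON line."""
    record = json.loads(line)
    capture = record["captureData"]
    return {
        "input": capture["endpointInput"]["data"],
        "output": capture["endpointOutput"]["data"],
        "inference_time": record["eventMetadata"]["inferenceTime"],
    }


print(summarize_capture_record(sample_line))
```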

In [None]:
import json
with open("datacapture/captured_data_example.jsonl", "r") as myfile:
    data = myfile.read()

print(json.dumps(json.loads(data.split('\n')[0]), indent=2))

Stop the execution of the notebook if the user has chosen to run all cells, as the rest of the notebook is optional.

In [None]:
class StopExecution(Exception):
    def _render_traceback_(self):
        pass

raise StopExecution

## You have completed the Model Deployment

You have completed the model deployment step. The model endpoint is now deployed and is ready to serve inference requests.

If you want to continue with the optional part for model monitoring, continue through the cells in this notebook. 

Otherwise, open **README.md** in module 6.

# (Optional) Model monitoring

The rest of this notebook is optional and shows how to detect drift. If you are not interested in model monitoring, you can proceed to the next module.

## Baselining

You select the relevant attributes from the training dataset and generate a dataset for baselining. You then use Amazon SageMaker Model Monitor to suggest a set of baseline constraints and descriptive statistics. 

In [None]:
import pandas as pd

raw_data = 's3://{0}/{1}/data/raw/predictive_maintenance_raw_data_header.csv'.format(bucket_name, prefix)
baseline_data = 's3://{0}/{1}/data/baseline/baseline_data.csv'.format(bucket_name, prefix)
columns = ['Type', 'Air temperature [K]', 'Process temperature [K]', 
           'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]', 'Machine failure']

df = pd.read_csv(raw_data, usecols=columns)
df.to_csv(baseline_data, index=False)

In [None]:
baseline_data_path = 's3://{0}/{1}/data/baseline'.format(bucket_name, prefix)
baseline_results_path = 's3://{0}/{1}/monitoring/baselining/results'.format(bucket_name, prefix)

print(baseline_data_path)
print(baseline_results_path)

Please note that the baselining job will require 8-10 minutes. In the meantime, take a look at the [Deequ library](https://github.com/awslabs/deequ), which the default Model Monitor container uses to perform this analysis.

In [None]:
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_default_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.c5.4xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

In [None]:
my_default_monitor.suggest_baseline(
    baseline_dataset=baseline_data_path,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_path,
    wait=True
)



Display the statistics generated by the baselining job.


In [None]:
import pandas as pd

baseline_job = my_default_monitor.latest_baselining_job
schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
schema_df

Visualize the constraints.

In [None]:
constraints_df = pd.json_normalize(baseline_job.suggested_constraints().body_dict["features"])
constraints_df

### Switching order of target variable

Amazon SageMaker Model Monitor expects the target variable to be the first feature of the dataset when comparing captured data with the baseline.
However, since the dataset you used for baselining had the 'Machine failure' variable as the last feature, you should switch its order in the generated statistics and constraints file.
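
A name-based variant of this reordering could look like the sketch below. The helper name is hypothetical, and it assumes that each entry in the `features` list of the statistics and constraints files carries a `name` key. Moving the feature by name avoids relying on the target being the last entry.

```python
def move_feature_first(document, feature_name):
    """Return a copy of a statistics or constraints dict with the named feature listed first."""
    features = document['features']
    target = [f for f in features if f['name'] == feature_name]
    if len(target) != 1:
        raise ValueError('Expected exactly one feature named ' + feature_name)
    others = [f for f in features if f['name'] != feature_name]

    # Shallow copy so that the caller's dictionary is left untouched.
    reordered = dict(document)
    reordered['features'] = target + others
    return reordered
```

For example, `move_feature_first(loaded_statistics, 'Machine failure')` would return the statistics with the target variable in first position.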

In [None]:
statistics_path = baseline_results_path + '/statistics.json'
constraints_path = baseline_results_path + '/constraints.json'

!aws s3 cp {statistics_path} baseline/
!aws s3 cp {constraints_path} baseline/

In [None]:
import json

with open('baseline/statistics.json', 'r') as statistics_file:
    loaded_statistics = json.load(statistics_file)

loaded_statistics['features'].insert(0, loaded_statistics['features'][-1])
del loaded_statistics['features'][-1]

with open('baseline/statistics.json', 'w') as statistics_file:
    json.dump(loaded_statistics, statistics_file)

In [None]:
!aws s3 cp baseline/statistics.json {statistics_path} 

In [None]:
with open('baseline/constraints.json', 'r') as constraints_file:
    loaded_constraints = json.load(constraints_file)

loaded_constraints['features'].insert(0, loaded_constraints['features'][-1])
del loaded_constraints['features'][-1]

with open('baseline/constraints.json', 'w') as constraints_file:
    json.dump(loaded_constraints, constraints_file)

In [None]:
!aws s3 cp baseline/constraints.json {constraints_path} 

### Results

The baselining job has inspected the baseline dataset and generated the constraints and statistics that will be used to monitor the endpoint.

### Generating violations artificially

To obtain results that are relevant for the monitoring analysis, artificially generate several inference requests whose feature values cause specific violations, then invoke the endpoint with this data.

In [None]:
import time

#'Type', 'Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]', 'Machine failure'
# Air temperature [K] -> set to an integer value instead of a fractional one
# Rotational speed [rpm] -> set to a large value
# Tool wear [min] -> set to a large value
artificial_values = "L,248,308.2,{0},70.7,{1}"
for i in range(200):
    predictor.predict(artificial_values.format(str(2500 + i), str(200+i)))
    time.sleep(0.15)
print('Executed 200 inferences.')


## Monitoring

Once you have built the baseline for your data, you can enable endpoint monitoring by creating a monitoring schedule. When the schedule fires, a monitoring job is kicked off to compare the data captured by the endpoint with the baseline. The job then generates report files that you can use to analyze the monitoring results.

### Create Monitoring Schedule

Create a monitoring schedule for the previously created endpoint. When you create the schedule, you can specify two scripts that will preprocess the records before the analysis takes place and execute post-processing at the end. In this example, you will not use a record preprocessor. You will specify a post-processor that outputs some text for demonstration purposes.


In [None]:
!pygmentize postprocessor.py

In [None]:
import boto3

monitoring_code_prefix = '{0}/monitoring/code'.format(prefix)
print(monitoring_code_prefix)

boto3.Session().resource('s3').Bucket(bucket_name).Object(monitoring_code_prefix + '/postprocessor.py').upload_file('postprocessor.py')
postprocessor_path = 's3://{0}/{1}/monitoring/code/postprocessor.py'.format(bucket_name, prefix)
print(postprocessor_path)

reports_path = 's3://{0}/{1}/monitoring/reports'.format(bucket_name, prefix)
print(reports_path)

Now create a monitoring schedule that runs hourly.

In [None]:
from sagemaker.model_monitor import CronExpressionGenerator
from time import gmtime, strftime

endpoint_name = predictor.endpoint_name

mon_schedule_name = 'end-to-end-ml-sm-mon-sch-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=endpoint_name,
    post_analytics_processor_script=postprocessor_path,
    output_s3_uri=reports_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True
)

### Describe Monitoring Schedule

In [None]:
desc_schedule_result = my_default_monitor.describe_schedule()
desc_schedule_result


### Delete Monitoring Schedule

Once the schedule is created, it kicks off jobs at the specified intervals. If you check right after creating the hourly schedule, you might find the list of executions empty, because executions only start once the next hour boundary (in UTC) has been crossed. Since you don't want to wait for the hour in this example, delete the schedule and use the code in the following step to simulate what happens when a schedule is triggered, by running an Amazon SageMaker Processing Job.
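
If you prefer to keep the schedule, a small helper such as the hypothetical one below estimates how long it takes to reach the next hour boundary in UTC. It is only a sketch of the timing arithmetic and says nothing about how soon after the boundary an execution actually starts.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_next_utc_hour(now=None):
    """Return the number of seconds between `now` and the next full hour in UTC."""
    now = now or datetime.now(timezone.utc)
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now).total_seconds()


print(seconds_until_next_utc_hour())
```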


In [None]:
# Note: this is just for the purpose of running this example.
my_default_monitor.delete_monitoring_schedule()

## Triggering execution manually

In order to trigger the execution manually, find the paths to the data capture, baseline statistics, and baseline constraints. Then, use a utility function in `monitoringjob_utils.py` to run the processing job.

In [None]:
result = s3_client.list_objects(Bucket=bucket_name, Prefix=current_endpoint_capture_prefix)
capture_files = ['s3://{0}/{1}'.format(bucket_name, capture_file.get("Key")) for capture_file in result.get('Contents')]

print("Capture Files: ")
print("\n ".join(capture_files))

# Use the folder of the most recent capture file as the data capture path.
data_capture_path = capture_files[-1].rsplit('/', 1)[0]
statistics_path = baseline_results_path + '/statistics.json'
constraints_path = baseline_results_path + '/constraints.json'

print(data_capture_path)
print(postprocessor_path)
print(statistics_path)
print(constraints_path)
print(reports_path)

In [None]:
from monitoringjob_utils import run_model_monitor_job_processor

run_model_monitor_job_processor(region, 'ml.m5.xlarge', role, data_capture_path, statistics_path, constraints_path, reports_path,
                                postprocessor_path=postprocessor_path)

## Analysis

When the monitoring job completes, monitoring reports are saved to Amazon S3. List the generated reports.

In [None]:
s3_client = boto3.Session().client('s3')
monitoring_reports_prefix = '{}/monitoring/reports/{}'.format(prefix, predictor.endpoint_name)

result = s3_client.list_objects(Bucket=bucket_name, Prefix=monitoring_reports_prefix)
# 'Contents' is missing from the response when no object matches the prefix.
monitoring_reports = ['s3://{0}/{1}'.format(bucket_name, report_file.get("Key")) for report_file in result.get('Contents', [])]
if monitoring_reports:
    print("Monitoring Reports Files: ")
    print("\n ".join(monitoring_reports))
else:
    print('No monitoring reports found.')

In [None]:
!aws s3 cp {monitoring_reports[0]} monitoring/
!aws s3 cp {monitoring_reports[1]} monitoring/
!aws s3 cp {monitoring_reports[2]} monitoring/

Display the violations identified by the monitoring execution.
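
Beyond displaying the table, it can help to count the violations per check type. The sketch below assumes that each entry under `violations` has a `constraint_check_type` field; the sample report is illustrative and is not output from this notebook.

```python
from collections import Counter


def count_violations_by_type(report):
    """Count the entries of a constraint violations report per check type."""
    return Counter(v['constraint_check_type'] for v in report.get('violations', []))


# Illustrative report: feature names come from this notebook, the rest is made up.
sample_report = {
    'violations': [
        {'feature_name': 'Air temperature [K]', 'constraint_check_type': 'data_type_check',
         'description': 'example description'},
        {'feature_name': 'Tool wear [min]', 'constraint_check_type': 'baseline_drift_check',
         'description': 'example description'},
        {'feature_name': 'Rotational speed [rpm]', 'constraint_check_type': 'baseline_drift_check',
         'description': 'example description'},
    ]
}

print(count_violations_by_type(sample_report))
```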

In [None]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

with open('monitoring/constraint_violations.json', 'r') as violations_file:
    data = violations_file.read()

violations_df = pd.json_normalize(json.loads(data)['violations'])
violations_df


## Advanced Hints

You might be asking yourself which violation types are monitored and how drift from the baseline is computed.

The types of violations monitored are listed here: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-violations.html. Most of them use configurable thresholds that are specified in the monitoring configuration section of the baseline constraints JSON. Take a look at this configuration from the baseline constraints file:


In [None]:
!aws s3 cp {statistics_path} baseline/
!aws s3 cp {constraints_path} baseline/

In [None]:
import json
with open("baseline/constraints.json", "r") as myfile:
    data = myfile.read()

print(json.dumps(json.loads(data)['monitoring_config'], indent=2))



This configuration is interpreted when the monitoring job is executed and is used to compare the captured data with the baseline. If you want to customize this section, you have to update the `constraints.json` file and upload it back to Amazon S3 before launching the monitoring job.

When data distributions are compared to detect potential drift, you can choose between the Simple and the Robust comparison methods; the latter is preferable when dealing with small datasets. Additional info: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-byoc-constraints.html.
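
Such a customization could be sketched as follows. The helper name is hypothetical, and the key names `distribution_constraints`, `comparison_method`, and `comparison_threshold` are assumptions that you should check against the `monitoring_config` printed above before relying on them.

```python
import copy


def set_distribution_comparison(constraints, method='Robust', threshold=None):
    """Return a copy of the constraints dict with an updated comparison method."""
    if method not in ('Simple', 'Robust'):
        raise ValueError('method must be "Simple" or "Robust"')

    # Deep copy so that the loaded constraints are left untouched.
    updated = copy.deepcopy(constraints)
    config = updated.setdefault('monitoring_config', {})
    distribution = config.setdefault('distribution_constraints', {})
    distribution['comparison_method'] = method
    if threshold is not None:
        distribution['comparison_threshold'] = threshold
    return updated
```

The returned dictionary would then be written to `baseline/constraints.json` and copied back to Amazon S3, in the same way as in the cells that switch the order of the target variable.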


In [None]:
#predictor.delete_endpoint()

After testing the endpoint, you can move to the next workshop module. Please access the module <a href="https://github.com/aws-samples/amazon-sagemaker-build-train-deploy/tree/master/06_API_Gateway_and_Lambda" target="_blank">06_API_Gateway_and_Lambda</a> on GitHub to continue.

## You have completed Module 5

You have now completed the deployment of the model and learned how to monitor the model for possible drift.

> :warning: **Module 6 does not have a Jupyter Notebook.**

Open **README.md** in module 6 to build an HTTP API endpoint and a Lambda function for performing inference against the model endpoint. 