#### XGBoost Complete Project Workflow in Amazon SageMaker
### Model Deployment
    
1. [Local Mode endpoint](#LocalModeEndpoint)
2. [SageMaker hosted endpoint](#SageMakerHostedEndpoint)
3. [Multi-Model endpoints](#MultiModelEndpoints)
4. [Production Variants with Model Monitor](#ProductionVariants)
5. [Invoking SageMaker endpoints](#InvokingSageMakerEndpoints)
6. [Clean up resources](#CleanUp)

## Local Mode endpoint <a class="anchor" id="LocalModeEndpoint">

While Amazon SageMaker’s Local Mode training is very useful for making sure your training code works before moving on to full-scale training, it is also useful to have a convenient way to test your model locally before incurring the time and expense of deploying it to production. One possibility is to fetch the XGBoost artifact or a model checkpoint saved in Amazon S3 and load it in your notebook for testing. An even easier way is to let the SageMaker Python SDK do this work for you by setting up a Local Mode endpoint.

More specifically, the Estimator object from the Local Mode training job, or a Model object built from its saved artifact as in this notebook, can be used to deploy a model locally. With one exception, the code is the same as the code you would use to deploy to production: you invoke the `deploy` method and, as with Local Mode training, specify the instance type as either `local_gpu` or `local`, depending on whether your notebook is on a GPU instance or a CPU instance.
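
The choice between `local` and `local_gpu` could be automated with a small helper. The sketch below is only a heuristic, not an official SageMaker check: it assumes that finding the `nvidia-smi` tool on the PATH indicates a GPU notebook instance. The name `pick_local_instance_type` is hypothetical.

```python
import shutil


def pick_local_instance_type():
    """Return 'local_gpu' if `nvidia-smi` is on the PATH, else 'local'.

    Assumption: the presence of the NVIDIA driver tool is used as a rough
    proxy for a GPU notebook instance; this is not a SageMaker API.
    """
    return 'local_gpu' if shutil.which('nvidia-smi') else 'local'


instance_type = pick_local_instance_type()
print(instance_type)
```

The returned string could then be passed as the `instance_type` argument of `deploy`.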

First, we'll import the variables stored from previous notebooks.

In [None]:
!pip install sagemaker==1.72.0

In [None]:
from parameter_store import ParameterStore
import sagemaker
from sagemaker.session import s3_input
import numpy as np

ps = ParameterStore()
parameters = ps.read()

bucket = parameters['bucket']
s3_prefix = parameters['s3_prefix']
raw_s3 = parameters['raw_s3']
train_dir = parameters['train_dir']
test_dir = parameters['test_dir']
train_dir_csv = parameters['train_dir_csv']
test_dir_csv = parameters['test_dir_csv']
local_model_data = parameters['local_model_data']
remote_model_data = parameters['remote_model_data']
training_job_name = parameters['training_job_name']
tuning_job_name = parameters['tuning_job_name']
s3_input_train_uri = parameters['s3_input_train_uri']
s3_input_test_uri = parameters['s3_input_test_uri']
role = parameters['role']
sess = sagemaker.Session()

s3_input_train = s3_input(s3_input_train_uri, content_type='csv')
s3_input_test = s3_input(s3_input_test_uri, content_type='csv')
inputs = {'train': s3_input_train, 'test': s3_input_test}

x_test = np.load('./data/test/x_test.npy')
y_test = np.load('./data/test/y_test.npy')

The following code builds a model object from the artifacts of our local training job and, with a single `deploy` call, serves it locally in the SageMaker XGBoost container:

In [None]:
from sagemaker.xgboost.model import XGBoostModel

local_model = XGBoostModel(entry_point='train_deploy.py', model_data=local_model_data, role=role, framework_version='1.0-1')
local_predictor = local_model.deploy(initial_instance_count=1, instance_type='local')

To get predictions from the Local Mode endpoint, configure the Predictor to send CSV and parse JSON, then invoke its `predict` method.

In [None]:
from sagemaker.predictor import json_deserializer
from sagemaker.predictor import csv_serializer

local_predictor.content_type = 'text/csv'
local_predictor.accept = 'text/csv'
local_predictor.serializer = csv_serializer
local_predictor.deserializer = json_deserializer

local_predictor.predict(x_test[0])

As a sanity check, the predictions can be compared against the actual target values.

In [None]:
local_results = [local_predictor.predict(x_test[i]) for i in range(0, 10)]
print(f'predictions: \t {local_results}')
print(f'target values: \t {y_test[:10]}')
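
If the targets are continuous values, the eyeball comparison above could be complemented with a single error figure. The following is a minimal sketch, assuming each prediction deserializes to a number or a nested list of numbers; the helper names `flatten` and `rmse` are hypothetical, and the sample numbers are made up for illustration rather than taken from the endpoint.

```python
import math


def flatten(values):
    """Yield floats from arbitrarily nested lists or tuples of numbers."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield float(value)


def rmse(predictions, targets):
    """Root mean squared error between two equally long sequences."""
    preds = list(flatten(predictions))
    actual = list(flatten(targets))
    if len(preds) != len(actual):
        raise ValueError('predictions and targets differ in length')
    if not preds:
        raise ValueError('no values to compare')
    squared_errors = [(p - a) ** 2 for p, a in zip(preds, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up numbers for illustration only, not output from the endpoint.
example_predictions = [[2.0], [3.0], [5.0]]
example_targets = [1.0, 3.0, 7.0]
print(rmse(example_predictions, example_targets))
```

In the notebook, a call such as `rmse(local_results, y_test[:10].tolist())` would apply the same check to the first ten test rows, provided the deserialized predictions have the shape assumed here.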

We only trained the model for a few rounds, so the predictions will not be precise, but they should at least be in the right ballpark.

To avoid leaving the SageMaker XGBoost container running locally indefinitely, shut it down gracefully by calling the `delete_endpoint` method of the Predictor object.

In [None]:
local_predictor.delete_endpoint()

## SageMaker hosted endpoint <a class="anchor" id="SageMakerHostedEndpoint">

Assuming the best model from the tuning job is better than the model produced by the earlier individual Hosted Training job, we could now easily deploy that model to production. A convenient option is a SageMaker hosted endpoint, which serves real-time predictions from the trained model (Batch Transform jobs are also available for asynchronous, offline predictions on large datasets). The endpoint retrieves the XGBoost model saved during training and deploys it within a SageMaker XGBoost container. All of this can be accomplished with one line of code.

More specifically, calling the `deploy` method of an Estimator or HyperparameterTuner object deploys the corresponding model directly to a SageMaker hosted endpoint. The cell below attaches to the earlier training job and deploys its model; the best model from the tuning job is deployed in the next subsection. Deploying to a hosted endpoint takes several minutes longer than deploying to a Local Mode endpoint, which is more useful for fast prototyping of inference code.

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.estimator import Estimator

container = get_image_uri(sess.boto_region_name, 'xgboost')
train_instance_type = 'ml.m4.xlarge'
hyperparameters = {'num_round': 8}
model = Estimator.attach(training_job_name)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.t2.medium', endpoint_name='xgboost-housing')
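
By default `deploy` blocks until the endpoint is ready, so no extra waiting is needed in this notebook. If a deployment were started elsewhere, its progress could be checked by polling the endpoint status. The sketch below assumes a boto3 `sagemaker` client is passed in; the helper name `wait_until_in_service` and its polling defaults are hypothetical.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll describe_endpoint until the endpoint reports 'InService'.

    `sm_client` is expected to behave like boto3.client('sagemaker').
    Raises RuntimeError if the endpoint reports 'Failed' and TimeoutError
    if it is still not in service after `max_polls` checks.
    """
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response['EndpointStatus']
        if status == 'InService':
            return status
        if status == 'Failed':
            raise RuntimeError(f'endpoint {endpoint_name} failed to deploy')
        time.sleep(poll_seconds)
    raise TimeoutError(f'endpoint {endpoint_name} not in service after {max_polls} polls')
```

With AWS credentials configured, `wait_until_in_service(boto3.client('sagemaker'), 'xgboost-housing')` would return once the endpoint created above is serving traffic.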

To get predictions from the hosted endpoint, simply invoke the Predictor's predict method.

In [None]:
predictor.content_type = 'text/csv'
predictor.accept = 'text/csv'
predictor.serializer = csv_serializer
predictor.deserializer = json_deserializer

predictor.predict(x_test[0])

We can compare the predictions generated by this endpoint with those generated locally by the Local Mode endpoint: 

In [None]:
hosted_results = [predictor.predict(x_test[i]) for i in range(0, 10)]
print(f'local predictions: \t {local_results}')
print(f'hosted predictions: \t {hosted_results}')

### SageMaker hosted endpoint with autotuned parameters

In [None]:
from sagemaker.tuner import HyperparameterTuner, IntegerParameter

# Parameters from last notebook
hyperparameter_ranges = {
    'num_round': IntegerParameter(2, 10)
}

tuner_parameters = {'estimator': model,
                    'objective_metric_name': 'validation:aucpr',
                    'hyperparameter_ranges': hyperparameter_ranges,
                    #'metric_definitions': metric_definitions,
                    'max_jobs': 4,
                    'max_parallel_jobs': 2}

# attach() is a classmethod: it rebuilds the tuner from the completed tuning job,
# so the configuration above documents that job but does not launch a new one.
tuner = HyperparameterTuner(**tuner_parameters)
tuner = tuner.attach(tuning_job_name)

# deploy() hosts the best training job found by the tuning job.
tuning_predictor = tuner.deploy(initial_instance_count=1, instance_type='ml.t2.medium',
                                endpoint_name='xgboost-housing-auto')

After configuring the tuned endpoint's Predictor in the same way, we can compare its predictions with those generated by the Local Mode endpoint:

In [None]:
tuning_predictor.content_type = 'text/csv'
tuning_predictor.accept = 'text/csv'
tuning_predictor.serializer = csv_serializer
tuning_predictor.deserializer = json_deserializer

In [None]:
hosted_results = [tuning_predictor.predict(x_test[i]) for i in range(0, 10)]
print(f'local predictions: \t {local_results}')
print(f'tuner predictions: \t {hosted_results}')

## Invoking SageMaker Endpoints <a class="anchor" id="InvokingSageMakerEndpoints">

In the code so far, we've seen examples of training a model, deploying it as an endpoint, and then using the Predictor returned by the deployment to get predictions. But what if we want to call an existing SageMaker endpoint? There are a couple of ways to do this: the first is with SageMaker's Python SDK and the second is with boto3.

Both approaches refer to the endpoint by the name it was given at deployment, so the Predictor objects created earlier are not required if you have shut down the kernel or notebook in the meantime.

Calling an endpoint with SageMaker's Python SDK:

In [None]:
import boto3
import sagemaker
from sagemaker.predictor import RealTimePredictor, csv_serializer, json_deserializer

sess = sagemaker.Session()

predictor = RealTimePredictor(endpoint='xgboost-housing',
                              sagemaker_session=sess,
                              serializer=csv_serializer,
                              deserializer=json_deserializer)

predictor.predict(x_test[0])

Calling an endpoint with boto3:

In [None]:
import json

sm_runtime = boto3.client('sagemaker-runtime')
# Create a CSV string from the numpy array (no spaces after the commas)
payload = ','.join([str(each) for each in x_test[0]])
prediction = sm_runtime.invoke_endpoint(EndpointName='xgboost-housing',
                                        ContentType='text/csv',
                                        Body=payload)
prediction = json.loads(prediction['Body'].read())
prediction
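
The boto3 call above can be wrapped in a small reusable function. This is a minimal sketch, assuming the endpoint accepts `text/csv` and returns a JSON body as in this notebook; the helper names `to_csv_row` and `invoke_csv` are hypothetical, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
import json


def to_csv_row(features):
    """Serialize one feature vector as a single CSV line."""
    return ','.join(str(value) for value in features)


def invoke_csv(runtime_client, endpoint_name, features):
    """Send one CSV row to an endpoint and decode the JSON response body.

    `runtime_client` is expected to behave like
    boto3.client('sagemaker-runtime').
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='text/csv',
        Body=to_csv_row(features),
    )
    return json.loads(response['Body'].read())


print(to_csv_row([0.5, 1, 2.25]))
```

With the client created above, `invoke_csv(sm_runtime, 'xgboost-housing', x_test[0])` would be equivalent to the preceding cell.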

## Clean Up <a class="anchor" id="CleanUp">

To avoid billing charges from stray resources, delete the prediction endpoints to release their associated instances.

In [None]:
predictor.delete_endpoint(delete_endpoint_config=True)
tuning_predictor.delete_endpoint(delete_endpoint_config=True)
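
To confirm that nothing was left behind, the remaining endpoints can be listed by name. The sketch below assumes a boto3 `sagemaker` client is passed in and reads only the first page of results, which should be enough for a handful of endpoints; the helper name `endpoints_matching` is hypothetical.

```python
def endpoints_matching(sm_client, name_fragment):
    """Return the names of endpoints whose name contains `name_fragment`.

    `sm_client` is expected to behave like boto3.client('sagemaker').
    Only the first page of list_endpoints results is inspected.
    """
    response = sm_client.list_endpoints(NameContains=name_fragment)
    return [endpoint['EndpointName'] for endpoint in response['Endpoints']]
```

With AWS credentials configured, `endpoints_matching(boto3.client('sagemaker'), 'xgboost-housing')` should return an empty list once deletion has completed; because deletion is asynchronous, an endpoint may still be listed for a short time with the status `Deleting`.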