## TensorFlow 2 Complete Project Workflow in Amazon SageMaker
### Model Deployment
    
1. [Local Mode endpoint](#LocalModeEndpoint)
2. [SageMaker hosted endpoint](#SageMakerHostedEndpoint)

## Local Mode endpoint <a class="anchor" id="LocalModeEndpoint"></a>

Amazon SageMaker's Local Mode training is useful for confirming that your training code works before you move on to full-scale training. It is just as useful to have a convenient way to test your model locally before incurring the time and expense of deploying it to production. One option is to fetch the TensorFlow SavedModel artifact or a model checkpoint saved in Amazon S3 and load it in your notebook for testing. An easier option is to let the SageMaker Python SDK do this work for you by setting up a Local Mode endpoint.

More specifically, the Estimator object from the Local Mode training job can be used to deploy a model locally by invoking its `deploy` method. With one exception, this code is the same as the code you would use to deploy to production: as with Local Mode training, you specify the instance type as either `local_gpu` or `local`, depending on whether your notebook is on a GPU instance or a CPU instance. Because this notebook works from the stored model artifact location (`local_model_data`) rather than from the Estimator itself, the code below builds a `Model` object from those artifacts and calls its `deploy` method instead.
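
The choice between `local` and `local_gpu` could be made programmatically rather than by hand. The following is a minimal sketch, not part of the original workflow: the helper name `pick_local_instance_type` is hypothetical, and it assumes that finding the `nvidia-smi` tool on the PATH is a good enough signal that the notebook is on a GPU instance.

```python
import shutil


def pick_local_instance_type(gpu_tool="nvidia-smi"):
    """Return 'local_gpu' if the given GPU tool is on the PATH, else 'local'."""
    # Assumption: the presence of the tool implies a usable GPU.
    return "local_gpu" if shutil.which(gpu_tool) else "local"


instance_type = pick_local_instance_type()
print(instance_type)
```

The returned string could then be passed as the `instance_type` argument when deploying locally.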

First, we'll import the variables stored from previous notebooks.

In [None]:
%store -r

The following code deploys the model locally in the SageMaker TensorFlow Serving container using the model artifacts from our local training job:

In [None]:
from sagemaker.tensorflow.serving import Model

model = Model(model_data=local_model_data, role=role, framework_version='2.1')

local_predictor = model.deploy(initial_instance_count=1, instance_type='local')

To get predictions from the Local Mode endpoint, simply invoke the Predictor's predict method.

In [None]:
local_results = local_predictor.predict(x_test[:10])['predictions']
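
The cell above indexes the response with `['predictions']`, which suggests a dictionary holding one inner list per input row. As a rough illustration of that shape, the sketch below parses a hypothetical JSON response body (the values are made up and do not come from the model) and flattens it into a single list.

```python
import json

# Hypothetical response body shaped like a TensorFlow Serving REST reply:
# one inner list per input row, one value per model output.
response_body = '{"predictions": [[21.4], [18.9], [33.0]]}'

response = json.loads(response_body)
predictions = response["predictions"]

# Flatten the nested rows into a single list of floats.
flat_predictions = [value for row in predictions for value in row]
print(flat_predictions)
```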

As a sanity check, the predictions can be compared against the actual target values.

In [None]:
import numpy as np

# Flatten the nested predictions and round each value to one decimal place
local_preds_flat_list = [round(float(item), 1) for sublist in local_results for item in sublist]
print('predictions: \t{}'.format(np.array(local_preds_flat_list)))
print('target values: \t{}'.format(y_test[:10].round(decimals=1)))

We trained the model for only a few epochs, so there is much room for improvement, but the predictions should at least be in the same ballpark as the target values.
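
To go beyond eyeballing the two printed arrays, the comparison can be reduced to a single number. The sketch below computes the mean absolute error in plain Python. The function name and the sample values are illustrative assumptions, not output from the model. In the notebook, the flattened predictions and `y_test[:10]` would take their place.

```python
def mean_absolute_error(predictions, targets):
    """Average absolute difference between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical values, used only to exercise the function.
example_preds = [21.0, 19.0, 33.0, 25.0]
example_targets = [22.0, 18.0, 30.0, 25.0]

mae = mean_absolute_error(example_preds, example_targets)
print(f"MAE: {mae:.2f}")
```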

To avoid leaving the SageMaker TensorFlow Serving container running locally indefinitely, shut it down gracefully by calling the `delete_endpoint` method of the Predictor object.

In [None]:
local_predictor.delete_endpoint()

## SageMaker hosted endpoint <a class="anchor" id="SageMakerHostedEndpoint"></a>

Assuming the best model from the tuning job is better than the model produced by the individual Hosted Training job above, we can now deploy that model to production. A convenient option is a SageMaker hosted endpoint, which serves real-time predictions from the trained model (Batch Transform jobs are also available for asynchronous, offline predictions on large datasets). The endpoint retrieves the TensorFlow SavedModel created during training and deploys it within a SageMaker TensorFlow Serving container. All of this can be accomplished with one line of code.

More specifically, by calling the `deploy` method of a HyperparameterTuner object attached to the completed tuning job, we can directly deploy the best model from that job to a SageMaker hosted endpoint. Deploying to the hosted endpoint takes several minutes longer than deploying to the Local Mode endpoint, which is better suited to fast prototyping of inference code.

In [None]:
from sagemaker.tensorflow import TensorFlow
from sagemaker.tuner import HyperparameterTuner

estimator = TensorFlow(**estimator_parameters)
tuner_parameters['estimator'] = estimator

tuner = HyperparameterTuner(**tuner_parameters)
# attach() is a classmethod: it returns a new tuner bound to the existing tuning job
tuner = tuner.attach(tuning_job_name)
tuning_predictor = tuner.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

We can compare the predictions from this endpoint with those from the Local Mode endpoint:

In [None]:
results = tuning_predictor.predict(x_test[:10])['predictions'] 
# Flatten the nested predictions and round each value to one decimal place
flat_list = [round(float(item), 1) for sublist in results for item in sublist]
print('predictions: \t{}'.format(np.array(flat_list)))
print('target values: \t{}'.format(y_test[:10].round(decimals=1)))

To avoid billing charges from stray resources, you can delete the prediction endpoint to release its associated instance(s).

In [None]:
tuning_predictor.delete_endpoint(delete_endpoint_config=True)
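
The cleanup step is easy to forget when a cell fails partway through. One possible safeguard, sketched below, is a small context manager that always calls `delete_endpoint` on exit. The names `temporary_endpoint` and `FakePredictor` are hypothetical. The fake class only stands in for a real Predictor so the pattern can be run without creating any AWS resources.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor and delete its endpoint on exit, even after an error."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()


class FakePredictor:
    """Hypothetical stand-in for a real Predictor, used only for illustration."""

    def __init__(self):
        self.deleted = False

    def predict(self, data):
        # Return one dummy prediction per input row.
        return {"predictions": [[0.0] for _ in data]}

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
with temporary_endpoint(fake) as predictor:
    result = predictor.predict([[1.0], [2.0]])

print(fake.deleted)
```

With a real Predictor in place of the fake one, the endpoint would be released as soon as the `with` block finishes, whether or not the prediction call succeeds.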