# SageMaker Endpoint
To deploy the model you [previously trained](01_train_pytorch.ipynb), you need to create a SageMaker Endpoint. This is a hosted prediction
service that you can use to perform inference.

## Finding the model
This notebook uses a stored model. If you recently ran a training example that uses the `%store` magic, the model location is restored in the next cell.

Otherwise, you can pass the URI to the model file (a .tar.gz file) in the `model_data` variable.

You can find your model file through the [SageMaker Console](https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/dashboard)
by choosing **Training > Training jobs** in the left navigation pane. Find your current training job, choose it, and then look for the `s3://` link in the **Output** pane.

If you have not run the [training notebook](01_train_pytorch.ipynb), you can still run this notebook by using a pretrained model artifact that we provide. To do so, uncomment the `model_data` line in the next cell, which manually sets the model's URI.
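
As a quick sanity check before deploying, the sketch below splits a model URI into bucket and key and confirms that it points at a `.tar.gz` archive. It assumes `model_data` is an `s3://` URI, as in this notebook; `split_model_uri` is a hypothetical helper, and the bucket and job name in the example are placeholders.

```python
from urllib.parse import urlparse


def split_model_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key) and check the suffix."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    key = parsed.path.lstrip("/")
    if not key.endswith(".tar.gz"):
        raise ValueError(f"expected a .tar.gz model archive, got {key!r}")
    return parsed.netloc, key


# Placeholder bucket and job name, not a real artifact
bucket, key = split_model_uri("s3://example-bucket/example-job/output/model.tar.gz")
print(bucket, key)
```

Calling `split_model_uri(model_data)` in the next cell would raise a `ValueError` early if the stored value is not what the endpoint expects.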

In [None]:
# Retrieve a saved model from a previous notebook run's stored variable
%store -r model_data

# If no model was found, set it manually here.
# model_data = 's3://sagemaker-us-west-2-688520471316/pytorch-herring-mnist-2020-10-16-17-15-16-419/output/model.tar.gz'

print("Using this model: {}".format(model_data))

## Create a model object
You define the model object with the SageMaker SDK's `PyTorchModel` class, passing in the model artifact (`model_data`) and the inference script (`entry_point`).
The inference script defines `model_fn`, which the endpoint calls to load the model. The following cell prints `inference.py` so that you can inspect it.
The function loads the model and moves it to a GPU, if one is available.
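
For orientation, a `model_fn` of this kind might look roughly like the sketch below. It is not the contents of `code/inference.py`: the `Net` class is a hypothetical stand-in for the trained architecture, and the file name `model.pth` is an assumption about what the model archive contains.

```python
import os

import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical stand-in; the real architecture lives in code/inference.py."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.flatten(start_dim=1))


def model_fn(model_dir):
    """Load the weights from model_dir and move the model to a GPU if available."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = Net()
    # "model.pth" is an assumed file name inside the extracted model.tar.gz
    state = torch.load(os.path.join(model_dir, "model.pth"), map_location=device)
    model.load_state_dict(state)
    # eval() switches off training-only behaviour such as dropout
    return model.to(device).eval()
```

The serving container calls `model_fn` once with the directory where it extracted the archive, then reuses the returned model for every request.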

In [None]:
!pygmentize code/inference.py

In [None]:
import sagemaker
from sagemaker.pytorch import PyTorchModel

# Execution role of this notebook, which SageMaker uses to host the model
role = sagemaker.get_execution_role()

model = PyTorchModel(
    model_data=model_data,
    source_dir='code',
    entry_point='inference.py',
    role=role,
    framework_version='1.6.0',
    py_version='py3')


## Deploy the model on an endpoint
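You create a `predictor` by calling `model.deploy`. You can optionally change both the instance count and the instance type. The instance type `'local'` used here runs the serving container on the machine that runs this notebook instead of on a hosted instance.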
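
One way to switch between local and hosted deployment is to build the arguments in one place, as in the sketch below. `deploy_kwargs` is a hypothetical helper, and `ml.c5.xlarge` is only an example of a hosted instance type; choose one that fits your model and budget.

```python
def deploy_kwargs(local=True, instance_count=1, hosted_type="ml.c5.xlarge"):
    """Build keyword arguments for model.deploy (hypothetical helper)."""
    return {
        "initial_instance_count": instance_count,
        # 'local' serves from a container on the machine running the notebook
        "instance_type": "local" if local else hosted_type,
    }


print(deploy_kwargs())
print(deploy_kwargs(local=False, instance_count=2))
```

You could then call `model.deploy(**deploy_kwargs(local=False))` to create a hosted endpoint without editing the deployment cell.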

In [None]:
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='local'
)

In [None]:
# Download the test set
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline

from code.utils import mnist_to_numpy

data_dir = './data'
X, Y = mnist_to_numpy(data_dir)  # test images and labels

# randomly sample 16 images from the test set
mask = random.sample(range(X.shape[0]), 16)
samples = X[mask]

# plot the images for inspection
fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(16, 1))

for i, splt in enumerate(axs):
    splt.imshow(samples[i])

In [None]:
from code.utils import adjust_to_framework, normalize

# Send the sampled images to the endpoint for inference
samples = adjust_to_framework(
    normalize(samples, axis=(1, 2)), framework='pytorch')

# the deployed model only accepts 32-bit floating point input
outputs = predictor.predict(samples.astype(np.float32))

predictions = np.argmax(np.array(outputs, np.float32), axis=1)
print("Predictions: \n", predictions)
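
To make the array shapes in the cell above concrete, the sketch below walks through the same steps on fake data. The behaviour of `normalize` and `adjust_to_framework` is assumed here (per-image standardization and an added channel axis), and the endpoint response is replaced by random scores, so the printed predictions carry no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake stand-in for 16 sampled 28x28 MNIST images
samples = rng.integers(0, 256, size=(16, 28, 28)).astype(np.float64)

# Assumed behaviour of normalize(..., axis=(1, 2)): zero mean, unit variance per image
mean = samples.mean(axis=(1, 2), keepdims=True)
std = samples.std(axis=(1, 2), keepdims=True) + np.finfo(float).eps
normalized = (samples - mean) / std

# Assumed behaviour of adjust_to_framework(..., framework='pytorch'): add a channel axis
batch = normalized[:, np.newaxis, :, :].astype(np.float32)  # shape (16, 1, 28, 28)

# Stand-in for predictor.predict: one score per digit class for every image
outputs = rng.normal(size=(batch.shape[0], 10)).tolist()

# The index of the largest score in each row is the predicted digit
predictions = np.argmax(np.array(outputs, np.float32), axis=1)
print(batch.shape, batch.dtype, predictions.shape)
```

Taking `argmax` along `axis=1` relies on the response having one row per image and one column per class.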

## Clean up
If you don't intend to do anything else with the endpoint, you should delete it. You can also
free some disk space by deleting the MNIST data.

In [None]:
import shutil
predictor.delete_endpoint()
shutil.rmtree(data_dir)
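
If you rerun the cleanup cell, `shutil.rmtree` raises an error because the directory is already gone. A more forgiving variant is sketched below; `remove_data_dir` is a hypothetical helper, and the example runs against a temporary directory instead of the real `data_dir`.

```python
import shutil
import tempfile
from pathlib import Path


def remove_data_dir(path):
    """Delete the directory if it exists; return True when something was removed."""
    path = Path(path)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Demonstrate on a throwaway directory so no real data is touched
scratch = tempfile.mkdtemp()
print(remove_data_dir(scratch))  # first call removes the directory
print(remove_data_dir(scratch))  # second call finds nothing to remove
```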