# SageMaker endpoint

To deploy the model you previously trained, you need to create a SageMaker endpoint. This is a hosted prediction service that you can use to perform inference.

## Finding the model

This notebook uses a stored model if one exists. If you recently ran a training example that uses the `%store` magic, the model location is restored in the next cell.

Otherwise, you can pass the URI to the model file (a .tar.gz file) in the `model_data` variable.

You can find your model files through the [SageMaker console](https://console.aws.amazon.com/sagemaker/home) by choosing **Training > Training jobs** in the left navigation pane. Find your recent training job, choose it, and then look for the `s3://` link in the **Output** pane. To set the model's URI manually, uncomment the `model_data` line in the next cell and replace its value with that link.

In [None]:
# Retrieve a saved model from a previous notebook run's stored variable
%store -r model_data

# If no model was found, set it manually here.
# model_data = 's3://sagemaker-us-west-2-688520471316/pytorch-herring-mnist-2020-10-16-17-15-16-419/output/model.tar.gz'

print("Using this model: {}".format(model_data))
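
Because a mistyped URI only surfaces later as a failed deployment, it can help to check the value of `model_data` first. The following is a minimal sketch, not part of the SageMaker SDK: `split_model_uri` is a hypothetical helper, and the bucket and key in the usage line are placeholders. It only confirms that the string looks like an `s3://` URI pointing to a `.tar.gz` archive; it does not check that the object exists.

```python
from urllib.parse import urlparse


def split_model_uri(uri):
    """Split an s3:// model URI into (bucket, key) and check its shape.

    Hypothetical helper for illustration; it does not contact S3.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3:// URI, got: {uri!r}")
    key = parsed.path.lstrip("/")
    if not key.endswith(".tar.gz"):
        raise ValueError(f"Expected a .tar.gz model archive, got: {key!r}")
    return parsed.netloc, key


# Placeholder URI, used only to show the call
bucket, key = split_model_uri("s3://example-bucket/example-job/output/model.tar.gz")
print(bucket, key)
```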

## Create a model object

You define the model object by using the SageMaker SDK's `TensorFlowModel` class, passing in the model artifact location (`model_data`), the execution `role`, and the TensorFlow `framework_version`. This example serves the saved model as-is, so no custom `entry_point` script is passed.

In [None]:
import sagemaker
role = sagemaker.get_execution_role()

from sagemaker.tensorflow import TensorFlowModel
model = TensorFlowModel(model_data=model_data, role=role, framework_version='2.3')

### Deploy the model on an endpoint

You create a `predictor` by using the `model.deploy` function. You can optionally change both the instance count and instance type.

In [None]:
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

### Test the endpoint

In [None]:
# Download the test set
import tensorflow as tf

dataset = tf.keras.datasets.mnist.load_data(
    path='mnist.npz'
)

_, (test_imgs, test_labels) = dataset

# Randomly select 16 images from the test images
import numpy as np
import random

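mask = random.sample(range(0, len(test_imgs)), 16)
# Use a wide integer type: np.int8 only holds values up to 127,
# but the indices can be as large as len(test_imgs) - 1
mask = np.array(mask, dtype=np.int64)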
samples = test_imgs[mask]

# Inspect sample images
import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(nrows=2, ncols=8, sharex=True, sharey=True)
for i, row in enumerate(ax):
    for j, col in enumerate(row):
        col.imshow(samples[8*i+j].reshape(28, 28))
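
If you want the same 16 images on every run, the selection step can be seeded. The following is a sketch under stated assumptions: it uses NumPy's `Generator` API instead of the `random` module used above, the seed value is arbitrary, and a zero-filled array stands in for the MNIST test images so that the block runs without downloading data.

```python
import numpy as np

# Stand-in for the MNIST test images: 10,000 grayscale images of 28x28 pixels
test_imgs = np.zeros((10000, 28, 28), dtype=np.uint8)

# A seeded generator returns the same indices on every run
rng = np.random.default_rng(seed=0)

# Draw 16 distinct indices; replace=False prevents duplicates
indices = rng.choice(len(test_imgs), size=16, replace=False)
samples = test_imgs[indices]

print(indices.dtype, samples.shape)
```

The index array produced this way already has an integer type wide enough for any position in the test set, so no explicit cast is needed.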

In [None]:
# Send the samples to the endpoint for inference
samples = np.expand_dims(samples, axis=3)
outputs = predictor.predict(samples)['predictions']
outputs = np.array(outputs, dtype=np.float32)


print("Predictions: ")
print(np.argmax(outputs, axis=1))
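
Each row of `outputs` holds one score per class, and `np.argmax` picks the index of the highest score as the predicted digit. To see how well the endpoint did, you can compare those predictions with the true labels, which in this notebook would be `test_labels[mask]`. The following sketch uses a hypothetical helper, `top1_accuracy`, and a small made-up score matrix with three classes so that it runs without an endpoint.

```python
import numpy as np


def top1_accuracy(scores, labels):
    """Return the fraction of rows whose highest-scoring class matches the label."""
    predicted = np.argmax(scores, axis=1)
    return float(np.mean(predicted == np.asarray(labels)))


# Made-up scores for three samples and three classes (illustration only)
scores = np.array(
    [[0.1, 0.7, 0.2],
     [0.8, 0.1, 0.1],
     [0.2, 0.3, 0.5]],
    dtype=np.float32,
)
labels = [1, 0, 1]  # the third prediction (class 2) is wrong

print(top1_accuracy(scores, labels))
```

With the notebook's variables, the equivalent call would be `top1_accuracy(outputs, test_labels[mask])`.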


## Cleanup

If you don't intend to run more inference or do anything else with the endpoint, delete it. A running endpoint continues to incur charges until it is deleted.

In [None]:
predictor.delete_endpoint()
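
If an inference cell fails partway through, the cleanup cell may never run. One way to guard against that is to wrap the work in a context manager that always calls `delete_endpoint()`. This is a sketch, not an SDK feature: `temporary_endpoint` is a hypothetical helper, and `FakePredictor` is a stand-in so that the example runs without AWS access. With a real endpoint, you would pass the `predictor` returned by `model.deploy` instead.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor and delete its endpoint on exit, even after an error."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()


class FakePredictor:
    """Hypothetical stand-in that records whether cleanup happened."""

    def __init__(self):
        self.deleted = False

    def predict(self, data):
        return {"predictions": [[0.0, 1.0]]}

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
with temporary_endpoint(fake) as p:
    result = p.predict([[0]])

print(fake.deleted)
```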