# k-means Bring Your Own Model
_**Hosting a Pre-Trained Model in Amazon SageMaker Algorithm Containers**_

---


## Contents

1. [Background](#Background)
1. [Setup](#Setup)
1. [(Optional)](#Optional)
  1. [Data](#Data)
  1. [Train Locally](#Train-Locally)
1. [Convert](#Convert)
1. [Host](#Host)
  1. [Confirm](#Confirm)

---
## Background

Amazon SageMaker provides a hosted notebook environment, distributed, serverless training, and real-time, autoscaling hosting.  We think it works best when all three of these services are used together, but they can also be used independently.  Some use cases may only require hosting, for example when the model was trained in a different service before Amazon SageMaker existed.

This notebook shows how to use a pre-existing model with an Amazon SageMaker Algorithm container to quickly create a hosted endpoint for that model.

---
## Setup

Let's start by specifying:

* AWS region.
* The IAM role ARN used to give training and hosting access to your data. See the documentation for how to specify these.
* The S3 bucket that you want to use for training and model data.

In [None]:
import os
import boto3

os.environ['AWS_DEFAULT_REGION'] = 'us-west-2'
role = boto3.client('iam').list_instance_profiles()['InstanceProfiles'][0]['Roles'][0]['Arn']

bucket = '<your_s3_bucket_here>'
prefix = 'kmeans_bring_your_own_model'

Now we'll import the Python libraries we'll need.

In [None]:
import numpy as np
import sklearn.cluster
import pickle
import gzip
import urllib.request
import json
import mxnet as mx
import boto3
import time
import io

## (Optional)

_This section is included for illustration purposes only.  In a real use case, you'd bring your model from an existing process and would not need to complete these steps._

### Data

For simplicity, we'll utilize the MNIST handwritten digit dataset.

In [None]:
# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
# The context manager closes the file even if unpickling fails
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

### Train Locally

Again for simplicity, let's stick with the k-means algorithm.

In [None]:
kmeans = sklearn.cluster.KMeans(n_clusters=10).fit(train_set[0])
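
For intuition about what the fitted model encodes: once training finishes, prediction only needs the centroids, because scikit-learn's `KMeans` assigns each point to its nearest centroid by Euclidean distance. The sketch below shows that assignment in plain Python on made-up 2-D points. `nearest_centroid` and the toy data are hypothetical and only for illustration; they are not part of scikit-learn or of this notebook's pipeline.

```python
def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    best_index, best_dist = -1, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Toy data (illustrative only): two centroids and three points in 2-D.
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_points = [[1.0, 2.0], [9.0, 8.0], [4.0, 4.0]]
toy_labels = [nearest_centroid(p, toy_centroids) for p in toy_points]
print(toy_labels)  # [0, 1, 0]
```

This is why a matrix of centroids is all we need to carry over to a hosted endpoint.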

---
## Convert

The model format that Amazon SageMaker's k-means container expects is an MXNet NDArray with dimensions (num_clusters, feature_dim) that contains the cluster centroids.  For our current example, the 10 centroids for the MNIST digits are stored in a NumPy array of shape (10, 784) called `kmeans.cluster_centers_`.

_Note: model formats will differ across algorithms, but this concept is generalizable.  Consulting the documentation, or just running a toy example and inspecting the resulting model artifact, is the best way to understand the specific model format required for different algorithms._
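
Because the container expects a (num_clusters, feature_dim) array, it can be worth checking the shape before converting. The following is a minimal sketch, assuming only that the object exposes a `.shape` attribute (as NumPy arrays and MXNet NDArrays do). `check_centroid_shape` is a hypothetical helper, not part of any library.

```python
def check_centroid_shape(array, num_clusters, feature_dim):
    """Raise ValueError unless `array.shape` equals (num_clusters, feature_dim)."""
    shape = tuple(array.shape)
    expected = (num_clusters, feature_dim)
    if shape != expected:
        raise ValueError(
            "expected centroids of shape {}, got {}".format(expected, shape))
    return shape


# Example with this notebook's model:
# check_centroid_shape(kmeans.cluster_centers_, num_clusters=10, feature_dim=784)
```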

Let's:
- Convert to an MXNet NDArray
- Save to a file `model_algo-1`

In [None]:
centroids = mx.ndarray.array(kmeans.cluster_centers_)
mx.ndarray.save('model_algo-1', [centroids])

- Tar and gzip the model file

In [None]:
!tar czvf model.tar.gz model_algo-1
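
If the `tar` command is not available (for example, outside a notebook shell), Python's standard `tarfile` module can build the same archive. This is a sketch under the assumption that the model file should sit at the root of the archive, as it does with the command above. `make_model_archive` is a hypothetical helper name.

```python
import os
import tarfile


def make_model_archive(model_path, archive_path="model.tar.gz"):
    """Write `model_path` into a gzip-compressed tar and return the member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops any directory prefix so the file sits at the archive root
        tar.add(model_path, arcname=os.path.basename(model_path))
    # Re-open the archive to confirm what was written
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Usage with the file saved above:
# make_model_archive('model_algo-1')
```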

- Upload to S3

In [None]:
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'model.tar.gz')).upload_file('model.tar.gz')
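
One portability note on the key used above: `os.path.join` uses the platform's path separator, so on Windows it would produce a backslash in the S3 key. A small sketch using `posixpath.join`, which always joins with `/`, is shown below. The helper names and the bucket name `example-bucket` are placeholders for illustration.

```python
import posixpath


def s3_key(prefix, filename):
    """Join an S3 prefix and file name with forward slashes on any platform."""
    return posixpath.join(prefix, filename)


def s3_uri(bucket, key):
    """Build the s3:// URI for an object."""
    return "s3://{}/{}".format(bucket, key)


key = s3_key("kmeans_bring_your_own_model", "model.tar.gz")
print(s3_uri("example-bucket", key))
# s3://example-bucket/kmeans_bring_your_own_model/model.tar.gz
```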

---
## Host

Start by defining our model for hosting.

In [None]:
kmeans_model = 'kmeans-scikit-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

# 'sagemaker' is the boto3 service name for the Amazon SageMaker API
im = boto3.client('sagemaker')

kmeans_hosting_container = {
    'Image': "900597767885.dkr.ecr.us-east-1.amazonaws.com/aialgorithmskmeanswebscalecontainer:latest",
    'ModelDataUrl': 's3://{}/{}/model.tar.gz'.format(bucket, prefix)
}

create_model_response = im.create_model(
    ModelName=kmeans_model,
    ExecutionRoleArn=role,
    PrimaryContainer=kmeans_hosting_container)

print(create_model_response['ModelArn'])

Then set up our endpoint configuration.

In [None]:
kmeans_endpoint_config = 'kmeans-poc-endpoint-config-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(kmeans_endpoint_config)
create_endpoint_config_response = im.create_endpoint_config(
    EndpointConfigName=kmeans_endpoint_config,
    ProductionVariants=[{
        'InstanceType': 'ml.c4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': kmeans_model,
        'VariantName': 'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

Finally, create our endpoint and wait for it to come into service.

In [None]:
%%time

kmeans_endpoint = 'kmeans-poc-endpoint-' + time.strftime("%Y%m%d%H%M", time.gmtime())
print(kmeans_endpoint)
create_endpoint_response = im.create_endpoint(
    EndpointName=kmeans_endpoint,
    EndpointConfigName=kmeans_endpoint_config)
print(create_endpoint_response['EndpointArn'])

resp = im.describe_endpoint(EndpointName=kmeans_endpoint)
status = resp['EndpointStatus']
print("Status: " + status)

# Block until the endpoint reaches InService (or the waiter gives up)
im.get_waiter('endpoint_in_service').wait(EndpointName=kmeans_endpoint)

resp = im.describe_endpoint(EndpointName=kmeans_endpoint)
status = resp['EndpointStatus']
print("Arn: " + resp['EndpointArn'])
print("Status: " + status)

if status != 'InService':
    raise Exception('Endpoint creation did not succeed')
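
As an alternative to the waiter, the same check can be written as an explicit polling loop, which makes it easy to print progress or change the timeout. This is a minimal sketch, assuming the endpoint reports a `Creating` status while it is being provisioned. `wait_for_endpoint` is a hypothetical helper, and the `describe` argument is any callable that behaves like `describe_endpoint`.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=30, max_polls=60,
                      sleep=time.sleep):
    """Poll `describe(EndpointName=...)` until the status is no longer 'Creating'."""
    status = None
    for _ in range(max_polls):
        status = describe(EndpointName=endpoint_name)['EndpointStatus']
        if status != 'Creating':
            return status
        sleep(poll_seconds)
    raise TimeoutError(
        "endpoint {} still {} after {} polls".format(endpoint_name, status, max_polls))


# Usage with the client defined above:
# status = wait_for_endpoint(im.describe_endpoint, kmeans_endpoint)
```

The function returns whatever terminal status it sees, so the caller still decides how to handle anything other than `InService`.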

### Confirm
Let's confirm that our model is producing the same results.  We'll take the first 100 records from our training dataset, score them in our hosted endpoint...

In [None]:
def np2csv(arr):
    csv = io.BytesIO()
    np.savetxt(csv, arr, delimiter=',', fmt='%g')
    return csv.getvalue().decode().rstrip()
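
To make the payload format concrete: `np2csv` produces one record per line, with features separated by commas and formatted with `%g`. The standard-library sketch below reproduces that layout for a tiny made-up input. `rows_to_csv` is a hypothetical stand-in used only to show the resulting text, not a replacement for `np2csv`.

```python
def rows_to_csv(rows):
    """Format each row as comma-separated '%g' values, one row per line."""
    return "\n".join(",".join("%g" % value for value in row) for row in rows)


# Two made-up records with three features each
sample_payload = rows_to_csv([[0, 0.5, 1], [0.25, 0, 0.125]])
print(sample_payload)
# 0,0.5,1
# 0.25,0,0.125
```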

In [None]:
# 'sagemaker-runtime' is the boto3 service name for invoking hosted endpoints
runtime = boto3.Session().client('sagemaker-runtime')

payload = np2csv(train_set[0][0:100])
response = runtime.invoke_endpoint(EndpointName=kmeans_endpoint,
                                   ContentType='text/csv',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
scored_labels = np.array([r['closest_cluster'] for r in result['predictions']])
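
The parsing step above can be tried offline on a hand-written response body. The sample below is made up and mirrors only the keys the code above relies on (`predictions` and `closest_cluster`); representing the cluster index as a float is an assumption, and a real response may carry additional fields.

```python
import json

# Hypothetical response body with the same keys the notebook reads
sample_body = '{"predictions": [{"closest_cluster": 3.0}, {"closest_cluster": 7.0}]}'

sample_result = json.loads(sample_body)
sample_labels = [int(p["closest_cluster"]) for p in sample_result["predictions"]]
print(sample_labels)  # [3, 7]
```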

... And then compare them to the model labels from our k-means example.

In [None]:
scored_labels == kmeans.labels_[0:100]
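
The comparison above yields 100 booleans, which can be tedious to scan by eye. A small sketch that reduces two label sequences to a single agreement fraction follows. `agreement_rate` is a hypothetical helper and works on any equal-length sequences, including the arrays above.

```python
def agreement_rate(predicted, expected):
    """Return the fraction of positions where the two label sequences match."""
    n = len(predicted)
    if n == 0 or n != len(expected):
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / n


print(agreement_rate([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75

# Usage with this notebook's results:
# agreement_rate(scored_labels, kmeans.labels_[0:100])
```

A value of 1.0 means the hosted endpoint assigned every record to the same cluster as the local model.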