# SageMaker Inference Recommender for Deployment Recommendation

## Contents
[1. Introduction](#1.-Introduction)  
[2. Download the Model & Payload](#2.-Download-the-Model-&-Payload)  
[3. Create the Model](#3.-Create-the-Model)  
[4. Describe the Model to Inspect Deployment Recommendations](#4.-Describe-the-Model-to-Inspect-Deployment-Recommendations)  
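[5. Deploy the Model to Endpoint with Python SDK](#5.-Deploy-the-Model-to-Endpoint-with-Python-SDK)  
[6. Invoke the Endpoint & Produce Inference](#6.-Invoke-the-Endpoint-&-Produce-Inference)  
[7. Clean up the resources if needed](#7.-Clean-up-the-resources-if-needed)  
[8. Conclusion](#8.-Conclusion)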

## 1. Introduction

SageMaker Inference Recommender is a new capability of SageMaker that reduces the time required to get machine learning (ML) models into production by automating performance benchmarking and load testing of models across SageMaker ML instances. You can use Inference Recommender to deploy your model to a real-time inference endpoint that delivers the best performance at the lowest cost.

Deployment Recommendation is a new data-driven, ML-based capability of Inference Recommender that provides a recommendation without running benchmarks. This means you don't have to wait the approximately 20-40 minutes a benchmark takes to run before getting a recommendation.


To begin, update the required packages: the SageMaker Python SDK, `boto3`, `botocore`, and `awscli`.

In [None]:
!pip install -U sagemaker

In [None]:
import sys

# Upgrade pip and the AWS packages in the kernel's own environment
!{sys.executable} -m pip install --upgrade --quiet pip sagemaker botocore boto3 awscli

### Set up Client and Session

In [None]:
import sagemaker
import boto3

region = boto3.Session().region_name
role = sagemaker.get_execution_role()
sm_client = boto3.client("sagemaker", region_name=region)
sagemaker_session = sagemaker.Session()

## 2. Download the Model & Payload

This example uses a pre-trained scikit-learn model trained on the California Housing dataset, which is available in scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html. The California Housing dataset was originally published in:

> Pace, R. Kelley, and Ronald Barry. "Sparse spatial auto-regressions." Statistics & Probability Letters 33.3 (1997): 291-297.

### Download the Model

In [None]:
import os

export_dir = "./model/"

if not os.path.exists(export_dir):
    os.makedirs(export_dir)
    print("Directory ", export_dir, " Created ")
else:
    print("Directory ", export_dir, " already exists")

model_archive_name = "sk-model.tar.gz"

In [None]:
!aws s3 cp s3://sagemaker-sample-files/models/california-housing/model.joblib {export_dir}

### Tar the model and code

In [None]:
!tar -cvpzf {model_archive_name} -C ./model "model.joblib" -C ../code "inference.py"
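
The `tar` command above places `model.joblib` and `inference.py` side by side at the root of the archive. As a rough equivalent, the same packaging can be sketched with Python's standard `tarfile` module. The helper name `build_model_archive` is hypothetical and not part of the original notebook, and the sketch assumes the same `./model` and `./code` layout.

```python
import tarfile
from pathlib import Path


def build_model_archive(archive_path, files):
    """Write each file to the archive root and return the sorted member names.

    Hypothetical helper: mirrors the effect of the `tar -C` call in this notebook.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for file_path in files:
            file_path = Path(file_path)
            # arcname drops the parent directory so the file lands at the root
            tar.add(str(file_path), arcname=file_path.name)

    # Re-open the archive to confirm what was actually written
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

In this notebook the call would look like `build_model_archive("sk-model.tar.gz", ["./model/model.joblib", "./code/inference.py"])`.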

### Download the payload 

In [None]:
payload_location = "./sample-payload/"

if not os.path.exists(payload_location):
    os.makedirs(payload_location)
    print("Directory ", payload_location, " Created ")
else:
    print("Directory ", payload_location, " already exists")

payload_archive_name = "sk_payload.tar.gz"

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import pandas as pd

data = fetch_california_housing()

X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42
)

# we don't train a model, so we will need only the testing data
testX = pd.DataFrame(X_test, columns=data.feature_names)
# Save testing data to CSV
testX[data.feature_names].head(10).to_csv(
    os.path.join(payload_location, "test_data.csv"), header=False, index=False
)

### Tar the payload

In [None]:
!cd ./sample-payload/ && tar czvf ../{payload_archive_name} *

### Upload Your Model and Payload to S3

Upload the pre-trained model archive (`sk-model.tar.gz`) and the sample payload archive (`sk_payload.tar.gz`) to S3.

In [None]:
prefix = "sagemaker/scikit-learn-inference-recommender"

model_url = sagemaker_session.upload_data(model_archive_name, key_prefix=prefix)
sample_payload_url = sagemaker_session.upload_data(payload_archive_name, key_prefix=prefix)

print("model uploaded to: {}".format(model_url))
print("sample payload uploaded to: {}".format(sample_payload_url))

## 3. Create the Model

In this example, we create the model with the SageMaker hosting `create_model` API. The call initiates an asynchronous workflow in the existing Inference Recommender stack, and this workflow generates the deployment recommendations.

In [None]:
import time
from sagemaker import image_uris

model_name = "sklearn-" + str(round(time.time()))

image = image_uris.retrieve(
    framework="sklearn", region=region, version="1.0-1", image_scope="inference"
)
primary_container = {"Image": image, "ModelDataUrl": model_url}

create_model_response = sm_client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=primary_container
)

print(create_model_response)

## 4. Describe the Model to Inspect Deployment Recommendations

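Deployment recommendations are returned in the SageMaker hosting `describe_model` API response itself. Because the workflow is asynchronous, they may not be available immediately after the model is created.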

In [None]:
describe_model_response = sm_client.describe_model(ModelName=model_name)

print(describe_model_response)
# The response is a dict, so read the recommendations by key
print(describe_model_response.get("DeploymentRecommendation"))
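
Since the recommendation workflow runs asynchronously, a single `describe_model` call may return before the recommendations are ready. The following is a minimal polling sketch, not part of the original notebook. It assumes the `DeploymentRecommendation` entry carries a `RecommendationStatus` field that reads `IN_PROGRESS` while the workflow is running. The function name `wait_for_recommendation` and its parameters are hypothetical.

```python
import time


def wait_for_recommendation(describe_fn, model_name, poll_seconds=15, max_attempts=40):
    """Poll a describe_model-style callable until the recommendation is no longer pending.

    Assumption: the response holds DeploymentRecommendation.RecommendationStatus,
    which stays "IN_PROGRESS" while the asynchronous workflow is running.
    """
    for _ in range(max_attempts):
        response = describe_fn(ModelName=model_name)
        recommendation = response.get("DeploymentRecommendation", {})
        status = recommendation.get("RecommendationStatus")
        # A missing status is treated as "not populated yet", so keep waiting
        if status is not None and status != "IN_PROGRESS":
            return recommendation
        time.sleep(poll_seconds)
    raise TimeoutError(
        "No final recommendation status for {} after {} attempts".format(
            model_name, max_attempts
        )
    )
```

In this notebook it could be called as `wait_for_recommendation(sm_client.describe_model, model_name)`. Passing the client method in as an argument keeps the helper easy to exercise with a stub.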

## 5. Deploy the Model to Endpoint with Python SDK
Each deployment recommendation is uniquely identified by a `RecommendationId`. You can deploy a specific recommendation by passing this ID to the Python SDK.

In [None]:
import uuid
from sagemaker.sklearn.model import SKLearnModel
from sagemaker.sklearn.model import SKLearnPredictor

model = SKLearnModel(
    model_data=model_url,
    role=role,
    image_uri=image,
    entry_point="./code/inference.py",
    framework_version="1.0-1",
)

# Substitute recommendation_id with the one you want to deploy
# Here we choose the first recommendation to deploy
recommendation_id = describe_model_response["DeploymentRecommendation"][
    "RealTimeInferenceRecommendations"
][0]["RecommendationId"]
model.predictor_cls = SKLearnPredictor

endpoint_name = "notebook-test-" + str(uuid.uuid4())
predictor = model.deploy(recommendation_id=recommendation_id, endpoint_name=endpoint_name)
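
The cell above deploys the first recommendation in the list. When a particular instance type is preferred, the ID can be looked up instead of taken by position. The sketch below is illustrative only: `pick_recommendation_id` is a hypothetical helper, and it assumes each entry in `RealTimeInferenceRecommendations` carries an `InstanceType` field alongside its `RecommendationId`.

```python
def pick_recommendation_id(describe_response, instance_type=None):
    """Return a RecommendationId from a describe_model-style response dict.

    Assumption: every recommendation entry has "RecommendationId" and,
    when filtering, an "InstanceType" key.
    """
    recommendations = describe_response.get("DeploymentRecommendation", {}).get(
        "RealTimeInferenceRecommendations", []
    )
    if not recommendations:
        raise ValueError("No real-time inference recommendations are available")

    # Without a preference, fall back to the first recommendation
    if instance_type is None:
        return recommendations[0]["RecommendationId"]

    for recommendation in recommendations:
        if recommendation.get("InstanceType") == instance_type:
            return recommendation["RecommendationId"]
    raise ValueError("No recommendation found for instance type " + instance_type)
```

For example, `pick_recommendation_id(describe_model_response)` reproduces the first-entry choice made above, while passing an `instance_type` narrows the choice to that instance family.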

## 6. Invoke the Endpoint & Produce Inference

In [None]:
import pandas as pd

# The CSV was written without a header row, so do not treat the first row as one
payload = pd.read_csv("./sample-payload/test_data.csv", header=None)

inference = predictor.predict(payload)

print(inference)

## 7. Clean up the resources if needed

In [None]:
# Delete model and endpoint
predictor.delete_model()
predictor.delete_endpoint()

## 8. Conclusion
This notebook illustrates how to use the `DeploymentRecommendation` feature of SageMaker Inference Recommender to get recommendations without running benchmarks, and how to deploy a chosen recommendation with the Python SDK `model.deploy` method by specifying its recommendation ID.

The notebook walks you through downloading a pre-trained scikit-learn model, creating the model (which triggers the deployment recommendation workflow), inspecting the recommendations, deploying one of them, invoking the endpoint to produce inferences, and cleaning up the resources created.