## MLflow quickstart part 2: serving models using Amazon SageMaker

The [first part of this guide](https://docs.databricks.com/applications/mlflow/tracking-ex-scikit.html), **MLflow quickstart: model training and logging**, focuses on training a model and logging the training metrics, parameters, and model to the MLflow tracking server. 

##### NOTE: Do not use *Run All* with this notebook. It takes several minutes to deploy and update models in SageMaker, and models cannot be queried until they are active.

This part of the guide consists of the following sections:

#### Setup
* Select a model to deploy using the MLflow tracking UI

#### Deploy a model
* Deploy the selected model to SageMaker using the MLflow API
* Check the status and health of the deployed model
  * Determine if the deployed model is active and ready to be queried

#### Query the deployed model
* Load an input vector that the deployed model can evaluate
* Query the deployed model using the input

#### Manage the deployment
* Update the deployed model using the MLflow API
* Query the updated model

#### Clean up the deployment
* Delete the model deployment using the MLflow API

As in the first part of the quickstart tutorial, this notebook uses ElasticNet models trained on the `diabetes` dataset in scikit-learn.

## Prerequisites

You need ElasticNet models trained with the MLflow quickstart notebook in [part 1 of the quickstart guide](https://docs.databricks.com/applications/mlflow/tracking-ex-scikit.html).

### Setup

1. Use or create a cluster with the following configuration:
  * **Python Version:** Python 3
  * An attached IAM role that supports SageMaker deployment. For information about setting up a cluster IAM role for SageMaker deployment, see the [SageMaker deployment guide](https://docs.databricks.com/administration-guide/cloud-configurations/aws/sagemaker.html).
1. If you are running Databricks Runtime, uncomment and run Cmd 5 to install the required libraries. If you are running Databricks Runtime for Machine Learning, you can skip this step as the required libraries are already installed. 
1. Attach this notebook to the cluster.

In [None]:
#dbutils.library.installPyPI("mlflow", version="1.12.0", extras="extras")
#dbutils.library.restartPython()

Choose a run ID associated with an ElasticNet training run from [part 1 of the quickstart guide](https://docs.databricks.com/applications/mlflow/tracking-ex-scikit.html). The run ID and model path are shown on the run details page in the MLflow UI:

![image](https://docs.databricks.com/_static/images/mlflow/mlflow-deployment-example-run-info.png)

### Set region, run ID, model URI

**Note**: You must create a new SageMaker endpoint for each new region.

In [4]:
region = "us-east-2"
# Run ID of the model deployed first; it matches the run directory in the model_uri path below
run_id1 = "4079c93d54ec4d7a81aa0a73c8e72593"
model_uri = "mlruns/0/4079c93d54ec4d7a81aa0a73c8e72593/artifacts/model"

### Deploy a model

In this section, deploy the model you selected during **Setup** to SageMaker.

Specify a Docker image in Amazon's Elastic Container Registry (ECR). SageMaker uses this image to serve the model.  
To obtain the container URL, build the `mlflow-pyfunc` image and upload it to an ECR repository using the MLflow CLI: `mlflow sagemaker build-and-push-container`.

Define the ECR URL for the `mlflow-pyfunc` image that will be passed as an argument to MLflow's `deploy` function.

In [5]:
# Replace the URL in the assignment below with the URL for your ECR Docker image
# The ECR URL should have the following format: {account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}
image_ecr_url = "240487350066.dkr.ecr.us-east-2.amazonaws.com/mlflow-pyfunc:1.18.0"
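
The URL format described in the comment above can be assembled and sanity-checked before it is passed to `deploy`. The following is a minimal sketch, not part of MLflow: `build_ecr_url` and `is_ecr_url` are hypothetical helper names, the account ID `123456789012` is a placeholder, and the regular expression only checks the general shape of the URL.

```python
import re

# Hypothetical helpers for illustration; they are not part of MLflow or boto3.
# The pattern checks only the general shape of an ECR image URL.
ECR_URL_PATTERN = re.compile(
    r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/[a-z0-9._/-]+:[\w.-]+$"
)

def build_ecr_url(account_id, region, repo_name, tag):
    """Assemble {account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repo_name, tag)

def is_ecr_url(url):
    """Return True if the URL has the general shape of an ECR image URL."""
    return ECR_URL_PATTERN.match(url) is not None

# Placeholder account ID; substitute your own values.
example_url = build_ecr_url("123456789012", "us-east-2", "mlflow-pyfunc", "1.18.0")
print(example_url, is_ecr_url(example_url))
```

A check like this catches a missing tag or registry host early, before SageMaker rejects the image during deployment.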

Use MLflow's SageMaker API to deploy your trained model to SageMaker. The `mlflow.sagemaker.deploy()` function creates a SageMaker endpoint as well as all intermediate SageMaker objects required for the endpoint.

In [6]:
import mlflow.sagemaker as mfs
app_name = "diabetes-class"
role = "arn:aws:iam::240487350066:role/databricks-model-deployment"
mfs.deploy(app_name=app_name, model_uri=model_uri, image_url=image_ecr_url, region_name=region, mode="create", execution_role_arn=role, instance_type='ml.m5.xlarge', instance_count=1)

2021/06/30 13:14:16 INFO mlflow.sagemaker: Using the python_function flavor for deployment!
2021/06/30 13:14:16 INFO mlflow.sagemaker: No model data bucket specified, using the default bucket
2021/06/30 13:14:17 INFO mlflow.sagemaker: Default bucket `mlflow-sagemaker-us-east-2-240487350066` not found. Creating...
2021/06/30 13:14:18 INFO mlflow.sagemaker: Bucket creation response: {'ResponseMetadata': {'RequestId': 'QKG222R4N8J494CX', 'HostId': '9cA52CO70RAjLyiOMCEksKomYz5wM/lJXUxSjiel2Njhbe4sTXKuuM0VaO5Yem8KHVKXtMygpJg=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '9cA52CO70RAjLyiOMCEksKomYz5wM/lJXUxSjiel2Njhbe4sTXKuuM0VaO5Yem8KHVKXtMygpJg=', 'x-amz-request-id': 'QKG222R4N8J494CX', 'date': 'Wed, 30 Jun 2021 17:14:18 GMT', 'location': 'http://mlflow-sagemaker-us-east-2-240487350066.s3.amazonaws.com/', 'server': 'AmazonS3', 'content-length': '0'}, 'RetryAttempts': 0}, 'Location': 'http://mlflow-sagemaker-us-east-2-240487350066.s3.amazonaws.com/'}
2021/06/30 13:14:18 INFO mlflo

#### Your model has now been deployed to SageMaker with a single function call.

Check the status of your new SageMaker endpoint by running the following cell.

**Note**: Immediately after deployment, the application status should be **Creating**. Wait until the status is **InService**; until then, query requests will fail.

In [7]:
import boto3

def check_status(app_name):
  sage_client = boto3.client('sagemaker', region_name=region)
  endpoint_description = sage_client.describe_endpoint(EndpointName=app_name)
  endpoint_status = endpoint_description["EndpointStatus"]
  return endpoint_status

print("Application status is: {}".format(check_status(app_name)))

Application status is: InService
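
Instead of rerunning the status cell by hand, you can poll until the endpoint is ready. The following is a minimal sketch under stated assumptions: `wait_for_status` is a hypothetical helper, the interval and timeout defaults are arbitrary, and `Failed` is treated as the only terminal error status. A stand-in status source replaces the real status check so the sketch runs without AWS access.

```python
import time

def wait_for_status(get_status, target="InService", failed=("Failed",),
                    interval_s=30, timeout_s=1800, sleep=time.sleep):
    """Poll get_status() until it returns target; raise on failure or timeout."""
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError("Endpoint entered status: {}".format(status))
        if waited >= timeout_s:
            raise TimeoutError("Timed out waiting for status {}".format(target))
        sleep(interval_s)
        waited += interval_s

# Stand-in for the real status check so the sketch runs without AWS access.
fake_statuses = iter(["Creating", "Creating", "InService"])
final_status = wait_for_status(lambda: next(fake_statuses), sleep=lambda s: None)
print("Application status is: {}".format(final_status))
```

In this notebook, the equivalent call would be `wait_for_status(lambda: check_status(app_name))`.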


### Query the deployed model

#### Load sample input from the `diabetes` dataset

In [8]:
import numpy as np
import pandas as pd
from sklearn import datasets

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

# Create a pandas DataFrame that serves as sample input for the deployed ElasticNet model
Y = np.array([y]).transpose()
d = np.concatenate((X, Y), axis=1)
cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'progression']
data = pd.DataFrame(d, columns=cols)
query_df = data.drop(["progression"], axis=1).iloc[[0]]

# Convert the sample input dataframe into a JSON-serialized pandas dataframe using the `split` orientation
input_json = query_df.to_json(orient="split")

In [9]:
print("Using input dataframe JSON: {}".format(input_json))

Using input dataframe JSON: {"columns":["age","sex","bmi","bp","s1","s2","s3","s4","s5","s6"],"index":[0],"data":[[0.0380759064,0.0506801187,0.0616962065,0.021872355,-0.0442234984,-0.0348207628,-0.0434008457,-0.002592262,0.0199084209,-0.0176461252]]}
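
The `split` orientation stores column names, index labels, and row values under separate keys. As a rough illustration of that layout, the following sketch builds the same structure with only the standard library. `to_split_json` is a hypothetical helper, and the two-column input is shortened for readability; the deployed model expects all ten feature columns.

```python
import json

def to_split_json(columns, rows, index=None):
    """Serialize rows into the pandas `split` layout: columns, index, data."""
    if index is None:
        index = list(range(len(rows)))
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("Each row needs one value per column")
    return json.dumps({
        "columns": list(columns),
        "index": list(index),
        "data": [list(row) for row in rows],
    })

# Shortened two-column example; illustrates structure only.
payload = to_split_json(["age", "sex"], [[0.0380759064, 0.0506801187]])
print(payload)
```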


#### Evaluate the sample input by sending an HTTP request
Query the SageMaker endpoint REST API using the `sagemaker-runtime` API provided in `boto3`.

In [10]:
import json

def query_endpoint(app_name, input_json):
  client = boto3.session.Session().client("sagemaker-runtime", region)
  
  response = client.invoke_endpoint(
      EndpointName=app_name,
      Body=input_json,
      ContentType='application/json; format=pandas-split',
  )
  preds = response['Body'].read().decode("ascii")
  preds = json.loads(preds)
  print("Received response: {}".format(preds))
  return preds

print("Sending batch prediction request with input dataframe json: {}".format(input_json))

# Evaluate the input by posting it to the deployed model
prediction1 = query_endpoint(app_name=app_name, input_json=input_json)

Sending batch prediction request with input dataframe json: {"columns":["age","sex","bmi","bp","s1","s2","s3","s4","s5","s6"],"index":[0],"data":[[0.0380759064,0.0506801187,0.0616962065,0.021872355,-0.0442234984,-0.0348207628,-0.0434008457,-0.002592262,0.0199084209,-0.0176461252]]}
Received response: [164.63574590039448]
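
The `Body` field of the response is a stream that can be read only once, so the query function reads it, decodes the bytes, and parses the JSON in a single pass. The following sketch isolates that step. `parse_predictions` is a hypothetical helper, and an `io.BytesIO` object stands in for the streaming body so the code runs without an endpoint; it decodes as UTF-8, which also accepts ASCII-only payloads.

```python
import io
import json

def parse_predictions(response):
    """Read the streaming body once, decode it, and parse the JSON payload."""
    raw = response["Body"].read()
    return json.loads(raw.decode("utf-8"))

# Stand-in for the dict returned by invoke_endpoint; BytesIO mimics the stream.
fake_response = {"Body": io.BytesIO(b"[164.63574590039448]")}
predictions = parse_predictions(fake_response)
print("Received response: {}".format(predictions))
```

Because the stream is exhausted after the first read, keep the parsed result in a variable rather than reading the body a second time.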


### Manage the deployment

You can update the deployed model by replacing it with the output of a different run. Specify the run ID associated with a different ElasticNet training run.

In [0]:
run_id2 = "<run-id2>"
model_uri = "runs:/" + run_id2 + "/model"
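
A small guard can catch an unreplaced placeholder before the URI reaches `deploy`. This is a minimal sketch; `runs_uri` is a hypothetical helper, not an MLflow function, and the `model` artifact path is assumed to match the path used when the model was logged in part 1.

```python
def runs_uri(run_id, artifact_path="model"):
    """Build a runs:/<run_id>/<artifact_path> model URI."""
    # Reject empty IDs, IDs containing slashes, and unreplaced <placeholders>.
    if not run_id or "/" in run_id or run_id.startswith("<"):
        raise ValueError("Replace the placeholder with a real run ID")
    return "runs:/{}/{}".format(run_id, artifact_path.strip("/"))

print(runs_uri("4079c93d54ec4d7a81aa0a73c8e72593"))
```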

Call `mlflow.sagemaker.deploy()` in `replace` mode. This updates the `diabetes-class` application endpoint with the model corresponding to the new run ID.

In [0]:
mfs.deploy(app_name=app_name, model_uri=model_uri, image_url=image_ecr_url, region_name=region, mode="replace")

**Note**: The endpoint status should be **Updating**. Only after the endpoint status changes to **InService** do query requests use the updated model.

In [0]:
print("Application status is: {}".format(check_status(app_name)))

Query the updated model. You should get a different prediction.

In [0]:
prediction2 = query_endpoint(app_name=app_name, input_json=input_json)

Compare the predictions.

In [0]:
print("Run ID: {} Prediction: {}".format(run_id1, prediction1)) 
print("Run ID: {} Prediction: {}".format(run_id2, prediction2))

### Clean up the deployment

When the model deployment is no longer needed, run the `mlflow.sagemaker.delete()` function to delete it.

In [0]:
# Specify the archive=False option to delete any SageMaker models and configurations
# associated with the specified application
mfs.delete(app_name=app_name, region_name=region, archive=False)

Verify that the SageMaker endpoint associated with the application has been deleted.

In [0]:
def get_active_endpoints(app_name):
  sage_client = boto3.client('sagemaker', region_name=region)
  app_endpoints = sage_client.list_endpoints(NameContains=app_name)["Endpoints"]
  # NameContains matches substrings, so keep only the exact endpoint name
  return [str(endpoint["EndpointName"]) for endpoint in app_endpoints if endpoint["EndpointName"] == app_name]
  
print("The following endpoints exist for the `{an}` application: {eps}".format(an=app_name, eps=get_active_endpoints(app_name)))
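
The exact-name filter matters because `NameContains` matches substrings, so endpoints that merely share the prefix also appear in the listing. The following sketch shows the filtering step on a stand-in response. `exact_endpoint_matches` and the endpoint name `diabetes-class-v2` are hypothetical, and the dictionary contains only the fields this notebook reads from `list_endpoints`.

```python
def exact_endpoint_matches(list_endpoints_response, app_name):
    """Keep only endpoints whose name equals app_name exactly."""
    names = [ep["EndpointName"] for ep in list_endpoints_response["Endpoints"]]
    return [name for name in names if name == app_name]

# Stand-in response with one substring match and one exact match.
fake_listing = {
    "Endpoints": [
        {"EndpointName": "diabetes-class-v2"},
        {"EndpointName": "diabetes-class"},
    ]
}
matches = exact_endpoint_matches(fake_listing, "diabetes-class")
print("Exact matches: {}".format(matches))
```

An empty list after deletion indicates that no endpoint with that exact name remains in the region.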