The dataset used for this example is the Bank Marketing data set. Given a set of features about a customer, can we predict whether that person will open a term deposit account?

Original source: [UCI Machine Learning Repository: Bank Marketing Data Set](https://archive.ics.uci.edu/ml/datasets/bank+marketing)

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.

### Deploying our model to AWS SageMaker
In this notebook we deploy the model in the "Production" stage of the Model Registry to AWS SageMaker. For this notebook to work, a few prerequisites must be met:
* First, we need to build a Docker image that works with our MLflow model. To do this:
  * Install Docker and MLflow on your local machine.
  * Run `mlflow sagemaker build-and-push-container --build --no-push` to build an MLflow image that works with SageMaker.
* Second, the Docker image must be uploaded to AWS Elastic Container Registry (ECR):
  * In your AWS account, create a new ECR repository.
  * [Install the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) and then [configure credentials](https://docs.aws.amazon.com/cli/latest/reference/configure/).
  * [Authenticate Docker with your ECR registry](https://docs.aws.amazon.com/AmazonECR/latest/userguide/Registries.html#registry_auth).
  * Make sure the AWS [user used by the AWS CLI has access to ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/security_iam_id-based-policy-examples.html).
  * Use `docker tag` to tag your image with the ECR repository URI.
  * Finally, use `docker push` to push the image to your ECR repository.
* Third, we need to make sure our Databricks cluster is authorized to use AWS SageMaker:
  * Get or create an IAM role that has SageMaker permissions. The AWS managed policy AmazonSageMakerFullAccess is readily available and can be attached to the role.
  * Add this role to the cross-account role that was defined when setting up Databricks.
    * [This link](https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html) roughly shows the steps. Instead of adding an S3 IAM role, you add the role that carries the AmazonSageMakerFullAccess policy.
  * Lastly, add the role ARN under "Instance profiles" in the Databricks admin console.
  * Now we can launch a Databricks cluster with this instance profile attached, which gives the cluster access to SageMaker resources.
  
While this is a lot to set up, all of the above actions only have to be executed once. After all, we only need one MLflow image in ECR, and authentication also only has to be set up once.
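
As a rough illustration of the tagging and pushing steps above, the sketch below assembles the ECR image URI and the matching `docker` commands as plain strings. The account ID, repository name, and the local image name `mlflow-pyfunc` are placeholder assumptions, not values from this notebook; substitute your own.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an ECR image URI of the form <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repository, tag)


def docker_push_commands(local_image, image_uri):
    """Return the docker commands that tag a local image with the ECR URI and push it."""
    return [
        "docker tag {} {}".format(local_image, image_uri),
        "docker push {}".format(image_uri),
    ]


# Placeholder values (assumptions for illustration only).
example_uri = ecr_image_uri("123456789012", "eu-west-1", "my-mlflow-repo")
for command in docker_push_commands("mlflow-pyfunc", example_uri):
    print(command)
```

The resulting URI is the value later passed as `image_url` when deploying.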

### Step 1: From Model Registry, get the URI of our production model

In [0]:
import mlflow
from mlflow.tracking import MlflowClient
client = MlflowClient()

In [0]:
#dbutils.widgets.text("modelRegistryName","bankXGBoost")
modelRegistryName = dbutils.widgets.get("modelRegistryName")

In [0]:
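def getProdModelURI(modelRegistryName):
  # Fetch every registered version of the model, then keep those in the Production stage.
  models = client.search_model_versions("name='%s'" % modelRegistryName)
  prodModels = [model for model in models if model.current_stage == "Production"]
  if not prodModels:
    raise ValueError("No Production version found for model '%s'" % modelRegistryName)
  # The source attribute holds the artifact URI the version was registered from.
  return prodModels[0].source

modelURI = getProdModelURI(modelRegistryName)
# Alternative: client.get_latest_versions(modelRegistryName, stages=["Production"])[0].source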
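
If several versions could be in the Production stage at once, taking the first list element may not return the newest one. Below is a minimal sketch of a more explicit selection, assuming each version object exposes `current_stage`, `version`, and `source` attributes. The helper name `pick_production_source` and the sample objects are hypothetical and only stand in for what the Model Registry client returns.

```python
from types import SimpleNamespace


def pick_production_source(versions):
    """Return the source URI of the highest-numbered Production version."""
    production = [v for v in versions if v.current_stage == "Production"]
    if not production:
        raise ValueError("no model version is in the Production stage")
    # Version numbers may arrive as strings, so compare them as integers.
    newest = max(production, key=lambda v: int(v.version))
    return newest.source


# Stand-in objects for illustration; real ones come from the Model Registry client.
fake_versions = [
    SimpleNamespace(version="1", current_stage="Archived", source="dbfs:/models/v1"),
    SimpleNamespace(version="2", current_stage="Production", source="dbfs:/models/v2"),
    SimpleNamespace(version="10", current_stage="Production", source="dbfs:/models/v10"),
    SimpleNamespace(version="11", current_stage="Staging", source="dbfs:/models/v11"),
]
print(pick_production_source(fake_versions))
```

Comparing versions as integers avoids the string-ordering pitfall where "10" sorts before "2".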

### Step 2: Deploy our model to SageMaker

In the cell below we choose a name for our SageMaker application (also used as the endpoint name when querying) and point to the MLflow image that was pushed to AWS ECR. We use mode "replace" so that subsequent runs overwrite the existing deployment.

In [0]:
import mlflow.sagemaker as mfs

app_name = "xgboostBank"
image_url = "*******.dkr.ecr.eu-west-1.amazonaws.com/mthone:latest"

mfs.deploy(app_name=app_name, model_uri=modelURI, image_url=image_url, region_name="eu-west-1", mode="replace")
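
To confirm that the endpoint is ready before querying it, one option is to poll its status. The sketch below assumes a boto3 `sagemaker` client is passed in and relies on the `EndpointStatus` field returned by `describe_endpoint`. The helper name, the polling interval, and the stub client are illustrative assumptions so that the example runs without AWS access.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll describe_endpoint until the endpoint is InService, fails, or polls run out."""
    status = None
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("endpoint {} failed to deploy".format(endpoint_name))
        time.sleep(poll_seconds)
    raise TimeoutError("endpoint {} still {} after {} polls".format(endpoint_name, status, max_polls))


class StubClient:
    """Stand-in for a boto3 SageMaker client (illustration only)."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_endpoint(self, EndpointName):
        return {"EndpointName": EndpointName, "EndpointStatus": next(self._statuses)}


stub = StubClient(["Creating", "Creating", "InService"])
print(wait_until_in_service(stub, "xgboostBank", poll_seconds=0))
```

With real credentials, the stub would be replaced by `boto3.client("sagemaker", region_name="eu-west-1")`.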

### Step 3: Querying our deployed model

In [0]:
df = spark.sql("select * from max_db.bank_marketing_train_set")

In [0]:
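train = df.toPandas()
# Drop the target column so that only the features are sent to the endpoint.
train_x = train.drop(["label"], axis=1)
# Take the first row as a one-row DataFrame.
sample = train_x.iloc[[0]]
#sample = train_x.iloc[:, :]
# Serialize in pandas "split" orientation, matching the content type used when querying.
sample_json = sample.to_json(orient="split")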

In [0]:
sample_json
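
For reference, the `split` orientation produces a JSON object with `columns`, `index`, and `data` keys. The sketch below builds the same shape by hand with the standard library. The function name, column names, and values are made up for illustration and are not taken from the training table.

```python
import json


def to_split_json(columns, rows):
    """Serialize rows into the pandas 'split' layout: columns, index, data."""
    payload = {
        "columns": list(columns),
        "index": list(range(len(rows))),
        "data": [list(row) for row in rows],
    }
    return json.dumps(payload)


# Hypothetical feature names and values.
example_json = to_split_json(["age", "balance"], [[41, 1200]])
print(example_json)
```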

In [0]:
import json
import boto3

def query_endpoint_example(app_name, input_json):
  print("Sending batch prediction request with inputs: {}".format(input_json))
  # The runtime client must use the same region the endpoint was deployed to.
  client = boto3.session.Session().client("sagemaker-runtime", "eu-west-1")

  response = client.invoke_endpoint(
      EndpointName=app_name,
      Body=input_json,
      ContentType='application/json; format=pandas-split',
  )
  # The response body is a stream of JSON bytes; decode as UTF-8 before parsing.
  preds = response['Body'].read().decode("utf-8")
  preds = json.loads(preds)
  print("Received response: {}".format(preds))
  return preds

import pandas as pd
#input_df = pd.DataFrame([query_input])
#input_json = input_df.to_json(orient='split')

prediction1 = query_endpoint_example(app_name=app_name, input_json=sample_json)

### Step 4: Clean up our SageMaker deployment
It is important to delete your SageMaker deployment when you no longer use it, because it runs on permanently running instances that incur costs for as long as the endpoint exists. For more information see [the SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/).

In [0]:
mfs.delete(app_name=app_name, region_name="eu-west-1", archive=False)
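
After deleting, it can be worth checking that the endpoint is really gone. The sketch below assumes a boto3 `sagemaker` client and its `list_endpoints` call, which accepts a `NameContains` filter and returns an `Endpoints` list. The helper name and the stub client are hypothetical, and pagination is ignored for brevity.

```python
def endpoint_exists(sm_client, endpoint_name):
    """Return True if an endpoint with exactly this name is still listed."""
    response = sm_client.list_endpoints(NameContains=endpoint_name)
    return any(e["EndpointName"] == endpoint_name for e in response["Endpoints"])


class StubListClient:
    """Stand-in for a boto3 SageMaker client (illustration only)."""

    def __init__(self, names):
        self._names = names

    def list_endpoints(self, NameContains):
        matches = [n for n in self._names if NameContains in n]
        return {"Endpoints": [{"EndpointName": n} for n in matches]}


# The name filter is a substring match, so the helper compares names exactly.
print(endpoint_exists(StubListClient(["xgboostBank-old"]), "xgboostBank"))
```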

### Extra: A note on scheduling with a SageMaker deployment
* Scheduling can be done in the same way as in notebook 3.1, using the Databricks scheduler. Once the deployment is in place, the only section you need to run is Step 3.