## Arize AI Quick Start: Training and Serving Models with Microsoft Azure ML

##### NOTE: We do not recommend using *Run All* because it takes several minutes to deploy and update models; models cannot be queried until they are active.

This part of the guide consists of the following sections:

#### Setup
* Launch an Azure Databricks cluster
* Install Arize SDK
* Install MLflow
* Install the Azure ML SDK
* Create or load an Azure ML Workspace


#### Training a Model
* Use MLflow Tracking to track experiments

#### Building an Azure Container Image for model deployment
* Use MLflow to build a Container Image for the trained model

#### Deploy the model to expose a consumable API using Azure Container Instances (ACI)
* Create an ACI webservice deployment using the model's Container Image

#### Querying the deployed model in ACI
* Load a sample input vector from the diabetes dataset
* Evaluate the sample input vector by sending an HTTP request

#### Publishing prediction results to Arize
* Log resulting prediction output along with input vector using Arize's SDK

#### Alternatively, if using Kubernetes: Deploy the model using Azure Kubernetes Service (AKS)
* Option 1: Create a new AKS cluster
* Option 2: Connect to an existing AKS cluster
* Deploy the model's image to the specified AKS cluster

#### Querying the deployed model in AKS
* Load a sample input vector from the diabetes dataset
* Evaluate the sample input vector by sending an HTTP request

#### Updating the AKS deployment
* Build an Azure Container Image for another model
* Deploy the new model's image to the AKS cluster
* Query the updated model

#### Cleaning up the deployments
* Terminate the ACI webservice
* Terminate the AKS webservice
* Remove the AKS cluster from the Azure ML Workspace

This notebook uses the `diabetes` dataset in scikit-learn and predicts the progression metric (a quantitative measure of disease progression one year after baseline) based on BMI, blood pressure, and other factors. It uses the scikit-learn ElasticNet linear regression model, where we vary the `alpha` and `l1_ratio` parameters for tuning. For more information on ElasticNet, refer to:
  * [Elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization)
  * [Regularization and Variable Selection via the Elastic Net](https://web.stanford.edu/~hastie/TALKS/enet_talk.pdf)
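
For orientation, the objective that scikit-learn documents for `ElasticNet` is reproduced below, writing $\rho$ for `l1_ratio`, $\alpha$ for `alpha`, and $n$ for the number of training samples. The symbols $w$, $X$, and $y$ (coefficients, feature matrix, target) are notation introduced here, not names used elsewhere in this notebook.

$$
\min_{w}\ \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

With $\rho = 1$ the penalty is purely L1 (Lasso), and with $\rho = 0$ it is purely L2. Larger values of $\alpha$ strengthen the regularization overall.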

**Note:** This notebook expects that you use a Databricks hosted MLflow tracking server. If you would like to preview the Databricks MLflow tracking server, contact your Databricks sales representative to request access. To set up your own tracking server, see the instructions in [MLflow Tracking Servers](https://www.mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers) and configure your connection to your tracking server by running [mlflow.set_tracking_uri](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri).

## Setup

1. Use an existing cluster, or create a new one, with the following configuration:
  * **Databricks Runtime Version:** Databricks Runtime 5.0 or above
  * **Python Version:** Python > 3.5.3
1. Install the required libraries. If you are using Databricks Runtime 5.1 or above, you can run Cmd 5 instead.
   1. Create required libraries.
    * Source **PyPI** and enter `arize`.
    * Source **PyPI** and enter `mlflow[extras]`. This installs mlflow and all its dependencies.
    * Source **PyPI** and enter `azureml-sdk[databricks]`.
   1. Install the libraries into the cluster.
1. Attach this notebook to the cluster.

In [5]:
!pip install arize mlflow[extras] azureml-sdk[databricks]

#### Write Your ML Code Based on the `train_diabetes.py` Code
This tutorial is based on MLflow's [train_diabetes.py](https://github.com/mlflow/mlflow/blob/master/examples/sklearn_elasticnet_diabetes/train_diabetes.py) example, which uses the built-in scikit-learn diabetes dataset (`sklearn.datasets.load_diabetes`) to predict disease progression based on various factors.

In [7]:
# Import various libraries including matplotlib, sklearn, mlflow
import os
import warnings
import sys

import pandas as pd
import numpy as np
from itertools import cycle
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import lasso_path, enet_path
from sklearn import datasets

# Import mlflow
import mlflow
import mlflow.sklearn

# Load Diabetes datasets
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

# Create pandas DataFrame for sklearn ElasticNet linear_model
Y = np.array([y]).transpose()
d = np.concatenate((X, Y), axis=1)
cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'progression']
data = pd.DataFrame(d, columns=cols)

#### Train the Diabetes Model
The next function trains ElasticNet linear regression based on the input parameters of `alpha (in_alpha)` and `l1_ratio (in_l1_ratio)`.

In addition, this function uses MLflow Tracking to record its
* parameters
* metrics
* model

**Tip:** Use `with mlflow.start_run():` in the Python code to create a new MLflow run. This is the recommended way to use MLflow in notebook cells. Whether your code completes or exits with an error, the `with` context closes the MLflow run, so you don't have to call `mlflow.end_run()`.
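
The guarantee described in the tip comes from Python's context manager protocol. The following is a minimal stand-in that illustrates it using only the standard library. `tracked_run` and `events` are hypothetical names made up for this sketch, and this is not how MLflow itself is implemented.

```python
from contextlib import contextmanager

events = []  # records what happened, for illustration only

@contextmanager
def tracked_run(name):
    """Hypothetical stand-in for a run context: always records the end of the run."""
    events.append('start ' + name)
    try:
        yield name
    finally:
        # Runs whether the body finished normally or raised an exception
        events.append('end ' + name)

try:
    with tracked_run('demo'):
        raise ValueError('training failed')
except ValueError:
    pass

print(events)  # ['start demo', 'end demo']
```

Even though the body raised an exception, the "end" step still ran. This is the behavior the tip relies on.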

In [9]:
# train_diabetes
#   Uses the sklearn Diabetes dataset to predict diabetes progression using ElasticNet
#       The predicted "progression" column is a quantitative measure of disease progression one year after baseline
#       http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
def train_diabetes(data, in_alpha, in_l1_ratio):
  # Evaluate metrics
  def eval_metrics(actual, pred):
      rmse = np.sqrt(mean_squared_error(actual, pred))
      mae = mean_absolute_error(actual, pred)
      r2 = r2_score(actual, pred)
      return rmse, mae, r2

  warnings.filterwarnings('ignore')
  np.random.seed(40)

  # Split the data into training and test sets. (0.75, 0.25) split.
  train, test = train_test_split(data)

  # The predicted column is "progression" which is a quantitative measure of disease progression one year after baseline
  train_x = train.drop(['progression'], axis=1)
  test_x = test.drop(['progression'], axis=1)
  train_y = train[['progression']]
  test_y = test[['progression']]

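  # Fall back to default values when a parameter is not provided.
  # (Calling float(None) would raise a TypeError, so check for None first.)
  if in_alpha is None:
    alpha = 0.05
  else:
    alpha = float(in_alpha)

  if in_l1_ratio is None:
    l1_ratio = 0.05
  else:
    l1_ratio = float(in_l1_ratio)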

  # Start an MLflow run; the "with" keyword ensures we'll close the run even if this cell crashes
  with mlflow.start_run():
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    # Print out ElasticNet model metrics
    print('Elasticnet model (alpha=%f, l1_ratio=%f):' % (alpha, l1_ratio))
    print('  RMSE: %s' % rmse)
    print('  MAE: %s' % mae)
    print('  R2: %s' % r2)

    # Set tracking_URI first and then reset it back to not specifying port
    # Note, we had specified this in an earlier cell
    #mlflow.set_tracking_uri(mlflow_tracking_URI)

    # Log mlflow attributes for mlflow UI
    mlflow.log_param('alpha', alpha)
    mlflow.log_param('l1_ratio', l1_ratio)
    mlflow.log_metric('rmse', rmse)
    mlflow.log_metric('r2', r2)
    mlflow.log_metric('mae', mae)
    mlflow.sklearn.log_model(lr, "model")
    modelpath = "/dbfs/mlflow/test_diabetes/model-%f-%f" % (alpha, l1_ratio)
    mlflow.sklearn.save_model(lr, modelpath)
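
To make the three logged metrics concrete, here is a small worked sketch that computes RMSE, MAE, and R2 by hand with the standard library. The input values are made up for illustration. The notebook itself relies on the scikit-learn metric functions, not on this code.

```python
import math

def eval_metrics(actual, pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean squared error
    mae = sum(abs(r) for r in residuals) / n         # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return rmse, mae, r2

# Hypothetical targets and predictions
rmse, mae, r2 = eval_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(rmse, mae, r2)
```

For these inputs the squared residuals sum to 5, so RMSE is sqrt(5/3), MAE is 1.0, and R2 is 1 - 5/8 = 0.375.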

#### Experiment with Different Parameters

Call `train_diabetes` with different parameters. Later, you'll be able to visualize all these runs in the MLflow experiment.

In [11]:
%fs rm -r dbfs:/mlflow/test_diabetes

In [12]:
# alpha and l1_ratio values of 0.01, 0.01
train_diabetes(data, 0.01, 0.01)

# alpha and l1_ratio values of 0.01, 0.75
train_diabetes(data, 0.01, 0.75)

# alpha and l1_ratio values of 0.01, .5
train_diabetes(data, 0.01, .5)

# alpha and l1_ratio values of 0.01, 1
train_diabetes(data, 0.01, 1)
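
If you want to try more combinations than the four above, one option is to generate the parameter pairs programmatically. This is a minimal sketch. The candidate values are illustrative assumptions, and the call to `train_diabetes` is left as a comment so the snippet runs on its own.

```python
from itertools import product

# Illustrative candidate values; adjust to your own search space
alphas = [0.01, 0.1]
l1_ratios = [0.01, 0.5, 1.0]

# Every (alpha, l1_ratio) combination, in order
param_grid = list(product(alphas, l1_ratios))

for alpha, l1_ratio in param_grid:
    # In this notebook you would call: train_diabetes(data, alpha, l1_ratio)
    print('alpha={}, l1_ratio={}'.format(alpha, l1_ratio))
```

Each call would create a separate MLflow run, so all combinations appear side by side in the experiment view.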

## View the run, experiment, run details, and notebook revision

1. Click the **Runs** icon in the notebook context bar to display the Runs sidebar. In the sidebar, you can view the run parameters and metrics. For example: <img src="https://docs.databricks.com/_static/images/mlflow/mlflow-notebook-experiments.gif"/>

1. Click the External Link icon <img src="https://docs.databricks.com/_static/images/external-link.png"/> in the Runs context bar to view the notebook experiment. For example: <img src="https://docs.databricks.com/_static/images/mlflow/quick-start-nb-experiment.png"/>

### Create or load an Azure ML Workspace

Before models can be deployed to Azure ML, you must create or obtain an Azure ML Workspace. When called with `exist_ok=True`, the `azureml.core.Workspace.create()` function loads the workspace with the specified name, or creates one if it does not already exist. For more information about creating an Azure ML Workspace, see the [Azure ML Workspace management documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace).

In [16]:
import azureml
from azureml.core import Workspace

workspace_name = 'YOUR WORKSPACE NAME'
workspace_location = 'YOUR WORKSPACE LOCATION'
resource_group = 'YOUR RESOURCE GROUP'
subscription_id = 'YOUR SUBSCRIPTION ID'

workspace = Workspace.create(name = workspace_name,
                             location = workspace_location,
                             resource_group = resource_group,
                             subscription_id = subscription_id,
                             exist_ok=True)

## Build an Azure Container Image for model deployment

### Use MLflow to build a Container Image for the trained model

Use the `mlflow.azureml.build_image` function to build an Azure Container Image for the trained MLflow model. This function also registers the MLflow model with a specified Azure ML workspace. The resulting image can be deployed to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS) for real-time serving.

Specify the run ID associated with an ElasticNet training run. You can find the run ID and model path on the run details page of the experiment:

![image](https://docs.azuredatabricks.net/_static/images/mlflow/mlflow-deployment-example-run-info.png)

In [20]:
run_id = 'YOUR RUN ID OF CHOICE'
model_uri = 'runs:/' + run_id + '/model'

In [21]:
import mlflow.azureml

model_image, azure_model = mlflow.azureml.build_image(model_uri=model_uri,
                                                      workspace=workspace,
                                                      model_name='model',
                                                      image_name='model',
                                                      description='Sklearn ElasticNet image for predicting diabetes progression',
                                                      synchronous=False)

In [22]:
model_image.wait_for_creation(show_output=True)

## Deploy the model to expose a consumable API using [Azure Container Instances (ACI)](https://docs.microsoft.com/en-us/azure/container-instances/)

Using the Azure ML SDK, deploy the Container Image for the trained MLflow model to ACI.

In [24]:
from azureml.core.webservice import AciWebservice, Webservice

dev_webservice_name = 'diabetes-model'
dev_webservice_deployment_config = AciWebservice.deploy_configuration()
dev_webservice = Webservice.deploy_from_image(name=dev_webservice_name, image=model_image, deployment_config=dev_webservice_deployment_config, workspace=workspace)

dev_webservice.wait_for_deployment()
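
As noted at the top of this notebook, a model cannot be queried until its deployment is active. If you need to wait on a readiness condition of your own, a generic polling helper along the following lines is one option. `wait_until` is a hypothetical helper written for this sketch and is not part of the Azure ML SDK. The commented usage line assumes the webservice object exposes a `state` attribute that reads `'Healthy'` once active, which you should verify against your SDK version.

```python
import time

def wait_until(is_ready, timeout_s=600.0, interval_s=5.0):
    """Poll is_ready() until it returns True or timeout_s seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Assumed usage in this notebook (not executed here):
# wait_until(lambda: dev_webservice.state == 'Healthy')
```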

## Query the deployed model in AzureML

In [26]:
import pandas as pd
import numpy as np
from sklearn import datasets

#### Load diabetes dataset
diabetes = datasets.load_diabetes()

#### Create sample input vector
X = diabetes.data
y = diabetes.target
Y = np.array([y]).transpose()
d = np.concatenate((X, Y), axis=1)
cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'progression']
data = pd.DataFrame(d, columns=cols)
sample = data.drop(["progression"], axis=1).iloc[[0]]

# Parse the JSON string with json.loads rather than eval, which would execute arbitrary code
import json
query_input = json.loads(sample.to_json(orient='split'))
query_input.pop('index', None)

#print(query_input)
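
For reference, pandas' `split` orientation produces a dictionary with `columns`, `index`, and `data` keys. The sketch below builds a dictionary of that shape by hand, with hypothetical feature values and only three columns for brevity, to show what the request body looks like once `index` is removed.

```python
import json

# Hypothetical values, shaped like the output of to_json(orient='split')
split_payload = {
    'columns': ['age', 'sex', 'bmi'],
    'index': [0],
    'data': [[0.038, 0.051, 0.062]],
}

# Drop the row index, as the cell above does
split_payload.pop('index', None)

# Serialize to the JSON string that is sent as the request body
request_body = json.dumps(split_payload)
print(request_body)
```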

#### Evaluate the sample input vector by sending an HTTP request
Query the ACI webservice's scoring endpoint by sending an HTTP POST request that contains the input vector.

In [28]:
import requests
import json
import uuid

def query_endpoint_example(scoring_uri, inputs, service_key=None):
  headers = {
    'Content-Type': 'application/json',
  }
  if service_key is not None:
    headers['Authorization'] = 'Bearer {service_key}'.format(service_key=service_key)

  print('Sending batch prediction request with inputs: {}'.format(inputs))
  response = requests.post(scoring_uri, data=json.dumps(inputs), headers=headers)
  preds = json.loads(response.text)
  print('Received response: {}'.format(preds))
  return preds
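
The function above decodes the response body without checking whether the request succeeded. If the service is still starting up, the body may be an error message instead of predictions. The following is a minimal sketch of a stricter parsing step. `parse_predictions` is a hypothetical helper, and it assumes the endpoint returns a JSON list of predictions, which is consistent with how `dev_prediction[0]` is used later in this notebook.

```python
import json

def parse_predictions(status_code, body):
    """Return the decoded list of predictions, or raise if the request failed."""
    if not 200 <= status_code < 300:
        raise RuntimeError(
            'Scoring request failed with HTTP {}: {}'.format(status_code, body[:200]))
    preds = json.loads(body)
    if not isinstance(preds, list):
        raise ValueError('Expected a JSON list of predictions, got: {!r}'.format(preds))
    return preds

# Hypothetical response values
print(parse_predictions(200, '[152.3]'))
```

With a `requests` response object, this could be called as `parse_predictions(response.status_code, response.text)`.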

In [29]:
print(dev_webservice.scoring_uri)
model_id='diabetes-model'
model_version='2cdc865cd53a420cb12036ea08c62083'

In [30]:
### Arize Helper Utility
def construct_feature_map(inputs):
  keys = inputs['columns']
  values = inputs['data'][0]
  print(f'keys: {keys}')
  print(f'values: {values}')
  features = {}
  for i, key in enumerate(keys):
    features[key] = str(values[i])
  return features
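
As a quick check of what the helper produces, the sketch below applies an equivalent, print-free version to a small hypothetical payload. Note that every feature value is converted to a string.

```python
def construct_feature_map(inputs):
    """Map each column name to the string form of its value in the first row."""
    keys = inputs['columns']
    values = inputs['data'][0]
    return {key: str(value) for key, value in zip(keys, values)}

# Hypothetical payload in the same 'split' layout as query_input
sample_inputs = {'columns': ['age', 'bmi'], 'data': [[0.038, 0.062]]}
features = construct_feature_map(sample_inputs)
print(features)  # {'age': '0.038', 'bmi': '0.062'}
```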


In [31]:
from arize.api import Client
from arize.utils.types import ModelTypes

## Instantiate arize client
SPACE_KEY = 'YOUR ARIZE SPACE KEY'
API_KEY = 'YOUR ARIZE API KEY'
arize = Client(space_key=SPACE_KEY, api_key=API_KEY)

dev_prediction = query_endpoint_example(scoring_uri=dev_webservice.scoring_uri, inputs=query_input)

prediction_id=str(uuid.uuid4())

print('Logging prediction to arize: {}'.format(prediction_id))
arize.log_prediction(
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.NUMERIC,
    prediction_id=prediction_id,
    prediction_label=dev_prediction[0],
    features=construct_feature_map(query_input))

## Alternatively, if using Kubernetes: Deploy the model using [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/). (Do Option 1 or Option 2)

### Option 1: Create a new AKS cluster

If you do not have an active AKS cluster for model deployment, create one using the Azure ML SDK.

In [34]:
from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (you can also provide parameters to customize this)
prov_config = AksCompute.provisioning_configuration(vm_size='Standard_D16_v3')

aks_cluster_name = 'diabetes-cluster'
# Create the cluster
aks_target = ComputeTarget.create(workspace = workspace,
                                  name = aks_cluster_name,
                                  provisioning_configuration = prov_config)

# Wait for the create process to complete
aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

### Option 2: Connect to an existing AKS cluster

If you already have an active AKS cluster running, you can add it to your Workspace using the Azure ML SDK.

In [36]:
from azureml.core.compute import AksCompute, ComputeTarget

# Get the resource group from https://portal.azure.com -> Find your resource group
resource_group = '<resource-group>'

# Give the cluster a local name
aks_cluster_name = 'diabetes-cluster'

# Attach the cluster to your workspace
attach_config = AksCompute.attach_configuration(resource_group=resource_group, cluster_name=aks_cluster_name)
aks_target = ComputeTarget.attach(workspace, name='diabetes-compute', attach_configuration=attach_config)

# Wait for the operation to complete
aks_target.wait_for_completion(True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

### Deploy the model's image to the specified AKS cluster

In [38]:
from azureml.core.webservice import Webservice, AksWebservice

# Set configuration and service name
prod_webservice_name = 'diabetes-model-prod'
prod_webservice_deployment_config = AksWebservice.deploy_configuration()

# Deploy from image
prod_webservice = Webservice.deploy_from_image(workspace = workspace,
                                               name = prod_webservice_name,
                                               image = model_image,
                                               deployment_config = prod_webservice_deployment_config,
                                               deployment_target = aks_target)

In [39]:
# Wait for the deployment to complete
prod_webservice.wait_for_deployment(show_output = True)

## Query the deployed model in production

#### Evaluate the sample input vector by sending an HTTP request
Query the AKS webservice's scoring endpoint by sending an HTTP POST request that includes the input vector. The production AKS deployment may require an authorization token (service key) for queries. Include this key in the HTTP request header.

In [42]:
import requests
import json

def query_endpoint_example(scoring_uri, inputs, service_key=None):
  headers = {
    'Content-Type': 'application/json',
  }
  if service_key is not None:
    headers['Authorization'] = 'Bearer {service_key}'.format(service_key=service_key)

  print('Sending batch prediction request with inputs: {}'.format(inputs))
  response = requests.post(scoring_uri, data=json.dumps(inputs), headers=headers)
  preds = json.loads(response.text)
  print('Received response: {}'.format(preds))
  return preds

In [43]:
prod_scoring_uri = prod_webservice.scoring_uri
prod_service_key = prod_webservice.get_keys()[0] if len(prod_webservice.get_keys()) > 0 else None

In [44]:
prod_prediction1 = query_endpoint_example(scoring_uri=prod_scoring_uri, service_key=prod_service_key, inputs=query_input)

## Update the production deployment

### Build an Azure Container Image for the new model

In [47]:
run_id2 = '<run-id2>'
model_uri = 'runs:/' + run_id2 + '/model'

In [48]:
import mlflow.azureml

model_image_updated, azure_model_updated = mlflow.azureml.build_image(
    model_uri=model_uri,
    workspace=workspace,
    model_name='model-updated',
    image_name='model-updated',
    description='Sklearn ElasticNet image for predicting diabetes progression',
    synchronous=False)

In [49]:
model_image_updated.wait_for_creation(show_output=True)

### Deploy the new model's image to the AKS cluster

Using the [`azureml.core.webservice.AksWebservice.update()`](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.akswebservice?view=azure-ml-py#update) function, replace the deployment's existing model image with the new model image.

In [51]:
prod_webservice.update(image=model_image_updated)

In [52]:
prod_webservice.wait_for_deployment(show_output = True)

### Query the updated model

In [54]:
prod_prediction2 = query_endpoint_example(scoring_uri=prod_scoring_uri, service_key=prod_service_key, inputs=query_input)

## Compare the predictions

In [56]:
print('Run ID: {} Prediction: {}'.format(run_id, prod_prediction1))
print('Run ID: {} Prediction: {}'.format(run_id2, prod_prediction2))

## Clean up the deployments

### Terminate the ACI webservice

Because ACI manages compute resources on your behalf, deleting the "dev" ACI webservice removes all resources associated with the "dev" model deployment.

In [59]:
dev_webservice.delete()

### Terminate the AKS webservice

This terminates the real-time serving webservice running on the specified AKS cluster. It **does not** terminate the AKS cluster.

In [61]:
prod_webservice.delete()

### Remove the AKS cluster from the Azure ML Workspace

If the cluster was created using the Azure ML SDK (see **Option 1: Create a new AKS cluster**), removing it from the Azure ML Workspace will terminate the cluster, including all of its compute resources and deployments.

If the cluster was created independently (see **Option 2: Connect to an existing AKS cluster**), it will remain active after removal from the Azure ML Workspace.

In [63]:
aks_target.delete()

### Overview
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
