## End to End ML Demo: Read Data, Build ML Model, Track with MLflow, ONNX, Deploy to ACI/AKS with MLflow

In this tutorial, we will use MLflow to train a model for rating wines and deploy it to Azure ML for real-time serving.

This guide consists of the following sections:

#### Setup
* Launch an Azure Databricks cluster
* Install MLflow
* Install the Azure ML SDK
* Create or load an Azure ML Workspace
* (Optional) Connect to an MLflow tracking server

#### Training a model
* Download training data
* In an MLflow run, train and save an ElasticNet model for rating wines

#### Building an Azure Container Image for model deployment
* Use MLflow to build a Container Image for the trained model

#### Deploying the model to "dev" using Azure Container Instances (ACI)
* Create an ACI webservice deployment using the model's Container Image

#### Querying the deployed model in "dev"
* Load a sample input vector from the wine dataset
* Evaluate the sample input vector by sending an HTTP request

#### Deploying the model to production using Azure Kubernetes Service (AKS)
* Option 1: Create a new AKS cluster
* Option 2: Connect to an existing AKS cluster
* Deploy the model's image to the specified AKS cluster

#### Querying the deployed model in production
* Load a sample input vector from the wine dataset
* Evaluate the sample input vector by sending an HTTP request

#### Updating the production deployment
* Train a new model
* Build an Azure Container Image for the new model
* Deploy the new model's image to the AKS cluster
* Query the updated model

#### Cleaning up the deployments
* Terminate the "dev" ACI webservice
* Terminate the production AKS webservice
* Remove the AKS cluster from the Azure ML Workspace

### Note: this notebook will not run in the training environment, because services such as ACI and AKS are not available there

The dataset used for this example is the UCI ML repository wine ratings dataset on the quality of wines.

Original Source: [UCI Machine Learning Repository Wine Ratings Data Set](https://archive.ics.uci.edu/ml/datasets/wine+quality)

[Cortez et al., 2009] P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

## Setup

### Launch an Azure Databricks cluster

In order to run this notebook properly, it must be attached to an Azure Databricks cluster that satisfies the following requirements:
  
- Use a Databricks Runtime (DBR) with Python 3. This notebook was demonstrated with DBR 6.5.

### Install MLflow

Next, install the MLflow Python library using the following steps:

1. Create the library with the Source `Upload Python Egg or PyPI` and the versioned Pip library name:
  - `mlflow` to always use the latest release, or `mlflow==1.7.0` to pin a specific version of MLflow
  
2. Attach the library to the cluster.

### Install the Azure ML MLflow SDK, ONNX

Once a cluster has been launched with the configuration described in **Launch an Azure Databricks cluster**, install the Azure ML MLflow SDK and the ONNX packages using the following steps:

1. For each of the following Pip library names, create a library with the Source ``Upload Python Egg or PyPI``:
  - `azureml-mlflow`, `skl2onnx`, `onnxruntime`
     
2. Attach the library to the cluster.
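
Once the libraries are attached, it can be useful to confirm what the cluster actually sees. The following is a minimal sketch that reads each module's `__version__` attribute, the same attribute this notebook inspects for `mlflow` later on. The helper name `module_version` is hypothetical, and a module that does not expose `__version__` is reported the same way as one that is not installed.

```python
import importlib


def module_version(module_name):
    """Return module.__version__, or None if the module cannot be
    imported or does not expose a __version__ attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Module names for the libraries installed in the steps above.
for name in ("mlflow", "azureml.core", "skl2onnx", "onnxruntime"):
    print(name, module_version(name) or "missing or no __version__")
```

If a library shows up as missing, check that it is attached to the cluster and restart the notebook's Python process before importing it.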

### Create or load an Azure ML Workspace

Before models can be deployed to Azure ML, an Azure ML Workspace must be created or obtained. When called with `exist_ok=True`, the `azureml.core.Workspace.create()` function loads the workspace with the specified name, or creates it if it does not already exist. For more information about creating an Azure ML Workspace, see the [Azure ML Workspace management documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace).

In [0]:
import azureml
from azureml.core import Workspace

workspace_name = "<WORKSPACE_NAME>"
workspace_location="<WORKSPACE_LOCATION>"
resource_group = "<RESOURCE_GROUP>"
subscription_id = "<SUBSCRIPTION_ID>"

workspace = Workspace.create(name = workspace_name,
                             subscription_id = subscription_id,
                             resource_group = resource_group,
                             location = workspace_location,
                             exist_ok=True)

### (Optional) Connect to an MLflow tracking server

MLflow can collect data about a model training session, such as validation accuracy. It can also save artifacts produced during the training session, such as a PySpark pipeline model.

By default, these data and artifacts are stored on the cluster's local filesystem. However, they can also be stored remotely using an [MLflow Tracking Server](https://mlflow.org/docs/latest/tracking.html).

In [0]:
import mlflow
mlflow.__version__

# Without further configuration, runs are tracked by the hosted MLflow tracking server.

# To track runs with the Azure ML MLflow tracking server instead, set the tracking URI:
azureml_mlflow_uri = workspace.get_mlflow_tracking_uri()
mlflow.set_tracking_uri(azureml_mlflow_uri)

In [0]:
experiment_name = "test"
mlflow.set_experiment(experiment_name)

## Training a model

### Download training data 

First, download the [wine qualities dataset (published by Cortez et al.)](https://archive.ics.uci.edu/ml/datasets/wine+quality) that will be used to train the model.
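
The UCI file is semicolon-delimited rather than comma-delimited, so the delimiter has to be either specified or detected when the file is read. As a small illustration, the sketch below uses the standard library's `csv.Sniffer` to detect the delimiter on an inline sample. The three-column sample is an illustrative excerpt in the same layout as the real file, not the full dataset.

```python
import csv
import io

# Illustrative three-column excerpt in the same layout as the UCI file
# (the real file has twelve columns).
sample_text = (
    '"fixed acidity";"volatile acidity";"quality"\n'
    "7.4;0.7;5\n"
    "7.8;0.88;5\n"
)

# Restrict the candidates so the sniffer only chooses among common delimiters.
dialect = csv.Sniffer().sniff(sample_text, delimiters=";,\t")
rows = list(csv.reader(io.StringIO(sample_text), dialect))

print("detected delimiter:", repr(dialect.delimiter))
print("header:", rows[0])
print("first row:", rows[1])
```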

In [0]:
%sh wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv

In [0]:
wine_data_path = "/databricks/driver/winequality-red.csv"

### In an MLflow run, train and save an ElasticNet model for rating wines

We will train a model using Scikit-learn's Elastic Net regression module. We will fit the model inside a new MLflow run (training session), allowing us to save performance metrics, hyperparameter data, and model artifacts for future reference. If MLflow has been connected to a tracking server, this data will be persisted to the tracking server's file and artifact stores, allowing other users to view and download it. For more information about model tracking in MLflow, see the [MLflow tracking reference](https://www.mlflow.org/docs/latest/tracking.html).

Later, we will use the saved MLflow model artifacts to deploy the trained model to Azure ML for real-time serving.
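
The training cell below logs three regression metrics: RMSE, MAE, and R2. To make their definitions concrete, here is a dependency-free sketch that computes the same quantities by hand. The function name `regression_metrics` and the sample values are illustrative only; the notebook itself uses the Scikit-learn implementations.

```python
import math


def regression_metrics(actual, predicted):
    """Compute RMSE, MAE and R2 for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Root mean squared error: square root of the mean squared residual.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: mean of the absolute residuals.
    mae = sum(abs(e) for e in errors) / n

    # R2: one minus the ratio of residual to total sum of squares.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Made-up quality scores and predictions, for illustration only.
rmse, mae, r2 = regression_metrics([5, 6, 5, 7], [5.5, 5.5, 5.0, 6.0])
print("RMSE: %.4f  MAE: %.4f  R2: %.4f" % (rmse, mae, r2))
```

Lower RMSE and MAE indicate a better fit, while R2 approaches 1 as the model explains more of the variance in the target.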

In [0]:
import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn
import mlflow.onnx
import onnx
import skl2onnx

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


def train_model(wine_data_path, model_path, alpha, l1_ratio):
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality CSV file; the UCI file is semicolon-delimited
    data = pd.read_csv(wine_data_path, sep=";")

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Start a new MLflow training run 
    with mlflow.start_run():
        # Fit the Scikit-learn ElasticNet model
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        # Evaluate the performance of the model using several accuracy metrics
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log model hyperparameters and performance metrics to the MLflow tracking server
        # (or to disk if no tracking server is configured)
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, model_path)
        
        
        initial_type = [('float_input', skl2onnx.common.data_types.FloatTensorType([None, test_x.shape[1]]))]
        onnx_model = skl2onnx.convert_sklearn(lr, initial_types=initial_type)
        print("onnx_model.type:", type(onnx_model))
        mlflow.onnx.log_model(onnx_model, "onnx-model")
        mlflow.set_tag("onnx_version", onnx.__version__)
        
        # run_id replaces the deprecated run_uuid attribute
        return mlflow.active_run().info.run_id

In [0]:
alpha_1 = 0.75
l1_ratio_1 = 0.25
model_path = 'model'
run_id1 = train_model(wine_data_path=wine_data_path, model_path=model_path, alpha=alpha_1, l1_ratio=l1_ratio_1)
model_uri = "runs:/"+run_id1+"/model"

## Building an Azure Container Image for model deployment

### Use MLflow to build a Container Image for the trained model

We will use the `mlflow.azureml.build_image` function to build an Azure Container Image for the trained MLflow model. This function also registers the MLflow model with a specified Azure ML workspace. The resulting image can be deployed to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS) for real-time serving.
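
The `model_uri` passed to `build_image` follows MLflow's `runs:/<run_id>/<artifact_path>` scheme, where the artifact path is the name the model was logged under (`model` for the Scikit-learn model and `onnx-model` for the ONNX model in the training function above). A minimal sketch of building such URIs follows; the helper name `runs_uri` and the run ID are hypothetical.

```python
def runs_uri(run_id, artifact_path):
    """Build an MLflow 'runs:/' URI for an artifact logged in a given run."""
    return "runs:/{}/{}".format(run_id, artifact_path.strip("/"))


example_run_id = "0123456789abcdef"  # placeholder, not a real run ID
print(runs_uri(example_run_id, "model"))
print(runs_uri(example_run_id, "onnx-model"))
```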

In [0]:
import mlflow.azureml

model_image, azure_model = mlflow.azureml.build_image(model_uri=model_uri, 
                                                      workspace=workspace, 
                                                      model_name="wine-rating-model",
                                                      image_name="wine-model-container-image",
                                                      description="Sklearn ElasticNet image for rating wines", 
                                                      tags={
                                                        "alpha": str(alpha_1),
                                                        "l1_ratio": str(l1_ratio_1),
                                                      },
                                                      synchronous=True)

In [0]:
model_image.wait_for_creation(show_output=True)

## Deploying the model to "dev" using [Azure Container Instances (ACI)](https://docs.microsoft.com/en-us/azure/container-instances/)

The [ACI platform](https://docs.microsoft.com/en-us/azure/container-instances/) is the recommended environment for staging and developmental model deployments.

### Create an ACI webservice deployment using the model's Container Image

Using the Azure ML SDK, we will deploy the Container Image that we built for the trained MLflow model to ACI.

In [0]:
from azureml.core.webservice import AciWebservice, Webservice

dev_webservice_name = "wine-model-dev" # this name must be unique in the workspace; choose a different one if it already exists
dev_webservice_deployment_config = AciWebservice.deploy_configuration()
dev_webservice = Webservice.deploy_from_image(name=dev_webservice_name, image=model_image, deployment_config=dev_webservice_deployment_config, workspace=workspace)

In [0]:
dev_webservice.wait_for_deployment()

## Querying the deployed model in "dev"

### Load a sample input vector from the wine dataset

In [0]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv(wine_data_path, sep=";")
train, _ = train_test_split(data)
train_x = train.drop(["quality"], axis=1)
sample = train_x.iloc[[0]]
# DataFrame.as_matrix() was removed from pandas; .values returns the same array
query_input = list(sample.values.flatten())
sample_json = sample.to_json(orient="split")

### Evaluate the sample input vector by sending an HTTP request
We will query the ACI webservice's scoring endpoint by sending an HTTP POST request that contains the input vector.
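
The request body is the sample row serialized in pandas "split" orientation: a JSON object with `columns`, `index`, and `data` keys. As a rough sketch, the same structure can be built by hand with the standard library. The feature values and the index below are illustrative; in the notebook they come from whichever row `train_test_split` places first.

```python
import json

# Feature columns of the red wine dataset (everything except "quality").
columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# One illustrative row of feature values.
row = [7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4]

# "split" orientation keeps column names, row labels and values separate.
payload = json.dumps({"columns": columns, "index": [0], "data": [row]})
print(payload)
```

Sending several rows in one request only requires adding more entries to `data` and `index`.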

In [0]:
import requests
import json

def query_endpoint_example(scoring_uri, inputs, service_key=None):
  headers = {
    "Content-Type": "application/json",
  }
  if service_key is not None:
    headers["Authorization"] = "Bearer {service_key}".format(service_key=service_key)
    
  print("Sending batch prediction request with inputs: {}".format(inputs))
  response = requests.post(scoring_uri, data=inputs, headers=headers)
  print("Response: {}".format(response.text))
  preds = json.loads(response.text)
  print("Received response: {}".format(preds))
  return preds

In [0]:
dev_scoring_uri = dev_webservice.scoring_uri

In [0]:
dev_prediction = query_endpoint_example(scoring_uri=dev_scoring_uri, inputs=sample_json)

## Deploying the model to production using [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/)

### Option 1: Create a new AKS cluster

If you do not have an active AKS cluster for model deployment, you can create one using the Azure ML SDK.

In [0]:
from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (you can also provide parameters to customize this)
prov_config = AksCompute.provisioning_configuration()

aks_cluster_name = "wine-prod" 
# Create the cluster
aks_target = ComputeTarget.create(workspace = workspace, 
                                  name = aks_cluster_name, 
                                  provisioning_configuration = prov_config)

# Wait for the create process to complete
aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

### Option 2: Connect to an existing AKS cluster

If you already have an active AKS cluster running, you can add it to your Workspace using the Azure ML SDK.

In [0]:
from azureml.core.compute import AksCompute, ComputeTarget

# Get the resource id from https://portal.azure.com -> Find your resource group -> click on the Kubernetes service -> Properties
resource_id = "/subscriptions/<your subscription id>/resourcegroups/<your resource group>/providers/Microsoft.ContainerService/managedClusters/<your aks service name>"


# Give the cluster a local name
cluster_name = "<CLUSTER_NAME>"

# Attach the cluster to your workspace
attach_config = AksCompute.attach_configuration(resource_id=resource_id)
aks_target = ComputeTarget.attach(workspace, cluster_name, attach_config)

# Wait for the operation to complete
aks_target.wait_for_completion(True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

### Deploy the model's image to the specified AKS cluster

In [0]:
from azureml.core.webservice import Webservice, AksWebservice

# Set configuration and service name
prod_webservice_name = "wine-model-prod"
prod_webservice_deployment_config = AksWebservice.deploy_configuration()

# Deploy from image
prod_webservice = Webservice.deploy_from_image(workspace = workspace, 
                                               name = prod_webservice_name,
                                               image = model_image,
                                               deployment_config = prod_webservice_deployment_config,
                                               deployment_target = aks_target)

In [0]:
# Wait for the deployment to complete
prod_webservice.wait_for_deployment(show_output = True)

## Querying the deployed model in production

### Load a sample input vector from the wine dataset

In [0]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv(wine_data_path, sep=";")
train, _ = train_test_split(data)
train_x = train.drop(["quality"], axis=1)
sample = train_x.iloc[[0]]
# DataFrame.as_matrix() was removed from pandas; .values returns the same array
query_input = list(sample.values.flatten())
sample_json = sample.to_json(orient="split")

### Evaluate the sample input vector by sending an HTTP request
We will query the AKS webservice's scoring endpoint by sending an HTTP POST request that includes the input vector. The production AKS deployment may require an authorization token (service key) for queries. We will include this key in the HTTP request header.
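
To show how the service key ends up in the request, the sketch below builds the headers and an unsent POST request using only the standard library. The helper name `build_headers`, the URL, and the key are placeholders, and no network call is made.

```python
import urllib.request


def build_headers(service_key=None):
    """Return JSON request headers, adding a Bearer token when a key is given."""
    headers = {"Content-Type": "application/json"}
    if service_key is not None:
        headers["Authorization"] = "Bearer {}".format(service_key)
    return headers


# Placeholder endpoint and key; a real deployment supplies both.
request = urllib.request.Request(
    "http://example.invalid/score",
    data=b'{"columns": [], "index": [], "data": []}',
    headers=build_headers(service_key="my-secret-key"),
    method="POST",
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

When no key is supplied, the `Authorization` header is simply omitted, which matches the unauthenticated "dev" query made earlier.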

In [0]:
import requests
import json

def query_endpoint_example(scoring_uri, inputs, service_key=None):
  headers = {
    "Content-Type": "application/json",
  }
  if service_key is not None:
    headers["Authorization"] = "Bearer {service_key}".format(service_key=service_key)
    
  print("Sending batch prediction request with inputs: {}".format(inputs))
  response = requests.post(scoring_uri, data=inputs, headers=headers)
  preds = json.loads(response.text)
  print("Received response: {}".format(preds))
  return preds

In [0]:
prod_scoring_uri = prod_webservice.scoring_uri
prod_service_key = prod_webservice.get_keys()[0] if len(prod_webservice.get_keys()) > 0 else None

In [0]:
prod_prediction = query_endpoint_example(scoring_uri=prod_scoring_uri, service_key=prod_service_key, inputs=sample_json)

## Updating the production deployment

### Train a new model
First, we will train a new ElasticNet model with updated hyperparameters.
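
For intuition about what the new hyperparameters change: Scikit-learn's ElasticNet adds `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2` to the least-squares loss. The sketch below compares the resulting L1 and L2 penalty weights for the first run (`alpha=0.75`, `l1_ratio=0.25`) and for this second run. The helper name `penalty_weights` is hypothetical.

```python
def penalty_weights(alpha, l1_ratio):
    """Return the (L1, L2) penalty weights implied by ElasticNet's parameters."""
    l1_weight = alpha * l1_ratio
    l2_weight = 0.5 * alpha * (1.0 - l1_ratio)
    return l1_weight, l2_weight


for label, alpha, l1_ratio in [("first run", 0.75, 0.25), ("second run", 0.5, 0.8)]:
    l1_w, l2_w = penalty_weights(alpha, l1_ratio)
    print("{}: L1 weight = {:.4f}, L2 weight = {:.4f}".format(label, l1_w, l2_w))
```

The second configuration shifts most of the regularization onto the L1 term, which tends to drive more coefficients to exactly zero.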

In [0]:
alpha_2 = 0.5
l1_ratio_2 = 0.8
run_id2 = train_model(wine_data_path=wine_data_path, model_path=model_path, alpha=alpha_2, l1_ratio=l1_ratio_2)
model_uri = "runs:/"+run_id2+"/model"

### Build an Azure Container Image for the new model

In [0]:
import mlflow.azureml

model_image_updated, azure_model_updated = mlflow.azureml.build_image(model_uri=model_uri,
                                                                      workspace=workspace, 
                                                                      model_name="wine-rating-model",
                                                                      image_name="wine-model-container-image",
                                                                      description="Sklearn ElasticNet image for rating wines", 
                                                                      tags={
                                                                        "alpha": str(alpha_2),
                                                                        "l1_ratio": str(l1_ratio_2),
                                                                      },
                                                                      synchronous=False)

In [0]:
model_image_updated.wait_for_creation(show_output=True)

### Deploy the new model's image to the AKS cluster

Using the [azureml.core.webservice.AksWebservice.update()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.akswebservice?view=azure-ml-py#update) function, we will replace the deployment's existing model image with the new model image.

In [0]:
prod_webservice.update(image=model_image_updated)

In [0]:
prod_webservice.wait_for_deployment(show_output = True)

### Query the updated model

In [0]:
prod_prediction_updated = query_endpoint_example(scoring_uri=prod_scoring_uri, service_key=prod_service_key, inputs=sample_json)

## Cleaning up the deployments

### Terminate the "dev" ACI webservice

Because ACI manages compute resources on your behalf, deleting the "dev" ACI webservice will remove all resources associated with the "dev" model deployment.

In [0]:
dev_webservice.delete()

### Terminate the production AKS webservice

This terminates the real-time serving webservice running on the specified AKS cluster. It **does not** terminate the AKS cluster.

In [0]:
prod_webservice.delete()

### Remove the AKS cluster from the Azure ML Workspace

If the cluster was created using the Azure ML SDK (see **Option 1: Create a new AKS cluster**), removing it from the Azure ML Workspace will terminate the cluster, including all of its compute resources and deployments.

If the cluster was created independently (see **Option 2: Connect to an existing AKS cluster**), it will remain active after removal from the Azure ML Workspace.

In [0]:
aks_target.delete()