## Hands-On Workshop: Wine classifier with MLflow, Seldon, and the v2 protocol

In this workshop, we will train an MLflow model, save the artefacts to Google Storage, and create a v2 protocol SeldonDeployment via the SDK.

We will follow these steps:
1. Setup environment
2. Run training experiments using MLflow
3. Conda pack the environment
4. Push artefacts to Google Storage
5. Deploy via the SDK
6. Using MLServer codecs
7. [Optional] Create a new MLflow run and deploy a canary model
8. [Optional] Train and deploy a drift detector
9. [Optional] Train and deploy an explainer

### 1. Setup environment

In your terminal, create a new virtual environment:
`conda create -n mlflow-wine python=3.8 ipykernel -y`

Activate the new environment:
`conda activate mlflow-wine`

Install the required dependencies into your environment:

In [None]:
!pip install seldon_deploy_sdk
!pip install conda-pack
!pip install mlserver
!pip install mlserver-mlflow
!pip install scikit-learn
!pip install mlflow
!sudo apt install -y tree

In [None]:
from seldon_deploy_sdk import Configuration, PredictApi, ApiClient, SeldonDeploymentsApi, ModelMetadataServiceApi, DriftDetectorApi, BatchJobsApi, BatchJobDefinition
from seldon_deploy_sdk.auth import OIDCAuthenticator
from seldon_deploy_sdk.rest import ApiException
import mlflow
import pandas as pd
import json
import requests
import os
import glob

### 2. Run training experiments using MLflow
For our example, we will use the elastic net wine example from [MLflow's tutorial](https://github.com/mlflow/mlflow/tree/master/examples/sklearn_elasticnet_wine).  We'll use Scikit-learn to predict the `quality` of the wine given certain attributes as features.

Let's load the data to see what's inside:

In [None]:
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=';')
data.head()

Below we'll define a `train` function that takes two hyperparameters as inputs (`alpha` and `l1_ratio`) and trains an ElasticNet linear model from Scikit-learn. It will download the dataset, define the train/test split, start an MLflow run, log params and metrics, and save the model.

In [None]:
# Wine Quality Sample
def train(in_alpha, in_l1_ratio):
    import os
    import warnings
    import sys

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNet

    import mlflow
    import mlflow.sklearn
    
    import logging
    logging.basicConfig(level=logging.WARN)
    logger = logging.getLogger(__name__)

    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2


    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url =\
        'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
    try:
        data = pd.read_csv(csv_url, sep=';')
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e)
        # Re-raise: without the data there is nothing to train on
        raise

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Set default values if no alpha is provided
    # (check for None before converting: float(None) raises a TypeError)
    if in_alpha is None:
        alpha = 0.5
    else:
        alpha = float(in_alpha)

    # Set default values if no l1_ratio is provided
    if in_l1_ratio is None:
        l1_ratio = 0.5
    else:
        l1_ratio = float(in_l1_ratio)

    # Useful for multiple runs (only doing one run in this sample notebook)    
    with mlflow.start_run():
        # Execute ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameter, metrics, and model to MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")
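
To make the three logged metrics concrete, here is a minimal plain-Python sketch of what `eval_metrics` computes. The function name `regression_metrics` and the sample values are illustrative only; the notebook itself relies on the Scikit-learn implementations.

```python
import math

def regression_metrics(actual, predicted):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    # Sum of squared residuals, shared by RMSE and R2
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # Total variance of the targets around their mean
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Made-up quality scores, for illustration only
rmse, mae, r2 = regression_metrics([5, 6, 7], [5, 5, 8])
print(rmse, mae, r2)
```

Lower RMSE and MAE are better, while an R2 close to 1 means the model explains most of the variance in `quality`.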

Now we'll train a few models with various combinations of hyperparameters:

In [None]:
train(0.5, 0.5)

In [None]:
train(0.2, 0.2)

In [None]:
train(0.1, 0.1)

The model training runs have been saved in the `mlruns` folder, under the experiment `0`:

In [None]:
!tree -L 1 mlruns/0

Inside each of these folders, MLflow stores the parameters we used to train our model, any metric we logged during training, and a snapshot of our model. If we look into one of them, we can see the following structure:

In [None]:
!tree mlruns/0/$(ls mlruns/0 | head -1)
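
The run folder can also be read programmatically. The sketch below assumes the local file store layout used by MLflow at the time of writing: one file per parameter under `params/`, and one file per metric under `metrics/` whose lines look like `<timestamp> <value> <step>`. The helper name `read_run` is hypothetical, and the layout may differ between MLflow versions.

```python
from pathlib import Path

def read_run(run_dir):
    """Collect params and the latest metric values from one local run folder."""
    run_dir = Path(run_dir)
    # Each params file is named after the parameter and holds its value as text
    params = {p.name: p.read_text().strip()
              for p in sorted((run_dir / "params").iterdir())}
    metrics = {}
    for m in sorted((run_dir / "metrics").iterdir()):
        last_line = m.read_text().strip().splitlines()[-1]
        # Assumed line format: "<timestamp> <value> <step>"
        metrics[m.name] = float(last_line.split()[1])
    return params, metrics
```

For example, `read_run(f"mlruns/0/{run_id}")` would return the `alpha` and `l1_ratio` parameters together with the `rmse`, `mae`, and `r2` metrics of that run.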

In particular, we are interested in the `artifacts/model` directory, which is what we will deploy. The `MLmodel` file describes the "flavor" of the model, including details on how to serve it. This helps deployment tools like Seldon deploy MLflow models in a consistent way, while retaining the ability to train using various frameworks in MLflow.

In [None]:
!cat mlruns/0/$(ls mlruns/0 | head -1)/artifacts/model/MLmodel

We can also view the dependencies that will be required to serve the model (note that Seldon will also require the MLServer package when deploying using the v2 protocol):

In [None]:
!cat mlruns/0/$(ls mlruns/0 | head -1)/artifacts/model/conda.yaml

### 4. Push artefacts to Google Storage

We'll first select the model that scored best in terms of `rmse` (lowest value) and get its run ID:

In [None]:
runs_df = mlflow.search_runs(filter_string="metrics.rmse < 1")
runs_df

In [None]:
run_id = runs_df.loc[runs_df['metrics.rmse'].idxmin()]['run_id']
run_id

In [None]:
!gsutil cp -r mlruns/0/$run_id/artifacts/model/ gs://andrew-seldon/mlflow/elastic_wine_$run_id

In [None]:
!gsutil ls gs://andrew-seldon/mlflow/elastic_wine_$run_id

### 5. Deploy via the SDK

We will now deploy the model to Seldon Deploy using the SDK. Because the MLflow server needs to create the conda environment, we propose two alternative methods for deployment:
1. Manually set the liveness and readiness probes; deploy using the `seldon` protocol
2. Conda-pack the environment; deploy using the `v2 protocol`

Create a function to authenticate against the cluster.  Remember to replace `XXXXX` with your cluster IP.

In [None]:
SD_IP = "XXXXX"

config = Configuration()
config.host = f"http://{SD_IP}/seldon-deploy/api/v1alpha1"
config.oidc_client_id = "sd-api"
config.oidc_server = f"http://{SD_IP}/auth/realms/deploy-realm"
config.oidc_client_secret = "sd-api-secret"
config.auth_method = "client_credentials"

def auth():
    # Use a distinct local name so the function name `auth` is not shadowed
    authenticator = OIDCAuthenticator(config)
    config.id_token = authenticator.authenticate()
    api_client = ApiClient(configuration=config, authenticator=authenticator)
    return api_client

#### Method 1: Deploy with the Seldon protocol

During initialisation, the built-in reusable server will create the Conda environment specified in your `conda.yaml` file. Because this build can take some time, we must extend the `initialDelaySeconds` and `failureThreshold` of the probes on our deployment.

To use the built-in MLflow server, the following prerequisites need to be met:

* Your MLmodel artifact folder needs to be accessible remotely (e.g. in Google Storage).

* Your model needs to be compatible with the python_function flavour.

* Your MLproject environment needs to be specified using Conda.

Note that your cluster will also need to be able to access the internet to download all the requirements in the conda environment.
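
The second prerequisite can be checked before uploading anything. The sketch below is a crude text check rather than a full YAML parse, and the helper name `has_python_function_flavor` and the abbreviated sample content are illustrative assumptions, not output from a real run.

```python
def has_python_function_flavor(mlmodel_text):
    """Crude text check: is `python_function` listed under `flavors:`?"""
    in_flavors = False
    for line in mlmodel_text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            # A top-level key either opens or closes the flavors section
            in_flavors = line.strip() == "flavors:"
            continue
        if in_flavors and line.strip() == "python_function:":
            return True
    return False

# Abbreviated, illustrative MLmodel content (not copied from a real run)
sample = """artifact_path: model
flavors:
  python_function:
    loader_module: mlflow.sklearn
  sklearn:
    pickled_model: model.pkl
"""
print(has_python_function_flavor(sample))
```

In the notebook, the text would come from reading `mlruns/0/<run_id>/artifacts/model/MLmodel`.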

Please set `your_name` below to avoid namespace clashes in deployment:

In [None]:
your_name = "name"

In [None]:
DEPLOYMENT_NAME = f"wine-{your_name}-seldon"
NAMESPACE = "seldon-demos"
MODEL_LOCATION = f"gs://andrew-seldon/mlflow/elastic_wine_{run_id}"

mldeployment = {
  "apiVersion": "machinelearning.seldon.io/v1alpha2",
  "kind": "SeldonDeployment",
  "metadata": {
    "name": f"{DEPLOYMENT_NAME}",
    "namespace": f"{NAMESPACE}"
  },
  "spec": {
    "name": f"{DEPLOYMENT_NAME}",
    "protocol":"seldon",
    "predictors": [
      {
        "componentSpecs": [
          {
            "spec": {
              "containers": [
                {
                  "name": f"{DEPLOYMENT_NAME}-container",
                  # We are setting high failureThreshold as installing conda dependencies
                  # can take a long time and we want to avoid k8s killing the container prematurely
                  "livenessProbe": {
                    "initialDelaySeconds": 100,
                    "failureThreshold": 300,
                    "periodSeconds": 5,
                    "successThreshold": 1,
                    "httpGet": {
                      "path": "/health/ping",
                      "port": "http",
                      "scheme": "HTTP"
                    }
                  },
                  "readinessProbe": {
                    "initialDelaySeconds": 100,
                    "failureThreshold": 300,
                    "periodSeconds": 5,
                    "successThreshold": 1,
                    "httpGet": {
                      "path": "/health/ping",
                      "port": "http",
                      "scheme": "HTTP"
                    }
                  }
                }
              ]
            }
          }
        ],
        "graph": {
          "children": [],
          "implementation": "MLFLOW_SERVER",
          "modelUri": f"{MODEL_LOCATION}",
          "name": f"{DEPLOYMENT_NAME}-container"
        },
        "name": "default",
        "replicas": 1
      }
    ]
  }
}
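
As a rough guide to how much time these probe settings allow, the sketch below adds the initial delay to the number of tolerated failures multiplied by the probe period. This is an approximation that ignores probe timeouts, and the helper name `probe_budget_seconds` is hypothetical.

```python
def probe_budget_seconds(probe):
    """Approximate seconds before the probe's failure threshold is exhausted."""
    return (probe["initialDelaySeconds"]
            + probe["failureThreshold"] * probe["periodSeconds"])

# Same values as the liveness and readiness probes in the deployment above
liveness = {"initialDelaySeconds": 100, "failureThreshold": 300, "periodSeconds": 5}
budget = probe_budget_seconds(liveness)
print(f"{budget} s, about {budget / 60:.0f} minutes")
```

If building the conda environment takes longer than this budget on your cluster, raise `failureThreshold` accordingly.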

In [None]:
deployment_api = SeldonDeploymentsApi(auth())
deployment_api.create_seldon_deployment(namespace=NAMESPACE, mldeployment=mldeployment)

Wait for the deployment to become available:

In [None]:
import time

print("Waiting for Seldon Deployment to become available")
deployment_status = 'Not Ready'
while deployment_status != "Available":
    time.sleep(5)  # pause between polls so we don't hammer the API
    try:
        api_response = deployment_api.read_seldon_deployment(DEPLOYMENT_NAME, NAMESPACE)
        print(f"Deployment Status: {api_response.status.state}")
        deployment_status = api_response.status.state
    except ApiException as e:
        print("Exception when calling SeldonDeploymentsApi->read_seldon_deployment: %s\n" % e)
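
A deployment that never becomes available would keep the loop above running forever. One possible variation, sketched here with a hypothetical helper `wait_for_state`, polls with a timeout; the default timeout and interval are assumptions chosen for illustration.

```python
import time

def wait_for_state(fetch_state, target="Available", timeout_s=1800, interval_s=5):
    """Poll fetch_state() until it returns target, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()
        if state == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state is still {state!r} after {timeout_s} s")
        time.sleep(interval_s)
```

In this notebook it could be called as `wait_for_state(lambda: deployment_api.read_seldon_deployment(DEPLOYMENT_NAME, NAMESPACE).status.state)`.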

You can now test a prediction in the Seldon Deploy UI by using the following payload:

```
{"data": {"names": [], "ndarray": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]}}
```

You should see a 200 response.
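
The same payload can be built in Python rather than typed by hand. This is a minimal sketch; the helper name `seldon_payload` is hypothetical, and the feature values are the ones from the payload above.

```python
import json

def seldon_payload(rows, names=None):
    """Wrap a list of feature rows in the Seldon protocol's ndarray format."""
    return {"data": {"names": list(names or []),
                     "ndarray": [list(r) for r in rows]}}

# One wine sample with its 11 features, in training column order
row = [7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]
payload = seldon_payload([row])
print(json.dumps(payload))
```

Assuming the deployment is exposed at the usual Seldon Core path `http://{SD_IP}/seldon/{NAMESPACE}/{DEPLOYMENT_NAME}/api/v1.0/predictions`, the payload could then be sent with `requests.post(url, json=payload)`.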

#### Method 2: Deploy with the v2 protocol
In order to use the v2 protocol, it is best to use [`conda-pack`](https://conda.github.io/conda-pack/) to locally save the conda environment (including mlserver) to a tar file.  The initialiser can then use this to install required dependencies into the container.  We are planning to simplify this workflow in future releases.

Note that if you want to pack a specific conda environment, you can specify that environment like so: 

`conda pack -n mlflow-wine -o mlruns/0/$run_id/artifacts/model/environment.tar.gz -f`

In [None]:
!conda pack -n mlflow-wine -o mlruns/0/$run_id/artifacts/model/environment.tar.gz -f
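
Before uploading, it can be worth confirming that the packed environment really includes MLServer. The sketch below looks for a member path inside the tarball; the helper name `archive_contains` is hypothetical, and the assumption that the MLServer console script is packed as `bin/mlserver` applies to Linux environments and may not hold elsewhere.

```python
import tarfile

def archive_contains(archive_path, member_suffix):
    """Return True if any member path in the tarball ends with member_suffix."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return any(name.endswith(member_suffix) for name in tar.getnames())
```

For example, `archive_contains(f"mlruns/0/{run_id}/artifacts/model/environment.tar.gz", "bin/mlserver")` should return `True` if `mlserver` was installed into the `mlflow-wine` environment.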

Push the `environment.tar.gz` file to Google Storage:

In [None]:
!gsutil cp -r mlruns/0/$run_id/artifacts/model/environment.tar.gz gs://andrew-seldon/mlflow/elastic_wine_$run_id/

It should be enough to simply specify the v2 protocol.  You no longer need to adjust the liveness and readiness probes.  Note that this deployment can also be done via the UI in Seldon Deploy.

In [None]:
DEPLOYMENT_NAME = f"wine-{your_name}-v2"
NAMESPACE = "seldon-demos"
MODEL_LOCATION = f"gs://andrew-seldon/mlflow/elastic_wine_{run_id}"


mldeployment = {
  "apiVersion": "machinelearning.seldon.io/v1alpha2",
  "kind": "SeldonDeployment",
  "metadata": {
    "name": f"{DEPLOYMENT_NAME}",
    "namespace": f"{NAMESPACE}"
  },
  "spec": {
    "protocol": "v2",
    "name": f"{DEPLOYMENT_NAME}",
    "predictors": [
      {
        "graph": {
          "children": [],
          "implementation": "MLFLOW_SERVER",
          "modelUri": f"{MODEL_LOCATION}",
          "name": f"{DEPLOYMENT_NAME}-container"
        },
        "name": "default",
        "replicas": 1
      }
    ]
  }
}

In [None]:
deployment_api = SeldonDeploymentsApi(auth())
deployment_api.create_seldon_deployment(namespace=NAMESPACE, mldeployment=mldeployment)

Wait until the deployment becomes available:

In [None]:
import time

print("Waiting for Seldon Deployment to become available")
deployment_status = 'Not Ready'
while deployment_status != "Available":
    time.sleep(5)  # pause between polls so we don't hammer the API
    try:
        api_response = deployment_api.read_seldon_deployment(DEPLOYMENT_NAME, NAMESPACE)
        print(f"Deployment Status: {api_response.status.state}")
        deployment_status = api_response.status.state
    except ApiException as e:
        print("Exception when calling SeldonDeploymentsApi->read_seldon_deployment: %s\n" % e)

Once the model is deployed, you can now make an inference request.

For the v2 protocol, the endpoint is constructed as follows: 

> http://`host`/seldon/`namespace`/`deployment-name`/v2/models/`model-name`/infer

In [None]:
endpoint = f"http://{SD_IP}/seldon/{NAMESPACE}/{DEPLOYMENT_NAME}/v2/models/{DEPLOYMENT_NAME}-container/infer"
endpoint

The v2 protocol is slightly more verbose than the seldon protocol.  The full API spec can be found [here](https://docs.seldon.io/projects/seldon-core/en/latest/reference/apis/v2-protocol.html).

An array of `inputs` is required, with each input requiring a `name`, `shape`, `datatype`, and `data`. Optionally, `parameters` can be used to specify the `content_type`.

In [None]:
inference_request = {
    "parameters": {
        "content_type": "pd"
    },
    "inputs": [
        {
          "name": "fixed acidity",
          "shape": [1],
          "datatype": "FP32",
          "data": [7.4]
        },
        {
          "name": "volatile acidity",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.7000]
        },
        {
          "name": "citric acid",
          "shape": [1],
          "datatype": "FP32",
          "data": [0]
        },
        {
          "name": "residual sugar",
          "shape": [1],
          "datatype": "FP32",
          "data": [1.9]
        },
        {
          "name": "chlorides",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.076]
        },
        {
          "name": "free sulfur dioxide",
          "shape": [1],
          "datatype": "FP32",
          "data": [11]
        },
        {
          "name": "total sulfur dioxide",
          "shape": [1],
          "datatype": "FP32",
          "data": [34]
        },
        {
          "name": "density",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.9978]
        },
        {
          "name": "pH",
          "shape": [1],
          "datatype": "FP32",
          "data": [3.51]
        },
        {
          "name": "sulphates",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.56]
        },
        {
          "name": "alcohol",
          "shape": [1],
          "datatype": "FP32",
          "data": [9.4]
        }
    ]
}
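
Since every input in the payload above follows the same pattern, it can be generated from a mapping of column names to values. This is a minimal sketch with a hypothetical helper `to_v2_request`; only two of the eleven features are shown, and `FP32` is assumed for every column.

```python
def to_v2_request(columns, datatype="FP32"):
    """Build a v2 inference request with one input tensor per named column."""
    inputs = []
    for name, values in columns.items():
        values = list(values)
        inputs.append({
            "name": name,
            "shape": [len(values)],  # one entry per row in this column
            "datatype": datatype,
            "data": values,
        })
    return {"parameters": {"content_type": "pd"}, "inputs": inputs}

req = to_v2_request({"fixed acidity": [7.4], "volatile acidity": [0.7]})
print(req["inputs"][0])
```

Passing more than one value per column would send several rows in a single request.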

In [None]:
response = requests.post(endpoint, json=inference_request)
print(json.dumps(response.json(), indent=2))

### 6. Using MLServer codecs

Crafting the payload is quite a lot of effort to do manually.  MLServer builds upon the v2 inference protocol and adds the concept of **codecs**.  This enables you to easily encode your data (e.g. pandas dataframes, numpy arrays, strings, etc.) into the v2 protocol, and decode from the v2 protocol back into your preferred data type.  More details on this can be found in the MLServer docs [here](https://mlserver.readthedocs.io/en/latest/user-guide/content-type.html).

In [None]:
from mlserver.types import InferenceRequest, InferenceResponse
from mlserver.codecs import PandasCodec, NumpyCodec

Let's grab the first 5 lines in our original wine data, removing the target `quality`:

In [None]:
pd_input = data.drop(["quality"], axis=1).head(5)
pd_input

Next we will encode the request from a pandas dataframe to the v2 protocol using the `PandasCodec`:

In [None]:
request = PandasCodec.encode_request(pd_input)
request.parameters = {"content_type": "pd"}
request.dict()

We can now make our request, as we did before:

In [None]:
response = requests.post(endpoint, json=request.dict())
print(json.dumps(response.json(), indent=2))

Finally, we parse the output response by turning it into an `InferenceResponse` object and decoding it with the `NumpyCodec`:

In [None]:
response_payload = InferenceResponse.parse_raw(response.text)
print(NumpyCodec.decode_output(response_payload.outputs[0]))
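
To see roughly what the decoding step involves, the sketch below rebuilds nested lists from an output's `data` and `shape` without MLServer. It assumes the tensor data arrives flattened in row-major order; the helper name `reshape` and the sample output (including its name and values) are made up for illustration.

```python
def reshape(flat, shape):
    """Rebuild nested lists from row-major flat data and a shape."""
    if len(shape) <= 1:
        return list(flat)
    if shape[0] == 0:
        return []
    # Number of flat elements belonging to each entry of the first axis
    step = len(flat) // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]

# Hypothetical output tensor; the values are made up for illustration
output = {"name": "predict", "shape": [2, 1], "datatype": "FP64", "data": [5.5, 5.6]}
print(reshape(output["data"], output["shape"]))
```

The `NumpyCodec` additionally converts the result into a NumPy array with the matching dtype, which this sketch does not attempt.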