## Hands-On Tabular Workshop: Wine classifier with MLFlow, MinIO, and Seldon

In this workshop, we will train an MLflow model, save the artefacts to Google Storage, and create a v2 protocol SeldonDeployment via the SDK.

We will follow these steps:
1. Setup environment
2. Run training experiments using MLFlow
3. Conda pack the environment
4. Push artefacts to Google Storage
5. Deploy via the SDK
6. Using MLServer codecs
7. [Optional] Create a new MLflow run with different hyperparameters, and deploy a canary model
8. [Optional] Train and deploy a drift detector
9. [Optional] Train and deploy an explainer

### 1. Setup environment

In your terminal, create a new virtual environment:
`conda create -n mlflow-wine python=3.8`

Activate the new environment:
`conda activate mlflow-wine`

Check that you are in the correct environment (for example, with `conda info --envs`; the active environment is marked with an asterisk).
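
The same check can be made from Python, for instance in the first notebook cell. This is a minimal sketch, assuming the environment is named `mlflow-wine` as above; the helper name `in_expected_env` is hypothetical and not part of any library.

```python
import os
import sys


def in_expected_env(expected_name, prefix=None, conda_env=None):
    """Return True if the interpreter appears to live in the named conda env."""
    prefix = sys.prefix if prefix is None else prefix
    if conda_env is None:
        # CONDA_DEFAULT_ENV is set by `conda activate`
        conda_env = os.environ.get("CONDA_DEFAULT_ENV", "")
    # Either signal is enough: the activation variable, or the last
    # component of the interpreter's installation prefix.
    prefix_name = os.path.basename(prefix.rstrip("/\\"))
    return conda_env == expected_name or prefix_name == expected_name


print("Active environment looks correct:", in_expected_env("mlflow-wine"))
```

If this prints `False`, the notebook kernel is probably running from a different environment than the one you activated in the terminal.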

Install the required dependencies into your environment:

In [2]:
!pip install seldon_deploy_sdk
!pip install conda-pack
!pip install mlserver
!pip install mlserver-mlflow
!pip install scikit-learn
!pip install mlflow
!sudo apt install tree

Reading package lists... Done
Building dependency tree       
Reading state information... Done
tree is already the newest version (1.8.0-1).
The following package was automatically installed and is no longer required:
  libfwupdplugin1
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.


In [10]:
from seldon_deploy_sdk import Configuration, PredictApi, ApiClient, SeldonDeploymentsApi, ModelMetadataServiceApi, DriftDetectorApi, BatchJobsApi, BatchJobDefinition
from seldon_deploy_sdk.auth import OIDCAuthenticator
from seldon_deploy_sdk.rest import ApiException
import mlflow
import pandas as pd
import json
import requests
import os
import glob

### 2. Run training experiments using MLFlow
For our example, we will use the elastic net wine example from [MLflow's tutorial](https://github.com/mlflow/mlflow/tree/master/examples/sklearn_elasticnet_wine).  We'll use Scikit-learn to predict the `quality` of the wine given certain attributes as features.

Let's load the data to see what's inside:

In [54]:
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=';')
data.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


Below we'll define a `train` function that takes two hyperparameters as inputs (`alpha` and `l1_ratio`) and trains an ElasticNet linear model from Scikit-Learn. It will download the dataset, define the train/test split, start an MLflow run, log params and metrics, and save the model.

In [5]:
# Wine Quality Sample
def train(in_alpha, in_l1_ratio):
    import os
    import warnings
    import sys

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNet

    import mlflow
    import mlflow.sklearn
    
    import logging
    logging.basicConfig(level=logging.WARN)
    logger = logging.getLogger(__name__)

    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2


    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url =\
        'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
    try:
        data = pd.read_csv(csv_url, sep=';')
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e)
        # Re-raise: without the data there is nothing to train on
        raise

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Set default values if no alpha is provided
    # (test the argument itself: float(None) raises a TypeError and float() never returns None)
    if in_alpha is None:
        alpha = 0.5
    else:
        alpha = float(in_alpha)

    # Set default values if no l1_ratio is provided
    if in_l1_ratio is None:
        l1_ratio = 0.5
    else:
        l1_ratio = float(in_l1_ratio)

    # Useful for multiple runs (only doing one run in this sample notebook)    
    with mlflow.start_run():
        # Execute ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameter, metrics, and model to MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")
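
For reference, the three metrics returned by `eval_metrics` can also be computed by hand. The sketch below uses only the standard library and toy values (not the wine data); the name `eval_metrics_plain` is hypothetical and is only meant to show what RMSE, MAE, and R2 measure.

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE and R2 for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)                    # root mean squared error
    mae = sum(abs(e) for e in errors) / n           # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R2 is undefined when all actual values are identical (ss_tot == 0)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Toy values for illustration only
rmse, mae, r2 = eval_metrics_plain([5, 6, 7], [5, 5, 7])
print(rmse, mae, r2)
```

Lower RMSE and MAE, and an R2 closer to 1, indicate a better fit.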

Now we'll train a few models with various combinations of hyperparameters:

In [6]:
train(0.5, 0.5)

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.793164022927685
  MAE: 0.6271946374319586
  R2: 0.10862644997792636


In [13]:
train(0.2, 0.2)

Elasticnet model (alpha=0.200000, l1_ratio=0.200000):
  RMSE: 0.7336400911821402
  MAE: 0.5643841279275427
  R2: 0.2373946606358417


In [14]:
train(0.1, 0.1)

Elasticnet model (alpha=0.100000, l1_ratio=0.100000):
  RMSE: 0.7128829045893679
  MAE: 0.5462202174984665
  R2: 0.2799376066653344


The model training runs have been saved in the `mlruns` folder, under the experiment `0`:

In [3]:
!tree -L 1 mlruns/0

mlruns/0
├── 234a355b97144ff78e05d6871ff8812c
├── 3239a7e5a42141138b1afcf65a0390f5
├── 5492ed0a8953405b82003a5eccf6550a
├── dffb498913d8423a93998d93b6f85817
└── meta.yaml

4 directories, 1 file


Inside each of these folders, MLflow stores the parameters we used to train our model, any metric we logged during training, and a snapshot of our model. If we look into one of them, we can see the following structure:

In [4]:
!tree mlruns/0/$(ls mlruns/0 | head -1)

mlruns/0/234a355b97144ff78e05d6871ff8812c
├── artifacts
│   └── model
│       ├── MLmodel
│       ├── conda.yaml
│       ├── model.pkl
│       ├── python_env.yaml
│       └── requirements.txt
├── meta.yaml
├── metrics
│   ├── mae
│   ├── r2
│   └── rmse
├── params
│   ├── alpha
│   └── l1_ratio
└── tags
    ├── mlflow.log-model.history
    ├── mlflow.runName
    ├── mlflow.source.git.commit
    ├── mlflow.source.name
    ├── mlflow.source.type
    └── mlflow.user

5 directories, 17 files
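
To make the layout concrete, here is a rough sketch that reads the `params` and `metrics` folders of one run directly from disk. It assumes each param file holds the raw value and each metric file holds lines of the form `<timestamp> <value> <step>`; that on-disk format is an internal detail of the local file store and may differ between MLflow versions, so `mlflow.search_runs` remains the supported route. The helpers `read_run` and `make_demo_run` are hypothetical, and the demo folder is a stand-in so the sketch runs without MLflow.

```python
import tempfile
from pathlib import Path


def read_run(run_dir):
    """Return (params, metrics) read from one run folder of the local file store."""
    run_dir = Path(run_dir)
    # Each file under params/ is named after the parameter and holds its value
    params = {p.name: p.read_text().strip() for p in sorted((run_dir / "params").iterdir())}
    metrics = {}
    for metric_file in sorted((run_dir / "metrics").iterdir()):
        lines = metric_file.read_text().strip().splitlines()
        # Assumed line format: "<timestamp> <value> <step>"; keep the last logged value
        metrics[metric_file.name] = float(lines[-1].split()[1])
    return params, metrics


def make_demo_run(root):
    """Create a tiny stand-in run folder (illustrative values only)."""
    run_dir = Path(root) / "demo_run"
    (run_dir / "params").mkdir(parents=True)
    (run_dir / "metrics").mkdir()
    (run_dir / "params" / "alpha").write_text("0.5")
    (run_dir / "params" / "l1_ratio").write_text("0.5")
    (run_dir / "metrics" / "rmse").write_text("1666002002000 0.79 0\n")
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    params, metrics = read_run(make_demo_run(tmp))
    print(params, metrics)
```

Against a real run you would call `read_run("mlruns/0/<run_id>")` instead of the demo folder.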


In particular, we are interested in the `artifacts/model` directory, which is what we will deploy. The `MLmodel` file describes the "flavors" of that model, including details on how to serve it. This helps deployment tools like Seldon deploy MLflow models in a consistent way, while retaining the ability to train using various frameworks in MLflow.

In [6]:
!cat mlruns/0/$(ls mlruns/0 | head -1)/artifacts/model/MLmodel

artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.8.13
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.1.2
mlflow_version: 1.29.0
model_uuid: ff02293d1cad453b982d8576bf4e2ca1
run_id: 234a355b97144ff78e05d6871ff8812c
utc_time_created: '2022-10-17 10:20:02.111134'


We can also view the dependencies that will be required to serve the model:

In [7]:
!cat mlruns/0/$(ls mlruns/0 | head -1)/artifacts/model/conda.yaml

channels:
- conda-forge
dependencies:
- python=3.8.13
- pip<=22.2.2
- pip:
  - mlflow
  - cloudpickle==2.2.0
  - psutil==5.9.2
  - scikit-learn==1.1.2
  - typing-extensions==4.4.0
name: mlflow-env


Together, the `MLmodel` and `conda.yaml` files describe how the model was stored and which dependencies it needs. With this information (plus the other files in the folder), we are able to load the model back. Seldon's MLflow server will use this information to serve this model.

### 4. Push artefacts to Google Storage

We'll first select the model that scored the best in terms of `rmse`, and get the run ID

In [17]:
runs_df = mlflow.search_runs(filter_string="metrics.rmse < 1")
runs_df

| | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.r2 | metrics.rmse | metrics.mae | params.alpha | params.l1_ratio | tags.mlflow.source.type | tags.mlflow.source.name | tags.mlflow.user | tags.mlflow.log-model.history | tags.mlflow.source.git.commit | tags.mlflow.runName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | dffb498913d8423a93998d93b6f85817 | 0 | FINISHED | file:///home/andrew/seldon/repos/deploy-worksh... | 2022-10-17 10:23:14.444000+00:00 | 2022-10-17 10:23:15.448000+00:00 | 0.279938 | 0.712883 | 0.54622 | 0.1 | 0.1 | LOCAL | /home/andrew/.asdf/installs/python/miniconda3-... | andrew | [{"run_id": "dffb498913d8423a93998d93b6f85817"... | 7e7a1faa70e1aa58c5321be0f6044d19eaf4dfc6 | adventurous-ray-888 |
| 1 | 3239a7e5a42141138b1afcf65a0390f5 | 0 | FINISHED | file:///home/andrew/seldon/repos/deploy-worksh... | 2022-10-17 10:23:02.334000+00:00 | 2022-10-17 10:23:03.423000+00:00 | 0.237395 | 0.73364 | 0.564384 | 0.2 | 0.2 | LOCAL | /home/andrew/.asdf/installs/python/miniconda3-... | andrew | [{"run_id": "3239a7e5a42141138b1afcf65a0390f5"... | 7e7a1faa70e1aa58c5321be0f6044d19eaf4dfc6 | efficient-eel-612 |
| 2 | 234a355b97144ff78e05d6871ff8812c | 0 | FINISHED | file:///home/andrew/seldon/repos/deploy-worksh... | 2022-10-17 10:20:02.083000+00:00 | 2022-10-17 10:20:03.512000+00:00 | 0.108626 | 0.793164 | 0.627195 | 0.5 | 0.5 | LOCAL | /home/andrew/.asdf/installs/python/miniconda3-... | andrew | [{"run_id": "234a355b97144ff78e05d6871ff8812c"... | 7e7a1faa70e1aa58c5321be0f6044d19eaf4dfc6 | fortunate-owl-638 |


In [19]:
run_id = runs_df.loc[runs_df['metrics.rmse'].idxmin()]['run_id']
run_id

'dffb498913d8423a93998d93b6f85817'

In [23]:
!gsutil cp -r mlruns/0/$run_id/artifacts/model/ gs://andrew-seldon/mlflow/elastic_wine_$run_id

Copying file://./mlruns/0/dffb498913d8423a93998d93b6f85817/artifacts/model/model.pkl [Content-Type=application/octet-stream]...
Copying file://./mlruns/0/dffb498913d8423a93998d93b6f85817/artifacts/model/conda.yaml [Content-Type=application/octet-stream]...
Copying file://./mlruns/0/dffb498913d8423a93998d93b6f85817/artifacts/model/MLmodel [Content-Type=application/octet-stream]...
Copying file://./mlruns/0/dffb498913d8423a93998d93b6f85817/artifacts/model/requirements.txt [Content-Type=text/plain]...
- [4 files][  1.6 KiB/  1.6 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying file://./mlruns/0/dffb498913d8423a93998d93b6f85817/artifacts/model/python_env.yaml [Content-Type=application/octet-stream]...
- [5 files][  1.7 KiB/  1.7 K

In [25]:
!gsutil ls gs://andrew-seldon/mlflow/elastic_wine_$run_id

gs://andrew-seldon/mlflow/elastic_wine_dffb498913d8423a93998d93b6f85817/MLmodel
gs://andrew-seldon/mlflow/elastic_wine_dffb498913d8423a93998d93b6f85817/conda.yaml
gs://andrew-seldon/mlflow/elastic_wine_dffb498913d8423a93998d93b6f85817/model.pkl
gs://andrew-seldon/mlflow/elastic_wine_dffb498913d8423a93998d93b6f85817/python_env.yaml
gs://andrew-seldon/mlflow/elastic_wine_dffb498913d8423a93998d93b6f85817/requirements.txt


### 5. Deploy via the SDK

We will now deploy the model to Seldon Deploy using the SDK. Because the MLflow server needs to create the conda environment before it can serve the model, we propose two alternative methods for deployment:
1. Manually set the liveness and readiness probes; deploy using the `seldon` protocol
2. Conda-pack the environment; deploy using the `v2 protocol`

Create a function to authenticate against the cluster. Remember to replace the value of `SD_IP` with your cluster IP.

In [26]:
SD_IP = "34.74.124.230"

config = Configuration()
config.host = f"http://{SD_IP}/seldon-deploy/api/v1alpha1"
config.oidc_client_id = "sd-api"
config.oidc_server = f"http://{SD_IP}/auth/realms/deploy-realm"
config.oidc_client_secret = "sd-api-secret"
config.auth_method = "client_credentials"

def auth():
    auth = OIDCAuthenticator(config)
    config.id_token = auth.authenticate()
    api_client = ApiClient(configuration=config, authenticator=auth)
    return api_client

#### Method 1: Deploy with the Seldon protocol

During initialisation, the built-in reusable server will create the Conda environment specified in your `conda.yaml` file. However, because this build can take some time, we must extend the `initialDelaySeconds` and `failureThreshold` for the probes on our deployment.

To use the built-in MLflow server, the following prerequisites need to be met:

* Your `MLmodel` artifact folder needs to be accessible remotely (e.g. in Google Storage).

* Your model needs to be compatible with the `python_function` flavour.

* Your `MLproject` environment needs to be specified using Conda.

Note that your cluster will also need to be able to access the internet to download all the requirements in the conda environment.

Please set your name below to avoid namespace clashes in deployment:

In [27]:
your_name = "andrew"

In [28]:
DEPLOYMENT_NAME = f"wine-{your_name}-seldon"
NAMESPACE = "seldon-demos"
MODEL_LOCATION = f"gs://andrew-seldon/mlflow/elastic_wine_{run_id}"

mldeployment = {
  "apiVersion": "machinelearning.seldon.io/v1alpha2",
  "kind": "SeldonDeployment",
  "metadata": {
    "name": f"{DEPLOYMENT_NAME}",
    "namespace": f"{NAMESPACE}"
  },
  "spec": {
    "name": f"{DEPLOYMENT_NAME}",
    "protocol":"seldon",
    "predictors": [
      {
        "componentSpecs": [
          {
            "spec": {
              "containers": [
                {
                  "name": f"{DEPLOYMENT_NAME}-container",
                  # We are setting high failureThreshold as installing conda dependencies
                  # can take a long time and we want to avoid k8s killing the container prematurely
                  "livenessProbe": {
                    "initialDelaySeconds": 100,
                    "failureThreshold": 300,
                    "periodSeconds": 5,
                    "successThreshold": 1,
                    "httpGet": {
                      "path": "/health/ping",
                      "port": "http",
                      "scheme": "HTTP"
                    }
                  },
                  "readinessProbe": {
                    "initialDelaySeconds": 100,
                    "failureThreshold": 300,
                    "periodSeconds": 5,
                    "successThreshold": 1,
                    "httpGet": {
                      "path": "/health/ping",
                      "port": "http",
                      "scheme": "HTTP"
                    }
                  }
                }
              ]
            }
          }
        ],
        "graph": {
          "children": [],
          "implementation": "MLFLOW_SERVER",
          "modelUri": f"{MODEL_LOCATION}",
          "name": f"{DEPLOYMENT_NAME}-container"
        },
        "name": "default",
        "replicas": 1
      }
    ]
  }
}
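
As a rough guide to what these probe settings allow: Kubernetes waits `initialDelaySeconds` before the first probe, then probes every `periodSeconds`, and acts only after `failureThreshold` consecutive failures. The sketch below builds the same probe dictionary with a helper and estimates that time budget. The helper names `make_probe` and `startup_budget_seconds` are hypothetical, and the estimate ignores probe timeouts, so treat it as an approximation.

```python
def make_probe(initial_delay=100, failure_threshold=300, period=5):
    """Build a probe dict matching the liveness/readiness settings used above."""
    return {
        "initialDelaySeconds": initial_delay,
        "failureThreshold": failure_threshold,
        "periodSeconds": period,
        "successThreshold": 1,
        "httpGet": {"path": "/health/ping", "port": "http", "scheme": "HTTP"},
    }


def startup_budget_seconds(probe):
    """Approximate time the container has before the probe is considered failed."""
    return probe["initialDelaySeconds"] + probe["failureThreshold"] * probe["periodSeconds"]


probe = make_probe()
print(startup_budget_seconds(probe))  # 100 + 300 * 5 = 1600 seconds
```

With the values used here, the server has roughly 26 minutes to finish building its conda environment. Using one helper for both `livenessProbe` and `readinessProbe` also avoids the two blocks drifting apart.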

In [29]:
deployment_api = SeldonDeploymentsApi(auth())
deployment_api.create_seldon_deployment(namespace=NAMESPACE, mldeployment=mldeployment)

{'api_version': 'machinelearning.seldon.io/v1alpha2',
 'kind': 'SeldonDeployment',
 'metadata': {'annotations': None,
              'cluster_name': None,
              'creation_timestamp': None,
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': None,
              'generate_name': None,
              'generation': None,
              'labels': None,
              'managed_fields': None,
              'name': 'wine-andrew-seldon',
              'namespace': 'seldon-demos',
              'owner_references': None,
              'resource_version': None,
              'self_link': None,
              'uid': None},
 'spec': {'annotations': None,
          'name': 'wine-andrew-seldon',
          'oauth_key': None,
          'oauth_secret': None,
          'predictors': [{'annotations': None,
                          'component_specs': [{'hpa_spec': None,
                                               'keda_spec': None

Wait for the deployment to become available:

In [None]:
import time

print("Waiting for Seldon Deployment to become available")
deployment_status = 'Not Ready'
while deployment_status != "Available":
    try:
        api_response = deployment_api.read_seldon_deployment(DEPLOYMENT_NAME, NAMESPACE)
        print(f"Deployment Status: {api_response.status.state}")
        deployment_status = api_response.status.state
    except ApiException as e:
        print("Exception when calling SeldonDeploymentsApi->read_seldon_deployment: %s\n" % e)
    # Pause between polls to avoid hammering the API
    if deployment_status != "Available":
        time.sleep(5)
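
A polling loop like this can also be given an upper bound, so that a deployment that never becomes available does not block the notebook forever. The following is a minimal sketch; `wait_for_state` is a hypothetical helper, and the demo uses a stand-in state source instead of a real cluster.

```python
import time


def wait_for_state(get_state, target="Available", timeout=600.0, interval=5.0, sleep=time.sleep):
    """Poll get_state() until it returns target, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"State was {state!r} after {timeout} seconds")
        sleep(interval)


# Demo with a stand-in state source (no cluster needed)
states = iter(["Creating", "Creating", "Available"])
print(wait_for_state(lambda: next(states), interval=0.0))

# With the SDK objects defined earlier, the call might look like:
# wait_for_state(
#     lambda: deployment_api.read_seldon_deployment(DEPLOYMENT_NAME, NAMESPACE).status.state
# )
```

For Method 1, a generous `timeout` is sensible because building the conda environment can take several minutes.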

You can now test a prediction in the Seldon Deploy UI by using the following payload:

```
{"data": {"names": [], "ndarray": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]}}
```

You should see a 200 response.
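
The same payload can be built programmatically. The sketch below is illustrative: `build_seldon_payload` and `seldon_predict_url` are hypothetical helpers, and the URL assumes the standard Seldon Core ingress path for the `seldon` protocol (`/api/v1.0/predictions`), which may differ depending on how your cluster is configured.

```python
import json


def build_seldon_payload(rows, names=None):
    """Wrap rows of feature values in the seldon-protocol `ndarray` format."""
    return {
        "data": {
            "names": list(names or []),
            "ndarray": [[float(v) for v in row] for row in rows],
        }
    }


def seldon_predict_url(host, namespace, deployment):
    """Assumed REST endpoint for a seldon-protocol deployment."""
    return f"http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"


row = [7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]
payload = build_seldon_payload([row])
print(json.dumps(payload))

# Sending it would then look like (requires network access to the cluster):
# requests.post(seldon_predict_url(SD_IP, NAMESPACE, DEPLOYMENT_NAME), json=payload)
```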

#### Method 2: Deploy with the v2 protocol
In order to use the v2 protocol, it is best to use [`conda-pack`](https://conda.github.io/conda-pack/) to locally save the conda environment (including mlserver) to a tar file.  The initialiser can then use this to install required dependencies into the container.  We are planning to simplify this workflow in future releases.

Note that if you want to pack a specific conda environment, you can specify that environment like so: 

`conda pack -n my_env -o mlruns/0/$run_id/artifacts/model/environment.tar.gz -f`

In [30]:
!conda pack -o mlruns/0/$run_id/artifacts/model/environment.tar.gz -f

Collecting packages...
Packing environment at '/home/andrew/.asdf/installs/python/miniconda3-4.7.12/envs/mlflow-test' to 'mlruns/0/dffb498913d8423a93998d93b6f85817/artifacts/model/environment.tar.gz'
[########################################] | 100% Completed | 18.7s
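
Before uploading, it can be worth confirming that the packed archive really contains what the server needs. The sketch below lists the archive members with the standard `tarfile` module. It assumes a Linux-style conda layout in which console scripts such as `mlserver` live under `bin/`; the helper names are hypothetical, and the demo builds a tiny stand-in archive rather than a real environment.

```python
import io
import os
import tarfile
import tempfile


def archive_contains(archive_path, suffix):
    """Return True if any member of the .tar.gz archive ends with `suffix`."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return any(name.endswith(suffix) for name in tar.getnames())


def make_demo_archive(path):
    """Create a tiny stand-in archive with two fake executables."""
    with tarfile.open(path, "w:gz") as tar:
        for name in ("bin/python", "bin/mlserver"):
            data = b"#!/bin/sh\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "environment.tar.gz")
    make_demo_archive(demo)
    print(archive_contains(demo, "bin/mlserver"))
    print(archive_contains(demo, "bin/does-not-exist"))
```

On the real archive, the call would be `archive_contains(f"mlruns/0/{run_id}/artifacts/model/environment.tar.gz", "bin/mlserver")`; note that scanning a large archive can take several seconds.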


Push the environment.tar.gz file to Google Storage

In [31]:
!gsutil cp -r mlruns/0/$run_id/artifacts/model/environment.tar.gz gs://andrew-seldon/mlflow/elastic_wine_$run_id/

Copying file://mlruns/0/dffb498913d8423a93998d93b6f85817/artifacts/model/environment.tar.gz [Content-Type=application/x-tar]...
==> NOTE: You are uploading one or more large file(s), which would run          
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

/ [1 files][217.6 MiB/217.6 MiB]                                                
Operation completed over 1 objects/217.6 MiB.                                    


It should be enough to simply specify the v2 protocol.  You no longer need to adjust the liveness and readiness probes.  Note that this deployment can also be done via the UI in Seldon Deploy.

In [32]:
DEPLOYMENT_NAME = f"wine-{your_name}-v2"
NAMESPACE = "seldon-demos"
MODEL_LOCATION = f"gs://andrew-seldon/mlflow/elastic_wine_{run_id}"


mldeployment = {
  "apiVersion": "machinelearning.seldon.io/v1alpha2",
  "kind": "SeldonDeployment",
  "metadata": {
    "name": f"{DEPLOYMENT_NAME}",
    "namespace": f"{NAMESPACE}"
  },
  "spec": {
    "protocol": "v2",
    "name": f"{DEPLOYMENT_NAME}",
    "predictors": [
      {
        "graph": {
          "children": [],
          "implementation": "MLFLOW_SERVER",
          "modelUri": f"{MODEL_LOCATION}",
          "name": f"{DEPLOYMENT_NAME}-container"
        },
        "name": "default",
        "replicas": 1
      }
    ]
  }
}

In [33]:
deployment_api = SeldonDeploymentsApi(auth())
deployment_api.create_seldon_deployment(namespace=NAMESPACE, mldeployment=mldeployment)

{'api_version': 'machinelearning.seldon.io/v1alpha2',
 'kind': 'SeldonDeployment',
 'metadata': {'annotations': None,
              'cluster_name': None,
              'creation_timestamp': None,
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': None,
              'generate_name': None,
              'generation': None,
              'labels': None,
              'managed_fields': None,
              'name': 'wine-andrew-v2',
              'namespace': 'seldon-demos',
              'owner_references': None,
              'resource_version': None,
              'self_link': None,
              'uid': None},
 'spec': {'annotations': None,
          'name': 'wine-andrew-v2',
          'oauth_key': None,
          'oauth_secret': None,
          'predictors': [{'annotations': None,
                          'component_specs': None,
                          'engine_resources': {'limits': None,
                       

Wait until the deployment becomes available:

In [36]:
import time

print("Waiting for Seldon Deployment to become available")
deployment_status = 'Not Ready'
while deployment_status != "Available":
    try:
        api_response = deployment_api.read_seldon_deployment(DEPLOYMENT_NAME, NAMESPACE)
        print(f"Deployment Status: {api_response.status.state}")
        deployment_status = api_response.status.state
    except ApiException as e:
        print("Exception when calling SeldonDeploymentsApi->read_seldon_deployment: %s\n" % e)
    # Pause between polls to avoid hammering the API
    if deployment_status != "Available":
        time.sleep(5)

Waiting for Seldon Deployment to become available
Deployment Status: Available


Once the model is deployed, you can now make an inference request.

For the v2 protocol, the endpoint is constructed as follows: 

> http://`host`/seldon/`namespace`/`deployment-name`/v2/models/`model-name`/infer

In [44]:
endpoint = f"http://{SD_IP}/seldon/{NAMESPACE}/{DEPLOYMENT_NAME}/v2/models/{DEPLOYMENT_NAME}-container/infer"
endpoint

'http://34.74.124.230/seldon/seldon-demos/wine-andrew-v2/v2/models/wine-andrew-v2-container/infer'

The v2 protocol is slightly more verbose than the seldon protocol.  The full API spec can be found [here](https://docs.seldon.io/projects/seldon-core/en/latest/reference/apis/v2-protocol.html).

An array of `inputs` is required, with each input requiring a `name`, `shape`, `datatype`, and `data`. Optionally, `parameters` can be used to specify the `content_type`.

In [135]:
inference_request = {
    "parameters": {
        "content_type": "pd"
    },
    "inputs": [
        {
          "name": "fixed acidity",
          "shape": [1],
          "datatype": "FP32",
          "data": [7.4]
        },
        {
          "name": "volatile acidity",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.7000]
        },
        {
          "name": "citric acid",
          "shape": [1],
          "datatype": "FP32",
          "data": [0]
        },
        {
          "name": "residual sugar",
          "shape": [1],
          "datatype": "FP32",
          "data": [1.9]
        },
        {
          "name": "chlorides",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.076]
        },
        {
          "name": "free sulfur dioxide",
          "shape": [1],
          "datatype": "FP32",
          "data": [11]
        },
        {
          "name": "total sulfur dioxide",
          "shape": [1],
          "datatype": "FP32",
          "data": [34]
        },
        {
          "name": "density",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.9978]
        },
        {
          "name": "pH",
          "shape": [1],
          "datatype": "FP32",
          "data": [3.51]
        },
        {
          "name": "sulphates",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.56]
        },
        {
          "name": "alcohol",
          "shape": [1],
          "datatype": "FP32",
          "data": [9.4]
        }
    ]
}
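
Since every input follows the same pattern, the request can also be generated from a mapping of column names to values. This is a minimal sketch using plain dictionaries; `build_v2_request` is a hypothetical helper, and `FP32` is used as the default only to match the handwritten request above.

```python
def build_v2_request(columns, datatype="FP32"):
    """Build a v2 inference request with one input per column."""
    inputs = []
    for name, values in columns.items():
        values = [float(v) for v in values]
        inputs.append(
            {"name": name, "shape": [len(values)], "datatype": datatype, "data": values}
        )
    # content_type "pd" asks the server to decode the inputs as a pandas DataFrame
    return {"parameters": {"content_type": "pd"}, "inputs": inputs}


first_row = {
    "fixed acidity": [7.4],
    "volatile acidity": [0.7],
    "citric acid": [0.0],
    "residual sugar": [1.9],
    "chlorides": [0.076],
    "free sulfur dioxide": [11],
    "total sulfur dioxide": [34],
    "density": [0.9978],
    "pH": [3.51],
    "sulphates": [0.56],
    "alcohol": [9.4],
}
generated_request = build_v2_request(first_row)
print(len(generated_request["inputs"]), generated_request["inputs"][2])
```

The input names need to match the column names the model was trained on, which is why they are copied verbatim from the dataset header.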

In [136]:
response = requests.post(endpoint, json=inference_request)
print(json.dumps(response.json(), indent=2))

{
  "model_name": "wine-andrew-v2-container",
  "model_version": "v1",
  "id": "c734ed0c-4bb4-4346-babe-74a2fba1b139",
  "parameters": {
    "content_type": null,
    "headers": null
  },
  "outputs": [
    {
      "name": "output-1",
      "shape": [
        1
      ],
      "datatype": "FP64",
      "parameters": null,
      "data": [
        5.224716827480625
      ]
    }
  ]
}
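
To pull the predictions out of a response like this one, a small helper can index the outputs by name and check that each `data` list agrees with its declared `shape`. The sketch below works on an abridged copy of the response above; `outputs_by_name` is a hypothetical helper.

```python
from functools import reduce
from operator import mul


def outputs_by_name(response_json):
    """Map each output name to its data, checking data length against shape."""
    result = {}
    for output in response_json.get("outputs", []):
        expected = reduce(mul, output["shape"], 1)
        if len(output["data"]) != expected:
            raise ValueError(f"Output {output['name']!r} does not match its shape")
        result[output["name"]] = output["data"]
    return result


# Abridged copy of the response shown above
example_response = {
    "model_name": "wine-andrew-v2-container",
    "outputs": [
        {"name": "output-1", "shape": [1], "datatype": "FP64", "data": [5.224716827480625]}
    ],
}
print(outputs_by_name(example_response)["output-1"][0])
```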


### 6. Using MLServer codecs

Crafting the payload is quite a lot of effort to do manually.  MLServer builds upon the v2 inference protocol and adds the concept of **codecs**.  This enables you to easily encode your data (e.g. pandas dataframes, numpy arrays, strings, etc.) into the v2 protocol, and decode from the v2 protocol back into your preferred data type.  More details on this can be found in the MLServer docs [here](https://mlserver.readthedocs.io/en/latest/user-guide/content-type.html).

In [150]:
from mlserver.types import InferenceRequest, InferenceResponse
from mlserver.codecs import PandasCodec, NumpyCodec

Let's grab the first 5 rows of our original wine data, removing the target `quality`:

In [145]:
pd_input = data.drop(["quality"], axis=1).head(5)
pd_input

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 |


Next we will encode the request from a pandas dataframe to the v2 protocol using the `PandasCodec`:

In [152]:
request = PandasCodec.encode_request(pd_input)
request.parameters = {"content_type": "pd"}
request.dict()

{'id': None,
 'parameters': {'content_type': 'pd'},
 'inputs': [{'name': 'fixed acidity',
   'shape': [5],
   'datatype': 'FP64',
   'parameters': None,
   'data': [7.4, 7.8, 7.8, 11.2, 7.4]},
  {'name': 'volatile acidity',
   'shape': [5],
   'datatype': 'FP64',
   'parameters': None,
   'data': [0.7, 0.88, 0.76, 0.28, 0.7]},
  {'name': 'citric acid',
   'shape': [5],
   'datatype': 'FP64',
   'parameters': None,
   'data': [0.0, 0.0, 0.04, 0.56, 0.0]},
  {'name': 'residual sugar',
   'shape': [5],
   'datatype': 'FP64',
   'parameters': None,
   'data': [1.9, 2.6, 2.3, 1.9, 1.9]},
  {'name': 'chlorides',
   'shape': [5],
   'datatype': 'FP64',
   'parameters': None,
   'data': [0.076, 0.098, 0.092, 0.075, 0.076]},
  {'name': 'free sulfur dioxide',
   'shape': [5],
   'datatype': 'FP64',
   'parameters': None,
   'data': [11.0, 25.0, 15.0, 17.0, 11.0]},
  {'name': 'total sulfur dioxide',
   'shape': [5],
   'datatype': 'FP64',
   'parameters': None,
   'data': [34.0, 67.0, 54.0, 60.0,

We can now make our request, as we did before:

In [147]:
response = requests.post(endpoint, json=request.dict())
print(json.dumps(response.json(), indent=2))

{
  "model_name": "wine-andrew-v2-container",
  "model_version": "v1",
  "id": "be95825b-462f-4794-b250-24167b184490",
  "parameters": {
    "content_type": null,
    "headers": null
  },
  "outputs": [
    {
      "name": "output-1",
      "shape": [
        5
      ],
      "datatype": "FP64",
      "parameters": null,
      "data": [
        5.224716933810146,
        5.319611374845643,
        5.317555257829017,
        5.612144611164206,
        5.224716933810146
      ]
    }
  ]
}


Finally, we parse the output response by turning it into an `InferenceResponse` object, and decode it using the `NumpyCodec`:

In [151]:
response_payload = InferenceResponse.parse_raw(response.text)
print(NumpyCodec.decode_output(response_payload.outputs[0]))

[5.22471693 5.31961137 5.31755526 5.61214461 5.22471693]
