# Serving Alibi-Detect models

Out of the box, `mlserver` supports the deployment and serving of `alibi-detect` models.

In this example, we will show how to create a drift detector configuration and then serve it using `mlserver`.

## Training

The first step will be to train a simple `alibi-detect` model.
For that, we will use the [Income Classifier example from the `alibi-detect` documentation](https://docs.seldon.io/projects/alibi-detect/en/latest/examples/cd_chi2ks_adult.html), which trains a drift detector.

```python
import alibi
import numpy as np
```

```python
adult = alibi.datasets.fetch_adult()
X, y = adult.data, adult.target
feature_names = adult.feature_names
category_map = adult.category_map
X.shape, y.shape
```

```
((32561, 12), (32561,))
```

```python
n_ref = 10000
n_test = 10000

# Split the data into a reference set and two consecutive test sets
X_ref, X_t0, X_t1 = X[:n_ref], X[n_ref:n_ref + n_test], X[n_ref + n_test:n_ref + 2 * n_test]
X_ref.shape, X_t0.shape, X_t1.shape
```

```
((10000, 12), (10000, 12), (10000, 12))
```

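```python
# Mark every feature in category_map as categorical; None means the
# possible categories are inferred from the reference data
categories_per_feature = {f: None for f in category_map.keys()}
```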

### Saving our reference data

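```python
import os
import pickle

os.makedirs('alibi-detector-artifacts', exist_ok=True)  # make sure the directory exists
filepath = 'alibi-detector-artifacts/ref_data.pkl'  # referenced by `uri` in model-settings.json
with open(filepath, 'wb') as f:
    pickle.dump(X_ref, f)
```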

## Serving

Now that we have saved our reference data, the next step will be to serve a drift detector built from it using `mlserver`.
For that, we will need to create two configuration files:

- `settings.json`: holds the configuration of our server (e.g. ports, log level, etc.).
- `model-settings.json`: holds the configuration of our model (e.g. input type, runtime to use, etc.).
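JSON object keys are always strings, so the integer feature indices used by `categories_per_feature` in Python become `"1"`, `"2"`, and so on in `model-settings.json`. Rather than typing that block by hand, a sketch like the one below could generate it. The helper name `build_model_settings` is hypothetical and the field values simply mirror the configuration used in this example; `json.dumps` would stringify integer keys anyway, but converting them explicitly keeps the in-memory dict identical to what is read back.

```python
import json


def build_model_settings(categorical_indices, uri, p_val=0.05, protocol="kfserving.http"):
    """Hypothetical helper: build the model-settings dict used in this example.

    Each categorical column index maps to None, so the categories are
    inferred from the reference data.
    """
    return {
        "name": "income-classifier-cd",
        "implementation": "mlserver_alibi_detect.TabularDriftDetector",
        "parameters": {
            "uri": uri,
            "version": "v0.1.0",
            "initParameters": {
                "protocol": protocol,
                "p_val": p_val,
                # Explicit str keys: identical to what json.loads returns later
                "categories_per_feature": {str(i): None for i in categorical_indices},
            },
            "predictParameters": {"drift_type": "feature"},
        },
    }


settings = build_model_settings([1, 2, 3, 4, 5, 6, 7, 11], "./alibi-detector-artifacts/ref_data.pkl")
print(json.dumps(settings, indent=2))
```

In the notebook, `list(category_map.keys())` could be passed instead of the hard-coded index list.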

### `settings.json`

```python
%%writefile settings.json
{
    "debug": true
}
```

```
Overwriting settings.json
```


### `model-settings.json`

```python
%%writefile model-settings.json
{
  "name": "income-classifier-cd",
  "implementation": "mlserver_alibi_detect.TabularDriftDetector",
  "parameters": {
    "uri": "./alibi-detector-artifacts/ref_data.pkl",
    "version": "v0.1.0",
    "initParameters": {
      "protocol": "kfserving.http",
      "p_val": 0.05,
      "categories_per_feature": {
        "1": null,
        "2": null,
        "3": null,
        "4": null,
        "5": null,
        "6": null,
        "7": null,
        "11": null
      }
    },
    "predictParameters": {
      "drift_type": "feature"
    }
  }
}
```

```
Overwriting model-settings.json
```


### Start serving our model

Now that we have our config in place, we can start the server by running `mlserver start .`. The command must either be run from the directory that contains our config files, or be given the path to that folder instead of `.`.

```shell
mlserver start .
```

Since this command starts the server and blocks the terminal while it waits for requests, it needs to be run in the background or in a separate terminal.
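Because the server starts in another terminal, a client may want to wait until it is ready before sending requests. The sketch below polls the V2 readiness endpoint `/v2/health/ready`, which MLServer exposes as part of the V2 inference protocol. The helper name `wait_until_ready` and the timeout values are assumptions chosen for illustration.

```python
import time
import urllib.request


def wait_until_ready(base_url="http://localhost:8080", timeout=30.0, interval=0.5):
    """Poll GET {base_url}/v2/health/ready until it returns HTTP 200.

    Returns True once the server reports ready, or False if `timeout` seconds pass first.
    """
    url = base_url.rstrip("/") + "/v2/health/ready"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeouts and HTTP errors
            # (urllib.error.HTTPError subclasses OSError) all mean "not ready yet"
            pass
        time.sleep(interval)
    return False


# Example: block until a locally started `mlserver start .` is ready
# ready = wait_until_ready("http://localhost:8080", timeout=60)
```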

### Send test inference request

We now have our model served by `mlserver`.
To make sure that everything is working as expected, let's send a request using our test set.

For that, we can use the Python types that `mlserver` provides out of the box, or we can build our request manually.
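Building the request manually amounts to wrapping the data in the V2 protocol's `inputs` list, where each tensor carries a `name`, `shape`, `datatype` and `data`. A minimal stdlib-only sketch, assuming the data is a rectangular 2-D list of rows; the helper name `build_v2_request` is hypothetical, and the defaults (`"predict"`, `"FP32"`) simply mirror the requests in this example.

```python
def build_v2_request(rows, name="predict", datatype="FP32", parameters=None):
    """Wrap a rectangular 2-D list of rows in a V2 inference request body."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be a non-empty, rectangular 2-D list")
    request = {
        "inputs": [
            {
                "name": name,
                "shape": [len(rows), len(rows[0])],
                "datatype": datatype,
                "data": rows,
            }
        ]
    }
    if parameters:
        # Request-level parameters, e.g. {"drift_type": "feature"}
        request["parameters"] = parameters
    return request


req = build_v2_request([[39.0, 7.0], [50.0, 6.0]], parameters={"drift_type": "feature"})
```

With NumPy data, `build_v2_request(X_t0.tolist(), parameters={"drift_type": "feature"})` would produce the same body as the hand-written V2 request in this example.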

## TabularDrift

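First, we build the same `TabularDrift` detector locally. Besides giving us results to compare against, its `n_features` attribute is used below when printing the server's responses.

```python
from alibi_detect.cd import ChiSquareDrift, TabularDrift

# Local detector with the same settings as the served one
cd = TabularDrift(X_ref, p_val=.05, categories_per_feature=categories_per_feature)
preds = cd.predict(X_t0, drift_type="feature")
```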
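`TabularDrift` applies a chi-squared test to the features listed in `categories_per_feature` and a two-sample Kolmogorov-Smirnov (K-S) test to the remaining numerical features; these are the `Chi2` and `K-S` labels in the printouts. To illustrate the K-S distance reported for numerical features, here is a pure-Python sketch of the two-sample K-S statistic (the largest gap between the two empirical CDFs). It is not alibi-detect's implementation, which also computes p-values; `ks_statistic` is a hypothetical name.

```python
from bisect import bisect_right


def ks_statistic(ref, test):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_ref(x) - F_test(x)|."""
    ref_sorted, test_sorted = sorted(ref), sorted(test)
    n_ref, n_test = len(ref_sorted), len(test_sorted)
    d = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # is attained at one of the observed values
    for x in ref_sorted + test_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / n_ref
        cdf_test = bisect_right(test_sorted, x) / n_test
        d = max(d, abs(cdf_ref - cdf_test))
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # → 0.5
```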

Next, we send the `X_t0` batch to the server using the V2 inference protocol:

```python
import requests

inference_request = {
    # Same drift_type as used for the local detector
    "parameters": {"drift_type": "feature"},
    "inputs": [
        {
            "name": "predict",
            "shape": list(X_t0.shape),
            "datatype": "FP32",
            "data": X_t0.tolist(),
        }
    ],
}

endpoint = "http://localhost:8080/v2/models/income-classifier-cd/versions/v0.1.0/infer"
response = requests.post(endpoint, json=inference_request)
response.raise_for_status()
```

```python
response_dict = response.json()
print(response_dict, "\n")

labels = ['No!', 'Yes!']
for f in range(cd.n_features):
    fname = feature_names[f]
    # The V2 response carries one drift flag per feature in its first output
    is_drift = response_dict['outputs'][0]['data'][f]
    print(f'{fname} -- Drift? {labels[is_drift]}')
```

```
{'model_name': 'income-classifier-cd', 'model_version': 'v0.1.0', 'id': '67cb8c4d-3f71-4046-881e-f9fd19c0112d', 'parameters': {'content_type': None, 'detector_type': 'offline', 'data_type': None, 'name': 'TabularDrift'}, 'outputs': [{'name': 'detect', 'shape': [12], 'datatype': 'INT64', 'parameters': None, 'data': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}]} 

Age -- Drift? No!
Workclass -- Drift? No!
Education -- Drift? No!
Marital Status -- Drift? No!
Occupation -- Drift? No!
Relationship -- Drift? No!
Race -- Drift? No!
Sex -- Drift? No!
Capital Gain -- Drift? No!
Capital Loss -- Drift? No!
Hours per week -- Drift? No!
Country -- Drift? No!
```
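In the V2 response, the per-feature drift flags come back as the `data` of the output tensor named `detect`. A small helper could turn that into a `{feature_name: bool}` mapping; the function name `drift_flags_from_v2` is hypothetical, and it assumes the output layout shown in the printed response.

```python
def drift_flags_from_v2(response_dict, feature_names, output_name="detect"):
    """Map each feature name to its drift flag from a V2 inference response."""
    outputs = {o["name"]: o for o in response_dict["outputs"]}
    flags = outputs[output_name]["data"]
    if len(flags) != len(feature_names):
        raise ValueError(f"expected {len(feature_names)} flags, got {len(flags)}")
    return {name: bool(flag) for name, flag in zip(feature_names, flags)}


# Toy response for illustration only (not real server output)
example = {"outputs": [{"name": "detect", "shape": [3], "datatype": "INT64", "data": [0, 1, 0]}]}
print(drift_flags_from_v2(example, ["Age", "Workclass", "Education"]))
# → {'Age': False, 'Workclass': True, 'Education': False}
```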


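We can also send the same V2 payload, this time without request-level parameters, to the server's root endpoint:

```python
import requests

inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": list(X_t0.shape),
            "datatype": "FP32",
            "data": X_t0.tolist(),
        }
    ],
}

endpoint = "http://localhost:8080/"
response = requests.post(endpoint, json=inference_request)
response.raise_for_status()
```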

```python
response_dict = response.json()
print(response_dict, "\n")

labels = ['No!', 'Yes!']
for f in range(cd.n_features):
    # Categorical features use a chi-squared test, numerical ones a K-S test
    stat = 'Chi2' if f in categories_per_feature else 'K-S'
    fname = feature_names[f]
    is_drift = response_dict['data']['is_drift'][f]
    stat_val, p_val = response_dict['data']['distance'][f], response_dict['data']['p_val'][f]
    print(f'{fname} -- Drift? {labels[is_drift]} -- {stat} {stat_val:.3f} -- p-value {p_val:.3f}')
```

```
{'data': {'is_drift': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'distance': [0.011599999852478504, 8.486513137817383, 4.752931594848633, 3.1599440574645996, 8.194136619567871, 0.4845852553844452, 0.5865231156349182, 0.21689055860042572, 0.002400000113993883, 0.0015999999595806003, 0.011599999852478504, 9.991032600402832], 'p_val': [0.5078648328781128, 0.3874434530735016, 0.5758693218231201, 0.3676162362098694, 0.4147398769855499, 0.992676854133606, 0.9645494222640991, 0.6414194703102112, 1.0, 1.0, 0.5078648328781128, 0.4412803649902344], 'threshold': 0.05}, 'meta': {'name': 'TabularDrift', 'detector_type': 'offline', 'data_type': None}} 

Age -- Drift? No! -- K-S 0.012 -- p-value 0.508
Workclass -- Drift? No! -- Chi2 8.487 -- p-value 0.387
Education -- Drift? No! -- Chi2 4.753 -- p-value 0.576
Marital Status -- Drift? No! -- Chi2 3.160 -- p-value 0.368
Occupation -- Drift? No! -- Chi2 8.194 -- p-value 0.415
Relationship -- Drift? No! -- Chi2 0.485 -- p-value 0.993
Race -- Drift? No! -- Chi
```
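The native alibi-detect payload nests everything under `data`: `is_drift`, `distance` and `p_val` are per-feature lists, and `threshold` is the p-value cut-off. The hypothetical helper below collects one row per feature and recomputes each flag as `p_val < threshold`, the same comparison used in the ChiSquareDrift section, raising an error if it disagrees with the returned `is_drift`.

```python
def summarize_drift(response_dict, feature_names):
    """Return (name, is_drift, distance, p_val) rows from an alibi-detect style payload."""
    data = response_dict["data"]
    rows = []
    for i, name in enumerate(feature_names):
        p_val = data["p_val"][i]
        recomputed = int(p_val < data["threshold"])
        # Sanity check: the returned flag should match the threshold comparison
        if recomputed != data["is_drift"][i]:
            raise ValueError(f"inconsistent drift flag for {name}")
        rows.append((name, bool(recomputed), data["distance"][i], p_val))
    return rows


# Values rounded from the first two features of the response above
payload = {"data": {"is_drift": [0, 0], "distance": [0.0116, 8.4865], "p_val": [0.5079, 0.3874], "threshold": 0.05}}
for row in summarize_drift(payload, ["Age", "Workclass"]):
    print(row)
```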

Next, we switch the detector to the `tensorflow.http` protocol by overwriting `model-settings.json`, then restart `mlserver start .` so that the new settings are loaded:

```python
%%writefile model-settings.json
{
  "name": "income-classifier-cd",
  "implementation": "mlserver_alibi_detect.TabularDriftDetector",
  "parameters": {
    "uri": "./alibi-detector-artifacts/ref_data.pkl",
    "version": "v0.1.0",
    "initParameters": {
      "protocol": "tensorflow.http",
      "p_val": 0.05,
      "categories_per_feature": {
        "1": null,
        "2": null,
        "3": null,
        "4": null,
        "5": null,
        "6": null,
        "7": null,
        "11": null
      }
    },
    "predictParameters": {
      "drift_type": "feature"
    }
  }
}
```

```
Overwriting model-settings.json
```


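With `tensorflow.http`, the rows are sent under an `instances` key instead. This time we use the second test batch, `X_t1`:

```python
import requests

inference_request = {
    # TensorFlow-style payload: the rows are sent as a list of instances
    "instances": X_t1.tolist()
}

endpoint = "http://localhost:8080/"
response = requests.post(endpoint, json=inference_request)
response.raise_for_status()
```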

```python
response_dict = response.json()
print(response_dict, "\n")

labels = ['No!', 'Yes!']
for f in range(cd.n_features):
    # Categorical features use a chi-squared test, numerical ones a K-S test
    stat = 'Chi2' if f in categories_per_feature else 'K-S'
    fname = feature_names[f]
    is_drift = response_dict['data']['is_drift'][f]
    stat_val, p_val = response_dict['data']['distance'][f], response_dict['data']['p_val'][f]
    print(f'{fname} -- Drift? {labels[is_drift]} -- {stat} {stat_val:.3f} -- p-value {p_val:.3f}')
```

```
{'data': {'is_drift': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'distance': [0.007000000216066837, 5.799564838409424, 5.412807464599609, 1.1667115688323975, 12.295819282531738, 6.5198540687561035, 1.4171992540359497, 0.008140486665070057, 0.005400000140070915, 0.0032999999821186066, 0.0044999998062849045, 11.255668640136719], 'p_val': [0.9656426310539246, 0.6696720719337463, 0.49205708503723145, 0.7609988451004028, 0.1384851485490799, 0.2588665187358856, 0.8412002921104431, 0.9281086921691895, 0.998472273349762, 1.0, 0.9999523758888245, 0.33794912695884705], 'threshold': 0.05}, 'meta': {'name': 'TabularDrift', 'detector_type': 'offline', 'data_type': None}} 

Age -- Drift? No! -- K-S 0.007 -- p-value 0.966
Workclass -- Drift? No! -- Chi2 5.800 -- p-value 0.670
Education -- Drift? No! -- Chi2 5.413 -- p-value 0.492
Marital Status -- Drift? No! -- Chi2 1.167 -- p-value 0.761
Occupation -- Drift? No! -- Chi2 12.296 -- p-value 0.138
Relationship -- Drift? No! -- Chi2 6.520 -- p-value 0.259
Rac
```

## ChiSquareDrift

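Finally, we serve a `ChiSquareDrift` detector on the categorical features only. We start by selecting those columns:

```python
# Keep only the categorical columns
cols = list(category_map.keys())
cat_names = [feature_names[c] for c in cols]
X_ref_cat, X_t0_cat = X_ref[:, cols], X_t0[:, cols]
X_ref_cat.shape, X_t0_cat.shape
```

```
((10000, 8), (10000, 8))
```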

As before, we first run the detector locally:

```python
cd = ChiSquareDrift(X_ref_cat, p_val=.05)
preds = cd.predict(X_t0_cat, drift_type="feature")
print(f"Threshold {preds['data']['threshold']}")

labels = ['No!', 'Yes!']
stat = 'Chi2'  # every feature here is categorical
for f in range(cd.n_features):
    fname = cat_names[f]
    is_drift = int(preds['data']['p_val'][f] < preds['data']['threshold'])
    stat_val, p_val = preds['data']['distance'][f], preds['data']['p_val'][f]
    print(f'{fname} -- Drift? {labels[is_drift]} -- {stat} {stat_val:.3f} -- p-value {p_val:.3f}')
```

```
Threshold 0.05
Workclass -- Drift? No! -- Chi2 8.487 -- p-value 0.387
Education -- Drift? No! -- Chi2 4.753 -- p-value 0.576
Marital Status -- Drift? No! -- Chi2 3.160 -- p-value 0.368
Occupation -- Drift? No! -- Chi2 8.194 -- p-value 0.415
Relationship -- Drift? No! -- Chi2 0.485 -- p-value 0.993
Race -- Drift? No! -- Chi2 0.587 -- p-value 0.965
Sex -- Drift? No! -- Chi2 0.217 -- p-value 0.641
Country -- Drift? No! -- Chi2 9.991 -- p-value 0.441
```
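For categorical features, a chi-squared homogeneity test compares how often each category occurs in the reference and test sets, arranged as a 2 × k contingency table (one row per sample, one column per category). The pure-Python sketch below computes only Pearson's statistic for such a table, under the hypothetical name `chi2_statistic`; it illustrates the idea and is not alibi-detect's code. For a 2 × k table the statistic has k - 1 degrees of freedom, from which a p-value would be derived.

```python
from collections import Counter


def chi2_statistic(ref, test):
    """Pearson chi-squared statistic for the 2 x k table of category counts in ref vs. test."""
    if not ref or not test:
        raise ValueError("ref and test must both be non-empty")
    ref_counts, test_counts = Counter(ref), Counter(test)
    n_ref, n_test = len(ref), len(test)
    total = n_ref + n_test
    stat = 0.0
    # One column per category seen in either sample
    for category in set(ref_counts) | set(test_counts):
        col_total = ref_counts[category] + test_counts[category]
        for observed, row_total in ((ref_counts[category], n_ref), (test_counts[category], n_test)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


# 10/20 vs. 20/10 split over two categories gives 20/3
print(round(chi2_statistic(["a"] * 10 + ["b"] * 20, ["a"] * 20 + ["b"] * 10), 4))  # → 6.6667
```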


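Next, we save the categorical reference data and update `model-settings.json` to serve a `ChiSquareDriftDetector` on it:

```python
import pickle

filepath = 'alibi-detector-artifacts/ref_cat_data.pkl'  # referenced by `uri` in model-settings.json
with open(filepath, 'wb') as f:
    pickle.dump(X_ref_cat, f)
```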

```python
%%writefile model-settings.json
{
  "name": "income-classifier-cd",
  "implementation": "mlserver_alibi_detect.ChiSquareDriftDetector",
  "parameters": {
    "uri": "./alibi-detector-artifacts/ref_cat_data.pkl",
    "version": "v0.1.0",
    "initParameters": {
      "protocol": "tensorflow.http",
      "p_val": 0.05
    },
    "predictParameters": {
      "drift_type": "feature"
    }
  }
}
```

```
Overwriting model-settings.json
```


```python
import requests

inference_request = {
    # TensorFlow-style payload with the categorical columns only
    "instances": X_t0_cat.tolist()
}

endpoint = "http://localhost:8080/"
response = requests.post(endpoint, json=inference_request)
response.raise_for_status()
```

```python
response_dict = response.json()
print(response_dict, "\n")

labels = ['No!', 'Yes!']
stat = 'Chi2'  # ChiSquareDrift only tests categorical features
for f in range(cd.n_features):
    fname = cat_names[f]
    is_drift = response_dict['data']['is_drift'][f]
    stat_val, p_val = response_dict['data']['distance'][f], response_dict['data']['p_val'][f]
    print(f'{fname} -- Drift? {labels[is_drift]} -- {stat} {stat_val:.3f} -- p-value {p_val:.3f}')
```

```
{'data': {'is_drift': [0, 0, 0, 0, 0, 0, 0, 0], 'distance': [8.486513137817383, 4.752931594848633, 3.1599440574645996, 8.194136619567871, 0.4845852553844452, 0.5865231156349182, 0.21689055860042572, 9.991032600402832], 'p_val': [0.3874434530735016, 0.5758693218231201, 0.3676162362098694, 0.4147398769855499, 0.992676854133606, 0.9645494222640991, 0.6414194703102112, 0.4412803649902344], 'threshold': 0.05}, 'meta': {'name': 'ChiSquareDrift', 'detector_type': 'offline', 'data_type': None}} 

Workclass -- Drift? No! -- Chi2 8.487 -- p-value 0.387
Education -- Drift? No! -- Chi2 4.753 -- p-value 0.576
Marital Status -- Drift? No! -- Chi2 3.160 -- p-value 0.368
Occupation -- Drift? No! -- Chi2 8.194 -- p-value 0.415
Relationship -- Drift? No! -- Chi2 0.485 -- p-value 0.993
Race -- Drift? No! -- Chi2 0.587 -- p-value 0.965
Sex -- Drift? No! -- Chi2 0.217 -- p-value 0.641
Country -- Drift? No! -- Chi2 9.991 -- p-value 0.441
```
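Since the served detector is built from the same reference data and `p_val`, its per-feature p-values should agree with those computed locally with `ChiSquareDrift`, as the two printouts suggest. A hypothetical programmatic check with `math.isclose` might look like this; the tolerances are assumptions meant to absorb float32 rounding.

```python
import math


def results_match(local_p_vals, served_p_vals, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two per-feature p-value lists agree within the given tolerances."""
    if len(local_p_vals) != len(served_p_vals):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(local_p_vals, served_p_vals)
    )


# In the notebook, one could compare the local and served results with:
# results_match(list(preds['data']['p_val']), response_dict['data']['p_val'])
```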
