# MLflow and Seldon

End-to-end example integrating MLflow and Seldon, with A/B testing of the models.
The slides accompanying this demo can be [found here](https://docs.google.com/presentation/d/1QXiOZkd_XNw6PbUalhYDajljKYQjgKczzNncTyLk9uA/edit?usp=sharing).

## Pre-requisites

### Python Libraries

The training part of the example assumes that you are able to run `mlflow` on your local environment. It is recommended to use this in a fresh conda environment.
To set it up, you can run:

In [None]:
!pip install -r requirements.txt

### Kubernetes

The serving side of the example assumes that you've got access to a Kubernetes cluster where Seldon Core is installed.
If you don't have access to a local cluster, feel free to use [`kind`](https://kind.sigs.k8s.io/).

For instructions on how to install Seldon Core, please check their [setup docs](https://docs.seldon.io/projects/seldon-core/en/latest/workflow/install.html).

### Analytics

Additionally, after we deploy the models, we will compare their performance using Seldon Core's integration with Prometheus and Grafana.
For that part to work, we will need to install Prometheus and Grafana.

To speed things up, we can do this through the [`seldon-core-analytics` chart](https://docs.seldon.io/projects/seldon-core/en/latest/charts/seldon-core-analytics.html).

## Training

This first section covers how to train models using MLflow.

### MLflow Project

The MLproject file defines:
- The environment where the training runs.
- The hyperparameters that can be tweaked. In our case, these are $\{\alpha, l_{1}\}$.
- The interface to train the model.

In [None]:
%%writefile ./training/MLproject
name: mlflow-talk

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"

This allows us to have a single command to train the model. 

``` bash
$ mlflow run ./training -P alpha=... -P l1_ratio=...
```
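To make the role of the `command` template and the `default` values concrete, here is a minimal sketch of how the parameters end up in the final command line. It only mimics the behaviour for illustration and is not MLflow's implementation; `ENTRY_POINT` and `render_command` are hypothetical names introduced here.

```python
# Illustrative only: mimics how the `command` template in MLproject is filled in.
# `ENTRY_POINT` and `render_command` are hypothetical, not part of MLflow.
ENTRY_POINT = {
    "parameters": {
        "alpha": {"type": "float"},
        "l1_ratio": {"type": "float", "default": 0.1},
    },
    "command": "python train.py {alpha} {l1_ratio}",
}


def render_command(entry_point, user_params):
    """Merge user-supplied values with declared defaults and fill the template."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in user_params:
            values[name] = user_params[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
    return entry_point["command"].format(**values)


# Equivalent of `-P alpha=1.0` with `l1_ratio` left at its default
print(render_command(ENTRY_POINT, {"alpha": 1.0}))  # python train.py 1.0 0.1
```

Since `alpha` has no default, leaving it out raises an error in this sketch, whereas `l1_ratio` falls back to `0.1`.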

For our example, we will train two versions of the model, which we'll later compare using A/B testing.

- $M_{1}$ with $\alpha = 0.1$
- $M_{2}$ with $\alpha = 1.0$

In [11]:
!mlflow run ./training -P alpha=0.1

2022/12/19 16:49:07 INFO mlflow.utils.conda: === Creating conda environment mlflow-62f9b69251cbd655a1308303dace7190a5738de8 ===
Collecting package metadata (repodata.json): done
Solving environment: done


  current version: 22.9.0
  latest version: 22.11.1

Please update conda by running

    $ conda update -n base -c defaults conda


Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Collecting package metadata (repodata.json): done
Solving environment: done


  current version: 22.9.0
  latest version: 22.11.1

Please update conda by running

    $ conda update -n base -c defaults conda



Downloading and Extracting Packages
scikit-learn-1.1.3   | 6.4 MB    | ##################################### | 100% 
certifi-2022.9.24    | 155 KB    | ##################################### | 100% 
pip-22.3.1           | 2.7 MB    | ##################################### | 100% 
setuptools-65.5.0    | 1.1 MB    | ##################################### | 100% 
libgfor

In [12]:
!mlflow run ./training -P alpha=1.0

2022/12/19 16:55:28 INFO mlflow.utils.conda: Conda environment mlflow-62f9b69251cbd655a1308303dace7190a5738de8 already exists.
2022/12/19 16:55:28 INFO mlflow.projects.utils: === Created directory /var/folders/xv/n51qjph14_52lj9y4706w6sc0000gn/T/tmp9os3q7o8 for downloading remote URIs passed to arguments of type 'path' ===
2022/12/19 16:55:28 INFO mlflow.projects.backend.local: === Running command 'source /Users/josh/opt/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-62f9b69251cbd655a1308303dace7190a5738de8 1>&2 && python train.py 1.0 0.1' in run with ID 'd2a6d8a5c3744af39227f5d632794d5a' === 
Elasticnet model (alpha=1.000000, l1_ratio=0.100000):
  RMSE: 0.8107373707184711
  MAE: 0.6241295925236752
  R2: 0.15105362812007328
2022/12/19 16:55:33 INFO mlflow.projects: === Run (ID 'd2a6d8a5c3744af39227f5d632794d5a') succeeded ===


### MLflow Tracking

The `train.py` script uses the `mlflow.log_param()` and `mlflow.log_metric()` functions to track each experiment. These are part of the MLflow Tracking API, which records experiment parameters and results. These can be stored on a remote server, which can then be shared across the entire team. However, in our example we will store them locally in an `mlruns` folder.
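As a rough illustration of what ends up in the local `mlruns` folder, the sketch below reads parameters and metrics back from disk using only the standard library. It assumes the local file-store layout (a `params/` and a `metrics/` directory per run, one file per key, and metric lines of the form `<timestamp> <value> <step>`); that layout is an assumption about the MLflow version used here and may differ in other versions. `read_run` and `list_runs` are hypothetical helpers.

```python
from pathlib import Path


def read_run(run_dir):
    """Return (params, metrics) for one run directory of a local file store."""
    run_dir = Path(run_dir)
    params, metrics = {}, {}

    params_dir = run_dir / "params"
    if params_dir.is_dir():
        for p in sorted(params_dir.iterdir()):
            if p.is_file():
                params[p.name] = p.read_text().strip()

    metrics_dir = run_dir / "metrics"
    if metrics_dir.is_dir():
        for m in sorted(metrics_dir.iterdir()):
            if not m.is_file():
                continue
            lines = m.read_text().strip().splitlines()
            if lines:
                # Assumed line format: "<timestamp> <value> <step>"; keep the last value
                metrics[m.name] = float(lines[-1].split()[1])

    return params, metrics


def list_runs(experiment_dir="mlruns/0"):
    """Map run ID -> (params, metrics) for every run found in an experiment folder."""
    exp = Path(experiment_dir)
    if not exp.is_dir():
        return {}
    return {
        d.name: read_run(d)
        for d in sorted(exp.iterdir())
        if d.is_dir() and (d / "meta.yaml").exists()
    }


for run_id, (params, metrics) in list_runs().items():
    print(run_id, params, metrics)
```

For anything beyond a quick look, the MLflow UI or client is the more robust way to query runs.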

In [13]:
!ls mlruns/0

c5080bf26e24443faaee06a757507d16 meta.yaml
d2a6d8a5c3744af39227f5d632794d5a


We can also run `mlflow ui` to explore these runs visually. This will start the MLflow server at http://localhost:5000.

```bash
$ mlflow ui
```

![MLFlow UI](./images/mlflow-ui.png)

### MLflow Model

The `MLmodel` file allows us to version and share models easily. Below we can see an example.

In [14]:
!ls ./mlruns/0/

c5080bf26e24443faaee06a757507d16 meta.yaml
d2a6d8a5c3744af39227f5d632794d5a


In [15]:
model_a = 'c5080bf26e24443faaee06a757507d16'
model_b = 'd2a6d8a5c3744af39227f5d632794d5a'

In [16]:
!cat ./mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/MLmodel

artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.13
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.1.2
mlflow_version: 1.28.0
model_uuid: 360cabd7b83146ee94bbc4fe8d1b2e9d
run_id: c5080bf26e24443faaee06a757507d16
utc_time_created: '2022-12-19 21:51:35.798350'


As we can see above, the `MLmodel` file keeps track of, among other things:

- The run ID, `c5080bf26e24443faaee06a757507d16`
- The creation time (`utc_time_created`)
- The version of `sklearn` used
- How the model was serialized (`cloudpickle`)

As we shall see shortly, Seldon's pre-packaged MLflow model server will use this file to serve the model.

## Preparing for deployment

#### Pack Model A

In [17]:
!ls mlruns/0 | sed -n 1p

c5080bf26e24443faaee06a757507d16


In [18]:
!conda pack -o mlruns/0/$(ls mlruns/0 | sed -n 1p)/artifacts/model/environment.tar.gz -f

Collecting packages...
Packing environment at '/Users/josh/opt/anaconda3/envs/mlflow-ab' to 'mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/environment.tar.gz'
[########################################] | 100% Completed |  0.1s


#### Pack Model B

In [19]:
!ls mlruns/0 | sed -n 2p

d2a6d8a5c3744af39227f5d632794d5a


In [20]:
!conda pack -o mlruns/0/$(ls mlruns/0 | sed -n 2p)/artifacts/model/environment.tar.gz -f

Collecting packages...
Packing environment at '/Users/josh/opt/anaconda3/envs/mlflow-ab' to 'mlruns/0/d2a6d8a5c3744af39227f5d632794d5a/artifacts/model/environment.tar.gz'
[########################################] | 100% Completed |  0.1s


#### Upload models (optional)

As a last step, we will persist the models we have just trained using MLflow. For that, we will upload them to Google Cloud Storage. Note that to run these commands you need write access to the target bucket (`gs://josh-seldon` in the commands below) and you need to have `gsutil` set up.

Note that in a production setting, MLflow would be configured to log models to a persistent data store (e.g. GCS, MinIO, etc.). In that case, this manual step wouldn't be needed.

We will upload both versions of the model to:

- `gs://josh-seldon/ab-test/model-a`
- `gs://josh-seldon/ab-test/model-b`

In [None]:
from minio import Minio

minioClient = Minio("35.185.70.254:9000", access_key="admin@seldon.io", secret_key="12341234", secure=False)

In [None]:
minioClient.list_buckets()

In [5]:
import glob
import os


def upload_local_directory_to_minio(local_path: str, bucket_name: str, folder_name: str):
    """Recursively upload the contents of `local_path` to `bucket_name` under `folder_name`."""
    assert os.path.isdir(local_path)

    for local_file in glob.glob(local_path + '/**'):
        local_file = local_file.replace(os.sep, "/")
        if not os.path.isfile(local_file):
            # Recurse into sub-directories, keeping their name in the remote path
            upload_local_directory_to_minio(
                local_file, bucket_name, folder_name + "/" + os.path.basename(local_file))
        else:
            # Path of the file relative to `local_path`
            remote_path = local_file[1 + len(local_path):]
            remote_path = remote_path.replace(os.sep, "/")
            remote_path = folder_name + "/" + remote_path
            minioClient.fput_object(bucket_name, remote_path, local_file)

In [None]:
# RUN (an MLflow run ID) and BUCKET_NAME must be defined before running this cell
upload_local_directory_to_minio(f"mlruns/0/{RUN}/artifacts/model", BUCKET_NAME, "default")
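Before uploading anything, it can help to preview which object keys a local model folder would map to. The following is a minimal dry-run sketch that never contacts an object store; `plan_uploads` is a hypothetical helper, and the `"default"` prefix simply mirrors the call above.

```python
import os


def plan_uploads(local_path, folder_name):
    """Yield (local_file, remote_key) pairs without contacting any object store."""
    for root, _dirs, files in os.walk(local_path):
        for fname in sorted(files):
            local_file = os.path.join(root, fname)
            # Key is the path relative to `local_path`, always with forward slashes
            rel = os.path.relpath(local_file, local_path).replace(os.sep, "/")
            yield local_file, f"{folder_name}/{rel}"


# Prints nothing if the folder does not exist
for local_file, key in plan_uploads("mlruns/0", "default"):
    print(local_file, "->", key)
```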

In [21]:
!gsutil cp -r mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/* gs://josh-seldon/ab-test/model-a
!gsutil cp -r mlruns/0/d2a6d8a5c3744af39227f5d632794d5a/artifacts/model/* gs://josh-seldon/ab-test/model-b

Copying file://mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/MLmodel [Content-Type=application/octet-stream]...
Copying file://mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/conda.yaml [Content-Type=application/octet-stream]...
Copying file://mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/environment.tar.gz [Content-Type=application/x-tar]...
Copying file://mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/model.pkl [Content-Type=application/octet-stream]...
\ [4 files][  7.2 KiB/  7.2 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying file://mlruns/0/c5080bf26e24443faaee06a757507d16/artifacts/model/python_env.yaml [Content-Type=application/octet-stream]...
Copying file://mlruns/0/c5080b

## Serving

To serve this model we will use Seldon.

### Deploy models v1 Protocol

Once the cluster is set up, the next step will be to upload these models into a common repository and to deploy a `SeldonDeployment` with two predictors to `k8s`. As we can see below, we will route 50% of the traffic to each of the models.

In [28]:
%%writefile ./serving/model-a-b.yaml
---
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: wines-classifier
spec:
  annotations:
    seldon.io/executor: "false" 
  predictors:
  - graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://josh-seldon/ab-test/model-a
      name: wines-classifier
    name: model-a
    replicas: 1
    traffic: 50
    componentSpecs:
    - spec:
        # We are setting high failureThreshold as installing conda dependencies
        # can take long time and we want to avoid k8s killing the container prematurely
        containers:
        - name: wines-classifier
          livenessProbe:
            initialDelaySeconds: 100
            failureThreshold: 500
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 100
            failureThreshold: 500
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
  - graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://josh-seldon/ab-test/model-b
      name: wines-classifier
    name: model-b
    replicas: 1
    traffic: 50
    componentSpecs:
    - spec:
        # We are setting high failureThreshold as installing conda dependencies
        # can take long time and we want to avoid k8s killing the container prematurely
        containers:
        - name: wines-classifier
          livenessProbe:
            initialDelaySeconds: 100
            failureThreshold: 500
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 100
            failureThreshold: 500
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP

Overwriting ./serving/model-a-b.yaml
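To get a feel for what the two `traffic: 50` entries mean in practice, the sketch below simulates a weighted random split between the two predictors. It is only an illustration of weighted routing, not Seldon Core's actual routing logic; `simulate_routing` is a hypothetical helper, and the check that weights add up to 100 is an assumption made for this sketch.

```python
import random
from collections import Counter

# Predictor names and traffic weights taken from the manifest above
TRAFFIC = {"model-a": 50, "model-b": 50}


def simulate_routing(traffic, n_requests, seed=0):
    """Assign each simulated request to a predictor, proportionally to its weight."""
    if sum(traffic.values()) != 100:
        raise ValueError("traffic weights are expected to add up to 100")
    rng = random.Random(seed)
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_requests))


counts = simulate_routing(TRAFFIC, 10_000)
print(counts)
```

Over many requests each predictor should receive roughly half of the traffic, although any short window of requests can be noticeably uneven.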


In [29]:
!kubectl apply -f ./serving/model-a-b.yaml

seldondeployment.machinelearning.seldon.io/wines-classifier configured


In [25]:
!kubectl delete -f ./serving/model-a-b.yaml

seldondeployment.machinelearning.seldon.io "wines-classifier" deleted


We can verify these have been deployed by checking the pods and `SeldonDeployment` resources in the cluster.

### Test models

We will now run a sample query to test that the inference graph is working.

In [4]:
import requests

inference_request = {
    "data": {
        "names": ["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"],
        "ndarray": [
            [7, 0.27, 0.36, 20.7, 0.045, 45, 170, 1.001, 3, 0.45, 8.8]
        ]
    }
}

endpoint = "http://localhost:8080/seldon/default/wines-classifier/api/v1.0/predictions"
response = requests.post(endpoint, json=inference_request)

response.json()


{'data': {'names': [], 'ndarray': [5.550530190667395]},
 'meta': {'requestPath': {'wines-classifier': 'seldonio/mlflowserver:1.15.0'}}}
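The request and response above follow a simple shape: feature names plus an `ndarray` of rows going in, and an `ndarray` of predictions plus some `meta` information coming out. A small sketch of helpers for building and reading these payloads is shown here; `build_request` and `parse_response` are hypothetical names, and the sample body is copied from the output above.

```python
def build_request(names, rows):
    """Build a payload with the same shape as `inference_request` above."""
    return {"data": {"names": list(names), "ndarray": [list(row) for row in rows]}}


def parse_response(body):
    """Return (predictions, request_path) from a response body."""
    predictions = body["data"]["ndarray"]
    request_path = body.get("meta", {}).get("requestPath", {})
    return predictions, request_path


# Sample body copied from the response shown above
sample = {
    "data": {"names": [], "ndarray": [5.550530190667395]},
    "meta": {"requestPath": {"wines-classifier": "seldonio/mlflowserver:1.15.0"}},
}

predictions, request_path = parse_response(sample)
print(predictions, request_path)
```

Note that both predictors in the manifest use the same graph node name and server implementation, so `requestPath` alone does not reveal which of the two models answered a given request.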

## Analytics

To access Grafana, we need to forward the port to the respective pod, as we did previously to access the Seldon Core deployment.
The credentials are simply `admin` // `password`.

This command needs to keep running in the background, so **please make sure you run it in a separate terminal**.

```bash
$ kubectl port-forward \
    $(kubectl get pods \
        -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') \
    3000:3000
```

Now that we have both models running in production, we can analyse their performance using Seldon Core's integration with Prometheus and Grafana.
To do so, we will iterate over the training set (which can be found in `./training/wine-quality.csv`), making a request for each row and sending feedback on the prediction.

Since the `/feedback` endpoint requires a `reward` signal (i.e. higher is better), we will simulate one as

$$
  R(x_{n})
    = \begin{cases}
        \frac{1}{(y_{n} - f(x_{n}))^{2}} &, y_{n} \neq f(x_{n}) \\
        500 &, y_{n} = f(x_{n})
      \end{cases}
$$

where $R(x_{n})$ is the reward for input point $x_{n}$, $f(x_{n})$ is the prediction of our trained model and $y_{n}$ is the actual value.
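As a quick worked example (with values chosen purely for illustration), take an actual quality of $y_{n} = 6$. A prediction of $f(x_{n}) = 5.5$ and a worse prediction of $f(x_{n}) = 4$ would give

$$
  R = \frac{1}{(6 - 5.5)^{2}} = \frac{1}{0.25} = 4
  \qquad \text{and} \qquad
  R = \frac{1}{(6 - 4)^{2}} = \frac{1}{4} = 0.25
$$

so the closer the prediction is to the actual value, the larger the reward.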

In [10]:
# %%writefile feedback.py
import pandas as pd
import numpy as np
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(
    gateway="istio",
    namespace="default",
    gateway_endpoint="localhost:8080",
    deployment_name='wines-classifier')

df = pd.read_csv("./training/wine-quality.csv")

def _get_reward(y, y_pred):
    # Exact match: use a fixed, large reward to avoid dividing by zero
    if y == y_pred:
        return 500.0

    # Otherwise, the reward is the inverse of the squared error
    return float(1 / np.square(y - y_pred))

def _test_row(row):
    input_features = row[:-1]
    feature_names = input_features.index.to_list()
    X = input_features.values.reshape(1, -1)
    # The last column holds the actual value; keep it as a plain float
    y = float(row.iloc[-1])

    r = sc.predict(
        data=X,
        names=feature_names)

    # We send a single row, so we take the first (and only) prediction
    y_pred = float(r.response['data']['tensor']['values'][0])
    # The reward must be a plain int or float, not a numpy array
    reward = _get_reward(y, y_pred)
    sc.feedback(
        prediction_request=r.request,
        prediction_response=r.response,
        reward=reward)

    return reward

df.apply(_test_row, axis=1)

In [11]:
!python feedback.py

Traceback (most recent call last):
  File "/Users/josh/opt/anaconda3/envs/tmp-wine/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/Users/josh/opt/anaconda3/envs/tmp-wine/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/Users/josh/opt/anaconda3/envs/tmp-wine/lib/python3.8/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/josh/opt/anaconda3/envs/tmp-wine/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/josh/opt/anaconda3/envs/tmp-wine/lib/python3.8/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httpli

We can now access the Grafana dashboard at http://localhost:3000 (credentials are `admin` // `password`). Inside the portal, we will go to the Prediction Analytics dashboard.

We can see a snapshot below.

![Seldon Analytics](./images/seldon-analytics.png)