# MRE Quality ML Prediction

### MLflow Pre-packaged Model Server A/B Test Deployment on Kubernetes Distributed Compute with GPUs
In this example, powered by *ADEPT* methodologies, we build two models with MLflow and deploy them as an A/B test. This is powerful because it lets you deploy a new model alongside the old one and route a percentage of traffic to each. These deployment strategies are straightforward with Seldon and can be extended to shadow deployments, multi-armed bandits, and so on.
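
To make "distributing a percentage of traffic" concrete, here is a minimal simulation of weighted routing between two models. It is only a sketch: the names `model-a` and `model-b` and the 75/25 split are hypothetical, and in a real deployment the routing is done by the ingress, not by Python code.

```python
import random
from collections import Counter


def pick_predictor(weights, rng):
    """Pick a predictor name with probability proportional to its traffic weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, n_requests, seed=0):
    """Route n_requests simulated requests and count how many each predictor receives."""
    rng = random.Random(seed)
    return Counter(pick_predictor(weights, rng) for _ in range(n_requests))


# Hypothetical split: 75% of requests to model A, 25% to model B.
counts = simulate({"model-a": 75, "model-b": 25}, n_requests=10_000)
print(counts)
```

Each request is routed independently, so the observed split only approaches the configured percentages as the number of requests grows.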

### Tutorial Overview
This tutorial is broken down into the following sections:

1. Train the MLFlow elastic net MRE Quality example

2. Deploy your trained model leveraging MLFlow model server

3. Test the deployed MLFlow model by sending requests

4. Deploy your second model as an A/B test

5. Visualise and monitor the performance of your models with automated self-service dashboards for decision engineering!

In [1]:
### Temporary fix for the PoC: pin dependencies; this won't be in the production version

!pip uninstall itsdangerous werkzeug scikit-learn mlflow==1.14.1 --yes
!pip install itsdangerous==2.0.1 werkzeug==2.0.3 scikit-learn==1.1.2 pandas mlflow

Found existing installation: itsdangerous 2.1.2
Uninstalling itsdangerous-2.1.2:
  Successfully uninstalled itsdangerous-2.1.2
Found existing installation: Werkzeug 2.1.1
Uninstalling Werkzeug-2.1.1:
  Successfully uninstalled Werkzeug-2.1.1
Found existing installation: scikit-learn 0.24.2
Uninstalling scikit-learn-0.24.2:
  Successfully uninstalled scikit-learn-0.24.2
Found existing installation: mlflow 1.14.1
Uninstalling mlflow-1.14.1:
  Successfully uninstalled mlflow-1.14.1
Collecting itsdangerous==2.0.1
  Downloading itsdangerous-2.0.1-py3-none-any.whl (18 kB)
Collecting werkzeug==2.0.3
  Downloading Werkzeug-2.0.3-py3-none-any.whl (289 kB)
Collecting scikit-learn==1.1.2
  Downloading scikit_learn-1.1.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30.8 MB)

In [2]:
import os
import sys, json
import warnings
import numpy as np
import pandas as pd
import mlflow.sklearn
import werkzeug as wkz
import mlflow, sklearn
import itsdangerous as itd

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

from seldon_core.seldon_client import SeldonClient

In [3]:
## Restart the notebook kernel before running this cell so the reinstalled packages are picked up.

print('v' + itd.__version__)        ##  Output: v2.0.1
print('v' + wkz.__version__)        ##  Output: v2.0.3
print('v' + sklearn.__version__)    ##  Output: v1.1.2
print('v' + mlflow.__version__)     ##  Output: v1.29.0 (mlflow is installed without a pin)

v2.0.1
v2.0.3
v1.1.2
v1.29.0
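
Since `mlflow` is installed without a version pin, the version that lands in the kernel depends on when the install cell is run. The sketch below compares installed versions against a pin list and reports any drift. The `expected` pins are assumptions for illustration (the `mlflow` pin is simply the version that was uninstalled earlier), and `observed` copies the values printed above.

```python
from importlib import metadata


def installed_versions(packages):
    """Look up the installed version of each package, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(expected, installed):
    """Return {package: (expected, installed)} for every pin that does not match."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


# Hypothetical pin list; only the first three pins appear in the install cell.
expected = {
    "itsdangerous": "2.0.1",
    "werkzeug": "2.0.3",
    "scikit-learn": "1.1.2",
    "mlflow": "1.14.1",
}
# Versions printed by the cell above.
observed = {
    "itsdangerous": "2.0.1",
    "werkzeug": "2.0.3",
    "scikit-learn": "1.1.2",
    "mlflow": "1.29.0",
}
print(find_mismatches(expected, observed))
```

In a live kernel, `installed_versions(expected)` could replace the hard-coded `observed` dictionary.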


In [4]:
## Replace with the tracking URI shown on the ADEPT/Aktiver GUI Pipeline Catalog page for MLflow
MLFLOW_MODEL_TRACKING = 'http://54.166.226.23:30888' # replace with url of mlflow

In [6]:
mlflow.set_tracking_uri(MLFLOW_MODEL_TRACKING)
mlflow.set_experiment("ADEPT_MRE_DEMO")

<Experiment: artifact_location='s3://adeptdemo/adept/1', creation_time=None, experiment_id='1', last_update_time=None, lifecycle_stage='active', name='ADEPT_MRE_DEMO', tags={}>

### Define MLflow project with MLproject file:
The MLproject file shows that the project uses Conda for its environment, which is defined in the ```conda.yaml``` file.
It also shows that training is performed by the ```train.py``` file, which receives two parameters, ```alpha``` and ```l1_ratio```.

*Note: It is important to version-lock the software, as in ```conda.yaml```, as a step towards reproducible data science.*
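
The contents of ```train.py``` are not shown in this notebook. As a rough sketch of how an entry point could accept the two parameters, the hypothetical `parse_args` helper below reads them as positional arguments, matching the order in the MLproject command `python train.py {alpha} {l1_ratio}`. The fallback of `0.1` mirrors the default declared in the MLproject file.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional hyperparameters passed by the MLproject command."""
    parser = argparse.ArgumentParser(description="Train an ElasticNet model")
    parser.add_argument("alpha", type=float, help="regularisation strength")
    # MLproject declares a default of 0.1 for l1_ratio; mirror it here so the
    # script also works when it is run by hand with only alpha.
    parser.add_argument("l1_ratio", type=float, nargs="?", default=0.1,
                        help="mix between L1 and L2 penalties")
    return parser.parse_args(argv)


args = parse_args(["0.5", "0.5"])
print(args.alpha, args.l1_ratio)
```

When the project is launched through MLflow, the declared default is substituted into the command before it runs, so the script-side fallback mainly helps with manual invocations.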

In [7]:
!pygmentize -l yaml MLproject
!pygmentize conda.yaml

name: mre-quality

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
name: mre-quality
channels:

In [8]:
data = pd.read_csv("mre-quality.csv")
data.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6


In [11]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


warnings.filterwarnings("ignore")
np.random.seed(40)

# Read the mre-quality csv file (make sure you're running this from the root of MLflow!)
mre_path = "mre-quality.csv"
data = pd.read_csv(mre_path)

# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

params = {"alpha": 0.5, "l1_ratio": 0.5}

mlflow.sklearn.autolog()
with mlflow.start_run():
    run = mlflow.active_run()
    print("ML Experiment Run UUID: " + run.info.run_id)
    
    # Load AWS credentials for the S3 artifact store; the context manager closes the file
    with open('aws.json') as f:
        creds = json.load(f)
    os.environ["AWS_ACCESS_KEY_ID"] = creds["aws_access_key"]
    os.environ["AWS_SECRET_ACCESS_KEY"] = creds["aws_secret_key"]
    
    lr = ElasticNet(alpha=params["alpha"], l1_ratio=params["l1_ratio"], random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (params["alpha"], params["l1_ratio"]))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", params["alpha"])
    mlflow.log_param("l1_ratio", params["l1_ratio"])
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)
    
    
    # https://adeptdemo.s3.amazonaws.com/artifacts/1/c8962777bd2e4faab7697fabbf311ae8/artifacts/model/conda.yaml
    mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path='artifacts',
        registered_model_name="adept_model-v"
    )
    
    #mlflow.sklearn.log_model(lr, f'''https://adeptdemo.s3.amazonaws.com/artifacts/1/{run._info.run_id}/artifacts/model''')
    #print("Model artifacts saved to: " + f'''https://adeptdemo.s3.amazonaws.com/artifacts/1/{run._info.run_id}/artifacts/model/''')
    #mlflow.log_artifacts(f'''https://adeptdemo.s3.amazonaws.com/artifacts/1/{run._info.run_id}/artifacts/model''')

ML Experiment Run UUID: 2b34f79b4ab243ea998109dce4c3d49d
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.82224284975954
  MAE: 0.6278761410160693
  R2: 0.12678721972772689


Registered model 'adept_model-v' already exists. Creating a new version of this model...
2022/10/04 19:21:46 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: adept_model-v, version 2
Created version '2' of model 'adept_model-v'.
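
To help read the reported metrics, here is a dependency-free sketch of the three quantities that `eval_metrics` computes. The sample values are made up. They are chosen so that the prediction is always the mean of the actual values, which is the case where R2 is exactly zero; an R2 of about 0.13, as reported above, therefore means the model explains only about 13% of the variance in `quality`.

```python
import math


def regression_metrics(actual, pred):
    """Return (rmse, mae, r2) using the standard definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up quality scores and a baseline that always predicts their mean (6).
actual = [6, 5, 7, 6]
baseline = [6, 6, 6, 6]
rmse, mae, r2 = regression_metrics(actual, baseline)
print(rmse, mae, r2)
```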


In [13]:
!curl -LO https://dl.k8s.io/release/v1.25.0/bin/linux/amd64/kubectl

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0   2759      0 --:--:-- --:--:-- --:--:--  2816
100 42.9M  100 42.9M    0     0  94.3M      0 --:--:-- --:--:-- --:--:-- 94.3M


In [16]:
!install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

In [22]:
!curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
!chmod 700 get_helm.sh
!./get_helm.sh

Downloading https://get.helm.sh/helm-v3.10.0-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm


In [56]:
!helm repo add datawire https://www.getambassador.io
!helm upgrade --install ambassador datawire/ambassador \
  --set image.repository=docker.io/datawire/ambassador \
  --set service.type=ClusterIP \
  --set replicaCount=1 \
  --set crds.keep=false \
  --set enableAES=false \
  --create-namespace \
  --namespace ambassador

"datawire" has been added to your repositories
Release "ambassador" does not exist. Installing it now.
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"

In [65]:
## Note: each `!` command runs in its own subshell, so POD_NAME is not visible to later cells.
!export POD_NAME=$(kubectl get pods --namespace ambassador -l "app=ambassador,release=ambassador" -o jsonpath="{.items[0].metadata.name}")

error: error executing jsonpath "{.items[0].metadata.name}": Error executing template: array index out of bounds: index 0, length 0. Printing more information for debugging the template:
	template was:
		{.items[0].metadata.name}
	object given to jsonpath engine was:
		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":""}}
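
The jsonpath error above occurs because the label selector matched no pods, so `items` is empty and index 0 does not exist. A minimal sketch of a lookup that tolerates an empty result is shown below. The helper names are hypothetical, and `get_first_pod` needs `kubectl` with access to the cluster, so it is defined here but not called.

```python
import json
import subprocess


def first_pod_name(pod_list):
    """Return the name of the first pod in a kubectl JSON list, or None if it is empty."""
    items = pod_list.get("items", [])
    return items[0]["metadata"]["name"] if items else None


def get_first_pod(namespace, selector):
    """Query kubectl for pods matching a label selector (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace,
         "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return first_pod_name(json.loads(result.stdout))


# The empty list that kubectl returned in the error message above.
empty = {"apiVersion": "v1", "items": [], "kind": "List",
         "metadata": {"resourceVersion": ""}}
print(first_pod_name(empty))
```

Returning the name as a Python value also sidesteps the subshell problem, because the result stays in the notebook kernel.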




In [57]:
#!helm install seldon-core seldon-core-operator \
#    --repo https://storage.googleapis.com/seldon-charts \
#    --set usageMetrics.enabled=true \
#    --set istio.enabled=false \
#    --namespace seldon-system

!helm upgrade --install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set ambassador.enabled=true \
    --create-namespace \
    --namespace seldon-system

NAME: seldon-core
LAST DEPLOYED: Tue Oct  4 20:29:14 2022
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
Release "seldon-core" has been upgraded. Happy Helming!
NAME: seldon-core
LAST DEPLOYED: Tue Oct  4 20:29:19 2022
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 2
TEST SUITE: None


In [58]:
!kubectl get pods -A

NAMESPACE       NAME                                          READY   STATUS        RESTARTS   AGE
ambassador      ambassador-6c86bd49f9-rwcbx                   1/1     Running       0          3m21s
ambassador      ambassador-agent-7958b89495-dfzsd             1/1     Running       0          3m21s
default         mlflow-65d6df4fdd-g9rzx                       1/1     Running       1          81m
default         nfs-nfs-server-provisioner-0                  1/1     Running       0          81m
default         rapidsai-98368-jupyter-75cc9dc675-z6zdw       1/1     Running       0          81m
default         rapidsai-98368-scheduler-686f5f7b65-2l8w4     1/1     Running       2          81m
default         rapidsai-98368-worker-d56nh                   1/1     Running       1          81m
default         rapidsai-98368-worker-gcflw                   1/1     Running       6          81m
kube-system     coredns-6c46d74d64-7qmpm                      1/1     Running       0          81m
kube-s

NAME: seldon-core-analytics
LAST DEPLOYED: Tue Oct  4 20:23:14 2022
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 1


In [52]:
!helm ls -A

NAME                 	NAMESPACE    	REVISION	UPDATED                                	STATUS  	CHART                       	APP VERSION
mlflow               	default      	1       	2022-10-04 19:08:03.618914726 +0000 UTC	deployed	mlflow-0.7.13               	1.28.0.19  
nfs                  	default      	1       	2022-10-04 19:08:01.505567575 +0000 UTC	deployed	nfs-server-provisioner-1.4.0	3.0.0      
rapidsai-98368       	default      	1       	2022-10-04 19:08:08.487430457 +0000 UTC	deployed	rapidsai-1.47.4             	0.17       
seldon-core-analytics	seldon-system	1       	2022-10-04 20:23:14.240017231 +0000 UTC	deployed	seldon-core-analytics-1.14.1	1.14.1     
traefik              	kube-system  	1       	2022-10-04 19:13:32.306156737 +0000 UTC	deployed	traefik-10.9.100            	2.5.6      
traefik-crd          	kube-system  	1       	2022-10-04 19:13:26.962565853 +0000 UTC	deployed	traefik-crd-10.9.100        	           


In [54]:
#!helm uninstall seldon-core-analytics -n seldon-system

release "seldon-core-analytics" uninstalled


In [59]:
!kubectl apply -f model-a-b.yaml

Error from server (InternalError): error when creating "model-a-b.yaml": Internal error occurred: failed calling webhook "v1.vseldondeployment.kb.io": Post "https://seldon-webhook-service.seldon-system.svc:443/validate-machinelearning-seldon-io-v1-seldondeployment?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "custom-metrics-ca")
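
The contents of `model-a-b.yaml` are not shown in this notebook. As a rough sketch of what an A/B manifest for the pre-packaged MLflow server could contain, the hypothetical helper below builds a `SeldonDeployment` with two predictors whose `traffic` values sum to 100. The model URIs are placeholders, the predictor and graph names are assumptions, and a real manifest would likely also need a reference to the S3 credentials.

```python
import json


def ab_test_deployment(name, model_a_uri, model_b_uri, traffic_a=50):
    """Build a SeldonDeployment dict that splits traffic between two MLflow models."""
    if not 0 <= traffic_a <= 100:
        raise ValueError("traffic_a must be between 0 and 100")

    def predictor(pred_name, model_uri, traffic):
        return {
            "name": pred_name,
            "replicas": 1,
            "traffic": traffic,  # percentage of requests routed to this predictor
            "graph": {
                "name": "classifier",
                "implementation": "MLFLOW_SERVER",
                "modelUri": model_uri,
                "children": [],
            },
        }

    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "name": name,
            "predictors": [
                predictor("a-model", model_a_uri, traffic_a),
                predictor("b-model", model_b_uri, 100 - traffic_a),
            ],
        },
    }


# Placeholder URIs: substitute the artifact locations of your two logged models.
manifest = ab_test_deployment(
    "mre-classifier", "s3://<bucket>/<model-a-path>", "s3://<bucket>/<model-b-path>"
)
print(json.dumps(manifest, indent=2))
```

`kubectl apply -f` accepts JSON as well as YAML, so the printed output can be saved to a file and applied directly.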


In [34]:
!kubectl get sdep

NAME             AGE
mre-classifier   3s
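
With the deployment in place, the next step in the overview is to send it requests. The sketch below builds the URL and JSON body for a prediction call through Ambassador using the Seldon protocol. The gateway address `localhost:8003` and the `default` namespace are assumptions, and the row reuses the first record from `data.head()` with `quality` removed. It keeps `Unnamed: 0` because the training cell drops only `quality`.

```python
import json


def prediction_url(gateway, namespace, deployment):
    """URL of the Seldon REST prediction endpoint behind the Ambassador gateway."""
    return f"http://{gateway}/seldon/{namespace}/{deployment}/api/v1.0/predictions"


def ndarray_payload(rows, names=None):
    """Wrap feature rows in the Seldon protocol's {"data": {"ndarray": ...}} body."""
    data = {"ndarray": [[float(v) for v in row] for row in rows]}
    if names:
        data["names"] = list(names)
    return {"data": data}


# First row of data.head() without the quality column (12 feature values).
first_row = [0, 7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]

url = prediction_url("localhost:8003", "default", "mre-classifier")  # assumed host and namespace
payload = ndarray_payload([first_row])
print(url)
print(json.dumps(payload))

# To actually send it (requires the requests package and a reachable gateway):
#   import requests
#   response = requests.post(url, json=payload)
```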
