Train and Deploy Machine Learning Models on Kubernetes with Kubeflow and Seldon-Core

Using:

kubeflow
seldon-core

The example will be the MNIST handwritten digit classification task. We will train 3 different models to solve this task:

A TensorFlow neural network model.
A scikit-learn random forest model.
An R least squares model.

We will then show various rolling deployments:

Deploy the single TensorFlow model.
Do a rolling update to an A/B test of the TensorFlow model and the scikit-learn model.
Do a rolling update to a Multi-armed Bandit over all 3 models to direct traffic in real time to the best model.
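
To make the multi-armed bandit idea concrete, here is a minimal sketch of an epsilon-greedy router in Python. It is not the implementation used by seldon-core; the class name EpsilonGreedyRouter, its parameters, and the reward convention (1 for a correct prediction, 0 otherwise) are assumptions made for illustration, and the accuracies in the simulation are made-up numbers.

```python
import random


class EpsilonGreedyRouter:
    """Hypothetical epsilon-greedy router over a fixed set of models."""

    def __init__(self, n_models, epsilon=0.1, seed=None):
        self.n_models = n_models
        self.epsilon = epsilon
        self.counts = [0] * n_models    # feedback events seen per model
        self.values = [0.0] * n_models  # running mean reward per model
        self.rng = random.Random(seed)

    def route(self):
        """Return the index of the model that should serve the next request."""
        if self.rng.random() < self.epsilon:
            # Explore: pick any model uniformly at random.
            return self.rng.randrange(self.n_models)
        # Exploit: pick the model with the best mean reward so far.
        return max(range(self.n_models), key=lambda i: self.values[i])

    def send_feedback(self, model_index, reward):
        """Update the running mean reward of the model that served a request."""
        self.counts[model_index] += 1
        n = self.counts[model_index]
        self.values[model_index] += (reward - self.values[model_index]) / n


if __name__ == "__main__":
    # Simulated per-model accuracies (illustrative values, not measured results).
    simulated_accuracy = [0.95, 0.90, 0.60]
    router = EpsilonGreedyRouter(n_models=3, epsilon=0.1, seed=0)
    environment = random.Random(1)
    for _ in range(5000):
        chosen = router.route()
        reward = 1.0 if environment.random() < simulated_accuracy[chosen] else 0.0
        router.send_feedback(chosen, reward)
    print("requests per model:", router.counts)
```

An A/B test corresponds to a fixed traffic split between two models, whereas a router like this one shifts traffic toward whichever model earns the most reward as feedback arrives.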

In the following we will:

Install kubeflow and seldon-core on a kubernetes cluster
Train the models
Serve the models

Requirements

gcloud
kubectl
ksonnet
argo

Setup

A consolidated script to create the demo can be found here. For a step-by-step guide, do the following:

Install kubeflow on GKE. This should create kubeflow in a namespace named kubeflow. We suggest you use the command-line install so you can easily modify your Ksonnet installation. Ensure the environment variables KUBEFLOW_SRC and KFAPP are set. OAuth is preferred because, with basic auth, port-forwarding to Ambassador is insufficient.

Install seldon. Go to the Ksonnet application folder set up in the previous step and run:

cd ${KUBEFLOW_SRC}/${KFAPP}/ks_app

ks pkg install kubeflow/seldon
ks generate seldon seldon
ks apply default -c seldon

Install Helm

kubectl -n kube-system create sa tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller
kubectl rollout status deploy/tiller-deploy -n kube-system

Create an NFS disk and a persistent volume claim called nfs-1. You can follow a guide on creating an NFS volume using Google Filestore here. A consolidated set of steps is shown here.

Add Cluster Roles so Argo can start jobs successfully

kubectl create clusterrolebinding my-cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud info --format="value(config.account)")
kubectl create clusterrolebinding default-admin2 --clusterrole=cluster-admin --serviceaccount=kubeflow:default

Install Seldon Analytics Dashboard

helm install seldon-core-analytics --name seldon-core-analytics --set grafana_prom_admin_password=password --set persistence.enabled=false --repo https://storage.googleapis.com/seldon-charts --namespace kubeflow

Port-forward the dashboard once it is running:

kubectl port-forward $(kubectl get pods -n kubeflow -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') -n kubeflow 3000:3000

Visit http://localhost:3000/dashboard/db/prediction-analytics?refresh=5s&orgId=1 and log in as "admin" with the password you set above when launching with helm.

MNIST models

Tensorflow Model

Python training code
Python runtime prediction code
Dockerfile to wrap runtime prediction code to run under seldon-core.

SKLearn Model

Python training code
Python runtime prediction code
Dockerfile to wrap runtime prediction code to run under seldon-core.
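
For orientation, Python runtime prediction code of this kind is typically a small class that loads a trained model and exposes a predict method. The sketch below assumes the seldon-core Python wrapper convention of calling predict(X, features_names); the class name MnistClassifier, the model_path default, and the use of pickle are hypothetical and not taken from this repository's code.

```python
import pickle


class MnistClassifier:
    """Hypothetical runtime class in the shape a model-serving wrapper can call."""

    def __init__(self, model=None, model_path="/data/model.pkl"):
        # model_path is an assumed location for the trained model file.
        # A model object can also be passed in directly, which helps in tests.
        self.model = model
        self.model_path = model_path

    def _load(self):
        """Load the trained model from disk on first use."""
        with open(self.model_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, X, features_names):
        """Return class probabilities for a batch of flattened images."""
        if self.model is None:
            self._load()
        # Assumes the model follows the scikit-learn predict_proba interface.
        return self.model.predict_proba(X)
```

Loading the model lazily keeps the constructor cheap, and accepting an injected model makes the class easy to exercise without a trained artifact on disk.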

R Model

R training code
R runtime prediction code
Dockerfile to wrap runtime prediction code to run under seldon-core.

Train the Models

Follow the steps in ./notebooks/training.ipynb to:

Run Argo Jobs for each model to:
- Create training images and push them to the repo
- Run training
- Create runtime prediction images and push to repo
- Deploy individual runtime model

To push the Docker images to your own repo, you will need to set up your Docker credentials as a Kubernetes secret containing a config.json. To do this, find your Docker home (typically ~/.docker) and run the following to create the secret:

kubectl create secret generic docker-config --from-file=config.json=${DOCKERHOME}/config.json --type=kubernetes.io/config

Serve the Models

Follow the steps in ./notebooks/serving.ipynb to:

Deploy the single TensorFlow model.
Do a rolling update to an A/B test of the TensorFlow model and the scikit-learn model.
Do a rolling update to a Multi-armed Bandit over all 3 models to direct traffic in real time to the best model.
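
Once a model is deployed, a prediction request carries one flattened 28x28 image. The following sketch builds such a request body in Python. The {"data": {"ndarray": ...}} shape is assumed to match the seldon-core REST payload for the version deployed here, so check it against notebooks/serving.ipynb; the function name build_prediction_payload is hypothetical, and sending the request is left out because the endpoint URL depends on your cluster setup.

```python
import json

N_PIXELS = 28 * 28  # MNIST images are 28x28 grayscale


def build_prediction_payload(pixels):
    """Wrap one flattened MNIST image in an 'ndarray'-style JSON request body."""
    values = [float(p) for p in pixels]
    if len(values) != N_PIXELS:
        raise ValueError(f"expected {N_PIXELS} pixel values, got {len(values)}")
    # One row per image: the batch here contains a single image.
    return json.dumps({"data": {"ndarray": [values]}})


if __name__ == "__main__":
    blank_image = [0.0] * N_PIXELS
    body = build_prediction_payload(blank_image)
    print(len(json.loads(body)["data"]["ndarray"][0]))
```

Validating the length up front turns a malformed image into a clear client-side error instead of a failed request to the model.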

To ensure the notebook runs successfully, install the Python dependencies:

pip install -r notebooks/requirements.txt

If you have installed the Seldon-Core analytics, you can view them on the Grafana dashboard.
