The example will be the MNIST handwritten digit classification task. We will train 3 different models to solve this task:
- A TensorFlow neural network model.
- A scikit-learn random forest model.
- An R least squares model.
We will then show various rolling deployments:
- Deploy the single TensorFlow model.
- Do a rolling update to an A/B test of the TensorFlow model and the scikit-learn model.
- Do a rolling update to a multi-armed bandit over all 3 models to direct traffic in real time to the best model.
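To make the multi-armed bandit idea concrete, here is a minimal sketch of an epsilon-greedy router in plain Python. It is only an illustration: the class name `EpsilonGreedyRouter`, the `simulate` helper, and the accuracy figures are hypothetical, and the router that Seldon actually deploys may use a different strategy and reward signal.

```python
import random


class EpsilonGreedyRouter:
    """Toy epsilon-greedy router over a fixed set of model names."""

    def __init__(self, models, epsilon=0.1, seed=None):
        self.models = list(models)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Number of requests served and running mean reward per model.
        self.counts = {m: 0 for m in self.models}
        self.values = {m: 0.0 for m in self.models}

    def route(self):
        """Pick the model that should serve the next request."""
        if self.rng.random() < self.epsilon:
            # Explore: occasionally try a random model.
            return self.rng.choice(self.models)
        # Exploit: otherwise use the model with the best mean reward so far.
        return max(self.models, key=lambda m: self.values[m])

    def feedback(self, model, reward):
        """Update the running mean reward (1.0 = correct, 0.0 = wrong)."""
        self.counts[model] += 1
        n = self.counts[model]
        self.values[model] += (reward - self.values[model]) / n


def simulate(accuracies, n_requests=5000, seed=0):
    """Route simulated requests; `accuracies` maps model name -> P(correct)."""
    router = EpsilonGreedyRouter(accuracies, epsilon=0.1, seed=seed)
    rng = random.Random(seed + 1)
    for _ in range(n_requests):
        model = router.route()
        reward = 1.0 if rng.random() < accuracies[model] else 0.0
        router.feedback(model, reward)
    return router


# Made-up accuracies, for illustration only.
router = simulate({"tensorflow": 0.97, "sklearn": 0.94, "r": 0.85})
print(router.counts)
```

With a small `epsilon`, most requests go to whichever model currently has the highest observed reward, while a small share keeps testing the others so the router can adapt if their relative quality changes.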
In the following you will need these tools:
- gcloud
- kubectl
- ksonnet
- argo
A consolidated script to create the demo can be found here. For a step-by-step guide, do the following:
- Install kubeflow on GKE. This should create kubeflow in a namespace kubeflow. We suggest you use the command line install so you can easily modify your Ksonnet installation. Ensure you have the environment variables KUBEFLOW_SRC and KFAPP set. OAUTH is preferred, as with basic auth port-forwarding to ambassador is insufficient.
- Install seldon. Go to the Ksonnet application folder set up in the previous step and run
    cd ${KUBEFLOW_SRC}/${KFAPP}/ks_app
    ks pkg install kubeflow/seldon
    ks generate seldon seldon
    ks apply default -c seldon
- Install Helm
    kubectl -n kube-system create sa tiller
    kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
    helm init --service-account tiller
    kubectl rollout status deploy/tiller-deploy -n kube-system
- Create an NFS disk and persistent volume claim called nfs-1. You can follow a guide on creating an NFS volume using Google Filestore here. A consolidated set of steps is shown here.
- Add Cluster Roles so Argo can start jobs successfully
    kubectl create clusterrolebinding my-cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud info --format="value(config.account)")
    kubectl create clusterrolebinding default-admin2 --clusterrole=cluster-admin --serviceaccount=kubeflow:default
- Install Seldon Analytics Dashboard
    helm install seldon-core-analytics --name seldon-core-analytics --set grafana_prom_admin_password=password --set persistence.enabled=false --repo https://storage.googleapis.com/seldon-charts --namespace kubeflow
- Port-forward the dashboard once it is running
    kubectl port-forward $(kubectl get pods -n kubeflow -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') -n kubeflow 3000:3000
- Visit http://localhost:3000/dashboard/db/prediction-analytics?refresh=5s&orgId=1 and log in with the username "admin" and the password you set above when installing with Helm.
TensorFlow model:
- Python training code
- Python runtime prediction code
- Dockerfile to wrap runtime prediction code to run under seldon-core.
scikit-learn model:
- Python training code
- Python runtime prediction code
- Dockerfile to wrap runtime prediction code to run under seldon-core.
R model:
- R training code
- R runtime prediction code
- Dockerfile to wrap runtime prediction code to run under seldon-core.
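As a rough illustration of what Python runtime prediction code might look like, the sketch below follows the convention used by the seldon-core Python wrapper: a class exposing a `predict(X, features_names)` method. The class name `MnistClassifier` and the uniform placeholder output are hypothetical; the repository's real prediction code loads a trained model instead.

```python
class MnistClassifier:
    """Sketch of a runtime prediction class in the seldon-core wrapper style."""

    def __init__(self, n_classes=10):
        # A real implementation would load the trained model here, from
        # wherever the training job saved it. Placeholder: nothing is loaded.
        self.n_classes = n_classes

    def predict(self, X, features_names):
        """Return one row of class probabilities per input row."""
        # Placeholder logic: a uniform distribution over the digit classes.
        uniform = [1.0 / self.n_classes] * self.n_classes
        return [list(uniform) for _ in X]


# Two fake flattened 28x28 images.
model = MnistClassifier()
probabilities = model.predict([[0.0] * 784, [1.0] * 784], None)
print(len(probabilities), len(probabilities[0]))
```

In a real deployment `X` would typically arrive as a NumPy array, and the Dockerfile would wrap this class so that it is served over the seldon-core prediction API.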
Follow the steps in ./notebooks/training.ipynb to:
- Run Argo Jobs for each model to:
  - Create training images and push to repo
  - Run training
  - Create runtime prediction images and push to repo
  - Deploy individual runtime model
To push the Docker images to your own repo, you will need to set up your Docker credentials as a Kubernetes secret containing a config.json. To do this, find your Docker home (typically ~/.docker) and run the following to create the secret:
    kubectl create secret generic docker-config --from-file=config.json=${DOCKERHOME}/config.json --type=kubernetes.io/config
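If you script this step, it can help to check that config.json exists before calling kubectl. The following is a small sketch under that assumption; the helper name `docker_secret_command` is hypothetical, and it only builds the same command shown above without running it.

```python
import os


def docker_secret_command(docker_home=None):
    """Build the kubectl command above after checking config.json exists."""
    if docker_home is None:
        # Fall back to $DOCKERHOME, then to the typical ~/.docker location.
        docker_home = os.environ.get(
            "DOCKERHOME", os.path.expanduser("~/.docker")
        )
    config_path = os.path.join(docker_home, "config.json")
    if not os.path.isfile(config_path):
        raise FileNotFoundError("no Docker config found at " + config_path)
    return [
        "kubectl", "create", "secret", "generic", "docker-config",
        "--from-file=config.json=" + config_path,
        "--type=kubernetes.io/config",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once you are happy with the path it resolved.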
Follow the steps in ./notebooks/serving.ipynb to:
- Deploy the single TensorFlow model.
- Do a rolling update to an A/B test of the TensorFlow model and the scikit-learn model.
- Do a rolling update to a multi-armed bandit over all 3 models to direct traffic in real time to the best model.
To ensure the notebook can run successfully, install the Python dependencies:
pip install -r notebooks/requirements.txt
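Once a model is deployed, the notebook sends it prediction requests. The sketch below shows roughly what such a request could look like using only the standard library. The host `localhost:8002`, the deployment name `mnist-classifier`, and the URL layout are assumptions for illustration; check them against your own Ambassador and Seldon setup.

```python
import json
import urllib.request


def build_payload(pixels):
    """Wrap one flattened 28x28 image in a Seldon-style 'ndarray' payload."""
    if len(pixels) != 784:
        raise ValueError("expected 784 pixel values for a 28x28 MNIST image")
    return {"data": {"ndarray": [list(pixels)]}}


def build_request(host, deployment, pixels):
    """Build (but do not send) a POST request for one prediction."""
    # URL layout is an assumption; verify it for your installation.
    url = "http://{}/seldon/{}/api/v0.1/predictions".format(host, deployment)
    body = json.dumps(build_payload(pixels)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and deployment name; an all-zero image as dummy input.
req = build_request("localhost:8002", "mnist-classifier", [0.0] * 784)
print(req.full_url)

# Sending the request needs a reachable deployment:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```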
If you have installed the seldon-core analytics, you can view them on the Grafana dashboard.