# Using Kubeflow with Algorithmia

This notebook walks you through how to traing a machine learning model using Google's Kubeflow, and then import the model into Algorithmia and write an algorithm that incorporates it.  There are 5 steps in this workflow:

* Installing the software you need to access google's cloud services, control kubeflow, etc
* Setting up a project in Google's cloud, under which kubeflow will run
* Create a kubernetes cluster that is associated with ths project and running kubeflow
* Set up a storage bucket in google's cloud that is associated with the project
* Create a docker image that can be used for training, by downloading the example code
* Configure the training job and kick it off.  This will store a trained model in the bucket you created
* Download the trained model, load it into Algorithmia's data API, and create an algorithm that uses it

This tutorial is largely based on [Google's end-to-end Kubeflow tutorial](https://www.kubeflow.org/docs/gke/gcp-e2e/)  (which is archived [here](https://web.archive.org/save/https://www.kubeflow.org/docs/gke/gcp-e2e/)).

# STEP 1: Installing Required Software

We need to make the following software available on your computer:
* gcloud: access to Google’s cloud services
* kubectl: controlling a Kubernetes cluster
* kustomize: a helper tool that makes it easier to modify kubernetes jobs
* kfctl: controlling Kubeflow specifically

First off though, we will be CDing in and out of some directories, so let's make life easier by creating a python variable that keeps track of out root directory:

In [58]:
import os
#ROOTDIR = os.getcwd()
ROOTDIR='/Users/fieldcady/Desktop/model-deployment/'
print('The working directory is:' + ROOTDIR)

The working directory is:/Users/fieldcady/Desktop/model-deployment/


## Gcloud, Kubectl and kusomize
In this tutorial there will be a lot of shell commands executed from within Jupyter.  Unfortunately, installing the first round of software requires to shell commands that we can **NOT** put in a cell to execute (they require some interactivity, restarting a shell, etc).  You will have to open a terminal program and run them from there.


If you are on a Mac you will have to run the following from the command shell to instalol gcloud:
```
curl https://sdk.cloud.google.com | bash  # install gcloud
exec -l $SHELL  # restart the shell
gcloud init
```
If you are not on a Mac check out instructions for other systems [here](https://cloud.google.com/sdk/docs/downloads-interactive).

Once you have  gcloud installed you can install kubectl and kustomize like this (again, on a Mac):
```
gcloud components install kubectl
brew install kustomize
```

## kfctl
We don’t “install” kfctl exactly - we just download the executable and put it in a place we can reference.  Download the appropriate version from the [Kubeflow releases page](https://github.com/kubeflow/kubeflow/releases/) into the directory of your choice, unzip it, and then put it on your path.  Example commands (in this case just putting it in the working directory) are:

In [60]:
%%capture
%cd $ROOTDIR
!wget https://github.com/kubeflow/kubeflow/releases/download/v0.5.1/kfctl_v0.5.1_darwin.tar.gz
!tar -xvf kfctl_v0.5.1_darwin.tar.gz

In [100]:
if 'kfctl' in os.listdir(ROOTDIR):
    print('Extracted kfctl in the root dir')
else: print('Error!')

Extracted kfctl in the root dir


# STEP 2: Creating the Project
Google Cloud divides things into "projects" that can have multiple resources assocaited with them.  If you already have a project you can just use it.  In this section we will create a new project and configure gcloud to point to it.

First use gcloud to tell Google who you are and associate all of this with your google email address.  This command will open up a browser window where you can log in:

In [102]:
%%capture
!gcloud auth application-default login

Then go to the [Google Cloud console page](https://console.cloud.google.com), create a project, and get its ID.  Also **make sure billing is enabled for it**, and choose a geographical region and zone for the project.  Set those decisions as python variables:

In [64]:
PROJECT='kubeflow-245520'
REGION='us-west2'
ZONE='us-west2-c'

Then configure gcloud to point to that project and zone:

In [65]:
!gcloud config set project $PROJECT
!gcloud config set compute/zone $ZONE

Updated property [core/project].
Updated property [compute/zone].


# STEP 3: Create GKE Cluster
Now we will create a Google Kubernetes Engine (GKE) cluster, under the umbrella of the current project, that has kubeflow running on it.

Choose a name for the Kubernetes cluster to use, a name for the GCloud deployment, and login credentials for the web UI of the cluster:

In [66]:
KFAPP='kfapp7'
KUBEFLOW_USERNAME='fcady'
KUBEFLOW_PASSWORD='mypass'

Use kfctl to create this Kubeflow project and a directory dedicated to it:

In [67]:
%env KUBEFLOW_USERNAME=$KUBEFLOW_USERNAME
%env KUBEFLOW_PASSWORD=$KUBEFLOW_PASSWORD
%cd $ROOTDIR
!$ROOTDIR/kfctl init $KFAPP --platform gcp --project $PROJECT --use_basic_auth -V

env: KUBEFLOW_USERNAME=fcady
env: KUBEFLOW_PASSWORD=mypass
/Users/fieldcady/Desktop/model-deployment
[36mINFO[0m[0005] Not skipping GCP project init, running gcpInitProject.  [36mfilename[0m="gcp/gcp.go:1619"
[33mWARN[0m[0008] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.d5cbac48-65c2-4828-b118-05fc6ab87eed)  [33mfilename[0m="gcp/gcp.go:1594"
[33mWARN[0m[0009] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.d5cbac48-65c2-4828-b118-05fc6ab87eed)  [33mfilename[0m="gcp/gcp.go:1594"
[33mWARN[0m[0010] bat

To confirm that it all went well, check that there is a directory into the ROOTDIR that has the name of your Kubeflow project:

In [68]:
contents = os.listdir(ROOTDIR)
if KFAPP in contents: print('Success!  You made the directory')
else: print('Darn, something screwed up')

Success!  You made the directory


Now CD into the directory and let's set up the cluster itself, allocating resources and all that.   This will fail if you don’t have billing enabled!  This process will take a while (maybe 20 minutes?), so now would be a good time to grab some coffee :)

In [98]:
%env GOOGLE_APPLICATION_CREDENTIALS="/Users/fieldcady/Downloads/kubeflow-d7acf7627f84.json"

env: GOOGLE_APPLICATION_CREDENTIALS="/Users/fieldcady/Downloads/kubeflow-d7acf7627f84.json"


In [99]:
%cd $ROOTDIR/$KFAPP
#!$ROOTDIR/kfctl generate all -V --zone $ZONE
!$ROOTDIR/kfctl apply all -V

/Users/fieldcady/Desktop/model-deployment/kfapp7
[36mINFO[0m[0000] deploying kubeflow application                [36mfilename[0m="cmd/apply.go:35"
[36mINFO[0m[0000] reading from /Users/fieldcady/Desktop/model-deployment/kfapp7/app.yaml  [36mfilename[0m="coordinator/coordinator.go:341"
[31mFATA[0m[0000] Could not authenticate Client: google: error getting credentials using GOOGLE_APPLICATION_CREDENTIALS environment variable: open "/Users/fieldcady/Downloads/kubeflow-d7acf7627f84.json": no such file or directory  [31mfilename[0m="gcp/gcp.go:101"


After the cluster starts it will have a URL where you can log in with the previously set username and password to check on everything.  This code snippet will print the url:

In [84]:
print(f"https://{KFAPP}.endpoints.{PROJECT}.cloud.goog/")

https://kfapp7.endpoints.kubeflow-245520.cloud.goog/


Next connect to Kubeflow cluster+deployment:

In [19]:
#%%capture
!gcloud container clusters get-credentials $KFAPP --zone $ZONE --project $PROJECT

Fetching cluster endpoint and auth data.
kubeconfig entry generated for kfapp3.


# STEP 4: Make Storage Bucket
Now we will set up a cloud storage location where our trained model will be stored:

In [40]:
BUCKET_NAME=PROJECT + '-' + KFAPP + '-bucket'

In [41]:
!echo $ZONE gs://$BUCKET_NAME
!gsutil mb -c regional -l $REGION gs://$BUCKET_NAME

us-west2-c gs://kubeflow-245520-kfapp6-bucket
Creating gs://kubeflow-245520-kfapp6-bucket/...


You should see the bucket listed at ```https://console.cloud.google.com/storage/browser?project=<PROJECT>```.

# STEP 5: Get code, build training Docker image, and Push it to the Registry
The next step for getting Kubeflow running is to get the Docker training image built and stored in GCE where Kubeflow can find it.

This is where you will start changing things for your own projects, but for this walk-through we will use the MNIST example.

In [49]:
import datetime
import time
VERSION_TAG=str(round(time.time()))
TRAIN_IMG_PATH=f"gcr.io/{PROJECT}/{KFAPP}-train:{VERSION_TAG}"
WORKING_DIR=ROOTDIR + "examples/mnist"
print(TRAIN_IMG_PATH)

gcr.io/kubeflow-245520/kfapp6-train:1567792362


In [51]:
%%capture
%cd $ROOTDIR
!git clone https://github.com/kubeflow/examples.git
!docker build -f examples/mnist/Dockerfile.model -t $TRAIN_IMG_PATH examples/mnist
!gcloud auth configure-docker --quiet
!docker push $TRAIN_IMG_PATH

You should then be able to see the image in the Image Registry in gcloud at the URL created by this code snippet:

In [50]:
print(f"https://console.cloud.google.com/gcr/images/{PROJECT}")

https://console.cloud.google.com/gcr/images/kubeflow-245520


# STEP 6: Create the training job and run it on the cluster

First we cd into the directory with the configuration files for training and use kustomize to make some edits:

In [44]:
%%capture
%cd $ROOTDIR/examples/mnist/training/GCS
!kustomize edit add configmap mnist-map-training --from-literal=secretName=user-gcp-sa
!kustomize edit add configmap mnist-map-training --from-literal=secretMountPath=/var/secrets
!kustomize edit add configmap mnist-map-training --from-literal=GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/user-gcp-sa.json

At this point we **should** be one command away from launching the training job on our Kubeflow cluster.

Alas, Kubeflow is an early-stage open-source project, and that means some rough edges.  In this case there are bugs that improperly handle the example configuration files, and we will need to edit them ourselves.  The two files that need editing are kustomization.yaml in the current directory ("main file") and kustomization.yaml in the base directory ("base file").  The changes that need to be made are:
* In the main file the file Chief_patch.yaml gets added as a path.  But it needs a namespace associated with it that matches the one in the base file.  Do that by adding “namespace: kubeflow” to the properties underneath it.
* The main and base files both have a "vars:" section that defines some properties.  Move all of the properties in the base file to the main file, deleting that section in the base file.

For simplicity, the next cell will write versions of these files with those changes already made:

In [52]:
%cd $ROOTDIR/examples/mnist/training/GCS
!git checkout ../base/kustomization.yaml
!git checkout kustomization.yaml

/Users/fieldcady/Desktop/model-deployment/kubeflow_integration/examples/mnist/training/GCS


In [53]:
base_file = """apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- Chief.yaml

namespace: kubeflow

generatorOptions:
  disableNameSuffixHash: true

configurations:
- params.yaml
"""

main_file = f"""apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization


configurations:
- params.yaml

# TBD (jinchihe) Need move the image to base file once.
# the issue addressed: kubernetes-sigs/kustomize/issues/1040
# TBD (jinchihe) Need to update the image once
# the issue addressed: kubeflow/testing/issues/373
images:
- name: training-image
  newName: {TRAIN_IMG_PATH}

vars:
- fieldref:
    fieldPath: data.name
  name: trainingName
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.modelDir
  name: modelDir
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.exportDir
  name: exportDir
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.trainSteps
  name: trainSteps
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.batchSize
  name: batchSize
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.learningRate
  name: learningRate
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.GOOGLE_APPLICATION_CREDENTIALS
  name: GOOGLE_APPLICATION_CREDENTIALS
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.secretName
  name: secretName
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training
- fieldref:
    fieldPath: data.secretMountPath
  name: secretMountPath
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-training

patchesJson6902:
- path: Chief_patch.yaml
  target:
    group: kubeflow.org
    kind: TFJob
    name: $(trainingName)
    namespace: kubeflow
    version: v1beta2
resources:
- ../base
configMapGenerator:
- literals:
  - name=mnist-train-dist5
  - trainSteps=2000
  - batchSize=1000
  - learningRate=0.01
  - secretName=user-gcp-sa
  - secretMountPath=/var/secrets
  - GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/user-gcp-sa.json
  - modelDir=gs://{BUCKET_NAME}/
  - exportDir=gs://{BUCKET_NAME}/export
  name: mnist-map-training
"""

_=open('../base/kustomization.yaml', 'w').write(base_file)
_=open('kustomization.yaml', 'w').write(main_file)

Note the name=mnist-train-dist line in the YAML file.  **You will have to choose a new name every
time you re-kick-off the training job** - otherwise the workflow will not get created.  Now we can kick off the job with this command:

In [47]:
!kustomize build . | kubectl apply -f -

configmap "mnist-map-training-bdgcg76k57" created
tfjob.kubeflow.org "mnist-train-dist5" configured


You can monitor the workflow on the [Google Worklow Page](https://console.cloud.google.com/kubernetes/workload), where you should see a workflow like "mnist-train-dist-chief-0".  **Wait for it to finish.**  After it finishes running then you can copy the exported model file (which is a SavedModel in Tensorflow) to your local machine and zip it up for transfer to Algorithmia.

Now is a good time to grab some coffe - this might take a while. :)

In [30]:
%%capture
# Copy trained model to local space
!gsutil cp -r gs://$BUCKET_NAME/export .
# Make model/ directory, deleting if it already exists
!rm -rf model*
!mkdir model
# Compress the trained model into a ZIP file
# NOTE: this expects there to be only one model in yoru export/ folder
!cp -r export/$(ls export)/* model/
!zip model.zip -r model

# STEP 7: Loading Model to Algorithmia and Creating an Algorithm

In [31]:
%%capture
# Put zipped model into Algorithmia
!algo rm .my/kubeflow_example force=true
!algo mkdir .my/kubeflow_example
!algo cp model.zip data://.my/kubeflow_example

Then go to [Algorithmia](http://www.algorithmia.com), create a new algorithm (**making sure to give it internet access** for this tutorial), and set its name as a variable:

In [32]:
ALGORITHMIA_USERNAME='fcady'
ALGORITHM_NAME="foo"

FILE_TO_WRITE=ALGORITHM_NAME+'/src/'+ALGORITHM_NAME+'.py'
FILE_TO_COMMIT='src/'+ALGORITHM_NAME+'.py'
REPO_TO_CLONE=ALGORITHMIA_USERNAME+'/'+ALGORITHM_NAME

Now let's use git to checkout this algorithm, write some example code that uses out model, and push it:

In [33]:
%%capture
%cd $ROOTDIR
!rm -rf $ALGORITHM_NAME  # Delete previously downloaded clone, if it exists
!algo clone $REPO_TO_CLONE

In [34]:
%%writefile $FILE_TO_WRITE
import Algorithmia
import tensorflow as tf
import requests, zipfile
from os import mkdir, listdir

IMAGE_FNAME  = '/tmp/foo.png'

client = Algorithmia.client()

def extract_model():
    filename = "data://.my/kubeflow_example/model.zip"
    input_zip = client.file(filename).getFile().name
    mkdir("/tmp/unzipped_files")
    zipped_file = zipfile.ZipFile(input_zip)
    return zipped_file.extractall("/tmp/unzipped_files")

def download_image(url):
    with open(IMAGE_FNAME, 'wb') as f:
        f.write(requests.get(url).content)

def create_session(path_to_graph = "/tmp/unzipped_files/model"):
    session = tf.Session()
    tf.saved_model.loader.load(session, ['serve'], path_to_graph)
    y = session.graph.get_tensor_by_name('Softmax:0')
    x = session.graph.get_tensor_by_name('Placeholder:0')
    return (y, x, session)

extract_model()
Y, X, SESSION = create_session()

def classify_image():
    img = tf.keras.preprocessing.image.load_img(IMAGE_FNAME).resize((28,28))
    x = tf.keras.preprocessing.image.img_to_array(img)
    xx = x.mean(axis=2).reshape((1,28,28)) / 255
    predict_values = tf.argmax(Y, 1)
    ret = predict_values.eval(session=SESSION,feed_dict={X: xx})
    return int(ret)

def apply(input):
    try:
        download_image(input['url'])
        msg = 'downloaded image'
    except Exception as e:
        msg = 'failed to get image:' + str(e)
    try: label = classify_image()
    except Exception as e:
        label = str(e)
    output = {
        'label': label,
        'msg': msg
    }
    return output

Overwriting foo/src/foo.py


Committing and pushing our changes can be a bit finicky when done through Jupyter.  Specifically:
* We commit the changes twice, because the first commit doesn't always take
* Sometimes you may have to kill the push process and restart it

In [35]:
%%capture
%cd $ROOTDIR/$ALGORITHM_NAME
!git commit -a -m "Code for the algorithm, from Jupyter"
%cd $ROOTDIR/$ALGORITHM_NAME
!git commit -a -m "Code for the algorithm, from Jupyter"

In [37]:
# Occassionally you have to run this twice...
%cd $ROOTDIR/$ALGORITHM_NAME
!git push

/Users/fieldcady/Desktop/model-deployment/kubeflow_integration/kfapp3/examples/mnist/training/GCS/foo
Everything up-to-date
/Users/fieldcady/Desktop/model-deployment/kubeflow_integration/kfapp3/examples/mnist/training/GCS


Now let's build the algorithm!

In [None]:
import Algorithmia, os
api_key = os.environ['ALGORITHMIA_API_KEY']  # Or enter the key directly
client = Algorithmia.client(api_key)

algo_namespace = "{}/{}".format(ALGORITHMIA_USERNAME, ALGORITHM_NAME)
client.algo(algo_namespace).compile()
client.algo(algo_namespace).publish()

In [41]:
latest_hash = client.algo(algo_namespace).info().version_info.git_hash
algo_input = {
    "url": "https://edwin-de-jong.github.io/blog/mnist-sequence-data/fig/5.png"
    #"url": "https://miro.medium.com/max/490/1*nlfLUgHUEj5vW7WVJpxY-g.png"
}
res = client.algo(algo_namespace+'/'+latest_hash).pipe(algo_input).result
print(res)

{'label': 8, 'msg': 'downloaded image'}
