# Iris Categorisation - TensorFlow - MLE

This time we use GCP Cloud ML Engine (MLE) to:

- Accelerate training
- Deploy the model to a production endpoint


In [24]:
import os

## Initial Setup

Let's configure the project parameters.

In [25]:
PROJECT = 'irisml-217400' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'iris-demo-4a8337d54c6d59x9' # REPLACE WITH YOUR BUCKET NAME
REGION = 'asia-east1'  # Closest region with MLE training (can't train on asia-northeast1)
ENDPOINT_REGION = 'asia-northeast1' 

# For Python Code
# Model Info
MODEL_NAME = 'iris'
# Model Version
MODEL_VERSION = 'v1'
# Training Directory name
TRAINING_DIR = 'iris_trained'
TFVERSION = '1.10'

# For Bash Code (because google-cloud-storage is lousy)
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['ENDPOINT_REGION'] = ENDPOINT_REGION
os.environ['MODEL_NAME'] = MODEL_NAME
os.environ['MODEL_VERSION'] = MODEL_VERSION
os.environ['TRAINING_DIR'] = TRAINING_DIR 
os.environ['TFVERSION'] = TFVERSION  # Tensorflow version
os.environ['OUTDIR'] = 'gs://{BUCKET}/trained'.format(BUCKET=BUCKET)
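The repeated `os.environ[...] = ...` lines can also be written as one dictionary that is pushed into the environment in a single call. This is only a sketch of an alternative layout using the same values as the cell above; the `config` name is ours, not part of any library.

```python
import os

# Same values as the configuration cell; `config` is a name chosen for this sketch.
config = {
    'PROJECT': 'irisml-217400',
    'BUCKET': 'iris-demo-4a8337d54c6d59x9',
    'REGION': 'asia-east1',
    'ENDPOINT_REGION': 'asia-northeast1',
    'MODEL_NAME': 'iris',
    'MODEL_VERSION': 'v1',
    'TRAINING_DIR': 'iris_trained',
    'TFVERSION': '1.10',
}

# OUTDIR is derived from BUCKET, so build it after the dict exists.
config['OUTDIR'] = 'gs://{BUCKET}/trained'.format(**config)

# os.environ only accepts strings, which every value above already is.
os.environ.update(config)

print(os.environ['OUTDIR'])
```

Keeping everything in one dict makes it harder to forget an export when a new parameter is added.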

In [26]:
%%bash
gcloud config set project ${PROJECT}
gcloud config set compute/region ${REGION}

Updated property [core/project].
Updated property [compute/region].


## Create our Bucket and Configure Access

No need to download the data locally again if you have it already.

### First get the service account

In [27]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

credentials = GoogleCredentials.get_application_default()

ml = discovery.build('ml', 'v1', credentials=credentials,
            discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')

# I find it baffling why the `.projects()` method is used instead of `.Project('project-name').get_config()`
project_config = ml.projects().getConfig(name='projects/'+ PROJECT).execute()
os.environ['SVC_ACCOUNT'] = project_config[u'serviceAccount']

In [28]:
%%bash
gsutil mb -p ${PROJECT} -l ${REGION} gs://${BUCKET}
gsutil -m acl ch -u ${SVC_ACCOUNT}:W gs://${BUCKET} 
gsutil -m cp -r ../data gs://${BUCKET}                  # Send test and training data to GS

Creating gs://iris-demo-4a8337d54c6d59x9/...
ServiceException: 409 Bucket iris-demo-4a8337d54c6d59x9 already exists.
No changes to gs://iris-demo-4a8337d54c6d59x9/
Copying file://../data/iris_test.csv [Content-Type=text/csv]...
Copying file://../data/iris_training.csv [Content-Type=text/csv]...
Copying file://../data/validation_data.hdf [Content-Type=application/x-hdf]...
/ [3/3 files][  1.0 MiB/  1.0 MiB] 100% Done
Operation completed over 3 objects/1.0 MiB.


In [29]:
%%bash
gsutil ls gs://${BUCKET}/data

gs://iris-demo-4a8337d54c6d59x9/data/iris_test.csv
gs://iris-demo-4a8337d54c6d59x9/data/iris_training.csv
gs://iris-demo-4a8337d54c6d59x9/data/validation_data.hdf


## Deployment Package

ML Engine requires that we give it an installable Python package. Ours lives in `../package/trainer`.

In [30]:
ls ../package/trainer

__init__.py  model.py  task.py
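Before submitting anything, it can help to confirm that the package directory really contains the files the trainer module needs. The helper below is a hypothetical sketch (`missing_package_files` is our own name); it assumes that `__init__.py` and `task.py` are the files that matter, based on the listing above and the `--module-name=trainer.task` flag used later. It is demonstrated on a throwaway directory rather than the real package.

```python
import os
import shutil
import tempfile

def missing_package_files(package_dir, required=("__init__.py", "task.py")):
    """Return the required file names that are absent from package_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(package_dir, name))]

# Build a temporary directory that mimics ../package/trainer.
tmp = tempfile.mkdtemp()
try:
    trainer = os.path.join(tmp, "trainer")
    os.makedirs(trainer)
    open(os.path.join(trainer, "task.py"), "w").close()
    before = missing_package_files(trainer)   # __init__.py not created yet
    open(os.path.join(trainer, "__init__.py"), "w").close()
    after = missing_package_files(trainer)    # both files now present
finally:
    shutil.rmtree(tmp)

print(before, after)
```

Pointing the same function at `../package/trainer` should return an empty list for the layout shown above.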


## Test locally before Sending to MLE

The model reads its data from Cloud Storage, so to test locally we need to install the `google-cloud-storage` Python package.

In [31]:
!pip install google-cloud-storage

Collecting google-cloud-storage
  Downloading https://files.pythonhosted.org/packages/d7/62/a2e3111bf4d1eb54fe86dec694418644e024eb059bf1e66ebdcf9f98ad70/google_cloud_storage-1.13.0-py2.py3-none-any.whl (59kB)
Installing collected packages: google-cloud-storage
Successfully installed google-cloud-storage-1.13.0


### Train

In [32]:
%%bash
# Use the Cloud ML Engine tooling to train the model on the local file system
# same as:
# python package/trainer/task.py
gcloud ml-engine local train \
    --module-name=trainer.task \
    --package-path=${PWD}/../package/trainer \
    -- \
    --outdir=/tmp/trained \
    --train_steps=2 \
    --bucket=${BUCKET} \
    --project=${PROJECT}  \
    --test_file=data/iris_test.csv  \
    --train_file=data/iris_training.csv

Running Tensorflow version: 1.8.0
Received OUTDIR: /tmp/trained
Downloading: gs://iris-demo-4a8337d54c6d59x9/data/iris_training.csv
Downloading: gs://iris-demo-4a8337d54c6d59x9/data/iris_test.csv
Defining training spec
Expecting data from:  /tmp/data/iris_training.csv
Defining eval spec
Expecting data from:  /tmp/data/iris_test.csv
Starting training ...


  from ._conv import register_converters as _register_converters
INFO:tensorflow:TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {}, u'job': {u'args': [u'--outdir=/tmp/trained', u'--train_steps=2', u'--bucket=iris-demo-4a8337d54c6d59x9', u'--project=irisml-217400', u'--test_file=data/iris_test.csv', u'--train_file=data/iris_training.csv'], u'job_name': u'trainer.task'}, u'task': {}}
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fdbef8edb50>, '_evaluation_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/tmp/t
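Everything after the bare `--` in the command above is handed to `trainer.task` as ordinary command-line arguments. The real `task.py` is not shown in this notebook, so the following is only a minimal sketch of how such a module might parse those flags; the defaults and the `parse_args` name are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse the user flags passed after `--` (hypothetical sketch of task.py)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--outdir", required=True,
                        help="Where checkpoints and the exported model are written")
    parser.add_argument("--train_steps", type=int, default=1000)
    parser.add_argument("--bucket", required=True)
    parser.add_argument("--project", required=True)
    parser.add_argument("--train_file", default="data/iris_training.csv")
    parser.add_argument("--test_file", default="data/iris_test.csv")
    # parse_known_args ignores flags we did not declare, so an extra
    # argument added by the tooling would not crash the trainer.
    args, _unknown = parser.parse_known_args(argv)
    return args

args = parse_args([
    "--outdir=/tmp/trained",
    "--train_steps=2",
    "--bucket=iris-demo-4a8337d54c6d59x9",
    "--project=irisml-217400",
    "--test_file=data/iris_test.csv",
    "--train_file=data/iris_training.csv",
])
print(args.outdir, args.train_steps)
```

The argument list in the usage example is the same one that appears in the `TF_CONFIG` log line above.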

### Test

Ensure our exported model is serving predictions properly.

In [33]:
!pip install tables



In [34]:
import pandas as pd

In [35]:
SPECIES = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}

In [36]:
valid = pd.read_hdf('../data/validation_data.hdf', 'test1')

features = valid.drop('Species', axis=1)

In [37]:
print('Expected class outputs are: ')
valid['Species'].map({v:k for k,v in SPECIES.items()})

Expected class outputs are: 


0    0
1    1
2    2
Name: Species, dtype: int64
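The `map` call above works by inverting the `SPECIES` dictionary so that names point back to class ids. Here is the same idea without pandas, as a small sketch; the `labels` list stands in for the `Species` column and is assumed to hold the three names in the order implied by the output above.

```python
SPECIES = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}

# Invert id -> name into name -> id.
NAME_TO_ID = {name: class_id for class_id, name in SPECIES.items()}

# Stand-in for valid['Species']; assumed from the 0, 1, 2 output above.
labels = ['Setosa', 'Versicolor', 'Virginica']
expected_ids = [NAME_TO_ID[name] for name in labels]

print(expected_ids)
```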

In [38]:
features.to_json('../test.json', orient='records', lines=True)

In [39]:
!cat '../test.json'

{"PetalLength":1.7,"PetalWidth":0.5,"SepalLength":5.1,"SepalWidth":3.3}
{"PetalLength":4.2,"PetalWidth":1.5,"SepalLength":5.9,"SepalWidth":3.0}
{"PetalLength":5.4,"PetalWidth":2.1,"SepalLength":6.9,"SepalWidth":3.1}
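The file written by `to_json(..., orient='records', lines=True)` is newline-delimited JSON: one complete JSON object per line, with no enclosing list. A rough sketch of reading that format back with only the standard library follows; `parse_json_lines` is our own helper name.

```python
import json

# The three lines of ../test.json, copied from the output above.
JSON_LINES = """\
{"PetalLength":1.7,"PetalWidth":0.5,"SepalLength":5.1,"SepalWidth":3.3}
{"PetalLength":4.2,"PetalWidth":1.5,"SepalLength":5.9,"SepalWidth":3.0}
{"PetalLength":5.4,"PetalWidth":2.1,"SepalLength":6.9,"SepalWidth":3.1}
"""

def parse_json_lines(text):
    """Decode newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

instances = parse_json_lines(JSON_LINES)
print(len(instances), sorted(instances[0]))
```

Each decoded dict is one prediction instance, keyed by feature name.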

In [40]:
%%bash
# This model dir is the model exported after training and is used for prediction
#
MODEL_RESULTS=/tmp/trained
latest_model_dir=$(ls ${MODEL_RESULTS}/export/exporter | tail -1)
# predict using the trained model
gcloud ml-engine local predict  \
    --model-dir=${MODEL_RESULTS}/export/exporter/${latest_model_dir} \
    --json-instances=../test.json #| awk {'print $1'}

CLASS_IDS  CLASSES  LOGITS                                                           PROBABILITIES
[2]        [u'2']   [-0.3140818476676941, 0.18077421188354492, 0.41236674785614014]  [0.2124050408601761, 0.34839996695518494, 0.43919506669044495]
[2]        [u'2']   [-0.8448461890220642, 0.20084339380264282, 0.9563210606575012]   [0.1009889617562294, 0.2873499393463135, 0.6116611361503601]
[2]        [u'2']   [-1.127501130104065, 0.16294443607330322, 1.3556257486343384]    [0.06019357964396477, 0.21876788139343262, 0.7210385203361511]


  from ._conv import register_converters as _register_converters
2018-10-10 09:30:09.742255: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
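Note that after only two training steps the model predicts class 2 for all three rows, while the expected classes are 0, 1 and 2, so this run only checks the plumbing. The `PROBABILITIES` column should be the softmax of the `LOGITS` column; the sketch below checks that for the first row using plain Python. The `softmax` helper is our own, not taken from the trainer package.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)                      # subtract the max to avoid overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# First row of the local prediction output above.
logits = [-0.3140818476676941, 0.18077421188354492, 0.41236674785614014]
reported = [0.2124050408601761, 0.34839996695518494, 0.43919506669044495]

probs = softmax(logits)
predicted = probs.index(max(probs))          # index of the most likely class

print(probs, predicted)
```

The recomputed values agree with the reported ones to within float32 rounding, and the largest probability sits at index 2, matching `CLASS_IDS`.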



## Train on MLE

Use cloud resources to train the model much more thoroughly than we can locally, by increasing the number of training steps and choosing the target machine configuration (`--scale-tier`).


In [41]:
%%bash
JOBNAME=${MODEL_NAME}_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
# Clear the Cloud Storage Bucket used for the training job
gsutil -m rm -rf ${OUTDIR}
gcloud ml-engine jobs submit training ${JOBNAME} \
   --region=${REGION} \
   --module-name=trainer.task \
   --package-path=${PWD}/../package/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://${BUCKET} \
   --scale-tier=BASIC \
   --runtime-version=${TFVERSION} \
   -- \
    --outdir=${OUTDIR} \
    --train_steps=1000 \
    --bucket=${BUCKET} \
    --project=${PROJECT}  \
    --test_file=data/iris_test.csv  \
    --train_file=data/iris_training.csv

gs://iris-demo-4a8337d54c6d59x9/trained asia-east1 iris_181010_093010
jobId: iris_181010_093010
state: QUEUED


Removing gs://iris-demo-4a8337d54c6d59x9/trained/eval/#1538290082752480...
Removing gs://iris-demo-4a8337d54c6d59x9/trained/checkpoint#1538290080288845...
Removing gs://iris-demo-4a8337d54c6d59x9/trained/#1538290078949940...
Removing gs://iris-demo-4a8337d54c6d59x9/trained/eval/events.out.tfevents.1538290082.cmle-training-2845701797023043042#1538290083694278...
Removing gs://iris-demo-4a8337d54c6d59x9/trained/events.out.tfevents.1538290070.cmle-training-2845701797023043042#1538290089076800...
Removing gs://iris-demo-4a8337d54c6d59x9/trained/export/#1538290084472278...
Removing gs://iris-demo-4a8337d54c6d59x9/trained/export/exporter/#1538290084639511...
Removing gs://iris-demo-4a8337d54c6d59x9/trained/export/exporter/1538290083/#1538290088020745...
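The job name in the cell above is the model name plus a UTC timestamp, which keeps each submission's name distinct. For anyone submitting jobs from Python instead of bash, the same naming scheme could be sketched like this; `make_job_name` is a hypothetical helper.

```python
from datetime import datetime, timezone

def make_job_name(model_name, now=None):
    """Mirror ${MODEL_NAME}_$(date -u +%y%m%d_%H%M%S) from the bash cell."""
    if now is None:
        now = datetime.now(timezone.utc)     # same clock as `date -u`
    return "{}_{}".format(model_name, now.strftime("%y%m%d_%H%M%S"))

# Reproduce the job id shown in the output above.
example = make_job_name("iris", datetime(2018, 10, 10, 9, 30, 10))
print(example)
```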

## Deploy 

In [51]:
%%bash
# Create model
gcloud ml-engine models create ${MODEL_NAME} --regions ${ENDPOINT_REGION}

MODEL_LOCATION=$(gsutil ls ${OUTDIR}/export/exporter | tail -1)

echo "MODEL_LOCATION = ${MODEL_LOCATION}"

gcloud ml-engine versions create ${MODEL_VERSION} \
    --model ${MODEL_NAME} --origin ${MODEL_LOCATION} \
    --runtime-version $TFVERSION

MODEL_LOCATION = gs://iris-demo-4a8337d54c6d59x9/trained/export/exporter/1539163961/


ERROR: (gcloud.ml-engine.models.create) Resource in project [irisml-217400] is the subject of a conflict: Field: model.name Error: A model with the same name already exists.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: A model with the same name already exists.
    field: model.name
Creating version (this might take a few minutes)......
.................................................................................................done.
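`gsutil ls ... | tail -1` picks the newest export because the exporter names each directory with a numeric timestamp and the listing is sorted. A slightly more defensive version of that selection, sketched in Python, compares the timestamps as integers and ignores entries that are not timestamp directories. `latest_export` is our own name, and the listing below is assembled from paths that appear in the outputs above.

```python
def latest_export(paths):
    """Return the export directory with the largest numeric timestamp."""
    def leaf(path):
        # Last path component, ignoring any trailing slash.
        return path.rstrip("/").rsplit("/", 1)[-1]

    candidates = [p for p in paths if leaf(p).isdigit()]
    # max() raises ValueError if no timestamped directory exists.
    return max(candidates, key=lambda p: int(leaf(p)))

listing = [
    "gs://iris-demo-4a8337d54c6d59x9/trained/export/exporter/",
    "gs://iris-demo-4a8337d54c6d59x9/trained/export/exporter/1538290083/",
    "gs://iris-demo-4a8337d54c6d59x9/trained/export/exporter/1539163961/",
]

model_location = latest_export(listing)
print(model_location)
```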


### Test new REST Endpoint

Docs [here](https://cloud.google.com/ml-engine/reference/rest/v1/projects/predict)

In [52]:
%%bash 

PROJECT_ID=$PROJECT
AUTH_TOKEN=$(gcloud auth print-access-token)
TEST_INSTANCE="{\"instances\":[$(cat ../test.json|head -1)]}"

curl -X POST \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    --data "$TEST_INSTANCE" \
    https://ml.googleapis.com/v1/projects/${PROJECT_ID}/models/${MODEL_NAME}:predict

{"predictions": [{"probabilities": [0.9957330822944641, 0.0042669884860515594, 6.210235505097611e-15], "class_ids": [0], "classes": ["0"], "logits": [12.176715850830078, 6.72414493560791, -20.531585693359375]}]}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   298    0   211  100    87     95     39  0:00:02  0:00:02 --:--:--    95
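The request body is just the instances wrapped in an `{"instances": [...]}` object, and the response carries one prediction dict per instance. Below is a sketch of building that body and reading the answer with the standard library only; both helper names are ours, and the response string is the one returned by the endpoint above.

```python
import json

SPECIES = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}

def build_request_body(json_lines_text, limit=1):
    """Wrap the first `limit` JSON-lines records in an 'instances' envelope."""
    records = [json.loads(line)
               for line in json_lines_text.splitlines() if line.strip()]
    return json.dumps({"instances": records[:limit]})

def predicted_species(response_text):
    """Map each prediction's first class id to its species name."""
    predictions = json.loads(response_text)["predictions"]
    return [SPECIES[p["class_ids"][0]] for p in predictions]

FIRST_LINE = '{"PetalLength":1.7,"PetalWidth":0.5,"SepalLength":5.1,"SepalWidth":3.3}'
RESPONSE = ('{"predictions": [{"probabilities": [0.9957330822944641, '
            '0.0042669884860515594, 6.210235505097611e-15], "class_ids": [0], '
            '"classes": ["0"], "logits": [12.176715850830078, 6.72414493560791, '
            '-20.531585693359375]}]}')

body = build_request_body(FIRST_LINE)
names = predicted_species(RESPONSE)
print(body, names)
```

With the cloud-trained model, the first validation row comes back as class 0 (Setosa), which matches the expected class listed earlier.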


## Cleanup

Remove the deployed model version (the endpoint) and then the model itself.

In [55]:
%%bash
gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME} 
gcloud ml-engine models delete ${MODEL_NAME} -q


This will delete version [v1]...

Do you want to continue (Y/n)?  Please enter 'y' or 'n':  
Deleting version [v1]......
........................................done.
