<h1> Scaling up ML using Cloud ML </h1>

This notebook is Lab3a of CPB 102, Google's course on Machine Learning using Cloud ML.

In this notebook, we take a previously developed TensorFlow model that predicts taxi fares and package it so that it can be run in Cloud ML. For now, we'll run it on a small dataset. The model is rather simplistic, so its accuracy is not great. However, this notebook illustrates *how* to package a TensorFlow model to run it within Cloud ML.

<div id="toc"></div>

Later in the course, we will look at ways to make a more effective machine learning model.

In [34]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')


In [None]:
import google.cloud.ml as ml
import tensorflow as tf
print(tf.__version__)
print(ml.sdk_location)

<h2> Environment variables for project and bucket </h2>

Change the cell below to reflect your Project ID and bucket name. Note that:
<ol>
<li> Your project ID is the *unique* string that identifies your project (not the project name). You can find it on the Home page of the GCP Console dashboard. My dashboard reads: <b>Project ID:</b> cloud-training-demos </li>
<li> Cloud training often involves saving and restoring model files. Therefore, we should <b>create a single-region bucket</b>. If you don't have a bucket already, I suggest that you create one from the GCP console (because it will dynamically check whether the bucket name you want is available). </li>
</ol>

The first cell below sets these values as environment variables. The cell after it ensures that your bucket is writeable by Cloud ML; you need to run that one only once (not once per notebook).

In [None]:
import os
PROJECT = 'cloud-training-demos'    # CHANGE THIS
BUCKET = 'cloud-training-demos-ml'  # CHANGE THIS
REGION = 'us-central1' # CHANGE THIS

os.environ['PROJECT'] = PROJECT # for bash
os.environ['BUCKET'] = BUCKET # for bash
os.environ['REGION'] = REGION # for bash

In [None]:
%%bash
PROJECT_ID=$PROJECT
AUTH_TOKEN=$(gcloud auth print-access-token)
SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" -H "Authorization: Bearer $AUTH_TOKEN" https://ml.googleapis.com/v1beta1/projects/$PROJECT_ID:getConfig | python -c "import json; import sys; response = json.load(sys.stdin); print response['serviceAccount']")
echo "Authorizing Cloud ML service account $SVC_ACCOUNT to write to $BUCKET"
gsutil acl ch -u $SVC_ACCOUNT:WRITE gs://$BUCKET/
gsutil defacl ch -u $SVC_ACCOUNT:O gs://$BUCKET/
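
The inline `python -c` step in the cell above only pulls one field out of the JSON returned by `getConfig`. As a rough sketch, the same extraction can be written as a small function. The function name and the sample response are made up for illustration (the service account address is hypothetical); only the `serviceAccount` key comes from the cell above.

```python
import json


def extract_service_account(response_text):
    """Return the 'serviceAccount' field from a getConfig JSON response."""
    response = json.loads(response_text)
    return response['serviceAccount']


# Hypothetical response body, for illustration only.
sample_response = '{"serviceAccount": "example-sa@example.iam.gserviceaccount.com"}'
print(extract_service_account(sample_response))
```

If the key is missing (for example, because the request was not authorized), this raises a `KeyError`, which is easier to diagnose than an empty `$SVC_ACCOUNT` in the shell.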

Verify the project and bucket settings:

In [None]:
%%bash
echo "project=$PROJECT"
echo "bucket=$BUCKET"
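
The same check can be done from Python before any shell cell runs. This is a minimal sketch, assuming the three variable names set earlier (`PROJECT`, `BUCKET`, `REGION`); the helper name `missing_settings` is hypothetical.

```python
import os


def missing_settings(environ, names=('PROJECT', 'BUCKET', 'REGION')):
    """Return the names from `names` that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


# An empty list means every setting is available to the bash cells.
print(missing_settings(os.environ))
```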

<h2> Packaging up the code </h2>

Take your code and put it into a standard Python package structure. model.py and task.py contain the TensorFlow code from earlier.

In [None]:
!find taxifare

In [None]:
!cat taxifare/trainer/model.py

<h2> Find absolute paths to your data </h2>

Note the absolute paths below. In Datalab, /content is mapped to the location that the home icon takes you to.

In [None]:
%%bash
rm -rf /content/training-data-analyst/CPB102/lab3a/taxi_trained
head -1 /content/training-data-analyst/CPB102/lab1a/taxi-train.csv
head -1 /content/training-data-analyst/CPB102/lab1a/taxi-valid.csv

Ensure that you are using TensorFlow 1.0.

In [None]:
import tensorflow as tf
print(tf.__version__)

<h2> Running the Python module from the command-line </h2>

In [None]:
%%bash
rm -rf taxifare.tar.gz taxi_trained
export PYTHONPATH=${PYTHONPATH}:/content/training-data-analyst/CPB102/lab3a/taxifare
python -m trainer.task \
   --train_data_paths=/content/training-data-analyst/CPB102/lab1a/taxi-train.csv \
   --eval_data_paths=/content/training-data-analyst/CPB102/lab1a/taxi-valid.csv  \
   --output_dir=/content/training-data-analyst/CPB102/lab3a/taxi_trained \
   --num_epochs=100
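
The command above passes four flags to `trainer.task`. The sketch below shows one way such flags could be parsed with `argparse`. It is a hypothetical re-creation, not the contents of the actual task.py: the function name `parse_args`, the `required` settings, and the default of 100 epochs are all assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the four flags used in the command above (assumed layout)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_data_paths', required=True)
    parser.add_argument('--eval_data_paths', required=True)
    parser.add_argument('--output_dir', required=True)
    # The default value here is an assumption, not taken from task.py.
    parser.add_argument('--num_epochs', type=int, default=100)
    return parser.parse_args(argv)


args = parse_args([
    '--train_data_paths=taxi-train.csv',
    '--eval_data_paths=taxi-valid.csv',
    '--output_dir=taxi_trained',
    '--num_epochs=100',
])
print('{} {}'.format(args.output_dir, args.num_epochs))
```

Passing an explicit list to `parse_args` keeps the sketch runnable inside a notebook, where `sys.argv` belongs to the kernel rather than to your module.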

In [None]:
!ls /content/training-data-analyst/CPB102/lab3a/taxi_trained

<h2> Running locally using gcloud </h2>

In [None]:
%%bash
rm -rf taxifare.tar.gz taxi_trained
gcloud beta ml local train \
   --module-name=trainer.task \
   --package-path=/content/training-data-analyst/CPB102/lab3a/taxifare/trainer \
   -- \
   --train_data_paths=/content/training-data-analyst/CPB102/lab1a/taxi-train.csv \
   --eval_data_paths=/content/training-data-analyst/CPB102/lab1a/taxi-valid.csv  \
   --output_dir=/content/training-data-analyst/CPB102/lab3a/taxi_trained \
   --num_epochs=100

In [None]:
%tensorboard start --logdir /content/training-data-analyst/CPB102/lab3a/taxi_trained

In [None]:
# Replace 25130 with the pid reported when TensorBoard was started above
%tensorboard stop --pid 25130

In [None]:
!ls /content/training-data-analyst/CPB102/lab3a/taxi_trained

<h2> Submit training job using gcloud </h2>

In [None]:
%%bash
echo $BUCKET
gsutil cp /content/training-data-analyst/CPB102/lab1a/*.csv gs://${BUCKET}/taxifare/smallinput/

In [None]:
%%bash
OUTDIR=gs://${BUCKET}/taxifare/smallinput/taxi_trained
JOBNAME=lab3a_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil rm -rf $OUTDIR
gcloud beta ml jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=/content/training-data-analyst/CPB102/lab3a/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=BASIC \
   -- \
   --train_data_paths=gs://${BUCKET}/taxifare/smallinput/taxi-train.csv \
   --eval_data_paths=gs://${BUCKET}/taxifare/smallinput/taxi-valid.csv  \
   --output_dir=$OUTDIR \
   --num_epochs=100
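
The `JOBNAME` line above appends a UTC timestamp to a fixed prefix, presumably so that each submission gets a distinct job name. A minimal Python equivalent is sketched below; the helper name `make_job_name` and its parameters are hypothetical, while the `lab3a` prefix and the `%y%m%d_%H%M%S` format are copied from the shell cell.

```python
import time


def make_job_name(prefix='lab3a', when=None):
    """Build a name such as lab3a_<yymmdd>_<HHMMSS> from a UTC time.

    `when` is a time.struct_time; it defaults to the current UTC time.
    """
    when = time.gmtime() if when is None else when
    return prefix + '_' + time.strftime('%y%m%d_%H%M%S', when)


print(make_job_name())
```

Keeping the generated name in a variable makes it easy to reuse when you later describe the job or look up its logs.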

In [None]:
# The job name below is from an earlier run; replace it with the JOBNAME echoed by the submit cell
!gcloud beta ml jobs describe smallinput_job_170213_222450

<h2> Prediction </h2>

Make sure that the training job has completed before proceeding to this step (check the job status reported above).

To predict the taxifare for new inputs, you first have to deploy the trained model (deleting a previous one if necessary):

In [None]:
%%bash
# Work around https://buganizer.corp.google.com/issues/31730085
gsutil cp gs://$BUCKET/taxifare/taxi_preproc/metadata.yaml gs://$BUCKET/taxifare/taxi_trained/model/

In [None]:
%mlalpha delete --name taxifare.v1

In [None]:
%mlalpha deploy --name taxifare.v1 --path gs://$BUCKET/taxifare/taxi_trained/model/

In [None]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

import google.cloud.ml.features as features
from google.cloud.ml import session_bundle

credentials = GoogleCredentials.get_application_default()
api = discovery.build('ml', 'v1beta1', credentials=credentials,
            discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1beta1_discovery.json')

request_data = {'instances':
  [
    {'examples':
      {
        'pickup_longitude': -73.885262,
        'pickup_latitude': 40.773008,
        'dropoff_longitude': -73.987232,
        'dropoff_latitude': 40.732403,
        'passenger_count': 2,
        'fare_amount': -999
      }
    }
  ]
}

parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, 'taxifare', 'v1')
response = api.projects().predict(body=request_data, name=parent).execute()
print("response={0}".format(response))
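
The request body and the version name in the cell above follow a fixed layout, so they can be built by small helpers when you want to send several instances. This is a sketch only: the helper names `build_predict_request` and `version_name` are hypothetical, and the layout is simply copied from the cell above rather than from any API reference.

```python
def build_predict_request(instances):
    """Wrap feature dicts in the {'instances': [{'examples': ...}]} layout."""
    return {'instances': [{'examples': instance} for instance in instances]}


def version_name(project, model, version):
    """Build the resource name passed as `name` to predict()."""
    return 'projects/{}/models/{}/versions/{}'.format(project, model, version)


ride = {
    'pickup_longitude': -73.885262,
    'pickup_latitude': 40.773008,
    'dropoff_longitude': -73.987232,
    'dropoff_latitude': 40.732403,
    'passenger_count': 2,
    'fare_amount': -999,  # placeholder value, as in the cell above
}
request_data = build_predict_request([ride])
parent = version_name('cloud-training-demos', 'taxifare', 'v1')
print(parent)
```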

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License