<h1> Scaling up ML using Cloud ML Engine </h1>

In this notebook, you take a previously developed TensorFlow model that predicts taxi fares and package it so that it can run on Cloud ML Engine (Cloud MLE). For now, you will run it on a small dataset. The model is rather simplistic, so its accuracy is not great. However, this notebook illustrates *how* to package a TensorFlow model to run on Cloud MLE.

Later in the course, you will look at ways to make a more effective machine learning model.

In [44]:
import tensorflow as tf
print(tf.__version__)

1.5.0


<h2> Environment variables for project and bucket </h2>

Note that:
<ol>
<li> Enter the <b>Project ID</b>, <b>Bucket Name</b>, and <b>Bucket Region</b> below.
The project ID can be recalled in Cloud Shell using:
<code>echo $DEVSHELL_PROJECT_ID</code>

<li> You verified or created a bucket in the lab guide steps prior to starting this notebook.
A Multi-Regional bucket will not work with this lab.

</ol>
<b>Change the cell below</b> to reflect your <b>Project ID</b>, <b>Bucket Name</b>, and <b>Bucket Region</b>


In [45]:
import os
PROJECT = 'project-id' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'bucket-name' # REPLACE WITH YOUR BUCKET NAME
REGION = 'bucket-region' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1
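
Before running the remaining cells, it can help to confirm that the three placeholders were actually replaced. The following is a minimal sketch of such a check; the helper name `unset_placeholders` is hypothetical and not part of the lab code.

```python
# Placeholder values exactly as they appear in the cell above.
PLACEHOLDERS = {
    'PROJECT': 'project-id',
    'BUCKET': 'bucket-name',
    'REGION': 'bucket-region',
}


def unset_placeholders(settings):
    """Return the names of settings that are empty or still hold a placeholder."""
    return sorted(
        name for name, placeholder in PLACEHOLDERS.items()
        if settings.get(name) in (None, '', placeholder)
    )


# Example with made-up values: only BUCKET was left unchanged.
example = {'PROJECT': 'my-project', 'BUCKET': 'bucket-name', 'REGION': 'us-central1'}
print(unset_placeholders(example))
```

In the notebook you would pass `{'PROJECT': PROJECT, 'BUCKET': BUCKET, 'REGION': REGION}` and fix any names the function returns.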

<b>Pass the location of the repository.</b>

When you used ungit to clone the repository into Datalab, it placed the repository at a location on your system.
/content is mapped to the Datalab home location, and the path to the training-data-analyst repository is relative to /content.

The path below should work.
If for some reason your repository was loaded into a different location in Datalab, you would need to modify the path.

In [46]:
import os
REPO = "/content/datalab/training-data-analyst"
os.listdir(REPO)

['self-paced-labs',
 '.gitignore',
 'CONTRIBUTING.md',
 'README.md',
 'CPB100',
 'blogs',
 'LICENSE',
 'quests',
 'datalab',
 'courses',
 '.git']

In [47]:
# for bash
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['REPO'] = REPO

In [48]:
%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


Allow the Cloud ML Engine service account to read/write to the bucket containing training data. Note that the cell below requires Google Cloud Machine Learning Engine to be enabled in the project of choice.

In [49]:
%bash
PROJECT_ID=$PROJECT
AUTH_TOKEN=$(gcloud auth print-access-token)
SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
    | python -c "import json; import sys; response = json.load(sys.stdin); \
    print(response['serviceAccount'])")

echo "Authorizing the Cloud ML Service account $SVC_ACCOUNT to access files in $BUCKET"
gsutil -m defacl ch -u $SVC_ACCOUNT:R gs://$BUCKET
gsutil -m acl ch -u $SVC_ACCOUNT:R -r gs://$BUCKET  # error message (if bucket is empty) can be ignored
gsutil -m acl ch -u $SVC_ACCOUNT:W gs://$BUCKET

Authorizing the Cloud ML Service account service-524491751501@cloud-ml.google.com.iam.gserviceaccount.com to access files in qwiklabs-bucket-for-ml


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100   134    0   134    0     0    535      0 --:--:-- --:--:-- --:--:--   538
Updated default ACL on gs://qwiklabs-bucket-for-ml/
Encountered a problem: CommandException: No URLs matched: gs://qwiklabs-bucket-for-ml/*
Updated ACL on gs://qwiklabs-bucket-for-ml/
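
The inline `python -c` snippet in the cell above reads the JSON returned by the `getConfig` call and prints its `serviceAccount` field. The same step can be sketched as a standalone function. The sample string below is hand-written for illustration and contains only the one field the cell uses; a real response may carry more.

```python
import json


def extract_service_account(config_json):
    """Return the 'serviceAccount' field from a getConfig JSON response."""
    response = json.loads(config_json)
    return response['serviceAccount']


# Hand-written sample, shaped like the response the curl call pipes to Python.
sample = ('{"serviceAccount": '
          '"service-524491751501@cloud-ml.google.com.iam.gserviceaccount.com"}')
print(extract_service_account(sample))
```

If Cloud ML Engine is not enabled in the project, the response will probably not contain this key, and the lookup raises a `KeyError` instead of silently passing an empty account name to `gsutil`.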


<h2> Packaging up the code </h2>

Take your code and put it into a standard Python package structure.  <a href="taxifare/trainer/model.py">model.py</a> and <a href="taxifare/trainer/task.py">task.py</a> contain the TensorFlow code from earlier (explore the <a href="taxifare/trainer/">directory structure</a>).

In [50]:
!find taxifare

taxifare
taxifare/trainer
taxifare/trainer/model.pyc
taxifare/trainer/__init__.py
taxifare/trainer/__init__.pyc
taxifare/trainer/model.py
taxifare/trainer/task.py
taxifare/trainer.egg-info
taxifare/trainer.egg-info/dependency_links.txt
taxifare/trainer.egg-info/SOURCES.txt
taxifare/trainer.egg-info/PKG-INFO
taxifare/trainer.egg-info/top_level.txt
taxifare/PKG-INFO
taxifare/setup.cfg
taxifare/setup.py
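
The part of the listing that matters for packaging is the `trainer` directory: it holds `__init__.py`, which marks it as an importable package, next to `model.py` and `task.py`. As a minimal sketch, the layout can be reproduced in a scratch directory as follows. The helper `make_package_skeleton` is hypothetical, and the files it creates are empty stand-ins, not the lab's code.

```python
import os
import tempfile


def make_package_skeleton(root, package='trainer', modules=('model.py', 'task.py')):
    """Create root/package/ containing __init__.py and empty module files."""
    package_dir = os.path.join(root, package)
    os.makedirs(package_dir)
    for filename in ('__init__.py',) + tuple(modules):
        # Empty files are enough to show the layout; real code would go here.
        open(os.path.join(package_dir, filename), 'w').close()
    return package_dir


root = tempfile.mkdtemp()
package_dir = make_package_skeleton(root)
print(sorted(os.listdir(package_dir)))
```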


In [51]:
!cat taxifare/trainer/model.py

#!/usr/bin/env python

# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import layers
import tensorflow.contrib.learn as tflearn
from tensorflow.contrib import metrics


tf.logging.set_verbosity(tf.logging.INFO)

CSV_COLUMNS = ['fare_amount', 'pickuplon','pickuplat

<h2> Find absolute paths to your data </h2>

Note the absolute paths below. In Datalab, /content is mapped to the location that the home icon takes you to.

In [52]:
%bash
rm -rf $REPO/courses/machine_learning/cloudmle/taxi_trained
head -1 $REPO/courses/machine_learning/datasets/taxi-train.csv
head -1 $REPO/courses/machine_learning/datasets/taxi-valid.csv

10.5,-74.00773,40.744548,-74.011785,40.708727,3,0
8.0,-73.95463,40.76562,-73.95901,40.77483,6,0
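
Each line of these files is a header-less record with seven comma-separated fields. The sketch below turns one such line into a dictionary. The column names are an assumption: the `CSV_COLUMNS` listing above is cut off, so the remaining names are taken from the keys in `test.json` and the aliases in the BigQuery query later in this notebook.

```python
# Assumed column order (the CSV_COLUMNS listing above is truncated).
ASSUMED_COLUMNS = ['fare_amount', 'pickuplon', 'pickuplat',
                   'dropofflon', 'dropofflat', 'passengers', 'key']


def parse_row(line):
    """Split one header-less CSV line into a dict keyed by column name."""
    values = line.strip().split(',')
    if len(values) != len(ASSUMED_COLUMNS):
        raise ValueError('expected {} fields, got {}'.format(
            len(ASSUMED_COLUMNS), len(values)))
    row = dict(zip(ASSUMED_COLUMNS, values))
    # Every column except the key is numeric.
    for name in ASSUMED_COLUMNS[:-1]:
        row[name] = float(row[name])
    return row


row = parse_row('10.5,-74.00773,40.744548,-74.011785,40.708727,3,0')
print(row['fare_amount'])
```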


<h2> Running the Python module from the command-line </h2>

In [53]:
%bash
rm -rf taxifare.tar.gz taxi_trained
export PYTHONPATH=${PYTHONPATH}:${REPO}/courses/machine_learning/cloudmle/taxifare
python -m trainer.task \
   --train_data_paths="${REPO}/courses/machine_learning/datasets/taxi-train*" \
   --eval_data_paths=${REPO}/courses/machine_learning/datasets/taxi-valid.csv  \
   --output_dir=${REPO}/courses/machine_learning/cloudmle/taxi_trained \
   --num_epochs=10 --job-dir=./tmp

  from ._conv import register_converters as _register_converters
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7a01c85d90>, '_model_dir': '/content/datalab/training-data-analyst/courses/machine_learning/cloudmle/taxi_trained/', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_tf_random_seed': None, '_save_summary_steps': 100, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_log_step_count_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
INFO:tensorflow:Create CheckpointSaverHook.
2018-03-01 16:31:33.489219: I tensorflow/core/platform
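
The command above passes all of its settings to `trainer.task` as flags. The contents of `task.py` are not shown in this notebook, so the following is only a hypothetical sketch of how a module could accept those same flags with `argparse`. The defaults and `required` settings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the flags used on the command line above."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_data_paths', required=True)
    parser.add_argument('--eval_data_paths', required=True)
    parser.add_argument('--output_dir', required=True)
    parser.add_argument('--num_epochs', type=int, default=None)
    # argparse stores '--job-dir' under the attribute name 'job_dir'.
    parser.add_argument('--job-dir', default=None)
    return parser


args = build_parser().parse_args([
    '--train_data_paths=taxi-train*',
    '--eval_data_paths=taxi-valid.csv',
    '--output_dir=taxi_trained',
    '--num_epochs=10',
    '--job-dir=./tmp',
])
print(args.job_dir)
```

Note the mixed naming in the original command: most flags use underscores, while `--job-dir` uses a hyphen.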

In [54]:
!ls $REPO/courses/machine_learning/cloudmle/taxi_trained/export/Servo

1519921896


In [55]:
%writefile ./test.json
{"pickuplon": -73.885262,"pickuplat": 40.773008,"dropofflon": -73.987232,"dropofflat": 40.732403,"passengers": 2}

Overwriting ./test.json
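
The file passed to `--json-instances` holds one JSON object per line, each describing one instance to score. When there are several instances, generating the file is less error-prone than typing it. This is a minimal sketch; `write_json_instances` is a hypothetical helper, and it writes to a temporary directory so that it does not overwrite the `test.json` created above.

```python
import json
import os
import tempfile


def write_json_instances(path, instances):
    """Write one JSON object per line, the layout used by test.json above."""
    with open(path, 'w') as f:
        for instance in instances:
            f.write(json.dumps(instance, sort_keys=True) + '\n')


instances = [
    {'pickuplon': -73.885262, 'pickuplat': 40.773008,
     'dropofflon': -73.987232, 'dropofflat': 40.732403, 'passengers': 2},
]
path = os.path.join(tempfile.mkdtemp(), 'test.json')
write_json_instances(path, instances)
with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))
```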


In [56]:
%bash
model_dir=$(ls ${REPO}/courses/machine_learning/cloudmle/taxi_trained/export/Servo)
gcloud ml-engine local predict \
    --model-dir=${REPO}/courses/machine_learning/cloudmle/taxi_trained/export/Servo/${model_dir} \
    --json-instances=./test.json

SCORES
12.2861


  from ._conv import register_converters as _register_converters
2018-03-01 16:31:58.689276: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA



<h2> Running locally using gcloud </h2>

In [57]:
%bash
rm -rf taxifare.tar.gz taxi_trained
gcloud ml-engine local train \
   --module-name=trainer.task \
   --package-path=${REPO}/courses/machine_learning/cloudmle/taxifare/trainer \
   -- \
   --train_data_paths=${REPO}/courses/machine_learning/datasets/taxi-train.csv \
   --eval_data_paths=${REPO}/courses/machine_learning/datasets/taxi-valid.csv  \
   --num_epochs=10 \
   --output_dir=${REPO}/courses/machine_learning/cloudmle/taxi_trained 

  from ._conv import register_converters as _register_converters
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f5b17931d10>, '_model_dir': '/content/datalab/training-data-analyst/courses/machine_learning/cloudmle/taxi_trained/', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_tf_random_seed': None, '_save_summary_steps': 100, '_environment': u'cloud', '_num_worker_replicas': 0, '_task_id': 0, '_log_step_count_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
INFO:tensorflow:Create CheckpointSaverHook.
2018-03-01 16:32:09.876169: I tensorflow/core/platfor

When I ran it (due to random seeds, your results will be different), the RMSE on the evaluation dataset was 9.26.
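
RMSE (root-mean-square error) is expressed in the same units as the fare, which makes it easy to interpret. As a reminder of what the number measures, here is a minimal sketch in plain Python. The fares in the example are made up and are not taken from the taxi dataset.

```python
import math


def rmse(predictions, labels):
    """Root-mean-square error between two equal-length sequences."""
    if len(predictions) != len(labels):
        raise ValueError('predictions and labels must have the same length')
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))


# Made-up predicted and actual fares, for illustration only.
print(rmse([12.0, 8.0, 10.0], [10.5, 8.0, 13.0]))
```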

In [58]:
from google.datalab.ml import TensorBoard
TensorBoard().start('{}/courses/machine_learning/cloudmle/taxi_trained'.format(REPO))

5339

In [59]:
for pid in TensorBoard.list()['pid']:
  TensorBoard().stop(pid)
  print('Stopped TensorBoard with pid {}'.format(pid))

Stopped TensorBoard with pid 5339


If the above step (to stop TensorBoard) appears stalled, just move on to the next step. You don't need to wait for it to return.

In [60]:
!ls $REPO/courses/machine_learning/cloudmle/taxi_trained

checkpoint				     model.ckpt-160.index
eval					     model.ckpt-160.meta
events.out.tfevents.1519921929.4f0e1aa33704  model.ckpt-1.data-00000-of-00001
export					     model.ckpt-1.index
graph.pbtxt				     model.ckpt-1.meta
model.ckpt-160.data-00000-of-00001


<h2> Submit training job using gcloud </h2>

First copy the training data to the cloud.  Then, launch a training job.

After you submit the job, go to the cloud console (http://console.cloud.google.com) and select <b>Products and services > ML Engine > Jobs</b> to monitor progress.  

<b>Note:</b> Don't be concerned if the notebook stalls (with a blue progress bar) or returns with an error about being unable to refresh auth tokens. This is a long-lived Cloud job and work is going on in the cloud.  Use the Cloud Console link (above) to monitor the job.

%bash
echo $BUCKET
gsutil -m rm -rf gs://${BUCKET}/taxifare/smallinput/
gsutil -m cp ${REPO}/courses/machine_learning/datasets/*.csv gs://${BUCKET}/taxifare/smallinput/

In [66]:
%%bash
OUTDIR=gs://${BUCKET}/taxifare/smallinput/taxi_trained
JOBNAME=lab3a_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${REPO}/courses/machine_learning/cloudmle/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=BASIC \
   --runtime-version=1.0 \
   -- \
   --train_data_paths="gs://${BUCKET}/taxifare/smallinput/taxi-train*" \
   --eval_data_paths="gs://${BUCKET}/taxifare/smallinput/taxi-valid*"  \
   --output_dir=$OUTDIR \
   --num_epochs=100

gs://qwiklabs-bucket-for-ml/taxifare/smallinput/taxi_trained us-central1 lab3a_180301_163723
jobId: lab3a_180301_163723
state: QUEUED


CommandException: 1 files/objects could not be removed.
  for chunk in iter(lambda: fp.read(4096), ''):
Job [lab3a_180301_163723] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ml-engine jobs describe lab3a_180301_163723

or continue streaming the logs with the command

  $ gcloud ml-engine jobs stream-logs lab3a_180301_163723
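
The job name is built by appending a UTC timestamp to a fixed prefix, so each submission gets a fresh name. The shell expression `lab3a_$(date -u +%y%m%d_%H%M%S)` can be sketched in Python as follows; `make_job_name` is a hypothetical helper.

```python
from datetime import datetime


def make_job_name(prefix='lab3a', now=None):
    """Build a name like the shell expression lab3a_$(date -u +%y%m%d_%H%M%S)."""
    now = now or datetime.utcnow()
    return '{}_{}'.format(prefix, now.strftime('%y%m%d_%H%M%S'))


# A fixed timestamp reproduces the job name shown in the output above.
print(make_job_name(now=datetime(2018, 3, 1, 16, 37, 23)))
```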


Don't be concerned if the notebook appears stalled (with a blue progress bar) or returns with an error about being unable to refresh auth tokens. This is a long-lived Cloud job and work is going on in the cloud. 

<b>Use the Cloud Console link to monitor the job and do NOT proceed until the job is done.</b>

<h2> Deploy model </h2>

Find out the actual name of the subdirectory where the model is stored and use it to deploy the model.  Deploying the model can take up to <b>15 minutes</b>.

In [68]:
%bash
gsutil ls gs://${BUCKET}/taxifare/smallinput/taxi_trained/export/Servo

gs://qwiklabs-bucket-for-ml/taxifare/smallinput/taxi_trained/export/Servo/
gs://qwiklabs-bucket-for-ml/taxifare/smallinput/taxi_trained/export/Servo/1519922615882/
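
The listing contains the parent `Servo/` entry plus one timestamped subdirectory per export. The deployment cell picks the last line with `tail -1`, which works as long as the listing is sorted and the timestamps have the same number of digits. A sketch that compares the timestamps numerically instead (the helper `latest_export` is hypothetical):

```python
def latest_export(listing):
    """Pick the export path whose final directory is the largest timestamp."""
    candidates = []
    for path in listing:
        name = path.rstrip('/').split('/')[-1]
        if name.isdigit():  # skips the parent 'Servo/' entry
            candidates.append((int(name), path))
    if not candidates:
        raise ValueError('no timestamped export directory found')
    return max(candidates)[1]


# Lines copied from the gsutil output above.
listing = [
    'gs://qwiklabs-bucket-for-ml/taxifare/smallinput/taxi_trained/export/Servo/',
    'gs://qwiklabs-bucket-for-ml/taxifare/smallinput/taxi_trained/export/Servo/1519922615882/',
]
print(latest_export(listing))
```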


In [69]:
%bash
MODEL_NAME="taxifare"
MODEL_VERSION="v1"
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/taxifare/smallinput/taxi_trained/export/Servo | tail -1)
echo "Deleting and deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes"
#gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION}

Deleting and deploying taxifare v1 from gs://qwiklabs-bucket-for-ml/taxifare/smallinput/taxi_trained/export/Servo/1519922615882/ ... this will take a few minutes


Created ml engine model [projects/qwiklabs-gcp-f5532019d0e2a251/models/taxifare].
Creating version (this might take a few minutes)......
.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................done.


<h2> Prediction </h2>

In [70]:
%bash
gcloud ml-engine predict --model=taxifare --version=v1 --json-instances=./test.json

OUTPUTS
11.5591


In [71]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

credentials = GoogleCredentials.get_application_default()
api = discovery.build('ml', 'v1', credentials=credentials,
            discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')

request_data = {'instances':
  [
      {
        'pickuplon': -73.885262,
        'pickuplat': 40.773008,
        'dropofflon': -73.987232,
        'dropofflat': 40.732403,
        'passengers': 2,
      }
  ]
}

parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, 'taxifare', 'v1')
response = api.projects().predict(body=request_data, name=parent).execute()
print("response={0}".format(response))

response={u'predictions': [{u'outputs': 11.559096336364746}]}
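
The response is a dictionary with one entry per instance under `predictions`. To work with the fares directly, you can pull out the `outputs` values, as in this minimal sketch. The `error` check is an assumption about how a failed request would be reported in the response body; `extract_fares` is a hypothetical helper.

```python
def extract_fares(response):
    """Return the 'outputs' value of each prediction in the response dict."""
    if 'error' in response:  # assumed failure format
        raise RuntimeError(response['error'])
    return [prediction['outputs'] for prediction in response['predictions']]


# Response copied from the output above.
response = {'predictions': [{'outputs': 11.559096336364746}]}
print(extract_fares(response))
```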


<h2> Train on larger dataset </h2>

I have already followed the steps below and the files are already available. <b> You don't need to do the steps in this comment. </b> In the next chapter (on feature engineering), we will avoid all this manual processing by using Cloud Dataflow.

Go to http://bigquery.cloud.google.com/ and type the query:
<pre>
SELECT
  (tolls_amount + fare_amount) AS fare_amount,
  pickup_longitude AS pickuplon,
  pickup_latitude AS pickuplat,
  dropoff_longitude AS dropofflon,
  dropoff_latitude AS dropofflat,
  passenger_count*1.0 AS passengers,
  'nokeyindata' AS key
FROM
  [nyc-tlc:yellow.trips]
WHERE
  trip_distance > 0
  AND fare_amount >= 2.5
  AND pickup_longitude > -78
  AND pickup_longitude < -70
  AND dropoff_longitude > -78
  AND dropoff_longitude < -70
  AND pickup_latitude > 37
  AND pickup_latitude < 45
  AND dropoff_latitude > 37
  AND dropoff_latitude < 45
  AND passenger_count > 0
  AND ABS(HASH(pickup_datetime)) % 1000 == 1
</pre>

Note that this is now 1,000,000 rows (i.e. 100x the original dataset).  Export this to CSV using the following steps (Note that <b>I have already done this and made the resulting GCS data publicly available</b>, so you don't need to do it.):
<ol>
<li> Click on the "Save As Table" button and note down the name of the dataset and table.
<li> On the BigQuery console, find the newly exported table in the left-hand-side menu, and click on the name.
<li> Click on "Export Table"
<li> Supply your bucket name and give it the name train.csv (for example: gs://cloud-training-demos-ml/taxifare/ch3/train.csv). Note down what this is.  Wait for the job to finish (look at the "Job History" on the left-hand-side menu)
<li> In the query above, change the final "== 1" to "== 2" and export this to Cloud Storage as valid.csv (e.g.  gs://cloud-training-demos-ml/taxifare/ch3/valid.csv)
<li> Download the two files, remove the header line from each, and upload them back to GCS.
</ol>
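
The filter `ABS(HASH(pickup_datetime)) % 1000 == 1` in the query above selects rows by hashing a column instead of drawing random numbers, so the same rows are selected every time, and switching to `== 2` yields a disjoint set for validation. The idea can be sketched in Python as follows. Note that `hashlib.md5` is only a stand-in for BigQuery's `HASH` function, so the rows chosen here will not match the ones BigQuery picks, and the timestamps are invented.

```python
import hashlib


def bucket(key, num_buckets=1000):
    """Map a string key to a stable bucket in [0, num_buckets)."""
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return int(digest, 16) % num_buckets


def split_for(key):
    """Mirror the query: bucket 1 is training, bucket 2 is validation."""
    b = bucket(key)
    if b == 1:
        return 'train'
    if b == 2:
        return 'valid'
    return None  # row is not sampled


# Hypothetical pickup timestamps, one per second over an hour.
keys = ['2015-01-01 00:{:02d}:{:02d} UTC'.format(m, s)
        for m in range(60) for s in range(60)]
sampled = [k for k in keys if split_for(k) is not None]
print(len(sampled))
```

Because the bucket depends only on the key, rerunning the selection never moves a row from the training set to the validation set.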

<p/>
<p/>

<h2> Run Cloud training on 1-million row dataset </h2>

This job took about 60 minutes and used 1 million rows as input.  The model is exactly the same as above. The only changes are to the input (the larger dataset) and to the Cloud MLE scale tier (STANDARD_1 instead of BASIC; STANDARD_1 is approximately 10x more powerful than BASIC).  At the end of training the loss was 32, but the RMSE (calculated on the validation dataset) stayed stubbornly at 9.03, compared with 9.26 for the local run on the small dataset. So, simply adding more data doesn't help.

In [None]:
%%bash

XXXXX  this takes 60 minutes. if you are sure you want to run it, then remove this line.

OUTDIR=gs://${BUCKET}/taxifare/ch3/taxi_trained
JOBNAME=lab3a_$(date -u +%y%m%d_%H%M%S)
CRS_BUCKET=cloud-training-demos # use the already exported data
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${REPO}/courses/machine_learning/cloudmle/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=STANDARD_1 \
   --runtime-version=1.2 \
   -- \
   --train_data_paths="gs://${CRS_BUCKET}/taxifare/ch3/train.csv" \
   --eval_data_paths="gs://${CRS_BUCKET}/taxifare/ch3/valid.csv"  \
   --output_dir=$OUTDIR \
   --train_steps=42239

Copyright 2018 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License