# Scaling up ML using Cloud ML Engine

In this notebook, we take a previously developed TensorFlow model to predict taxifare rides and package it up so that it can be run in Cloud MLE. For now, we'll run this on a small dataset. The model that was developed is rather simplistic, and therefore, the accuracy of the model is not great either.  However, this notebook illustrates *how* to package up a TensorFlow model to run it within Cloud ML. 

Later in the course, we will look at ways to make a more effective machine learning model.

## Environment variables for project and bucket

Note that:
<ol>
<li> Your project id is the *unique* string that identifies your project (not the project name). You can find this from the GCP Console dashboard's Home page.  My dashboard reads:  <b>Project ID:</b> cloud-training-demos </li>
<li> Cloud training often involves saving and restoring model files. If you don't have a bucket already, I suggest that you create one from the GCP console (because it will dynamically check whether the bucket name you want is available). A common pattern is to prefix the bucket name by the project id, so that it is unique. Also, for cost reasons, you might want to use a single region bucket. </li>
</ol>
<b>Change the cell below</b> to reflect your Project ID and bucket name.


In [15]:
import os
PROJECT = 'qwiklabs-gcp-1a8cf323172d8441' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'qwiklabs-gcp-1a8cf323172d8441' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1

In [16]:
# For Python Code
# Model Info
MODEL_NAME = 'stackoverflow'
# Model Version
MODEL_VERSION = 'v1'
# Training Directory name
TRAINING_DIR = 'stackoverflow_trained'

In [17]:
# For Bash Code
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['MODEL_NAME'] = MODEL_NAME
os.environ['MODEL_VERSION'] = MODEL_VERSION
os.environ['TRAINING_DIR'] = TRAINING_DIR 
os.environ['TFVERSION'] = '1.13.1'  # Tensorflow version

In [18]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


### Create the bucket to store model and training data for deploying to Google Cloud Machine Learning Engine Component

In [5]:
%%bash
# The bucket needs to exist for the gsutil commands in next cell to work
gsutil mb -p ${PROJECT} gs://${BUCKET}

Creating gs://qwiklabs-gcp-1a8cf323172d8441/...
ServiceException: 409 Bucket qwiklabs-gcp-1a8cf323172d8441 already exists.


### Enable the Cloud Machine Learning Engine API

The next command works with Cloud Machine Learning Engine API.  In order for the command to work, you must enable the API using the Cloud Console UI.   Use this [link.](https://console.cloud.google.com/project/_/apis/library)  Then search the API list for Cloud Machine Learning and enable the API before executing the next cell.

Allow the Cloud ML Engine service account to read/write to the bucket containing training data.

In [6]:
%%bash
# This command will fail if the Cloud Machine Learning Engine API is not enabled using the link above.
echo "Getting the service account email associated with the Cloud Machine Learning Engine API"

AUTH_TOKEN=$(gcloud auth print-access-token)
SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    https://ml.googleapis.com/v1/projects/${PROJECT}:getConfig \
    | python -c "import json; import sys; response = json.load(sys.stdin); \
    print (response['serviceAccount'])")  # If this command fails, the Cloud Machine Learning Engine API has not been enabled above.

echo "Authorizing the Cloud ML Service account $SVC_ACCOUNT to access files in $BUCKET"
gsutil -m defacl ch -u $SVC_ACCOUNT:R gs://$BUCKET   
gsutil -m acl ch -u $SVC_ACCOUNT:R -r gs://$BUCKET   # error message (if bucket is empty) can be ignored.  
gsutil -m acl ch -u $SVC_ACCOUNT:W gs://$BUCKET      

Getting the service account email associated with the Cloud Machine Learning Engine API
Authorizing the Cloud ML Service account service-860946498807@cloud-ml.google.com.iam.gserviceaccount.com to access files in qwiklabs-gcp-1a8cf323172d8441


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100   235    0   235    0     0    597      0 --:--:-- --:--:-- --:--:--   599
Updated default ACL on gs://qwiklabs-gcp-1a8cf323172d8441/
Updated ACL on gs://qwiklabs-gcp-1a8cf323172d8441/EstimatorAPI_DNNClassifier_stackoverflow.posts_questions_TrainAndEval.ipynb
Updated ACL on gs://qwiklabs-gcp-1a8cf323172d8441/EstimatorAPI_DNNClassifier_stackoverflow.posts_questions_TrainAndEval_cloudmle.ipynb
Updated ACL on gs://qwiklabs-gcp-1a8cf323172d8441/stackoverflow/trainer/model.py
Updated ACL on gs://qwiklabs-gcp-1a8cf323172d8441/stackoverflow/trainer/task.py
Updated ACL on gs://qwiklabs-gcp-1a8cf323172d8441/


## Packaging up the code

Take your code and put into a standard Python package structure.  <a href="taxifare/trainer/model.py">model.py</a> and <a href="taxifare/trainer/task.py">task.py</a> containing the Tensorflow code from earlier (explore the <a href="taxifare/trainer/">directory structure</a>).

In [7]:
%%bash
find ${MODEL_NAME}

stackoverflow
stackoverflow/trainer
stackoverflow/trainer/model.py
stackoverflow/trainer/task.py


In [19]:
%%bash
cat ${MODEL_NAME}/trainer/model.py


import google.datalab.bigquery as bq
import pandas as pd
import numpy as np
import seaborn as sns
import shutil

def sample_between(a, b, shredstart):
    basequery = """
  SELECT 
    answer_count, comment_count, favorite_count,  score, view_count,
    TIMESTAMP_DIFF(last_activity_date, creation_date, DAY) as days_posted,
    IF(accepted_answer_id IS NULL , 0, 1 ) as accepted
  FROM 
    `bigquery-public-data.stackoverflow.posts_questions`
  """
  
    # Use sampling for initial model development. Once model is developed, shread the entire dataset into  .csv files based on condition in the sampler.
    sampler = "WHERE MOD(ABS(FARM_FINGERPRINT(CAST(id as STRING))), EVERY_N * 100) < {1} AND MOD(ABS(FARM_FINGERPRINT(CAST(id as STRING))), EVERY_N * 100) >= {0}".format(
            shredstart, shredstart + 10
            )
    sampler2 = "AND {0} >= {1}\n AND {0} < {2}".format(
           "MOD(ABS(FARM_FINGERPRINT(CAST(id AS STRING))), EVERY_N * 100) * {}".format(10),
           (shredst

## Running the Python module from the command-line

#### Clean model training dir/output dir

In [9]:
%%bash
# This is so that the trained model is started fresh each time. However, this needs to be done before 
# tensorboard is started
rm -rf $PWD/${TRAINING_DIR}

In [10]:
import google.datalab.bigquery as bq
import pandas as pd
import numpy as np
import seaborn as sns
import shutil

def sample_between(a, b, shredstart):
  basequery = """
  SELECT 
    answer_count, comment_count, favorite_count,  score, view_count,
    TIMESTAMP_DIFF(last_activity_date, creation_date, DAY) as days_posted,
    IF(accepted_answer_id IS NULL , 0, 1 ) as accepted
  FROM 
    `bigquery-public-data.stackoverflow.posts_questions`
  """
  
  # Use sampling for initial model development. Once model is developed, shread the entire dataset into  .csv files based on condition in the sampler.
  sampler = "WHERE MOD(ABS(FARM_FINGERPRINT(CAST(id as STRING))), EVERY_N * 100) < {1} AND MOD(ABS(FARM_FINGERPRINT(CAST(id as STRING))), EVERY_N * 100) >= {0}".format(
            shredstart, shredstart + 10
            )
  sampler2 = "AND {0} >= {1}\n AND {0} < {2}".format(
           "MOD(ABS(FARM_FINGERPRINT(CAST(id AS STRING))), EVERY_N * 100) * {}".format(10),
           (shredstart*10)+a, (shredstart*10)+b
          )
  return "{}\n{}\n{}".format(basequery, sampler, sampler2)


def create_query(phase, EVERY_N, shredstart):
  """Phase: train (70%) valid (15%) or test (15%)"""
  query = ""
  if phase == 'train':
    query = sample_between(0,60, shredstart)
  elif phase == 'valid':
    query = sample_between(60,75, shredstart)
  else:
    query = sample_between(75, 100, shredstart)
  return query.replace("EVERY_N", str(EVERY_N))

#print(create_query('train', 100))
#(answer_count - AVG(answer_count)) / STDDEV_POP(answer_count)  as answer_count,
#IF(accepted_answer_id IS NULL , cast(0 as int64), cast(1 as int64)) as accepted

def to_csv(df, filename):
  outdf = df.copy(deep = True)
  #outdf.loc[:, 'key'] = np.arange(0, len(outdf)) # rownumber as key
  # Reorder columns so that target is first column
  #print(outdf.head())
  #print(df.head())
  cols = outdf.columns.tolist()
  #print(cols)
  cols.remove('accepted')
  cols.insert(0, 'accepted')
  #print(cols)
  outdf = outdf[cols]  
  
  
  #Normalizing input columns  and replace NaN or null
  normalize_cols = outdf.columns.tolist()
  normalize_cols.remove('accepted')
  for normalize_cols_name in normalize_cols:
    outdf[normalize_cols_name].fillna(0, inplace = True)
    outdf[normalize_cols_name] = (outdf[normalize_cols_name] - outdf[normalize_cols_name].mean())  / outdf[normalize_cols_name].std() 
  #print(outdf)
  #print(outdf['answer_count'] )
  outdf.to_csv(filename,  header = False, index_label = False, index = False)
  print("Wrote {} to {}".format(len(outdf), filename))

for phase in ['train', 'valid', 'test']:
  #for x in range(2):
  for x in range(10):
    query = create_query(phase, 100, x*10)
    #print(query)
    df = bq.Query(query).execute().result().to_dataframe()
    #print(df.head())
    to_csv(df, 'stackoverflow-{}-{}.csv'.format(phase,(x+1)*10))


Wrote 10186 to stackoverflow-train-10.csv
Wrote 10333 to stackoverflow-train-20.csv
Wrote 10426 to stackoverflow-train-30.csv
Wrote 10260 to stackoverflow-train-40.csv
Wrote 10298 to stackoverflow-train-50.csv
Wrote 10401 to stackoverflow-train-60.csv
Wrote 10276 to stackoverflow-train-70.csv
Wrote 10249 to stackoverflow-train-80.csv
Wrote 10291 to stackoverflow-train-90.csv
Wrote 10332 to stackoverflow-train-100.csv
Wrote 3500 to stackoverflow-valid-10.csv
Wrote 3453 to stackoverflow-valid-20.csv
Wrote 3367 to stackoverflow-valid-30.csv
Wrote 3573 to stackoverflow-valid-40.csv
Wrote 3482 to stackoverflow-valid-50.csv
Wrote 3476 to stackoverflow-valid-60.csv
Wrote 3431 to stackoverflow-valid-70.csv
Wrote 3496 to stackoverflow-valid-80.csv
Wrote 3469 to stackoverflow-valid-90.csv
Wrote 3354 to stackoverflow-valid-100.csv
Wrote 3352 to stackoverflow-test-10.csv
Wrote 3476 to stackoverflow-test-20.csv
Wrote 3439 to stackoverflow-test-30.csv
Wrote 3408 to stackoverflow-test-40.csv
Wrote 35

In [11]:
%%bash
# Clear Cloud Storage bucket and copy the CSV files to Cloud Storage bucket
echo $BUCKET
gsutil -m rm -rf gs://${BUCKET}/${MODEL_NAME}/dataset/
gsutil -m cp ${PWD}/*.csv gs://${BUCKET}/${MODEL_NAME}/dataset/

qwiklabs-gcp-1a8cf323172d8441


CommandException: 1 files/objects could not be removed.
Copying file:///content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow-test-30.csv [Content-Type=text/csv]...
Copying file:///content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow-test-20.csv [Content-Type=text/csv]...
/ [0 files][    0.0 B/417.6 KiB]                                                / [0 files][    0.0 B/838.4 KiB]                                                Copying file:///content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow-test-10.csv [Content-Type=text/csv]...
/ [0 files][    0.0 B/  1.2 MiB]                                                Copying file:///content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow-test-100.csv [Content-Type=text/csv]...
Copying file:///content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow-test-40.csv [Content-Type=text/csv]...
/ [0 files][    0.0 B/  1.6 MiB]                                           

In [12]:
# Ensure that we have TensorFlow 1.13.1 installed.
!pip3 freeze | grep tensorflow==1.13.1 || pip3 install tensorflow==1.13.1

Collecting tensorflow==1.13.1
[?25l  Downloading https://files.pythonhosted.org/packages/ca/f2/0931c194bb98398017d52c94ee30e5e1a4082ab6af76e204856ff1fdb33e/tensorflow-1.13.1-cp35-cp35m-manylinux1_x86_64.whl (92.5MB)
[K    100% |████████████████████████████████| 92.5MB 322kB/s eta 0:00:01  0% |▎                               | 737kB 36.4MB/s eta 0:00:03    4% |█▎                              | 3.8MB 35.3MB/s eta 0:00:03    7% |██▍                             | 6.7MB 34.9MB/s eta 0:00:03    21% |██████▉                         | 19.9MB 27.7MB/s eta 0:00:03    28% |█████████                       | 25.9MB 13.9MB/s eta 0:00:05    29% |█████████▌                      | 27.5MB 12.8MB/s eta 0:00:06    36% |███████████▋                    | 33.6MB 30.0MB/s eta 0:00:02    37% |████████████                    | 34.9MB 29.6MB/s eta 0:00:02    39% |████████████▌                   | 36.2MB 29.7MB/s eta 0:00:02    40% |█████████████                   | 37.6MB 29.8MB/s eta 0:00:02    47% |█████████

#### Monitor using Tensorboard

In [13]:
from google.datalab.ml import TensorBoard
TensorBoard().start('./'+ TRAINING_DIR)

  from ._conv import register_converters as _register_converters


6600

In [24]:
%%bash
# Setup python so it sees the task module which controls the model.py
export PYTHONPATH=${PYTHONPATH}:${PWD}/${MODEL_NAME}
# Currently set for python 2.  To run with python 3 
#    1.  Replace 'python' with 'python3' in the following command
#    2.  Edit trainer/task.py to reflect proper module import method 
python -m trainer.task \
   --train_data_paths=${PWD} \
   --eval_data_paths=${PWD}  \
   --output_dir=${PWD}/${TRAINING_DIR} \
   --train_steps=10 --job-dir=./tmp

1.8.0


  from ._conv import register_converters as _register_converters
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f355f783950>, '_model_dir': '/content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow_trained', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': 1000, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_global_id_in_cluster': 0, '_save_summary_steps': 100, '_num_ps_replicas': 0}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 300 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Skipping training sin

In [25]:
%%bash
ls $PWD/${TRAINING_DIR}/export/exporter/

1558458773
1558458937
1558459037


In [28]:
%%writefile ./test.json
{"answer_count": 2, "comment_count": 2, "favorite_count": 1, "score": 2, "view_count": 2, "days_posted": 10}

Overwriting ./test.json


In [29]:
%%bash
# This model dir is the model exported after training and is used for prediction
#
model_dir=$(ls ${PWD}/${TRAINING_DIR}/export/exporter | tail -1)
# predict using the trained model
gcloud ml-engine local predict  \
    --model-dir=${PWD}/${TRAINING_DIR}/export/exporter/${model_dir} \
    --json-instances=./test.json

CLASS_IDS  CLASSES  LOGISTIC             LOGITS               PROBABILITIES
[1]        [u'1']   [0.919774055480957]  [2.439281702041626]  [0.08022589981555939, 0.919774055480957]


  from ._conv import register_converters as _register_converters
2019-05-21 17:25:52.837396: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA



#### Stop Tensorboard
The training directory will be deleted.  Stop the exising tensorboard before removing the directory its using.

In [30]:
pids_df = TensorBoard.list()
if not pids_df.empty:
    for pid in pids_df['pid']:
        TensorBoard().stop(pid)
        print('Stopped TensorBoard with pid {}'.format(pid))

Stopped TensorBoard with pid 6600


#### Clean model training dir/output dir

In [31]:
%%bash
# This is so that the trained model is started fresh each time. However, this needs to be done before 
# tensorboard is started
rm -rf $PWD/${TRAINING_DIR}

#### Restart tensorboard for monitoring

In [32]:
TensorBoard().start('./'+ TRAINING_DIR)

9273

## Running locally using gcloud

In [33]:
%%bash
# Use Cloud Machine Learning Engine to train the model in local file system
gcloud ml-engine local train \
   --module-name=trainer.task \
   --package-path=${PWD}/${MODEL_NAME}/trainer \
   -- \
   --train_data_paths=${PWD} \
   --eval_data_paths=${PWD}  \
   --train_steps=1000 \
   --output_dir=${PWD}/${TRAINING_DIR} 

1.8.0
[512]


  from ._conv import register_converters as _register_converters
INFO:tensorflow:TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {}, u'job': {u'args': [u'--train_data_paths=/content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441', u'--eval_data_paths=/content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441', u'--train_steps=1000', u'--output_dir=/content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow_trained'], u'job_name': u'trainer.task'}, u'task': {}}
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f91285e71d0>, '_model_dir': '/content/datalab/notebooks/qwiklabs-gcp-1a8cf323172d8441/stackoverflow_trained', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpo

Use TensorBoard to examine results.  When I ran it (due to random seeds, your results will be different), the ```average_loss``` (Mean Squared Error) on the evaluation dataset was 187, meaning that the RMSE was around 13.

### Stop Tensorboard

In [34]:
pids_df = TensorBoard.list()
if not pids_df.empty:
    for pid in pids_df['pid']:
        TensorBoard().stop(pid)
        print('Stopped TensorBoard with pid {}'.format(pid))

Stopped TensorBoard with pid 9273


If the above step (to stop TensorBoard) appears stalled, just move on to the next step. You don't need to wait for it to return.

In [35]:
%%bash
ls $PWD/${TRAINING_DIR}

checkpoint
eval
events.out.tfevents.1558459686.9a7f7741b2b1
export
graph.pbtxt
model.ckpt-1000.data-00000-of-00001
model.ckpt-1000.index
model.ckpt-1000.meta
model.ckpt-1.data-00000-of-00001
model.ckpt-1.index
model.ckpt-1.meta


## Submit training job using gcloud

First copy the training data to the cloud.  Then, launch a training job.

After you submit the job, go to the cloud console (http://console.cloud.google.com) and select <b>Machine Learning | Jobs</b> to monitor progress.  

<b>Note:</b> Don't be concerned if the notebook stalls (with a blue progress bar) or returns with an error about being unable to refresh auth tokens. This is a long-lived Cloud job and work is going on in the cloud.  Use the Cloud Console link (above) to monitor the job.

In [None]:
%%bash
# Clear Cloud Storage bucket and copy the CSV files to Cloud Storage bucket
echo $BUCKET
gsutil -m rm -rf gs://${BUCKET}/${MODEL_NAME}/dataset/
gsutil -m cp ${PWD}/*.csv gs://${BUCKET}/${MODEL_NAME}/dataset/

In [36]:
%%bash
OUTDIR=gs://${BUCKET}/${MODEL_NAME}/dataset/${TRAINING_DIR}
JOBNAME=${MODEL_NAME}_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
# Clear the Cloud Storage Bucket used for the training job
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${PWD}/${MODEL_NAME}/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=BASIC \
   --runtime-version=$TFVERSION \
   -- \
   --train_data_paths="gs://${BUCKET}/${MODEL_NAME}/dataset" \
   --eval_data_paths="gs://${BUCKET}/${MODEL_NAME}/dataset"  \
   --output_dir=$OUTDIR \
   --train_steps=10000

gs://qwiklabs-gcp-1a8cf323172d8441/stackoverflow/dataset/stackoverflow_trained us-central1 stackoverflow_190521_173250


CommandException: 1 files/objects could not be removed.
ERROR: (gcloud.ml-engine.jobs.submit.training) INVALID_ARGUMENT: Field: runtime_version Error: The specified runtime version '1.13.1' with the Python version '' is not supported or is deprecated.  Please specify a different runtime version. See https://cloud.google.com/ml-engine/docs/runtime-version-list for a list of supported versions
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: The specified runtime version '1.13.1' with the Python version ''
      is not supported or is deprecated.  Please specify a different runtime version.
      See https://cloud.google.com/ml-engine/docs/runtime-version-list for a list
      of supported versions
    field: runtime_version


Don't be concerned if the notebook appears stalled (with a blue progress bar) or returns with an error about being unable to refresh auth tokens. This is a long-lived Cloud job and work is going on in the cloud. 

<b>Use the Cloud Console link to monitor the job and do NOT proceed until the job is done.</b>

## Deploy model

Find out the actual name of the subdirectory where the model is stored and use it to deploy the model.  Deploying model will take up to <b>5 minutes</b>.

In [None]:
%%bash
gsutil ls gs://${BUCKET}/${MODEL_NAME}/smallinput/${TRAINING_DIR}/export/exporter

#### Deploy model : step 1 - remove version info 
Before an existing cloud model can be removed, it must have any version info removed.  If an existing model does not exist, this command will generate an error but that is ok.

In [None]:
%%bash
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/${MODEL_NAME}/smallinput/${TRAINING_DIR}/export/exporter | tail -1)

echo "MODEL_LOCATION = ${MODEL_LOCATION}"

gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}

#### Deploy model: step 2 - remove existing model
Now that the version info is removed from an existing model, the actual model can be removed.  If an existing model is not deployed, this command will generate an error but that is ok.  It just means the model with the given name is not deployed.

In [None]:
%%bash
gcloud ml-engine models delete ${MODEL_NAME}

#### Deploy model: step 3 - deploy new model

In [None]:
%%bash
gcloud ml-engine models create ${MODEL_NAME} --regions $REGION

#### Deploy model: step 4 - add version info to the new model

In [None]:
%%bash
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/${MODEL_NAME}/smallinput/${TRAINING_DIR}/export/exporter | tail -1)

echo "MODEL_LOCATION = ${MODEL_LOCATION}"

gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version $TFVERSION

## Prediction

In [None]:
%%bash
gcloud ml-engine predict --model=${MODEL_NAME} --version=${MODEL_VERSION} --json-instances=./test.json

In [None]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

credentials = GoogleCredentials.get_application_default()
api = discovery.build('ml', 'v1', credentials=credentials,
            discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')

request_data = {'instances':
  [
      {
        'pickuplon': -73.885262,
        'pickuplat': 40.773008,
        'dropofflon': -73.987232,
        'dropofflat': 40.732403,
        'passengers': 2,
      }
  ]
}

parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, MODEL_NAME, MODEL_VERSION)
response = api.projects().predict(body=request_data, name=parent).execute()
print "response={0}".format(response)

## Train on larger dataset

I have already followed the steps below and the files are already available. <b> You don't need to do the steps in this comment. </b> In the next chapter (on feature engineering), we will avoid all this manual processing by using Cloud Dataflow.

Go to http://bigquery.cloud.google.com/ and type the query:
<pre>
SELECT
  (tolls_amount + fare_amount) AS fare_amount,
  pickup_longitude AS pickuplon,
  pickup_latitude AS pickuplat,
  dropoff_longitude AS dropofflon,
  dropoff_latitude AS dropofflat,
  passenger_count*1.0 AS passengers,
  'nokeyindata' AS key
FROM
  [nyc-tlc:yellow.trips]
WHERE
  trip_distance > 0
  AND fare_amount >= 2.5
  AND pickup_longitude > -78
  AND pickup_longitude < -70
  AND dropoff_longitude > -78
  AND dropoff_longitude < -70
  AND pickup_latitude > 37
  AND pickup_latitude < 45
  AND dropoff_latitude > 37
  AND dropoff_latitude < 45
  AND passenger_count > 0
  AND ABS(HASH(pickup_datetime)) % 1000 == 1
</pre>

Note that this is now 1,000,000 rows (i.e. 100x the original dataset).  Export this to CSV using the following steps (Note that <b>I have already done this and made the resulting GCS data publicly available</b>, so you don't need to do it.):
<ol>
<li> Click on the "Save As Table" button and note down the name of the dataset and table.
<li> On the BigQuery console, find the newly exported table in the left-hand-side menu, and click on the name.
<li> Click on "Export Table"
<li> Supply your bucket name and give it the name train.csv (for example: gs://cloud-training-demos-ml/taxifare/ch3/train.csv). Note down what this is.  Wait for the job to finish (look at the "Job History" on the left-hand-side menu)
<li> In the query above, change the final "== 1" to "== 2" and export this to Cloud Storage as valid.csv (e.g.  gs://cloud-training-demos-ml/taxifare/ch3/valid.csv)
<li> Download the two files, remove the header line and upload it back to GCS.
</ol>

<p/>
<p/>

## Run Cloud training on 1-million row dataset

This took 60 minutes and uses as input 1-million rows.  The model is exactly the same as above. The only changes are to the input (to use the larger dataset) and to the Cloud MLE tier (to use STANDARD_1 instead of BASIC -- STANDARD_1 is approximately 10x more powerful than BASIC).  At the end of the training the loss was 32, but the RMSE (calculated on the validation dataset) was stubbornly at 9.03. So, simply adding more data doesn't help.

In [None]:
%%bash

XXXXX  this takes 60 minutes. if you are sure you want to run it, then remove this line.

OUTDIR=gs://${BUCKET}/${MODEL_NAME}/${TRAINING_DIR}
JOBNAME=${MODEL_NAME}_$(date -u +%y%m%d_%H%M%S)
CRS_BUCKET=cloud-training-demos # use the already exported data
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${PWD}/${MODEL_NAME}/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=STANDARD_1 \
   --runtime-version=$TFVERSION \
   -- \
   --train_data_paths="gs://${CRS_BUCKET}/${MODEL_NAME}/ch3/train.csv" \
   --eval_data_paths="gs://${CRS_BUCKET}/${MODEL_NAME}/ch3/valid.csv"  \
   --output_dir=$OUTDIR \
   --train_steps=100000   

## Challenge Exercise

Modify your solution to the challenge exercise in d_trainandevaluate.ipynb appropriately. Make sure that you implement training and deployment. Increase the size of your dataset by 10x since you are running on the cloud. Does your accuracy improve?

### Clean-up

#### Delete Model : step 1 - remove version info 
Before an existing cloud model can be removed, it must have any version info removed.  

In [None]:
%%bash
gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}

#### Delete model: step 2 - remove existing model
Now that the version info is removed from an existing model, the actual model can be removed.  

In [None]:
%%bash
gcloud ml-engine models delete ${MODEL_NAME}

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License