# Introducing Cloud Machine Learning Engine (CMLE)
**Learning Objectives:**
  - Learn how to make code compatible with CMLE
  - Train your model using cloud infrastructure via CMLE
  - Deploy your model behind a production grade REST API using CMLE

In this notebook we'll make the jump from training and predicting locally to doing both in the cloud. We'll take advantage of Google Cloud's [Cloud Machine Learning Engine](https://cloud.google.com/ml-engine/).

CMLE is a managed service that lets us train and deploy ML models without having to provision or maintain servers. The managed service handles the infrastructure for us.

## 1) Make Code Compatible with CMLE
In order to make our code compatible with CMLE we need to make the following changes:

1. Upload data to Google Cloud Storage 
2. Move code into a Python package
3. Modify code to read data from and write checkpoint files to GCS 

### 1.1 Upload Data to Google Cloud Storage (GCS)

Cloud services don't have access to our local files, so we need to upload them to a location the Cloud servers can read from. In this case we'll use GCS.

Specify your project name and bucket name in the cell below.

In [2]:
PROJECT='cloud-training-demos' # CHANGE TO YOUR PROJECT
BUCKET = PROJECT # optionally change, a GCS bucket with this name will be created 
REGION = 'us-central1' # optionally change, see https://cloud.google.com/ml-engine/docs/tensorflow/regions
TFVERSION = '1.12' # TF version for CMLE to use

Jupyter allows the substitution of Python variables into bash commands when using the `!<cmd>` format.
This is also possible with the `%%bash` magic, but it requires an [additional parameter](https://stackoverflow.com/questions/19579546/can-i-access-python-variables-within-a-bash-or-script-ipython-notebook-c).
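
A different way to share values with `%%bash` cells, sketched here as an assumption rather than the approach the linked answer describes, is to export the Python variables as environment variables. Any process started by the notebook kernel inherits them, so a `%%bash` cell could read `$BUCKET` directly. The values below are the placeholders from the configuration cell, and the child process is a Python one only so that the example runs without a shell.

```python
import os
import subprocess
import sys

# Placeholder values mirroring the configuration cell above.
PROJECT = 'cloud-training-demos'
BUCKET = PROJECT
REGION = 'us-central1'

# Export the values so that child processes can read them.
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION

# A child process inherits the environment of the kernel that launched it.
child_code = "import os; print('gs://' + os.environ['BUCKET'])"
result = subprocess.run(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
bucket_url = result.stdout.strip()
print(bucket_url)
```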

In [None]:
!gcloud config set project {PROJECT}
!gsutil mb -l {REGION} gs://{BUCKET}
!gsutil -m cp *.csv gs://{BUCKET}/taxifare/smallinput/

### 1.2 Move code into a Python package

When you execute a CMLE training job, the service zips up your code and ships it to the Cloud so it can be run on Cloud infrastructure. In order to do this CMLE requires your code to be a Python package.

A Python package is simply a collection of one or more `.py` files along with an `__init__.py` file to identify the containing directory as a package. The `__init__.py` sometimes contains initialization code but for our purposes an empty file suffices.
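
To make the idea concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports a module from it. The names `demo_pkg` and `describe` are hypothetical and are not part of the taxi fare code.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package in a temporary directory (hypothetical names).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.makedirs(pkg_dir)

# An empty __init__.py marks the directory as a regular package.
open(os.path.join(pkg_dir, '__init__.py'), 'w').close()

# A module inside the package.
with open(os.path.join(pkg_dir, 'model.py'), 'w') as f:
    f.write("def describe():\n    return 'hello from demo_pkg.model'\n")

# Make the parent directory importable, then import by dotted name.
sys.path.insert(0, root)
importlib.invalidate_caches()
demo_model = importlib.import_module('demo_pkg.model')
message = demo_model.describe()
print(message)
```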

#### 1.2.1 Create Package Directory and \_\_init\_\_.py

The bash command `touch` creates an empty file in the specified location.

In [3]:
%%bash
mkdir -p taxifaremodel  # -p avoids an error if the directory already exists
touch taxifaremodel/__init__.py


#### 1.2.2 Paste existing code into model.py

A Python package requires our code to be in a `.py` file, as opposed to notebook cells. So we simply copy and paste our existing code from the previous notebook into a single file.

The `%%writefile` magic writes the contents of its cell to disk with the specified name.

In [4]:
%%writefile taxifaremodel/model.py
import tensorflow as tf
import shutil
print(tf.__version__)

#1. Train and Evaluate Input Functions
CSV_COLUMN_NAMES = ['fare_amount','dayofweek','hourofday','pickuplon','pickuplat','dropofflon','dropofflat','passengers']
CSV_DEFAULTS = [[0.0],[1],[0],[-74.0], [40.0], [-74.0], [40.7], [1]]

def read_dataset(csv_path):
    def _parse_row(row):
        # Decode the CSV row into list of TF tensors
        fields = tf.decode_csv(row, record_defaults=CSV_DEFAULTS)

        # Pack the result into a dictionary
        features = dict(zip(CSV_COLUMN_NAMES, fields))
        
        # Separate the label from the features
        label = features.pop('fare_amount') # remove label from features and store

        return features, label
    
    # Create a dataset containing the text lines.
    dataset = tf.data.Dataset.list_files(csv_path) # (i.e. data_file_*.csv)
    dataset = dataset.flat_map(lambda filename:tf.data.TextLineDataset(filename).skip(1))

    # Parse each CSV row into correct (features,label) format for Estimator API
    dataset = dataset.map(_parse_row)
    
    return dataset

def train_input_fn(csv_path, batch_size=128):
    #1. Convert CSV into tf.data.Dataset  with (features,label) format
    dataset = read_dataset(csv_path)
      
    #2. Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
   
    return dataset

def eval_input_fn(csv_path, batch_size=128):
    #1. Convert CSV into tf.data.Dataset  with (features,label) format
    dataset = read_dataset(csv_path)

    #2.Batch the examples.
    dataset = dataset.batch(batch_size)
   
    return dataset
  
#2. Feature Columns
FEATURE_NAMES = CSV_COLUMN_NAMES[1:] # all but first column
feature_cols = [tf.feature_column.numeric_column(k) for k in FEATURE_NAMES]

#3. Serving Input Receiver Function
def serving_input_receiver_fn():
    receiver_tensors = {
        'dayofweek' : tf.placeholder(tf.int32, shape=[None]), # shape is vector to allow batch of requests
        'hourofday' : tf.placeholder(tf.int32, shape=[None]),
        'pickuplon' : tf.placeholder(tf.float32, shape=[None]), 
        'pickuplat' : tf.placeholder(tf.float32, shape=[None]),
        'dropofflat' : tf.placeholder(tf.float32, shape=[None]),
        'dropofflon' : tf.placeholder(tf.float32, shape=[None]),
        'passengers' : tf.placeholder(tf.int32, shape=[None]),
    }
    # Note:
    # You would transform data here from the receiver format to the format expected
    # by your model, although in this case no transformation is needed.
    
    features = receiver_tensors # 'features' is what is passed on to the model
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
  
#4. Train and Evaluate
def train_and_evaluate(params):
  OUTDIR = params['output_dir']

  model = tf.estimator.DNNRegressor(
    hidden_units = [10,10], # specify neural architecture
    feature_columns = feature_cols, 
    model_dir = OUTDIR,
    config = tf.estimator.RunConfig(
          tf_random_seed=1, # for reproducibility
          save_checkpoints_steps=max(10,params['train_steps']//10) # checkpoint every N steps
    ) 
  )
  
  # Add custom evaluation metric
  def my_rmse(labels, predictions):
    pred_values = tf.squeeze(predictions['predictions'],axis=-1)
    return {'rmse': tf.metrics.root_mean_squared_error(labels, pred_values)}
  model = tf.contrib.estimator.add_metrics(model, my_rmse)  

  train_spec=tf.estimator.TrainSpec(
                     input_fn = lambda:train_input_fn(params['train_data_path']),
                     max_steps = params['train_steps'])

  exporter = tf.estimator.FinalExporter('exporter', serving_input_receiver_fn) # export SavedModel once at the end of training
  # Note: alternatively use tf.estimator.BestExporter to export only when an evaluation beats the best result so far


  eval_spec=tf.estimator.EvalSpec(
                     input_fn=lambda:eval_input_fn(params['eval_data_path']),
                     steps = None,
                     start_delay_secs=1, # wait at least N seconds before first evaluation (default 120)
                     throttle_secs=1, # wait at least N seconds before each subsequent evaluation (default 600)
                     exporters = exporter) # export SavedModel once at the end of training

  # Note:
  # We set start_delay_secs and throttle_secs to 1 because we want to evaluate after every checkpoint.
  # As long as checkpoints are > 1 sec apart this ensures the throttling never kicks in.

  tf.logging.set_verbosity(tf.logging.INFO) # so loss is printed during training
  shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time

  tf.estimator.train_and_evaluate(model, train_spec, eval_spec)

Overwriting taxifaremodel/model.py


### 1.3 Modify code to read data from and write checkpoint files to GCS 

If you look closely above, you'll notice two changes to the code:

1. The input function now supports reading a list of files matching a file name pattern instead of just a single CSV
  - This is useful because large datasets tend to exist in shards.
2. The train and evaluate portion is wrapped in a function that takes a parameter dictionary as an argument.
  - This is useful because the output directory, data paths and number of train steps will be different depending on whether we're training locally or in the cloud. Parametrizing allows us to use the same code for both.
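
The sharded-input idea can be tried without TensorFlow. The sketch below is a plain-Python analogy to `read_dataset()`, not the code CMLE runs: it expands a file name pattern with `glob`, skips each shard's header line, and splits the label from the features. The shard file names and row values are made up for illustration.

```python
import csv
import glob
import os
import tempfile

CSV_COLUMN_NAMES = ['fare_amount', 'dayofweek', 'hourofday', 'pickuplon',
                    'pickuplat', 'dropofflon', 'dropofflat', 'passengers']

def read_rows(csv_pattern):
    """Yield (features, label) pairs from every file matching csv_pattern."""
    for path in sorted(glob.glob(csv_pattern)):
        with open(path, newline='') as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header line, like .skip(1)
            for row in reader:
                features = dict(zip(CSV_COLUMN_NAMES, row))
                label = float(features.pop('fare_amount'))
                yield features, label

# Write two tiny made-up shards to a temporary directory.
tmp = tempfile.mkdtemp()
header = ','.join(CSV_COLUMN_NAMES)
shards = {
    'taxi-train-0.csv': '12.0,1,0,-73.9,40.7,-74.0,40.7,2',
    'taxi-train-1.csv': '7.5,3,18,-73.8,40.8,-73.9,40.7,1',
}
for name, line in shards.items():
    with open(os.path.join(tmp, name), 'w') as f:
        f.write(header + '\n' + line + '\n')

# One pattern picks up both shards.
rows = list(read_rows(os.path.join(tmp, 'taxi-train-*.csv')))
labels = [label for _, label in rows]
print(labels)
```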

We specify these parameters at run time via the command line, which means we need to add code to parse command line parameters and invoke `train_and_evaluate()` with those params. This is the job of the `task.py` file.

Exposing parameters to the command line also allows us to use CMLE's automatic hyperparameter tuning feature which we'll cover in a future lesson.

In [5]:
%%writefile taxifaremodel/task.py
import argparse
import json
import os

#import model # for python2
from . import model # for python3

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--train_data_path',
        help = 'GCS or local path to training data',
        required = True
    )
    parser.add_argument(
        '--train_steps',
        help = 'Steps to run the training job for (default: 1000)',
        type = int,
        default = 1000
    )
    parser.add_argument(
        '--eval_data_path',
        help = 'GCS or local path to evaluation data',
        required = True
    )
    parser.add_argument(
        '--output_dir',
        help = 'GCS or local path to write checkpoints and exported models',
        required = True
    )
    parser.add_argument(
        '--job-dir',
        help='This is not used by our model, but it is required by gcloud',
    )
    args = vars(parser.parse_args())

    # Run the training job
    model.train_and_evaluate(args)

Overwriting taxifaremodel/task.py
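
The parsing step in `task.py` can be tried in isolation. This sketch repeats the same argument definitions (with help strings omitted) and parses an explicit list instead of `sys.argv`, so it runs anywhere. Note that `argparse` stores `--job-dir` under the key `job_dir`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--train_data_path', required=True)
parser.add_argument('--eval_data_path', required=True)
parser.add_argument('--output_dir', required=True)
parser.add_argument('--train_steps', type=int, default=1000)
parser.add_argument('--job-dir')  # stored under the key 'job_dir'

# Parse an explicit argument list rather than the real command line.
params = vars(parser.parse_args([
    '--train_data_path=taxi-train.csv',
    '--eval_data_path=taxi-valid.csv',
    '--output_dir=taxi_trained',
    '--train_steps=5',
]))
print(params)
```

The resulting dictionary is what `model.train_and_evaluate()` receives, which is why the function reads `params['output_dir']` and `params['train_steps']`.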


## 2) Train Using CMLE (Local)

CMLE comes with a local test tool ([`gcloud ml-engine local train`](https://cloud.google.com/sdk/gcloud/reference/ml-engine/local/train)) to ensure we've packaged our code correctly. It's best to first run that for a few steps before trying a Cloud job.

The arguments before `-- \` are for CMLE:
- package-path: specifies the location of the Python package
- module-name: specifies which `.py` file should be run within the package. `task.py` is our entry point, so we specify that

The arguments after `-- \` are sent to our `task.py`.
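
The `--` convention can be illustrated with a few lines of Python. This is only a sketch of the idea, not how `gcloud` is implemented, and `split_at_separator` is a hypothetical helper.

```python
def split_at_separator(argv):
    """Split argv at the first bare '--' into (tool_args, task_args)."""
    if '--' in argv:
        i = argv.index('--')
        return argv[:i], argv[i + 1:]
    return list(argv), []

argv = [
    '--package-path=taxifaremodel',
    '--module-name=taxifaremodel.task',
    '--',
    '--train_data_path=taxi-train.csv',
    '--eval_data_path=taxi-valid.csv',
    '--train_steps=5',
    '--output_dir=taxi_trained',
]

tool_args, task_args = split_at_separator(argv)
print('for CMLE:   ', tool_args)
print('for task.py:', task_args)
```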

In [6]:
%%time
!gcloud ml-engine local train \
--package-path=taxifaremodel \
--module-name=taxifaremodel.task \
-- \
--train_data_path=taxi-train.csv \
--eval_data_path=taxi-valid.csv  \
--train_steps=5 \
--output_dir=taxi_trained 

  from ._conv import register_converters as _register_converters
1.12.0
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 10 or save_checkpoints_secs None.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2019-01-06 16:24:05.525939: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into taxi_trained/model.ckpt.
INFO:tensorflow:loss = 51218.3

## 3) Train using CMLE (Cloud)

To submit to the Cloud we use [`gcloud ml-engine jobs submit training [jobname]`](https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training) and simply specify some additional parameters for CMLE:
- jobname: A unique identifier for the Cloud job. We usually append system time to ensure uniqueness
- job-dir: A GCS location to upload the Python package to
- runtime-version: Version of TF to use. Defaults to 1.0 if not specified
- python-version: Version of Python to use. Defaults to 2.7 if not specified
- region: Cloud region to train in. See [here](https://cloud.google.com/ml-engine/docs/tensorflow/regions) for supported CMLE regions

Below the `-- \` note how we've changed our `task.py` args to be GCS locations
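
The submit command builds the job name in bash with `date -u`. If you would rather build it in Python, a minimal sketch follows; `make_job_name` is a hypothetical helper that uses the same UTC timestamp format.

```python
import re
from datetime import datetime, timezone

def make_job_name(prefix='taxifare'):
    """Return prefix plus a UTC timestamp, e.g. taxifare_YYMMDD_HHMMSS."""
    stamp = datetime.now(timezone.utc).strftime('%y%m%d_%H%M%S')
    return '{}_{}'.format(prefix, stamp)

job_name = make_job_name()
print(job_name)

# The name contains only letters, digits, and underscores.
is_valid = re.match(r'^[A-Za-z][A-Za-z0-9_]*$', job_name) is not None
```

Because the timestamp has one-second resolution, two jobs submitted within the same second would collide.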

In [None]:
from google.datalab.ml import TensorBoard
OUTDIR='gs://{}/taxifare/trained_small'.format(BUCKET)
TensorBoard().start(OUTDIR) # TensorBoard supports gs:// URLs

In [None]:
!gsutil -m rm -rf {OUTDIR} # start fresh each time
!gcloud ml-engine jobs submit training taxifare_$(date -u +%y%m%d_%H%M%S) \
   --package-path=taxifaremodel \
   --module-name=taxifaremodel.task \
   --job-dir=gs://{BUCKET}/taxifare \
   --python-version=3.5 \
   --runtime-version={TFVERSION} \
   --region={REGION} \
   -- \
   --train_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-train.csv \
   --eval_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-valid.csv  \
   --train_steps=1000 \
   --output_dir={OUTDIR}

You can track your job and view logs using the [Cloud Console](https://console.cloud.google.com/mlengine/jobs). It will take 5-10 minutes to complete. **Wait until the job finishes before moving on.**

## 4) Deploy model

Now let's take our exported SavedModel and deploy it behind a REST API. To do so we'll use CMLE's managed TF Serving feature which auto-scales based on load.

In [4]:
!gsutil ls gs://{BUCKET}/taxifare/trained_small/export/exporter

gs://vijays-sandbox/taxifare/trained_small/export/exporter/
gs://vijays-sandbox/taxifare/trained_small/export/exporter/1546792089/


CMLE uses a model versioning system. First you create a model folder, and within the folder you create versions of the model. 

Note: You will see an error below if the model folder already exists. It is safe to ignore.
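
The version-creation cell picks the export directory with `gsutil ls ... | tail -1`, which relies on the listing order. A more explicit alternative, sketched below under the assumption that export directories are named with numeric timestamps as in the listing above, is to select the largest timestamp. `latest_export` is a hypothetical helper.

```python
def latest_export(paths):
    """Return the export directory whose last path component is the largest number."""
    candidates = []
    for path in paths:
        name = path.rstrip('/').rsplit('/', 1)[-1]
        if name.isdigit():  # skips the parent 'exporter/' entry
            candidates.append((int(name), path))
    if not candidates:
        raise ValueError('no timestamped export directories found')
    return max(candidates)[1]

# Listing copied from the gsutil output above.
listing = [
    'gs://vijays-sandbox/taxifare/trained_small/export/exporter/',
    'gs://vijays-sandbox/taxifare/trained_small/export/exporter/1546792089/',
]
origin = latest_export(listing)
print(origin)
```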

In [None]:
VERSION='v1'
!gcloud ml-engine models create taxifare --regions {REGION}
!gcloud ml-engine versions delete {VERSION} --model taxifare --quiet
!gcloud ml-engine versions create {VERSION} --model taxifare \
  --origin $(gsutil ls gs://{BUCKET}/taxifare/trained_small/export/exporter | tail -1) \
  --python-version=3.5 \
  --runtime-version {TFVERSION}

## 5) Online Prediction

Now that we have deployed our model behind a production grade REST API, we can invoke it remotely. 

We could invoke it directly by calling the REST API with an HTTP POST request ([reference docs](https://cloud.google.com/ml-engine/reference/rest/v1/projects/predict)). However, CMLE also provides an easy way to invoke it from the command line.
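
For reference, a raw request could be assembled roughly as follows. This sketch only builds the request and does not send it. The access token is a placeholder (a real call needs a valid OAuth 2.0 token), `build_predict_request` is a hypothetical helper, and the project name is the placeholder used earlier in this notebook.

```python
import json
import urllib.request

def build_predict_request(project, model, instances, access_token, version=None):
    """Build (but do not send) an HTTP POST request for online prediction."""
    name = 'projects/{}/models/{}'.format(project, model)
    if version:
        name += '/versions/{}'.format(version)
    url = 'https://ml.googleapis.com/v1/{}:predict'.format(name)
    body = json.dumps({'instances': instances}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        method='POST',
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'Bearer {}'.format(access_token),
        },
    )

instance = {'dayofweek': 1, 'hourofday': 0,
            'pickuplon': -73.885262, 'pickuplat': 40.773008,
            'dropofflon': -73.987232, 'dropofflat': 40.732403,
            'passengers': 2}

request = build_predict_request(
    'cloud-training-demos', 'taxifare', [instance],
    access_token='PLACEHOLDER_TOKEN')  # placeholder, not a real credential
print(request.get_method(), request.full_url)
```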

### 5.1 Invoke Prediction REST API via Command Line
First we write our prediction requests to a file in JSON format.

In [6]:
%%writefile ./test.json
{"dayofweek":1,"hourofday":0,"pickuplon": -73.885262,"pickuplat": 40.773008,"dropofflon": -73.987232,"dropofflat": 40.732403,"passengers": 2}

Overwriting ./test.json
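
The file above holds a single instance. The `--json-instances` format is newline-delimited JSON, with one instance per line, so several requests can share one file. A minimal sketch for writing such a file from Python follows. The second instance and the file name `test_batch.json` are made up for illustration.

```python
import json
import os
import tempfile

instances = [
    {'dayofweek': 1, 'hourofday': 0,
     'pickuplon': -73.885262, 'pickuplat': 40.773008,
     'dropofflon': -73.987232, 'dropofflat': 40.732403, 'passengers': 2},
    # Made-up second instance, for illustration only.
    {'dayofweek': 5, 'hourofday': 18,
     'pickuplon': -73.99, 'pickuplat': 40.75,
     'dropofflon': -73.95, 'dropofflat': 40.78, 'passengers': 1},
]

# Write one JSON object per line.
path = os.path.join(tempfile.mkdtemp(), 'test_batch.json')
with open(path, 'w') as f:
    for instance in instances:
        f.write(json.dumps(instance) + '\n')

# Read the file back to confirm each line parses on its own.
with open(path) as f:
    parsed = [json.loads(line) for line in f if line.strip()]
print(len(parsed), 'instances written')
```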


Then we use [`gcloud ml-engine predict`](https://cloud.google.com/sdk/gcloud/reference/ml-engine/predict) and specify the model name and location of the json file. Since we don't explicitly specify `--version`, the default model version will be used. 

Since we only have one version, it is already the default. If we had multiple model versions, we could designate the default using [`gcloud ml-engine versions set-default`](https://cloud.google.com/sdk/gcloud/reference/ml-engine/versions/set-default) or the [Cloud Console](https://console.cloud.google.com/mlengine/models).

In [7]:
!gcloud ml-engine predict --model=taxifare --json-instances=./test.json

PREDICTIONS
[8.362519264221191]


### 5.2 Invoke Prediction REST API via Python

In [8]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

credentials = GoogleCredentials.get_application_default()
api = discovery.build('ml', 'v1', credentials=credentials,
            discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')

request_data = {'instances':
  [
      {
        'dayofweek':1,
        'hourofday':0,
        'pickuplon': -73.885262,
        'pickuplat': 40.773008,
        'dropofflon': -73.987232,
        'dropofflat': 40.732403,
        'passengers': 2,
      }
  ]
}

parent = 'projects/{}/models/taxifare'.format(PROJECT) # use default version
#parent = 'projects/{}/models/taxifare/versions/{}'.format(PROJECT,VERSION) # specify version

response = api.projects().predict(body=request_data, name=parent).execute()
print("response={0}".format(response))

response={'predictions': [{'predictions': [8.362519264221191]}]}
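
The response nests each fare inside a per-instance dictionary. A small sketch for unpacking it is shown below, using the response printed above. The `'error'` check is an assumption about how a failed call is reported and should be verified against the reference docs.

```python
# Response copied from the output above.
response = {'predictions': [{'predictions': [8.362519264221191]}]}

def extract_fares(response):
    """Return one predicted fare per instance in the response."""
    if 'error' in response:  # assumption: failures carry an 'error' key
        raise RuntimeError(response['error'])
    return [item['predictions'][0] for item in response['predictions']]

fares = extract_fares(response)
print([round(fare, 2) for fare in fares])
```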


## 6) Cleanup

In [None]:
if len(TensorBoard.list()) > 0:
  # Stop every running TensorBoard instance
  for pid in TensorBoard.list()['pid']:
    TensorBoard().stop(pid)
else:
  print('No TensorBoard instances to stop')

## Challenge Exercise

Modify your solution to the challenge exercise in d_trainandevaluate.ipynb appropriately. Make sure that you implement training and deployment. Increase the size of your dataset by 10x since you are running on the cloud. Does your accuracy improve?

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License