<h1> Deploying a TensorFlow model to Cloud MLE</h1>

In this notebook, you will take a TensorFlow model that was trained using Cloud MLE and turn it into an Application Programming Interface (API) deployed to and hosted by Cloud MLE. This makes your model available to developers worldwide. You will also review how Cloud MLE monitors access to your model API so you can measure its popularity and performance.



---
Before you start, **make sure that you are logged in with your student account**. Otherwise you may incur Google Cloud charges for using this notebook. 

---


Also, remember to uncheck "Reset all runtimes before running" when executing the next cell.

Resetting the runtime will delete any files you may have on your notebook file system.

![](https://i.imgur.com/9dgw0h0.png)


In [0]:
#@markdown Copy-paste your GCP Project ID in the following field:

PROJECT = "" #@param {type: "string"}

#@markdown Next, use Shift-Enter to run this cell and complete authentication.

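try:
  from google.colab import auth
  auth.authenticate_user()
  print("AUTHENTICATED")
except Exception as e:
  # Report the cause (for example, running outside of Colab) instead of hiding it.
  print("FAILED to authenticate: {}".format(e))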

# Modify the following to use a different bucket and/or region
# for Google Cloud Storage and for Cloud MLE.
BUCKET = PROJECT  
REGION = "us-central1"  

# Copy taxi-*.csv files from github if they are missing from the runtime.
!wget --quiet -nc https://github.com/osipov/training-data-analyst/raw/master/bootcamps/serverless_ml/taxi-11k-datasets.zip
!unzip -q -n taxi-11k-datasets.zip 

In [0]:
# for bash
import os
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TF_VERSION'] = '1.12'  # Tensorflow version

In [0]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

<h1>Use TensorBoard to monitor the model located on Google Cloud Storage</h1>

Recall that Cloud MLE saves a trained model as a collection of checkpoint files in a Google Cloud Storage bucket. Open the [Jobs](https://console.cloud.google.com/mlengine/jobs) section of the MLE user interface to confirm that the training job is done or is close to being done. You can't deploy a model as an API until MLE finishes training.

Run the next cell to start TensorBoard and to confirm that the model was trained as expected.

In [0]:
!pip install tensorboard==1.13.0
%load_ext tensorboard.notebook 

In [0]:
OUTDIR="gs://{}/taxifare/11k/taxi_trained".format(BUCKET)

#@markdown Once the TensorBoard comes up, put this cell in focus, click on the vertical ellipsis in the upper right of this cell, and choose view output full screen.
%tensorboard --logdir $OUTDIR

If you look closely and compare the global steps / second metric for Cloud MLE against what you have seen with Colab, you will notice that Cloud MLE is about half as fast as Colab. This is because you were using the BASIC scale tier in Cloud MLE during training. When running in production, it is easy to modify the scale tier and allocate a larger server or even a server cluster to train your model. The custom training clusters available in Cloud MLE can be more powerful than the individual CPUs, GPUs, or even TPUs available for training in Colab.

<h2> Deploy the model </h2>

List the files in the storage bucket subdirectory where the trained model was saved. You will deploy the model from this location.

In [0]:
%%bash
gsutil ls gs://${BUCKET}/taxifare/11k/taxi_trained/export/exporter

Set the environment variables using `os.environ` so that you can reuse them across multiple bash code cells.

In [0]:
os.environ['MODEL_NAME'] = "taxifare"
os.environ['MODEL_VERSION'] = "v1"

Use `gcloud` to deploy the model. This involves creating a deployed model as a model version in Cloud MLE. Keep in mind that deploying a model will take up to <b>5 minutes</b>.

In [0]:
%%bash
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/taxifare/11k/taxi_trained/export/exporter | tail -1)

echo "Run these commands one-by-one (the very first time, you'll create a model and then create a version)"
#gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version ${TF_VERSION}
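
The deployment cell picks the model location by taking the last line of the `gsutil ls` listing. The following is a minimal sketch of the same selection in Python, assuming the export directories are named with numeric timestamps. The helper `latest_export`, the bucket name `my-bucket`, and the timestamp values are hypothetical and are used only for illustration.

```python
import posixpath


def latest_export(paths):
    """Return the path whose final directory name is the largest number.

    Assumes each export directory is named with a numeric timestamp.
    """
    candidates = []
    for path in paths:
        name = posixpath.basename(path.rstrip("/"))
        if name.isdigit():  # skip the parent directory and other entries
            candidates.append((int(name), path))
    if not candidates:
        raise ValueError("no timestamped export directories found")
    return max(candidates)[1]


# Hypothetical listing, shaped like the output of `gsutil ls`.
listing = [
    "gs://my-bucket/taxifare/11k/taxi_trained/export/exporter/",
    "gs://my-bucket/taxifare/11k/taxi_trained/export/exporter/1556000000/",
    "gs://my-bucket/taxifare/11k/taxi_trained/export/exporter/1556003600/",
]
print(latest_export(listing))
```

Comparing the names as integers makes the choice explicit and skips the parent directory entry, whereas `tail -1` depends on the order in which the listing is printed.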

While the previous cell is still running, you can monitor the deployment progress in the [Models](https://console.cloud.google.com/mlengine/models) section of the Cloud MLE dashboard. You will need to click on `taxifare` once it becomes available in the list of the models. After the deployment finishes, you will be able to open the `v1` version of the taxifare model.

<h3>Do not proceed until you can confirm that the last cell finished deploying the model</h3>

<h2> Prediction </h2>

Let's create a local file to test out the deployed model. 

In [0]:
%%writefile ./test.json
{"pickuplon": -73.885262,"pickuplat": 40.773008,"dropofflon": -73.987232,"dropofflat": 40.732403,"passengers": 2}
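
If you want to test more than one ride at a time, the file can also be generated from Python. The sketch below assumes the newline-delimited layout that `--json-instances` reads, with one JSON object per line. The helper `write_instances`, the file name `test_batch.json`, and the second instance are made up for illustration.

```python
import json


def write_instances(path, instances):
    """Write one JSON object per line."""
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")


instances = [
    # The instance from the cell above.
    {"pickuplon": -73.885262, "pickuplat": 40.773008,
     "dropofflon": -73.987232, "dropofflat": 40.732403, "passengers": 2},
    # A made-up variation of the same ride with a different passenger count.
    {"pickuplon": -73.885262, "pickuplat": 40.773008,
     "dropofflon": -73.987232, "dropofflat": 40.732403, "passengers": 1},
]
write_instances("./test_batch.json", instances)
```

To use this file, pass `./test_batch.json` to `--json-instances` in place of `./test.json`.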

Start by getting a prediction from your model using the `gcloud` command:

In [0]:
%%bash
gcloud ml-engine predict --model=${MODEL_NAME} --version=${MODEL_VERSION} --json-instances=./test.json

Once you can see that the API returned a prediction, recall that the `gcloud` command requires the Google Cloud SDK, which is too heavy a dependency for many applications. Instead, you can give application developers the code snippet from the next cell. Using this code, they will be able to use your model API from any Python program. Try it yourself:

In [0]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

credentials = GoogleCredentials.get_application_default()
api = discovery.build('ml', 'v1', credentials=credentials,
            discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')

request_data = {'instances':
  [
      {
        'pickuplon': -73.885262,
        'pickuplat': 40.773008,
        'dropofflon': -73.987232,
        'dropofflat': 40.732403,
        'passengers': 2,
      }
  ]
}

parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, 'taxifare', 'v1')
response = api.projects().predict(body=request_data, name=parent).execute()
print("response={0}".format(response))
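
Application code usually needs the predictions themselves rather than the raw response. The following is a minimal sketch of how the resource name can be built and the response unpacked, assuming the service returns a dictionary with a `predictions` key on success and an `error` key on failure. The helper names and the sample response (including the value 12.3) are hypothetical, and the layout of each prediction depends on how the model was exported.

```python
def model_version_name(project, model, version):
    """Build the resource name passed as `name` to the predict call."""
    return "projects/{}/models/{}/versions/{}".format(project, model, version)


def extract_predictions(response):
    """Return the list of predictions, or raise if the service reported an error."""
    if "error" in response:
        raise RuntimeError("prediction failed: {}".format(response["error"]))
    return response["predictions"]


# Made-up response for illustration only; not actual model output.
sample_response = {"predictions": [{"predictions": [12.3]}]}

print(model_version_name("my-project", "taxifare", "v1"))
print(extract_predictions(sample_response))
```

Raising on the `error` key keeps a failed request from being mistaken for an empty result.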

Now, if you return to the [Cloud MLE](https://console.cloud.google.com/mlengine/models/taxifare/versions/v1) user interface for the `taxifare/v1`, you should be able to monitor various metrics gathered by Cloud MLE, including number of predictions/second, model response latency, and others.

<h2>Recap</h2>

In this notebook, you used TensorBoard to confirm that a Cloud MLE training job started in an earlier lab finished training a model. You also checked that the model files were saved to a Google Cloud Storage bucket.

Next, you deployed the model files from the bucket as an API on Cloud MLE. After this, you used the model hosted on Cloud MLE as an API to get predictions from the bash command line (using `gcloud`) and from a Python runtime.

Copyright 2019 Counter Factual .AI LLC. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License