# Developing, Training, and Deploying a TensorFlow model on Google Cloud Platform (completely within Jupyter)


In Chapter 9 of [Data Science on the Google Cloud Platform](http://shop.oreilly.com/product/0636920057628.do), I trained a TensorFlow Estimator model to predict flight delays.

In this notebook, we'll modernize the workflow:
* Use eager mode for TensorFlow development
* Use tf.data to write the input pipeline
* Run the notebook as-is on Cloud using a Deep Learning VM (DLVM) or Kubeflow Pipelines (KFP)
* Deploy the trained model to Cloud ML Engine as a web service

The combination of eager mode, tf.data, and DLVM/KFP simplifies this workflow.
Because the notebook itself is what runs on Cloud, we don't need to wrap the training code in a Python package or a Docker container.

In [2]:
# change these to try this notebook out
# In "production", these will be replaced by the parameters passed to papermill
BUCKET = 'cloud-training-demos-ml'
PROJECT = 'cloud-training-demos'
REGION = 'us-central1'
DEVELOP_MODE = True
NBUCKETS = 5
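
The comment about papermill can be made concrete. The following is a minimal sketch that builds a papermill command line using its `-p NAME VALUE` option; the notebook filenames and the helper `papermill_command` are hypothetical and not part of the original workflow. Papermill injects the overrides after the cell tagged `parameters`, so the cell above would need that tag.

```python
def papermill_command(input_nb, output_nb, params):
    """Build the argv list for a papermill run that overrides notebook parameters."""
    cmd = ["papermill", input_nb, output_nb]
    for name, value in params.items():
        # papermill accepts repeated "-p NAME VALUE" pairs
        cmd += ["-p", name, str(value)]
    return cmd

cmd = papermill_command(
    "flights_model.ipynb",       # hypothetical input notebook name
    "flights_model_out.ipynb",   # hypothetical output notebook name
    {"BUCKET": "cloud-training-demos-ml", "DEVELOP_MODE": False, "NBUCKETS": 5},
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where papermill is installed.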

In [3]:
import os
os.environ['BUCKET'] = BUCKET
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION

In [4]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


## Creating the input data pipeline

In [5]:
DATA_BUCKET = "gs://cloud-training-demos/flights/chapter8/output/"
TRAIN_DATA_PATTERN = DATA_BUCKET + "train*"
VALID_DATA_PATTERN = DATA_BUCKET + "test*"

In [6]:
!gsutil ls $DATA_BUCKET

gs://cloud-training-demos/flights/chapter8/output/delays.csv
gs://cloud-training-demos/flights/chapter8/output/testFlights-00000-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/testFlights-00001-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/testFlights-00002-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/testFlights-00003-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/testFlights-00004-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/testFlights-00005-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/testFlights-00006-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/trainFlights-00000-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/trainFlights-00001-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/trainFlights-00002-of-00007.csv
gs://cloud-training-demos/flights/chapter8/output/trainFlights-00003-of-00007.csv
gs://cloud-training-demos/flights/chapter8/o

### Use tf.data to read the CSV files

In [11]:
import os, json, math
import numpy as np
import tensorflow as tf
print("Tensorflow version " + tf.__version__)

Tensorflow version 2.0.0-alpha0


In [31]:
CSV_COLUMNS  = ('ontime,dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' + \
                ',carrier,dep_lat,dep_lon,arr_lat,arr_lon,origin,dest').split(',')
LABEL_COLUMN = 'ontime'
DEFAULTS     = [[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],\
                ['na'],[0.0],[0.0],[0.0],[0.0],['na'],['na']]

def load_dataset(pattern):
  return tf.data.experimental.make_csv_dataset(pattern, 1, CSV_COLUMNS, DEFAULTS)
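
To make the role of `CSV_COLUMNS` and `DEFAULTS` concrete, here is a pure-Python sketch of the per-line work: each field is paired with its column name, converted to the type of its default, and replaced by the default when it is empty. `parse_line` is a hypothetical helper for illustration only, not how `make_csv_dataset` is implemented, and the sample line is an assumed rendering of one record.

```python
import csv

CSV_COLUMNS = ('ontime,dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay'
               ',carrier,dep_lat,dep_lon,arr_lat,arr_lon,origin,dest').split(',')
DEFAULTS = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0],
            ['na'], [0.0], [0.0], [0.0], [0.0], ['na'], ['na']]

def parse_line(line):
    """Turn one CSV line into a {column: value} dict, applying defaults to empty fields."""
    fields = next(csv.reader([line]))
    row = {}
    for name, default, text in zip(CSV_COLUMNS, DEFAULTS, fields):
        kind = type(default[0])  # float for numeric columns, str for categorical ones
        row[name] = kind(text) if text != '' else default[0]
    return row

# Assumed CSV form of a single flight record
line = "1.0,-3.0,14.0,824.0,26.482693,30.0,NK,28.429445,-81.30889,40.274723,-79.40667,MCO,LBE"
row = parse_line(line)
print(row['carrier'], row['distance'])
```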

In [32]:
if DEVELOP_MODE:
    dataset = load_dataset(TRAIN_DATA_PATTERN)
    for n, data in enumerate(dataset):
        numpy_data = {k: v.numpy() for k, v in data.items()} # .numpy() works only in eager mode
        print(numpy_data)
        if n>3: break

{'ontime': array([1.], dtype=float32), 'dep_delay': array([-3.], dtype=float32), 'taxiout': array([14.], dtype=float32), 'distance': array([824.], dtype=float32), 'avg_dep_delay': array([26.482693], dtype=float32), 'avg_arr_delay': array([30.], dtype=float32), 'carrier': array([b'NK'], dtype=object), 'dep_lat': array([28.429445], dtype=float32), 'dep_lon': array([-81.30889], dtype=float32), 'arr_lat': array([40.274723], dtype=float32), 'arr_lon': array([-79.40667], dtype=float32), 'origin': array([b'MCO'], dtype=object), 'dest': array([b'LBE'], dtype=object)}
{'ontime': array([1.], dtype=float32), 'dep_delay': array([3.], dtype=float32), 'taxiout': array([10.], dtype=float32), 'distance': array([733.], dtype=float32), 'avg_dep_delay': array([33.2664], dtype=float32), 'avg_arr_delay': array([19.4], dtype=float32), 'carrier': array([b'AA'], dtype=object), 'dep_lat': array([32.896946], dtype=float32), 'dep_lon': array([-97.038055], dtype=float32), 'arr_lat': array([38.174168], dtype=float

In [14]:
%%writefile example_input.json
{"dep_delay": 14.0, "taxiout": 13.0, "distance": 319.0, "avg_dep_delay": 25.863039, "avg_arr_delay": 27.0, "carrier": "WN", "dep_lat": 32.84722, "dep_lon": -96.85167, "arr_lat": 31.9425, "arr_lon": -102.20194, "origin": "DAL", "dest": "MAF"}
{"dep_delay": -9.0, "taxiout": 21.0, "distance": 301.0, "avg_dep_delay": 41.050808, "avg_arr_delay": -7.0, "carrier": "EV", "dep_lat": 29.984444, "dep_lon": -95.34139, "arr_lat": 27.544167, "arr_lon": -99.46167, "origin": "IAH", "dest": "LRD"}

Overwriting example_input.json


In [34]:
def features_and_labels(features):
  label = features.pop('ontime') # this is what we will train for
  return features, label

def prepare_dataset(pattern, batch_size, truncate=None, mode=tf.estimator.ModeKeys.TRAIN):
  dataset = load_dataset(pattern)
  dataset = dataset.map(features_and_labels)
  dataset = dataset.cache()
  if mode == tf.estimator.ModeKeys.TRAIN:
    dataset = dataset.shuffle(1000)
    dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
  dataset = dataset.prefetch(1)
  if truncate is not None:
    dataset = dataset.take(truncate)
  return dataset

if DEVELOP_MODE:
    print("Calling prepare")
    one_item = prepare_dataset(TRAIN_DATA_PATTERN, batch_size=5, truncate=1)
    print(list(one_item)) # should print one batch of 5 items

Calling prepare
[(OrderedDict([('dep_delay', <tf.Tensor: id=6654, shape=(5, 1), dtype=float32, numpy=
array([[26.],
       [ 0.],
       [ 0.],
       [-5.],
       [-2.]], dtype=float32)>), ('taxiout', <tf.Tensor: id=6660, shape=(5, 1), dtype=float32, numpy=
array([[13.],
       [23.],
       [13.],
       [15.],
       [28.]], dtype=float32)>), ('distance', <tf.Tensor: id=6658, shape=(5, 1), dtype=float32, numpy=
array([[351.],
       [201.],
       [201.],
       [145.],
       [106.]], dtype=float32)>), ('avg_dep_delay', <tf.Tensor: id=6652, shape=(5, 1), dtype=float32, numpy=
array([[31.995535],
       [22.849628],
       [24.786774],
       [24.692053],
       [26.245787]], dtype=float32)>), ('avg_arr_delay', <tf.Tensor: id=6651, shape=(5, 1), dtype=float32, numpy=
array([[  4.],
       [  8.],
       [ -6.],
       [-13.],
       [  9.]], dtype=float32)>), ('carrier', <tf.Tensor: id=6653, shape=(5, 1), dtype=string, numpy=
array([[b'EV'],
       [b'EV'],
       [b'EV'],
       [
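
The order of the steps in `prepare_dataset` matters: because `take` is applied after `batch`, `truncate` counts batches rather than rows. A rough pure-Python analogue using `itertools` shows the effect. It omits caching, shuffling, and prefetching, and uses plain integers as stand-ins for records; the names `batched` and `prepare` are illustrative.

```python
import itertools

def batched(iterable, batch_size):
    """Group consecutive items into lists of batch_size (the last one may be short)."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

def prepare(records, batch_size, truncate=None, train=True):
    # repeat(): cycle forever when training, single pass otherwise
    stream = itertools.cycle(records) if train else iter(records)
    stream = batched(stream, batch_size)              # batch()
    if truncate is not None:
        stream = itertools.islice(stream, truncate)   # take(): counts batches
    return stream

one_item = list(prepare(range(12), batch_size=5, truncate=1))
print(one_item)  # [[0, 1, 2, 3, 4]]
```

Without `truncate`, the training stream never ends, which is why training is bounded by a step count rather than by the size of the dataset.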

## Create TensorFlow wide-and-deep model

We'll create feature columns, and do some discretization and feature engineering.
See the book for details.

In [35]:
import tensorflow.feature_column as fc

real = {
    colname : fc.numeric_column(colname) \
          for colname in \
            ('dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' +
             ',dep_lat,dep_lon,arr_lat,arr_lon').split(',')
}
sparse = {
      'carrier': fc.categorical_column_with_vocabulary_list('carrier',
                  vocabulary_list='AS,VX,F9,UA,US,WN,HA,EV,MQ,DL,OO,B6,NK,AA'.split(',')),
      'origin' : fc.categorical_column_with_hash_bucket('origin', hash_bucket_size=1000),
      'dest'   : fc.categorical_column_with_hash_bucket('dest', hash_bucket_size=1000)
}

### Feature engineering

In [36]:
latbuckets = np.linspace(20.0, 50.0, NBUCKETS).tolist()  # USA
lonbuckets = np.linspace(-120.0, -70.0, NBUCKETS).tolist() # USA
disc = {}
disc.update({
       'd_{}'.format(key) : fc.bucketized_column(real[key], latbuckets) \
          for key in ['dep_lat', 'arr_lat']
})
disc.update({
       'd_{}'.format(key) : fc.bucketized_column(real[key], lonbuckets) \
          for key in ['dep_lon', 'arr_lon']
})

# cross columns that make sense in combination
sparse['dep_loc'] = fc.crossed_column([disc['d_dep_lat'], disc['d_dep_lon']], NBUCKETS*NBUCKETS)
sparse['arr_loc'] = fc.crossed_column([disc['d_arr_lat'], disc['d_arr_lon']], NBUCKETS*NBUCKETS)
sparse['dep_arr'] = fc.crossed_column([sparse['dep_loc'], sparse['arr_loc']], NBUCKETS ** 4)
sparse['ori_dest'] = fc.crossed_column(['origin', 'dest'], hash_bucket_size=1000)

# embed all the sparse columns
embed = {
       colname : fc.embedding_column(col, 10) \
          for colname, col in sparse.items()
}
real.update(embed)

if DEVELOP_MODE:
    print(sparse.keys())
    print(real.keys())

dict_keys(['carrier', 'origin', 'dest', 'dep_loc', 'arr_loc', 'dep_arr', 'ori_dest'])
dict_keys(['dep_delay', 'taxiout', 'distance', 'avg_dep_delay', 'avg_arr_delay', 'dep_lat', 'dep_lon', 'arr_lat', 'arr_lon', 'carrier', 'origin', 'dest', 'dep_loc', 'arr_loc', 'dep_arr', 'ori_dest'])
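
As a sanity check on the discretization, the sketch below reproduces the bucket boundaries and bucket lookup in plain Python. A bucketized column with `n` boundaries has `n + 1` buckets, and a value equal to a boundary falls into the upper bucket, which matches `bisect_right`. The final combination step is only a stand-in: `crossed_column` hashes the combination into `hash_bucket_size` buckets, which this code does not attempt to replicate.

```python
import bisect

NBUCKETS = 5

def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (like np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def bucketize(value, boundaries):
    """Index of the bucket holding value; boundary values go to the upper bucket."""
    return bisect.bisect_right(boundaries, value)

latbuckets = linspace(20.0, 50.0, NBUCKETS)     # [20.0, 27.5, 35.0, 42.5, 50.0]
lonbuckets = linspace(-120.0, -70.0, NBUCKETS)  # [-120.0, -107.5, -95.0, -82.5, -70.0]

# Departure coordinates of the MCO record printed earlier
lat_idx = bucketize(28.429445, latbuckets)   # 2
lon_idx = bucketize(-81.30889, lonbuckets)   # 4

# Stand-in for the cross: a unique id per (lat bucket, lon bucket) cell
n_cells = len(latbuckets) + 1
dep_loc = lat_idx * n_cells + lon_idx
print(lat_idx, lon_idx, dep_loc)
```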


### Serving

The serving input function defines the inputs that the deployed model accepts for prediction. Every prediction request must supply these fields.

In [37]:
def serving_input_fn():
    # The export runs in graph mode, so the inputs are TF1-style placeholders
    feature_placeholders = {
        # All the real-valued columns
        column: tf.compat.v1.placeholder(tf.float32, [None], name=column)
             for column in ('dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' +
                            ',dep_lat,dep_lon,arr_lat,arr_lon').split(',')
    }
    feature_placeholders.update({
        # The string-valued (categorical) columns
        column: tf.compat.v1.placeholder(tf.string, [None], name=column)
             for column in ['carrier', 'origin', 'dest']
    })
    features = feature_placeholders # no transformations
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
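
Since the serving function fixes the set of required fields, a request can be checked before it is sent. The sketch below validates one instance against the nine numeric and three string fields; `check_instance` is a hypothetical helper, and the sample is the first line of `example_input.json`. Note that the label `ontime` is not a serving input.

```python
import json

REAL_COLUMNS = ('dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay'
                ',dep_lat,dep_lon,arr_lat,arr_lon').split(',')
STRING_COLUMNS = ['carrier', 'origin', 'dest']

def check_instance(instance):
    """Return a list of problems found in one prediction instance (empty if none)."""
    problems = []
    for col in REAL_COLUMNS:
        value = instance.get(col)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append('{}: expected a number'.format(col))
    for col in STRING_COLUMNS:
        if not isinstance(instance.get(col), str):
            problems.append('{}: expected a string'.format(col))
    extra = set(instance) - set(REAL_COLUMNS) - set(STRING_COLUMNS)
    problems.extend('{}: unexpected field'.format(col) for col in sorted(extra))
    return problems

line = ('{"dep_delay": 14.0, "taxiout": 13.0, "distance": 319.0, '
        '"avg_dep_delay": 25.863039, "avg_arr_delay": 27.0, "carrier": "WN", '
        '"dep_lat": 32.84722, "dep_lon": -96.85167, "arr_lat": 31.9425, '
        '"arr_lon": -102.20194, "origin": "DAL", "dest": "MAF"}')
instance = json.loads(line)
print(check_instance(instance))  # []
```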

## Train the model and evaluate once in a while

The Estimator also writes checkpoints to the model directory during training.

In [38]:
model_dir='gs://{}/flights/trained_model'.format(BUCKET)
os.environ['OUTDIR'] = model_dir  # needed for deployment
print('Writing trained model to {}'.format(model_dir))

Writing trained model to gs://cloud-training-demos-ml/flights/trained_model


In [39]:
!gsutil -m rm -rf $OUTDIR

CommandException: 1 files/objects could not be removed.


In [40]:
estimator = tf.estimator.DNNLinearCombinedClassifier(
        model_dir = model_dir,
        linear_feature_columns = list(sparse.values()),  # wide part: sparse and crossed columns
        dnn_feature_columns = list(real.values()),       # deep part: numeric and embedded columns
        dnn_hidden_units = [64, 32])

train_batch_size = 64
train_input_fn = lambda: prepare_dataset(TRAIN_DATA_PATTERN, train_batch_size)
eval_batch_size = 100 if DEVELOP_MODE else 10000
eval_input_fn = lambda: prepare_dataset(VALID_DATA_PATTERN, eval_batch_size, eval_batch_size*10, tf.estimator.ModeKeys.EVAL)
num_steps = 10 if DEVELOP_MODE else (1000000 // train_batch_size)

train_spec = tf.estimator.TrainSpec(train_input_fn, max_steps = num_steps)
exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)
eval_spec = tf.estimator.EvalSpec(eval_input_fn, steps=10, exporters=exporter)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

W0315 21:21:26.647607 139710232544640 deprecation.py:323] From /opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0315 21:21:26.742827 139710232544640 deprecation.py:506] From /opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/adagrad.py:76: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0315 21:21:27.104131 139710232544640 deprecation.py:506] From /opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py:187: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updatin

AttributeError: 'DNNLinearCombinedClassifierV2' object has no attribute 'export_savedmodel'
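
The step counts configured in the training cell work out as follows. This is a small arithmetic sketch using the same constants; `training_plan` is an illustrative helper, and "examples" counts rows consumed, including repeats.

```python
def training_plan(develop_mode, train_batch_size=64):
    """Compute how many steps and rows the training cell's settings imply."""
    num_steps = 10 if develop_mode else (1000000 // train_batch_size)
    eval_batch_size = 100 if develop_mode else 10000
    eval_steps = 10  # EvalSpec(steps=10)
    return {
        'train_steps': num_steps,
        'train_examples': num_steps * train_batch_size,
        'eval_examples': eval_steps * eval_batch_size,
    }

print(training_plan(True))   # develop mode
print(training_plan(False))  # full run
```

In develop mode this is 10 steps over 640 rows; in a full run it is 15,625 steps over 1,000,000 rows, with each evaluation reading 100,000 rows.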

## Deploy the trained model

In [None]:
%%bash
model_dir=$(gsutil ls ${OUTDIR}/export/exporter | tail -1)
echo $model_dir
saved_model_cli show --dir ${model_dir} --all
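
The `tail -1` above relies on the exporter writing each SavedModel into a subdirectory named with a numeric timestamp, so the last entry of the sorted listing is the newest export. The same selection can be sketched in Python; the listing and its timestamps below are made up for illustration, and `latest_export` is a hypothetical helper.

```python
def latest_export(listing):
    """Pick the export directory whose final path component is the largest timestamp."""
    def name(path):
        return path.rstrip('/').rsplit('/', 1)[-1]
    dirs = [p for p in listing if name(p).isdigit()]
    return max(dirs, key=lambda p: int(name(p)))

# Hypothetical output of `gsutil ls ${OUTDIR}/export/exporter`
listing = [
    'gs://cloud-training-demos-ml/flights/trained_model/export/exporter/',
    'gs://cloud-training-demos-ml/flights/trained_model/export/exporter/1552684900/',
    'gs://cloud-training-demos-ml/flights/trained_model/export/exporter/1552685000/',
]
print(latest_export(listing))
```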

In [None]:
%%bash
MODEL_NAME="flights"
MODEL_VERSION="kfp"
TFVERSION="2.0"
MODEL_LOCATION=$(gsutil ls ${OUTDIR}/export/exporter | tail -1)
echo "Run these commands one-by-one (the very first time, you'll create a model and then create a version)"
#yes | gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version $TFVERSION

In [None]:
!gcloud ml-engine predict --model=flights --version=kfp --json-instances=example_input.json
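
The `--json-instances` file holds one JSON object per line. When calling the online prediction REST API directly, the same instances are wrapped in an `instances` list instead. A minimal sketch of that conversion follows; the helper name `to_request_body` is an assumption, and no request is actually sent.

```python
import json

def to_request_body(ndjson_text):
    """Wrap newline-delimited JSON instances in a {"instances": [...]} request body."""
    instances = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    return json.dumps({'instances': instances})

# Contents of example_input.json
ndjson_text = '''{"dep_delay": 14.0, "taxiout": 13.0, "distance": 319.0, "avg_dep_delay": 25.863039, "avg_arr_delay": 27.0, "carrier": "WN", "dep_lat": 32.84722, "dep_lon": -96.85167, "arr_lat": 31.9425, "arr_lon": -102.20194, "origin": "DAL", "dest": "MAF"}
{"dep_delay": -9.0, "taxiout": 21.0, "distance": 301.0, "avg_dep_delay": 41.050808, "avg_arr_delay": -7.0, "carrier": "EV", "dep_lat": 29.984444, "dep_lon": -95.34139, "arr_lat": 27.544167, "arr_lon": -99.46167, "origin": "IAH", "dest": "LRD"}
'''
body = to_request_body(ndjson_text)
print(len(json.loads(body)['instances']))  # 2
```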

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License