# ML with TensorFlow Extended (TFX) -- Part 3
The purpose of this tutorial is to show how to do end-to-end ML with TFX libraries on Google Cloud Platform. This tutorial covers:
1. Data analysis and schema generation with **TF Data Validation**.
2. Data preprocessing with **TF Transform**.
3. Model training with **TF Estimator**.
4. Model evaluation with **TF Model Analysis**.

This notebook has been tested in Jupyter on the Deep Learning VM.

## 0. Setup Python and Cloud environment

Install the libraries we need.

In [None]:
%pip install -q --upgrade tensorflow_data_validation tensorflow_model_analysis

Note: you may need to restart the kernel to use updated packages.


In [1]:
import apache_beam as beam
import platform
import tensorflow as tf
import tensorflow_data_validation as tfdv
import tensorflow_transform as tft
import tornado

print('tornado version: {}'.format(tornado.version))
print('Python version: {}'.format(platform.python_version()))
print('TF version: {}'.format(tf.__version__))
print('TFT version: {}'.format(tft.__version__))
print('TFDV version: {}'.format(tfdv.__version__))
print('Apache Beam version: {}'.format(beam.__version__))

  'Running the Apache Beam SDK on Python 3 is not yet fully supported. '


tornado version: 6.0.2
Python version: 3.5.3
TF version: 1.13.1
TFT version: 0.13.0
TFDV version: 0.13.1
Apache Beam version: 2.11.0


In [2]:
PROJECT = 'cloud-training-demos'    # Replace with your PROJECT
BUCKET = 'cloud-training-demos-ml'  # Replace with your BUCKET
REGION = 'us-central1'              # Choose an available region for Cloud MLE

import os

os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION

In [3]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

## ensure we predict locally with our current Python environment
gcloud config set ml_engine/local_python `which python`

Updated property [core/project].
Updated property [compute/region].
Updated property [ml_engine/local_python].


<img valign="middle" src="images/tfx.jpeg">

### Flights dataset

We'll use the flights dataset from the book [Data Science on Google Cloud Platform](http://shop.oreilly.com/product/0636920057628.do).

In [4]:
DATA_BUCKET = "gs://cloud-training-demos/flights/chapter8/output/"
TRAIN_DATA_PATTERN = DATA_BUCKET + "train*"
EVAL_DATA_PATTERN = DATA_BUCKET + "test*"

In [5]:
CSV_COLUMNS = ('ontime,dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' + 
               ',carrier,dep_lat,dep_lon,arr_lat,arr_lon,origin,dest').split(',')
TARGET_FEATURE_NAME = 'ontime'
DEFAULTS = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0],
            ['na'], [0.0], [0.0], [0.0], [0.0], ['na'], ['na']]
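
`CSV_COLUMNS` and `DEFAULTS` are used in section 3.9, where `tf.decode_csv` turns raw CSV lines into one tensor per column and fills empty fields from the defaults. The following pure-Python sketch mimics that behaviour for a single line so the pairing of names and defaults is easier to see. The helper `parse_csv_line` and the sample record are hypothetical and only approximate what TensorFlow does.

```python
import csv

CSV_COLUMNS = ('ontime,dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay'
               ',carrier,dep_lat,dep_lon,arr_lat,arr_lon,origin,dest').split(',')
DEFAULTS = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0],
            ['na'], [0.0], [0.0], [0.0], [0.0], ['na'], ['na']]


def parse_csv_line(line):
    """Hypothetical helper: map one CSV line to a {column: value} dict."""
    fields = next(csv.reader([line]))
    if len(fields) != len(CSV_COLUMNS):
        raise ValueError('expected {} fields, got {}'.format(
            len(CSV_COLUMNS), len(fields)))
    row = {}
    for name, raw, default in zip(CSV_COLUMNS, fields, DEFAULTS):
        fallback = default[0]
        if raw == '':
            row[name] = fallback        # empty field: use the column default
        elif isinstance(fallback, float):
            row[name] = float(raw)      # numeric column
        else:
            row[name] = raw             # string column
    return row


# Made-up record with an empty 'dest' field.
example = parse_csv_line(
    '1.0,14.0,13.0,319.0,25.86,27.0,WN,32.85,-96.85,31.94,-102.2,DAL,')
print(example['carrier'], example['dest'])
```

The type of each default (float or string) is what decides the type of the parsed column, which is why every entry in `DEFAULTS` is a one-element list holding a typed value.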

## 3. Model Training
To train the model, we use the [TF Estimators](https://www.tensorflow.org/guide/estimators) API with a premade DNNClassifier. We perform the following steps:
1. Load the **transform schema**
2. Use the transform schema to parse TFRecords in **input_fn**
3. Use the transform schema to create **feature columns**
4. Create a premade **DNNClassifier**
5. **Train** the model
6. Implement the **serving_input_fn** and apply the **transform logic**
7. **Export** and test the saved model.

### 3.1 Load transform output

In [6]:
PREPROC_OUTPUT_DIR = 'gs://{}/flights/tfx'.format(BUCKET)  # from 02_transform.ipynb
TRANSFORM_ARTIFACTS_DIR = os.path.join(PREPROC_OUTPUT_DIR,'transform')
TRANSFORMED_DATA_DIR = os.path.join(PREPROC_OUTPUT_DIR,'transformed')
!gcloud storage ls $TRANSFORM_ARTIFACTS_DIR
!gcloud storage ls $TRANSFORMED_DATA_DIR

gs://cloud-training-demos-ml/flights/tfx/transform/
gs://cloud-training-demos-ml/flights/tfx/transform/transform_fn/
gs://cloud-training-demos-ml/flights/tfx/transform/transformed_metadata/
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00000-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00001-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00002-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00003-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00004-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00005-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00006-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00007-of-00008.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/train-00000-of-00031.tfrecords
gs://cloud-training-demos-ml/flights/tfx/transformed/train-000

In [7]:
transform_output = tft.TFTransformOutput(TRANSFORM_ARTIFACTS_DIR)

### 3.2 TFRecords Input Function

In [8]:
def make_input_fn(tfrecords_files, 
  batch_size, num_epochs=1, shuffle=False):

  def input_fn():
    dataset = tf.data.experimental.make_batched_features_dataset(
      file_pattern=tfrecords_files,
      batch_size=batch_size,
      features=transform_output.transformed_feature_spec(),
      label_key=TARGET_FEATURE_NAME,
      reader=tf.data.TFRecordDataset,
      num_epochs=num_epochs,
      shuffle=shuffle
    )
    return dataset

  return input_fn

In [9]:
make_input_fn(TRANSFORMED_DATA_DIR+'/train*.tfrecords', 2, shuffle=False)()

<DatasetV1Adapter shapes: ({avg_dep_delay_scaled: (?, 1), distance_bucketized: (?, 1), avg_arr_delay_scaled: (?, 1), dep_lat_scaled: (?, 1), origin_integerized: (?, 1), distance_scaled: (?, 1), taxiout_scaled: (?, 1), arr_lon_scaled: (?, 1), dep_delay_scaled: (?, 1), dest_integerized: (?, 1), arr_lat_scaled: (?, 1), dep_lon_scaled: (?, 1), carrier_integerized: (?, 1)}, (?, 1)), types: ({avg_dep_delay_scaled: tf.float32, distance_bucketized: tf.int64, avg_arr_delay_scaled: tf.float32, dep_lat_scaled: tf.float32, origin_integerized: tf.int64, distance_scaled: tf.float32, taxiout_scaled: tf.float32, arr_lon_scaled: tf.float32, dep_delay_scaled: tf.float32, dest_integerized: tf.int64, arr_lat_scaled: tf.float32, dep_lon_scaled: tf.float32, carrier_integerized: tf.int64}, tf.float32)>

### 3.3 Create feature columns

In [10]:
import math

def create_feature_columns():

  feature_columns = []
  transformed_features = transform_output.transformed_metadata.schema._schema_proto.feature

  for feature in transformed_features:

    if feature.name in [TARGET_FEATURE_NAME]:
      continue

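    # HasField checks whether the int_domain is actually set on the proto;
    # hasattr is always True for a declared protobuf field.
    if feature.HasField('int_domain') and feature.int_domain.is_categorical: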
      vocab_size = feature.int_domain.max + 1
      feature_columns.append(
        tf.feature_column.embedding_column(
          tf.feature_column.categorical_column_with_identity(
            feature.name, num_buckets=vocab_size),
            dimension = int(math.sqrt(vocab_size))))
    else:
      feature_columns.append(
        tf.feature_column.numeric_column(feature.name))

  return feature_columns

In [11]:
create_feature_columns()

[NumericColumn(key='arr_lat_scaled', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='arr_lon_scaled', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='avg_arr_delay_scaled', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='avg_dep_delay_scaled', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 EmbeddingColumn(categorical_column=IdentityCategoricalColumn(key='carrier_integerized', number_buckets=14, default_value=None), dimension=3, combiner='mean', initializer=<tensorflow.python.ops.init_ops.TruncatedNormal object at 0x7f02e1c63eb8>, ckpt_to_load_from=None, tensor_name_in_ckpt=None, max_norm=None, trainable=True),
 NumericColumn(key='dep_delay_scaled', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='dep_lat_scaled', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericCo
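
The embedding size in `create_feature_columns` follows a simple rule of thumb: the integer part of the square root of the vocabulary size, where the vocabulary size is `int_domain.max + 1`. The short sketch below reproduces that arithmetic outside TensorFlow. The value 14 comes from the `carrier_integerized` column in the output above; the other vocabulary sizes are made up for illustration, and `embedding_dimension` is a hypothetical helper name.

```python
import math


def embedding_dimension(vocab_size):
    """Same rule as in create_feature_columns: floor(sqrt(vocab_size))."""
    return int(math.sqrt(vocab_size))


# 14 matches carrier_integerized above; 100 and 350 are illustrative only.
for vocab_size in (14, 100, 350):
    print(vocab_size, embedding_dimension(vocab_size))
```

For the carrier column this gives a dimension of 3, which agrees with `dimension=3` in the printed `EmbeddingColumn`.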

### 3.4 Instantiate an Estimator

In [12]:
def create_estimator(params, run_config):
    
  feature_columns = create_feature_columns()

  estimator = tf.estimator.DNNClassifier(
    #weight_column=WEIGHT_COLUMN_NAME,
    #label_vocabulary=TARGET_LABELS,
    feature_columns=feature_columns,
    hidden_units=params.hidden_units,
    config=run_config
  )

  return estimator

### 3.5 Implement train and evaluate experiment

In [13]:
from datetime import datetime

def run_experiment(estimator, params, run_config, resume=False):
  
  tf.logging.set_verbosity(tf.logging.INFO)

  if not resume: 
    if tf.gfile.Exists(run_config.model_dir):
      print("Removing previous artifacts...")
      tf.gfile.DeleteRecursively(run_config.model_dir)
  else:
    print("Resuming training...")

  train_spec = tf.estimator.TrainSpec(
      input_fn = make_input_fn(
          TRANSFORMED_DATA_DIR+'/train*.tfrecords',
          batch_size=params.batch_size,
          num_epochs=None,
          shuffle=True
      ),
      max_steps=params.max_steps
  )

  eval_spec = tf.estimator.EvalSpec(
      input_fn = make_input_fn(
          TRANSFORMED_DATA_DIR+'/eval*.tfrecords',
          batch_size=params.batch_size,     
      ),
      start_delay_secs=0,
      throttle_secs=0,
      steps=None
  )
  
  time_start = datetime.utcnow() 
  print("Experiment started at {}".format(time_start.strftime("%H:%M:%S")))
  print(".......................................")
  
  tf.estimator.train_and_evaluate(
    estimator=estimator,
    train_spec=train_spec, 
    eval_spec=eval_spec)

  time_end = datetime.utcnow() 
  print(".......................................")
  print("Experiment finished at {}".format(time_end.strftime("%H:%M:%S")))
  print("")
  
  time_elapsed = time_end - time_start
  print("Experiment elapsed time: {} seconds".format(time_elapsed.total_seconds()))

### 3.5 Run experiment

In [14]:
MODELS_LOCATION = 'gs://{}/flights/tfx/models/'.format(BUCKET)
MODEL_NAME = 'dnn_classifier'
model_dir = os.path.join(MODELS_LOCATION, MODEL_NAME)
os.environ['MODEL_DIR'] = model_dir

params = tf.contrib.training.HParams()
params.hidden_units = [128, 64]
params.dropout = 0.15
params.batch_size =  128
params.max_steps = 1000

run_config = tf.estimator.RunConfig(
    tf_random_seed=19831006,
    save_checkpoints_steps=200, 
    keep_checkpoint_max=3, 
    model_dir=model_dir,
    log_step_count_steps=10
)
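
To get a feel for what these settings imply, the sketch below works out how many examples the training loop consumes and which periodic checkpoints would be retained. It is plain arithmetic on the values above and rests on a simplifying assumption: it only counts checkpoints at multiples of `save_checkpoints_steps` and ignores any initial or final checkpoint the Estimator may also write.

```python
batch_size = 128
max_steps = 1000
save_checkpoints_steps = 200
keep_checkpoint_max = 3

# Each training step consumes one batch.
examples_seen = batch_size * max_steps

# Assumption: one checkpoint at every multiple of save_checkpoints_steps.
checkpoint_steps = list(
    range(save_checkpoints_steps, max_steps + 1, save_checkpoints_steps))

# Only the most recent keep_checkpoint_max checkpoints are kept.
retained = checkpoint_steps[-keep_checkpoint_max:]

print(examples_seen)
print(checkpoint_steps)
print(retained)
```

Because the training `input_fn` uses `num_epochs=None`, the dataset repeats indefinitely and `max_steps` alone determines when training stops.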

In [15]:
estimator = create_estimator(params, run_config)
run_experiment(estimator, params, run_config)

INFO:tensorflow:Using config: {'_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f02a4dffe80>, '_train_distribute': None, '_service': None, '_model_dir': 'gs://cloud-training-demos-ml/flights/tfx/models/dnn_classifier', '_num_ps_replicas': 0, '_log_step_count_steps': 10, '_eval_distribute': None, '_experimental_distribute': None, '_tf_random_seed': 19831006, '_master': '', '_protocol': None, '_device_fn': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_evaluation_master': '', '_num_worker_replicas': 1, '_global_id_in_cluster': 0, '_keep_checkpoint_max': 3, '_task_id': 0, '_is_chief': True, '_save_checkpoints_secs': None, '_save_summary_steps': 100, '_task_type': 'worker', '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_steps': 200}
Removing previous artifacts...
Experiment started at 05:31:25
.......................................
INFO:tensorflow:Not using Di

### 3.6 Export the model for serving

In [20]:
tf.logging.set_verbosity(tf.logging.ERROR)

RAW_SCHEMA_LOCATION = 'raw_schema.pbtxt'
def make_serving_input_receiver_fn():
  from tensorflow_transform.tf_metadata import schema_utils

  source_raw_schema = tfdv.load_schema_text(RAW_SCHEMA_LOCATION)
  raw_feature_spec = schema_utils.schema_as_feature_spec(source_raw_schema).feature_spec
  raw_feature_spec.pop(TARGET_FEATURE_NAME)

  # Create the interface for the serving function with the raw features
  raw_features = tf.estimator.export.build_parsing_serving_input_receiver_fn(raw_feature_spec)().features

  receiver_tensors = {feature: tf.placeholder(shape=[None], dtype=raw_features[feature].dtype) 
    for feature in raw_features
  }

  receiver_tensors_expanded = {tensor: tf.reshape(receiver_tensors[tensor], (-1, 1)) 
    for tensor in receiver_tensors
  }

  # Apply the transform function 
  transformed_features = transform_output.transform_raw_features(receiver_tensors_expanded)

  return tf.estimator.export.ServingInputReceiver(
    transformed_features, receiver_tensors)

In [21]:
export_dir = os.path.join(model_dir, 'export')

if tf.gfile.Exists(export_dir):
    tf.gfile.DeleteRecursively(export_dir)
        
estimator.export_savedmodel(
    export_dir_base=export_dir,
    serving_input_receiver_fn=make_serving_input_receiver_fn
)

b'gs://cloud-training-demos-ml/flights/tfx/models/dnn_classifier/export/1554183658'
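
The exported model lands in a subdirectory named after the Unix timestamp of the export, which is why later cells pick the last listed entry to find the newest model. The sketch below shows one way to select the latest export and decode its timestamp. The list of names is a stand-in for a real directory listing: `1554183658` is the export above, while `1554180000` and the helper `latest_export` are made up for illustration.

```python
from datetime import datetime, timezone


def latest_export(names):
    """Hypothetical helper: newest timestamp-named entry in a listing."""
    numeric = [n.rstrip('/') for n in names if n.rstrip('/').isdigit()]
    return max(numeric, key=int)


# Stand-in for a directory listing of the export folder.
names = ['1554183658/', '1554180000/', 'evaluate/']

latest = latest_export(names)
exported_at = datetime.fromtimestamp(int(latest), tz=timezone.utc)
print(latest, exported_at.isoformat())
```

Filtering for numeric names is a deliberate choice here: once section 3.9 writes the evaluation model to `export/evaluate`, simply taking the last entry of a sorted listing could return `evaluate` instead of a timestamped export.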

In [24]:
%%bash

saved_models_base=${MODEL_DIR}/export/
saved_model_dir=$(gcloud storage ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
saved_model_cli show --dir=${saved_model_dir} --all

gs://cloud-training-demos-ml/flights/tfx/models/dnn_classifier/export/1554183658/

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['arr_lat'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: Placeholder_2:0
    inputs['arr_lon'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: Placeholder_4:0
    inputs['avg_arr_delay'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: Placeholder_10:0
    inputs['avg_dep_delay'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: Placeholder_9:0
    inputs['carrier'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Placeholder_6:0
    inputs['dep_delay'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: Placeholder:0
    inputs['dep_lat'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
 

### 3.7 Try out saved model

In [26]:
export_dir = os.path.join(model_dir, 'export')
# Pick the last listed entry in the export directory
saved_model_dir = os.path.join(export_dir, tf.gfile.ListDirectory(export_dir)[-1])
print(saved_model_dir)
print()

predictor_fn = tf.contrib.predictor.from_saved_model(
    export_dir = saved_model_dir,
    signature_def_key="predict"
)

input = {
        'dep_delay': [14.0],
        'taxiout': [13.0],
        'distance': [319.0],
        'avg_dep_delay': [25.86],
        'avg_arr_delay': [27.0],
        'carrier': ['WN'],
        'dep_lat': [32.85],
        'dep_lon': [-96.85],
        'arr_lat': [31.94],
        'arr_lon': [-102.2], 
        'origin': ['DAL'], 
        'dest': ['MAF']
}

print(input)
print()
output = predictor_fn(input)
print(output)

gs://cloud-training-demos-ml/flights/tfx/models/dnn_classifier/export/1554183658/

{'dep_delay': [14.0], 'dep_lat': [32.85], 'arr_lat': [31.94], 'dest': ['MAF'], 'arr_lon': [-102.2], 'distance': [319.0], 'carrier': ['WN'], 'origin': ['DAL'], 'taxiout': [13.0], 'avg_dep_delay': [25.86], 'avg_arr_delay': [27.0], 'dep_lon': [-96.85]}

{'logits': array([[-1.2914119]], dtype=float32), 'probabilities': array([[0.78438604, 0.21561392]], dtype=float32), 'class_ids': array([[0]]), 'logistic': array([[0.21561392]], dtype=float32), 'classes': array([[b'0']], dtype=object)}
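
The fields in the prediction output are related: for a binary classifier head, `logistic` is the sigmoid of `logits`, and `probabilities` holds the pair (1 - p, p). The sketch below checks this against the numbers printed above using only the standard library. Reading class 1 as "on time" is an assumption based on the `ontime` label name.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


logit = -1.2914119                   # 'logits' from the output above
p_positive = sigmoid(logit)          # should be close to 'logistic'
p_negative = 1.0 - p_positive        # first entry of 'probabilities'
predicted_class = int(p_positive >= 0.5)

print(round(p_positive, 4), round(p_negative, 4), predicted_class)
```

The recomputed values agree with the reported `logistic` of about 0.2156 and `class_ids` of 0, so under the stated assumption the model considers this flight unlikely to arrive on time.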


### 3.8 Deploy model to Cloud ML Engine

In [None]:
%%bash
MODEL_NAME="flights"
MODEL_VERSION="tfx"
MODEL_LOCATION=$(gcloud storage ls gs://${BUCKET}/flights/tfx/models/dnn_classifier/export/ | tail -1)
#gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
#gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version 1.13

In [30]:
%%writefile input.json
{"dep_delay": 14.0,"taxiout": 13.0,"distance": 319.0,"avg_dep_delay": 25.86,"avg_arr_delay": 27.0, "carrier": "WN","dep_lat": 32.85,"dep_lon": -96.85,"arr_lat": 31.94,"arr_lon": -102.2, "origin": "DAL", "dest": "MAF"}

Writing input.json
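
The `--json-instances` file is newline-delimited JSON: one instance per line. To score several flights in one call, the file can be generated from Python. The sketch below writes such a file and also shows, roughly, how a list of instances corresponds to the column-oriented dictionary passed to `predictor_fn` in section 3.7. The file name `inputs_batch.json`, both helper functions, and the second instance are made up for illustration.

```python
import json
import os
import tempfile

# First instance copied from input.json above; the second is made up.
instances = [
    {"dep_delay": 14.0, "taxiout": 13.0, "distance": 319.0,
     "avg_dep_delay": 25.86, "avg_arr_delay": 27.0, "carrier": "WN",
     "dep_lat": 32.85, "dep_lon": -96.85, "arr_lat": 31.94,
     "arr_lon": -102.2, "origin": "DAL", "dest": "MAF"},
    {"dep_delay": 3.0, "taxiout": 9.0, "distance": 319.0,
     "avg_dep_delay": 10.0, "avg_arr_delay": 12.0, "carrier": "WN",
     "dep_lat": 31.94, "dep_lon": -102.2, "arr_lat": 32.85,
     "arr_lon": -96.85, "origin": "MAF", "dest": "DAL"},
]


def write_json_instances(path, rows):
    """Write one JSON object per line (newline-delimited JSON)."""
    with open(path, 'w') as f:
        for row in rows:
            f.write(json.dumps(row) + '\n')


def to_column_batch(rows):
    """Turn a list of instances into {feature: [values...]}."""
    return {key: [row[key] for row in rows] for key in rows[0]}


# Hypothetical output location in the system temp directory.
batch_path = os.path.join(tempfile.gettempdir(), 'inputs_batch.json')
write_json_instances(batch_path, instances)

batch = to_column_batch(instances)
print(batch['carrier'], batch['dep_delay'])
```

Passing the resulting file to `--json-instances` should return one prediction per line of input.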


In [None]:
%%bash
gcloud ml-engine predict --model=flights --version=tfx --json-instances input.json

### 3.9 Export evaluation saved model

In [37]:
def make_eval_input_receiver_fn():
  receiver_tensors = {'examples': tf.placeholder(dtype=tf.string, shape=[None])}
  columns = tf.decode_csv(receiver_tensors['examples'], record_defaults=DEFAULTS)
  features = dict(zip(CSV_COLUMNS, columns))
  print(features)

  for feature_name in features:
    if features[feature_name].dtype == tf.int32:
      features[feature_name] = tf.cast(features[feature_name], tf.int64)
    features[feature_name] = tf.reshape(features[feature_name], (-1, 1))

  transformed_features = transform_output.transform_raw_features(features)
  features.update(transformed_features)

  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[TARGET_FEATURE_NAME]
    )

In [38]:
import tensorflow_model_analysis as tfma
eval_model_dir = os.path.join(model_dir, "export/evaluate")
if tf.gfile.Exists(eval_model_dir):
    tf.gfile.DeleteRecursively(eval_model_dir)

tfma.export.export_eval_savedmodel(
        estimator=estimator,
        export_dir_base=eval_model_dir,
        eval_input_receiver_fn=make_eval_input_receiver_fn
)

{'dep_delay': <tf.Tensor 'DecodeCSV:1' shape=(?,) dtype=float32>, 'dep_lat': <tf.Tensor 'DecodeCSV:7' shape=(?,) dtype=float32>, 'arr_lat': <tf.Tensor 'DecodeCSV:9' shape=(?,) dtype=float32>, 'dest': <tf.Tensor 'DecodeCSV:12' shape=(?,) dtype=string>, 'arr_lon': <tf.Tensor 'DecodeCSV:10' shape=(?,) dtype=float32>, 'distance': <tf.Tensor 'DecodeCSV:3' shape=(?,) dtype=float32>, 'avg_arr_delay': <tf.Tensor 'DecodeCSV:5' shape=(?,) dtype=float32>, 'carrier': <tf.Tensor 'DecodeCSV:6' shape=(?,) dtype=string>, 'origin': <tf.Tensor 'DecodeCSV:11' shape=(?,) dtype=string>, 'taxiout': <tf.Tensor 'DecodeCSV:2' shape=(?,) dtype=float32>, 'avg_dep_delay': <tf.Tensor 'DecodeCSV:4' shape=(?,) dtype=float32>, 'ontime': <tf.Tensor 'DecodeCSV:0' shape=(?,) dtype=float32>, 'dep_lon': <tf.Tensor 'DecodeCSV:8' shape=(?,) dtype=float32>}


b'gs://cloud-training-demos-ml/flights/tfx/models/dnn_classifier/export/evaluate/1554184743'

## License

Copyright 2019 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

---
This is not an official Google product. The sample code is provided for educational purposes only.
---