<h1> Robust training and monitoring </h1>

In this notebook, you will refactor the code to call the `train_and_evaluate` method instead of hand-coding the evaluation of the machine learning pipeline. This ensures that evaluation happens as part of the training process instead of as a separate step. It also adds the failure handling that robust distributed training requires.

Finally, you will use TensorBoard to monitor the training.



---
Before you start, **make sure that you are logged in with your student account**. Otherwise you may incur Google Cloud charges for using this notebook. 

---

In [0]:
import tensorflow as tf
import numpy as np
import shutil

# Parenthesized print works in both Python 2 and Python 3
print(tf.__version__)

#@markdown Remember to uncheck "Reset all runtimes before running"

#@markdown As you know, resetting the runtime will delete any files you may have on your notebook file system. 
#@markdown ![](https://i.imgur.com/9dgw0h0.png)

# Copy taxi-*.csv files from github if they are missing from the runtime.
!wget --quiet -nc https://github.com/osipov/training-data-analyst/raw/master/bootcamps/serverless_ml/taxi-11k-datasets.zip  
!unzip -q -n taxi-11k-datasets.zip 

<h2> Input </h2>

Continue reading the data using the `read_dataset` method implemented in the earlier lab. Recall that this implementation reads data in batches. Instead of using Pandas DataFrames, the method uses `tf.data.TextLineDataset` from the TensorFlow `tf.data` API.

In [0]:
CSV_COLUMNS = ['fare_amount', 'pickuplon','pickuplat','dropofflon','dropofflat','passengers', 'key']
LABEL_COLUMN = 'fare_amount'
DEFAULTS = [[0.0], [-74.0], [40.0], [-74.0], [40.7], [1.0], ['nokey']]

def read_dataset(filename, mode, batch_size = 512):
  def _input_fn():
    def decode_csv(value_column):
      columns = tf.decode_csv(value_column, record_defaults = DEFAULTS)
      features = dict(zip(CSV_COLUMNS, columns))
      label = features.pop(LABEL_COLUMN)
      return features, label
    
    # Create list of files that match pattern
    file_list = tf.gfile.Glob(filename)

    # Create dataset from file list
    dataset = tf.data.TextLineDataset(file_list).map(decode_csv)
    
    if mode == tf.estimator.ModeKeys.TRAIN:
      num_epochs = None # indefinitely
      dataset = dataset.shuffle(buffer_size = 10 * batch_size)
    else:
      num_epochs = 1 # end-of-input after this

    dataset = dataset.repeat(num_epochs).batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
  return _input_fn
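
To make the decoding and batching logic above concrete, here is a minimal pure-Python sketch that mirrors `decode_csv` and the train/eval difference in `repeat`, without using TensorFlow. The helper names `decode_line` and `batches` are hypothetical, the sample rows are made up for illustration, and shuffling is left out. The sketch assumes that an empty CSV field falls back to the matching entry in `DEFAULTS`, which is how `record_defaults` behaves in `tf.decode_csv`.

```python
import csv
import itertools

CSV_COLUMNS = ['fare_amount', 'pickuplon', 'pickuplat', 'dropofflon', 'dropofflat', 'passengers', 'key']
LABEL_COLUMN = 'fare_amount'
DEFAULTS = [0.0, -74.0, 40.0, -74.0, 40.7, 1.0, 'nokey']

def decode_line(line):
  """Parse one CSV line into (features, label), filling empty fields with defaults."""
  fields = next(csv.reader([line]))
  values = []
  for field, default in zip(fields, DEFAULTS):
    if field == '':
      values.append(default)
    else:
      values.append(type(default)(field))  # cast to the type of the default
  features = dict(zip(CSV_COLUMNS, values))
  label = features.pop(LABEL_COLUMN)
  return features, label

def batches(lines, batch_size, num_epochs):
  """Yield lists of decoded examples; num_epochs=None repeats the input forever."""
  epochs = itertools.count() if num_epochs is None else range(num_epochs)
  batch = []
  for _ in epochs:
    for line in lines:
      batch.append(decode_line(line))
      if len(batch) == batch_size:
        yield batch
        batch = []
  if batch:
    yield batch  # final partial batch, only reached when num_epochs is finite

# Made-up sample rows; the second row has two empty fields.
lines = [
  '12.5,-73.98,40.75,-73.99,40.76,2,k1',
  '7.0,,40.70,-74.01,,1,k2',
  '3.3,-73.90,40.80,-73.95,40.71,1,k3',
]

# EVAL-style: one pass over the data, then end-of-input.
eval_sizes = [len(b) for b in batches(lines, batch_size=2, num_epochs=1)]
print(eval_sizes)  # [2, 1]

# TRAIN-style: the input never ends, so the caller decides when to stop.
train_sizes = [len(b) for b in itertools.islice(batches(lines, 2, None), 4)]
print(train_sizes)  # [2, 2, 2, 2]
```

Because the training input repeats indefinitely, something else has to end training; in this notebook that is the `max_steps` argument of `TrainSpec`.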

<h2> Create features out of input data </h2>

For now, pass these through as in the earlier lab.

In [0]:
INPUT_COLUMNS = [
    tf.feature_column.numeric_column('pickuplon'),
    tf.feature_column.numeric_column('pickuplat'),
    tf.feature_column.numeric_column('dropofflat'),
    tf.feature_column.numeric_column('dropofflon'),
    tf.feature_column.numeric_column('passengers'),
]

def add_more_features(feats):
  # Nothing to add (yet!)
  return feats

feature_cols = add_more_features(INPUT_COLUMNS)

<h2> train_and_evaluate </h2>

Recall that the `serving_input_fn` is needed to define a prediction interface to your Estimator API based model. In this notebook, the exporter calls this function when it exports the model after an evaluation. It will also be used in a later lab by a model that you will deploy into production. The feature placeholders define the name and data type of each feature.

The `features` dictionary provides support for predicting with the model over batches of evaluation and test examples.

In [0]:
def serving_input_fn():
  feature_placeholders = {
    'pickuplon' : tf.placeholder(tf.float32, [None]),
    'pickuplat' : tf.placeholder(tf.float32, [None]),
    'dropofflat' : tf.placeholder(tf.float32, [None]),
    'dropofflon' : tf.placeholder(tf.float32, [None]),
    'passengers' : tf.placeholder(tf.float32, [None]),
  }
  # tf.expand_dims(tensor, -1) appends a trailing dimension, turning each
  # [batch_size] placeholder into a [batch_size, 1] tensor, so that the
  # model can predict on multiple rows of feature values as one batch
  features = {
      key: tf.expand_dims(tensor, -1)
      for key, tensor in feature_placeholders.items()
  }
  
  return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
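
The shape change performed by `tf.expand_dims(tensor, -1)` can be illustrated without TensorFlow. The following is a minimal sketch using nested Python lists; `expand_last_dim` and `shape` are hypothetical helpers written for this illustration, and the request values are made up.

```python
def expand_last_dim(values):
  """Pure-Python analogue of tf.expand_dims(tensor, -1) for a 1-D list."""
  return [[v] for v in values]

def shape(nested):
  """Return the shape of a rectangular nested list as a tuple."""
  dims = []
  while isinstance(nested, list):
    dims.append(len(nested))
    nested = nested[0] if nested else None
  return tuple(dims)

# Hypothetical prediction request with three rows (values made up).
request = {
  'pickuplon': [-73.98, -74.00, -73.95],
  'passengers': [1.0, 2.0, 1.0],
}

# Same dictionary comprehension as in serving_input_fn, on plain lists.
features = {key: expand_last_dim(column) for key, column in request.items()}

print(shape(request['pickuplon']))   # (3,)
print(shape(features['pickuplon']))  # (3, 1)
```

Each feature goes from one value per row, shape `(3,)`, to one single-element vector per row, shape `(3, 1)`; the number of rows in the batch is unchanged.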

In [0]:
def train_and_evaluate(output_dir, num_train_steps):
  estimator = tf.estimator.LinearRegressor(
                       model_dir = output_dir,
                       feature_columns = feature_cols)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = read_dataset('./taxi-train.csv', mode = tf.estimator.ModeKeys.TRAIN),
                       max_steps = num_train_steps)
  
  exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)
  
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = read_dataset('./taxi-valid.csv', mode = tf.estimator.ModeKeys.EVAL),
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       exporters = exporter)
  
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
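
Once `tf.estimator.train_and_evaluate` has run, the `LatestExporter` named `'exporter'` should leave its exports in timestamped subdirectories of `<output_dir>/export/exporter`. The sketch below locates the newest one using only the standard library. It assumes that directory layout and numeric directory names; `latest_export_dir` is a hypothetical helper, not part of TensorFlow.

```python
import os

def latest_export_dir(model_dir, exporter_name='exporter'):
  """Return the newest timestamped export directory, or None if there is none."""
  export_base = os.path.join(model_dir, 'export', exporter_name)
  if not os.path.isdir(export_base):
    return None
  # Assumption: finished exports live in directories named with a numeric
  # timestamp; anything else (such as temporary directories) is skipped.
  timestamps = [d for d in os.listdir(export_base)
                if d.isdigit() and os.path.isdir(os.path.join(export_base, d))]
  if not timestamps:
    return None
  return os.path.join(export_base, max(timestamps, key=int))

# Prints None until a training run has produced at least one export.
print(latest_export_dir('taxi_trained'))
```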

In [0]:
# Run training    
OUTDIR = 'taxi_trained'
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
train_and_evaluate(OUTDIR, num_train_steps = 5000)
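
The `shutil.rmtree` call matters because an Estimator resumes from the latest checkpoint it finds in its model directory; this is also what lets training pick up again after a failure. As a rough way to see how far a run got, the sketch below scans the output directory for checkpoint files. It assumes the default `model.ckpt-<step>.index` file naming, and `latest_checkpoint_step` is a hypothetical helper; inside TensorFlow, `tf.train.latest_checkpoint(model_dir)` is the supported way to find the latest checkpoint.

```python
import os
import re

def latest_checkpoint_step(model_dir):
  """Return the highest global step that has a checkpoint index file, or None."""
  # Assumption: checkpoints use the default "model.ckpt-<step>" prefix.
  pattern = re.compile(r'^model\.ckpt-(\d+)\.index$')
  steps = []
  if os.path.isdir(model_dir):
    for name in os.listdir(model_dir):
      match = pattern.match(name)
      if match:
        steps.append(int(match.group(1)))
  return max(steps) if steps else None

# Prints None if the directory is missing or holds no checkpoints yet.
print(latest_checkpoint_step('taxi_trained'))
```

If this reports a step equal to `num_train_steps`, rerunning training without deleting the directory would have nothing left to do, because `max_steps` counts the global step across runs.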

<h2> Monitoring with TensorBoard </h2>

In [0]:
!pip install tensorboard==1.13.0
%load_ext tensorboard.notebook 

In [0]:
#@markdown Run this cell to start TensorBoard.

#@markdown Once TensorBoard comes up, put this cell in focus, click the vertical ellipsis in the upper right of this cell, and choose to view the output full screen.
%tensorboard --logdir $OUTDIR

Copyright 2019 Counter Factual. AI LLC. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License