<h1> Training and monitoring </h1>

In this notebook, we use New York City taxi ride data to build a machine learning model for a fare-estimation tool. The goal is to suggest a likely fare to riders so that they are not surprised by the charge and can dispute it if it is much higher than expected.

We train the linear regression model that ships with TensorFlow (`LinearRegressor` from `tf.contrib.learn`) and use TensorBoard to monitor the training.

In [None]:
import tensorflow as tf
from tensorflow.contrib import layers
import shutil

if 'session' in locals() and session is not None:
    print('Close interactive session')
    session.close()

print(tf.__version__)

<h2> Input </h2>

Read the data in batches. Instead of using Pandas, we add a filename queue to the TensorFlow graph.

In [None]:
CSV_COLUMNS = ['fare_amount', 'pickuplon','pickuplat','dropofflon','dropofflat','passengers', 'key']
LABEL_COLUMN = 'fare_amount'
DEFAULTS = [[0.0], [-74.0], [40.0], [-74.0], [40.7], [1.0], ['nokey']]

def read_dataset(filename, num_epochs=None, batch_size=512, mode=tf.contrib.learn.ModeKeys.TRAIN):
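    def _input_fn():
        # Queue the file name; num_epochs limits how many times the file is read (None = indefinitely).
        filename_queue = tf.train.string_input_producer([filename], num_epochs=num_epochs, shuffle=True)
        reader = tf.TextLineReader()
        # Read up to batch_size lines at a time.
        _, value = reader.read_up_to(filename_queue, num_records=batch_size)

        # Parse each CSV line into one tensor per column, using DEFAULTS for empty fields.
        value_column = tf.expand_dims(value, -1)
        columns = tf.decode_csv(value_column, record_defaults=DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))
        # Separate the label (fare_amount) from the input features.
        label = features.pop(LABEL_COLUMN)
        return features, label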

    return _input_fn

def get_train():
    return read_dataset('./datasets/taxi-train.csv', num_epochs=100, mode=tf.contrib.learn.ModeKeys.TRAIN)

def get_valid():
    return read_dataset('./datasets/taxi-valid.csv', num_epochs=1, mode=tf.contrib.learn.ModeKeys.EVAL)

def get_test():
    return read_dataset('./datasets/taxi-test.csv', num_epochs=1, mode=tf.contrib.learn.ModeKeys.EVAL)
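
To make the output of the input function above more concrete, here is a minimal pure-Python sketch of the same idea: parse a batch of CSV lines into one list per column, fall back to a default when a field is empty, and split off the label. The helper `parse_batch` and the two sample rows are made up for illustration; the real pipeline does this with TensorFlow ops inside the graph.

```python
import csv

CSV_COLUMNS = ['fare_amount', 'pickuplon', 'pickuplat', 'dropofflon', 'dropofflat', 'passengers', 'key']
LABEL_COLUMN = 'fare_amount'
DEFAULTS = [0.0, -74.0, 40.0, -74.0, 40.7, 1.0, 'nokey']

def parse_batch(lines):
    """Hypothetical helper: turn CSV lines into (features, label), one list per column."""
    columns = {name: [] for name in CSV_COLUMNS}
    for row in csv.reader(lines):
        for name, raw, default in zip(CSV_COLUMNS, row, DEFAULTS):
            if raw == '':
                value = default          # empty field: use the column default
            elif isinstance(default, float):
                value = float(raw)       # numeric column
            else:
                value = raw              # string column (the key)
            columns[name].append(value)
    label = columns.pop(LABEL_COLUMN)    # the remaining columns are the features
    return columns, label

# Made-up rows; the second one has an empty pickuplat field.
sample = [
    '12.5,-73.98,40.75,-73.97,40.76,2,row-a',
    '7.0,-74.00,,-73.99,40.73,1,row-b',
]
features, label = parse_batch(sample)
print(label)
print(features['pickuplat'])
```

As in the TensorFlow version, the result is column-oriented: each feature name maps to a batch of values rather than each row mapping to a record.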

<h2> Create features out of input data </h2>

In [None]:
INPUT_COLUMNS = [
    layers.real_valued_column('pickuplon'),
    layers.real_valued_column('pickuplat'),
    layers.real_valued_column('dropofflat'),
    layers.real_valued_column('dropofflon'),
    layers.real_valued_column('passengers'),
]

feature_cols = INPUT_COLUMNS
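
As a rough illustration of what a linear regressor does with these five real-valued columns, the sketch below computes a prediction as a bias plus a weighted sum of the feature values. The function `predict_fare`, the weights, the bias, and the sample ride are all hypothetical; in the notebook the weights are learned during training.

```python
FEATURE_NAMES = ['pickuplon', 'pickuplat', 'dropofflat', 'dropofflon', 'passengers']

def predict_fare(features, weights, bias):
    """Hypothetical helper: linear model, bias + sum of weight * feature value."""
    return bias + sum(weights[name] * features[name] for name in FEATURE_NAMES)

# Made-up parameters: only the passenger count has a non-zero weight here.
weights = {name: 0.0 for name in FEATURE_NAMES}
weights['passengers'] = 0.5
bias = 10.0

ride = {
    'pickuplon': -73.98,
    'pickuplat': 40.75,
    'dropofflat': 40.76,
    'dropofflon': -73.97,
    'passengers': 2.0,
}
print(predict_fare(ride, weights, bias))  # 10.0 + 0.5 * 2.0 = 11.0
```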

<h2> Experiment framework </h2>

In [None]:
import tensorflow.contrib.learn as tflearn
from tensorflow.contrib.learn.python.learn import learn_runner
import tensorflow.contrib.metrics as metrics

def experiment_fn(output_dir):
    return tflearn.Experiment(
        tflearn.LinearRegressor(feature_columns=feature_cols, model_dir=output_dir),
        train_input_fn=get_train(),
        eval_input_fn=get_valid(),
        eval_metrics={
            'rmse': tflearn.MetricSpec(
                metric_fn=metrics.streaming_root_mean_squared_error
            )
        }
    )


shutil.rmtree('taxi_trained', ignore_errors=True) # start fresh each time
learn_runner.run(experiment_fn, 'taxi_trained')
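
The `rmse` evaluation metric is computed in a streaming fashion: the evaluation data arrives in batches, so the metric accumulates a running total across batches and only takes the square root at the end. A minimal pure-Python sketch of that idea follows; the class `StreamingRMSE` and the sample numbers are hypothetical and are not the TensorFlow implementation.

```python
import math

class StreamingRMSE:
    """Hypothetical helper: root mean squared error accumulated over batches."""

    def __init__(self):
        self.sum_squared_error = 0.0
        self.count = 0

    def update(self, predictions, labels):
        # Add this batch's squared errors to the running totals.
        for prediction, label in zip(predictions, labels):
            self.sum_squared_error += (prediction - label) ** 2
            self.count += 1

    def result(self):
        # RMSE over everything seen so far; 0.0 if nothing has been seen.
        if self.count == 0:
            return 0.0
        return math.sqrt(self.sum_squared_error / self.count)

metric = StreamingRMSE()
metric.update([10.0, 12.0], [13.0, 8.0])  # squared errors: 9 and 16
metric.update([5.0], [5.0])               # squared error: 0
print(metric.result())                    # sqrt(25 / 3), about 2.89
```

Because the fare is in dollars, the RMSE can be read as a typical prediction error in dollars.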

<h2> Monitoring with TensorBoard </h2>

The `learn_runner.run` call above has already written the log files that TensorBoard needs to the `taxi_trained` directory.

* Run `tensorboard --logdir=taxi_trained` at a command prompt to launch the tool.
* Navigate to the URL that it prints, then open the "Events" tab (named "Scalars" in later TensorBoard releases) and the "Graph" tab.
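
If TensorBoard shows no data, a quick check is whether the log directory contains any event files. TensorFlow names these files with the prefix `events.out.tfevents.`, and they may sit in the log directory itself or in subdirectories. The sketch below lists them; the helper name `find_event_files` is hypothetical.

```python
import glob
import os

def find_event_files(logdir):
    """Hypothetical helper: list TensorBoard event files under logdir, recursively."""
    pattern = os.path.join(logdir, '**', 'events.out.tfevents.*')
    # With recursive=True, '**' matches zero or more directory levels.
    return sorted(glob.glob(pattern, recursive=True))

# Prints nothing if the directory is missing or holds no event files.
for path in find_event_files('taxi_trained'):
    print(path)
```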

Copyright 2017 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License