<h1> Structured Data Solution </h1>

In this notebook, we will use the structured data package in Datalab to build a model that predicts taxi fares.

In [6]:
import os
PROJECT = 'cloud-training-demos'    # CHANGE THIS
BUCKET = 'cloud-training-demos-ml'  # CHANGE THIS
REGION = 'us-central1' # CHANGE THIS

os.environ['PROJECT'] = PROJECT # for bash
os.environ['BUCKET'] = BUCKET # for bash
os.environ['REGION'] = REGION # for bash

In [None]:
%bash
echo "project=$PROJECT"
echo "bucket=$BUCKET"
echo "region=$REGION"
gcloud config set project $PROJECT
gcloud config set compute/region $REGION
gcloud beta ml init-project -q

In [1]:
import tensorflow as tf
import datalab.ml as ml
import google.cloud.ml as cml
import datalab_solutions.structured_data as sd
from tensorflow.python.lib.io import file_io
import json
import shutil

print('tf ' + str(tf.__version__))
print('sd ' + str(sd.__version__))
print('cml ' + str(cml.__version__))

No handlers could be found for logger "oauth2client.contrib.multistore_file"


tf 1.0.0
sd 0.0.1
cml 0.1.9.1-alpha


In [13]:
INDIR = '../feateng/sample'
OUTDIR = '.'

# for bash
os.environ['OUTDIR'] = OUTDIR

<h2> Set up schema file </h2>

The schema describes the columns of the training and test CSV files. It uses the same format as a BigQuery schema; only the STRING, INTEGER, and FLOAT types are supported.

In [3]:
%writefile $OUTDIR/taxifare.json
[
    {
        "mode": "NULLABLE",
        "name": "fare_amount",
        "type": "FLOAT"
    }, 
    {
        "mode": "NULLABLE",
        "name": "dayofweek",
        "type": "STRING"
    },
    {
        "mode": "NULLABLE",
        "name": "hourofday",
        "type": "STRING"
    },
    {
        "mode": "NULLABLE",
        "name": "pickuplon",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "pickuplat",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "dropofflon",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "dropofflat",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "passengers",
        "type": "FLOAT"
    },
    {
        "mode": "REQUIRED",
        "name": "key",
        "type": "STRING"
    } 
]

Overwriting ./taxifare.json
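A schema file like the one above can be sanity-checked before preprocessing. The following is a minimal sketch, not part of the structured data package: `check_schema` and `ALLOWED_TYPES` are hypothetical names, and the example uses an abbreviated two-column schema.

```python
import json

# Types the notebook says are supported in the schema file.
ALLOWED_TYPES = {"STRING", "INTEGER", "FLOAT"}


def check_schema(schema_json):
    """Parse a BigQuery-style schema and return its column names in order.

    Hypothetical helper: raises ValueError on an unsupported column type.
    """
    columns = json.loads(schema_json)
    for col in columns:
        if col["type"] not in ALLOWED_TYPES:
            raise ValueError(
                "unsupported type %r for column %r" % (col["type"], col["name"]))
    return [col["name"] for col in columns]


# Abbreviated schema with the same shape as taxifare.json.
example = json.dumps([
    {"mode": "NULLABLE", "name": "fare_amount", "type": "FLOAT"},
    {"mode": "REQUIRED", "name": "key", "type": "STRING"},
])
print(check_schema(example))
```

The returned list gives the column order, which is also the order the CSV files are expected to follow.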


<h2> Preprocessing </h2>

The first step of preprocessing analyzes the training data: it computes the min, max, and mean of each numeric column (for scaling purposes) and the vocabulary of each string column.

In [14]:
!rm -rf $OUTDIR/taxi_preproc $OUTDIR/taxi_trained

In [15]:
train_csv = ml.CsvDataSet(
  file_pattern=os.path.join(INDIR, 'train*'),
  schema_file=os.path.join(OUTDIR, 'taxifare.json'))
sd.local_preprocess(
  dataset=train_csv,
  output_dir=os.path.join(OUTDIR, 'taxi_preproc'),
)

Starting local preprocessing.
Local preprocessing done.
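To make the analysis step concrete, here is a rough sketch of the kind of statistics it gathers, written with only the Python standard library. It is an illustration under assumptions, not the package's implementation: `analyze` is a hypothetical function and the two CSV rows are made up.

```python
import csv
import io


def analyze(csv_text, column_names, numeric_columns):
    """Return (stats, vocab) for headerless CSV text.

    stats maps each numeric column to its min, max, and mean;
    vocab maps each remaining column to its sorted distinct values.
    """
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=column_names)
    rows = list(reader)
    stats, vocab = {}, {}
    for name in column_names:
        values = [row[name] for row in rows]
        if name in numeric_columns:
            nums = [float(v) for v in values]
            stats[name] = {
                "min": min(nums),
                "max": max(nums),
                "mean": sum(nums) / len(nums),
            }
        else:
            vocab[name] = sorted(set(values))
    return stats, vocab


# Made-up sample rows: fare_amount, dayofweek, passengers.
sample = "2.5,Sun,1.0\n7.5,Mon,2.0\n"
stats, vocab = analyze(
    sample,
    ["fare_amount", "dayofweek", "passengers"],
    {"fare_amount", "passengers"},
)
print(stats)
print(vocab)
```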


The second step is to specify the feature columns and transformations.  The target and key transforms are required. Everything else is optional.

In [16]:
transforms = {
  "fare_amount": {"transform": "target"},
  "key": {"transform": "key"}, 
  "dayofweek": {"transform": "one_hot"},
  "hourofday": {"transform": "embedding", "embedding_dim": 2}, # embed the hour in 2 dimensions so that similar hours can group together
}
file_io.write_string_to_file(os.path.join(OUTDIR, 'taxi_preproc/transforms.json'),
                             json.dumps(transforms, indent=2))
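Because the target and key transforms are required, it can help to verify the dictionary before writing it out. This is a small sketch; `check_transforms` is a hypothetical helper and not part of the structured data package.

```python
def check_transforms(transforms):
    """Raise ValueError unless both a target and a key transform are present."""
    kinds = [spec["transform"] for spec in transforms.values()]
    for required in ("target", "key"):
        if required not in kinds:
            raise ValueError("missing required transform: %s" % required)
    return True


# Same dictionary as in the cell above.
transforms = {
    "fare_amount": {"transform": "target"},
    "key": {"transform": "key"},
    "dayofweek": {"transform": "one_hot"},
    "hourofday": {"transform": "embedding", "embedding_dim": 2},
}
print(check_transforms(transforms))
```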

In [17]:
%bash
ls $OUTDIR/taxi_preproc

numerical_analysis.json
schema.json
transforms.json
vocab_dayofweek.csv
vocab_hourofday.csv
vocab_key.csv


In [19]:
!cat $OUTDIR/taxi_preproc/num*json

{
  "passengers": {
    "max": 6.0,
    "mean": 1.714834159189322,
    "min": 1.0
  },
  "fare_amount": {
    "max": 130.12,
    "mean": 11.346521137449155,
    "min": 2.5
  },
  "pickuplat": {
    "max": 41.591743,
    "mean": 40.751719439603896,
    "min": 40.27725
  },
  "dropofflat": {
    "max": 41.57285,
    "mean": 40.75180645182645,
    "min": 40.303627
  },
  "pickuplon": {
    "max": -73.137393,
    "mean": -73.97476843613309,
    "min": -75.26911
  },
  "dropofflon": {
    "max": -73.137393,
    "mean": -73.97400344112786,
    "min": -74.417107
  }
}
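As an illustration of how min and max statistics like these can be used for scaling, the sketch below applies min-max scaling to a passenger count. The target range of [-1, 1] and the function name `min_max_scale` are assumptions; the notebook does not show which scaling the package actually applies.

```python
def min_max_scale(value, stats, lo=-1.0, hi=1.0):
    """Linearly map value from [stats['min'], stats['max']] to [lo, hi]."""
    span = stats["max"] - stats["min"]
    if span == 0:
        # Constant column: place every value at the middle of the range.
        return (lo + hi) / 2.0
    return lo + (hi - lo) * (value - stats["min"]) / span


# Passenger statistics copied from numerical_analysis.json above.
passengers = {"max": 6.0, "mean": 1.714834159189322, "min": 1.0}

print(min_max_scale(1.0, passengers))
print(min_max_scale(3.5, passengers))
print(min_max_scale(6.0, passengers))
```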

In [20]:
!cat $OUTDIR/taxi_preproc/vocab_day*

Wed
Sun
Fri
Tue
Mon
Thu
Sat
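The vocabulary file is what a one_hot transform would index into. The sketch below shows the idea with the vocabulary order printed above; `one_hot` is a hypothetical function, and returning all zeros for an unknown value is an assumption of this sketch, since the notebook does not show how the package treats out-of-vocabulary values.

```python
# Vocabulary in the order listed in vocab_dayofweek.csv above.
VOCAB = ["Wed", "Sun", "Fri", "Tue", "Mon", "Thu", "Sat"]


def one_hot(value, vocab=VOCAB):
    """Return a list with a 1 at the position of value and 0 elsewhere."""
    return [1 if value == term else 0 for term in vocab]


print(one_hot("Sun"))
print(one_hot("Tues"))  # not in the vocabulary, so no position is set
```

Note that a value outside the vocabulary, such as "Tues" instead of "Tue", matches no position in this sketch.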

<h2> Local Training and prediction </h2>

Train using the preprocessed data.

In [22]:
eval_csv = ml.CsvDataSet(
  file_pattern=os.path.join(INDIR, 'valid*'),
  schema_file=os.path.join(OUTDIR, 'taxifare.json'))

shutil.rmtree(os.path.join(OUTDIR, 'taxi_trained'), ignore_errors=True)
sd.local_train(
  train_dataset=train_csv,
  eval_dataset=eval_csv,
  preprocess_output_dir=os.path.join(OUTDIR, 'taxi_preproc'),
  transforms=os.path.join(OUTDIR, 'taxi_preproc/transforms.json'),
  output_dir=os.path.join(OUTDIR, 'taxi_trained'),
  model_type='dnn_regression',
  max_steps=2500,
  layer_sizes=[64, 4]
)

Starting local training.

INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd5e98bf3d0>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': ''}

Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./taxi_trained/train/model.ckpt.
INFO:tensorflow:loss = 179.78, step = 1
INFO:tensorflow:Starting evaluation at 2017-02-28-16:38:02
INFO:tensorflow:Evaluation [1/100]
...
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Finished evaluation at 2017-02-28-16:38:02
INFO:tensorflow:Saving dict for global step 1: global_step = 1, loss = 190.629
INFO:tensorflow:Validation (step 100): loss = 190.629, global_step = 1
INFO:tensorflow:global_step/sec: 74.662
INFO:tensorflow:loss = 69.5507, step = 101
INFO:tensorflow:global_step/sec: 137.977
INFO:tensorflow:loss = 118.545, step = 201
INFO:tensorflow:global_step/sec: 300.414
INFO:tensorflow:loss = 68.7011, step = 301
INFO:tensorflow:global_step/sec: 308.763
INFO:tensorflow:loss = 119.304, step = 401
INFO:tensorflow:global_step/sec: 309.365
INFO:tensorflow:loss = 59.21, step = 501
INFO:tensorflow:global_step/sec: 308.573
INFO:tensorflow:loss = 113.4, step = 601
INFO:tensorflow:global_step/sec: 299.496
INFO:tensorflow:loss = 51.9584, step = 701
INFO:tensorflow:global_step/sec: 316.617
INFO:tensorflow:loss = 88.6599, step = 801
INFO:tensorflow:global_step/sec: 304.868
INFO:tensorflow:loss = 113.445, step = 901
INFO:tensorflow:global_step/sec: 307.353
INFO:tensorflow:loss = 100.481, step = 1001
INFO:tensorflow:global_step/sec: 306.9
INFO:tensorflow:loss = 76.0784, step = 1101
INFO:tensorflow:global_step/sec: 157.218
INFO:tensorflow:loss = 115.233, step = 1201
INFO:tensorflow:global_step/sec: 303.473
INFO:tensorflow:loss = 94.4111, step = 1301
INFO:tensorflow:global_step/sec: 305.12
INFO:tensorflow:loss = 88.9203, step = 1401
INFO:tensorflow:global_step/sec: 302.649
INFO:tensorflow:loss = 45.8956, step = 1501
INFO:tensorflow:global_step/sec: 302.118
INFO:tensorflow:loss = 79.2905, step = 1601
INFO:tensorflow:global_step/sec: 305.848
INFO:tensorflow:loss = 94.6833, step = 1701
INFO:tensorflow:global_step/sec: 301.63
INFO:tensorflow:loss = 92.6318, step = 1801
INFO:tensorflow:global_step/sec: 309.005
INFO:tensorflow:loss = 141.35, step = 1901
INFO:tensorflow:global_step/sec: 305.967
INFO:tensorflow:loss = 71.6119, step = 2001
INFO:tensorflow:global_step/sec: 301.216
INFO:tensorflow:loss = 78.3532, step = 2101
INFO:tensorflow:global_step/sec: 299.464
INFO:tensorflow:loss = 52.048, step = 2201
INFO:tensorflow:global_step/sec: 302.968
INFO:tensorflow:loss = 56.6322, step = 2301
INFO:tensorflow:global_step/sec: 303.306
INFO:tensorflow:loss = 148.738, step = 2401
INFO:tensorflow:Saving checkpoints for 2500 into ./taxi_trained/train/model.ckpt.
INFO:tensorflow:Loss for final step: 93.5237.
INFO:tensorflow:Starting evaluation at 2017-02-28-16:38:11
INFO:tensorflow:Evaluation [1/100]
...
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Finished evaluation at 2017-02-28-16:38:12
INFO:tensorflow:Saving dict for global step 2500: global_step = 2500, loss = 84.0451
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./taxi_trained/train/export/intermediate_evaluation_models/1488299892834/saved_model.pb
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./taxi_trained/train/export/intermediate_prediction_models/1488299893441/saved_model.pb

Local training done.
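The evaluation loss reported in the training log (190.629 at step 1, 84.0451 at step 2500) is easier to interpret in the units of the target. The sketch below assumes that the reported loss is a mean squared error, which the log itself does not state, and converts it to a root-mean-squared error in dollars.

```python
import math


def rmse_from_mse(mse):
    """Convert a mean squared error to a root-mean-squared error."""
    return math.sqrt(mse)


# Evaluation losses copied from the training log.
initial_rmse = rmse_from_mse(190.629)
final_rmse = rmse_from_mse(84.0451)
print("RMSE at step 1:    %.2f" % initial_rmse)
print("RMSE at step 2500: %.2f" % final_rmse)
```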


In [23]:
!ls taxi_trained

evaluation_model  model  train


In [25]:
sd.local_predict(
  training_ouput_dir=os.path.join(OUTDIR, 'taxi_trained'),
  data=['Sun,0,-73.984685,40.769262,-73.991065,40.728145,5.0,row_01',
        'Mon,12,-74.006927,40.739993,-73.950025,40.773403,1.0,row_02',
        'Tues,8,-73.977345,40.779387,-73.97615,40.778867,1.0,row_03',
        'Fri,17,-73.97136,40.794413,-73.99623,40.74524,1.0,row_04',
        'Sun,0,-73.997642,40.763853,-73.99485,40.750282,1.0,row_05',
        'Sun,0,-74.004538,40.742202,-73.955823,40.773485,1.0,row_06',
        'Sun,0,-74.000589,40.73731,-73.985902,40.692725,1.0,row_07',
        'Sun,0,-73.995432,40.72114,-73.992403,40.719745,1.0,row_08',
        'Sun,0,-73.945033,40.779203,-73.952037,40.766802,1.0,row_09',
        'Sun,0,-73.968592,40.693262,-73.99231,40.694317,1.0,row_10']
)

Starting local prediction.
Local prediction done.


Unnamed: 0,key_from_input,predicted_target
0,row_01,11.724672
1,row_02,11.658803
2,row_03,11.299863
3,row_04,11.72335
4,row_05,11.683121
5,row_06,11.679772
6,row_07,11.680917
7,row_08,11.680292
8,row_09,11.678738
9,row_10,11.677299
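The rows passed to local_predict follow the schema's column order with the target column (fare_amount) left out. The following sketch builds such a row from a dictionary; `FEATURE_ORDER` and `to_csv_row` are hypothetical names introduced for illustration.

```python
# Schema columns in order, without the target column fare_amount.
FEATURE_ORDER = [
    "dayofweek", "hourofday",
    "pickuplon", "pickuplat", "dropofflon", "dropofflat",
    "passengers", "key",
]


def to_csv_row(record):
    """Join the record's values in schema order into one CSV line."""
    return ",".join(str(record[name]) for name in FEATURE_ORDER)


# Values taken from the first prediction row above.
trip = {
    "dayofweek": "Sun", "hourofday": 0,
    "pickuplon": -73.984685, "pickuplat": 40.769262,
    "dropofflon": -73.991065, "dropofflat": 40.728145,
    "passengers": 5.0, "key": "row_01",
}
print(to_csv_row(trip))
```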


In [32]:
shutil.rmtree(os.path.join(OUTDIR,'batch_predict'), ignore_errors=True)
sd.local_batch_predict(
  training_ouput_dir=os.path.join(OUTDIR, 'taxi_trained'),
  prediction_input_file=os.path.join(INDIR, 'valid*'),
  output_dir=os.path.join(OUTDIR, 'batch_predict'),
  output_format='csv',
  mode='evaluation'   # use mode='prediction' if the input has no target column
)

Starting local batch prediction.
Local batch prediction done.


In [34]:
!head $OUTDIR/batch_predict/predictions-00000-*

2009-04-28 01:01:23.000000-73.980640.750940.7533-73.9696,11.4058189392,4.90000009537
2009-10-27 02:46:24.000000-73.980840.758840.7605-73.9931,11.1602220535,4.90000009537
2010-10-05 02:09:00.000000-74.000140.730240.7526-74.0039,11.1707496643,7.69999980927
2012-05-08 03:50:00.000000-74.003840.73840.7591-73.9904,11.6874761581,6.90000009537
2012-05-08 03:50:00.000000-73.988840.736540.7193-73.9436,11.727016449,12.1000003815
2012-12-04 06:52:02.000000-73.982240.731840.7725-73.951,11.7066316605,11.0
2012-11-20 06:53:30.000000-73.94740.771940.7554-73.9759,11.7086172104,9.0
2014-05-13 07:33:52.000000-73.962340.79540.7853-73.973,10.7292032242,5.0
2011-05-17 07:53:55.000000-73.970540.752340.7467-73.9817,10.7170114517,5.30000019073
2012-10-09 07:47:46.000000-73.973940.747740.7256-74.0055,10.7183790207,13.5
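With evaluation-mode output, an error metric can be computed directly from the prediction files. The sketch below assumes each line holds the key, the predicted value, and the target value in that order; this column order is inferred from the values shown above and is not confirmed by the notebook. `rmse_from_lines` is a hypothetical helper.

```python
import math


def rmse_from_lines(lines):
    """Compute RMSE from 'key,predicted,target' lines (assumed layout)."""
    squared_errors = []
    for line in lines:
        # Split from the right so the key itself is left untouched.
        _key, predicted, target = line.strip().rsplit(",", 2)
        squared_errors.append((float(predicted) - float(target)) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First three lines of the batch prediction output above.
sample_lines = [
    "2009-04-28 01:01:23.000000-73.980640.750940.7533-73.9696,11.4058189392,4.90000009537",
    "2009-10-27 02:46:24.000000-73.980840.758840.7605-73.9931,11.1602220535,4.90000009537",
    "2010-10-05 02:09:00.000000-74.000140.730240.7526-74.0039,11.1707496643,7.69999980927",
]
print(rmse_from_lines(sample_lines))
```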


<h2> Cloud preprocessing and training </h2>

To run in the cloud, change INDIR and OUTDIR in the cells above to Google Cloud Storage (gs://) paths.

Then change the calls from local_predict to cloud_predict. That's it.
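As a sketch of the first change, the directories could be derived from the BUCKET variable defined at the top of the notebook. The object paths under the bucket (`taxifare/sample` and `taxifare/structured`) and the helper `gcs_path` are hypothetical; substitute the locations where your data actually lives.

```python
def gcs_path(bucket, *parts):
    """Build a gs:// URL from a bucket name and path components."""
    return "gs://" + "/".join([bucket] + list(parts))


BUCKET = 'cloud-training-demos-ml'  # CHANGE THIS

# Hypothetical locations under the bucket; adjust to your own layout.
INDIR = gcs_path(BUCKET, 'taxifare', 'sample')
OUTDIR = gcs_path(BUCKET, 'taxifare', 'structured')

print(INDIR)
print(OUTDIR)
```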



Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License