# Introduction to Feature Engineering

**Learning Objectives**
  * Improve the accuracy of a model by using feature engineering
  * Understand that there are two places to do feature engineering in TensorFlow
    1. In the input functions
    2. Using the `tf.feature_column` module

We'll illustrate feature engineering using a new dataset and a new task. 

**Task**: To estimate the value of a house.

**Dataset**: The data is based on 1990 census data from California. The data is aggregated at the city block level, so features such as `total_rooms` and `population` are totals for a block (the total number of rooms in that block, or the total number of people who live on that block), not values for an individual house.

## Set Up
In this first cell, we'll load the necessary libraries.

In [1]:
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

print(tf.__version__)



1.12.0


Next, we'll load our data set.

In [2]:
df = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")

## Examine Data

It's a good idea to get to know your data a little bit before you work with it. `df.head()` prints the first 5 rows of a dataframe.

Note that `median_income` is measured in tens of thousands of dollars.

In [3]:
df.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.82 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.925 | 65500.0 |


It's also useful to understand the distribution of each column. `df.describe()` will calculate the count, mean, standard deviation, max, min, and various quantiles.

In [4]:
df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.562108 | 35.625225 | 28.589353 | 2643.664412 | 539.410824 | 1429.573941 | 501.221941 | 3.883578 | 207300.912353 |
| std | 2.005166 | 2.13734 | 12.586937 | 2179.947071 | 421.499452 | 1147.852959 | 384.520841 | 1.908157 | 115983.764387 |
| min | -124.35 | 32.54 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.4999 | 14999.0 |
| 25% | -121.79 | 33.93 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.566375 | 119400.0 |
| 50% | -118.49 | 34.25 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5446 | 180400.0 |
| 75% | -118.0 | 37.72 | 37.0 | 3151.25 | 648.25 | 1721.0 | 605.25 | 4.767 | 265000.0 |
| max | -114.31 | 41.95 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0001 | 500001.0 |


## Create Training and Evaluation Datasets

If your data is all in memory, a quick and easy way to create a train/evaluation split is using a random number generator. But be sure to seed the random generator so that you get the same split every time!

In [5]:
np.random.seed(seed=1) # to ensure reproducible split
msk = np.random.rand(len(df)) < 0.8
traindf = df[msk]
evaldf = df[~msk]
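
As a quick sanity check of the seeding advice above, the sketch below rebuilds the mask twice from the same seed and confirms that both splits are identical. The helper name `split_mask` is hypothetical, and it uses a local `np.random.RandomState` instead of the global seed set in the cell above.

```python
import numpy as np

def split_mask(n_rows, train_fraction=0.8, seed=1):
    """Boolean mask: True marks a training row, False an evaluation row."""
    rng = np.random.RandomState(seed)  # local generator, seeded for reproducibility
    return rng.rand(n_rows) < train_fraction

# 17000 is the row count reported by df.describe()
mask_a = split_mask(17000)
mask_b = split_mask(17000)

print("identical splits:", bool((mask_a == mask_b).all()))
print("train rows:", int(mask_a.sum()), "eval rows:", int((~mask_a).sum()))
```

Without a fixed seed, each run would assign rows differently, so RMSE values from separate runs would not be measured on the same evaluation rows.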

In [6]:
traindf.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 |
| mean | -119.553292 | 35.616189 | 28.665736 | 2632.034308 | 536.032912 | 1423.30025 | 498.113797 | 3.902213 | 207986.538863 |
| std | 2.00211 | 2.135425 | 12.594345 | 2163.255535 | 416.662192 | 1126.02708 | 379.281269 | 1.924587 | 116514.341708 |
| min | -124.3 | 32.54 | 1.0 | 8.0 | 1.0 | 3.0 | 1.0 | 0.4999 | 14999.0 |
| 25% | -121.77 | 33.93 | 18.0 | 1461.0 | 296.0 | 787.0 | 281.0 | 2.574275 | 119600.0 |
| 50% | -118.48 | 34.24 | 29.0 | 2117.5 | 432.0 | 1168.0 | 408.0 | 3.5519 | 180800.0 |
| 75% | -118.0 | 37.71 | 37.0 | 3146.0 | 644.25 | 1715.0 | 602.0 | 4.79475 | 266300.0 |
| max | -114.31 | 41.95 | 52.0 | 37937.0 | 5471.0 | 35682.0 | 5189.0 | 15.0001 | 500001.0 |


In [7]:
evaldf.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 |
| mean | -119.59753 | 35.661526 | 28.282468 | 2690.390791 | 552.98229 | 1454.779811 | 513.709563 | 3.808708 | 204546.264168 |
| std | 2.017306 | 2.144947 | 12.554301 | 2245.478132 | 440.201298 | 1231.504194 | 404.706736 | 1.839038 | 113802.453246 |
| min | -124.35 | 32.55 | 2.0 | 2.0 | 2.0 | 6.0 | 2.0 | 0.4999 | 22500.0 |
| 25% | -121.83 | 33.93 | 18.0 | 1467.0 | 300.0 | 796.0 | 283.75 | 2.5398 | 118800.0 |
| 50% | -118.58 | 34.28 | 28.0 | 2171.5 | 441.0 | 1160.0 | 414.0 | 3.5156 | 178650.0 |
| 75% | -118.0 | 37.74 | 37.0 | 3167.25 | 667.0 | 1756.25 | 615.25 | 4.667375 | 258825.0 |
| max | -114.61 | 41.86 | 52.0 | 32627.0 | 6445.0 | 28566.0 | 6082.0 | 15.0001 | 500001.0 |


## Input Functions
These input functions read from a Pandas dataframe, the same way as in 03_tensorflow/c_estimator.ipynb.

In [8]:
LABEL_SCALE = 1
def train_input_fn(df, batch_size=128):
    #1. Convert dataframe into correct (features,label) format for Estimator API
    dataset = tf.data.Dataset.from_tensor_slices((dict(df), df['median_house_value']/LABEL_SCALE))
    
    #2. Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
   
    return dataset

def eval_input_fn(df, batch_size=128):
    #1. Convert dataframe into correct (features,label) format for Estimator API
    dataset = tf.data.Dataset.from_tensor_slices((dict(df), df['median_house_value']/LABEL_SCALE))

    #2.Batch the examples.
    dataset = dataset.batch(batch_size)
   
    return dataset
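
To get a feel for what `batch(batch_size)` does to the evaluation set, here is a small arithmetic sketch in plain Python (no TensorFlow involved). It assumes the final, smaller batch is kept, which is my understanding of the default behavior of `Dataset.batch`; the helper name `batch_sizes` is hypothetical.

```python
def batch_sizes(num_rows, batch_size=128):
    """Sizes of consecutive batches when the final partial batch is kept."""
    full_batches, remainder = divmod(num_rows, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)  # last batch holds the leftover rows
    return sizes

# 3388 is the evaluation row count reported by evaldf.describe()
eval_batches = batch_sizes(3388)
print("number of eval batches:", len(eval_batches))
print("size of last batch:", eval_batches[-1])
```

Because `eval_input_fn` does not call `repeat()`, one evaluation pass ends once these batches are exhausted, whereas `train_input_fn` repeats indefinitely and relies on `max_steps` to stop.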

## Define Feature Columns

Simply pass all features through unchanged using `tf.feature_column.numeric_column()`. No feature engineering. 

In [9]:
feature_names = list(df.columns.values)[:-1] 
print(feature_names)
feat_cols = [tf.feature_column.numeric_column(feature_name)
             for feature_name in feature_names]

['longitude', 'latitude', 'housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households', 'median_income']


## Train and Evaluate

Here we introduce two new customizations:

1. Setting Learning Rate
   - Previously when using premade estimators we have accepted the default optimizer type and learning rate. However in practice we almost always want to tune our learning rate explicitly. To do so we specify the `optimizer` argument when initializing an estimator. We can specify any of the optimizers in the `tf.train` package such as [AdamOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) and [AdagradOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/AdagradOptimizer). The first argument when initializing each is the learning rate.


2. Custom Evaluation Metric (RMSE)
   - The default evaluation metric `average_loss` is MSE, but we want RMSE. Previously we just took the square root of the final `average_loss`. However it would be better if we could calculate RMSE not just at the end, but for every intermediate checkpoint and plot the change over time in TensorBoard. [`tf.contrib.estimator.add_metrics()`](https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/add_metrics) allows us to do this. We wrap our estimator with it, and provide a custom evaluation function.
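
To make the relationship between `average_loss` (MSE) and RMSE concrete, the sketch below takes a few `average_loss` / `rmse` pairs copied from the evaluation lines of this notebook's training log and checks that each logged RMSE is the square root of the logged MSE, up to rounding. It uses only the standard library; the variable names are illustrative.

```python
import math

# (average_loss, rmse) pairs copied from "Saving dict for global step ..." log lines
logged = [
    (54047910000.0, 232482.06),  # linear_default, step 6000
    (5631852500.0, 75045.67),    # linear_lr, step 12000
    (7299149000.0, 85435.055),   # dnn_default, step 18000
    (5814097400.0, 76250.23),    # dnn_lr, step 2000
]

# RMSE is the square root of the mean squared error
recomputed = [math.sqrt(mse) for mse, _ in logged]

for (mse, logged_rmse), value in zip(logged, recomputed):
    print("sqrt(average_loss) = %.2f, logged rmse = %.2f" % (value, logged_rmse))
```

RMSE is in the same units as the label (dollars), which makes it easier to interpret than MSE.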

In [10]:
# Create estimator train and evaluate function
def train_and_evaluate(output_dir, num_train_steps, model_type, learning_rate=.001):
  if model_type == 'Linear':
    estimator = tf.estimator.LinearRegressor(
        model_dir = output_dir, 
        feature_columns = feat_cols,
        optimizer = tf.train.AdamOptimizer(learning_rate),
        config = tf.estimator.RunConfig(
          tf_random_seed=1, # for reproducibility
          save_checkpoints_steps=max(10,num_train_steps//10) # checkpoint every N steps
        )
    ) 
    
  elif model_type == 'DNN':
    estimator = tf.estimator.DNNRegressor(
        model_dir = output_dir, 
        feature_columns = feat_cols,
        hidden_units = [10,10],
        optimizer = tf.train.AdamOptimizer(learning_rate),
        config = tf.estimator.RunConfig(
          tf_random_seed=1, # for reproducibility
          save_checkpoints_steps=max(10,num_train_steps//10) # checkpoint every N steps
        )
    ) 
    
  else:
    print('Invalid Model Type: Use "Linear" or "DNN"')
    return
  
  # Add custom evaluation metric
  def my_rmse(labels, predictions):
    pred_values = tf.squeeze(tf.cast(predictions['predictions'],tf.float64),axis=-1)
    return {'rmse': tf.metrics.root_mean_squared_error(labels, pred_values)}
  estimator = tf.contrib.estimator.add_metrics(estimator, my_rmse)  
                                          
  train_spec = tf.estimator.TrainSpec(input_fn = lambda:train_input_fn(traindf), 
                                      max_steps = num_train_steps)
  eval_spec = tf.estimator.EvalSpec(input_fn = lambda:eval_input_fn(evaldf), 
                                    steps = None, 
                                    start_delay_secs = 1, # start evaluating after N seconds, 
                                    throttle_secs = 1)  # evaluate every N seconds
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  return estimator

In [11]:
# Launch tensorboard
from google.datalab.ml import TensorBoard

OUTDIR = './trained_model'
TensorBoard().start(OUTDIR)

91396

## Results (w/o Feature Engineering)
Steps: 20000, Batch Size: 128
- Baseline (always predict mean)
  - RMSE: 114K
- Linear Regressor
  - RMSE: 170K (LR: 0.001)
  - RMSE: 75K (LR: 10)
- DNN w/ hidden_units = [10,10]
  - RMSE: 99K (LR: 0.001)
  - RMSE: 68K (LR: 0.1)
  
Note the reported RMSE is for the best evaluation checkpoint, not necessarily the last one.

You can see the importance of tuning the learning rate. Using the default learning rate of 0.001, the linear regression model doesn't even beat our baseline. Intuitively, it makes sense that we need a high learning rate: the average magnitude of our labels is about 200,000, so we'll need large weights, and learning large weights with a step size of 0.001 would take a very long time.
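
One way to see where an "always predict the mean" baseline comes from: the RMSE of a constant prediction equal to the label mean is the population standard deviation of the labels. The sketch below checks this on the five `median_house_value` entries shown by `df.head()`. The helper `rmse` is a hypothetical name, and the five-row sample is for illustration only; it is not the evaluation set.

```python
import numpy as np

def rmse(labels, predictions):
    """Root mean squared error between two equally sized arrays."""
    labels = np.asarray(labels, dtype=np.float64)
    predictions = np.asarray(predictions, dtype=np.float64)
    return float(np.sqrt(np.mean((labels - predictions) ** 2)))

# median_house_value for the five rows printed by df.head()
labels = np.array([66900.0, 80100.0, 85700.0, 73400.0, 65500.0])

# Baseline: predict the mean label for every row
baseline_predictions = np.full_like(labels, labels.mean())
baseline_rmse = rmse(labels, baseline_predictions)

print("baseline RMSE:", baseline_rmse)
print("population std of labels:", float(labels.std()))
```

On the full evaluation set, `evaldf.describe()` reports a standard deviation of about 113.8K for `median_house_value`, which is in line with the 114K baseline listed above.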
  


In [None]:
%%time
# Run the model
shutil.rmtree(OUTDIR, ignore_errors = True)
estimator = train_and_evaluate(OUTDIR, 20000, 'Linear', learning_rate=0.1)

## Predict
Let's see what the predicted values were for the records in our evaluation dataset.

In [None]:
predictions = estimator.predict(input_fn=lambda:eval_input_fn(evaldf))
for items in predictions:
  print(items)

# Introducing Feature Engineering

Now we have a baseline to beat: an RMSE of 68K without any feature engineering.

Here's where we apply our human intuition to help out our model. What features might be useful in predicting housing price?

Location comes to mind. We have latitude and longitude, but as raw numbers they are not in a very learnable format. Let's make a grid.

Let's also divide the block-level totals by the number of households.

## TODO:
1. First, add just the features divided by the number of households
2. Then, add just the lat/lon grid
3. Then both

## Training and Evaluation

In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target).

We'll modify `feat_cols` and the input functions to represent the features we want to use.

We divide `total_rooms` by `households` to get `avg_rooms_per_house`, which we expect to positively correlate with `median_house_value`.

We also divide `population` by `total_rooms` to get `avg_persons_per_room` which we expect to negatively correlate with `median_house_value`.
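
As a worked illustration of these two ratios, the sketch below computes them for the first three rows shown by `df.head()`. The helper name `with_ratio_features` is hypothetical and is not the function used by the notebook's input functions; it returns a copy so the input frame is left untouched.

```python
import pandas as pd

# First three rows of df.head(), restricted to the columns used here
sample = pd.DataFrame({
    'total_rooms': [5612.0, 7650.0, 720.0],
    'households': [472.0, 463.0, 117.0],
    'population': [1015.0, 1129.0, 333.0],
})

def with_ratio_features(frame):
    """Return a copy of `frame` with per-household and per-room ratios added."""
    out = frame.copy()  # avoid mutating the caller's dataframe
    out['avg_rooms_per_house'] = out['total_rooms'] / out['households']
    out['avg_persons_per_room'] = out['population'] / out['total_rooms']
    return out

engineered = with_ratio_features(sample)
print(engineered[['avg_rooms_per_house', 'avg_persons_per_room']].round(3))
```

The ratios remove the dependence on block size: a block with many rooms is no longer treated as "more house" simply because it contains more households.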

In [11]:
def train_input_fn(df, batch_size=128):
    #1. Add engineered features
    df = add_more_features(df)
    
    #2. Convert dataframe into correct (features,label) format for Estimator API
    dataset = tf.data.Dataset.from_tensor_slices((dict(df), df['median_house_value']))
    
    #3. Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
   
    return dataset

def eval_input_fn(df, batch_size=128):
    #1. Add engineered features
    df = add_more_features(df)
    
    #2. Convert dataframe into correct (features,label) format for Estimator API
    dataset = tf.data.Dataset.from_tensor_slices((dict(df), df['median_house_value']))

    #3.Batch the examples.
    dataset = dataset.batch(batch_size)
   
    return dataset

In [12]:
def add_more_features(df):
  df = df.copy()  # work on a copy; traindf/evaldf are slices of df, and writing to them triggers SettingWithCopyWarning
  df['avg_rooms_per_house'] = df['total_rooms'] / df['households'] #expect positive correlation
  df['avg_bedrooms_per_house'] = df['total_bedrooms'] / df['households'] 
  df['avg_persons_per_room'] = df['population'] / df['total_rooms'] #expect negative correlation
  return df

In [13]:
# Define your feature columns
bucketized_longitude = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('longitude'), boundaries = np.arange(-124, -114, 1).tolist())
bucketized_latitude = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('latitude'), boundaries = np.arange(32.0, 42, 1).tolist())
grid = tf.feature_column.crossed_column([bucketized_longitude,bucketized_latitude],200)

feat_cols = [
    tf.feature_column.numeric_column('housing_median_age'),
    tf.feature_column.numeric_column('longitude'),
    tf.feature_column.numeric_column('latitude'),
    #tf.feature_column.numeric_column('population'),
    tf.feature_column.embedding_column(bucketized_longitude,5),
    tf.feature_column.embedding_column(bucketized_latitude,5),
    tf.feature_column.embedding_column(grid,10),
    tf.feature_column.numeric_column('avg_bedrooms_per_house'),
    tf.feature_column.numeric_column('avg_rooms_per_house'),
    tf.feature_column.numeric_column('avg_persons_per_room'),
    tf.feature_column.numeric_column('median_income')
]
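
To show what the bucketized latitude/longitude columns and their cross represent, here is a plain-Python sketch of the idea. It assumes buckets include their left boundary and exclude the right one, which is my understanding of `tf.feature_column.bucketized_column`. The grid cell id is a simple dense index for illustration; `crossed_column` instead hashes the combination into `hash_bucket_size` buckets (200 above), so actual ids will differ. The names `bucket_index` and `grid_cell` are hypothetical.

```python
from bisect import bisect_right

# Same boundary values as np.arange(-124, -114, 1) and np.arange(32.0, 42, 1)
LON_BOUNDARIES = [float(b) for b in range(-124, -114)]
LAT_BOUNDARIES = [float(b) for b in range(32, 42)]

def bucket_index(value, boundaries):
    """Bucket 0 is below the first boundary; each bucket includes its left edge."""
    return bisect_right(boundaries, value)

def grid_cell(longitude, latitude):
    """Dense id of the (longitude bucket, latitude bucket) pair."""
    lon_idx = bucket_index(longitude, LON_BOUNDARIES)
    lat_idx = bucket_index(latitude, LAT_BOUNDARIES)
    n_lat_buckets = len(LAT_BOUNDARIES) + 1  # one more bucket than boundaries
    return lon_idx * n_lat_buckets + lat_idx

# First row of df.head(): longitude -114.31, latitude 34.19
cell = grid_cell(-114.31, 34.19)
print("lon bucket:", bucket_index(-114.31, LON_BOUNDARIES))
print("lat bucket:", bucket_index(34.19, LAT_BOUNDARIES))
print("grid cell:", cell)
```

With 10 boundaries per axis there are 11 buckets per axis and 121 possible grid cells, so each cell can learn its own contribution to price instead of forcing a single linear trend along latitude or longitude.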

In [14]:
# Launch tensorboard
from google.datalab.ml import TensorBoard

OUTDIR = './trained_model'
TensorBoard().start(OUTDIR)

3409

## Results (w/ Feature Engineering)

Steps: 20000, Batch Size: 128, Dividing Features by block. No improvement over the results without feature engineering:
- Linear Regressor
  - RMSE: 170K (LR: 0.001)
  - RMSE: 75K (LR: 10)
- DNN w/ hidden_units = [10,10]
  - RMSE: 99K (LR: 0.001)
  - RMSE: 68K (LR: 0.1)
  
Note the reported RMSE is for the best evaluation checkpoint, not necessarily the last one.

You can see the importance of tuning the learning rate. Using the default learning rate of 0.001, the linear regression model doesn't even beat our baseline. Intuitively, it makes sense that we need a high learning rate: the average magnitude of our labels is about 200,000, so we'll need large weights, and learning large weights with a step size of 0.001 would take a very long time.
  


In [15]:
%%time
# Run the model
shutil.rmtree(OUTDIR, ignore_errors = True)

estimator = train_and_evaluate('./trained_model/linear_default', 20000,'Linear')
estimator = train_and_evaluate('./trained_model/linear_lr', 20000,'Linear',10)
estimator = train_and_evaluate('./trained_model/dnn_default', 20000,'DNN')
estimator = train_and_evaluate('./trained_model/dnn_lr', 20000,'DNN',0.1)

INFO:tensorflow:Using config: {'_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe11d265048>, '_task_id': 0, '_keep_checkpoint_max': 5, '_num_ps_replicas': 0, '_global_id_in_cluster': 0, '_eval_distribute': None, '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_is_chief': True, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_num_worker_replicas': 1, '_protocol': None, '_tf_random_seed': 1, '_save_checkpoints_secs': None, '_device_fn': None, '_save_checkpoints_steps': 2000, '_task_type': 'worker', '_log_step_count_steps': 100, '_evaluation_master': '', '_train_distribute': None, '_save_summary_steps': 100, '_model_dir': './trained_model/linear_default', '_experimental_distribute': None}
INFO:tensorflow:Using config: {'_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe11d265f98>, '_task_id': 0, '_keep_c

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./trained_model/linear_default/model.ckpt.
INFO:tensorflow:loss = 3202315700000.0, step = 1
INFO:tensorflow:global_step/sec: 109.789
INFO:tensorflow:loss = 8733471500000.0, step = 101 (0.913 sec)
INFO:tensorflow:global_step/sec: 131.897
INFO:tensorflow:loss = 10859849000000.0, step = 201 (0.759 sec)
INFO:tensorflow:global_step/sec: 129.993
INFO:tensorflow:loss = 8457152400000.0, step = 301 (0.769 sec)
INFO:tensorflow:global_step/sec: 137.217
INFO:tensorflow:loss = 7909246000000.0, step = 401 (0.728 sec)
INFO:tensorflow:global_step/sec: 137.084
INFO:tensorflow:loss = 4943359000000.0, step = 501 (0.729 sec)
INFO:tensorflow:global_step/sec: 135.744
INFO:tensorflow:loss = 2933652500000.0, step = 601 (0.737 

INFO:tensorflow:global_step/sec: 149.668
INFO:tensorflow:loss = 10455095000000.0, step = 5901 (0.669 sec)
INFO:tensorflow:Saving checkpoints for 6000 into ./trained_model/linear_default/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-02-23:07:21
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./trained_model/linear_default/model.ckpt-6000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-01-02-23:07:22
INFO:tensorflow:Saving dict for global step 6000: average_loss = 54047910000.0, global_step = 6000, label/mean = 204546.2, loss = 6782012000000.0, prediction/mean = 1786.9614, rmse = 232482.06
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 6000: ./trained_model/linear_default/model.ckpt-6000
INFO:tensorflow:gl

INFO:tensorflow:global_step/sec: 131.289
INFO:tensorflow:loss = 5827243000000.0, step = 11201 (0.763 sec)
INFO:tensorflow:global_step/sec: 136.425
INFO:tensorflow:loss = 6883742000000.0, step = 11301 (0.733 sec)
INFO:tensorflow:global_step/sec: 133.74
INFO:tensorflow:loss = 6713432300000.0, step = 11401 (0.747 sec)
INFO:tensorflow:global_step/sec: 147.597
INFO:tensorflow:loss = 7856056000000.0, step = 11501 (0.678 sec)
INFO:tensorflow:global_step/sec: 146.116
INFO:tensorflow:loss = 4670694000000.0, step = 11601 (0.685 sec)
INFO:tensorflow:global_step/sec: 150.471
INFO:tensorflow:loss = 4176750300000.0, step = 11701 (0.670 sec)
INFO:tensorflow:global_step/sec: 126.311
INFO:tensorflow:loss = 8909448000000.0, step = 11801 (0.787 sec)
INFO:tensorflow:global_step/sec: 130.156
INFO:tensorflow:loss = 11466189000000.0, step = 11901 (0.768 sec)
INFO:tensorflow:Saving checkpoints for 12000 into ./trained_model/linear_default/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling m

INFO:tensorflow:global_step/sec: 119.716
INFO:tensorflow:loss = 6993227000000.0, step = 16501 (0.835 sec)
INFO:tensorflow:global_step/sec: 138.455
INFO:tensorflow:loss = 6021540000000.0, step = 16601 (0.721 sec)
INFO:tensorflow:global_step/sec: 142.591
INFO:tensorflow:loss = 5651534000000.0, step = 16701 (0.700 sec)
INFO:tensorflow:global_step/sec: 152.074
INFO:tensorflow:loss = 7145743500000.0, step = 16801 (0.659 sec)
INFO:tensorflow:global_step/sec: 149.679
INFO:tensorflow:loss = 7981561300000.0, step = 16901 (0.667 sec)
INFO:tensorflow:global_step/sec: 144.163
INFO:tensorflow:loss = 8506992000000.0, step = 17001 (0.693 sec)
INFO:tensorflow:global_step/sec: 150.381
INFO:tensorflow:loss = 8701414000000.0, step = 17101 (0.665 sec)
INFO:tensorflow:global_step/sec: 149.038
INFO:tensorflow:loss = 7259306300000.0, step = 17201 (0.671 sec)
INFO:tensorflow:global_step/sec: 128.04
INFO:tensorflow:loss = 3603578700000.0, step = 17301 (0.781 sec)
INFO:tensorflow:global_step/sec: 120.36
INFO:te

INFO:tensorflow:loss = 1430511200000.0, step = 301 (0.695 sec)
INFO:tensorflow:global_step/sec: 132.181
INFO:tensorflow:loss = 1262571000000.0, step = 401 (0.751 sec)
INFO:tensorflow:global_step/sec: 143.95
INFO:tensorflow:loss = 874754200000.0, step = 501 (0.693 sec)
INFO:tensorflow:global_step/sec: 151.14
INFO:tensorflow:loss = 622370200000.0, step = 601 (0.663 sec)
INFO:tensorflow:global_step/sec: 152.412
INFO:tensorflow:loss = 764148840000.0, step = 701 (0.655 sec)
INFO:tensorflow:global_step/sec: 114.88
INFO:tensorflow:loss = 829398840000.0, step = 801 (0.871 sec)
INFO:tensorflow:global_step/sec: 130.107
INFO:tensorflow:loss = 1365506800000.0, step = 901 (0.771 sec)
INFO:tensorflow:global_step/sec: 134.322
INFO:tensorflow:loss = 1546966000000.0, step = 1001 (0.742 sec)
INFO:tensorflow:global_step/sec: 151.438
INFO:tensorflow:loss = 1252975200000.0, step = 1101 (0.660 sec)
INFO:tensorflow:global_step/sec: 157.876
INFO:tensorflow:loss = 682463100000.0, step = 1201 (0.635 sec)
INFO:t

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 6000: ./trained_model/linear_lr/model.ckpt-6000
INFO:tensorflow:global_step/sec: 41.8673
INFO:tensorflow:loss = 1199964400000.0, step = 6001 (2.387 sec)
INFO:tensorflow:global_step/sec: 163.283
INFO:tensorflow:loss = 611143300000.0, step = 6101 (0.613 sec)
INFO:tensorflow:global_step/sec: 157.759
INFO:tensorflow:loss = 624837100000.0, step = 6201 (0.633 sec)
INFO:tensorflow:global_step/sec: 164.606
INFO:tensorflow:loss = 601607000000.0, step = 6301 (0.607 sec)
INFO:tensorflow:global_step/sec: 172.89
INFO:tensorflow:loss = 504669500000.0, step = 6401 (0.579 sec)
INFO:tensorflow:global_step/sec: 167.128
INFO:tensorflow:loss = 620628300000.0, step = 6501 (0.598 sec)
INFO:tensorflow:global_step/sec: 139.435
INFO:tensorflow:loss = 623436900000.0, step = 6601 (0.720 sec)
INFO:tensorflow:global_step/sec: 166.081
INFO:tensorflow:loss = 432788900000.0, step = 6701 (0.598 sec)
INFO:tensorflow:global_step/sec: 167.597
INFO:tensorflo

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-02-23:10:54
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./trained_model/linear_lr/model.ckpt-12000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-01-02-23:10:55
INFO:tensorflow:Saving dict for global step 12000: average_loss = 5631852500.0, global_step = 12000, label/mean = 204546.2, loss = 706693230000.0, prediction/mean = 205003.38, rmse = 75045.67
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 12000: ./trained_model/linear_lr/model.ckpt-12000
INFO:tensorflow:global_step/sec: 32.4615
INFO:tensorflow:loss = 598358900000.0, step = 12001 (3.080 sec)
INFO:tensorflow:global_step/sec: 125.119
INFO:tensorflow:loss = 564719800000.0, step = 12101 (0.800 sec)
INFO:tensorflow:global_step/sec: 130.785
INFO:te

INFO:tensorflow:global_step/sec: 166.139
INFO:tensorflow:loss = 453862620000.0, step = 17501 (0.599 sec)
INFO:tensorflow:global_step/sec: 167.066
INFO:tensorflow:loss = 973125450000.0, step = 17601 (0.599 sec)
INFO:tensorflow:global_step/sec: 166.149
INFO:tensorflow:loss = 1141382200000.0, step = 17701 (0.604 sec)
INFO:tensorflow:global_step/sec: 160.412
INFO:tensorflow:loss = 1094645060000.0, step = 17801 (0.621 sec)
INFO:tensorflow:global_step/sec: 129.893
INFO:tensorflow:loss = 834383600000.0, step = 17901 (0.770 sec)
INFO:tensorflow:Saving checkpoints for 18000 into ./trained_model/linear_lr/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-02-23:11:41
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./trained_model/linear_lr/model.ckpt-18000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Don

INFO:tensorflow:global_step/sec: 167.449
INFO:tensorflow:loss = 3307447800000.0, step = 1401 (0.597 sec)
INFO:tensorflow:global_step/sec: 153.326
INFO:tensorflow:loss = 2383873800000.0, step = 1501 (0.654 sec)
INFO:tensorflow:global_step/sec: 127.373
INFO:tensorflow:loss = 1878941500000.0, step = 1601 (0.784 sec)
INFO:tensorflow:global_step/sec: 168.37
INFO:tensorflow:loss = 3013471000000.0, step = 1701 (0.597 sec)
INFO:tensorflow:global_step/sec: 160.101
INFO:tensorflow:loss = 2357803000000.0, step = 1801 (0.623 sec)
INFO:tensorflow:global_step/sec: 157.893
INFO:tensorflow:loss = 3284622400000.0, step = 1901 (0.631 sec)
INFO:tensorflow:Saving checkpoints for 2000 into ./trained_model/dnn_default/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-02-23:12:13
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters fro

INFO:tensorflow:global_step/sec: 162.415
INFO:tensorflow:loss = 1199754400000.0, step = 6801 (0.614 sec)
INFO:tensorflow:global_step/sec: 160.364
INFO:tensorflow:loss = 1661970400000.0, step = 6901 (0.623 sec)
INFO:tensorflow:global_step/sec: 164.161
INFO:tensorflow:loss = 1591571100000.0, step = 7001 (0.609 sec)
INFO:tensorflow:global_step/sec: 161.666
INFO:tensorflow:loss = 1173322300000.0, step = 7101 (0.618 sec)
INFO:tensorflow:global_step/sec: 155.674
INFO:tensorflow:loss = 732173760000.0, step = 7201 (0.643 sec)
INFO:tensorflow:global_step/sec: 157.654
INFO:tensorflow:loss = 642056450000.0, step = 7301 (0.634 sec)
INFO:tensorflow:global_step/sec: 159.244
INFO:tensorflow:loss = 772491200000.0, step = 7401 (0.628 sec)
INFO:tensorflow:global_step/sec: 162.055
INFO:tensorflow:loss = 1065518100000.0, step = 7501 (0.617 sec)
INFO:tensorflow:global_step/sec: 157.355
INFO:tensorflow:loss = 1674727000000.0, step = 7601 (0.636 sec)
INFO:tensorflow:global_step/sec: 161.036
INFO:tensorflow:l

INFO:tensorflow:loss = 1167616900000.0, step = 12101 (0.622 sec)
INFO:tensorflow:global_step/sec: 166.914
INFO:tensorflow:loss = 874753800000.0, step = 12201 (0.601 sec)
INFO:tensorflow:global_step/sec: 163.381
INFO:tensorflow:loss = 514626850000.0, step = 12301 (0.610 sec)
INFO:tensorflow:global_step/sec: 164.893
INFO:tensorflow:loss = 818266440000.0, step = 12401 (0.606 sec)
INFO:tensorflow:global_step/sec: 165.95
INFO:tensorflow:loss = 1103498300000.0, step = 12501 (0.608 sec)
INFO:tensorflow:global_step/sec: 155.655
INFO:tensorflow:loss = 1088746600000.0, step = 12601 (0.639 sec)
INFO:tensorflow:global_step/sec: 154.248
INFO:tensorflow:loss = 1814673600000.0, step = 12701 (0.647 sec)
INFO:tensorflow:global_step/sec: 164.356
INFO:tensorflow:loss = 1178434300000.0, step = 12801 (0.609 sec)
INFO:tensorflow:global_step/sec: 160.413
INFO:tensorflow:loss = 1126233500000.0, step = 12901 (0.623 sec)
INFO:tensorflow:global_step/sec: 167.446
INFO:tensorflow:loss = 1264762500000.0, step = 130

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./trained_model/dnn_default/model.ckpt-18000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-01-02-23:14:10
INFO:tensorflow:Saving dict for global step 18000: average_loss = 7299149000.0, global_step = 18000, label/mean = 204546.2, loss = 915908000000.0, prediction/mean = 210344.75, rmse = 85435.055
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 18000: ./trained_model/dnn_default/model.ckpt-18000
INFO:tensorflow:global_step/sec: 36.9367
INFO:tensorflow:loss = 1197333800000.0, step = 18001 (2.707 sec)
INFO:tensorflow:global_step/sec: 161.092
INFO:tensorflow:loss = 737204600000.0, step = 18101 (0.621 sec)
INFO:tensorflow:global_step/sec: 164.485
INFO:tensorflow:loss = 729903460000.0, step = 18201 (0.608 sec)
INFO:tensorflow:global_step/sec: 160.756
INFO:tensorflow:loss = 1010178650000.0, step = 18301 (0.622 sec)
IN

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-02-23:14:41
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./trained_model/dnn_lr/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-01-02-23:14:41
INFO:tensorflow:Saving dict for global step 2000: average_loss = 5814097400.0, global_step = 2000, label/mean = 204546.2, loss = 729561560000.0, prediction/mean = 229310.25, rmse = 76250.23
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 2000: ./trained_model/dnn_lr/model.ckpt-2000
INFO:tensorflow:global_step/sec: 39.5924
INFO:tensorflow:loss = 630198900000.0, step = 2001 (2.528 sec)
INFO:tensorflow:global_step/sec: 163.325
INFO:tensorflow:loss = 345236500000.0, step = 2101 (0.608 sec)
INFO:tensorflow:global_step/sec: 161.157
INFO:tensorflow:loss = 379564620000.0, step = 2201 (0.620 sec)
INFO:tensorflow:global_step/se

INFO:tensorflow:loss = 1015404800000.0, step = 7601 (0.602 sec)
INFO:tensorflow:global_step/sec: 163.504
INFO:tensorflow:loss = 920050900000.0, step = 7701 (0.612 sec)
INFO:tensorflow:global_step/sec: 165.655
INFO:tensorflow:loss = 655987830000.0, step = 7801 (0.603 sec)
INFO:tensorflow:global_step/sec: 163.233
INFO:tensorflow:loss = 460355080000.0, step = 7901 (0.612 sec)
INFO:tensorflow:Saving checkpoints for 8000 into ./trained_model/dnn_lr/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-02-23:15:23
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./trained_model/dnn_lr/model.ckpt-8000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-01-02-23:15:24
INFO:tensorflow:Saving dict for global step 8000: average_loss = 5022493000

INFO:tensorflow:loss = 700949700000.0, step = 13001 (0.680 sec)
INFO:tensorflow:global_step/sec: 148.702
INFO:tensorflow:loss = 416865880000.0, step = 13101 (0.670 sec)
INFO:tensorflow:global_step/sec: 125.772
INFO:tensorflow:loss = 795385600000.0, step = 13201 (0.796 sec)
INFO:tensorflow:global_step/sec: 141.04
INFO:tensorflow:loss = 406293200000.0, step = 13301 (0.711 sec)
INFO:tensorflow:global_step/sec: 145.138
INFO:tensorflow:loss = 728502300000.0, step = 13401 (0.686 sec)
INFO:tensorflow:global_step/sec: 152.071
INFO:tensorflow:loss = 599200760000.0, step = 13501 (0.658 sec)
INFO:tensorflow:global_step/sec: 145.447
INFO:tensorflow:loss = 941998340000.0, step = 13601 (0.687 sec)
INFO:tensorflow:global_step/sec: 119.261
INFO:tensorflow:loss = 535324600000.0, step = 13701 (0.841 sec)
INFO:tensorflow:global_step/sec: 141.793
INFO:tensorflow:loss = 463323330000.0, step = 13801 (0.703 sec)
INFO:tensorflow:global_step/sec: 144.068
INFO:tensorflow:loss = 288174770000.0, step = 13901 (0.6

INFO:tensorflow:loss = 527342500000.0, step = 18401 (0.607 sec)
INFO:tensorflow:global_step/sec: 167.21
INFO:tensorflow:loss = 580661500000.0, step = 18501 (0.598 sec)
INFO:tensorflow:global_step/sec: 158.187
INFO:tensorflow:loss = 618228300000.0, step = 18601 (0.636 sec)
INFO:tensorflow:global_step/sec: 164.52
INFO:tensorflow:loss = 815723050000.0, step = 18701 (0.608 sec)
INFO:tensorflow:global_step/sec: 167.263
INFO:tensorflow:loss = 506373960000.0, step = 18801 (0.595 sec)
INFO:tensorflow:global_step/sec: 161.154
INFO:tensorflow:loss = 429848170000.0, step = 18901 (0.621 sec)
INFO:tensorflow:global_step/sec: 154.09
INFO:tensorflow:loss = 356277600000.0, step = 19001 (0.648 sec)
INFO:tensorflow:global_step/sec: 161.017
INFO:tensorflow:loss = 375343020000.0, step = 19101 (0.621 sec)
INFO:tensorflow:global_step/sec: 161.872
INFO:tensorflow:loss = 409912020000.0, step = 19201 (0.618 sec)
INFO:tensorflow:global_step/sec: 162.853
INFO:tensorflow:loss = 847005500000.0, step = 19301 (0.614

## Plan

3. Establish LinearRegressor performance w/o feature engineering.
4. Beat it with feature engineering.
 

In [16]:
# Stop any running TensorBoard instances
if len(TensorBoard.list()) > 0:
  for pid in TensorBoard.list()['pid']:
    TensorBoard().stop(pid)
else:
  print('No TensorBoard instances to stop')