# Neural Network

**Learning Objectives:**
  * Use the `DNNRegressor` class in TensorFlow to predict median housing price

The data is based on 1990 census data from California and is aggregated at the city block level.

Let's use a set of features to predict house value.

## Set Up
In this first cell, we'll load the necessary libraries.

In [1]:
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format


Next, we'll load our data set.

In [2]:
df = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")

## Examine the data

It's a good idea to get to know your data a little before you work with it.

We'll look at the first few rows, then print a quick summary of useful statistics for each column, including the count, mean, standard deviation, min, max, and quartiles (25%, 50%, 75%).

In [3]:
df.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.3 | 34.2 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.5 | 66900.0 |
| 1 | -114.5 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8 | 80100.0 |
| 2 | -114.6 | 33.7 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.7 | 85700.0 |
| 3 | -114.6 | 33.6 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.2 | 73400.0 |
| 4 | -114.6 | 33.6 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9 | 65500.0 |


In [4]:
df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207300.9 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 115983.8 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119400.0 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180400.0 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265000.0 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500001.0 |
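To make the summary concrete, the standard-library sketch below recomputes several of these statistics. It uses only the five `median_house_value` values shown by `df.head()` above, not the full column. It assumes pandas' defaults for `describe()`: the sample standard deviation (denominator n - 1) and linearly interpolated quantiles. Those defaults are what `statistics.stdev` and `statistics.quantiles(..., method="inclusive")` reproduce. The helper name `summarize` is hypothetical.

```python
import statistics

# The five median_house_value entries displayed by df.head() above (not the full column)
values = [66900.0, 80100.0, 85700.0, 73400.0, 65500.0]

def summarize(xs):
    """Hypothetical helper: recompute a subset of pandas' describe() output for one column."""
    # "inclusive" quantiles interpolate linearly between order statistics, like pandas' default
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "count": len(xs),
        "mean": statistics.mean(xs),
        "std": statistics.stdev(xs),  # sample std (n - 1 denominator), as in describe()
        "min": min(xs),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(xs),
    }

for name, value in summarize(values).items():
    print(f"{name:>5}: {value:.1f}")
```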


Because the data is at the city block level, features such as `total_rooms` and `population` reflect the total number of rooms in a block, or the total number of people who live on that block. We are predicting the price of a single house, so our features should describe a single house as well. Let's replace these block totals with more appropriate per-household ratios.

In [5]:
df['num_rooms'] = df['total_rooms'] / df['households']
df['num_bedrooms'] = df['total_bedrooms'] / df['households']
df['persons_per_house'] = df['population'] / df['households']
df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | num_rooms | num_bedrooms | persons_per_house |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207300.9 | 5.4 | 1.1 | 3.0 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 115983.8 | 2.5 | 0.5 | 4.0 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 | 0.8 | 0.3 | 0.7 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119400.0 | 4.4 | 1.0 | 2.4 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180400.0 | 5.2 | 1.0 | 2.8 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265000.0 | 6.1 | 1.1 | 3.3 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500001.0 | 141.9 | 34.1 | 502.5 |
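The three new columns are plain element-wise divisions by `households`. As a quick sanity check, this plain-Python sketch applies the same formulas to the five block rows shown by `df.head()` earlier. The helper name `per_household` is hypothetical and is not part of the notebook.

```python
# Block-level totals for the five rows shown by df.head():
# (total_rooms, total_bedrooms, population, households)
rows = [
    (5612.0, 1283.0, 1015.0, 472.0),
    (7650.0, 1901.0, 1129.0, 463.0),
    (720.0, 174.0, 333.0, 117.0),
    (1501.0, 337.0, 515.0, 226.0),
    (1454.0, 326.0, 624.0, 262.0),
]

def per_household(total_rooms, total_bedrooms, population, households):
    """Hypothetical helper mirroring the three ratio columns added in In [5]."""
    return {
        "num_rooms": total_rooms / households,
        "num_bedrooms": total_bedrooms / households,
        "persons_per_house": population / households,
    }

features = [per_household(*row) for row in rows]
for f in features:
    print({name: round(value, 1) for name, value in f.items()})
```

Each ratio divides by the number of households, so a block with very few households can produce extreme values. That may account for the large maxima in the table above, such as 502.5 persons per house.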


In [6]:
df.drop(['total_rooms', 'total_bedrooms', 'population', 'households'], axis = 1, inplace = True)
df.describe()

| | longitude | latitude | housing_median_age | median_income | median_house_value | num_rooms | num_bedrooms | persons_per_house |
|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 3.9 | 207300.9 | 5.4 | 1.1 | 3.0 |
| std | 2.0 | 2.1 | 12.6 | 1.9 | 115983.8 | 2.5 | 0.5 | 4.0 |
| min | -124.3 | 32.5 | 1.0 | 0.5 | 14999.0 | 0.8 | 0.3 | 0.7 |
| 25% | -121.8 | 33.9 | 18.0 | 2.6 | 119400.0 | 4.4 | 1.0 | 2.4 |
| 50% | -118.5 | 34.2 | 29.0 | 3.5 | 180400.0 | 5.2 | 1.0 | 2.8 |
| 75% | -118.0 | 37.7 | 37.0 | 4.8 | 265000.0 | 6.1 | 1.1 | 3.3 |
| max | -114.3 | 42.0 | 52.0 | 15.0 | 500001.0 | 141.9 | 34.1 | 502.5 |


## Build a neural network model

In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target). We'll use the remaining columns as our input features.

To train our model, we'll first use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LinearRegressor) interface (the code below uses `tf.estimator.LinearRegressor`). Then we'll switch to `DNNRegressor`.
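Conceptually, `LinearRegressor` learns one weight per input value plus a bias. `DNNRegressor` instead passes the inputs through fully connected hidden layers before a final linear output unit. The NumPy sketch below shows both forward passes for the configuration used later in this notebook (`hidden_units = [100, 50, 20]`, `dropout = 0.1`). It rests on several assumptions:

- ReLU activations, which are the `DNNRegressor` default.
- Dropout applied after each hidden layer during training only.
- An input width of 22: 5 numeric columns plus 6 longitude and 11 latitude one-hot buckets.

The Glorot-uniform initialization and all helper names are illustrative choices, not TensorFlow's internals.

```python
import numpy as np

N_INPUTS = 22                # assumed: 5 numeric + 6 longitude buckets + 11 latitude buckets, one-hot
HIDDEN_UNITS = [100, 50, 20]
DROPOUT = 0.1                # probability of dropping a hidden activation during training

def glorot_uniform(rng, fan_in, fan_out):
    """Illustrative initializer: uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def init_dnn(rng, n_inputs, hidden_units):
    """One (weights, bias) pair per hidden layer, plus a 1-unit linear output layer."""
    sizes = [n_inputs] + list(hidden_units) + [1]
    return [(glorot_uniform(rng, a, b), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]

def linear_forward(x, w, b):
    """LinearRegressor-style prediction: a weighted sum of the inputs plus a bias."""
    return x @ w + b

def dnn_forward(x, layers, dropout=0.0, training=False, rng=None):
    """DNNRegressor-style prediction: ReLU hidden layers, then a linear output unit."""
    h = x
    for w, b in layers[:-1]:
        h = np.maximum(h @ w + b, 0.0)              # ReLU
        if training and dropout > 0.0:
            keep = rng.random(h.shape) >= dropout   # drop each unit with probability `dropout`
            h = h * keep / (1.0 - dropout)          # inverted dropout keeps the expected activation unchanged
    w_out, b_out = layers[-1]
    return h @ w_out + b_out

def count_params(layers):
    return sum(w.size + b.size for w, b in layers)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, N_INPUTS))                  # a fake batch of 4 encoded examples
layers = init_dnn(rng, N_INPUTS, HIDDEN_UNITS)

print("linear model parameters:", N_INPUTS + 1)
print("DNN parameters:", count_params(layers))
print("DNN output shape:", dnn_forward(x, layers).shape)
print("training-mode output shape:", dnn_forward(x, layers, DROPOUT, training=True, rng=rng).shape)
```

Under these assumptions the linear model has 23 trainable values and the DNN has 8,391. The ReLU layers are what let the DNN represent non-linear relationships between the features and the label.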


In [7]:
featcols = {
  colname : tf.feature_column.numeric_column(colname) \
    for colname in 'housing_median_age,median_income,num_rooms,num_bedrooms,persons_per_house'.split(',')
}
# Bucketize lat, lon so it's not so high-res; California is mostly N-S, so more lats than lons
featcols['longitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('longitude'),
                                                   np.linspace(-124.3, -114.3, 5).tolist())
featcols['latitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('latitude'),
                                                  np.linspace(32.5, 42, 10).tolist())
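`bucketized_column` maps each raw value to the interval it falls in, defined by the sorted boundaries:

- Values below the first boundary go to bucket 0.
- Values in `[b[i-1], b[i])` go to bucket `i`.
- Values at or above the last boundary go to the final bucket.

So `n` boundaries give `n + 1` buckets. The sketch below uses `np.digitize`, which follows the same convention with its default `right=False`, to show the buckets that the boundaries above produce. The sample longitudes are arbitrary, and this is an illustration rather than TensorFlow's implementation.

```python
import numpy as np

# Same boundaries as in In [7]
lon_boundaries = np.linspace(-124.3, -114.3, 5)   # 5 boundaries -> 6 buckets
lat_boundaries = np.linspace(32.5, 42, 10)        # 10 boundaries -> 11 buckets

def bucket_index(values, boundaries):
    """Bucket each value falls into; right=False means b[i-1] <= x < b[i] maps to i."""
    return np.digitize(values, boundaries)

def one_hot(indices, num_buckets):
    """Encode bucket indices as one-hot rows."""
    return np.eye(num_buckets)[indices]

# A few arbitrary sample longitudes
lons = np.array([-124.3, -118.5, -114.6, -114.3])
idx = bucket_index(lons, lon_boundaries)

print("longitude buckets:", len(lon_boundaries) + 1, "indices:", idx)
print("latitude buckets:", len(lat_boundaries) + 1)
print(one_hot(idx, len(lon_boundaries) + 1))
```

The `describe()` tables are rounded to one decimal place by the `float_format` set in `In [1]`. Raw values that display as -124.3 or -114.3 may therefore fall just outside the inner buckets.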

In [8]:
featcols.keys()

dict_keys(['num_bedrooms', 'latitude', 'longitude', 'num_rooms', 'persons_per_house', 'housing_median_age', 'median_income'])

In [9]:
# Split into train and eval
msk = np.random.rand(len(df)) < 0.8
traindf = df[msk]
evaldf = df[~msk]

SCALE = 100000
BATCH_SIZE= 100
OUTDIR = './housing_trained'
train_input_fn = tf.estimator.inputs.pandas_input_fn(x = traindf[list(featcols.keys())],
                                                    y = traindf["median_house_value"] / SCALE,
                                                    num_epochs = None,
                                                    batch_size = BATCH_SIZE,
                                                    shuffle = True)
eval_input_fn = tf.estimator.inputs.pandas_input_fn(x = evaldf[list(featcols.keys())],
                                                    y = evaldf["median_house_value"] / SCALE,  # note the scaling
                                                    num_epochs = 1, 
                                                    batch_size = len(evaldf), 
                                                    shuffle=False)
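Two details of this cell are worth spelling out. First, `np.random.rand(len(df)) < 0.8` puts each row in the training set with probability 0.8. The split is therefore only approximately 80/20, and it changes on every run unless the random state is fixed. Second, the two input functions behave differently:

- The training input function repeats the shuffled data indefinitely (`num_epochs = None`) in batches of `BATCH_SIZE`.
- The evaluation input function makes a single, unshuffled pass in one batch containing all of `evaldf`.

The NumPy sketch below imitates both behaviours on synthetic data. The seed, the hypothetical generator `batches`, and the synthetic arrays are assumptions for illustration. The sketch does not reproduce `pandas_input_fn`'s queue-based internals.

```python
import itertools
import numpy as np

SCALE = 100000
BATCH_SIZE = 100

rng = np.random.default_rng(42)                # fixed seed (an assumption) makes the split reproducible
n_rows = 17000
x = rng.normal(size=(n_rows, 3))               # synthetic stand-in for the feature columns
y = rng.uniform(15000, 500001, size=n_rows)    # synthetic stand-in for median_house_value

# Same rule as the notebook: each row goes to training with probability 0.8
msk = rng.random(n_rows) < 0.8
x_train, y_train = x[msk], y[msk] / SCALE      # labels scaled as in both input functions
x_eval, y_eval = x[~msk], y[~msk] / SCALE

def batches(features, labels, batch_size, num_epochs=None, shuffle=True, seed=0):
    """Hypothetical helper: yield (features, labels) mini-batches; num_epochs=None repeats forever."""
    order_rng = np.random.default_rng(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = order_rng.permutation(len(labels)) if shuffle else np.arange(len(labels))
        for start in range(0, len(labels), batch_size):
            sel = order[start:start + batch_size]
            yield features[sel], labels[sel]
        epoch += 1

# Training-style: endless shuffled batches of BATCH_SIZE; take just the first three
train_batches = batches(x_train, y_train, BATCH_SIZE, num_epochs=None, shuffle=True)
first_three = list(itertools.islice(train_batches, 3))

# Evaluation-style: one unshuffled pass, with the whole eval set in a single batch
eval_batches = list(batches(x_eval, y_eval, len(y_eval), num_epochs=1, shuffle=False))

print("training fraction:", round(float(msk.mean()), 3))
print("train batch shapes:", [b[0].shape for b in first_three])
print("eval batches:", len(eval_batches), "rows in the eval batch:", len(eval_batches[0][1]))
```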

In [10]:
# Linear Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.LinearRegressor(
                       model_dir = output_dir, 
                       feature_columns = featcols.values(),
                       optimizer = myopt)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training    
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)  # about 100 epochs over traindf

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_save_checkpoints_secs': 600, '_session_config': None, '_train_distribute': None, '_save_summary_steps': 100, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f94aefd02e8>, '_master': '', '_num_worker_replicas': 1, '_model_dir': './housing_trained', '_evaluation_master': '', '_task_type': 'worker', '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_task_id': 0, '_is_chief': True, '_log_step_count_steps': 100, '_tf_random_seed': None, '_num_ps_replicas': 0, '_service': None}

INFO:tensorflow:Loss for final step: 36.749588.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-12-08-10:26:49
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-5251
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-12-08-10:26:49
INFO:tensorflow:Saving dict for global step 5251: average_loss = 1.2204146, global_step = 5251, loss = 4222.635, rmse = 110472.36
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-5251
INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-10408
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 10409 into ./housing_trained/model.ckpt.
INFO:tensorflow:step = 10409, loss = 57.62966
INFO:tensorflow:global_step/sec: 161.266
INFO:tensorflow:step = 10509, loss = 57.504402 (0.627 sec)
INFO:tensorflow:global_step/sec: 256.391
INFO:tensorflow:step = 10609, loss = 32.896187 (0.388 sec)
INFO:tensorflow:global_step/sec: 231.694
INFO:tensorflow:step = 10709, loss = 95.74439 (0.432 sec)
INFO:tensorflow:global_step/sec: 252.404
INFO:tensorflow:step = 10809, loss = 43.661137 (0.397 sec)
INFO:tensorflow:global_step/sec: 239.817
INFO:tensorflow:step = 10909, loss = 55.049583 (0.416 sec)
INFO:tensorflow:global_step/sec: 236.618
INFO:tensorflow:step = 11009, loss = 32.73818 (0.428 sec)
INFO:tensorflow:global_step/sec: 252.358

In [11]:
# DNN Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.DNNRegressor(model_dir = output_dir,
                                hidden_units = [100, 50, 20],
                                feature_columns = featcols.values(),
                                optimizer = myopt,
                                dropout = 0.1)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training    
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)  # about 100 epochs over traindf

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_save_checkpoints_secs': 600, '_session_config': None, '_train_distribute': None, '_save_summary_steps': 100, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9482eb2fd0>, '_master': '', '_num_worker_replicas': 1, '_model_dir': './housing_trained', '_evaluation_master': '', '_task_type': 'worker', '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_task_id': 0, '_is_chief': True, '_log_step_count_steps': 100, '_tf_random_seed': None, '_num_ps_replicas': 0, '_service': None}

INFO:tensorflow:step = 4149, loss = 101.62576
INFO:tensorflow:global_step/sec: 128.096
INFO:tensorflow:step = 4249, loss = 63.10124 (0.792 sec)
INFO:tensorflow:global_step/sec: 126.646
INFO:tensorflow:step = 4349, loss = 105.900055 (0.782 sec)
INFO:tensorflow:global_step/sec: 120.844
INFO:tensorflow:step = 4449, loss = 50.320786 (0.828 sec)
INFO:tensorflow:global_step/sec: 140.715
INFO:tensorflow:step = 4549, loss = 41.070656 (0.710 sec)
INFO:tensorflow:global_step/sec: 122.794
INFO:tensorflow:step = 4649, loss = 52.449574 (0.818 sec)
INFO:tensorflow:global_step/sec: 93.4849
INFO:tensorflow:step = 4749, loss = 69.87997 (1.067 sec)
INFO:tensorflow:global_step/sec: 79.2553
INFO:tensorflow:step = 4849, loss = 62.083588 (1.262 sec)
INFO:tensorflow:global_step/sec: 83.7826
INFO:tensorflow:step = 4949, loss = 29.992378 (1.195 sec)
INFO:tensorflow:Saving checkpoints for 5035 into ./housing_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 53.168922.
INFO:tensorflow:Calling model_fn.

INFO:tensorflow:global_step/sec: 239.406
INFO:tensorflow:step = 7741, loss = 54.94063 (0.418 sec)
INFO:tensorflow:global_step/sec: 229.318
INFO:tensorflow:step = 7841, loss = 47.675694 (0.436 sec)
INFO:tensorflow:global_step/sec: 231.192
INFO:tensorflow:step = 7941, loss = 35.19042 (0.433 sec)
INFO:tensorflow:global_step/sec: 211.467
INFO:tensorflow:step = 8041, loss = 83.72525 (0.473 sec)
INFO:tensorflow:global_step/sec: 231.423
INFO:tensorflow:step = 8141, loss = 45.49218 (0.432 sec)
INFO:tensorflow:global_step/sec: 238.452
INFO:tensorflow:step = 8241, loss = 84.90021 (0.419 sec)
INFO:tensorflow:global_step/sec: 210.875
INFO:tensorflow:step = 8341, loss = 52.43736 (0.474 sec)
INFO:tensorflow:global_step/sec: 237.162
INFO:tensorflow:step = 8441, loss = 71.84572 (0.421 sec)
INFO:tensorflow:global_step/sec: 238.412
INFO:tensorflow:step = 8541, loss = 52.828506 (0.419 sec)
INFO:tensorflow:global_step/sec: 241.325
INFO:tensorflow:step = 8641, loss = 96.5738 (0.415 sec)

INFO:tensorflow:Loss for final step: 37.765224.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-12-08-10:29:57
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-13540
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-12-08-10:29:57
INFO:tensorflow:Saving dict for global step 13540: average_loss = 0.41815835, global_step = 13540, loss = 1446.8279, rmse = 64665.168
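The `rmse` metric added with `add_metrics` rescales predictions and labels back to dollars. By contrast, `average_loss` is the mean squared error on the scaled labels (`median_house_value / SCALE`). The two should therefore agree via RMSE = sqrt(average_loss) × SCALE.

The check below plugs in the numbers copied from the two evaluation log lines above:

- The `LinearRegressor` evaluation at step 5251, an intermediate checkpoint rather than the end of training.
- The final `DNNRegressor` evaluation at step 13540.

It also divides `loss` by `average_loss`. Assuming `loss` is the summed squared error over the single evaluation batch, this ratio estimates the number of evaluation rows.

```python
import math

SCALE = 100000

# Numbers copied from the two "Saving dict for global step" log lines above
evals = {
    "LinearRegressor, step 5251": {"average_loss": 1.2204146, "loss": 4222.635, "rmse": 110472.36},
    "DNNRegressor, step 13540": {"average_loss": 0.41815835, "loss": 1446.8279, "rmse": 64665.168},
}

def rmse_from_average_loss(average_loss, scale=SCALE):
    """average_loss is an MSE on labels divided by `scale`; undo the scaling after the square root."""
    return math.sqrt(average_loss) * scale

for name, m in evals.items():
    implied_rmse = rmse_from_average_loss(m["average_loss"])
    # Assumes `loss` is the summed squared error over the single evaluation batch
    implied_eval_rows = m["loss"] / m["average_loss"]
    print(f"{name}: logged rmse {m['rmse']:.1f}, "
          f"sqrt(average_loss) * SCALE = {implied_rmse:.1f}, "
          f"loss / average_loss = {implied_eval_rows:.1f}")
```

On these numbers, the DNN's evaluation RMSE (about $64,700) is well below the linear model's at its intermediate checkpoint (about $110,500). The linear model's final evaluation is not shown in the truncated log.

With `BATCH_SIZE = 100`, `num_train_steps` equals `len(traindf)`. A final step of 13540 would therefore mean 13540 training rows. Together with the roughly 3460 evaluation rows implied above, that accounts for all 17000 rows.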


In [12]:
from google.datalab.ml import TensorBoard
pid = TensorBoard().start(OUTDIR)

In [13]:
TensorBoard().stop(pid)