# Neural Network

**Learning Objectives:**
  * Use the `DNNRegressor` class in TensorFlow to predict median housing price

The data is based on 1990 census data from California. Each row describes one city block, so features such as `total_rooms` and `population` are totals for that block rather than values for a single house.

Let's use a set of features to predict house value.

## Set Up
In this first cell, we'll load the necessary libraries.

In [1]:
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format


Next, we'll load our data set.

In [2]:
df = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")

## Examine the data

It's a good idea to get to know your data a little bit before you work with it.

We'll look at the first few rows, then print a quick summary of useful statistics for each column: count, mean, standard deviation, min, max, and quartiles.

In [3]:
df.head()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.3,34.2,15.0,5612.0,1283.0,1015.0,472.0,1.5,66900.0
1,-114.5,34.4,19.0,7650.0,1901.0,1129.0,463.0,1.8,80100.0
2,-114.6,33.7,17.0,720.0,174.0,333.0,117.0,1.7,85700.0
3,-114.6,33.6,14.0,1501.0,337.0,515.0,226.0,3.2,73400.0
4,-114.6,33.6,20.0,1454.0,326.0,624.0,262.0,1.9,65500.0


In [4]:
df.describe()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
count,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0
mean,-119.6,35.6,28.6,2643.7,539.4,1429.6,501.2,3.9,207300.9
std,2.0,2.1,12.6,2179.9,421.5,1147.9,384.5,1.9,115983.8
min,-124.3,32.5,1.0,2.0,1.0,3.0,1.0,0.5,14999.0
25%,-121.8,33.9,18.0,1462.0,297.0,790.0,282.0,2.6,119400.0
50%,-118.5,34.2,29.0,2127.0,434.0,1167.0,409.0,3.5,180400.0
75%,-118.0,37.7,37.0,3151.2,648.2,1721.0,605.2,4.8,265000.0
max,-114.3,42.0,52.0,37937.0,6445.0,35682.0,6082.0,15.0,500001.0


Because each row is a city block, features such as `total_rooms` and `population` are block totals. Since we are predicting the price of a single house, let's derive features that describe a single household instead: rooms, bedrooms, and persons per household.

In [5]:
df['num_rooms'] = df['total_rooms'] / df['households']
df['num_bedrooms'] = df['total_bedrooms'] / df['households']
df['persons_per_house'] = df['population'] / df['households']
df.describe()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,num_rooms,num_bedrooms,persons_per_house
count,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0
mean,-119.6,35.6,28.6,2643.7,539.4,1429.6,501.2,3.9,207300.9,5.4,1.1,3.0
std,2.0,2.1,12.6,2179.9,421.5,1147.9,384.5,1.9,115983.8,2.5,0.5,4.0
min,-124.3,32.5,1.0,2.0,1.0,3.0,1.0,0.5,14999.0,0.8,0.3,0.7
25%,-121.8,33.9,18.0,1462.0,297.0,790.0,282.0,2.6,119400.0,4.4,1.0,2.4
50%,-118.5,34.2,29.0,2127.0,434.0,1167.0,409.0,3.5,180400.0,5.2,1.0,2.8
75%,-118.0,37.7,37.0,3151.2,648.2,1721.0,605.2,4.8,265000.0,6.1,1.1,3.3
max,-114.3,42.0,52.0,37937.0,6445.0,35682.0,6082.0,15.0,500001.0,141.9,34.1,502.5


In [6]:
df.drop(['total_rooms', 'total_bedrooms', 'population', 'households'], axis = 1, inplace = True)
df.describe()

,longitude,latitude,housing_median_age,median_income,median_house_value,num_rooms,num_bedrooms,persons_per_house
count,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0
mean,-119.6,35.6,28.6,3.9,207300.9,5.4,1.1,3.0
std,2.0,2.1,12.6,1.9,115983.8,2.5,0.5,4.0
min,-124.3,32.5,1.0,0.5,14999.0,0.8,0.3,0.7
25%,-121.8,33.9,18.0,2.6,119400.0,4.4,1.0,2.4
50%,-118.5,34.2,29.0,3.5,180400.0,5.2,1.0,2.8
75%,-118.0,37.7,37.0,4.8,265000.0,6.1,1.1,3.3
max,-114.3,42.0,52.0,15.0,500001.0,141.9,34.1,502.5


## Build a neural network model

In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target). We'll use the remaining columns as our input features.

To train our model, we'll first use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LinearRegressor) interface. Then we'll switch to `DNNRegressor`.
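
To get a feel for how much larger the neural network is than the linear model, we can count trainable parameters. This is only a rough sketch. It assumes an input width of 22 (5 numeric features, plus 6 longitude buckets and 11 latitude buckets from the feature columns defined in the next cell) and the hidden layer sizes `[100, 50, 20]` used for the `DNNRegressor` later on. The helper `dense_param_count` is a hypothetical name for illustration, not part of TensorFlow.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a fully connected stack with the given widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Assumed input width: 5 numeric columns + 6 longitude buckets + 11 latitude buckets.
input_width = 5 + 6 + 11

# A linear model is a single dense layer mapping the inputs to one output.
linear_params = dense_param_count([input_width, 1])

# The DNN adds hidden layers of 100, 50 and 20 units before the output.
dnn_params = dense_param_count([input_width, 100, 50, 20, 1])

print(linear_params, dnn_params)  # 23 8391
```

Under these assumptions the network has a few hundred times more parameters than the linear model, which is one reason it can fit more complex relationships and also why it uses dropout.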


In [7]:
featcols = {
  colname : tf.feature_column.numeric_column(colname) \
    for colname in 'housing_median_age,median_income,num_rooms,num_bedrooms,persons_per_house'.split(',')
}
# Bucketize lat, lon so it's not so high-res; California is mostly N-S, so more lats than lons
featcols['longitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('longitude'),
                                                   np.linspace(-124.3, -114.3, 5).tolist())
featcols['latitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('latitude'),
                                                  np.linspace(32.5, 42, 10).tolist())
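
To see what bucketizing does to a coordinate, we can reproduce the bucket assignment with NumPy. This is a sketch rather than what TensorFlow runs internally: `np.digitize` uses the same left-closed, right-open intervals that `bucketized_column` documents, and the sample longitudes below are made up for illustration.

```python
import numpy as np

# Same boundaries as the longitude column above: 5 evenly spaced edges.
lon_boundaries = np.linspace(-124.3, -114.3, 5)

# Hypothetical sample longitudes, chosen only for illustration.
sample_lons = np.array([-125.0, -122.4, -118.2, -114.0])

# np.digitize returns i such that boundaries[i-1] <= x < boundaries[i];
# values below the first edge get 0, values at or above the last edge get 5.
lon_bucket_ids = np.digitize(sample_lons, lon_boundaries)

# n boundaries produce n + 1 buckets.
num_lon_buckets = len(lon_boundaries) + 1

print(lon_boundaries.round(1).tolist())  # [-124.3, -121.8, -119.3, -116.8, -114.3]
print(lon_bucket_ids.tolist())           # [0, 1, 3, 5]
print(num_lon_buckets)                   # 6
```

Because the outer edges coincide with the minimum and maximum longitude in the data, the first bucket (everything below -124.3) would stay empty for this data set.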

In [8]:
featcols.keys()

dict_keys(['num_bedrooms', 'housing_median_age', 'latitude', 'num_rooms', 'longitude', 'persons_per_house', 'median_income'])

In [9]:
# Split into train and eval
msk = np.random.rand(len(df)) < 0.8
traindf = df[msk]
evaldf = df[~msk]

SCALE = 100000
BATCH_SIZE= 100
OUTDIR = './housing_trained'
train_input_fn = tf.estimator.inputs.pandas_input_fn(x = traindf[list(featcols.keys())],
                                                    y = traindf["median_house_value"] / SCALE,
                                                    num_epochs = None,
                                                    batch_size = BATCH_SIZE,
                                                    shuffle = True)
eval_input_fn = tf.estimator.inputs.pandas_input_fn(x = evaldf[list(featcols.keys())],
                                                    y = evaldf["median_house_value"] / SCALE,  # note the scaling
                                                    num_epochs = 1, 
                                                    batch_size = len(evaldf), 
                                                    shuffle=False)
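
The split above draws a fresh random mask on every run, so the train and eval sets change between runs. If repeatable results matter, one option is to draw the mask from a seeded generator. A minimal sketch, assuming a hypothetical stand-in frame `toy_df` in place of `df` and an arbitrary seed of 42:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing frame; in the notebook this would be `df`.
toy_df = pd.DataFrame({"median_house_value": np.arange(1000, dtype=float)})

# A seeded generator makes the mask identical on every run (42 is arbitrary).
rng = np.random.RandomState(42)
toy_msk = rng.rand(len(toy_df)) < 0.8

toy_train = toy_df[toy_msk]   # roughly 80% of the rows
toy_eval = toy_df[~toy_msk]   # the remaining rows

print(len(toy_train), len(toy_eval))
```

Every row lands in exactly one of the two frames, and rerunning the cell with the same seed gives the same split.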

In [10]:
# Linear Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.LinearRegressor(
                       model_dir = output_dir, 
                       feature_columns = featcols.values(),
                       optimizer = myopt)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training    
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)  # integer step count, about 100 epochs

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_secs': 600, '_model_dir': './housing_trained', '_train_distribute': None, '_master': '', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f93416bf3c8>, '_log_step_count_steps': 100, '_is_chief': True, '_num_worker_replicas': 1, '_session_config': None, '_global_id_in_cluster': 0, '_num_ps_replicas': 0, '_tf_random_seed': None, '_task_id': 0, '_service': None, '_save_summary_steps': 100, '_evaluation_master': '', '_save_checkpoints_steps': None, '_task_type': 'worker', '_keep_checkpoint_max': 5}
INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_secs': 600, '_model_dir': './housing_trained', '_train_distribute': None, '_master': '', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f93416bf320>, '_log_step_count_steps': 100, '_is_chief': True, '_num_worker_re

INFO:tensorflow:loss = 45.69444, step = 5253 (0.276 sec)
INFO:tensorflow:global_step/sec: 354.542
INFO:tensorflow:loss = 34.16141, step = 5353 (0.282 sec)
INFO:tensorflow:global_step/sec: 336.206
INFO:tensorflow:loss = 93.425224, step = 5453 (0.298 sec)
INFO:tensorflow:global_step/sec: 343.737
INFO:tensorflow:loss = 43.893612, step = 5553 (0.290 sec)
INFO:tensorflow:global_step/sec: 366.23
INFO:tensorflow:loss = 58.852104, step = 5653 (0.273 sec)
INFO:tensorflow:global_step/sec: 332.712
INFO:tensorflow:loss = 62.025593, step = 5753 (0.301 sec)
INFO:tensorflow:Saving checkpoints for 5787 into ./housing_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 56.766956.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-04-28-18:43:36
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-5787
INFO:

INFO:tensorflow:global_step/sec: 363.607
INFO:tensorflow:loss = 23.70868, step = 11280 (0.277 sec)
INFO:tensorflow:global_step/sec: 324.383
INFO:tensorflow:loss = 111.21272, step = 11380 (0.306 sec)
INFO:tensorflow:global_step/sec: 349.135
INFO:tensorflow:loss = 71.865204, step = 11480 (0.286 sec)
INFO:tensorflow:global_step/sec: 360.441
INFO:tensorflow:loss = 64.65696, step = 11580 (0.279 sec)
INFO:tensorflow:global_step/sec: 326.573
INFO:tensorflow:loss = 43.515, step = 11680 (0.306 sec)
INFO:tensorflow:global_step/sec: 346.572
INFO:tensorflow:loss = 34.305725, step = 11780 (0.289 sec)
INFO:tensorflow:global_step/sec: 358.646
INFO:tensorflow:loss = 57.984554, step = 11880 (0.278 sec)
INFO:tensorflow:Saving checkpoints for 11905 into ./housing_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 37.83465.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluat
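
The `num_train_steps` value passed above works out to roughly 100 passes over the training data. Because the training input function uses `num_epochs = None` and repeats indefinitely, `max_steps` is what ends training. The arithmetic can be sketched as follows; `steps_for_epochs` is a hypothetical helper, and 13600 rows is only the expected size of an 80% split of 17000 rows (the actual split is random).

```python
def steps_for_epochs(num_examples, batch_size, num_epochs):
    """Number of batches needed to see `num_examples` rows `num_epochs` times."""
    return (num_epochs * num_examples) // batch_size

# Assumed training-set size: 80% of 17000 rows.
example_steps = steps_for_epochs(num_examples=13600, batch_size=100, num_epochs=100)

print(example_steps)  # 13600
```

With a batch size of 100 and 100 epochs the two factors cancel, so the step count equals the number of training rows.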

In [11]:
# DNN Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.DNNRegressor(model_dir = output_dir,
                                hidden_units = [100, 50, 20],
                                feature_columns = featcols.values(),
                                optimizer = myopt,
                                dropout = 0.1)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training    
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
tf.summary.FileWriterCache.clear() # ensure filewriter cache is clear for TensorBoard events file
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)  # integer step count, about 100 epochs

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_secs': 600, '_model_dir': './housing_trained', '_train_distribute': None, '_master': '', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f933c4c9780>, '_log_step_count_steps': 100, '_is_chief': True, '_num_worker_replicas': 1, '_session_config': None, '_global_id_in_cluster': 0, '_num_ps_replicas': 0, '_tf_random_seed': None, '_task_id': 0, '_service': None, '_save_summary_steps': 100, '_evaluation_master': '', '_save_checkpoints_steps': None, '_task_type': 'worker', '_keep_checkpoint_max': 5}
INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_secs': 600, '_model_dir': './housing_trained', '_train_distribute': None, '_master': '', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f933c4c96a0>, '_log_step_count_steps': 100, '_is_chief': True, '_num_worker_re

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-4837
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-04-28-18:44:31
INFO:tensorflow:Saving dict for global step 4837: average_loss = 0.48232996, global_step = 4837, loss = 1602.3002, rmse = 69449.984
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-4837
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 4838 into ./housing_trained/model.ckpt.
INFO:tensorflow:loss = 65.21829, step = 4838
INFO:tensorflow:global_step/sec: 207.535
INFO:tensorflow:loss = 45.48132, step = 4938 (0.486 sec)
INFO:

INFO:tensorflow:loss = 48.083508, step = 9865
INFO:tensorflow:global_step/sec: 180.319
INFO:tensorflow:loss = 57.85352, step = 9965 (0.562 sec)
INFO:tensorflow:global_step/sec: 236.658
INFO:tensorflow:loss = 19.79258, step = 10065 (0.421 sec)
INFO:tensorflow:global_step/sec: 234.898
INFO:tensorflow:loss = 62.91823, step = 10165 (0.425 sec)
INFO:tensorflow:global_step/sec: 261.342
INFO:tensorflow:loss = 75.387825, step = 10265 (0.382 sec)
INFO:tensorflow:global_step/sec: 222.371
INFO:tensorflow:loss = 53.780617, step = 10365 (0.450 sec)
INFO:tensorflow:global_step/sec: 248.949
INFO:tensorflow:loss = 24.527044, step = 10465 (0.402 sec)
INFO:tensorflow:global_step/sec: 227.612
INFO:tensorflow:loss = 34.75299, step = 10565 (0.439 sec)
INFO:tensorflow:global_step/sec: 244.279
INFO:tensorflow:loss = 73.86379, step = 10665 (0.410 sec)
INFO:tensorflow:global_step/sec: 247.435
INFO:tensorflow:loss = 70.20312, step = 10765 (0.404 sec)
INFO:tensorflow:global_step/sec: 226.169
INFO:tensorflow:loss
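
The evaluation line in the log above reports both `average_loss` and our custom `rmse`. Since the labels were divided by `SCALE` before training, `average_loss` appears to be the mean squared error in scaled units, while `rmse` is reported in dollars. A quick check of that relationship, using the two numbers copied from the log (the reading of `average_loss` as mean squared error is an assumption about the canned regressor):

```python
import math

SCALE = 100000

# Values copied from the evaluation line at global step 4837.
average_loss = 0.48232996   # assumed: mean squared error on labels / SCALE
logged_rmse = 69449.984     # custom metric, computed on labels * SCALE

# Undo the label scaling: RMSE in dollars = sqrt(MSE in scaled units) * SCALE.
rmse_from_loss = math.sqrt(average_loss) * SCALE

print(rmse_from_loss, logged_rmse)
```

The two values agree to within a dollar, so either number can be used to compare runs as long as the scaling is kept in mind.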

In [12]:
from google.datalab.ml import TensorBoard
pid = TensorBoard().start(OUTDIR)

In [13]:
TensorBoard().stop(pid)