# Neural Network

**Learning Objectives:**
  * Use the `DNNRegressor` class in TensorFlow to predict median housing price

The data is based on 1990 census data from California. It is aggregated at the city-block level, so features such as `total_rooms` and `population` are totals for an entire block rather than values for a single house.

Let's use a set of features to predict house value.

## Set Up
In this first cell, we'll load the necessary libraries.

In [1]:
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format



Next, we'll load our data set.

In [2]:
df = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")

## Examine the data

It's a good idea to get to know your data a little bit before you work with it.

We'll look at the first few rows, then print a quick summary of useful statistics for each column: the mean, standard deviation, max, min, and various quantiles.

In [3]:
df.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.3 | 34.2 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.5 | 66900.0 |
| 1 | -114.5 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8 | 80100.0 |
| 2 | -114.6 | 33.7 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.7 | 85700.0 |
| 3 | -114.6 | 33.6 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.2 | 73400.0 |
| 4 | -114.6 | 33.6 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9 | 65500.0 |


In [4]:
df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207300.9 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 115983.8 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119400.0 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180400.0 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265000.0 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500001.0 |


Because the data is at the city-block level, features such as `total_rooms` and `population` describe a whole block. Since we are predicting the price of a single house, let's create more appropriate features that also correspond to a single house.

In [5]:
df['num_rooms'] = df['total_rooms'] / df['households']
df['num_bedrooms'] = df['total_bedrooms'] / df['households']
df['persons_per_house'] = df['population'] / df['households']
df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | num_rooms | num_bedrooms | persons_per_house |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207300.9 | 5.4 | 1.1 | 3.0 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 115983.8 | 2.5 | 0.5 | 4.0 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 | 0.8 | 0.3 | 0.7 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119400.0 | 4.4 | 1.0 | 2.4 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180400.0 | 5.2 | 1.0 | 2.8 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265000.0 | 6.1 | 1.1 | 3.3 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500001.0 | 141.9 | 34.1 | 502.5 |
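
The per-house features above are plain element-wise divisions by `households`. As a small, self-contained sketch of that arithmetic, the snippet below rebuilds the three ratios for the first two rows shown by `df.head()`. The names `blocks` and `per_house` are made up for this illustration and are not part of the notebook.

```python
import pandas as pd

# Block-level totals copied from the first two rows of df.head() above.
blocks = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0],
    "total_bedrooms": [1283.0, 1901.0],
    "population": [1015.0, 1129.0],
    "households": [472.0, 463.0],
})

# Divide each block total by the number of households in that block
# to get an approximate per-house value.
per_house = pd.DataFrame({
    "num_rooms": blocks["total_rooms"] / blocks["households"],
    "num_bedrooms": blocks["total_bedrooms"] / blocks["households"],
    "persons_per_house": blocks["population"] / blocks["households"],
})

print(per_house.round(1))
```

Note that these are averages per household, so an unusual block (for example, one with very few households) can produce extreme values like the maxima in the summary above.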


In [6]:
df.drop(['total_rooms', 'total_bedrooms', 'population', 'households'], axis = 1, inplace = True)
df.describe()

| | longitude | latitude | housing_median_age | median_income | median_house_value | num_rooms | num_bedrooms | persons_per_house |
|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 3.9 | 207300.9 | 5.4 | 1.1 | 3.0 |
| std | 2.0 | 2.1 | 12.6 | 1.9 | 115983.8 | 2.5 | 0.5 | 4.0 |
| min | -124.3 | 32.5 | 1.0 | 0.5 | 14999.0 | 0.8 | 0.3 | 0.7 |
| 25% | -121.8 | 33.9 | 18.0 | 2.6 | 119400.0 | 4.4 | 1.0 | 2.4 |
| 50% | -118.5 | 34.2 | 29.0 | 3.5 | 180400.0 | 5.2 | 1.0 | 2.8 |
| 75% | -118.0 | 37.7 | 37.0 | 4.8 | 265000.0 | 6.1 | 1.1 | 3.3 |
| max | -114.3 | 42.0 | 52.0 | 15.0 | 500001.0 | 141.9 | 34.1 | 502.5 |


## Build a neural network model

In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target). We'll use the remaining columns as our input features.

To train our model, we'll first use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LinearRegressor) interface. Then, we'll switch to `DNNRegressor`.


In [7]:
featcols = {
  colname : tf.feature_column.numeric_column(colname) \
    for colname in 'housing_median_age,median_income,num_rooms,num_bedrooms,persons_per_house'.split(',')
}
# Bucketize lat, lon so it's not so high-res; California is mostly N-S, so more lats than lons
featcols['longitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('longitude'),
                                                   np.linspace(-124.3, -114.3, 5).tolist())
featcols['latitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('latitude'),
                                                  np.linspace(32.5, 42, 10).tolist())
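
To see what the bucketization above does to a coordinate, the sketch below reproduces the same boundaries with NumPy and looks up which bucket a value falls into. It assumes that `np.digitize` with its default left-inclusive bins matches the bucket convention of `bucketized_column` (`(-inf, b0), [b0, b1), ..., [bn, +inf)`); the helper `bucket_index` is hypothetical and exists only for this illustration.

```python
import numpy as np

# Same boundaries as in the feature columns above.
lon_boundaries = np.linspace(-124.3, -114.3, 5)
lat_boundaries = np.linspace(32.5, 42, 10)

def bucket_index(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    Index 0 is everything below the first boundary; the last index is
    everything at or above the last boundary.
    """
    return int(np.digitize(value, boundaries))

# n boundaries produce n + 1 buckets.
print(len(lon_boundaries) + 1, "longitude buckets")
print(len(lat_boundaries) + 1, "latitude buckets")

# The median longitude and latitude from df.describe() above.
print(bucket_index(-118.5, lon_boundaries))
print(bucket_index(34.2, lat_boundaries))
```

Because the boundaries start at the data's minimum and end at its maximum, the lowest bucket receives no rows from this data set and the highest bucket only receives rows sitting exactly on the maximum.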

In [8]:
featcols.keys()

dict_keys(['longitude', 'housing_median_age', 'num_bedrooms', 'num_rooms', 'median_income', 'persons_per_house', 'latitude'])

In [9]:
# Split into train and eval
msk = np.random.rand(len(df)) < 0.8
traindf = df[msk]
evaldf = df[~msk]

SCALE = 100000
BATCH_SIZE= 100
OUTDIR = './housing_trained'
train_input_fn = tf.estimator.inputs.pandas_input_fn(x = traindf[list(featcols.keys())],
                                                    y = traindf["median_house_value"] / SCALE,
                                                    num_epochs = None,
                                                    batch_size = BATCH_SIZE,
                                                    shuffle = True)
eval_input_fn = tf.estimator.inputs.pandas_input_fn(x = evaldf[list(featcols.keys())],
                                                    y = evaldf["median_house_value"] / SCALE,  # note the scaling
                                                    num_epochs = 1, 
                                                    batch_size = len(evaldf), 
                                                    shuffle=False)
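
The mask in the cell above is drawn without a seed, so the exact rows (and the number of rows) in `traindf` and `evaldf` change from run to run. If a repeatable split is wanted, one option is to draw the mask from a seeded generator. This is a minimal sketch on a toy frame; the seed `42` and the names `toy`, `train` and `evaluation` are arbitrary choices for the example, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing DataFrame.
toy = pd.DataFrame({"x": np.arange(1000)})

# A seeded generator makes the random mask, and so the split, repeatable.
rng = np.random.RandomState(42)  # arbitrary seed, chosen for illustration
mask = rng.rand(len(toy)) < 0.8  # roughly 80% True

train = toy[mask]
evaluation = toy[~mask]

# Every row lands in exactly one of the two sets.
print(len(train), len(evaluation))
```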

In [10]:
# Linear Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.LinearRegressor(
                       model_dir = output_dir, 
                       feature_columns = featcols.values(),
                       optimizer = myopt)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
# 100 passes over the training data; integer division keeps the step count an int
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_train_distribute': None, '_tf_random_seed': None, '_log_step_count_steps': 100, '_session_config': None, '_evaluation_master': '', '_save_checkpoints_secs': 600, '_num_worker_replicas': 1, '_global_id_in_cluster': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f5b506603c8>, '_task_id': 0, '_save_checkpoints_steps': None, '_task_type': 'worker', '_model_dir': './housing_trained', '_num_ps_replicas': 0, '_service': None, '_keep_checkpoint_every_n_hours': 10000, '_keep_checkpoint_max': 5, '_save_summary_steps': 100, '_master': '', '_is_chief': True}
INFO:tensorflow:Using config: {'_train_distribute': None, '_tf_random_seed': None, '_log_step_count_steps': 100, '_session_config': None, '_evaluation_master': '', '_save_checkpoints_secs': 600, '_num_worker_replicas': 1, '_global_id_in_cluster': 0, '_task_type': 'worker', '_task_id': 0, '_cluster_spec': <tensorflow.python.training.serve

INFO:tensorflow:Finished evaluation at 2019-02-02-17:07:37
INFO:tensorflow:Saving dict for global step 4699: average_loss = 0.59582, global_step = 4699, loss = 2008.5093, rmse = 77189.38
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-4699
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 4700 into ./housing_trained/model.ckpt.
INFO:tensorflow:step = 4700, loss = 46.35079
INFO:tensorflow:global_step/sec: 182.351
INFO:tensorflow:step = 4800, loss = 145.38248 (0.553 sec)
INFO:tensorflow:global_step/sec: 302.016
INFO:tensorflow:step = 4900, loss = 36.628647 (0.330 sec)
INFO:tensorflow:global_step/sec: 285.103
INFO:tensorflow:step = 5000, loss = 39.54201 (0.351 sec)
...

INFO:tensorflow:step = 10031, loss = 62.619583 (0.355 sec)
INFO:tensorflow:global_step/sec: 282.861
INFO:tensorflow:step = 10131, loss = 52.038105 (0.353 sec)
INFO:tensorflow:global_step/sec: 280.657
INFO:tensorflow:step = 10231, loss = 37.92759 (0.356 sec)
INFO:tensorflow:global_step/sec: 277.37
INFO:tensorflow:step = 10331, loss = 27.432978 (0.360 sec)
INFO:tensorflow:global_step/sec: 280.049
INFO:tensorflow:step = 10431, loss = 58.690197 (0.357 sec)
INFO:tensorflow:global_step/sec: 293.284
INFO:tensorflow:step = 10531, loss = 55.228306 (0.342 sec)
INFO:tensorflow:global_step/sec: 284.808
INFO:tensorflow:step = 10631, loss = 44.201004 (0.354 sec)
INFO:tensorflow:global_step/sec: 284.061
INFO:tensorflow:step = 10731, loss = 24.746 (0.348 sec)
INFO:tensorflow:global_step/sec: 295.432
INFO:tensorflow:step = 10831, loss = 24.57989 (0.338 sec)
INFO:tensorflow:global_step/sec: 292.005
INFO:tensorflow:step = 10931, loss = 119.93046 (0.345 sec)
INFO:tensorflow:global_step/sec: 245.176
...
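
The custom `rmse` metric multiplies both labels and predictions by `SCALE` so that the reported error is in dollars, even though the model trains on labels divided by `SCALE`. The sketch below checks that identity with plain Python: the RMSE of scaled values times `SCALE` equals the RMSE of the unscaled values. The labels come from `df.head()`; the predictions are made up for the example, and this local `rmse` helper is a stand-in, not the TensorFlow metric.

```python
import math

SCALE = 100000

def rmse(labels, predictions):
    """Root mean squared error of two equal-length sequences."""
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))

labels_dollars = [66900.0, 80100.0, 85700.0]  # first three labels in df.head()
preds_dollars = [70000.0, 75000.0, 90000.0]   # made-up predictions

# The model sees labels and produces predictions in units of SCALE dollars.
labels_scaled = [y / SCALE for y in labels_dollars]
preds_scaled = [p / SCALE for p in preds_dollars]

rmse_dollars = rmse(labels_dollars, preds_dollars)
rmse_scaled = rmse(labels_scaled, preds_scaled)

# Multiplying the scaled RMSE by SCALE recovers the dollar RMSE.
print(rmse_dollars, rmse_scaled * SCALE)
```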

In [11]:
# DNN Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.DNNRegressor(model_dir = output_dir,
                                hidden_units = [100, 50, 20],
                                feature_columns = featcols.values(),
                                optimizer = myopt,
                                dropout = 0.1)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
# 100 passes over the training data; integer division keeps the step count an int
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_train_distribute': None, '_tf_random_seed': None, '_log_step_count_steps': 100, '_session_config': None, '_evaluation_master': '', '_save_checkpoints_secs': 600, '_num_worker_replicas': 1, '_global_id_in_cluster': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f5b50652320>, '_task_id': 0, '_save_checkpoints_steps': None, '_task_type': 'worker', '_model_dir': './housing_trained', '_num_ps_replicas': 0, '_service': None, '_keep_checkpoint_every_n_hours': 10000, '_keep_checkpoint_max': 5, '_save_summary_steps': 100, '_master': '', '_is_chief': True}
INFO:tensorflow:Using config: {'_train_distribute': None, '_tf_random_seed': None, '_log_step_count_steps': 100, '_session_config': None, '_evaluation_master': '', '_save_checkpoints_secs': 600, '_num_worker_replicas': 1, '_global_id_in_cluster': 0, '_task_type': 'worker', '_task_id': 0, '_cluster_spec': <tensorflow.python.training.serve

INFO:tensorflow:global_step/sec: 227.166
INFO:tensorflow:step = 4149, loss = 82.24969 (0.441 sec)
INFO:tensorflow:global_step/sec: 215.133
INFO:tensorflow:step = 4249, loss = 76.4896 (0.462 sec)
INFO:tensorflow:global_step/sec: 273.239
INFO:tensorflow:step = 4349, loss = 81.23194 (0.366 sec)
INFO:tensorflow:global_step/sec: 250.613
INFO:tensorflow:step = 4449, loss = 31.472317 (0.399 sec)
INFO:tensorflow:global_step/sec: 262.418
INFO:tensorflow:step = 4549, loss = 53.00525 (0.381 sec)
INFO:tensorflow:global_step/sec: 243.144
INFO:tensorflow:step = 4649, loss = 52.434288 (0.412 sec)
INFO:tensorflow:global_step/sec: 216.314
INFO:tensorflow:step = 4749, loss = 59.213055 (0.462 sec)
INFO:tensorflow:global_step/sec: 229.092
INFO:tensorflow:step = 4849, loss = 52.155476 (0.437 sec)
INFO:tensorflow:global_step/sec: 219.828
INFO:tensorflow:step = 4949, loss = 35.350407 (0.454 sec)
INFO:tensorflow:global_step/sec: 249.995
INFO:tensorflow:step = 5049, loss = 33.156456 (0.400 sec)
...

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-9726
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-02-02-17:09:22
INFO:tensorflow:Saving dict for global step 9726: average_loss = 0.43431535, global_step = 9726, loss = 1464.077, rmse = 65902.61
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-9726
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 9727 into ./housing_trained/model.ckpt.
INFO:tensorflow:step = 9727, loss = 70.819176
INFO:tensorflow:global_step/sec: 168.543
INFO:tensorflow:step = 9827, loss = 50.502472 (0.598 sec)
...
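
The two evaluation lines in the logs report both `average_loss` and `rmse`. Assuming `average_loss` is the mean squared error on the scaled labels, the dollar RMSE should be `sqrt(average_loss) * SCALE`. The sketch below checks that against the logged values. Keep in mind that the two evaluations shown were taken at different global steps (4699 for the linear model, 9726 for the DNN), so the comparison is indicative rather than like-for-like.

```python
import math

SCALE = 100000

# average_loss values copied from the "Saving dict for global step" log lines.
linear_avg_loss = 0.59582      # LinearRegressor, global step 4699
dnn_avg_loss = 0.43431535      # DNNRegressor, global step 9726

# Assumption: average_loss is the mean squared error on labels / SCALE.
linear_rmse = math.sqrt(linear_avg_loss) * SCALE
dnn_rmse = math.sqrt(dnn_avg_loss) * SCALE

# Relative reduction in RMSE between the two logged evaluations.
reduction = (linear_rmse - dnn_rmse) / linear_rmse

print(round(linear_rmse, 1), round(dnn_rmse, 1), round(reduction, 3))
```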

In [12]:
from google.datalab.ml import TensorBoard
pid = TensorBoard().start(OUTDIR)

In [13]:
TensorBoard().stop(pid)