# Hand-tuning hyperparameters

**Learning Objectives:**
  * Use the `LinearRegressor` class in TensorFlow to predict median housing price, at the granularity of city blocks, based on one input feature
  * Evaluate the accuracy of a model's predictions using Root Mean Squared Error (RMSE)
  * Improve the accuracy of a model by hand-tuning its hyperparameters

The data is based on 1990 census data from California. It is recorded at the city-block level, so features such as `total_rooms` and `population` are totals for the whole block (the total number of rooms in the block, or the total number of people who live there) rather than values for a single house. Using only one input feature, the number of rooms, we will predict house value.

## Set up
In this first cell, we'll load the necessary libraries.

```python
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

# This notebook uses the TensorFlow 1.x API (tf.logging, tf.estimator.inputs, tf.train).
tf.logging.set_verbosity(tf.logging.INFO)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
```

Next, we'll load our data set.

```python
df = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")
```

## Examine the data

It's a good idea to get to know your data a little before you work with it.

We'll look at the first few rows, then print a quick summary of useful statistics for each column: the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.

```python
df.head()
```

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.3 | 34.2 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.5 | 66900.0 |
| 1 | -114.5 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8 | 80100.0 |
| 2 | -114.6 | 33.7 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.7 | 85700.0 |
| 3 | -114.6 | 33.6 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.2 | 73400.0 |
| 4 | -114.6 | 33.6 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9 | 65500.0 |


```python
df.describe()
```

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207300.9 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 115983.8 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119400.0 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180400.0 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265000.0 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500001.0 |


In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target). Can we use `total_rooms` as our input feature? What's going on with the values for that feature? (Look at its range in the summary above: from 2 to 37,937 rooms.)

Because the data is at the city-block level, `total_rooms` is the total number of rooms in the whole block, so it mixes the size of the block with the size of its houses. Let's create a different, more appropriate feature. Because we are predicting the price of a single house, we should try to make our features correspond to a single house as well: dividing `total_rooms` by `households` gives the average number of rooms per household.

```python
df['num_rooms'] = df['total_rooms'] / df['households']
df.describe()
```

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | num_rooms |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207300.9 | 5.4 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 115983.8 | 2.5 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 | 0.8 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119400.0 | 4.4 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180400.0 | 5.2 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265000.0 | 6.1 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500001.0 | 141.9 |
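The ratio puts blocks of different sizes on a common footing. As a quick check, here it is worked out for the five rows shown by `df.head()` above: the first block has nearly eight times as many rooms in total as the third (5612 vs. 720), but only about twice as many rooms per household. This plain-Python sketch just repeats the division that `df['num_rooms']` performs, using values copied from that table.

```python
# Block-level totals copied from the five rows shown by df.head() above.
total_rooms = [5612.0, 7650.0, 720.0, 1501.0, 1454.0]
households = [472.0, 463.0, 117.0, 226.0, 262.0]

# Rooms per household: the same ratio the notebook stores in df['num_rooms'].
num_rooms = [rooms / homes for rooms, homes in zip(total_rooms, households)]

for rooms, homes, ratio in zip(total_rooms, households, num_rooms):
    print(f"{rooms:7.0f} rooms / {homes:4.0f} households = {ratio:5.1f} rooms per household")
```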


## Build the first model

Our label is `median_house_value` (the target), and our single input feature is the new `num_rooms` column.

To train our model, we'll use `tf.estimator.LinearRegressor`, the core-API counterpart of the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LinearRegressor) interface in `tf.contrib.learn`. The Estimator takes care of a lot of the plumbing and exposes a convenient way to interact with data, training, and evaluation.
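Under the hood, a linear regressor with one numeric feature learns a weight `w` and a bias `b` so that the prediction is `w * num_rooms + b`, chosen to minimize squared error; the Estimator gets there iteratively with an optimizer. Purely for intuition, the NumPy sketch below solves the same least-squares problem in closed form on a tiny synthetic dataset generated from y = 3x + 2 (made-up numbers, not the housing data), so the `w` and `b` it should recover are known in advance.

```python
import numpy as np

# Synthetic, noise-free data generated from y = 3x + 2 (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 2.0

# Design matrix with a column of ones so the fit includes a bias term.
X = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of X @ [w, b] ~= y (minimizes the sum of squared errors).
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"w = {w:.3f}, b = {b:.3f}")
```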


```python
train_fn = tf.estimator.inputs.pandas_input_fn(x = df[["num_rooms"]],
                                              y = df["median_house_value"],
                                              num_epochs = 1,
                                              shuffle = True)

features = [tf.feature_column.numeric_column('num_rooms')]
outdir = './housing_trained'
shutil.rmtree(outdir, ignore_errors = True) # start fresh each time
model = tf.estimator.LinearRegressor(model_dir = outdir, feature_columns = features)
model.train(input_fn = train_fn, steps = 100)

def print_rmse(model, name, input_fn):
  # Evaluate on a single batch. 'average_loss' is the mean squared error per example
  # (in TF 1.x, 'loss' is summed over the batch), so its square root is the RMSE.
  metrics = model.evaluate(input_fn = input_fn, steps = 1)
  print('RMSE on {} dataset = {}'.format(name, np.sqrt(metrics['average_loss'])))

print_rmse(model, 'training', train_fn)
```

```
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f21487f3f10>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './housing_trained', '_save_summary_steps': 100}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./housing_trained/model.ckpt.
INFO:tensorflow:loss = 10836183000000.0, step = 1
INFO:tensorflow:Saving checkpoints for 100 into ./housing_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 5202707000000.0.
INFO:tensorflow:Starting evaluation at 2018-02-06-06:02:37
INFO:tensorflow:Restoring parameters from ./housing_trained
```

## 1. Scale the output
Let's scale the target values so that the default hyperparameters are more appropriate. Note that the RMSE is now measured in units of 100,000, so an RMSE of 0.9 really means an RMSE of 90,000.
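Because RMSE has the same units as the label, dividing the label by `SCALE` divides the RMSE by `SCALE` too. The sketch below checks this with a small plain-Python `rmse` helper (hypothetical, not part of the notebook), using the first three labels from `df.head()` and made-up predictions that are each 90,000 too high.

```python
import math

def rmse(labels, predictions):
    """Root mean squared error: sqrt of the mean of (label - prediction)**2."""
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

SCALE = 100000

# First three labels from df.head(), in dollars, paired with hypothetical
# predictions that are each 90,000 too high (illustrative only).
labels = [66900.0, 80100.0, 85700.0]
predictions = [156900.0, 170100.0, 175700.0]

rmse_dollars = rmse(labels, predictions)
rmse_scaled = rmse([y / SCALE for y in labels], [p / SCALE for p in predictions])

print(f"RMSE in dollars:         {rmse_dollars:.1f}")
print(f"RMSE on scaled labels:   {rmse_scaled:.4f}")
print(f"scaled RMSE times SCALE: {rmse_scaled * SCALE:.1f}")
```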

```python
SCALE = 100000
train_fn = tf.estimator.inputs.pandas_input_fn(x = df[["num_rooms"]],
                                              y = df["median_house_value"] / SCALE,  # note the scaling
                                              num_epochs = 1,
                                              shuffle = True)

features = [tf.feature_column.numeric_column('num_rooms')]
outdir = './housing_trained'
shutil.rmtree(outdir, ignore_errors = True) # start fresh each time
model = tf.estimator.LinearRegressor(model_dir = outdir, feature_columns = features)
model.train(input_fn = train_fn, steps = 100)
print_rmse(model, 'training', train_fn)
```

```
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f21487f3bd0>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './housing_trained', '_save_summary_steps': 100}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./housing_trained/model.ckpt.
INFO:tensorflow:loss = 357.06677, step = 1
INFO:tensorflow:Saving checkpoints for 100 into ./housing_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 73.85011.
INFO:tensorflow:Starting evaluation at 2018-02-06-06:02:50
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-10
```

## 2. Change learning rate and batch size
Can you come up with better hyperparameters? (Of course, we are tuning only on the training dataset here; normally, you would evaluate on a separate validation dataset.)
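To see what these two knobs control, here is plain mini-batch stochastic gradient descent for a one-feature linear model, written in NumPy on synthetic, noise-free data. It is an illustration only: the notebook's cell uses `tf.train.FtrlOptimizer`, a different update rule, and `train_sgd` is a hypothetical helper. The general pattern carries over, though: a smaller `learning_rate` takes smaller steps, and a larger `batch_size` means fewer updates per pass over the data, so both need more epochs (or steps) to reach the same fit.

```python
import numpy as np

def train_sgd(x, y, learning_rate, batch_size, num_epochs, seed=0):
    """Fit y ~= w * x + b with plain mini-batch SGD on the mean squared error."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(num_epochs):
        order = rng.permutation(n)  # reshuffle once per epoch, like shuffle=True
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = (w * x[idx] + b) - y[idx]
            # Gradients of mean((w*x + b - y)**2) over the batch.
            grad_w = 2.0 * np.mean(error * x[idx])
            grad_b = 2.0 * np.mean(error)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Synthetic, noise-free data from y = 0.5x + 1 (illustrative only).
x = np.linspace(0.0, 2.0, 200)
y = 0.5 * x + 1.0

for lr, bs in [(0.1, 10), (0.01, 10), (0.1, 100)]:
    w, b = train_sgd(x, y, learning_rate=lr, batch_size=bs, num_epochs=20)
    rmse = np.sqrt(np.mean((w * x + b - y) ** 2))
    print(f"learning_rate={lr:<5} batch_size={bs:<4} w={w:.3f} b={b:.3f} RMSE={rmse:.4f}")
```

With the same 20 epochs, the smaller learning rate and the larger batch size both leave a noticeably higher RMSE, because each makes less progress per epoch.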

```python
# I got RMSE = 0.528. Did you do better?
SCALE = 100000
train_fn = tf.estimator.inputs.pandas_input_fn(x = df[["num_rooms"]],
                                              y = df["median_house_value"] / SCALE,  # note the scaling
                                              num_epochs = 1,
                                              batch_size = 10, # note the batch size
                                              shuffle = True)

features = [tf.feature_column.numeric_column('num_rooms')]
outdir = './housing_trained'
shutil.rmtree(outdir, ignore_errors = True) # start fresh each time
myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
model = tf.estimator.LinearRegressor(model_dir = outdir,
                                   feature_columns = features,
                                   optimizer = myopt)
model.train(input_fn = train_fn, steps = 10000)  # note: more steps, since batch size is smaller
print_rmse(model, 'training', train_fn)
```

```
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f21347b9ad0>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './housing_trained', '_save_summary_steps': 100}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./housing_trained/model.ckpt.
INFO:tensorflow:loss = 97.87912, step = 1
INFO:tensorflow:global_step/sec: 431.079
INFO:tensorflow:loss = 9.867794, step = 101 (0.234 sec)
INFO:tensorflow:global_step/sec: 450.944
INFO:tensorflow:loss = 16.647133, step = 201 (0.221 sec)
INFO:tensorflow:global_step/sec: 487.467
INFO:tensorflow:loss =
```
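One detail to check when trading batch size against steps: `train_fn` is built with `num_epochs = 1`, so the input function runs out of data after one pass, and `model.train` should stop there even if `steps` is larger. The arithmetic below (plain Python; 17000 is the row count from `df.describe()`, and 128 is assumed to be the default `batch_size` of `pandas_input_fn` used in the earlier cells) shows how many steps one epoch actually provides.

```python
import math

NUM_ROWS = 17000  # row count reported by df.describe()

def steps_per_epoch(num_rows, batch_size):
    """Number of mini-batches in one pass over the data (the last one may be partial)."""
    return math.ceil(num_rows / batch_size)

for batch_size, requested_steps in [(128, 100), (10, 10000)]:
    available = steps_per_epoch(NUM_ROWS, batch_size)
    # With num_epochs=1, training cannot run more steps than one epoch provides.
    effective = min(requested_steps, available)
    print(f"batch_size={batch_size}: {available} steps per epoch, "
          f"steps={requested_steps} -> at most {effective} steps actually run")
```

If you want all 10000 steps with `batch_size = 10`, one option is to raise `num_epochs` (or set it to `None` so the input function repeats the data indefinitely).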

### Is there a standard method for tuning the model?

This is a commonly asked question. The short answer is that the effects of different hyperparameters are data-dependent, so there are no hard-and-fast rules; you'll need to run tests on your data.

Here are a few rules of thumb that may help guide you:

 * Training error should steadily decrease, steeply at first, and should eventually plateau as training converges.
 * If the training has not converged, try running it for longer.
 * If the training error decreases too slowly, increasing the learning rate may help it decrease faster.
   * But sometimes the exact opposite may happen if the learning rate is too high.
 * If the training error varies wildly, try decreasing the learning rate.
   * A lower learning rate combined with more steps or a larger batch size often works well.
 * Very small batch sizes can also cause instability.  First try larger values like 100 or 1000, and decrease until you see degradation.
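The learning-rate advice can be seen on the simplest possible loss. The toy sketch below (not the housing model) runs plain gradient descent on f(w) = w², whose gradient is 2w, so every step multiplies w by (1 − 2 × learning_rate). A very small rate converges slowly, a moderate one converges quickly, and a rate above 1.0 makes that factor larger than 1 in magnitude, so the loss grows while w flips sign on every step: the "varies wildly" symptom from the list above.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # the gradient of w**2 is 2w
        losses.append(w ** 2)
    return losses

for lr in (0.01, 0.1, 1.1):
    losses = gradient_descent(lr)
    print(f"learning_rate={lr}: loss after 1 step = {losses[0]:.4g}, "
          f"after 20 steps = {losses[-1]:.4g}")
```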

Again, never go strictly by these rules of thumb, because the effects are data dependent.  Always experiment and verify.

## 3. Try adding more features

See if you can do any better by adding more features.

Don't take more than 5 minutes on this portion.
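Whether a particular extra feature helps is an empirical question, but the mechanics are easy to see with a least-squares fit. In the NumPy sketch below (synthetic data; `x1`, `x2`, and `fit_rmse` are hypothetical names), the label depends on two inputs: a model that sees only `x1` is left with error it cannot remove, while one that also sees `x2` fits the data exactly. With the Estimator, the analogous change is listing more columns in `x = df[[...]]` for `pandas_input_fn` and adding a matching `tf.feature_column.numeric_column` for each one to `features`.

```python
import numpy as np

def fit_rmse(features, y):
    """Least-squares fit of y on the given feature columns plus a bias; return training RMSE."""
    X = np.column_stack(features + [np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sqrt(np.mean((X @ coef - y) ** 2))

# Synthetic inputs (illustrative only): x2 is not a linear function of x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
y = 2.0 * x1 + 3.0 * x2 + 1.0

print("RMSE with x1 only:  ", fit_rmse([x1], y))
print("RMSE with x1 and x2:", fit_rmse([x1, x2], y))
```

For an exact least-squares fit, training RMSE never increases when a feature is added, so judge any new feature on a separate validation dataset rather than on the training RMSE alone.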