# DNN Regression Example

#### Note: This data contains information about all the block groups in California from the 1990 Census. In this sample, a block group on average includes 1425.5 individuals living in a geographically compact area. The task is to approximate the median house value of each block group from the values of the remaining variables.

Data was obtained from the LIACC repository. The original page where the data can be found is:
http://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html

Features:
- housingMedianAge: continuous
- totalRooms: continuous
- totalBedrooms: continuous
- population: continuous
- households: continuous
- medianIncome: continuous
- medianHouseValue: continuous (prediction target)

### Data Input

In [57]:
import pandas as pd

In [58]:
housing_data = pd.read_csv('cal_housing_clean.csv')

In [59]:
# Check dataframe input
housing_data.head()

| | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|
| 0 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |
| 1 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |
| 2 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |
| 3 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |
| 4 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |


### Train/Test Split Data

#### Note: 70% training and 30% testing split

In [60]:
# Initialize the prediction (label) column
y_val = housing_data['medianHouseValue']

In [61]:
# Remove the prediction column from the feature data
x_data = housing_data.drop('medianHouseValue', axis = 1)

In [62]:
from sklearn.model_selection import train_test_split

In [63]:
X_train, X_test, y_train, y_test = train_test_split(x_data, y_val, test_size = 0.3, random_state = 101)
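
As a rough illustration of what a 70/30 split means for this dataset, the sketch below shuffles row indices with a fixed seed and cuts them into two lists. It uses only the standard library, `split_indices` is a hypothetical helper, and the rounding rule is an assumption. It does not reproduce the exact permutation that `train_test_split` generates for `random_state = 101`, only the resulting sizes.

```python
import math
import random

def split_indices(n_samples, test_size=0.3, seed=101):
    """Shuffle row indices and split them into train and test lists."""
    # Assumption: the test set size is rounded up to a whole number of rows
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    return indices[n_test:], indices[:n_test]

# 20640 is the row count reported by housing_data.describe()
train_idx, test_idx = split_indices(20640)
print(len(train_idx), len(test_idx))  # 14448 6192
```

Fixing the seed matters here because every later number in the notebook (scaler statistics, loss values, RMSE) depends on which rows end up in the training set.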

### Scale the Feature Data

In [64]:
from sklearn.preprocessing import MinMaxScaler

In [65]:
scaler = MinMaxScaler()

In [66]:
scaler.fit(X_train)

MinMaxScaler(copy=True, feature_range=(0, 1))

In [67]:
# Reset to scaled version using dataframe format
X_train = pd.DataFrame(data = scaler.transform(X_train),
                      columns = X_train.columns,
                      index = X_train.index)

In [68]:
X_test = pd.DataFrame(data = scaler.transform(X_test),
                     columns = X_test.columns,
                     index = X_test.index)
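
The scaler is fitted on the training data only and then applied to both sets. A minimal pure-Python sketch of that idea follows. `fit_min_max` and `transform_min_max` are hypothetical helpers, not scikit-learn functions. The training rows reuse the first three values shown in the dataframe preview, and the test row is made up to show that test values can land outside the 0 to 1 range.

```python
def fit_min_max(rows):
    """Return per-column (min, max) computed from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def transform_min_max(rows, stats):
    """Scale each value to (x - min) / (max - min) using the fitted stats."""
    scaled = []
    for row in rows:
        scaled.append([
            # Assumption: a constant column is mapped to 0.0 to avoid dividing by zero
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, stats)
        ])
    return scaled

# Columns: (housingMedianAge, totalRooms)
train_rows = [[41.0, 880.0], [21.0, 7099.0], [52.0, 1467.0]]
test_rows = [[60.0, 500.0]]  # hypothetical row outside the training range

stats = fit_min_max(train_rows)                     # learned from training data only
train_scaled = transform_min_max(train_rows, stats)
test_scaled = transform_min_max(test_rows, stats)   # reuses the training min/max

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics on the test set keeps information about the test rows out of the preprocessing step, which is why `scaler.fit` is called on `X_train` alone.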

### Create Feature Columns

In [69]:
housing_data.columns

Index(['housingMedianAge', 'totalRooms', 'totalBedrooms', 'population',
       'households', 'medianIncome', 'medianHouseValue'],
      dtype='object')

In [70]:
import tensorflow as tf

In [71]:
age = tf.feature_column.numeric_column('housingMedianAge')
rooms = tf.feature_column.numeric_column('totalRooms')
bedrooms = tf.feature_column.numeric_column('totalBedrooms')
population = tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
income = tf.feature_column.numeric_column('medianIncome')

In [72]:
feat_cols = [age, rooms, bedrooms, population, households, income]

### Create Input Function for Estimator Object
#### Note: Alter batch size and epochs if needed

In [73]:
input_func = tf.estimator.inputs.pandas_input_fn(x = X_train, 
                                                y = y_train,
                                                batch_size = 10,
                                                num_epochs = 1000,
                                                shuffle = True)
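
To see how `batch_size` and `num_epochs` interact with the `steps` argument used when the model is trained later in this notebook, a short back-of-the-envelope calculation may help. The training row count below is an assumption derived from the 70% split of the 20,640 rows reported by `describe()`.

```python
n_train_rows = 14448   # assumed: 70% of the 20,640 rows in the dataset
batch_size = 10        # from the input function above
num_epochs = 1000      # from the input function above
train_steps = 20000    # value passed to model.train later in the notebook

# Rows the input function can supply before it is exhausted
rows_available = n_train_rows * num_epochs

# Rows actually consumed when training stops after train_steps batches
rows_consumed = train_steps * batch_size

# Number of passes over the training data that those rows correspond to
epochs_seen = rows_consumed / n_train_rows

print(rows_available)
print(rows_consumed)
print(round(epochs_seen, 1))
```

Under these assumptions the step limit ends training long before the input function runs out of data, so `num_epochs = 1000` acts as an upper bound rather than the effective training length.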

### Create DNN Regression Model
#### Note: Alter hidden units if needed

In [74]:
# 3 hidden layers w/ 6 neurons each
model = tf.estimator.DNNRegressor(hidden_units = [6,6,6], feature_columns = feat_cols)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_session_config': None, '_model_dir': '/var/folders/yp/0yy3s27j0q5f2frjp3k88yr80000gn/T/tmp1uo_ug7v', '_tf_random_seed': 1, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_secs': 600, '_keep_checkpoint_max': 5, '_save_summary_steps': 100, '_save_checkpoints_steps': None}
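
To get a feel for how small this network is, the sketch below counts trainable parameters for a fully connected network with six inputs, three hidden layers of six units, and one output. `count_dense_parameters` is a hypothetical helper. It assumes every layer is dense with a bias term, and it ignores any extra variables the optimizer may keep.

```python
def count_dense_parameters(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases in a plain fully connected network."""
    layer_sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total

# Six feature columns, hidden_units = [6, 6, 6], one regression output
n_params = count_dense_parameters(n_inputs=6, hidden_units=[6, 6, 6])
print(n_params)  # 133
```

Changing `hidden_units` to something like `[10, 10, 10]` in the same helper shows how quickly the count grows when experimenting with wider layers.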


#### Note: Training steps can be increased for further improvement

In [75]:
model.train(input_fn = input_func, steps = 20000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/yp/0yy3s27j0q5f2frjp3k88yr80000gn/T/tmp1uo_ug7v/model.ckpt.
INFO:tensorflow:loss = 4.52518e+11, step = 1
INFO:tensorflow:global_step/sec: 890.989
INFO:tensorflow:loss = 6.66012e+11, step = 101 (0.113 sec)
INFO:tensorflow:global_step/sec: 911.692
INFO:tensorflow:loss = 8.70011e+11, step = 201 (0.110 sec)
INFO:tensorflow:global_step/sec: 911.851
INFO:tensorflow:loss = 4.1281e+11, step = 301 (0.110 sec)
INFO:tensorflow:global_step/sec: 851.558
INFO:tensorflow:loss = 4.18364e+11, step = 401 (0.117 sec)
INFO:tensorflow:global_step/sec: 879.848
INFO:tensorflow:loss = 5.90268e+11, step = 501 (0.114 sec)
INFO:tensorflow:global_step/sec: 950.759
INFO:tensorflow:loss = 1.65518e+11, step = 601 (0.105 sec)
INFO:tensorflow:global_step/sec: 903.172
INFO:tensorflow:loss = 2.87331e+11, step = 701 (0.111 sec)
INFO:tensorflow:global_step/sec: 897.481
INFO:tensorflow:loss = 5.23003e+11, step = 801 (0.11

INFO:tensorflow:loss = 4.88424e+10, step = 8001 (0.097 sec)
INFO:tensorflow:global_step/sec: 992.349
INFO:tensorflow:loss = 1.44787e+11, step = 8101 (0.101 sec)
INFO:tensorflow:global_step/sec: 939.603
INFO:tensorflow:loss = 1.01882e+11, step = 8201 (0.106 sec)
INFO:tensorflow:global_step/sec: 973.292
INFO:tensorflow:loss = 1.08853e+11, step = 8301 (0.102 sec)
INFO:tensorflow:global_step/sec: 986.475
INFO:tensorflow:loss = 5.4604e+10, step = 8401 (0.102 sec)
INFO:tensorflow:global_step/sec: 983.332
INFO:tensorflow:loss = 1.27446e+11, step = 8501 (0.102 sec)
INFO:tensorflow:global_step/sec: 988.593
INFO:tensorflow:loss = 5.97934e+10, step = 8601 (0.101 sec)
INFO:tensorflow:global_step/sec: 999.27
INFO:tensorflow:loss = 5.48318e+10, step = 8701 (0.100 sec)
INFO:tensorflow:global_step/sec: 942.508
INFO:tensorflow:loss = 8.90457e+10, step = 8801 (0.106 sec)
INFO:tensorflow:global_step/sec: 985.288
INFO:tensorflow:loss = 4.23766e+10, step = 8901 (0.101 sec)
INFO:tensorflow:global_step/sec: 

INFO:tensorflow:loss = 6.52513e+10, step = 16101 (0.111 sec)
INFO:tensorflow:global_step/sec: 1018.69
INFO:tensorflow:loss = 7.85111e+10, step = 16201 (0.098 sec)
INFO:tensorflow:global_step/sec: 981.431
INFO:tensorflow:loss = 1.02372e+11, step = 16301 (0.102 sec)
INFO:tensorflow:global_step/sec: 985.008
INFO:tensorflow:loss = 4.59429e+10, step = 16401 (0.101 sec)
INFO:tensorflow:global_step/sec: 1025.21
INFO:tensorflow:loss = 5.53386e+10, step = 16501 (0.097 sec)
INFO:tensorflow:global_step/sec: 951.601
INFO:tensorflow:loss = 5.99645e+10, step = 16601 (0.106 sec)
INFO:tensorflow:global_step/sec: 941.07
INFO:tensorflow:loss = 5.31319e+10, step = 16701 (0.105 sec)
INFO:tensorflow:global_step/sec: 1014.79
INFO:tensorflow:loss = 8.97756e+10, step = 16801 (0.099 sec)
INFO:tensorflow:global_step/sec: 1020.76
INFO:tensorflow:loss = 1.07401e+11, step = 16901 (0.098 sec)
INFO:tensorflow:global_step/sec: 977.001
INFO:tensorflow:loss = 1.38021e+11, step = 17001 (0.103 sec)
INFO:tensorflow:global

<tensorflow.python.estimator.canned.dnn.DNNRegressor at 0x10f5a8828>

### Create Prediction Input Function and Prediction List for Testing

In [81]:
predict_input_func = tf.estimator.inputs.pandas_input_fn(x = X_test,
                                                        batch_size = 10,
                                                        num_epochs = 1,
                                                        shuffle = False)

In [82]:
pred_gen = model.predict(predict_input_func)

In [83]:
predictions = list(pred_gen)

INFO:tensorflow:Restoring parameters from /var/folders/yp/0yy3s27j0q5f2frjp3k88yr80000gn/T/tmp1uo_ug7v/model.ckpt-20000


In [84]:
predictions

[{'predictions': array([ 237465.40625], dtype=float32)},
 {'predictions': array([ 311020.1875], dtype=float32)},
 {'predictions': array([ 213563.15625], dtype=float32)},
 {'predictions': array([ 188354.75], dtype=float32)},
 {'predictions': array([ 280090.9375], dtype=float32)},
 {'predictions': array([ 200541.765625], dtype=float32)},
 {'predictions': array([ 225655.0625], dtype=float32)},
 {'predictions': array([ 206938.328125], dtype=float32)},
 {'predictions': array([ 220092.984375], dtype=float32)},
 {'predictions': array([ 197388.53125], dtype=float32)},
 {'predictions': array([ 206271.1875], dtype=float32)},
 {'predictions': array([ 224631.53125], dtype=float32)},
 {'predictions': array([ 193763.109375], dtype=float32)},
 {'predictions': array([ 180074.453125], dtype=float32)},
 {'predictions': array([ 260124.4375], dtype=float32)},
 {'predictions': array([ 180516.53125], dtype=float32)},
 {'predictions': array([ 200643.34375], dtype=float32)},
 {'predictions': array([ 190507.98

### Calculate RMSE

In [85]:
final_preds = []

for pred in predictions:
    final_preds.append(pred['predictions'])

In [86]:
from sklearn.metrics import mean_squared_error

In [87]:
mean_squared_error(y_test, final_preds)**0.5

99140.673592170046

The RMSE is roughly $100,000.
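
For reference, the root mean squared error is the square root of the average squared difference between true and predicted values. The sketch below computes it by hand with the standard library. `rmse` is a hypothetical helper, and the demo values are illustrative only, not taken from the notebook's test set.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative house values in dollars (made up for this example)
y_true_demo = [200000.0, 300000.0, 100000.0]
y_pred_demo = [210000.0, 280000.0, 130000.0]

print(rmse(y_true_demo, y_pred_demo))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result stays in the same units as the target (dollars here).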

In [88]:
# Compare the RMSE with the scale of medianHouseValue
housing_data.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| housingMedianAge | 20640.0 | 28.639486 | 12.585558 | 1.0 | 18.0 | 29.0 | 37.0 | 52.0 |
| totalRooms | 20640.0 | 2635.763081 | 2181.615252 | 2.0 | 1447.75 | 2127.0 | 3148.0 | 39320.0 |
| totalBedrooms | 20640.0 | 537.898014 | 421.247906 | 1.0 | 295.0 | 435.0 | 647.0 | 6445.0 |
| population | 20640.0 | 1425.476744 | 1132.462122 | 3.0 | 787.0 | 1166.0 | 1725.0 | 35682.0 |
| households | 20640.0 | 499.53968 | 382.329753 | 1.0 | 280.0 | 409.0 | 605.0 | 6082.0 |
| medianIncome | 20640.0 | 3.870671 | 1.899822 | 0.4999 | 2.5634 | 3.5348 | 4.74325 | 15.0001 |
| medianHouseValue | 20640.0 | 206855.816909 | 115395.615874 | 14999.0 | 119600.0 | 179700.0 | 264725.0 | 500001.0 |
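
One way to read the comparison is to set the RMSE against the spread of `medianHouseValue`. A model that always predicts the mean would have an RMSE close to the standard deviation of the target. The sketch below uses the figures reported in this notebook. Note that the standard deviation comes from the full dataset rather than the test set, so the derived ratios are rough estimates, not exact test-set statistics.

```python
rmse_model = 99140.673592170046    # RMSE reported earlier in this notebook
std_house_value = 115395.615874    # std of medianHouseValue from describe()
mean_house_value = 206855.816909   # mean of medianHouseValue from describe()

# RMSE as a fraction of the target's spread and of its typical size
relative_to_std = rmse_model / std_house_value
relative_to_mean = rmse_model / mean_house_value

# Rough share of variance explained, assuming the test set has a similar spread
approx_r2 = 1.0 - relative_to_std ** 2

print(round(relative_to_std, 3))
print(round(relative_to_mean, 3))
print(round(approx_r2, 3))
```

Under these assumptions the model does better than a constant-mean baseline, but only modestly, which is consistent with the notes above suggesting more training steps or different hidden units.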
