# Median House Value Assessment Activity

This California Housing Prices dataset was downloaded from the StatLib repository (http://lib.stat.cmu.edu/datasets/). It is based on data from the 1990 California census; the age of the data is not important for this deep learning exercise. The original dataset appeared in R. Kelley Pace and Ronald Barry, “Sparse Spatial Autoregressions,” Statistics & Probability Letters 33, no. 3 (1997): 291–297.

<b>MedianHouseValuePreparedCleanAttributes.csv</b><br>The original dataset contained 20,640 instances, which were cleaned, preprocessed and prepared in this notebook. After this data preparation phase, the final dataset contains 20,433 instances with 8 attributes, each normalized individually with min-max scaling, $\frac{x-\min}{\max-\min}$ (InputsMedianHouseValueNormalized.csv): $longitude$ and $latitude$ (location), $median\ age$, $total\ rooms$, $total\ bedrooms$, $population$, $households$ and $median\ income$.  
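The min-max scaling mentioned above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `min_max_scale` and the toy values are made up, and the handling of constant columns is a choice made here, not something taken from the preparation notebook.

```python
import numpy as np

def min_max_scale(columns):
    """Scale each column of a 2-D array to [0, 1] independently."""
    columns = np.asarray(columns, dtype=np.float64)
    col_min = columns.min(axis=0)
    col_max = columns.max(axis=0)
    span = col_max - col_min
    # Assumption: a constant column would divide by zero, so map it to 0.
    span[span == 0] = 1.0
    return (columns - col_min) / span

# Toy data (made up): three instances, two attributes on different scales.
toy = np.array([[1.0, 100.0],
                [2.0, 300.0],
                [3.0, 200.0]])
scaled = min_max_scale(toy)
print(scaled)
```

Each attribute is scaled with its own minimum and maximum, so attributes with very different ranges (for example $population$ and $median\ income$) end up on the same $[0, 1]$ scale.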

From this data, the classification problem consists of estimating the median house value, categorized into the following 10 classes (price intervals in thousands of dollars): [15.0, 82.3], [82.4, 107.3], [107.4, 133.9], [134.0, 157.3], [157.4, 179.7], [179.8, 209.4], [209.5, 241.9], [242.0, 290.0], [290.1, 376.6] and [376.7, 500.0]. Each class is labelled from 0 (the cheapest) to 9 (the most expensive) and one-hot encoded in the <b>MedianHouseValueOneHotEncodedClasses.csv</b> file.
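As a rough illustration of the labelling described above, the sketch below maps prices (in thousands of dollars) to class indices using the upper bound of each interval, then one-hot encodes them. The names `UPPER_BOUNDS`, `to_class_labels` and `one_hot` are hypothetical, and this is not necessarily how the CSV file was produced. A value falling in a gap between two listed intervals (for example 82.35) is assigned to the higher class here, which is an assumption.

```python
import numpy as np

# Upper bounds of classes 0..8, taken from the intervals listed above;
# class 9 collects everything above the last bound.
UPPER_BOUNDS = np.array([82.3, 107.3, 133.9, 157.3, 179.7,
                         209.4, 241.9, 290.0, 376.6])
NUM_CLASSES = 10

def to_class_labels(values_in_thousands):
    """Return the class index (0 = cheapest, 9 = most expensive)."""
    # right=True makes each upper bound inclusive.
    return np.digitize(values_in_thousands, UPPER_BOUNDS, right=True)

def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer labels into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[labels]

# Made-up prices for illustration.
values = np.array([15.0, 82.3, 82.4, 200.0, 500.0])
labels = to_class_labels(values)
encoded = one_hot(labels)
print(labels)
print(encoded.shape)
```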

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from tqdm import tqdm

In [2]:
%run 1.ReadingData.py

x_train: (16346, 8)
t_train: (16346, 10)
x_dev: (2043, 8)
t_dev: (2043, 10)
x_test: (2044, 8)
t_test: (2044, 10)
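The contents of `1.ReadingData.py` are not shown here. The shapes above are consistent with an 80/10/10 split of the 20,433 instances, so the following is a minimal sketch of such a split, assuming a random shuffle with a fixed seed. The function name `split_80_10_10` and the placeholder arrays are hypothetical.

```python
import numpy as np

def split_80_10_10(x, t, seed=0):
    """Shuffle the rows and split them into train, dev and test parts."""
    n = x.shape[0]
    order = np.random.RandomState(seed).permutation(n)
    n_train = int(0.8 * n)
    n_dev = int(0.1 * n)
    train = order[:n_train]
    dev = order[n_train:n_train + n_dev]
    test = order[n_train + n_dev:]       # the remainder goes to the test set
    return (x[train], t[train]), (x[dev], t[dev]), (x[test], t[test])

# Placeholder arrays with the dataset's dimensions (contents are zeros).
x_all = np.zeros((20433, 8), dtype=np.float32)
t_all = np.zeros((20433, 10), dtype=np.float32)
(x_tr, t_tr), (x_dv, t_dv), (x_te, t_te) = split_80_10_10(x_all, t_all)
print(x_tr.shape, x_dv.shape, x_te.shape)
```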


## Initialization

In [3]:
INPUTS = x_train.shape[1]
OUTPUTS = t_train.shape[1]
# Dividing by 1 keeps every example; a larger divisor would use only a fraction.
NUM_TRAINING_EXAMPLES = int(round(x_train.shape[0]/1))
NUM_DEV_EXAMPLES = int(round(x_dev.shape[0]/1))
NUM_TEST_EXAMPLES = int(round(x_test.shape[0]/1))

Some values are displayed to check that the data was loaded correctly:

In [None]:
INPUTS #Should be 8

In [None]:
OUTPUTS #Should be 10

In [None]:
NUM_TRAINING_EXAMPLES #16346

In [None]:
NUM_DEV_EXAMPLES #2043

In [None]:
NUM_TEST_EXAMPLES #2044

In [None]:
x_train[:5]

In [None]:
t_train[:5]

In [None]:
x_dev[:5]

In [None]:
t_dev[:5]

In [None]:
x_test[:5]

In [None]:
t_test[:5]

In [4]:
print(type(x_train))
print('Training shape:', x_train.shape)
print('Training samples: ', x_train.shape[0])
print('Test samples: ', x_test.shape[0])

<class 'numpy.ndarray'>
Training shape: (16346, 8)
Training samples:  16346
Test samples:  2044


## Hyperparameters

Some example hyperparameters (they are not necessarily the best choice):

In [5]:
n_epochs = 5000  # each iteration of the training loop below is one mini-batch update
learning_rate = 0.1
batch_size = 200
n_neurons_per_layer = [150,80,40,10] 

In [6]:
X = tf.placeholder (dtype=tf.float32, shape=(None,INPUTS),name="X")
t = tf.placeholder (dtype=tf.float32, shape=(None,OUTPUTS), name="t")

In [7]:
hidden_layers = []
hidden_layers.append(tf.layers.dense (X, n_neurons_per_layer[0], 
                                      activation = tf.nn.relu))
for layer in range(1,len(n_neurons_per_layer)):
    hidden_layers.append(tf.layers.dense (hidden_layers[layer-1], 
                 n_neurons_per_layer[layer], activation = tf.nn.relu))
net_out = tf.layers.dense (hidden_layers[len(n_neurons_per_layer)-1], OUTPUTS)
y = tf.nn.softmax (logits=net_out, name="y")

In [8]:
for layer in range(len(n_neurons_per_layer)): print (hidden_layers[layer])

Tensor("dense/Relu:0", shape=(?, 150), dtype=float32)
Tensor("dense_1/Relu:0", shape=(?, 80), dtype=float32)
Tensor("dense_2/Relu:0", shape=(?, 40), dtype=float32)
Tensor("dense_3/Relu:0", shape=(?, 10), dtype=float32)
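To make the architecture concrete, here is a NumPy sketch of the same forward pass: four ReLU hidden layers (150, 80, 40 and 10 units), a linear output layer of 10 logits, and a softmax on top. The weights are random placeholders rather than trained values, and the helper names (`relu`, `softmax`, `forward`) are introduced only for this illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(x, weights, biases):
    """Hidden layers use ReLU; the last layer returns raw logits."""
    activation = x
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = relu(activation @ w + b)
    return activation @ weights[-1] + biases[-1]

# Layer sizes mirror the notebook: 8 inputs, four hidden layers, 10 outputs.
sizes = [8, 150, 80, 40, 10, 10]
rng = np.random.RandomState(0)
# Assumption: small random weights stand in for the trained parameters.
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

batch = rng.uniform(size=(5, 8))   # five made-up normalized instances
logits = forward(batch, weights, biases)
probabilities = softmax(logits)
print(probabilities.shape)
```

Note that the last hidden layer has only 10 units, the same width as the output layer, so it acts as a narrow bottleneck right before the logits.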


In [9]:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2 (labels=t, logits=net_out)
mean_log_loss = tf.reduce_mean (cross_entropy, name="cost")
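For reference, the loss above is the mean over the batch of $-\sum_k t_k \log y_k$, where $y$ is the softmax of the logits. A NumPy sketch of that computation follows; the function name `mean_cross_entropy` and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def mean_cross_entropy(labels, logits):
    """Mean softmax cross-entropy, computed from raw logits."""
    # Log-softmax computed in a numerically stable way.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(labels * log_softmax).sum(axis=1)
    return per_example.mean()

# Two made-up one-hot targets (classes 3 and 7) and uniform logits.
labels = np.eye(10)[[3, 7]]
uniform_logits = np.zeros((2, 10))
loss = mean_cross_entropy(labels, uniform_logits)
print(loss)
```

With uniform logits every class gets probability $1/10$, so the loss equals $\ln 10 \approx 2.30$. That is a useful baseline: a network that has learned anything should score below it.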

In [13]:
train_step = tf.train.GradientDescentOptimizer (learning_rate).minimize(mean_log_loss)

In [14]:
correct_predictions = tf.equal(tf.argmax(y,1),tf.argmax(t,1))
accuracy = tf.reduce_mean(tf.cast(correct_predictions,tf.float32))
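The accuracy node compares the index of the largest predicted probability with the index of the 1 in the one-hot target. A NumPy equivalent might look like this (the name `accuracy_from_one_hot` and the toy arrays are hypothetical):

```python
import numpy as np

def accuracy_from_one_hot(predictions, targets):
    """Fraction of rows whose argmax matches the one-hot target."""
    hits = predictions.argmax(axis=1) == targets.argmax(axis=1)
    return hits.mean()

# Made-up example with 3 classes and 4 instances.
targets = np.eye(3)[[0, 1, 2, 1]]
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.6, 0.3],
                        [0.5, 0.3, 0.2],   # wrong: argmax is 0, target is 2
                        [0.2, 0.5, 0.3]])
acc = accuracy_from_one_hot(predictions, targets)
print(acc)
```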

In [15]:
initial_learning_rate = 0.1
decay_steps = 10000
decay_rate = 0.96
global_step = tf.Variable(0, trainable=False, name="global_step")
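learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step, decay_steps,
                                           decay_rate)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
# Passing global_step makes minimize() increment it on every update, which
# drives the decay schedule. This train_step replaces the earlier one.
train_step = optimizer.minimize(mean_log_loss, global_step=global_step)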
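With its default settings (no staircase), `tf.train.exponential_decay` computes `initial_learning_rate * decay_rate ** (global_step / decay_steps)`. The plain-Python sketch below evaluates that formula for the values used in this cell; the function name is chosen here for illustration.

```python
def exponential_decay(initial_learning_rate, step, decay_steps, decay_rate):
    """Continuous (non-staircase) exponential decay."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)

lr_start = exponential_decay(0.1, 0, 10000, 0.96)
lr_after_training = exponential_decay(0.1, 5000, 10000, 0.96)
lr_full_period = exponential_decay(0.1, 10000, 10000, 0.96)
print(lr_start, lr_after_training, lr_full_period)
```

Since training runs for 5000 updates and `decay_steps` is 10000, the learning rate only falls from 0.1 to roughly 0.098 over the whole run. A smaller `decay_steps` or `decay_rate` would be needed for the schedule to have a noticeable effect.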

In [16]:
init = tf.global_variables_initializer()
accuracy_train_history = []
with tf.Session() as sess:
    sess.run(init)
    for epoch in tqdm(range(n_epochs)):
        offset = (epoch * batch_size) % (NUM_TRAINING_EXAMPLES - batch_size)
        sess.run (train_step, feed_dict={X: x_train[offset:(offset+batch_size)],
                                         t: t_train[offset:(offset+batch_size)]})
            
    accuracy_test = accuracy.eval(feed_dict={X: x_test[:NUM_TEST_EXAMPLES],
                                              t: t_test[:NUM_TEST_EXAMPLES]})
    test_predictions = y.eval(feed_dict={X: x_test[:NUM_TEST_EXAMPLES]})
    
    test_correct_preditions = correct_predictions.eval (feed_dict=
                                    {X: x_test[:NUM_TEST_EXAMPLES],
                                     t: t_test[:NUM_TEST_EXAMPLES]}
    )   


100%|█████████████████████████████████████████████████████████████████████████████| 5000/5000 [00:08<00:00, 585.89it/s]
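The training loop takes one mini-batch per iteration and picks it with a sliding offset that wraps around near the end of the training set. The sketch below reproduces only that offset arithmetic, using the values from this notebook; `batch_offsets` is a hypothetical helper.

```python
def batch_offsets(n_steps, batch_size, n_examples):
    """Start index of the mini-batch used at each training step."""
    return [(step * batch_size) % (n_examples - batch_size)
            for step in range(n_steps)]

offsets = batch_offsets(n_steps=5000, batch_size=200, n_examples=16346)
first = offsets[:3]      # consecutive, non-overlapping batches at the start
wrapped = offsets[81]    # 81 * 200 = 16200, and 16200 % 16146 = 54
# Number of passes over the training set covered by 5000 updates.
passes = 5000 * 200 / 16346
print(first, wrapped, round(passes, 1))
```

Because the modulus is `NUM_TRAINING_EXAMPLES - batch_size`, every slice has exactly `batch_size` rows. The data is never reshuffled, however, so the batches are visited in a fixed order; reshuffling once per pass is a common alternative.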


In [18]:
"Accuracy for the TEST set: " + str(accuracy_test)

'Accuracy for the TEST set: 0.3483366'

In [None]:
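from tensorflow.keras.callbacks import TensorBoard  # Keras callback, missing from the imports above
tensor_board = TensorBoard(log_dir='./logs')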

In [None]:
test_predictions

In [None]:
test_rounded_predictions=np.round(test_predictions)
indices = np.argmax(test_predictions,1)
for row, index in zip(test_rounded_predictions, indices): row[index]=1
test_rounded_predictions[:10]
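The cell above turns each row of probabilities into a one-hot row by rounding and then forcing a 1 at the argmax. A slightly more direct variant, sketched here with made-up probabilities and a hypothetical helper name, starts from zeros and sets only the argmax position. This guarantees exactly one 1 per row, even in a two-way tie at exactly 0.5.

```python
import numpy as np

def one_hot_argmax(probabilities):
    """Return a one-hot row per instance with a 1 at the most likely class."""
    probabilities = np.asarray(probabilities)
    one_hot = np.zeros_like(probabilities)
    rows = np.arange(probabilities.shape[0])
    one_hot[rows, probabilities.argmax(axis=1)] = 1
    return one_hot

# Made-up predictions for 3 classes; no probability exceeds 0.5 in row 2.
example = np.array([[0.8, 0.1, 0.1],
                    [0.3, 0.4, 0.3]])
hard_predictions = one_hot_argmax(example)
print(hard_predictions)
```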

In [None]:
test_correct_preditions[:10]