# Median House Value Assessment Activity

This California Housing Prices dataset was downloaded from the StatLib repository (http://lib.stat.cmu.edu/datasets/). It is based on data from the 1990 California census; the age of the data is not important for this deep learning exercise. The original dataset appeared in R. Kelley Pace and Ronald Barry, “Sparse Spatial Autoregressions,” Statistics & Probability Letters 33, no. 3 (1997): 291–297.

<b>MedianHouseValuePreparedCleanAttributes.csv</b><br>The original dataset contained 20,640 instances, which are cleaned, preprocessed and prepared in this notebook. After this data preparation phase, a final dataset of 20,433 instances is obtained, with 8 attributes individually normalized by min-max scaling, $\frac{x-min}{max-min}$ (InputsMedianHouseValueNormalized.csv): $longitude$ and $latitude$ (location), $median age$, $total rooms$, $total bedrooms$, $population$, $households$ and $median income$.
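
The min-max formula above can be illustrated on a single column. This is a minimal sketch in plain Python, not the preparation code actually used for the CSV files; the helper name `min_max_scale` and the sample values are assumptions made for illustration.

```python
def min_max_scale(column):
    """Scale a list of numbers to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical median-age values, not taken from the dataset.
median_age = [2.0, 15.0, 28.0, 52.0]
scaled = min_max_scale(median_age)
print(scaled)
```

Each attribute is scaled on its own, so the smallest value of every column maps to 0 and the largest to 1.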

From this data, the classification problem consists of estimating the median house value, categorized into the following 10 classes (price intervals in thousands of dollars): [15.0, 82.3], [82.4, 107.3], [107.4, 133.9], [134.0, 157.3], [157.4, 179.7], [179.8, 209.4], [209.5, 241.9], [242.0, 290.0], [290.1, 376.6] and [376.7, 500.0]. Each class is labelled from 0 (the cheapest) to 9 (the most expensive) and one-hot encoded in the <b>MedianHouseValueOneHotEncodedClasses.csv</b> file.
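
As an illustration of the labelling described above, the following sketch maps a median house value (in thousands of dollars) to its class index and one-hot vector. It is only one possible way to do it: `UPPER_BOUNDS`, `price_class` and `one_hot` are hypothetical names, and the notebook's actual encoding code is not shown here.

```python
from bisect import bisect_left

# Upper bounds (in thousands of dollars) of the first nine intervals listed above;
# anything above the last bound falls into class 9.
UPPER_BOUNDS = [82.3, 107.3, 133.9, 157.3, 179.7, 209.4, 241.9, 290.0, 376.6]
NUM_CLASSES = 10


def price_class(value_in_thousands):
    """Return the class index (0 to 9) for a median house value."""
    return bisect_left(UPPER_BOUNDS, value_in_thousands)


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


label = price_class(200.0)  # 200.0 lies in the interval [179.8, 209.4]
encoded = one_hot(label)
print(label, encoded)
```

In this sketch, a value that falls in the small gap between two listed intervals (for example 82.35) is assigned to the higher class; that choice is an assumption.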

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from tqdm import tqdm

In [None]:
%run 1.ReadingData.py

## Initialization

In [None]:
INPUTS = x_train.shape[1]
OUTPUTS = t_train.shape[1]
NUM_TRAINING_EXAMPLES = int(round(x_train.shape[0]/1))
NUM_DEV_EXAMPLES = int(round(x_dev.shape[0]/1))
NUM_TEST_EXAMPLES = int(round(x_test.shape[0]/1))


# Network Parameters
n_hidden_1 = 256 # 1st layer number of neurons
n_hidden_2 = 256 # 2nd layer number of neurons
n_input = x_train.shape[1] # data input
n_classes = t_train.shape[1] # total classes (10 price intervals)

Some values are displayed to check their correctness:

In [None]:
INPUTS #Should be 8

In [None]:
OUTPUTS #Should be 10

In [None]:
NUM_TRAINING_EXAMPLES #16346

In [None]:
NUM_DEV_EXAMPLES #2043

In [None]:
NUM_TEST_EXAMPLES #2044

## Hyperparameters

Some hyperparameters are given as an example (they may not be the best ones):

In [None]:
# Hyper-Parameters
learning_rate = 0.001
training_epochs = 150
batch_size = 32
display_step = 1
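
To see what these values imply for one epoch, the sketch below counts the mini-batches, assuming the training set has the expected 16,346 examples and that batches are taken as consecutive slices of `batch_size` rows. The names `leftover` and `offsets` are introduced only for this illustration.

```python
NUM_TRAINING_EXAMPLES = 16346  # expected size of the training split
batch_size = 32

# Number of full batches per epoch (integer division drops the remainder).
total_batch = NUM_TRAINING_EXAMPLES // batch_size

# Examples at the end of the training set that do not fill a whole batch.
leftover = NUM_TRAINING_EXAMPLES - total_batch * batch_size

# Start index of each consecutive batch.
offsets = [i * batch_size for i in range(total_batch)]

print(total_batch, leftover, offsets[:3], offsets[-1])
```

Under these assumptions an epoch has 510 full batches and the last 26 examples are never visited, which may be worth keeping in mind when changing `batch_size`.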

## Architecture

In [None]:
import tensorflow as tf



# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_classes])

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}


# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons (no activation function is applied)
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 256 neurons (no activation function is applied)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Construct model
logits = multilayer_perceptron(X)

## Define loss and 3 different optimizers
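
For reference, the softmax cross-entropy used as the loss in this section can be computed by hand for a single example. The following is a minimal sketch in plain Python, not TensorFlow's implementation; `softmax` and `cross_entropy` are hypothetical helpers and the logits are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_target):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


logits = [2.0, 1.0, 0.1]  # made-up scores for three classes
target = [1.0, 0.0, 0.0]  # the true class is class 0
loss = cross_entropy(logits, target)
print(loss)
```

A useful sanity check: a model that assigns equal probability to all 10 classes has a loss of $\ln 10 \approx 2.30$, so an average training cost well above that suggests the network is doing worse than guessing.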

In [None]:
# Define loss and 3 different optimizers
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))

# Adam
adam = tf.train.AdamOptimizer(learning_rate=learning_rate)

# Plain gradient descent with a fixed learning rate
grad_descent = tf.train.GradientDescentOptimizer(learning_rate)


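# Exponential decay --> gradient descent with an exponentially decaying learning rate

# Values
initial_learning_rate = 0.1
decay_steps = 10000
decay_rate = 0.96

# For the computation graph:
global_step = tf.Variable(0, trainable=False, name="global_step")

# exponential_decay returns a learning-rate tensor, not an optimizer,
# so it is wrapped in a gradient descent optimizer
decayed_learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                                   decay_steps, decay_rate)
exp_decay = tf.train.GradientDescentOptimizer(decayed_learning_rate)

optimizer = [adam, grad_descent, exp_decay]

# Passing global_step makes minimize() increment it at every step, which drives the decay
train_op = optimizer[0].minimize(loss_op, global_step=global_step)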
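
With its default settings (no staircase), `tf.train.exponential_decay` computes `initial_learning_rate * decay_rate ** (global_step / decay_steps)`. The sketch below reproduces that formula in plain Python to show how slowly the rate falls with the values used above; the function is a stand-in written for illustration, not the TensorFlow call itself.

```python
def exponential_decay(initial_learning_rate, global_step, decay_steps, decay_rate):
    """Continuous exponential decay of the learning rate."""
    return initial_learning_rate * decay_rate ** (global_step / decay_steps)


initial_learning_rate = 0.1
decay_steps = 10000
decay_rate = 0.96

# Learning rate at a few training steps.
for step in (0, 5000, 10000, 50000):
    lr = exponential_decay(initial_learning_rate, step, decay_steps, decay_rate)
    print(step, lr)
```

After exactly `decay_steps` steps the rate has been multiplied once by `decay_rate`, so with these values it only drops from 0.1 to 0.096 after 10,000 updates.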

## Training

In [None]:
# Initializing the variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(NUM_TRAINING_EXAMPLES/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            # Start index of the current mini-batch
            offset = i * batch_size
            batch_x, batch_y = x_train[offset:(offset+batch_size)], t_train[offset:(offset+batch_size)]
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch

        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")
    
    # Evaluate on the dev set
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: x_dev, Y: t_dev}))
    
    # Uncomment when doing the final accuracy measurement with the test set
    
    # # Test model
    # pred = tf.nn.softmax(logits)  # Apply softmax to logits
    # correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # # Calculate accuracy
    # accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # print("Accuracy:", accuracy.eval({X: x_test, Y: t_test}))
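
The accuracy computed above compares the index of the largest predicted probability with the index of the 1 in the one-hot target. A minimal plain-Python sketch of the same idea, using made-up predictions for three classes; `argmax` and `accuracy` are hypothetical helpers, not part of the notebook.

```python
def argmax(values):
    """Index of the largest element (the first one in case of ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, targets):
    """Fraction of rows whose predicted class matches the target class."""
    hits = sum(argmax(p) == argmax(t) for p, t in zip(predictions, targets))
    return hits / len(targets)


# Made-up softmax outputs and one-hot targets for four examples.
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
targets = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]

acc = accuracy(predictions, targets)
print(acc)
```

Because softmax preserves the ordering of its inputs, taking the arg-max of the raw logits gives the same predicted class as taking it after the softmax.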

