# Median House Value Assessment Activity

This California Housing Prices dataset was downloaded from the StatLib repository (http://lib.stat.cmu.edu/datasets/). It is based on data from the 1990 California census, which is not important for the purposes of deep learning. The original dataset appeared in R. Kelley Pace and Ronald Barry, “Sparse Spatial Autoregressions,” Statistics & Probability Letters 33, no. 3 (1997): 291–297.

<b>MedianHouseValuePreparedCleanAttributes.csv</b><br>The original dataset contained 20,640 instances, which are cleaned, preprocessed and prepared in this notebook. After this data preparation phase, a final dataset of 20,433 instances is obtained, with 8 attributes individually normalized with min-max scaling, $\frac{x-min}{max-min}$ (InputsMedianHouseValueNormalized.csv): $longitude$ and $latitude$ (location), $median age$, $total rooms$, $total bedrooms$, $population$, $households$ and $median income$.  
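
The min-max scaling described above can be sketched as follows. This is a minimal illustration, not the preparation code itself: the function name `min_max_scale`, its `low`/`high` parameters and the toy values are assumptions made for the example. The formula as written maps each column to $[0, 1]$; the sample rows printed later in this notebook contain negative values, which suggests the stored features may have been rescaled further to $[-1, 1]$, so the sketch supports both ranges.

```python
import numpy as np

def min_max_scale(x, low=0.0, high=1.0):
    """Scale each column of x independently to the range [low, high].

    Assumes no column is constant (otherwise max - min would be zero).
    """
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    unit = (x - col_min) / (col_max - col_min)  # (x - min) / (max - min), in [0, 1]
    return low + unit * (high - low)

# Toy data (hypothetical values, not taken from the dataset).
toy = np.array([[1.0, 10.0],
                [27.0, 20.0],
                [52.0, 40.0]])

scaled_01 = min_max_scale(toy)               # every column spans [0, 1]
scaled_pm1 = min_max_scale(toy, -1.0, 1.0)   # every column spans [-1, 1]
```

Scaling each attribute on its own keeps attributes with large raw ranges, such as population, from dominating those with small ranges, such as median income.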

From this data, the classification problem consists of estimating the median house value, categorized into the following 10 classes (price intervals in thousands of dollars): [15.0, 82.3], [82.4, 107.3], [107.4, 133.9], [134.0, 157.3], [157.4, 179.7], [179.8, 209.4], [209.5, 241.9], [242.0, 290.0], [290.1, 376.6] and [376.7, 500.0]. Each class is labelled from 0 (the cheapest) to 9 (the most expensive) and one-hot encoded in the <b>MedianHouseValueOneHotEncodedClasses.csv</b> file.
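
As an illustration of how a price could be mapped to one of these classes and then one-hot encoded, here is a minimal sketch. The helper names `price_to_class` and `one_hot` and the example prices are hypothetical; the actual preparation code may work differently. Only the interval bounds are taken from the text above.

```python
import numpy as np

# Upper bounds (in thousands of dollars) of the first nine intervals listed above.
UPPER_BOUNDS = np.array([82.3, 107.3, 133.9, 157.3, 179.7,
                         209.4, 241.9, 290.0, 376.6])

def price_to_class(price_k):
    """Return the class index (0-9) for prices given in thousands of dollars."""
    # side="left" counts the bounds strictly below the price, so a price
    # equal to an upper bound stays in that bound's interval.
    return np.searchsorted(UPPER_BOUNDS, price_k, side="left")

def one_hot(labels, num_classes=10):
    """Turn integer labels into one-hot rows."""
    return np.eye(num_classes)[labels]

# Example prices (hypothetical), chosen to hit interval edges.
prices = np.array([15.0, 82.3, 82.4, 200.0, 500.0])
labels = price_to_class(prices)   # array([0, 0, 1, 5, 9])
encoded = one_hot(labels)         # shape (5, 10), a single 1 per row
```

The listed intervals leave small gaps (for example between 82.3 and 82.4), presumably because the bounds are rounded; this sketch assigns such values to the higher class.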

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from tqdm import tqdm

ModuleNotFoundError: No module named 'tensorflow'

In [2]:
%run 1.ReadingData.py

x_train: (16346, 8)
t_train: (16346, 10)
x_dev: (2043, 8)
t_dev: (2043, 10)
x_test: (2044, 8)
t_test: (2044, 10)
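
The three shapes add up to the 20,433 prepared instances and are consistent with an 80/10/10 train/dev/test split. The contents of `1.ReadingData.py` are not shown here, so the following is only a sketch of how such a split could be produced; the function `split_indices`, its fractions and the seed are assumptions.

```python
import numpy as np

def split_indices(n, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train/dev/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    # Whatever remains after train and dev goes to the test set.
    return (order[:n_train],
            order[n_train:n_train + n_dev],
            order[n_train + n_dev:])

train_idx, dev_idx, test_idx = split_indices(20433)
print(len(train_idx), len(dev_idx), len(test_idx))  # 16346 2043 2044
```

The same index arrays would be applied to both the inputs and the one-hot targets so that each example keeps its label.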


## Initialization

In [3]:
INPUTS = x_train.shape[1]
OUTPUTS = t_train.shape[1]
NUM_TRAINING_EXAMPLES = int(round(x_train.shape[0]/1))
NUM_DEV_EXAMPLES = int(round(x_dev.shape[0]/1))
NUM_TEST_EXAMPLES = int(round(x_test.shape[0]/1))

Some values are displayed to check that the data has been loaded correctly:

In [4]:
INPUTS #Should be 8

8

In [5]:
OUTPUTS #Should be 10

10

In [6]:
NUM_TRAINING_EXAMPLES #16346

16346

In [7]:
NUM_DEV_EXAMPLES #2043

2043

In [8]:
NUM_TEST_EXAMPLES #2044

2044

In [9]:
x_train[:5]

array([[-0.50996016,  0.01381509,  0.80392157, -0.84821201, -0.80571074,
        -0.92174669, -0.79871732, -0.55233721],
       [-0.5059761 ,  0.0053135 , -0.41176471, -0.83234142, -0.8603352 ,
        -0.91373077, -0.86120704, -0.14394284],
       [-0.55577689,  0.1370882 ,  0.1372549 , -0.835953  , -0.77281192,
        -0.92953838, -0.77536589, -0.4999931 ],
       [ 0.39442231, -0.70031881,  0.05882353, -0.91617071, -0.93513346,
        -0.96894532, -0.93093241, -0.01197225],
       [ 0.21713147, -0.70244421,  0.60784314, -0.98077217, -0.96741155,
        -0.9771294 , -0.96743957, -0.93509055]])

In [10]:
t_train[:5]

array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [11]:
x_dev[:5]

array([[ 0.33665339, -0.67693943,  0.64705882, -0.96128999, -0.93234016,
        -0.95246504, -0.93520802, -0.83044372],
       [ 0.1812749 , -0.68544102,  0.29411765, -0.84775421, -0.70794538,
        -0.85296673, -0.68886696, -0.75996193],
       [ 0.12549801, -0.30286929, -0.25490196, -0.76397579, -0.59683426,
        -0.96328372, -0.91909225, -0.73878981],
       [ 0.45418327, -0.98512221, -0.01960784, -0.92059616, -0.91154562,
        -0.93598475, -0.8973853 , -0.53633743],
       [ 0.29880478, -0.65993624, -0.01960784, -0.58024315, -0.6424581 ,
        -0.80621654, -0.6283506 , -0.16892181]])

In [12]:
t_dev[:5]

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])

In [13]:
x_test[:5]

array([[-0.25498008, -0.45589798, -0.37254902, -0.8577242 , -0.80260708,
        -0.88721657, -0.80562407, -0.64777038],
       [-0.59760956,  0.15834219, -0.88235294, -0.79668345, -0.6654252 ,
        -0.91098405, -0.70860056, -0.52950994],
       [ 0.56175299, -0.50903294, -0.45098039, -0.83992065, -0.77901924,
        -0.92836122, -0.84048676, -0.79309251],
       [ 0.28685259, -0.73645058, -0.41176471, -0.67699273, -0.45810056,
        -0.71512655, -0.45798388, -0.67123212],
       [ 0.27689243, -0.7088204 , -0.33333333, -0.55063838, -0.52638113,
        -0.77897363, -0.52310475, -0.29046496]])

In [14]:
t_test[:5]

array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])
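
Reading one-hot rows by eye is error-prone, so it can help to decode them back into class indices. The sketch below does this for the first five rows of `t_train` shown above, copied into a standalone array named `t_sample` (a name introduced only for this example) so that the snippet runs on its own.

```python
import numpy as np

# First five rows of t_train, copied from the output above.
t_sample = np.array([
    [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
    [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
    [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
    [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
    [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
])

# A valid one-hot row contains exactly one 1.
rows_are_one_hot = bool(np.all(t_sample.sum(axis=1) == 1))

# The position of the 1 is the class index (0 = cheapest, 9 = most expensive).
class_ids = np.argmax(t_sample, axis=1)
print(class_ids)  # [7 8 7 5 1]
```

Applied to the full target arrays, `np.bincount(np.argmax(t_train, axis=1), minlength=10)` would give the number of training examples per class.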

## Hyperparameters

Some hyperparameters are given as an example (they may not be the best ones):

In [12]:
n_epochs = 10000
learning_rate = 0.1
batch_size = 200
n_neurons_per_layer = [100, 50, 25, 10]
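
To get a feel for what these values imply, the following sketch computes the number of mini-batches per epoch and the number of trainable parameters. It makes several assumptions that the notebook does not confirm at this point: the layers are fully connected with bias terms, the last entry of `n_neurons_per_layer` is the 10-unit output layer, and the final partial mini-batch of each epoch is kept. The constants are repeated so the snippet runs on its own.

```python
import math

# Values copied from the cells above.
INPUTS = 8
NUM_TRAINING_EXAMPLES = 16346
n_epochs = 10000
batch_size = 200
n_neurons_per_layer = [100, 50, 25, 10]

# Mini-batches per epoch, keeping the last (smaller) batch.
batches_per_epoch = math.ceil(NUM_TRAINING_EXAMPLES / batch_size)  # 82
total_updates = n_epochs * batches_per_epoch                       # 820000

# Weights plus biases of each dense layer: n_in * n_out + n_out.
layer_sizes = [INPUTS] + n_neurons_per_layer
n_params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(batches_per_epoch, total_updates, n_params)  # 82 820000 7485
```

Under these assumptions the network is small (7,485 parameters), while 10,000 epochs amount to 820,000 weight updates, so reducing `n_epochs` or monitoring the dev set for early stopping are natural things to try.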