## Predicting House Prices

This tutorial walks you through a regression problem: predicting the median price of homes in a given Boston suburb in the mid-1970s.

## Tools

- Keras
- Matplotlib (to graph training history)
- NumPy (for data types and mathematical operations)
- scikit-learn (for data normalization)

## Step 1: Load the Boston dataset
We first load the [Boston House Price](https://archive.ics.uci.edu/ml/machine-learning-databases/housing/) dataset, a well-studied problem in machine learning that involves the prediction of a house price in thousands of dollars given details of the house and its neighborhood.

It’s a set of 506 data points, which load_data() splits into 404 training samples and 102 test samples. The dataset describes 13 numerical properties of houses in Boston suburbs (such as crime rate, proportion of non-retail business acres, chemical concentrations and more); the target is the median house price in those suburbs, in thousands of dollars.

The following code loads the numerical properties and the corresponding labels. The variables train_data and train_targets form the training set, the data that the model will learn from. The model will then be evaluated on the test set (the test_data and test_targets variables).

In [None]:
import keras
from keras.datasets import boston_housing
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data() # The dataset will be downloaded the first time.

## Step 2: Preprocess input data for neural network

Show the shape (dimensions) of the training and test sets.

In [17]:
print(train_data.shape)
print(test_data.shape)

(404, 13)
(102, 13)


We see that the training and test sets contain 404 and 102 samples respectively, each with 13 numerical features.

The first row (one sample) is shown below.

In [18]:
train_data[0]

array([  1.23247,   0.     ,   8.14   ,   0.     ,   0.538  ,   6.142  ,
        91.7    ,   3.9769 ,   4.     , 307.     ,  21.     , 396.9    ,
        18.72   ])

Note that the input attributes vary widely in scale because they measure different quantities. It is good practice to normalize the data before feeding it to a neural network: for each feature in the input data (a column in the input data matrix), you subtract the mean of the feature and divide by the standard deviation, so that the feature is centered around 0 and has unit standard deviation. This is easily done in NumPy.

In [20]:
mean = train_data.mean(axis=0)
train_data -= mean
std = train_data.std(axis=0)
train_data /= std
test_data -= mean
test_data /= std

Note that the quantities used for normalizing the test data are computed using the training data. You should never use any quantity computed on the test data in your workflow, even for something as simple as data normalization.
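
To make the arithmetic concrete, here is a minimal, dependency-free sketch of the same normalization on a single made-up feature column. The values in train_col and test_col are hypothetical and are not taken from the Boston dataset. The point is only to show that the mean and standard deviation come from the training column and are then reused for the test column.

```python
from statistics import fmean, pstdev

# Hypothetical values for one feature (not from the Boston dataset).
train_col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_col = [5.0, 7.0]

# Statistics are computed on the training column only.
mean = fmean(train_col)   # 5.0
std = pstdev(train_col)   # 2.0 (population std, like NumPy's default ddof=0)

# The same mean and std are applied to both columns.
train_scaled = [(v - mean) / std for v in train_col]
test_scaled = [(v - mean) / std for v in test_col]

print(test_scaled)  # [0.0, 1.0]
```

After scaling, the training column has mean 0 and standard deviation 1. The test column generally will not, and that is expected.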

## Step 3: Define the model's architecture and layers


Because so few samples are available, we’ll use a very small network. In general, the less training data you have, the worse overfitting will be, and using a small network is one way to mitigate overfitting.

Keras supports two main ways of building models: Sequential, a linear stack of layers, and the functional API, which can express arbitrary directed acyclic graphs of layers.

Let's start by declaring a sequential model, which is the most common network architecture by far.

In [None]:
from keras import models
from keras import layers

model = models.Sequential()


Now declare the input layer.

In [23]:
from keras import layers

model.add(layers.Dense(64, activation='relu', input_shape=(13,)))

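The first parameter corresponds to the dimensionality of the output space, that is, the number of units in the layer.
The input shape (the input_shape argument) should match the shape of one training sample (in this case, the 13 features).
The activation is a function applied to the layer's output; relu adds the nonlinearity that lets the network learn more than linear relationships. The weights themselves are updated by the optimizer during training, not by the activation.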
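
As an illustration of what a dense layer computes, the following sketch implements a toy forward pass in plain Python. The helper names dense and relu, the per-unit weight layout, and all numeric values are assumptions made for this example; Keras stores and computes these quantities differently internally, and learns the weights during training.

```python
def relu(z):
    # Rectifier: passes positive values through, clamps negatives to 0.
    return max(0.0, z)

def dense(x, weights, biases, activation=None):
    """Toy dense layer: weights[j] holds the input weights of unit j."""
    outputs = []
    for w_j, b_j in zip(weights, biases):
        z = sum(w * v for w, v in zip(w_j, x)) + b_j
        outputs.append(activation(z) if activation else z)
    return outputs

x = [1.0, -2.0, 0.5]            # one sample with 3 input features
weights = [[0.5, 0.25, 1.0],    # unit 0
           [-1.0, 0.5, 0.0]]    # unit 1
biases = [0.0, 0.5]

hidden = dense(x, weights, biases, activation=relu)
print(hidden)  # [0.5, 0.0]
```

The layer has two units, so it returns two values regardless of how many features go in. The second unit's pre-activation value is -1.5, which relu clamps to 0.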

At this point we can add more layers to our model, like stacking Lego bricks.

The following code adds the second hidden layer to the model:

In [None]:
model.add(layers.Dense(64, activation='relu'))

The following code adds the output layer to the model:

In [None]:
model.add(layers.Dense(1))

The network ends with a single unit and no activation (it will be a linear layer) so the network is free to predict values in any range. This is a typical setup for scalar regression (a regression where you’re trying to predict a single continuous value).

The resulting model is a neural network with two hidden layers of 64 units each, both using the rectifier (ReLU) activation function, followed by an output layer with a single unit.
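
As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes each dense layer has one weight per input-output pair plus one bias per unit, which is the default for a Keras Dense layer. The variable names are chosen for this example only.

```python
# Input features, two hidden layers, one output unit.
layer_sizes = [13, 64, 64, 1]

# Each dense layer has n_in * n_out weights plus n_out biases.
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(params)

print(params)  # [896, 4160, 65]
print(total)   # 5121
```

About 5,000 parameters for 404 training samples is still a lot, which is why keeping the network small matters here.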

## Step 4: Compile model

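At this point we only need to define the loss function and the optimizer, and then the model will be ready to train.

Compiling a model means declaring its loss function and its optimizer (SGD, Adam, etc.). Here we use the RMSprop optimizer, the mean squared error (mse) as the loss, and the mean absolute error (mae) as a metric to monitor during training.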

In [None]:
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
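
To show what the mse loss and the mae metric measure, here is a small worked example in plain Python. The targets and predictions are hypothetical values in thousands of dollars, not output from the model above.

```python
# Hypothetical prices in thousands of dollars.
targets = [24.0, 21.5, 30.0, 18.0]
predictions = [22.0, 23.5, 27.0, 18.0]

errors = [p - t for p, t in zip(predictions, targets)]

# Mean squared error: penalizes large errors more heavily.
mse = sum(e ** 2 for e in errors) / len(errors)
# Mean absolute error: average size of the error, in the targets' units.
mae = sum(abs(e) for e in errors) / len(errors)

print(mse)  # 4.25
print(mae)  # 1.75
```

Because the targets are in thousands of dollars, an MAE of 1.75 means the predictions are off by $1,750 on average, which is easier to interpret than the squared-error loss.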
