# Executive Summary

A model below is a common type of machine learning problem known as regression, which consists of predicting a continous value instead of a discrete label: for instance, predicting the temperature tommorrow, given meteorological data.

Model is based on the boston housing price dataset, where we will attempt to predict the median price of homes in a givven boston suburb in the mid-1970s.

# Data summary

The dataset has few data points:
    Total = 506 
        trainig samples 404 and test samples 102
        
Each feature in the input data (for example, the crime rate) has a different scale. For instance, some values are proportions, which take values between 0 and 1; others take values between 1 and 12, others between 0 and 100, and so on.

# Imports

In [1]:
from keras.datasets import boston_housing
from keras import models
from keras import layers
import numpy as np
import pandas as pd


np.set_printoptions(threshold=np.inf)

Using TensorFlow backend.


# Data Loading

In [2]:
(train_data, train_targets), (test_data, train_targets) = boston_housing.load_data()

In [3]:
# Numpy array filled with train data
type(train_data)

numpy.ndarray

In [18]:
df = pd.DataFrame(train_data)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
0,-0.272246,-0.483615,-0.435762,-0.256833,-0.165227,-0.176443,0.813062,0.116698,-0.626249,-0.59517,1.1485,0.448077,0.82522
1,-0.403427,2.991784,-1.333912,-0.256833,-1.215182,1.894346,-1.910361,1.247585,-0.856463,-0.348433,-1.718189,0.431906,-1.329202
2,0.12494,-0.483615,1.028326,-0.256833,0.628642,-1.829688,1.110488,-1.187439,1.675886,1.565287,0.784476,0.220617,-1.3085
3,-0.401494,-0.483615,-0.869402,-0.256833,-0.36156,-0.324558,-1.236672,1.10718,-0.511142,-1.094663,0.784476,0.448077,-0.652926
4,-0.005634,-0.483615,1.028326,-0.256833,1.328612,0.153642,0.694808,-0.578572,1.675886,1.565287,0.784476,0.389882,0.263497


In [4]:
# Shape of our train data
train_data.shape

(404, 13)

In [5]:
# First row of our train data: These are factors contributing to house price in boston
train_data[0]

array([  1.23247,   0.     ,   8.14   ,   0.     ,   0.538  ,   6.142  ,
        91.7    ,   3.9769 ,   4.     , 307.     ,  21.     , 396.9    ,
        18.72   ])

In [6]:
# Numpy array filled with train targets
type(train_targets)

numpy.ndarray

In [7]:
# Shape of our targets
train_targets.shape

(102,)

In [8]:
# These are the median price used to train 50.0 correspond to median price of $50,000
train_targets

array([ 7.2, 18.8, 19. , 27. , 22.2, 24.5, 31.2, 22.9, 20.5, 23.2, 18.6,
       14.5, 17.8, 50. , 20.8, 24.3, 24.2, 19.8, 19.1, 22.7, 12. , 10.2,
       20. , 18.5, 20.9, 23. , 27.5, 30.1,  9.5, 22. , 21.2, 14.1, 33.1,
       23.4, 20.1,  7.4, 15.4, 23.8, 20.1, 24.5, 33. , 28.4, 14.1, 46.7,
       32.5, 29.6, 28.4, 19.8, 20.2, 25. , 35.4, 20.3,  9.7, 14.5, 34.9,
       26.6,  7.2, 50. , 32.4, 21.6, 29.8, 13.1, 27.5, 21.2, 23.1, 21.9,
       13. , 23.2,  8.1,  5.6, 21.7, 29.6, 19.6,  7. , 26.4, 18.9, 20.9,
       28.1, 35.4, 10.2, 24.3, 43.1, 17.6, 15.4, 16.2, 27.1, 21.4, 21.5,
       22.4, 25. , 16.6, 18.6, 22. , 42.8, 35.1, 21.5, 36. , 21.9, 24.1,
       50. , 26.7, 25. ])

In [9]:
# Total Train data
len(train_data)

404

In [10]:
# Total Test data
len(test_data)

102

# Data Preparation

In [11]:
'''
Data normalization: 
    It would be problematic to feed into a neural network values that all take widely different range.
    A widely best practice to deal with such data is to do feature_wise normalization.
    Aim is to get data where features are centered around 0 and has unit standard deviation.
''' 
# Acquire mean and standard deviation of train data
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)

# Normalize train data by substracting by mean and divide by standard deviation
train_data -= mean
train_data /= std

# Normalize test data with train data mean and standard deviation. The approach correspond with never
# use in your workflow any quantity computed on test data.
test_data -= mean
test_data /= std

In [12]:
# First row of normalized train data
train_data[1].shape

(13,)

In [13]:
# Test data
test_data[0]

array([ 1.55369355, -0.48361547,  1.0283258 , -0.25683275,  1.03838067,
        0.23545815,  1.11048828, -0.93976936,  1.67588577,  1.5652875 ,
        0.78447637, -3.48459553,  2.25092074])

# Build a model

In [14]:
# A Model is constructed on a function, if it might be instantiated multiple times
# Linear layer is used when you're predicting a single continuous value, implemented without activation
# mse (mean squared error), the square of difference btn prdct and the trgt. Used for regression problems
# mae (mean absolute error), is the absolute value of the difference btn the prdct and the trgt
# For example an MAE of 0.5 on this problem would mean your prediction are off by $500 on average
def build_model():
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1)) # Linear layer
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model

# Model validation approach

In [15]:
# Due our small sample data we will use k-fold cross validation technique, instead of data split
k = 4
num_val_samples = len(train_data) // k
num_epochs = 100
all_scores = []

for i in range(k):
    print('processing fold #', i)
    val_data = train_data[i * num_val_samples: (i +1 ) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1 * num_val_samples)]

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
