<a href="https://colab.research.google.com/github/avirichie/Deep-Learning-Models/blob/master/My_Keras04(Boston_Housing).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from tensorflow.keras.datasets import boston_housing
(train_X,train_Y),(test_X,test_Y) = boston_housing.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz


In [2]:
train_X.shape

(404, 13)

In [3]:
test_X.shape

(102, 13)

I will use a very small network with two hidden layers, each with 64 units. In general, the less training data you have, the worse overfitting will be, and using a small network is one way to mitigate overfitting.

In [4]:
#Data Normalization
import numpy as np
mean = train_X.mean(axis=0)
train_X = train_X - mean
std= train_X.std(axis=0)
train_X= train_X/std

test_X -=test_X
test_X/=test_X

  


Note that the quantities used for normalizing the test data are computed using the training data. One should never use in workflow any quantity computed on the test data, even for something as simple as data normalization.

In [0]:
from tensorflow.keras import models
from tensorflow.keras import layers

def build_model():
  model= models.Sequential()
  model.add(layers.Dense(64,activation='relu',input_shape=(test_X.shape[1],)))
  model.add(layers.Dense(64,activation='relu'))
  model.add(layers.Dense(1))
  model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
  return model

The network ends with a single unit and no activation (it will be a linear layer). This is a typical setup for scalar regression (a regression where you’re trying to predict a single continuous value). Applying an activation function would constrain the range the output can take; for instance, if you applied a sigmoid activation function to the last layer, the network could only learn to predict values between 0 and 1. Here, because the last layer is purely linear, the network is free to learn to predict values in any range.

In [10]:
#K-fold validation on the training set allows us to partition the data into k different sets where it iteratively runs one of the set as the validation set and remaining K-1 sets
# as training sets. Because the dataset is small running a carving a validation set would only increase the variance in the set. This would prevent relaible evaluation of the model

import numpy as np 

k=4
num_val = len(train_X)//k
num_epoch = 100
all_scores=[]
for i in range(k):
  print("Processing folds:",i)
  val_X= train_X[i*num_val : (i+1)*num_val]
  val_Y= train_Y[i*num_val : (i+1)*num_val]

  model= build_model()
  history=model.fit(x=train_X,
            y=train_Y,
            batch_size=1,
            verbose=0,
            epochs = num_epoch)
  val_mse,val_mae=model.evaluate(val_X,val_Y,verbose=0)
  all_scores.append(val_mse)

Processing folds: 0
Processing folds: 1
Processing folds: 2
Processing folds: 3


In [11]:
print(all_scores)

[3.669307010008557, 2.5869547447355665, 3.94205106839095, 5.4286990708643845]


In [13]:
np.mean(all_scores)

3.9067529734998643