## Predicting house prices: a regression example

* Predicting a continuous value instead of a discrete label

> ### The Boston Housing Price dataset

* We want to predict the median price of homes in a given Boston suburb in the mid-1970s, given the crime rate, the local property tax rate, and so on.
* It has relatively few data points: only 506 (404 training samples and 102 test samples).
* Each feature in the input has a different scale.
  * For instance, some values are proportions, which take values between 0 and 1; others take values between 1 and 12, and so on.

In [None]:
from tensorflow.keras.datasets import boston_housing

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()

In [None]:
train_data.shape

In [None]:
train_data[0]

In [None]:
test_data.shape

In [None]:
train_targets

> ### Preparing the data

* It would be problematic to feed a neural network values that take wildly different ranges.
* Let's do feature-wise normalization:
  * For each feature, we subtract the mean of the feature and divide by the standard deviation.
  * After that, each feature is centered around 0 and has unit standard deviation.

In [None]:
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)

train_data -= mean
train_data /= std

test_data -= mean
test_data /= std


* Note that the quantities used for normalizing the test data are computed using the training data.
* NEVER use any quantity computed on the test data, even for something as simple as data normalization.
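
* A minimal sketch of the same idea on synthetic data, so the effect is easy to inspect. The `toy_*` arrays and `scales` are hypothetical stand-ins, not the Boston data; only the training split supplies the mean and standard deviation.

```python
import numpy as np

# Synthetic stand-in data (hypothetical): 3 features on very different scales.
rng = np.random.default_rng(0)
scales = np.array([1.0, 12.0, 100.0])
toy_train = rng.random((404, 3)) * scales
toy_test = rng.random((102, 3)) * scales

# Statistics are computed on the training split only.
toy_mean = toy_train.mean(axis=0)
toy_std = toy_train.std(axis=0)

# The same training statistics are applied to both splits.
toy_train_norm = (toy_train - toy_mean) / toy_std
toy_test_norm = (toy_test - toy_mean) / toy_std

print(toy_train_norm.mean(axis=0))  # close to 0 for every feature
print(toy_train_norm.std(axis=0))   # close to 1 for every feature
print(toy_test_norm.mean(axis=0))   # near 0, but not exactly 0
```

* The test split ends up only approximately centered, which is expected: its own statistics were never used.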

> ### Building the network

In [None]:
from tensorflow.keras import models
from tensorflow.keras import layers

def build_model():
  model = models.Sequential()
  model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
  model.add(layers.Dense(64, activation='relu'))
  model.add(layers.Dense(1))
  model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
  return model

* This network ends with a single unit and no activation (it is called a linear layer).
* `mse` loss
  * mean squared error, the mean of the squared differences between the predictions and the targets
* `mae` for monitoring
  * mean absolute error, the mean of the absolute values of the differences between the predictions and the targets
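
* To make the two quantities concrete, here is a small NumPy sketch that computes both by hand. The values in `y_true` and `y_pred` are made up for illustration.

```python
import numpy as np

# Hypothetical targets and predictions (made-up values).
y_true = np.array([20.0, 25.0, 30.0, 50.0])
y_pred = np.array([22.0, 24.0, 27.0, 40.0])

errors = y_pred - y_true              # [2, -1, -3, -10]

mse_value = np.mean(errors ** 2)      # (4 + 1 + 9 + 100) / 4 = 28.5
mae_value = np.mean(np.abs(errors))   # (2 + 1 + 3 + 10) / 4 = 4.0

print('MSE:', mse_value)
print('MAE:', mae_value)
```

* The single large error (10) dominates the MSE, while the MAE stays in the same units as the targets, which makes it easier to read when monitoring training.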

> ### Validation using k-fold cross-validation

* Since we have few data points, the validation set would be very small if we randomly split the data into a training set and a validation set.
  * This means that the validation scores might change a lot depending on which data points we choose for validation.
  * We can say that the validation scores might have a high *variance* with regard to the validation split.
  
* The best practice in such situations is to use *k-fold cross-validation*.
  * It consists of splitting the available data into *k* partitions, instantiating *k* identical models, and training each one on *k-1* partitions while evaluating on the remaining partition.
  * The validation score for the model used is then the average of the *k* validation scores obtained.
  
  <img src="https://drive.google.com/uc?id=13ND0yDLHrn1GmDKJ1TjVg21jbeji8DAt" width="800">
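
* The partitioning on its own can be sketched with plain NumPy indices, without training anything. `kfold_slices` is a hypothetical helper written for this illustration, assuming contiguous, unshuffled folds.

```python
import numpy as np

def kfold_slices(num_samples, k):
    """Yield (val_indices, train_indices) for k contiguous folds."""
    fold_size = num_samples // k
    indices = np.arange(num_samples)
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        val_idx = indices[start:stop]
        # Everything outside the validation slice is used for training.
        train_idx = np.concatenate([indices[:start], indices[stop:]])
        yield val_idx, train_idx

# Toy example: 8 samples split into 4 folds of 2 samples each.
for val_idx, train_idx in kfold_slices(num_samples=8, k=4):
    print('validate on', val_idx, 'train on', train_idx)
```

* With this slicing, if `num_samples` is not divisible by `k`, the leftover samples at the end are always used for training and never for validation.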




In [None]:
import numpy as np

k = 4
num_val_samples = len(train_data) // k
num_epochs = 300
all_mae_histories = []

for i in range(k):
  print('processing fold #', i)
  # Validation data: the i-th contiguous slice of the training set.
  val_data = train_data[i*num_val_samples: (i+1)*num_val_samples]
  val_targets = train_targets[i*num_val_samples: (i+1)*num_val_samples]

  # Training data: everything outside that slice.
  partial_train_data = np.concatenate([train_data[:i*num_val_samples],
                                       train_data[(i+1)*num_val_samples:]],
                                      axis=0)
  partial_train_targets = np.concatenate([train_targets[:i*num_val_samples],
                                          train_targets[(i+1)*num_val_samples:]],
                                         axis=0)

  # Build a fresh (already compiled) model for each fold.
  model = build_model()
  history = model.fit(partial_train_data,
                      partial_train_targets,
                      validation_data=(val_data, val_targets),
                      epochs=num_epochs,
                      batch_size=16,
                      verbose=0)
  # Record the per-epoch validation MAE for this fold.
  mae_history = history.history['val_mae']
  all_mae_histories.append(mae_history)

In [None]:
average_mae_history = [np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]

* Plotting validation scores

In [None]:
import matplotlib.pyplot as plt

plt.plot(range(1, len(average_mae_history)+1), average_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()

In [None]:
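def smooth_curve(points, factor=0.9):
  """Smooth a sequence with an exponential moving average."""
  smoothed_points = []
  for point in points:
    if smoothed_points:
      # Blend the previous smoothed value with the current point.
      previous = smoothed_points[-1]
      smoothed_points.append(previous*factor + point*(1-factor))
    else:
      # The first point has no predecessor, so keep it unchanged.
      smoothed_points.append(point)
  return smoothed_points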

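# Drop the first 10 epochs: their MAE is typically much larger than the rest
# and would dominate the plot's scale.
smooth_mae_history = smooth_curve(average_mae_history[10:])

# The x-axis starts at epoch 11 because the first 10 epochs were dropped.
plt.plot(range(11, len(smooth_mae_history)+11), smooth_mae_history)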
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()

> ### Exercise

* We found that the validation MAE stops improving at some point.
* Write code to train a final production model on all of the training data, and then look at its performance on the test data.