# AAI612: Deep Learning & its Applications

*Notebook 5.4: Predicting Housing Prices*

<a href="https://colab.research.google.com/github/OmarMlaeb/AAI612_Malaeb/blob/master/Week%205/Notebook5.4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
"""
The MIT License (MIT)
Copyright (c) 2021 NVIDIA
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""



## Part I: Create the model

This example uses a neural network to solve a regression problem on the Boston Housing dataset. The dataset is included in Keras, so it is simple to access through `keras.datasets.boston_housing`:

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

# Read and standardize the data.
boston_housing = keras.datasets.boston_housing
(raw_x_train, y_train), (raw_x_test,
    y_test) = boston_housing.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz


We standardize both the training and test data by using the mean and standard deviation from the training data. The parameter `axis=0` ensures that we compute the mean and standard deviation for each input variable separately. The result is a vector of means (and a vector of standard deviations) with one entry per input variable, instead of a single value. That is, the standardized value of the nitric oxides concentration is not affected by the values of the per capita crime rate or any of the other variables.

In [3]:
x_mean = np.mean(raw_x_train, axis=0)
x_stddev = np.std(raw_x_train, axis=0)
x_train = (raw_x_train - x_mean) / x_stddev
x_test = (raw_x_test - x_mean) / x_stddev
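
To see what `axis=0` does, here is a minimal sketch on a made-up 3x2 array (the values are illustrative and do not come from the dataset). Each column is standardized with its own mean and standard deviation, so both columns end up on the same scale even though they started two orders of magnitude apart.

```python
import numpy as np

# Toy data: 3 samples, 2 features on very different scales (illustrative values only).
toy = np.array([[1.0, 100.0],
                [2.0, 200.0],
                [3.0, 300.0]])

col_mean = np.mean(toy, axis=0)   # one mean per column
col_std = np.std(toy, axis=0)     # one standard deviation per column
toy_standardized = (toy - col_mean) / col_std

print(col_mean)
print(toy_standardized.mean(axis=0))  # approximately 0 for each column
print(toy_standardized.std(axis=0))   # approximately 1 for each column
```

Without `axis=0`, `np.mean` would return a single scalar computed over all entries, and the large-valued column would dominate it.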

Create the model by first instantiating the model object without any layers, and then adding the layers one by one using the member method `add()`.
Next, complete the missing code in the hidden layers below, using 64 ReLU neurons per layer. Take care with the first layer, as its input shape needs to match the number of features in the dataset. The output layer consists of a single neuron with a linear activation function.

In [4]:
# Create and train model.
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=[x_train.shape[1]]))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='linear'))
# model.add(Dense(64, activation='#Fill', input_shape=[#Fill]))
# model.add(Dense(64, activation='#Fill'))
# model.add(Dense(1, activation='#Fill'))



Use MSE as the loss function and `Adam` as the optimizer, and tell the `compile` method that we are interested in seeing the mean absolute error metric. Print out a summary of the model with `model.summary()` and then start training. Experiment with different batch sizes and numbers of epochs, and record the results of these experiments.

In [5]:
BATCH_SIZE = 16
EPOCHS = 500

# model.compile(loss='#Fill', optimizer='#Fill', metrics =['mean_absolute_error'])
model.compile(loss='mse', optimizer='adam', metrics=['mean_absolute_error'])
model.summary()
history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=2, shuffle=True)

Epoch 1/500
26/26 - 4s - 152ms/step - loss: 530.4760 - mean_absolute_error: 21.0251 - val_loss: 490.7048 - val_mean_absolute_error: 20.1538
Epoch 2/500
26/26 - 3s - 101ms/step - loss: 389.4779 - mean_absolute_error: 17.4993 - val_loss: 309.1619 - val_mean_absolute_error: 15.4931
Epoch 3/500
26/26 - 0s - 5ms/step - loss: 199.3123 - mean_absolute_error: 11.6127 - val_loss: 116.5427 - val_mean_absolute_error: 8.9322
Epoch 4/500
26/26 - 0s - 5ms/step - loss: 78.4066 - mean_absolute_error: 6.7843 - val_loss: 66.7966 - val_mean_absolute_error: 6.5549
Epoch 5/500
26/26 - 0s - 5ms/step - loss: 49.9815 - mean_absolute_error: 5.2892 - val_loss: 46.1780 - val_mean_absolute_error: 5.4990
Epoch 6/500
26/26 - 0s - 5ms/step - loss: 35.0927 - mean_absolute_error: 4.3069 - val_loss: 36.8603 - val_mean_absolute_error: 4.8997
Epoch 7/500
26/26 - 0s - 11ms/step - loss: 28.9655 - mean_absolute_error: 3.8704 - val_loss: 32.4654 - val_mean_absolute_error: 4.5444
Epoch 8/500
26/26 - 0s - 5ms/step - loss: 25.5
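
The `26/26` at the start of each log line is the number of batches processed per epoch. A minimal sketch of that arithmetic follows, assuming the default Keras split of this dataset into 404 training and 102 test samples; the helper name `steps_per_epoch` is made up for illustration.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

# 404 training samples is an assumption based on the default Keras split.
print(steps_per_epoch(404, 16))   # batches per epoch with BATCH_SIZE = 16
print(steps_per_epoch(404, 32))   # batches per epoch if the batch size is doubled
```

A larger batch size means fewer weight updates per epoch, which is worth keeping in mind when comparing experiments that change both the batch size and the number of epochs.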

After the training is done, we use our model to predict the price for the entire test set.

In [6]:
predictions = model.predict(x_test)


Knowing the expected answers, let us check how far off the predictions are:

In [7]:
expected = [7.939793, 18.455063, 20.115505, 32.14037]
for i in range(0, 4):
    # Compare the absolute difference, so a prediction that is too small fails as well.
    assert abs(predictions[i, 0] - expected[i]) < 0.0001, "predicted value is too large"

AssertionError: predicted value is too large
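
A tolerance check normally compares the absolute difference, so that a prediction that is too small fails just like one that is too large. Here is a minimal sketch with made-up numbers; `within_tolerance` is a hypothetical helper and not part of the notebook.

```python
def within_tolerance(predicted: float, expected: float, tol: float = 1e-4) -> bool:
    """Return True when predicted is within tol of expected, in either direction."""
    return abs(predicted - expected) < tol

# Illustrative values only.
print(within_tolerance(18.45510, 18.455063))  # difference is smaller than the tolerance
print(within_tolerance(10.0, 18.455063))      # far too small, so the check fails
print((10.0 - 18.455063) < 1e-4)              # a signed comparison would let this case through
```

Since the weights are initialized randomly, two training runs will typically differ by more than such a tight tolerance unless the random seeds are fixed, so a failing assertion here does not necessarily mean the model is wrong.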

## Part II: Tuning

Improve the above results by modifying the network to include more layers with more neurons. Also, apply L1 and L2 regularization using different weight decay parameters: start with $\lambda=0.1$, then try $\lambda=0.2$ and $\lambda=0.3$.
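
As a reminder of what the weight decay parameter controls: for each regularized layer, L1 regularization adds $\lambda \sum |w|$ to the loss and L2 regularization adds $\lambda \sum w^2$. The sketch below computes both penalties by hand for a small made-up weight matrix; the function name `l1_l2_penalty` is hypothetical.

```python
import numpy as np

def l1_l2_penalty(weights: np.ndarray, l1: float, l2: float) -> float:
    """Penalty added to the loss: l1 * sum(|w|) + l2 * sum(w^2)."""
    return float(l1 * np.sum(np.abs(weights)) + l2 * np.sum(np.square(weights)))

# Made-up weights for illustration.
w = np.array([[0.5, -1.0],
              [2.0,  0.0]])

for lam in (0.1, 0.2, 0.3):
    print(lam, l1_l2_penalty(w, l1=lam, l2=lam))
```

This penalty is included in the reported `loss` but not in `mean_absolute_error`, so the loss of a regularized model is not directly comparable with the loss of the unregularized run in Part I. The MAE is the safer number to compare.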

In [12]:
from tensorflow.keras.layers import Dropout
from tensorflow.keras.regularizers import l1_l2

# Create the model: a deeper, wider network with weight regularization and dropout.
# Note: l1_l2(0.1) passes 0.1 positionally as the L1 factor only; the L2 factor
# keeps its library default. Pass l1= and l2= explicitly to set both.
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=[x_train.shape[1]], kernel_regularizer=l1_l2(0.1)))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu', kernel_regularizer=l1_l2(0.1)))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=l1_l2(0.1)))
model.add(Dense(1, activation='linear'))

# Compile and train the model, then predict on the test set.
model.compile(loss='mse', optimizer='adam', metrics=['mean_absolute_error'])
model.summary()
history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=2, shuffle=True)

predictions = model.predict(x_test)

Epoch 1/500
26/26 - 5s - 209ms/step - loss: 671.8922 - mean_absolute_error: 19.5250 - val_loss: 487.9757 - val_mean_absolute_error: 15.1086
Epoch 2/500
26/26 - 1s - 43ms/step - loss: 313.6451 - mean_absolute_error: 8.8794 - val_loss: 227.9305 - val_mean_absolute_error: 5.4358
Epoch 3/500
26/26 - 0s - 6ms/step - loss: 209.4387 - mean_absolute_error: 4.6551 - val_loss: 186.7284 - val_mean_absolute_error: 4.1252
Epoch 4/500
26/26 - 0s - 6ms/step - loss: 176.0871 - mean_absolute_error: 3.8522 - val_loss: 167.2116 - val_mean_absolute_error: 4.0679
Epoch 5/500
26/26 - 0s - 12ms/step - loss: 156.2690 - mean_absolute_error: 3.6128 - val_loss: 148.4203 - val_mean_absolute_error: 3.7746
Epoch 6/500
26/26 - 0s - 10ms/step - loss: 139.5142 - mean_absolute_error: 3.5713 - val_loss: 134.9009 - val_mean_absolute_error: 3.7283
Epoch 7/500
26/26 - 0s - 5ms/step - loss: 124.7122 - mean_absolute_error: 3.1976 - val_loss: 123.1664 - val_mean_absolute_error: 3.5576
Epoch 8/500
26/26 - 0s - 5ms/step - loss:

In [13]:
# Create the model with a larger regularization factor (0.2).
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=[x_train.shape[1]], kernel_regularizer=l1_l2(0.2)))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu', kernel_regularizer=l1_l2(0.2)))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=l1_l2(0.2)))
model.add(Dense(1, activation='linear'))

model.compile(loss='mse', optimizer='adam', metrics=['mean_absolute_error'])
model.summary()
history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=2, shuffle=True)

predictions = model.predict(x_test)

Epoch 1/500
26/26 - 5s - 185ms/step - loss: 876.7251 - mean_absolute_error: 19.6747 - val_loss: 687.4664 - val_mean_absolute_error: 15.4761
Epoch 2/500
26/26 - 0s - 6ms/step - loss: 492.6700 - mean_absolute_error: 8.9004 - val_loss: 394.7033 - val_mean_absolute_error: 5.9971
Epoch 3/500
26/26 - 0s - 11ms/step - loss: 349.1901 - mean_absolute_error: 4.8545 - val_loss: 312.3655 - val_mean_absolute_error: 4.3155
Epoch 4/500
26/26 - 0s - 7ms/step - loss: 292.5481 - mean_absolute_error: 4.1578 - val_loss: 261.7544 - val_mean_absolute_error: 3.9272
Epoch 5/500
26/26 - 0s - 11ms/step - loss: 245.6413 - mean_absolute_error: 3.8896 - val_loss: 223.0481 - val_mean_absolute_error: 3.7686
Epoch 6/500
26/26 - 0s - 10ms/step - loss: 209.1513 - mean_absolute_error: 3.5574 - val_loss: 193.4433 - val_mean_absolute_error: 3.6143
Epoch 7/500
26/26 - 0s - 5ms/step - loss: 184.0665 - mean_absolute_error: 3.6217 - val_loss: 172.0503 - val_mean_absolute_error: 3.5902
Epoch 8/500
26/26 - 0s - 5ms/step - loss:


In [14]:
# Create the model with a regularization factor of 0.3 and a dropout rate of 0.3.
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=[x_train.shape[1]], kernel_regularizer=l1_l2(0.3)))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu', kernel_regularizer=l1_l2(0.3)))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer=l1_l2(0.3)))
model.add(Dense(1, activation='linear'))

model.compile(loss='mse', optimizer='adam', metrics=['mean_absolute_error'])
model.summary()
history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=2, shuffle=True)

predictions = model.predict(x_test)

Epoch 1/500
26/26 - 5s - 184ms/step - loss: 1059.1846 - mean_absolute_error: 19.6338 - val_loss: 852.4734 - val_mean_absolute_error: 15.5225
Epoch 2/500
26/26 - 2s - 66ms/step - loss: 649.0511 - mean_absolute_error: 9.2407 - val_loss: 533.5870 - val_mean_absolute_error: 6.1231
Epoch 3/500
26/26 - 0s - 5ms/step - loss: 485.1408 - mean_absolute_error: 5.5053 - val_loss: 420.2630 - val_mean_absolute_error: 4.2952
Epoch 4/500
26/26 - 0s - 5ms/step - loss: 393.1979 - mean_absolute_error: 4.3384 - val_loss: 345.3282 - val_mean_absolute_error: 3.9673
Epoch 5/500
26/26 - 0s - 6ms/step - loss: 323.6464 - mean_absolute_error: 4.2583 - val_loss: 286.0326 - val_mean_absolute_error: 3.8885
Epoch 6/500
26/26 - 0s - 9ms/step - loss: 266.1273 - mean_absolute_error: 3.8891 - val_loss: 241.0628 - val_mean_absolute_error: 3.6725
Epoch 7/500
26/26 - 0s - 8ms/step - loss: 229.3767 - mean_absolute_error: 3.8110 - val_loss: 212.0235 - val_mean_absolute_error: 3.7900
Epoch 8/500
26/26 - 0s - 7ms/step - loss:


Retry the above using dropout regularization.  What do you notice?

Repeat the above by trying multiple parameter combinations:
- Using a combination of 64 to 128 neurons
- Using Dropout of 0.2 and 0.3
- Using L2 = 0.1 and 0.2

For each case, print the first 4 predictions and record the results.
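
One way to keep track of the combinations is to enumerate them up front. Here is a minimal sketch using `itertools.product`. The variable names and dictionary keys are made-up labels, and the training step is left as a comment because it depends on how you build each model.

```python
from itertools import product

neurons_options = [64, 128]
dropout_options = [0.2, 0.3]
l2_options = [0.1, 0.2]

# Every combination of the three settings: 2 * 2 * 2 = 8 experiments.
experiments = [
    {"neurons": n, "dropout": d, "l2": l2}
    for n, d, l2 in product(neurons_options, dropout_options, l2_options)
]

for config in experiments:
    # Build, compile, and fit a model with these settings here,
    # then record the first 4 predictions next to the config.
    print(config)
```

Storing the settings next to the recorded predictions makes it easier to tell afterwards which change was responsible for an improvement.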

In [15]:
for i in range(0, 4):
    print('Prediction: ', predictions[i, 0],
          ', true value: ', y_test[i])

Prediction:  15.183173 , true value:  7.2
Prediction:  18.268785 , true value:  18.8
Prediction:  21.197498 , true value:  19.0
Prediction:  33.14332 , true value:  27.0
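
To summarize how far off these four predictions are, the sketch below computes the per-sample absolute error, the mean absolute error, and the mean squared error from the printed values. The numbers are copied from the output above. Four samples are far too few to judge the model, so treat this as an illustration of the metrics only.

```python
import numpy as np

# Values copied from the printed output.
predicted = np.array([15.183173, 18.268785, 21.197498, 33.14332])
actual = np.array([7.2, 18.8, 19.0, 27.0])

abs_errors = np.abs(predicted - actual)
mae = abs_errors.mean()                     # mean absolute error
mse = np.mean((predicted - actual) ** 2)    # mean squared error

print('Absolute errors:', abs_errors)
print('MAE:', mae)
print('MSE:', mse)
```

Because MSE squares each error, the first sample (predicted about 15.2, true value 7.2) contributes most of the squared error, while MAE weights all four samples linearly.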
