<a href="https://colab.research.google.com/github/Bitdribble/LDL/blob/main/colab/c6e1_boston.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
"""
The MIT License (MIT)
Copyright (c) 2021 NVIDIA
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""


In [1]:
# Set up sandbox environment
!rm -rf LDL
!git clone https://github.com/Bitdribble/LDL.git

# Install module dependencies
!pip install -r /content/LDL/colab_requirements.txt

# cd to desired directory
%cd /content/LDL/pt_framework

Cloning into 'LDL'...
remote: Enumerating objects: 233, done.
remote: Counting objects: 100% (233/233), done.
remote: Compressing objects: 100% (150/150), done.
remote: Total 233 (delta 131), reused 168 (delta 79), pack-reused 0
Receiving objects: 100% (233/233), 1.24 MiB | 2.75 MiB/s, done.
Resolving deltas: 100% (131/131), done.
/content/LDL/pt_framework


This code example demonstrates how to use a neural network to solve a regression problem, using the Boston housing dataset. More context for this code example can be found in the section "Programming Example: Predicting House Prices with a DNN" in Chapter 6 in the book Learning Deep Learning by Magnus Ekman (ISBN: 9780137470358).


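Unlike MNIST, which is available through torchvision, the Boston Housing dataset is not included with PyTorch, so we retrieve it using scikit-learn instead, by calling the load_boston() function. We then retrieve the inputs and targets as NumPy arrays by calling the get() method on the returned object. We explicitly split them into a training set and a test set using the scikit-learn function train_test_split(), holding out 20% of the examples for testing (test_size=0.2) and fixing random_state=0 so that the split is reproducible. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2; the warning it prints, which explains the dataset's ethical problem and how to fetch the data from its original source, appears in the output of the code cell below.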
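To make the split concrete, here is a minimal NumPy sketch of what a shuffled 80/20 split does. The helper shuffle_split() is hypothetical and only mirrors the idea behind train_test_split(), not its exact shuffling; the stand-in arrays have the Boston dataset's shape of 506 rows and 13 columns. Rounding the test size up gives 102 test and 404 training examples, which should match the sizes train_test_split() produces for 506 rows with test_size=0.2.

```python
import numpy as np

def shuffle_split(x, y, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle the rows, then hold out a test fraction.

    Illustrates the idea behind scikit-learn's train_test_split(); it does not
    reproduce that function's exact shuffling.
    """
    n = len(x)
    n_test = int(np.ceil(test_fraction * n))  # round the test size up
    perm = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    # Index x and y with the same permutation so inputs and targets stay paired.
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Stand-in data with the Boston dataset's shape: row r of x starts with 13 * r
# and y[r] = r, which makes it easy to check that rows stay paired.
x = np.arange(506 * 13, dtype=np.float64).reshape(506, 13)
y = np.arange(506, dtype=np.float64)

x_tr, x_te, y_tr, y_te = shuffle_split(x, y)
print(x_tr.shape, x_te.shape)  # (404, 13) (102, 13)
```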

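We convert the NumPy arrays to np.float32, the default floating-point precision of PyTorch model parameters, and reshape the targets into column vectors of shape (N, 1), so that the datatype and dimensions later match what PyTorch expects.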
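The following sketch shows the effect of these two conversions on a small vector of illustrative target values, and why the column shape matters: subtracting a 1-D vector of N targets from an (N, 1) prediction broadcasts to an N x N matrix instead of N element-wise errors. PyTorch's nn.MSELoss broadcasts the same way and prints a warning when the target and input sizes differ.

```python
import numpy as np

y = np.array([24.0, 21.6, 34.7, 33.4])  # illustrative targets; NumPy defaults to float64
y32 = y.astype(np.float32)              # match the model's float32 precision
y_col = np.reshape(y32, (-1, 1))        # -1: let NumPy infer the row count (4)
print(y.dtype, y32.dtype, y32.shape, y_col.shape)  # float64 float32 (4,) (4, 1)

pred = np.zeros((4, 1), dtype=np.float32)  # model output has shape (N, 1)
print((pred - y32).shape)    # (4, 4): every prediction paired with every target
print((pred - y_col).shape)  # (4, 1): one error per example, as intended
```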

We standardize both the training and test data by using the mean and standard deviation computed from the training data only. The parameter axis=0 ensures that we compute the mean and standard deviation for each input variable (column) separately, so the result is a vector with one mean (and one standard deviation) per variable instead of a single value. That is, the standardized value of the nitric oxides concentration is not affected by the values of the per capita crime rate or any of the other variables.
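Here is a small NumPy sketch of the same per-column standardization, on hypothetical data with two columns of very different scales. It checks that each standardized training column has mean 0 and standard deviation 1, and that rescaling one raw column leaves the other standardized column unchanged. Note that np.std computes the population standard deviation (ddof=0) by default.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw data: column 0 ranges over 0-100, column 1 over 0.3-0.9.
raw_train = np.column_stack([rng.uniform(0, 100, 50),
                             rng.uniform(0.3, 0.9, 50)]).astype(np.float32)
raw_test = np.column_stack([rng.uniform(0, 100, 10),
                            rng.uniform(0.3, 0.9, 10)]).astype(np.float32)

def standardize(train, test):
    mean = np.mean(train, axis=0)  # shape (2,): one mean per column
    std = np.std(train, axis=0)    # shape (2,): one std per column (ddof=0)
    # Broadcasting subtracts/divides column-wise; the test set reuses the
    # training statistics, so nothing is computed from the test data.
    return (train - mean) / std, (test - mean) / std, mean

train, test, mean = standardize(raw_train, raw_test)
print(mean.shape)  # (2,)

# Multiply raw column 0 by 10: standardized column 1 does not change.
scaled = raw_train.copy()
scaled[:, 0] *= 10
train_scaled, _, _ = standardize(scaled, raw_test)
print(np.allclose(train[:, 1], train_scaled[:, 1]))  # True
```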

Finally, we create Dataset objects. To do that, we first need to convert the NumPy arrays to PyTorch tensors, which is done by calling torch.from_numpy().
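The sketch below shows, on hypothetical stand-in arrays with the shape of the Boston training split (404 examples), what torch.from_numpy(), TensorDataset, and DataLoader do. torch.from_numpy() keeps the float32 dtype and shares memory with the NumPy array rather than copying it. Indexing the dataset returns one (input, target) pair, and with a batch size of 16 a DataLoader yields 26 mini-batches per epoch, the last one holding the remaining 4 examples. Whether train_model_w_df() builds its DataLoader exactly this way (for example, with shuffle=True) is an assumption.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-ins shaped like the Boston training split.
x_np = np.random.rand(404, 13).astype(np.float32)
y_np = np.random.rand(404, 1).astype(np.float32)

x_t = torch.from_numpy(x_np)  # no copy: shares memory with x_np
y_t = torch.from_numpy(y_np)
x_np[0, 0] = 123.0            # ...so this change is visible through x_t

dataset = TensorDataset(x_t, y_t)
x0, y0 = dataset[0]           # one (input, target) pair
loader = DataLoader(dataset, batch_size=16, shuffle=True)

batch_sizes = [xb.shape[0] for xb, yb in loader]
print(x_t.dtype, x_t[0, 0].item(), len(dataset), len(loader))  # torch.float32 123.0 404 26
print(batch_sizes[-1])  # 4, because 404 = 25 * 16 + 4
```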


In [4]:
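from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import numpy as np
from utilities import train_model_w_df

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
EPOCHS = 500
BATCH_SIZE = 16

# Read and standardize the data.
try:
    # load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2.
    from sklearn.datasets import load_boston
    boston_housing = load_boston()
    data = boston_housing.get('data')
    target = boston_housing.get('target')
except ImportError:
    # Newer scikit-learn: fetch the data from its original source, following
    # the recipe given in scikit-learn's deprecation message.
    import pandas as pd
    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
    # Each record spans two lines in the source file.
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]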

raw_x_train, raw_x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0)

# Convert to same precision as model.
raw_x_train = raw_x_train.astype(np.float32)
raw_x_test = raw_x_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)

# Reshape the 1-D target vectors into column vectors of shape (N, 1) to
# match the model's output. -1 tells NumPy to infer N from the array length.
y_train = np.reshape(y_train, (-1, 1))
y_test = np.reshape(y_test, (-1, 1))

x_mean = np.mean(raw_x_train, axis=0)
x_stddev = np.std(raw_x_train, axis=0)
x_train = (raw_x_train - x_mean) / x_stddev
x_test = (raw_x_test - x_mean) / x_stddev

# Create Dataset objects.
trainset = TensorDataset(torch.from_numpy(x_train),
                         torch.from_numpy(y_train))
testset = TensorDataset(torch.from_numpy(x_test),
                        torch.from_numpy(y_test))



    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_h

The fields of the [Boston dataset](http://lib.stat.cmu.edu/datasets/boston), in order:

Field     | Description
----------|------------
CRIM      | per capita crime rate by town
ZN        | proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS     | proportion of non-retail business acres per town
CHAS      | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX       | nitric oxides concentration (parts per 10 million)
RM        | average number of rooms per dwelling
AGE       | proportion of owner-occupied units built prior to 1940
DIS       | weighted distances to five Boston employment centres
RAD       | index of accessibility to radial highways
TAX       | full-value property-tax rate per \$10,000
PTRATIO   | pupil-teacher ratio by town
B         | 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT     | % lower status of the population
MEDV      | median value of owner-occupied homes in \$1000s (the target)

We will predict MEDV based on the 13 input variables CRIM through LSTAT.

We then create the model. The code follows the same pattern as c5e1_mnist_learning. We define our network to have two hidden layers, so we are now officially doing DL! Each of the two hidden layers has 64 ReLU neurons, and the first layer is declared to have 13 inputs to match the 13 input variables of the dataset. The output layer consists of a single neuron with a linear activation function, so the network can output any real value, as a regression problem requires. Each layer's weights are initialized with Xavier (Glorot) uniform initialization and its biases with zeros. We use MSE as the loss function and the Adam optimizer with its default settings.
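As a rough cross-check of the architecture, here is a NumPy-only sketch of the same 13-64-64-1 network: Xavier (Glorot) uniform weights drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), zero biases, ReLU on the hidden layers, and a linear output. It is an illustration, not how PyTorch implements nn.Linear; in particular it stores each weight matrix as (inputs, outputs), whereas nn.Linear stores (outputs, inputs). Counting weights and biases gives `13*64 + 64 + 64*64 + 64 + 64*1 + 1 = 5121` trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    """Xavier/Glorot uniform: sample from U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    # Stored as (inputs, outputs); nn.Linear stores the transpose.
    return rng.uniform(-a, a, size=(fan_in, fan_out)).astype(np.float32)

layer_sizes = [13, 64, 64, 1]
weights = [xavier_uniform(i, o) for i, o in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(o, dtype=np.float32) for o in layer_sizes[1:]]

def forward(x):
    h = x
    for k, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if k < len(weights) - 1:  # ReLU on the two hidden layers only
            h = np.maximum(h, 0.0)
    return h  # linear output layer: one real value per example

n_params = sum(w.size + b.size for w, b in zip(weights, biases))
x = rng.standard_normal((5, 13)).astype(np.float32)  # 5 hypothetical examples
print(forward(x).shape, n_params)  # (5, 1) 5121
```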

Instead of writing the training loop inline, we have broken it out into a separate function, train_model_w_df(), implemented in the file utilities.py. It is very similar to the training loop in c5e1_mnist_learning but has some additional logic to handle both classification and regression problems. In particular, it takes a parameter "metric". For a classification problem it should be set to "acc", and the function computes accuracy; for a regression problem it should be set to "mae", and the function computes mean absolute error instead. It also returns the per-epoch training and validation losses and metrics as a DataFrame (loss_df), which we display and plot below.
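Because the loss is MSE while the reported metric is MAE, the loss and mae columns in the training log are on different scales. The NumPy sketch below uses hypothetical values, and the helper names mse and mae are ours. It shows that MSE is in squared units (for this dataset, squared thousands of dollars) and is dominated by a single large error, while MAE stays in the target's own units. The square root of MSE (RMSE) is always at least as large as MAE.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the training loss (nn.MSELoss averages the same way by default)."""
    return float(np.mean((y_pred - y_true) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: the metric reported as 'mae' during training."""
    return float(np.mean(np.abs(y_pred - y_true)))

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 30.0, 50.0])  # errors: +2, -2, 0, +10

print(mse(y_true, y_pred))  # 27.0 = (4 + 4 + 0 + 100) / 4; the +10 error contributes 100 of 108
print(mae(y_true, y_pred))  # 3.5  = (2 + 2 + 0 + 10) / 4
print(np.sqrt(mse(y_true, y_pred)) >= mae(y_true, y_pred))  # True (RMSE >= MAE)
```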


In [5]:
# Create model.
model = nn.Sequential(
    nn.Linear(13, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1)
)

# Initialize weights.
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.constant_(module.bias, 0.0)

# Loss function and optimizer
optimizer = torch.optim.Adam(model.parameters())
loss_function = nn.MSELoss()

# Train model.
loss_df = train_model_w_df(model, device, EPOCHS, BATCH_SIZE, trainset, testset,
                           optimizer, loss_function, 'mae')


Epoch 1/500 loss: 534.2032 - mae: 20.5371 - val_loss: 442.4801 - val_mae: 17.4684
Epoch 2/500 loss: 393.3004 - mae: 17.1625 - val_loss: 285.5605 - val_mae: 13.3840
Epoch 3/500 loss: 210.1577 - mae: 11.7961 - val_loss: 127.5317 - val_mae: 8.0401
Epoch 4/500 loss: 78.0188 - mae: 6.7393 - val_loss: 75.1264 - val_mae: 5.9290
Epoch 5/500 loss: 41.5719 - mae: 4.9134 - val_loss: 53.0952 - val_mae: 5.0037
Epoch 6/500 loss: 28.7528 - mae: 3.9324 - val_loss: 43.2835 - val_mae: 4.5192
Epoch 7/500 loss: 22.9669 - mae: 3.4789 - val_loss: 38.7490 - val_mae: 4.2412
Epoch 8/500 loss: 20.4081 - mae: 3.2460 - val_loss: 34.7741 - val_mae: 4.0212
Epoch 9/500 loss: 18.5611 - mae: 3.0920 - val_loss: 33.1772 - val_mae: 3.8744
Epoch 10/500 loss: 17.5316 - mae: 2.9440 - val_loss: 32.0443 - val_mae: 3.7565
Epoch 11/500 loss: 16.0867 - mae: 2.8753 - val_loss: 30.3291 - val_mae: 3.6283
Epoch 12/500 loss: 15.6170 - mae: 2.7708 - val_loss: 29.4159 - val_mae: 3.5384
Epoch 13/500 loss: 14.4348 - mae: 2.7077 - val_los

In [6]:
# Display the loss_df, for sanity
loss_df

index | epoch | loss     | acc | mae      | val_loss | val_acc | val_mae
------|-------|----------|-----|----------|----------|---------|--------
0     | 1     | 534.203  | 0   | 20.5371  | 442.48   | 0       | 17.4684
1     | 2     | 393.3    | 0   | 17.1625  | 285.56   | 0       | 13.384
2     | 3     | 210.158  | 0   | 11.7961  | 127.532  | 0       | 8.04014
3     | 4     | 78.0188  | 0   | 6.73925  | 75.1264  | 0       | 5.929
4     | 5     | 41.5719  | 0   | 4.91337  | 53.0952  | 0       | 5.00374
...   | ...   | ...      | ... | ...      | ...      | ...     | ...
495   | 496   | 0.667394 | 0   | 0.587966 | 18.4763  | 0       | 2.75046
496   | 497   | 0.58236  | 0   | 0.550601 | 17.9745  | 0       | 2.71553
497   | 498   | 0.596211 | 0   | 0.555561 | 18.5099  | 0       | 2.72685
498   | 499   | 0.528717 | 0   | 0.517616 | 18.3934  | 0       | 2.75461


In [9]:
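# Display the loss and error chart
import altair as alt

# Operate on a copy to not modify loss_df
loss_df1 = loss_df.copy()
# Reshape from wide (one column per metric) to long format for Altair.
chart_data = loss_df1.reset_index().melt(
    id_vars=['epoch'],
    value_vars=['loss', 'mae', 'val_loss', 'val_mae'])

# Bind an interval selection to the scales so the chart can be zoomed and panned.
bind = alt.selection_interval(bind='scales')

# Note: Altair 5 deprecates .add_selection() in favor of .add_params().
alt.Chart(chart_data).mark_line().encode(
    x='epoch',
    y='value',
    color='variable'
).add_selection(
    bind
)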

After training is done, we use the model to predict the price for all test examples, and then print the first four predictions next to the correct values to get an idea of how accurate the model is.


In [8]:
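# Print first 4 predictions.
model.eval()  # Inference mode (no dropout/batchnorm here, but good practice).
with torch.no_grad():  # No gradients are needed for inference.
    inputs = torch.from_numpy(x_test)
    inputs = inputs.to(device)
    outputs = model(inputs)
for i in range(0, 4):
    print('Prediction: %4.2f' % outputs[i].item(),
          ', true value: %4.2f' % y_test[i].item())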


Prediction: 23.02 , true value: 22.60
Prediction: 30.28 , true value: 50.00
Prediction: 21.58 , true value: 23.00
Prediction: 9.03 , true value: 8.30
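Because MEDV is measured in thousands of dollars, these errors are easier to judge in dollars. The following NumPy sketch copies the four printed values above (already rounded to two decimals) and converts each absolute error to dollars: three predictions are within about 1,500 dollars of the true value, while the second one is off by roughly 19,700 dollars.

```python
import numpy as np

# Values copied from the printed output above (MEDV, in thousands of dollars).
predictions = np.array([23.02, 30.28, 21.58, 9.03])
true_values = np.array([22.60, 50.00, 23.00, 8.30])

abs_err_dollars = np.abs(predictions - true_values) * 1000  # error in dollars
for p, t, e in zip(predictions, true_values, abs_err_dollars):
    print(f"predicted {p * 1000:,.0f}, true {t * 1000:,.0f}, off by {e:,.0f} dollars")
```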
