## PhD course: a Comparative Introduction to Deep Learning Frameworks: TensorFlow, PyTorch and JAX - Coursework
@author Matteo Magnini, Ph.D. student of the Computer Science and Engineering (38th cycle) course of the University of Bologna.

This notebook presents a comparison study between `pytorch` and `tensorflow`.
Given a public dataset, the same ML model (a deep neural network) is built, trained and tested with both technologies.
Moreover, the two models are also compared on the computational time spent during training.

Note: the experiments are reproducible via explicit seed declaration (just change it if you want to see different executions).

Some necessary imports.

In [1]:
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from torch import nn, manual_seed
from torch.utils.data import DataLoader
import time

Hyper-parameters.

In [2]:
DATASET_URL = 'http://lib.stat.cmu.edu/datasets/houses.zip'

SEED = 1
TRAIN_RATIO = 2 / 3
EPOCHS = 100
BATCH_SIZE = 64
NEURONS = [128, 128, 64, 64, 1]

Utility functions.

In [3]:
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu')
    return device


def df_to_tensor(df):
    device = get_device()
    return torch.from_numpy(df.values).float().to(device)


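def create_pytorch_net(input_shape, neurons):
    # Hidden and output layers: one Linear per consecutive pair of sizes
    linear_layers = [nn.Linear(neurons[i], neurons[i + 1]) for i in range(len(neurons) - 1)]
    relu_layers = (len(neurons) - 1) * [nn.ReLU()]
    # Interleave the layers as ReLU, Linear, ReLU, Linear, ...
    mixed_layers = [None] * (len(linear_layers) + len(relu_layers))
    mixed_layers[::2] = relu_layers
    mixed_layers[1::2] = linear_layers
    # The input layer comes first, so the network ends with a Linear layer (no final ReLU)
    return nn.Sequential(nn.Linear(input_shape, neurons[0]), *mixed_layers)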

### Dataset
The dataset I use for this coursework contains the housing prices of California in 1990.
It is publicly available at `http://lib.stat.cmu.edu/datasets/houses.zip`.

A brief description from the authors:
> We collected information on the variables using all the block groups in California from the 1990 Census.
In this sample a block group on average includes 1425.5 individuals living in a geographically compact area.
Naturally, the geographical area included varies inversely with the population density.
We computed distances among the centroids of each block group as measured in latitude and longitude.
We excluded all the block groups reporting zero entries for the independent and dependent variables.
The final data contained 20,640 observations on 9 variables.
The dependent variable is ln(median house value).

The dataset consists of 8 independent variables and 1 dependent variable (median house value).
The dataset contains 20,640 records.
The task is to predict the median house value from the independent variables (regression).

In [4]:
data = pd.read_csv(DATASET_URL, sep=r"\s+", skiprows=27, header=None, encoding='windows-1252')
data.columns = ["median_house_value", "median_income", "housing_median_age", "rooms", "bedrooms", "population", "households", "latitude", "longitude"]
data.describe()

| | median_house_value | median_income | housing_median_age | rooms | bedrooms | population | households | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|
| count | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 |
| mean | 206855.816909 | 3.870671 | 28.639486 | 2635.763081 | 537.898014 | 1425.476744 | 499.53968 | 35.631861 | -119.569704 |
| std | 115395.615874 | 1.899822 | 12.585558 | 2181.615252 | 421.247906 | 1132.462122 | 382.329753 | 2.135952 | 2.003532 |
| min | 14999.0 | 0.4999 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 32.54 | -124.35 |
| 25% | 119600.0 | 2.5634 | 18.0 | 1447.75 | 295.0 | 787.0 | 280.0 | 33.93 | -121.8 |
| 50% | 179700.0 | 3.5348 | 29.0 | 2127.0 | 435.0 | 1166.0 | 409.0 | 34.26 | -118.49 |
| 75% | 264725.0 | 4.74325 | 37.0 | 3148.0 | 647.0 | 1725.0 | 605.0 | 37.71 | -118.01 |
| max | 500001.0 | 15.0001 | 52.0 | 39320.0 | 6445.0 | 35682.0 | 6082.0 | 41.95 | -114.31 |


Split into training and test sets.

In [5]:
train, test = train_test_split(data, train_size=TRAIN_RATIO, random_state=SEED)
train

| | median_house_value | median_income | housing_median_age | rooms | bedrooms | population | households | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|
| 16054 | 336100.0 | 4.4094 | 52.0 | 3260.0 | 653.0 | 1594.0 | 632.0 | 37.76 | -122.48 |
| 20161 | 161200.0 | 2.5441 | 41.0 | 1408.0 | 311.0 | 793.0 | 264.0 | 34.37 | -119.29 |
| 6525 | 176100.0 | 3.2132 | 39.0 | 1258.0 | 245.0 | 988.0 | 228.0 | 34.06 | -118.04 |
| 15763 | 412500.0 | 2.3977 | 52.0 | 2514.0 | 729.0 | 1428.0 | 597.0 | 37.77 | -122.43 |
| 19798 | 61500.0 | 1.8696 | 23.0 | 1091.0 | 217.0 | 539.0 | 201.0 | 40.54 | -123.12 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10955 | 205300.0 | 1.7823 | 17.0 | 1768.0 | 474.0 | 1079.0 | 436.0 | 33.76 | -117.88 |
| 17289 | 500001.0 | 8.5608 | 42.0 | 1765.0 | 263.0 | 753.0 | 260.0 | 34.42 | -119.63 |
| 5192 | 104800.0 | 1.1326 | 42.0 | 1433.0 | 295.0 | 775.0 | 293.0 | 33.93 | -118.26 |
| 12172 | 140700.0 | 2.6322 | 10.0 | 2381.0 | 454.0 | 1323.0 | 477.0 | 33.73 | -117.16 |


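Min-max normalization: every column is rescaled to the [0, 1] range. Note that the training and test sets are each scaled with their own minimum and maximum.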

In [6]:
train, test = (train-train.min())/(train.max()-train.min()), (test-test.min())/(test.max()-test.min())
train

| | median_house_value | median_income | housing_median_age | rooms | bedrooms | population | households | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|
| 16054 | 0.662061 | 0.269617 | 1.000000 | 0.082863 | 0.105009 | 0.044592 | 0.117790 | 0.554729 | 0.182182 |
| 20161 | 0.301444 | 0.140977 | 0.784314 | 0.035760 | 0.049928 | 0.022142 | 0.049095 | 0.194474 | 0.501502 |
| 6525 | 0.332166 | 0.187122 | 0.745098 | 0.031945 | 0.039298 | 0.027607 | 0.042374 | 0.161530 | 0.626627 |
| 15763 | 0.819586 | 0.130881 | 1.000000 | 0.063889 | 0.117249 | 0.039939 | 0.111256 | 0.555792 | 0.187187 |
| 19798 | 0.095878 | 0.094461 | 0.431373 | 0.027697 | 0.034788 | 0.015023 | 0.037334 | 0.850159 | 0.118118 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10955 | 0.392372 | 0.088440 | 0.313725 | 0.044916 | 0.076180 | 0.030158 | 0.081202 | 0.129649 | 0.642643 |
| 17289 | 1.000000 | 0.555916 | 0.803922 | 0.044840 | 0.042197 | 0.021021 | 0.048348 | 0.199787 | 0.467467 |
| 5192 | 0.185156 | 0.043634 | 0.803922 | 0.036396 | 0.047351 | 0.021637 | 0.054508 | 0.147715 | 0.604605 |
| 12172 | 0.259176 | 0.147053 | 0.176471 | 0.060507 | 0.072959 | 0.036997 | 0.088856 | 0.126461 | 0.714715 |
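
The normalization cell above rescales the test set with the test set's own minimum and maximum. A common alternative is to reuse the training statistics for the test set, so that both sets go through exactly the same transformation. The sketch below illustrates that idea in plain Python on toy numbers; `fit_min_max` and `apply_min_max` are hypothetical helpers and are not part of this notebook.

```python
# Hypothetical helpers (not part of the notebook): min-max scaling that
# reuses the statistics of the training column for the test column.

def fit_min_max(column):
    """Return the (min, max) pair of a training column."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Rescale values with a previously fitted (min, max) pair."""
    return [(v - lo) / (hi - lo) for v in column]


# Toy values, chosen only for illustration
train_income = [2.0, 4.0, 6.0]
test_income = [3.0, 7.0]

lo, hi = fit_min_max(train_income)
scaled_train = apply_min_max(train_income, lo, hi)  # always within [0, 1]
scaled_test = apply_min_max(test_income, lo, hi)    # may fall outside [0, 1]
print(scaled_train, scaled_test)
```

With the DataFrames of this notebook the same idea would presumably read `(test - train.min()) / (train.max() - train.min())`, evaluated before `train` is overwritten. Switching to it would change the test metrics reported further down.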


### PyTorch
The dataframes are converted into PyTorch tensors, and a `DataLoader` is created for both the training and the test set.

In [7]:
train = DataLoader(df_to_tensor(train), batch_size=BATCH_SIZE)
test = DataLoader(df_to_tensor(test), batch_size=BATCH_SIZE)

Definition of train and test loop functions.

In [8]:
def train_loop(train_loader, model, loss_fn, opt):
    """
    :param train_loader: pytorch loader for the training set
    :param model: the neural network
    :param loss_fn: the loss function
    :param opt: the optimizer
    :return: the execution time (prints are excluded but the computations of loss and mae are included)
    """
    start_time = time.time()
    num_batches = len(train_loader)
    num_records = len(train_loader.dataset)
    train_loss, mae = 0.0, 0.0
    for train_batch in train_loader:
        # Column 0 is the target, the remaining columns are the features
        train_x, train_y = train_batch[:, 1:], train_batch[:, :1]
        pred_y = model(train_x)
        loss = loss_fn(pred_y, train_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Accumulate the mean batch loss and the absolute errors as plain floats
        train_loss += loss.item() / num_batches
        mae += (train_y - pred_y).abs().sum().item()
    # The printed `loss` is the loss of the last batch; `train_loss` is the epoch average
    loss = loss.item()
    mae /= num_records
    execution_time = time.time() - start_time
    print(f"loss: {loss:>4f}, mae: {mae: > 0.4f}")
    return execution_time


def test_loop(test_loader, model, loss_fn):
    num_batches = len(test_loader)
    num_records = len(test_loader.dataset)
    # Mean of the target over the whole test set, needed for R2
    mean_y = test_loader.dataset[:, 0].mean()
    test_loss, mae, ssr, sst = 0.0, 0.0, 0.0, 0.0
    with torch.no_grad():
        for test_batch in test_loader:
            test_x, test_y = test_batch[:, 1:], test_batch[:, :1]
            pred_y = model(test_x)
            test_loss += loss_fn(pred_y, test_y).item() / num_batches
            mae += (test_y - pred_y).abs().sum().item()
            ssr += ((test_y - pred_y) ** 2).sum().item()  # residual sum of squares
            sst += ((test_y - mean_y) ** 2).sum().item()  # total sum of squares
    # R2 only needs the complete sums, so it is computed once after the loop
    r2 = 1 - ssr / sst
    rmse = (ssr / num_records) ** (1 / 2)
    mae /= num_records
    print(f"\nTest Error: loss: {test_loss: > 4f}, mae: {mae: > 0.4f}, rmse: {rmse: > 0.4f}, R2: {r2: > 0.4f}")
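
The test loop reports the mean absolute error, the root mean squared error and the coefficient of determination R2, computed from the residual sum of squares (`ssr`) and the total sum of squares (`sst`). As a worked example, the sketch below evaluates the same three quantities in plain Python on four toy values. `regression_metrics` is a hypothetical helper used only for illustration; it is not how the notebook computes them on tensors.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mae, rmse, r2) for two equally long lists of numbers."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    ssr = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    sst = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return abs_err / n, math.sqrt(ssr / n), 1 - ssr / sst


# Toy targets and predictions, chosen only for illustration
mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(f"mae: {mae:.4f}, rmse: {rmse:.4f}, R2: {r2:.4f}")
```

Here the absolute errors sum to 1.0 over 4 records (MAE 0.25), the squared errors sum to 0.5 (RMSE about 0.354), and the total sum of squares is 5.0, which gives R2 = 1 - 0.5 / 5.0 = 0.9.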

Construction of the neural network

In [9]:
manual_seed(SEED)
# Move the model to the same device as the data tensors (see df_to_tensor)
pytorch_net = create_pytorch_net(data.shape[1] - 1, NEURONS).to(get_device())
pytorch_net

Sequential(
  (0): Linear(in_features=8, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=128, bias=True)
  (3): ReLU()
  (4): Linear(in_features=128, out_features=64, bias=True)
  (5): ReLU()
  (6): Linear(in_features=64, out_features=64, bias=True)
  (7): ReLU()
  (8): Linear(in_features=64, out_features=1, bias=True)
)
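
The printed architecture can be cross-checked by hand: each `Linear` layer holds `in_features * out_features` weights plus `out_features` biases. The sketch below derives the layer shapes from the input size and the `NEURONS` list and counts the trainable parameters in plain Python. `layer_shapes` and `count_parameters` are hypothetical helpers introduced only for this check.

```python
def layer_shapes(input_size, neurons):
    """Pair every layer size with the next one: (in_features, out_features)."""
    sizes = [input_size] + list(neurons)
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """A fully connected layer has in * out weights plus out biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


# 8 input features and the NEURONS list used in this notebook
shapes = layer_shapes(8, [128, 128, 64, 64, 1])
total = count_parameters(shapes)
print(shapes)
print(total)
```

For this configuration the per-layer counts are 1152, 16512, 8256, 4160 and 65, for a total of 30145 trainable parameters.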

Training

In [10]:
loss_function = nn.L1Loss()
optimizer = torch.optim.Adam(pytorch_net.parameters())
pytorch_computation_time = 0
for e in range(EPOCHS):
    print(f"Epoch{e+1}: ")
    pytorch_computation_time += train_loop(train, pytorch_net, loss_function, optimizer)
print(f"\nTraining time: {pytorch_computation_time: > 0.4f} seconds\n")

Epoch1: 
loss: 0.095232, mae:  0.1556
Epoch2: 
loss: 0.083598, mae:  0.0991
Epoch3: 
loss: 0.077849, mae:  0.0944
Epoch4: 
loss: 0.077245, mae:  0.0915
Epoch5: 
loss: 0.075610, mae:  0.0889
Epoch6: 
loss: 0.075111, mae:  0.0873
Epoch7: 
loss: 0.076353, mae:  0.0858
Epoch8: 
loss: 0.073170, mae:  0.0850
Epoch9: 
loss: 0.074998, mae:  0.0841
Epoch10: 
loss: 0.072773, mae:  0.0838
Epoch11: 
loss: 0.076641, mae:  0.0831
Epoch12: 
loss: 0.071477, mae:  0.0828
Epoch13: 
loss: 0.080216, mae:  0.0820
Epoch14: 
loss: 0.074275, mae:  0.0823
Epoch15: 
loss: 0.072062, mae:  0.0809
Epoch16: 
loss: 0.070625, mae:  0.0814
Epoch17: 
loss: 0.070001, mae:  0.0802
Epoch18: 
loss: 0.071761, mae:  0.0799
Epoch19: 
loss: 0.071344, mae:  0.0796
Epoch20: 
loss: 0.071010, mae:  0.0795
Epoch21: 
loss: 0.071006, mae:  0.0792
Epoch22: 
loss: 0.070436, mae:  0.0790
Epoch23: 
loss: 0.071310, mae:  0.0783
Epoch24: 
loss: 0.069468, mae:  0.0781
Epoch25: 
loss: 0.071608, mae:  0.0777
Epoch26: 
loss: 0.070177, mae:  0.
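
The training time above is accumulated epoch by epoch from the value returned by `train_loop`. The sketch below shows the same accumulation pattern in isolation, using `time.perf_counter`, a monotonic clock intended for measuring short durations. `timed_epochs` and its `sync` argument are hypothetical names introduced for illustration; the notebook itself uses `time.time`.

```python
import time


def timed_epochs(run_epoch, epochs, sync=None):
    """Run `run_epoch` `epochs` times and return the total elapsed seconds.

    `sync` is an optional callable invoked before the clock is stopped,
    e.g. to wait for pending asynchronous work to finish.
    """
    total = 0.0
    for _ in range(epochs):
        start = time.perf_counter()
        run_epoch()
        if sync is not None:
            sync()
        total += time.perf_counter() - start
    return total


# Stand-in workload, chosen only for illustration
calls = []
elapsed = timed_epochs(lambda: calls.append(sum(range(10_000))), epochs=3)
print(f"{len(calls)} epochs in {elapsed:.6f} seconds")
```

On a CUDA device kernels are launched asynchronously, so passing `torch.cuda.synchronize` as `sync` may be needed for the measured time to cover the whole computation.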

Testing

In [11]:
test_loop(test, pytorch_net, loss_function)


Test Error: loss:  0.086529, mae:  0.0865, rmse:  0.1335, R2:  0.6796


### TensorFlow
TODO