## PyTorch

In the last few lessons, we learned how to build and optimize neural network architectures.  This gave us a grounding in how data flows between layers, how parameters get adjusted, and how loss decreases.  So far, we've been using NumPy to build and optimize our networks.  In this lesson, we'll learn about PyTorch, a framework that makes building and applying neural networks much simpler.

We'll start off by taking a look at how PyTorch represents data, and we'll move to building a complete neural network in PyTorch.

## Tensors

We'll first load in the same house prices dataset from the last lesson.  Each row in this dataset represents a single house.  The predictor columns are:

- `interest`: The interest rate
- `vacancy`: The vacancy rate
- `cpi`: The consumer price index
- `price`: The price of a house
- `value`: The value of a house
- `adj_price`: The price of a house, adjusted for inflation
- `adj_value`: The value of a house, adjusted for inflation

The predictor columns have all been scaled using the scikit-learn `StandardScaler`.  This gives each column a mean of 0 and a standard deviation of 1.  This makes it easier to activate our nonlinearities.
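
As a rough illustration of what this scaling does, the sketch below standardizes a small made-up array with plain NumPy.  The values are hypothetical, and the formula (subtract the column mean, divide by the column standard deviation) is what we'd expect `StandardScaler` to apply with its default settings.

```python
import numpy as np

# Hypothetical raw predictor columns (two features, four houses).
raw = np.array([[3.0, 100.0],
                [4.0, 110.0],
                [5.0, 120.0],
                [6.0, 130.0]])

# Standardize each column: subtract its mean, divide by its standard deviation.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(scaled.mean(axis=0))  # approximately 0 for every column
print(scaled.std(axis=0))   # approximately 1 for every column
```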

The target column is `next_quarter`, which is the price of the house in three months.  `next_quarter` has been scaled so the minimum value is `0`, and it has been divided by `1000` and rounded to the nearest integer.  This makes the prediction task simpler for our network.
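
The target transformation might look something like the sketch below.  The prices are made up, and the order of operations (divide, round, then shift the minimum to zero) is an assumption, since the lesson doesn't spell it out.

```python
import numpy as np

# Hypothetical raw prices in dollars; not values from the actual dataset.
raw_price = np.array([152_300.0, 149_800.0, 171_450.0])

# Assumed order: divide by 1000, round to the nearest integer, shift the minimum to 0.
target = np.round(raw_price / 1000)
target = target - target.min()

print(target)  # small non-negative values, with the smallest equal to 0
```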

In [1]:
import sys, os
sys.path.append(os.path.abspath('../data'))
from csv_data import HousePricesDatasetWrapper

# Load in data from csv file
wrapper = HousePricesDatasetWrapper()
train_data, valid_data, test_data = wrapper.get_flat_datasets()

The data is currently loaded into NumPy arrays.  We can instead load the data into torch tensors.  Tensors are n-dimensional data structures similar to NumPy arrays.  The primary difference is that torch tensors can be loaded onto different devices, like GPUs.  We'll discuss this more later.
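
One detail worth knowing is how the conversion treats memory.  The sketch below uses a small made-up array to show that `torch.from_numpy` shares memory with the NumPy array, while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 100.0                   # modify the NumPy array in place

print(shared)         # the first element is now 100.0
print(copied)         # unchanged
print(shared.device)  # cpu
print(shared.dtype)   # torch.float64, inherited from the NumPy array
```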

For now, we'll load our training set predictors and targets into torch tensors.

In [2]:
import torch

# Convert the numpy arrays to torch tensors
train_x = torch.from_numpy(train_data[0])
train_y = torch.from_numpy(train_data[1])

In [3]:
train_x

tensor([[ 1.9451,  1.3964, -1.5228,  ..., -0.1168, -0.1389,  0.8226],
        [ 1.9325,  1.3964, -1.4935,  ..., -0.1168, -0.1560,  0.8022],
        [ 1.9955,  1.3964, -1.4935,  ..., -0.1168, -0.0446,  0.8022],
        ...,
        [-0.2595, -0.6860,  0.5061,  ...,  0.3840,  0.4345,  0.3539],
        [-0.2469, -0.6860,  0.5061,  ...,  0.3840,  0.4217,  0.3539],
        [-0.1839, -0.6860,  0.5061,  ...,  0.3840,  0.6257,  0.3539]],
       dtype=torch.float64)

Tensors work very similarly to NumPy arrays.  You can do operations using scalars:

In [4]:
train_x + 1

tensor([[ 2.9451,  2.3964, -0.5228,  ...,  0.8832,  0.8611,  1.8226],
        [ 2.9325,  2.3964, -0.4935,  ...,  0.8832,  0.8440,  1.8022],
        [ 2.9955,  2.3964, -0.4935,  ...,  0.8832,  0.9554,  1.8022],
        ...,
        [ 0.7405,  0.3140,  1.5061,  ...,  1.3840,  1.4345,  1.3539],
        [ 0.7531,  0.3140,  1.5061,  ...,  1.3840,  1.4217,  1.3539],
        [ 0.8161,  0.3140,  1.5061,  ...,  1.3840,  1.6257,  1.3539]],
       dtype=torch.float64)
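
Indexing, shapes, and broadcasting also carry over from NumPy.  Here is a minimal sketch on a small stand-in tensor (not the real `train_x`):

```python
import torch

# A small stand-in for train_x: 3 rows, 2 columns.
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

print(x.shape)        # torch.Size([3, 2])
print(x[0])           # first row
print(x[:, 1])        # second column
print(x * 2)          # elementwise multiplication by a scalar

col_means = x.mean(dim=0)   # `dim` plays the role of NumPy's `axis`
print(col_means)            # tensor([3., 4.])
print(x - col_means)        # the column means broadcast across the rows
```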

One important difference is that you should use torch functions instead of NumPy functions when working with tensors.  This ensures that the operation runs on the device where the tensor lives.  There are torch equivalents for most NumPy functions:

In [5]:
# Take the square root of each value in the tensor.  Negative values have an undefined square root, so they become nan.
torch.sqrt(train_x)

tensor([[1.3947, 1.1817,    nan,  ...,    nan,    nan, 0.9070],
        [1.3901, 1.1817,    nan,  ...,    nan,    nan, 0.8957],
        [1.4126, 1.1817,    nan,  ...,    nan,    nan, 0.8957],
        ...,
        [   nan,    nan, 0.7114,  ..., 0.6196, 0.6591, 0.5949],
        [   nan,    nan, 0.7114,  ..., 0.6196, 0.6494, 0.5949],
        [   nan,    nan, 0.7114,  ..., 0.6196, 0.7910, 0.5949]],
       dtype=torch.float64)
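
The `nan` values can be located with `torch.isnan`.  If negative inputs should be treated as zero instead, one option (an illustrative choice, not something this lesson depends on) is to clamp the values first:

```python
import torch

values = torch.tensor([-4.0, 0.0, 9.0])

roots = torch.sqrt(values)      # nan for the negative entry
print(roots)
print(torch.isnan(roots))       # marks which entries are nan

# Clamp negatives to zero before taking the square root.
safe_roots = torch.sqrt(torch.clamp(values, min=0))
print(safe_roots)               # tensor([0., 0., 3.])
```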

## Autograd

One big advantage that PyTorch has over NumPy for deep learning is autograd.  Autograd will automatically calculate the gradient, without you having to write a backward pass!

To do this, we first need to define the parameter that we want a gradient for, then set `requires_grad` to `True`:

In [6]:
# Define a matrix of weights
# torch.rand generates random numbers drawn uniformly from [0, 1)
weights = torch.rand(train_x.shape[1], 1)
# Set requires_grad to True so that autograd can work
weights.requires_grad = True
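
To see what `requires_grad` enables, here is a minimal sketch on a function whose derivative we know by hand.  The tensor `w` and the function are made up for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(w^2), so dy/dw = 2 * w.
y = (w ** 2).sum()
y.backward()      # autograd computes dy/dw and stores it in w.grad

print(w.grad)     # tensor([2., 4., 6.])
```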

Then we can load in our training data and multiply it by the weights.  You may have noticed above that our `train_x` tensor is in `float64`.  This is because the NumPy arrays were in `float64`.  `float64` means that each number is stored using `64` bits of data.  In PyTorch, the default tends to be `float32`, which uses `32` bits to store each number.

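The two types differ in the range of values they can store, and also in precision: `float64` keeps about 16 significant decimal digits, while `float32` keeps about 7.  We can compare the ranges: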

In [19]:
import numpy as np

# Display the maximum value of float64
np.finfo("float64").max

1.7976931348623157e+308

In [20]:
# Display the maximum value of float32
np.finfo("float32").max

3.4028235e+38

`float32` can store large enough numbers that we rarely have issues when training deep learning models.  Thus, it's much more common to work with `float32` in PyTorch.  There are also times when you'll work with `float16` or `int8`, and we'll cover those in a later lesson.
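
Besides range, the types also differ in precision.  The sketch below uses `torch.finfo` to print the details of each floating point type, then shows a small increment that `float32` drops but `float64` keeps.

```python
import torch

for dtype in (torch.float64, torch.float32, torch.float16):
    info = torch.finfo(dtype)
    print(dtype, "bits:", info.bits, "max:", info.max, "eps:", info.eps)

# float32 keeps roughly 7 significant decimal digits, so this increment is lost.
one32 = torch.tensor(1.0, dtype=torch.float32)
lost_in_32 = bool(one32 + 1e-8 == one32)
print(lost_in_32)   # True

# float64 keeps roughly 16 digits, so the same increment survives.
one64 = torch.tensor(1.0, dtype=torch.float64)
lost_in_64 = bool(one64 + 1e-8 == one64)
print(lost_in_64)   # False
```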

We'll convert our tensor to `float32`.  In PyTorch, `torch.float` is an alias for `torch.float32`, since it's the default.  We can then multiply the weights and the `train_x` values:

In [21]:
train_x = train_x.to(torch.float)
predictions = train_x @ weights
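
The conversion matters because, in the PyTorch versions we're aware of, matrix multiplication doesn't mix `float64` and `float32` inputs.  The sketch below uses small stand-in tensors to show the mismatch and the resulting shapes after converting.

```python
import torch

x64 = torch.ones(4, 3, dtype=torch.float64)   # stand-in for train_x
w = torch.rand(3, 1)                          # float32 by default

try:
    x64 @ w                                   # mixed dtypes
except RuntimeError as err:
    print("matmul failed:", err)

x32 = x64.to(torch.float)                     # torch.float is torch.float32
preds = x32 @ w
print(preds.dtype, preds.shape)               # torch.float32 torch.Size([4, 1])
```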

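We can find the gradient by computing a scalar loss, then calling `loss.backward()`.  This will automatically backpropagate from `loss` to `weights`.  To keep things simple, the loss below is just the mean error between the predictions and the targets; for mean squared error, we would square the difference before taking the mean: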

In [None]:
loss = (predictions - train_y).mean()
loss.backward()

Then we can display the weight gradient:

In [7]:
weights.grad

tensor([[ 0.2605],
        [ 0.4172],
        [-0.5335],
        [-0.5425],
        [-0.5502],
        [-0.5209],
        [-0.5167]])
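
To check that autograd matches a hand-written backward pass, the sketch below uses a mean squared error loss on small made-up tensors and compares `w.grad` against the derivative worked out by hand.

```python
import torch

torch.manual_seed(0)
x = torch.rand(8, 3)                       # hypothetical predictors
y = torch.rand(8, 1)                       # hypothetical targets
w = torch.rand(3, 1, requires_grad=True)

# Forward pass and mean squared error loss.
loss = ((x @ w - y) ** 2).mean()
loss.backward()                            # fills in w.grad

# Hand-derived gradient of the same loss: (2 / n) * x^T (x w - y).
with torch.no_grad():
    manual_grad = (2 / x.shape[0]) * x.T @ (x @ w - y)

grads_match = torch.allclose(w.grad, manual_grad, atol=1e-6)
print(grads_match)   # True
```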

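And make the gradient update with a `1e-5` learning rate.  We do the update in place inside `torch.no_grad()`, so that `weights` stays a leaf tensor that autograd tracks and the update itself isn't recorded:

In [8]:
with torch.no_grad():
    weights -= 1e-5 * weights.grad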
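
Repeating this update in a loop gives us gradient descent.  The sketch below runs it on made-up data with a hypothetical `true_w` to recover.  It updates the weights in place under `torch.no_grad()` and clears the gradient after each step, since PyTorch accumulates gradients across `backward()` calls.

```python
import torch

torch.manual_seed(0)
x = torch.rand(32, 2)
true_w = torch.tensor([[2.0], [-1.0]])     # hypothetical weights to recover
y = x @ true_w

w = torch.zeros(2, 1, requires_grad=True)
lr = 0.5

losses = []
for step in range(500):
    loss = ((x @ w - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                   # in-place update keeps w a leaf tensor
    w.grad.zero_()                         # gradients accumulate unless cleared
    losses.append(loss.item())

print(losses[0], losses[-1])               # the loss shrinks toward 0
print(w.detach())                          # close to true_w
```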

## nn.Module

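We can also let PyTorch keep track of our parameters, so that the backward pass is automated entirely.  To do this, we write each layer as a class that inherits from `nn.Module`.  Any tensor we wrap in `nn.Parameter` is registered with the module and has `requires_grad` set to `True`, and the `forward` method defines the forward pass: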

In [9]:
from torch import nn
import math

class DenseLayer(nn.Module):
    def __init__(self, input_units, output_units):
        super().__init__()

        k = math.sqrt(1/input_units)
        self.weight = nn.Parameter(torch.rand(input_units, output_units) * 2 * k - k)
        self.bias = nn.Parameter(torch.rand(1, output_units) * 2 * k - k)

    def forward(self, x):
        return x @ self.weight + self.bias

class DenseNetwork(nn.Module):
    def __init__(self, input_units, hidden_units, output_units, layers):
        super().__init__()

        torch.manual_seed(0)
        modules = []
        for i in range(layers):
            in_size = out_size = hidden_units
            if i == 0:
                in_size = input_units
            if i == layers - 1:
                out_size = output_units
            modules.append(DenseLayer(in_size, out_size))
        self.module_list = nn.ModuleList(modules)

    def forward(self, x):
        for module in self.module_list:
            x = module(x)
        return x
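
To see what registration means in practice, here is a minimal sketch with a hypothetical `TinyLayer` module.  Only tensors wrapped in `nn.Parameter` show up in `parameters()`, which is what gets handed to an optimizer.

```python
import torch
from torch import nn

class TinyLayer(nn.Module):   # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(3, 2))   # registered
        self.bias = nn.Parameter(torch.zeros(1, 2))    # registered
        self.scratch = torch.rand(3, 2)                # plain tensor: not registered

    def forward(self, x):
        return x @ self.weight + self.bias

layer = TinyLayer()
names = [name for name, _ in layer.named_parameters()]
n_params = sum(p.numel() for p in layer.parameters())
print(names)      # ['weight', 'bias']
print(n_params)   # 8

out = layer(torch.ones(4, 3))   # calling the module runs forward()
print(out.shape)                # torch.Size([4, 2])
```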

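PyTorch has datasets and dataloaders to make it easy to work with data and batches.  A `Dataset` class defines how many examples there are (`__len__`) and how to fetch a single example (`__getitem__`):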

In [10]:
from torch.utils.data import DataLoader, Dataset

class PriceData(Dataset):
    def __init__(self, x, y):
        self.x = x.float()
        self.y = y.float()

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        x = self.x[idx]
        y = self.y[idx]
        return x, y

train_ds = PriceData(train_x, train_y)

A `DataLoader` then wraps the dataset and automatically groups the examples into batches:

In [11]:
batch_size = 16
train = DataLoader(train_ds, batch_size=batch_size, shuffle=False)
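
To see what the batches look like, the sketch below iterates over a loader built from made-up data.  It uses the built-in `TensorDataset` as a shortcut in place of a custom `Dataset` class.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 35 made-up examples with 7 predictors each.
xs = torch.rand(35, 7)
ys = torch.rand(35, 1)

loader = DataLoader(TensorDataset(xs, ys), batch_size=16, shuffle=False)

batch_sizes = []
for xb, yb in loader:
    print(xb.shape, yb.shape)
    batch_sizes.append(xb.shape[0])

# Two full batches of 16, then the 3 leftover examples.
print(batch_sizes)   # [16, 16, 3]
print(len(loader))   # 3
```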

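We can now run the full training loop.  For each batch, we zero the gradients, run the forward pass, compute the loss, backpropagate, and let the optimizer update the parameters.  Every 10 epochs, we print the mean loss across all batches seen so far: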

In [12]:
from statistics import mean

epochs = 50
layers = 5
hidden_size = 25
lr = 5e-4

net = DenseNetwork(train_x.shape[1], hidden_size, 1, layers)
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

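def train_loop(net, optimizer, epochs):
    loss_fn = nn.MSELoss()

    train_losses = []
    for epoch in range(epochs):
        # `train` is the global DataLoader defined earlier
        for batch, (x, y) in enumerate(train):
            # Clear gradients left over from the previous batch
            optimizer.zero_grad()
            pred = net(x)
            loss = loss_fn(pred, y)
            # Backpropagate, then update the parameters
            loss.backward()
            optimizer.step()
            train_losses.append(loss.item())
        # Every 10 epochs, print the mean loss over all batches so far
        if (epoch + 1) % 10 == 0:
            print(mean(train_losses))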

train_loop(net, optimizer, epochs)

49.68326367922127
30.341548889130355
23.66963948508104
20.26992240929976
18.193296501636507
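
To make the roles of `zero_grad`, `backward`, and `step` concrete, the sketch below runs a single update on a made-up batch and checks that plain SGD (no momentum or weight decay) matches the rule we applied by hand earlier: parameter minus learning rate times gradient.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.rand(5, 3)   # made-up batch
y = torch.rand(5, 1)

optimizer.zero_grad()                 # clear gradients from any earlier step
loss = loss_fn(layer(x), y)
loss.backward()                       # compute gradients for every parameter

# What plain SGD should produce: parameter - lr * gradient.
expected = layer.weight.detach() - 0.1 * layer.weight.grad

optimizer.step()                      # apply the update in place
step_matches = torch.allclose(layer.weight.detach(), expected)
print(step_matches)   # True
```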


PyTorch also has prebuilt components.  For example, `nn.Linear` is a built-in dense layer that can replace our `DenseLayer` class:

In [13]:
class DenseNetwork(nn.Module):
    def __init__(self, input_units, hidden_units, output_units, layers):
        super().__init__()

        torch.manual_seed(0)
        modules = []
        for i in range(layers):
            in_size = out_size = hidden_units
            if i == 0:
                in_size = input_units
            if i == layers - 1:
                out_size = output_units
            modules.append(nn.Linear(in_size, out_size))
        self.module_list = nn.ModuleList(modules)

    def forward(self, x):
        for module in self.module_list:
            x = module(x)
        return x

In [14]:
net = DenseNetwork(train_x.shape[1], hidden_size, 1, layers)
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
train_loop(net, optimizer, epochs)

51.05724552236497
30.849216412380336
23.951016113410393
20.450236316770315
18.319719779416918
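
`nn.Linear` computes the same kind of output as our hand-written layer, but it stores its weight with shape `(out_features, in_features)`, so the forward pass multiplies by the transpose.  A minimal check on made-up input:

```python
import torch
from torch import nn

torch.manual_seed(0)
linear = nn.Linear(7, 25)
print(linear.weight.shape)   # torch.Size([25, 7])
print(linear.bias.shape)     # torch.Size([25])

x = torch.rand(16, 7)
manual = x @ linear.weight.T + linear.bias
outputs_match = torch.allclose(linear(x), manual, atol=1e-6)
print(outputs_match)         # True
```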


PyTorch makes it easy to swap components in and out to make a more complex network.  You can pick from:

- Different layer types
- Optimizers
- Schedulers
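
As one possible combination, the sketch below builds a small network with `nn.Sequential` and a `ReLU` nonlinearity, then trains it with the `Adam` optimizer and a `StepLR` scheduler.  The layer sizes, learning rate, and schedule are arbitrary choices for illustration, and the data is made up.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(7, 25),
    nn.ReLU(),            # a nonlinearity between the layers
    nn.Linear(25, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
loss_fn = nn.MSELoss()

x = torch.rand(16, 7)
y = torch.rand(16, 1)

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()      # halves the learning rate every 10 epochs

final_lr = scheduler.get_last_lr()[0]
print(final_lr)           # about 0.00025 after two halvings
```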

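PyTorch also makes your code portable.  So far, we've run on the CPU, but PyTorch also lets you run code on a GPU through CUDA, or through MPS (Apple's Metal backend, specific to Macs).  Both the model and the data need to be moved to the same device with `.to(device)`: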

In [15]:
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available() and torch.backends.mps.is_built():
    device = "mps"
else:
    device = "cpu"

class PriceData(Dataset):
    def __init__(self, x, y):
        self.x = x.float()
        self.y = y.float()

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        x = self.x[idx]
        y = self.y[idx]
        return x.to(device), y.to(device)

train_ds = PriceData(train_x, train_y)
train = DataLoader(train_ds, batch_size=batch_size, shuffle=False)

net = DenseNetwork(train_x.shape[1], hidden_size, 1, layers).to(device)
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

train_loop(net, optimizer, epochs)

51.057245976105335
30.84921663403511
23.95101627384623
20.450236380659042
18.319719846621155
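
The same pattern works for any model and tensor.  The sketch below picks a device (with a slightly more defensive MPS check, in case an older PyTorch build lacks `torch.backends.mps`), runs a made-up batch through a single layer, and moves the result back to the CPU before converting it to NumPy.

```python
import torch
from torch import nn

if torch.cuda.is_available():
    device = "cuda"
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

model = nn.Linear(7, 1).to(device)      # moves the parameters to the device
x = torch.rand(4, 7).to(device)         # inputs must live on the same device

out = model(x)
print(out.device)

# Detach from autograd and move to the CPU before converting to NumPy.
out_np = out.detach().cpu().numpy()
print(out_np.shape)   # (4, 1)
```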


We'll learn about other PyTorch features later, but this is the core set that you'll need.