These notes follow the YouTube PyTorch tutorial at <https://www.youtube.com/watch?v=GIsg-ZUy0MY>, around timestamp 1:10:00.

# Linear regression using PyTorch built-ins

In [3]:
import torch
import torch.nn as nn
import numpy as np

In [4]:
# Inputs: Temp, rainfall, humidity
inputs = np.array([[73, 67, 43],
                    [91, 88, 64],
                    [87, 134, 58],
                    [102, 43, 37],
                    [69, 96, 70]], dtype='float32')

In [5]:
# Targets: apples, oranges
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')

In [6]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
inputs, targets

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.],
         [102.,  43.,  37.],
         [ 69.,  96.,  70.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.],
         [ 22.,  37.],
         [103., 119.]]))

# Dataset and DataLoader

The `TensorDataset` allows access to `inputs` and `targets` as tuples and provides standard APIs for working with many different types of datasets in PyTorch.

In [36]:
from torch.utils.data import TensorDataset

In [37]:
# Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

The `TensorDataset` allows us to access a small section of the training data using the array indexing notation. It returns a tuple in which the first element contains the input variables for the selected rows and the second contains the targets.
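
As a small illustration of the indexing behaviour described above, the sketch below builds a `TensorDataset` from three of the rows and reads back a single row. The names `x`, `y`, `ds`, `x0` and `y0` are made up for this example and are not part of the notebook.

```python
import torch
from torch.utils.data import TensorDataset

# Toy stand-ins for the notebook's tensors (3 features, 2 targets per row)
x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.],
                  [87., 134., 58.]])
y = torch.tensor([[56., 70.],
                  [81., 101.],
                  [119., 133.]])

ds = TensorDataset(x, y)

print(len(ds))             # number of rows in the dataset
x0, y0 = ds[0]             # a single row comes back as an (input, target) tuple
print(x0.shape, y0.shape)  # one row of features, one row of targets
```

Indexing with an integer returns one row of each tensor, while a slice such as `ds[0:3]` returns the matching block of rows from both.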

We'll also create a `DataLoader`, which can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of data.

In [38]:
from torch.utils.data import DataLoader

In [39]:
# Define data loader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

Setting `shuffle=True` shuffles the data before the batches are created; the data is reshuffled at the start of every epoch.

The `DataLoader` is typically used in a `for` loop.

In [40]:
for xb, yb in train_dl:
    print(xb, yb)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [102.,  43.,  37.],
        [ 87., 134.,  58.],
        [ 69.,  96.,  70.]]) tensor([[ 56.,  70.],
        [ 81., 101.],
        [ 22.,  37.],
        [119., 133.],
        [103., 119.]])
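
Because `batch_size` equals the number of rows here, the loop above yields a single batch containing all five rows in shuffled order. The sketch below, which uses made-up data and a smaller batch size chosen only for illustration, shows how the loader splits five rows into several batches.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Made-up data with the same shape as the notebook's tensors
x = torch.arange(15, dtype=torch.float32).reshape(5, 3)  # 5 rows, 3 features
y = torch.arange(10, dtype=torch.float32).reshape(5, 2)  # 5 rows, 2 targets
ds = TensorDataset(x, y)

# batch_size=2 is an assumption for this example, not the notebook's setting
dl = DataLoader(ds, batch_size=2, shuffle=True)

# Collect the number of rows in each batch
batch_sizes = [xb.shape[0] for xb, yb in dl]
print(len(dl), batch_sizes)
```

The last batch is smaller because five rows do not divide evenly by two; passing `drop_last=True` to `DataLoader` would discard it instead.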


# nn.Linear

Instead of initializing the weights and biases manually, we can define the model using the `nn.Linear` class from PyTorch, which does it automatically.

In [41]:
# Define model
model = nn.Linear(3, 2)
model.weight, model.bias

(Parameter containing:
 tensor([[-0.2721, -0.0777, -0.0572],
         [-0.3578, -0.2715,  0.0870]], requires_grad=True),
 Parameter containing:
 tensor([0.0906, 0.1579], requires_grad=True))

PyTorch models also have a helpful `.parameters()` method, which returns an iterator over all the weight and bias tensors in the model (it is wrapped in `list(...)` here so the contents can be displayed). For our linear regression model, we have one weight matrix and one bias vector.

In [42]:
list(model.parameters())

[Parameter containing:
 tensor([[-0.2721, -0.0777, -0.0572],
         [-0.3578, -0.2715,  0.0870]], requires_grad=True),
 Parameter containing:
 tensor([0.0906, 0.1579], requires_grad=True)]

We can use the model to generate predictions in the exact same way as before.

In [43]:
preds = model(inputs)
preds

tensor([[-27.4382, -40.4128],
        [-35.1686, -50.7280],
        [-37.3137, -62.3093],
        [-33.1198, -44.7947],
        [-30.1476, -44.5063]], grad_fn=<AddmmBackward>)
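
To make the connection with the manual model explicit, here is a minimal sketch checking that an `nn.Linear` layer computes `x @ W.T + b`. The names `layer`, `x` and `manual` are introduced only for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)            # fixed seed so the random initial weights are repeatable
layer = nn.Linear(3, 2)         # 3 input features, 2 outputs

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])

# The same computation written out by hand: inputs times transposed weights, plus bias
manual = x @ layer.weight.t() + layer.bias

print(torch.allclose(layer(x), manual, atol=1e-4))
```

The weight matrix has shape `(2, 3)`, one row per output, which is why it is transposed before the multiplication.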

# Loss Function

Instead of defining loss functions manually, we can use the built-in loss functions.

The `nn.functional` package contains many useful loss functions and several utilities.

In [44]:
import torch.nn.functional as F

In [45]:
# Define loss function
loss_fn = F.mse_loss

In [46]:
loss = loss_fn(model(inputs), targets)
loss

tensor(17244.0293, grad_fn=<MseLossBackward>)
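
As a quick check of what `F.mse_loss` computes with its default settings, the sketch below compares it against the mean of squared differences written by hand. The small `preds` and `targets` tensors are made-up values for illustration.

```python
import torch
import torch.nn.functional as F

# Made-up predictions and targets (2 rows, 2 outputs)
preds = torch.tensor([[50., 60.],
                      [80., 100.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.]])

# Mean squared error by hand: average of squared differences over all elements
manual = ((preds - targets) ** 2).mean()

# Built-in version (default reduction is the mean)
builtin = F.mse_loss(preds, targets)

print(manual.item(), builtin.item())
```

The squared differences are 36, 100, 1 and 1, so both values come out to 138 / 4 = 34.5.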

# Optimizer

Instead of manually manipulating the model's parameters using gradients, we can use the optimizer `optim.SGD`.

In [47]:
# Define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

Note that `model.parameters()` is passed to the optimizer so that it knows which tensors should be modified during the update step. We also specify the learning rate `lr`, which controls how large each update is.
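
To see what a single update does, here is a minimal sketch with plain SGD (no momentum or weight decay) on a made-up two-element parameter `w`. It compares the result of `opt.step()` with the rule `w - lr * grad` computed by hand; the loss and learning rate are chosen only to keep the arithmetic easy.

```python
import torch

# A made-up parameter; requires_grad=True so autograd tracks it
w = torch.tensor([1.0, 2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()   # the gradient of this loss is 2 * w
loss.backward()         # fills in w.grad

# The SGD rule written by hand: parameter minus learning rate times gradient
expected = w.detach() - 0.1 * w.grad

opt.step()              # the optimizer applies the same rule in place
print(w.detach(), expected)
```

With `w = [1, 2]` the gradient is `[2, 4]`, so both tensors end up at `[0.8, 1.6]`.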

# Train the model

We implement the same steps for gradient descent:

1. Generate predictions.
2. Calculate the loss.
3. Compute the gradients w.r.t. the weights and biases.
4. Adjust the weights by subtracting a small quantity proportional to the gradient.
5. Reset the gradients to zero.

We'll work with batches of data instead of the whole dataset in every iteration. We'll define a utility function `fit` that trains the model for a given number of epochs.

In [50]:
def fit(num_epochs, model, loss_fn, opt):
    # for each epoch, we run through our data
    for epoch in range(num_epochs):
        
        # For each batch in our training data, we go through a gradient descent loop.
        for xb, yb in train_dl:
            
            # 1. Generate predictions
            preds = model(xb)

            # 2. Calculate the loss
            loss = loss_fn(preds, yb)

            # 3. Compute the gradients
            loss.backward()

            # 4. Update the parameters
            opt.step()

            # 5. Reset the gradients to zero
            opt.zero_grad()

        # Print the progress every 10 epochs
        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')


Some things to note:
- We use the data loader `train_dl` defined above as a global variable.
- Instead of updating the parameters manually, we use the optimizer: `opt.step()` performs the update and `opt.zero_grad()` resets the gradients to zero.
- We added a log statement that shows the loss every 10 epochs. `loss.item()` returns the actual value stored in the loss tensor.
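
Since `fit` reads `train_dl` from the global scope, one possible variation is to pass the data loader in as an argument. The sketch below is an assumption-laden rewrite, not the tutorial's version: `fit_with_loader` is a hypothetical name, and the tiny dataset (`y = 2x + 1`) and learning rate are made up so the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def fit_with_loader(num_epochs, model, loss_fn, opt, train_dl):
    """Hypothetical variant of fit that receives the data loader explicitly."""
    for epoch in range(num_epochs):
        for xb, yb in train_dl:
            preds = model(xb)           # 1. Generate predictions
            loss = loss_fn(preds, yb)   # 2. Calculate the loss
            loss.backward()             # 3. Compute the gradients
            opt.step()                  # 4. Update the parameters
            opt.zero_grad()             # 5. Reset the gradients to zero

# Made-up data following y = 2x + 1
torch.manual_seed(0)
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = 2 * x + 1
dl = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=True)

model = nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

loss_before = F.mse_loss(model(x), y).item()
fit_with_loader(100, model, F.mse_loss, opt, dl)
loss_after = F.mse_loss(model(x), y).item()
print(loss_before, loss_after)
```

Passing the loader explicitly makes it possible to reuse the same function with a different dataset or batch size without touching global state.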

Let's train our model for 250 epochs.

In [49]:
fit(250, model, loss_fn, opt)

Epoch [10/250], Loss: 805.4483032226562
Epoch [20/250], Loss: 293.786376953125
Epoch [30/250], Loss: 251.9359893798828
Epoch [40/250], Loss: 222.95248413085938
Epoch [50/250], Loss: 197.5584716796875
Epoch [60/250], Loss: 175.17657470703125
Epoch [70/250], Loss: 155.4447021484375
Epoch [80/250], Loss: 138.0470428466797
Epoch [90/250], Loss: 122.70530700683594
Epoch [100/250], Loss: 109.17462158203125
Epoch [110/250], Loss: 97.23921203613281
Epoch [120/250], Loss: 86.70904541015625
Epoch [130/250], Loss: 77.41676330566406
Epoch [140/250], Loss: 69.2149887084961
Epoch [150/250], Loss: 61.974021911621094
Epoch [160/250], Loss: 55.579376220703125
Epoch [170/250], Loss: 49.930458068847656
Epoch [180/250], Loss: 44.938621520996094
Epoch [190/250], Loss: 40.525760650634766
Epoch [200/250], Loss: 36.62309265136719
Epoch [210/250], Loss: 33.170021057128906
Epoch [220/250], Loss: 30.11318016052246
Epoch [230/250], Loss: 27.40557861328125
Epoch [240/250], Loss: 25.00590705871582
Epoch [250/250], 

In [52]:
preds = model(inputs)
preds, targets

(tensor([[ 58.6420,  71.7206],
         [ 81.1299,  99.6374],
         [118.7685, 133.0384],
         [ 29.3656,  44.8612],
         [ 95.2309, 112.7947]], grad_fn=<AddmmBackward>),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.],
         [ 22.,  37.],
         [103., 119.]]))
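
To put a number on how close these predictions are, the sketch below copies the printed predictions and targets from the output above into plain tensors and computes the overall mean squared error plus a per-row mean absolute error. The names `mse` and `abs_err` are introduced only for this example.

```python
import torch
import torch.nn.functional as F

# Values copied from the printed output above (gradient information dropped)
preds = torch.tensor([[58.6420, 71.7206],
                      [81.1299, 99.6374],
                      [118.7685, 133.0384],
                      [29.3656, 44.8612],
                      [95.2309, 112.7947]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.],
                        [103., 119.]])

mse = F.mse_loss(preds, targets)               # mean squared error over all 10 values
abs_err = (preds - targets).abs().mean(dim=1)  # mean absolute error for each row
print(mse.item())
print(abs_err)
```

The first three rows are predicted to within a few units, while the last two rows are off by roughly seven units each and account for most of the remaining loss. In a live session, wrapping `model(inputs)` in `with torch.no_grad():` avoids tracking gradients when the predictions are only being inspected.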