In [2]:
import numpy as np
import torch

## Training data

We can represent the training data using two matrices: `inputs` and `targets`, each with one row per observation, and one column per variable.

In [3]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Linear regression model from scratch

The weights and biases (`w11, w12,... w23, b1 & b2`) can also be represented as tensors: a 2×3 matrix `w` and a 2-element vector `b`, both initialized with random values. The first row of `w` and the first element of `b` are used to predict the first target variable (yield of apples); the second row and second element predict the yield of oranges.
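
Written out, the model described above computes each target as a weighted sum of the three inputs plus a bias. The weight and bias names follow the text; the symbols $X$, $W$, and $\hat{Y}$ are notation introduced here only for illustration.

$$
\begin{aligned}
\text{yield}_{\text{apple}} &= w_{11}\,\text{temp} + w_{12}\,\text{rainfall} + w_{13}\,\text{humidity} + b_1 \\
\text{yield}_{\text{orange}} &= w_{21}\,\text{temp} + w_{22}\,\text{rainfall} + w_{23}\,\text{humidity} + b_2
\end{aligned}
$$

In matrix form, with $X$ the 5×3 input matrix, $W$ the 2×3 weight matrix, and $b$ the bias vector added to every row:

$$
\hat{Y} = X W^{\top} + b
$$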

In [15]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[ 1.0845,  2.6584, -0.2787],
        [ 0.2692, -0.8148,  0.7142]], requires_grad=True)
tensor([0.1663, 1.1629], requires_grad=True)


In [16]:
def model(x):
    # Linear model: multiply the inputs by the transposed weights and add the bias
    return x @ w.t() + b

In [17]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[ 2.4546e+02, -3.0678e+00],
        [ 3.1496e+02, -3.3474e-01],
        [ 4.3458e+02, -4.3178e+01],
        [ 2.1479e+02,  2.0008e+01],
        [ 3.1070e+02, -8.4894e+00]], grad_fn=<AddBackward0>)
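
The shapes involved in `model` can be checked with a small standalone sketch. The all-zero input below is an assumption made purely for illustration (it is not the training data); it makes the broadcast of `b` across rows easy to see.

```python
import torch

x = torch.zeros(5, 3)   # same shape as `inputs`: 5 observations, 3 variables
w = torch.randn(2, 3)   # one row of weights per target
b = torch.randn(2)      # one bias per target

out = x @ w.t() + b     # (5, 3) @ (3, 2) -> (5, 2); b broadcasts across rows
print(out.shape)        # torch.Size([5, 2])

# With all-zero inputs, every row of the output is just the bias vector
print(torch.equal(out, b.expand(5, 2)))  # True
```

Each row of the result holds the two predictions (apples, oranges) for one observation.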


## Loss function

Before we improve our model, we need a way to evaluate how well our model is performing. We can compare the model's predictions with the actual targets using the following method:

* Calculate the difference between the two matrices (`preds` and `targets`).
* Square all elements of the difference matrix to remove negative values.
* Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the **mean squared error** (MSE).

In [18]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

`torch.sum` returns the sum of all the elements in a tensor. 

The `.numel` method of a tensor returns the number of elements in a tensor. 
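
As a sanity check, the hand-written `mse` can be compared against PyTorch's built-in `torch.nn.functional.mse_loss`, which averages the squared differences by default. The two small tensors below are made up for illustration.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Small made-up tensors, chosen only for illustration
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
c = torch.tensor([[1.0, 0.0], [0.0, 4.0]])

manual = mse(a, c)            # (0 + 4 + 9 + 0) / 4 = 3.25
builtin = F.mse_loss(a, c)    # default reduction is 'mean'
print(manual, builtin)        # both are 3.25
```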

In [19]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(33371.7266, grad_fn=<DivBackward0>)


In [20]:
# Compute gradients
loss.backward()

In [21]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[ 1.0845,  2.6584, -0.2787],
        [ 0.2692, -0.8148,  0.7142]], requires_grad=True)
tensor([[ 19314.3789,  20759.7793,  12619.1650],
        [ -8082.5703, -10078.0967,  -5879.7246]])
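
For a linear model with MSE loss, the gradients that `loss.backward()` stores in `.grad` can also be written in closed form, which gives a way to check them. The sketch below is standalone: it uses the same data but its own randomly initialized `w` and `b` with an arbitrary seed, so its numbers will not match the output above.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the sketch is repeatable

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.],
                        [103., 119.]])

w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

preds = inputs @ w.t() + b
diff = preds - targets
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()

# Closed-form gradients of the MSE for a linear model
with torch.no_grad():
    n = diff.numel()
    manual_w = (2.0 / n) * (diff.t() @ inputs)   # shape (2, 3), like w
    manual_b = (2.0 / n) * diff.sum(dim=0)       # shape (2,), like b

# Compare autograd's result with the closed form (loose float32 tolerances)
w_ok = torch.allclose(w.grad, manual_w, rtol=1e-3, atol=1e-2)
b_ok = torch.allclose(b.grad, manual_b, rtol=1e-3, atol=1e-2)
print(w_ok, b_ok)
```

Each entry of `w.grad` is the rate of change of the loss with respect to the matching entry of `w`, which is why `w.grad` has the same shape as `w`.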


In [27]:
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5

We use `torch.no_grad` to indicate to PyTorch that we shouldn't track, calculate, or modify gradients while updating the weights and biases.
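
A small sketch of why the `torch.no_grad` block is needed: PyTorch refuses an in-place update of a leaf tensor that requires gradients unless gradient tracking is switched off. The tensor and the toy loss below are made up for illustration.

```python
import torch

w = torch.ones(2, 3, requires_grad=True)
loss = (w * w).sum()
loss.backward()              # w.grad is 2 * w, i.e. all twos

# Outside no_grad, an in-place update of a leaf tensor that requires grad fails
try:
    w -= w.grad * 0.1
    raised = False
except RuntimeError:
    raised = True
print(raised)                # True

# Inside no_grad, the update is applied and not recorded by autograd
with torch.no_grad():
    w -= w.grad * 0.1
print(w.requires_grad, w.grad_fn)   # True None
print(w)                            # every element is now 1 - 0.2 = 0.8
```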

In [29]:
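# Recompute predictions with the updated weights and biases, then check the loss.
# Reusing the `preds` computed before the update would only reproduce the
# earlier value, tensor(33371.7266).
preds = model(inputs)
loss = mse(preds, targets)
print(loss)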


Before we proceed, we reset the gradients to zero by invoking the `.zero_()` method.

We need to do this because PyTorch accumulates gradients. Otherwise, the next time we invoke `.backward` on the loss, the new gradient values are added to the existing gradients, which may lead to unexpected results.
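
The accumulation behaviour is easy to observe on a toy tensor (the values here are made up for illustration): calling `.backward` twice on the same computation doubles the stored gradient until it is reset.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()       # tensor([3., 3.])

(w * 3).sum().backward()     # same computation; gradients are added
second = w.grad.clone()      # tensor([6., 6.])

w.grad.zero_()               # reset in place
print(first, second, w.grad)
```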

In [30]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])


## Train the model using gradient descent

As seen above, we reduce the loss and improve our model using the gradient descent optimization algorithm. Thus, we can _train_ the model using the following steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t. the weights and biases

4. Adjust the weights and biases by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

### Implementation - Train for multiple epochs

Let's implement the above step by step.

In [115]:
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [78]:
inputs.shape

torch.Size([5, 3])

In [152]:
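def model(x, w, b):
    # Linear model: one row of `w` and one element of `b` per target
    return x @ w.t() + b

def loss_function(pred, targets):
    # Mean squared error over all elements
    dif = pred - targets
    return torch.sum(dif * dif) / dif.numel()

w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

epochs = 40
lr = 1e-5

# Track the best parameters seen so far. Clone them: `w` and `b` are updated
# in place, so a plain reference would always point at the latest values.
best_w, best_b = w.detach().clone(), b.detach().clone()
best_loss = loss_function(model(inputs, w, b), targets).item()

for epoch in range(epochs):
    pred = model(inputs, w, b)
    loss = loss_function(pred, targets)

    # `loss` belongs to the current (pre-update) parameters, so snapshot them now
    if loss.item() < best_loss:
        best_w, best_b, best_loss = w.detach().clone(), b.detach().clone(), loss.item()

    loss.backward()
    with torch.no_grad():
        w -= w.grad * lr
        b -= b.grad * lr
        w.grad.zero_()
        b.grad.zero_()

    if (epoch + 1) % 5 == 0:
        print(loss)

# Restore the best parameters found during training
with torch.no_grad():
    w.copy_(best_w)
    b.copy_(best_b)

print(f'Loss: {best_loss}')
print(targets)
print(model(inputs, w, b))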

tensor(16709.5352, grad_fn=<DivBackward0>)
tensor(2403.9863, grad_fn=<DivBackward0>)
tensor(412.1810, grad_fn=<DivBackward0>)
tensor(131.8728, grad_fn=<DivBackward0>)
tensor(89.6205, grad_fn=<DivBackward0>)
tensor(80.6427, grad_fn=<DivBackward0>)
tensor(76.4687, grad_fn=<DivBackward0>)
tensor(73.1310, grad_fn=<DivBackward0>)
Loss: 73.13096618652344
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
tensor([[ 57.7773,  73.4738],
        [ 76.9917, 102.1158],
        [128.8356, 124.6824],
        [ 24.6041,  54.7388],
        [ 90.8556, 111.4090]], grad_fn=<AddBackward0>)
