## Practice Gradient Descent and Linear Regression with PyTorch


In [39]:
import numpy as np
import torch

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)



```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```
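
The two equations can also be read as a single matrix product: stacking the weights into a 2×3 matrix and the biases into a length-2 vector gives both yields at once. The sketch below uses made-up weight and bias values (purely illustrative assumptions, not values from this notebook) to show that both forms agree.

```python
import numpy as np

# One row of measurements: temp, rainfall, humidity (first row of the training data)
x = np.array([73.0, 67.0, 43.0])

# Hypothetical weights and biases, chosen only for illustration
W = np.array([[0.3, 0.2, 0.5],    # w11, w12, w13 (apples)
              [0.4, 0.1, 0.6]])   # w21, w22, w23 (oranges)
b = np.array([1.0, 2.0])          # b1, b2

# Equation by equation
yield_apple = W[0, 0] * x[0] + W[0, 1] * x[1] + W[0, 2] * x[2] + b[0]
yield_orange = W[1, 0] * x[0] + W[1, 1] * x[1] + W[1, 2] * x[2] + b[1]

# The same computation as one matrix product
yields = W @ x + b

print(yield_apple, yield_orange)
print(yields)
```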


In [40]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [41]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [42]:
targets.dtype

dtype('float32')

In [43]:
type(targets)

numpy.ndarray

In [44]:
# Convert inputs and targets to tensor
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [45]:
m1 = np.array([[73, 67, 42],
               [91, 88, 64]])

In [46]:
m1

array([[73, 67, 42],
       [91, 88, 64]])

In [47]:
m2 = np.array([[10, 5, 2],
               [2, 2, 2]])


m2 = m2.transpose()
m2

array([[10,  2],
       [ 5,  2],
       [ 2,  2]])

In [48]:
# Matrix multiplication

mult = m1 @ m2
m1, m2

(array([[73, 67, 42],
        [91, 88, 64]]),
 array([[10,  2],
        [ 5,  2],
        [ 2,  2]]))

In [49]:
mult

array([[1149,  364],
       [1478,  486]])

In [50]:
10 * 73 + 5 * 67 + 42 * 2

1149

## Linear regression model from scratch


In [51]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.0595, -0.7246, -1.1100],
        [ 0.5987,  0.5227,  1.1704]], requires_grad=True)
tensor([ 0.2645, -1.1283], requires_grad=True)


The model is a function that multiplies the `inputs` matrix by the transposed weights `w` and adds the bias `b`.

In [52]:
def model(x):
    return x @ w.t() + b

`@` represents matrix multiplication in PyTorch, and the `.t` method returns the transpose of a tensor.
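
To make the shapes concrete, here is a small sketch with toy tensors (the `_demo` names and values are illustrative assumptions, kept separate from the notebook's `w` and `b`). It shows how the transpose lines up the dimensions and how the bias is broadcast across rows.

```python
import torch

x_demo = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])      # 2 samples, 3 features
w_demo = torch.tensor([[1., 0., 0.],
                       [0., 1., 1.]])      # 2 outputs, 3 features
b_demo = torch.tensor([10., 20.])          # one bias per output

# (2, 3) @ (3, 2) -> (2, 2); b_demo is added to every row
out = x_demo @ w_demo.t() + b_demo

print(w_demo.t().shape)
print(out)
```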

In [53]:
# Generate predictions
preds = model(inputs)
preds

tensor([[-100.3491,  127.9222],
        [-139.9440,  174.2527],
        [-166.3760,  188.8781],
        [ -78.0241,  125.7189],
        [-151.0923,  172.2844]], grad_fn=<AddBackward0>)

In [54]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

## Loss function
* Calculate the difference between the two matrices (`preds` and `targets`).
* Square all elements of the difference matrix to remove negative values.
* Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the **mean squared error** (MSE).
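
Written as a formula, with $N$ the total number of elements in the matrices (rows times columns), the three steps above amount to:

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i,j} \left( \mathrm{preds}_{ij} - \mathrm{targets}_{ij} \right)^2
```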

In [55]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

`torch.sum` returns the sum of all the elements in a tensor. The `.numel` method of a tensor returns the number of elements in a tensor. Let's compute the mean squared error for the current predictions of our model.

In [56]:
loss = mse(preds, targets)
loss

tensor(25182.1992, grad_fn=<DivBackward0>)

## Compute Gradients

In [57]:
loss.backward()

In [58]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[-0.0595, -0.7246, -1.1100],
        [ 0.5987,  0.5227,  1.1704]], requires_grad=True)
tensor([[-16816.3848, -19370.5469, -11780.5176],
        [  5696.3320,   5348.9810,   3486.4526]])


In [59]:
w
w.grad

tensor([[-16816.3848, -19370.5469, -11780.5176],
        [  5696.3320,   5348.9810,   3486.4526]])
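
The gradients that `loss.backward()` stores in `.grad` can be cross-checked against a hand-derived formula. For the mean squared error, the gradient with respect to the weights is `2 * diff.t() @ x / N`, and with respect to the bias it is `2 * diff.sum(dim=0) / N`. The sketch below uses two rows of the training data and separate `_demo` variables (names assumed for illustration) so it does not disturb the notebook's `w` and `b`.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so the sketch is repeatable

x_demo = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.]])
y_demo = torch.tensor([[56., 70.],
                       [81., 101.]])
w_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

preds_demo = x_demo @ w_demo.t() + b_demo
diff = preds_demo - y_demo
loss_demo = torch.sum(diff * diff) / diff.numel()
loss_demo.backward()  # autograd fills w_demo.grad and b_demo.grad

# Hand-derived gradients of the mean squared error
with torch.no_grad():
    manual_w_grad = 2 * diff.t() @ x_demo / diff.numel()
    manual_b_grad = 2 * diff.sum(dim=0) / diff.numel()

# Compare with a loose tolerance to allow for float32 rounding
print(torch.allclose(w_demo.grad, manual_w_grad, rtol=1e-4))
print(torch.allclose(b_demo.grad, manual_b_grad, rtol=1e-4))
```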

In [60]:
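with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5

# Let's verify that the loss is actually lower
# NOTE: `preds` was computed before the update, so this still reports the old loss;
# run `preds = model(inputs)` first to see the effect of the update.
loss = mse(preds, targets)
print(loss)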

tensor(25182.1992, grad_fn=<DivBackward0>)


In [61]:
# Check the loss again (`preds` has not been recomputed, so the value is unchanged)
loss = mse(preds, targets)
print(loss)

tensor(25182.1992, grad_fn=<DivBackward0>)


In [62]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
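
The reset matters because PyTorch accumulates gradients: each call to `.backward()` adds to whatever is already stored in `.grad`. A minimal sketch with a single scalar (the variable `x_demo` is an illustrative assumption) shows the effect.

```python
import torch

x_demo = torch.tensor(2.0, requires_grad=True)

(x_demo * x_demo).backward()
first = x_demo.grad.item()     # d(x^2)/dx at x = 2

(x_demo * x_demo).backward()   # no reset in between
second = x_demo.grad.item()    # the new gradient was added to the old one

x_demo.grad.zero_()
after_reset = x_demo.grad.item()

print(first, second, after_reset)  # 4.0 8.0 0.0
```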


## Train the model using gradient descent


In [63]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[ -70.0272,  118.6802],
        [-100.0535,  162.1299],
        [-118.9544,  174.7318],
        [ -48.1812,  116.3179],
        [-112.6449,  160.7777]], grad_fn=<AddBackward0>)


In [64]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(17259.8535, grad_fn=<DivBackward0>)


In [65]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-13683.1738, -15996.4248,  -9699.9590],
        [  4744.0479,   4330.8818,   2857.0432]])
tensor([-166.1722,   54.5275])


In [66]:
# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

In [67]:
print(w)
print(b)

tensor([[ 0.2455, -0.3709, -0.8952],
        [ 0.4943,  0.4259,  1.1069]], requires_grad=True)
tensor([ 0.2682, -1.1295], requires_grad=True)


In [68]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(11917.6533, grad_fn=<DivBackward0>)


We have already achieved a significant reduction in the loss merely by adjusting the weights and biases slightly using gradient descent.

## Train for multiple epochs

To reduce the loss further, we can repeat the process of adjusting the weights and biases using the gradients multiple times. Each iteration is called an _epoch_. Let's train the model for 800 epochs.

In [69]:
# Train for 800 epochs
for i in range(800):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()
    

In [70]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(14.3364, grad_fn=<DivBackward0>)
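
To watch the loss fall during training rather than only checking it at the end, the loss can be recorded at every epoch. The sketch below repeats the same update rule on the five training rows. It starts from zero weights instead of `torch.randn` (an assumption made here so the run is repeatable) and uses separate `_demo` variables.

```python
import torch

inputs_demo = torch.tensor([[73., 67., 43.],
                            [91., 88., 64.],
                            [87., 134., 58.],
                            [102., 43., 37.],
                            [69., 96., 70.]])
targets_demo = torch.tensor([[56., 70.],
                             [81., 101.],
                             [119., 133.],
                             [22., 37.],
                             [103., 119.]])

# Zero initialization is an assumption for repeatability; the notebook uses torch.randn
w_demo = torch.zeros(2, 3, requires_grad=True)
b_demo = torch.zeros(2, requires_grad=True)

history = []
for epoch in range(100):
    preds_demo = inputs_demo @ w_demo.t() + b_demo
    diff = preds_demo - targets_demo
    loss_demo = torch.sum(diff * diff) / diff.numel()
    history.append(loss_demo.item())   # record the loss before this epoch's update
    loss_demo.backward()
    with torch.no_grad():
        w_demo -= w_demo.grad * 1e-5
        b_demo -= b_demo.grad * 1e-5
        w_demo.grad.zero_()
        b_demo.grad.zero_()

print(history[0], history[-1])
```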


In [71]:
preds

tensor([[ 57.7182,  70.1078],
        [ 78.5452, 101.2477],
        [126.1328, 131.9650],
        [ 23.6265,  37.0230],
        [ 94.1183, 119.9138]], grad_fn=<AddBackward0>)

In [72]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

## Linear regression using PyTorch built-ins


In [87]:
import torch.nn as nn

In [88]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [89]:
inputs

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 74.,  66.,  43.],
        [ 91.,  87.,  65.],
        [ 88., 134.,  59.],
        [101.,  44.,  37.],
        [ 68.,  96.,  71.],
        [ 73.,  66.,  44.],
        [ 92.,  87.,  64.],
        [ 87., 135.,  57.],
        [103.,  43.,  36.],
        [ 68.,  97.,  70.]])

## Dataset and DataLoader


In [90]:
from torch.utils.data import TensorDataset

In [91]:
# Define datasets
train_ds = TensorDataset(inputs, targets)
train_ds[0:3] 

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

In [92]:
from torch.utils.data import DataLoader

In [93]:
# Define data loader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

In [94]:
for xb, yb in train_dl:
    print(xb)
    print(yb)
    break

tensor([[103.,  43.,  36.],
        [ 87., 134.,  58.],
        [ 68.,  97.,  70.],
        [102.,  43.,  37.],
        [ 91.,  87.,  65.]])
tensor([[ 20.,  38.],
        [119., 133.],
        [102., 120.],
        [ 22.,  37.],
        [ 80., 102.]])
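
With 15 samples and a batch size of 5, one pass over the data loader should yield three batches. The sketch below checks this with placeholder tensors of the same shape (the values and `_demo` names are illustrative assumptions).

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 15 dummy samples with 3 features and 2 targets each (placeholder values)
x_demo = torch.arange(45, dtype=torch.float32).reshape(15, 3)
y_demo = torch.arange(30, dtype=torch.float32).reshape(15, 2)

demo_dl = DataLoader(TensorDataset(x_demo, y_demo), batch_size=5, shuffle=True)

print(len(demo_dl))                # number of batches per epoch
for xb, yb in demo_dl:
    print(xb.shape, yb.shape)      # each batch holds 5 rows of inputs and targets
```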


## nn.Linear

In [97]:
model = nn.Linear(3, 2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[-0.2334, -0.2486,  0.0896],
        [-0.3911,  0.1839,  0.5082]], requires_grad=True)
Parameter containing:
tensor([ 0.0592, -0.3712], requires_grad=True)


In [101]:
list(model.parameters())

[Parameter containing:
 tensor([[-0.2334, -0.2486,  0.0896],
         [-0.3911,  0.1839,  0.5082]], requires_grad=True),
 Parameter containing:
 tensor([ 0.0592, -0.3712], requires_grad=True)]
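
`nn.Linear(3, 2)` holds a weight matrix of shape `(2, 3)` and a bias of shape `(2,)`, and calling it computes the same expression as the hand-written model earlier. A small sketch to confirm this, using a fresh layer named `layer` (an illustrative name) so the notebook's `model` is left untouched:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)              # weight shape (2, 3), bias shape (2,)
x_demo = torch.tensor([[73., 67., 43.]])

# Same formula as the from-scratch model: x @ w.t() + b
manual = x_demo @ layer.weight.t() + layer.bias

# Loose tolerance to allow for float32 rounding differences
print(torch.allclose(layer(x_demo), manual, atol=1e-4))
```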

In [104]:
preds = model(inputs)
preds

tensor([[-29.7873,   5.2510],
        [-37.3287,  12.7449],
        [-48.3703,  19.7200],
        [-31.1273, -13.5545],
        [-33.6445,  25.8700],
        [-29.7721,   4.6759],
        [-36.9905,  13.0692],
        [-48.5141,  19.8370],
        [-31.1425, -12.9795],
        [-33.3215,  26.7693],
        [-29.4491,   5.5753],
        [-37.3135,  12.1698],
        [-48.7085,  19.3957],
        [-31.4503, -14.4538],
        [-33.6597,  26.4450]], grad_fn=<AddmmBackward>)

## Loss function

In [105]:
import torch.nn.functional as F

In [106]:
loss_fn = F.mse_loss

In [107]:
loss = loss_fn(model(inputs), targets)
loss

tensor(10663.5127, grad_fn=<MseLossBackward>)
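
`F.mse_loss` averages the squared differences by default, so it should agree with the hand-written `mse` function from the first part. A quick check on toy tensors (values chosen here for illustration):

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1., 2.], [3., 4.]])
t = torch.tensor([[0., 2.], [3., 6.]])

diff = a - t
by_hand = torch.sum(diff * diff) / diff.numel()   # same formula as mse()
built_in = F.mse_loss(a, t)                       # default reduction is the mean

print(by_hand.item(), built_in.item())  # 1.25 1.25
```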

## Optimizer

Instead of manually updating the model's weights and biases using gradients, we can use the optimizer `torch.optim.SGD`. SGD is short for "stochastic gradient descent". The term _stochastic_ indicates that each update is computed on a randomly selected batch of samples rather than on the whole dataset at once.

In [108]:
# Define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
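
For plain SGD without momentum, `step()` applies the same rule as the manual update used earlier: subtract the learning rate times the gradient. The sketch below checks this on a toy parameter. The names `p` and `demo_opt` and the learning rate of 0.1 are illustrative assumptions, not the notebook's settings.

```python
import torch

lr = 0.1  # illustrative learning rate, much larger than the 1e-5 used in this notebook
p = torch.tensor([1.0, 2.0], requires_grad=True)
demo_opt = torch.optim.SGD([p], lr=lr)

loss_demo = (p * p).sum()      # gradient is 2 * p
loss_demo.backward()

expected = p.detach() - lr * p.grad   # the manual update rule, computed before stepping
demo_opt.step()                       # the optimizer applies the same rule in place
demo_opt.zero_grad()                  # clears the gradients for the next iteration

print(p)
print(expected)
```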

## Train the model

We are now ready to train the model. We'll follow the same process to implement gradient descent:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

The only change is that we'll work with batches of data instead of processing the entire training set in every iteration. Let's define a utility function `fit` that trains the model for a given number of epochs.

In [109]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):

    # Repeat for given number of epochs
    for epoch in range(num_epochs):

        # Train with batches of data
        for xb, yb in train_dl:

            # 1. Generate predictions
            pred = model(xb)

            # 2. Calculate loss
            loss = loss_fn(pred, yb)

            # 3. Compute gradients
            loss.backward()


            # 4. Update parameters using gradients
            opt.step()

            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress
        if (epoch + 1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))



In [110]:
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], Loss: 196.8401
Epoch [20/100], Loss: 161.1165
Epoch [30/100], Loss: 95.8791
Epoch [40/100], Loss: 56.4104
Epoch [50/100], Loss: 69.7375
Epoch [60/100], Loss: 34.4572
Epoch [70/100], Loss: 6.0207
Epoch [80/100], Loss: 6.2167
Epoch [90/100], Loss: 16.8885
Epoch [100/100], Loss: 6.5597


In [111]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 58.1426,  70.7555],
        [ 81.9050, 100.3173],
        [117.0020, 132.5575],
        [ 27.2462,  40.2869],
        [ 97.8682, 116.5772],
        [ 57.0528,  69.7381],
        [ 81.6517, 100.3470],
        [117.2822, 133.1449],
        [ 28.3360,  41.3042],
        [ 98.7047, 117.6243],
        [ 57.8893,  70.7852],
        [ 80.8152,  99.3000],
        [117.2553, 132.5278],
        [ 26.4097,  39.2398],
        [ 98.9580, 117.5946]], grad_fn=<AddmmBackward>)

In [112]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

In [113]:
model(torch.tensor([[75, 63, 44.]]))

tensor([[54.8981, 67.9631]], grad_fn=<AddmmBackward>)