Gradient Descent & Linear Regression w/ PyTorch

We're going to make a linear regression model to predict crop yields based off some features in the data

![](2022-06-02-17-31-34.png)

How to determine yield

yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2

b1, b2 are a bias

the learning part of linear regression is to figure out a set of weights w11, w12,... w23, b1 & b2 using the training data, to make accurate predictions for new data. The learned weights will be used to predict the yields for apples and oranges in a new region using the average temperature, rainfall, and humidity for that region.

We'll train our model by adjusting the weights slightly many times to make better predictions, using an optimization technique called gradient descent. Let's begin by importing Numpy and PyTorch.

In [452]:
import numpy as np
import torch

In [453]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [454]:
 # Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [455]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [456]:
#Creating the weights and biases, initially random values

#randn is used to randomly select elements from a normal dist, centered at 0 w/ SD 1
weights = torch.randn(2, 3, requires_grad=True)
biases = torch.randn(2, requires_grad=True)

weights, biases


(tensor([[-0.3034,  0.5033,  0.3379],
         [-0.2687,  2.0168,  0.0073]], requires_grad=True),
 tensor([-0.6933,  1.0025], requires_grad=True))

We can define our model as simply a function for now:

![](2022-06-02-17-47-29.png)

In [457]:
# matrix multiplication is done using the @ symbol w/ pytorch
def model(x):
    return x @ weights.t() + biases

In [458]:
preds = model(inputs)
print(preds)

#Here each row represents the prediction for the yield of apples and orange respectively, where each row represents a region

tensor([[ 25.4078, 116.8274],
        [ 37.6114, 154.4967],
        [ 59.9475, 248.3006],
        [  2.5035,  60.5889],
        [ 50.3397, 176.5856]], grad_fn=<AddBackward0>)


In [459]:
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [460]:
#How far every prediction is from the target
diff = preds - targets

#Mean Squared Error
torch.sum(diff * diff) / diff.numel()


tensor(3168.0337, grad_fn=<DivBackward0>)

In [461]:
def mse(preds, targets):
    diff = preds - targets
    return torch.sum(diff * diff) / diff.numel()

In [462]:
#On average, each element in the prediction differs by the actual target by the sqrt(mse)
#Loss indicates how bad the model is at predicting the target values
loss = mse(preds, targets)

loss

tensor(3168.0337, grad_fn=<DivBackward0>)

In [463]:
#Loss is a function of weights and biases so we can compute the gradient of each tensor
loss.backward()


In [464]:
#Each element in the grad matrix represents the derivative of the loss w.r.t. each element of the matrix
weights.grad

tensor([[-3388.2715, -3934.9287, -2384.9937],
        [ 4939.4468,  5967.5938,  3405.7173]])

Adjust weights and biases to reduce the loss
The loss is a quadratic function of our weights and biases, and our objective is to find the set of weights where the loss is the lowest. If we plot a graph of the loss w.r.t any individual weight or bias element, it will look like the figure shown below. An important insight from calculus is that the gradient indicates the rate of change of the loss, i.e., the loss function's slope w.r.t. the weights and biases.

If a gradient element is positive:

increasing the weight element's value slightly will increase the loss
decreasing the weight element's value slightly will decrease the loss

![](2022-06-02-20-48-34.png)

The gradient of the loss w.r.t. each element indicates the slope of the loss function w.r.t. each respective element

We compute the gradient to find the direction of greatest descent in order find the minimum loss

We want to iteratively adjust the value of the weights to move in the direction of greatest descent

In [465]:
#Use .no_grad() when you don't want to compute the gradient of the computation you're performing
with torch.no_grad():
    weights -= weights.grad * 1e-5 # <- We call this the learning rate
    biases -= biases.grad * 1e-5

In [466]:
weights, biases

(tensor([[-0.2695,  0.5426,  0.3618],
         [-0.3181,  1.9571, -0.0268]], requires_grad=True),
 tensor([-0.6928,  1.0019], requires_grad=True))

In [467]:
preds = model(inputs)

loss = mse(preds, targets)

loss

tensor(2218.5178, grad_fn=<DivBackward0>)

In [468]:
weights.grad.zero_()
biases.grad.zero_()
print(weights.grad)
print(biases.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])


Now Let's train our model using gradient descent following these steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

6. Iterate

In [469]:
#1. Generate Prediction

preds = model(inputs)
print(preds)

tensor([[ 31.5436, 107.7583],
        [ 45.6843, 142.5701],
        [ 69.5518, 234.0308],
        [  8.5344,  51.7239],
        [ 58.1251, 165.0639]], grad_fn=<AddBackward0>)


In [470]:
#2. Calculate the loss

loss = mse(preds, targets)
print(loss)

tensor(2218.5178, grad_fn=<DivBackward0>)


In [471]:
#3. Compute the gradient

loss.backward()
print(weights.grad)
print(biases.grad)


tensor([[-2754.1799, -3251.8872, -1963.8596],
        [ 4001.8320,  4956.2720,  2782.6272]])
tensor([-33.5122,  48.2294])


In [472]:
#4. Adjust weights by subtracting a small portion of the gradient

with torch.no_grad():
    weights -= weights.grad * 1e-5
    biases -= biases.grad * 1e-5
    
    #5. Reset the gradients back to zero
    weights.grad.zero_()
    biases.grad.zero_()

In [473]:
def train_model(epochs):
    global weights, biases
    for i in range(epochs):
        # print(weights)
        preds = model(inputs)
        loss = mse(preds, targets)
        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * 1e-5
            biases -= biases.grad * 1e-5
            weights.grad.zero_()




In [474]:
train_model(10000)

In [475]:
preds = model(inputs)
loss = mse(preds, targets)

loss

#We see now that the loss has been quite minimized

tensor(0.6582, grad_fn=<DivBackward0>)

In [476]:
import jovian

In [477]:
jovian.commit(project = '02-gradient-descent', filename='2-gradient_descent_and_lin_reg.ipynb')

<IPython.core.display.Javascript object>

[jovian] Updating notebook "danielcufino/02-gradient-descent" on https://jovian.ai/[0m
[jovian] Committed successfully! https://jovian.ai/danielcufino/02-gradient-descent[0m


'https://jovian.ai/danielcufino/02-gradient-descent'

Linear Regression using PyTorch builtins

In [478]:
import torch.nn as nn

In [479]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [480]:
inputs

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 74.,  66.,  43.],
        [ 91.,  87.,  65.],
        [ 88., 134.,  59.],
        [101.,  44.,  37.],
        [ 68.,  96.,  71.],
        [ 73.,  66.,  44.],
        [ 92.,  87.,  64.],
        [ 87., 135.,  57.],
        [103.,  43.,  36.],
        [ 68.,  97.,  70.]])

In [481]:
from torch.utils.data import TensorDataset

In [482]:
# We break the dataset into batches and perform gradient descent on those batches to train our model faster
train_ds = TensorDataset(inputs, targets)
#TensorDataset objects allow us to access input and target rows as tuples 

#This gives us the first three rows of inputs and first 3 rows of targets
train_ds[0:3]


(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

In [483]:
from torch.utils.data import DataLoader

In [484]:
#We define a data loader to split our dataset into batches of a defined size

batch_size = 15

train_dl = DataLoader(train_ds, batch_size, shuffle=True) #Setting shuffle to true shuffle all the data before putting it into batches

In [485]:
#Iterating over a DataLoader gives you a batch of input and its corresponding batch of output 
for xb, yb in train_dl:
    print(xb)
    print(yb)
    break

tensor([[ 91.,  87.,  65.],
        [ 87., 134.,  58.],
        [101.,  44.,  37.],
        [103.,  43.,  36.],
        [102.,  43.,  37.],
        [ 73.,  66.,  44.],
        [ 73.,  67.,  43.],
        [ 87., 135.,  57.],
        [ 88., 134.,  59.],
        [ 68.,  96.,  71.],
        [ 68.,  97.,  70.],
        [ 91.,  88.,  64.],
        [ 74.,  66.,  43.],
        [ 92.,  87.,  64.],
        [ 69.,  96.,  70.]])
tensor([[ 80., 102.],
        [119., 133.],
        [ 21.,  38.],
        [ 20.,  38.],
        [ 22.,  37.],
        [ 57.,  69.],
        [ 56.,  70.],
        [118., 134.],
        [118., 132.],
        [104., 118.],
        [102., 120.],
        [ 81., 101.],
        [ 57.,  69.],
        [ 82., 100.],
        [103., 119.]])


In [486]:
#Creating the model with PyTorch

model = nn.Linear(3, 2)
print(model.weight)
print(model.bias)


Parameter containing:
tensor([[ 0.1585,  0.5506, -0.0265],
        [ 0.1283, -0.0863,  0.0453]], requires_grad=True)
Parameter containing:
tensor([-0.0463,  0.2311], requires_grad=True)


In [487]:
list(model.parameters())

[Parameter containing:
 tensor([[ 0.1585,  0.5506, -0.0265],
         [ 0.1283, -0.0863,  0.0453]], requires_grad=True),
 Parameter containing:
 tensor([-0.0463,  0.2311], requires_grad=True)]

In [488]:
#Generate predictions

preds = model(inputs)
preds

tensor([[47.2755,  5.7656],
        [61.1351,  7.2147],
        [85.9889,  2.4595],
        [38.8153, 11.2871],
        [61.8947,  3.9725],
        [46.8834,  5.9803],
        [60.5580,  7.3463],
        [86.1209,  2.6331],
        [39.2074, 11.0725],
        [61.7097,  3.8895],
        [46.6984,  5.8972],
        [60.7429,  7.4294],
        [86.5660,  2.3279],
        [39.0003, 11.3702],
        [62.2868,  3.7579]], grad_fn=<AddmmBackward0>)

In [489]:
import torch.nn.functional as F

In [490]:
loss_fn = F.mse_loss

In [491]:
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(4733.7285, grad_fn=<MseLossBackward0>)


In [492]:
#Define our optimizer to minimize loss instead of manually performing gradient descent
#SGD Stochastic Gradient Descent: elements in batches are selected randomly

opt = torch.optim.SGD(model.parameters(), lr=1e-5) #we pass model.parameters() as an argument so the optimizer knows which matrices to modify



In [493]:
#Now Define a function that puts all the parts together

def fit(epochs, model, loss_fn, optimizer, train_dl):

    #Repeat epoch number of times
    for epoch in range(epochs):

        # Train using the batches of data
        for xb, yb in train_dl:
            #1. Generate predictions
            pred = model(xb)
            #2. Calculate the loss
            loss = loss_fn(pred, yb)
            #3. Compute Gradients
            loss.backward()
            #4. Update parameters  
            optimizer.step()
            #5. Reset Gradients
            optimizer.zero_grad()

            #Print the loss after a few epochs
            if (epoch + 1) % 10 == 0:
                print('Epoch [{} / {}]: Loss: {:.4f}'.format(epoch + 1, epochs, loss.item()))


In [494]:
fit(1000, model, loss_fn, opt, train_dl)

Epoch [10 / 1000]: Loss: 548.1976
Epoch [20 / 1000]: Loss: 381.3996
Epoch [30 / 1000]: Loss: 337.1642
Epoch [40 / 1000]: Loss: 300.1525
Epoch [50 / 1000]: Loss: 267.5632
Epoch [60 / 1000]: Loss: 238.8285
Epoch [70 / 1000]: Loss: 213.4854
Epoch [80 / 1000]: Loss: 191.1270
Epoch [90 / 1000]: Loss: 171.3957
Epoch [100 / 1000]: Loss: 153.9766
Epoch [110 / 1000]: Loss: 138.5928
Epoch [120 / 1000]: Loss: 125.0008
Epoch [130 / 1000]: Loss: 112.9859
Epoch [140 / 1000]: Loss: 102.3597
Epoch [150 / 1000]: Loss: 92.9563
Epoch [160 / 1000]: Loss: 84.6297
Epoch [170 / 1000]: Loss: 77.2513
Epoch [180 / 1000]: Loss: 70.7082
Epoch [190 / 1000]: Loss: 64.9009
Epoch [200 / 1000]: Loss: 59.7419
Epoch [210 / 1000]: Loss: 55.1542
Epoch [220 / 1000]: Loss: 51.0700
Epoch [230 / 1000]: Loss: 47.4298
Epoch [240 / 1000]: Loss: 44.1810
Epoch [250 / 1000]: Loss: 41.2774
Epoch [260 / 1000]: Loss: 38.6785
Epoch [270 / 1000]: Loss: 36.3484
Epoch [280 / 1000]: Loss: 34.2557
Epoch [290 / 1000]: Loss: 32.3726
Epoch [30

In [495]:
preds = model(inputs)
preds

tensor([[ 57.1649,  70.6139],
        [ 80.3567,  99.1806],
        [121.8035, 135.6339],
        [ 21.8111,  38.5679],
        [ 98.3461, 115.7023],
        [ 55.8828,  69.5062],
        [ 79.8967,  99.0237],
        [121.9257, 136.0959],
        [ 23.0932,  39.6756],
        [ 99.1682, 116.6531],
        [ 56.7049,  70.4570],
        [ 79.0746,  98.0729],
        [122.2635, 135.7908],
        [ 20.9890,  37.6171],
        [ 99.6282, 116.8100]], grad_fn=<AddmmBackward0>)

In [496]:
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])


In [497]:
#Difference in the predicted and actual targets
print(preds - targets)

tensor([[ 1.1649,  0.6139],
        [-0.6433, -1.8194],
        [ 2.8035,  2.6339],
        [-0.1889,  1.5679],
        [-4.6539, -3.2977],
        [-1.1172,  0.5062],
        [-0.1033, -2.9763],
        [ 3.9257,  4.0959],
        [ 2.0932,  1.6756],
        [-4.8318, -1.3469],
        [-0.2951,  1.4570],
        [-2.9254, -1.9271],
        [ 4.2635,  1.7908],
        [ 0.9890, -0.3829],
        [-2.3718, -3.1900]], grad_fn=<SubBackward0>)


In [498]:
#Predicted Crop yields of Temperature, Rainfall, and 
model(torch.tensor([75, 63, 44.]))

tensor([53.2084, 67.3782], grad_fn=<AddBackward0>)

In [499]:
jovian.commit(project = '02-gradient-descent', filename='2-gradient_descent_and_lin_reg.ipynb')

<IPython.core.display.Javascript object>

[jovian] Updating notebook "danielcufino/02-gradient-descent" on https://jovian.ai/[0m
[jovian] Committed successfully! https://jovian.ai/danielcufino/02-gradient-descent[0m


'https://jovian.ai/danielcufino/02-gradient-descent'

Here are some resources for learning more about linear regression and gradient descent:

An visual & animated explanation of gradient descent: https://www.youtube.com/watch?v=IHZwWFHWa-w

For a more detailed explanation of derivates and gradient descent, see these notes from a Udacity course.

For an animated visualization of how linear regression works, see this post.

For a more mathematical treatment of matrix calculus, linear regression and gradient descent, you should check out Andrew Ng's excellent course notes from CS229 at Stanford University.

To practice and test your skills, you can participate in the Boston Housing Price Prediction competition on Kaggle, a website that hosts data science competitions.

With this, we complete our discussion of linear regression in PyTorch, and we’re ready to move on to the next topic: Working with Images & Logistic Regression.