## Linear Regression with PyTorch



![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)


The *learning* part of linear regression is to figure out a set of weights and biases (`w11, w12,... w23, b1 & b2`) from the training data, so that the model makes accurate predictions for new data (i.e. predicts the yields of apples and oranges in a new region from its average temperature, rainfall and humidity). This is done by adjusting the weights slightly, many times over, to make better predictions, using an optimization technique called *gradient descent*.
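To make the role of the weights concrete, here is a minimal sketch of the kind of prediction a linear model makes: each yield is a weighted sum of the three inputs plus a bias. The numeric values and the helper name `predict_yields` are hypothetical, chosen only for illustration; the real values are what training has to find.

```python
# Hypothetical weights and biases, chosen only for illustration;
# in practice these values are learned from the training data.
w11, w12, w13, b1 = 0.3, 0.2, 0.5, 1.0   # used for the apple yield
w21, w22, w23, b2 = 0.4, 0.3, 0.6, 2.0   # used for the orange yield

def predict_yields(temp, rainfall, humidity):
    """Return (apple yield, orange yield) as weighted sums plus a bias."""
    yield_apple = w11 * temp + w12 * rainfall + w13 * humidity + b1
    yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
    return yield_apple, yield_orange

# Example region: temp 73, rainfall 67, humidity 43
apples, oranges = predict_yields(73, 67, 43)
print(apples, oranges)
```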

In [24]:
import numpy as np
import torch

In [25]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [26]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [27]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Linear regression model from scratch

The weights and biases (`w11, w12,... w23, b1 & b2`) can also be represented as a matrix `w` and a vector `b`, initialized with random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e. the yield of apples, and similarly the second row and second element are used for oranges.
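As a rough check of this layout, the sketch below compares the matrix form `x @ w.t() + b` with the weighted sum written out by hand for one region and one target. The fixed seed and the two-row input `x` are assumptions made for the sake of a repeatable example.

```python
import torch

torch.manual_seed(0)  # fixed seed (an assumption) so the sketch is repeatable

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])   # 2 regions, 3 input features each
w = torch.randn(2, 3)                 # one row of weights per target
b = torch.randn(2)                    # one bias per target

# Matrix form: every region and every target at once, shape (2, 2)
out = x @ w.t() + b

# Same computation for region 0 and target 0 (apples), as a weighted sum
manual = (x[0] * w[0]).sum() + b[0]

print(out.shape)
print(out[0, 0], manual)
```

The bias vector `b` has shape `(2,)`, so broadcasting adds it to every row of the `(2, 2)` product.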

In [28]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.2318, -0.3457,  0.6827],
        [ 0.4089,  0.2843,  0.2326]], requires_grad=True)
tensor([ 0.2932, -0.7258], requires_grad=True)


In [29]:
def model(x):
    # Multiply the inputs by the transposed weights and add the bias to every row
    return x @ w.t() + b

In [30]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[-10.4397,  58.1673],
        [ -7.5368,  76.3803],
        [-26.6092,  86.4257],
        [-12.9614,  61.8064],
        [ -1.1062,  71.0550]], grad_fn=<AddBackward0>)


In [31]:
# Compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Loss function
We can compare the model's predictions with the actual targets, using the following method:

* Calculate the difference between the two matrices (`preds` and `targets`).
* Square all elements of the difference matrix to remove negative values.
* Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the **mean squared error** (MSE).
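The three steps can be followed by hand on a tiny example. The sketch below uses made-up 2x2 tensors (not the training data) and also compares the result against PyTorch's built-in `torch.nn.functional.mse_loss`, which averages the squared differences by default.

```python
import torch
import torch.nn.functional as F

# Made-up values, small enough to check by hand
preds = torch.tensor([[57., 70.],
                      [80., 100.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.]])

diff = preds - targets                    # [[1, 0], [-1, -1]]
squared = diff * diff                     # [[1, 0], [1, 1]]
manual_mse = torch.sum(squared) / diff.numel()   # 3 / 4 = 0.75

builtin_mse = F.mse_loss(preds, targets)  # default reduction is the mean

print(manual_mse, builtin_mse)
```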

In [32]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

In [33]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(5134.4829, grad_fn=<DivBackward0>)


In [34]:
# Compute gradients
loss.backward()

In [35]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[-0.2318, -0.3457,  0.6827],
        [ 0.4089,  0.2843,  0.2326]], requires_grad=True)
tensor([[-7264.8682, -8650.3730, -5109.9209],
        [-1586.8176, -2547.2639, -1444.8171]])


Before we proceed, we reset the gradients to zero by calling the `.zero_()` method. We need to do this because PyTorch accumulates gradients, i.e. the next time we call `.backward()` on the loss, the new gradient values will get added to the existing gradient values, which may lead to unexpected results.
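The accumulation is easy to see on a single scalar. This is a minimal sketch with a made-up tensor `x` and the function `y = x * x`, whose derivative at `x = 3` is `6`.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# First backward pass: dy/dx = 2x = 6
y = x * x
y.backward()
first = x.grad.clone()

# Second backward pass without resetting: the new 6 is added to the old 6
y = x * x
y.backward()
accumulated = x.grad.clone()

# Reset, then compute again: back to the gradient of a single pass
x.grad.zero_()
y = x * x
y.backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```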

In [36]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])


## Adjust weights and biases using gradient descent

We'll reduce the loss and improve our model using the gradient descent optimization algorithm, which has the following steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t. the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

Let's implement the above step by step.

In [37]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[-10.4397,  58.1673],
        [ -7.5368,  76.3803],
        [-26.6092,  86.4257],
        [-12.9614,  61.8064],
        [ -1.1062,  71.0550]], grad_fn=<AddBackward0>)


In [38]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(5134.4829, grad_fn=<DivBackward0>)


In [39]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-7264.8682, -8650.3730, -5109.9209],
        [-1586.8176, -2547.2639, -1444.8171]])
tensor([-87.9307, -21.2330])


In [40]:
# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()


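* We use `torch.no_grad` to indicate to PyTorch that it shouldn't track, calculate or modify gradients while we update the weights and biases.

* We multiply the gradients by a very small number (`1e-5` in this case), so that the weights change only by a small amount at each step. This number is called the *learning rate* of the algorithm.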

* After we have updated the weights, we reset the gradients back to zero, to avoid affecting any future computations.
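The sketch below illustrates the first point on a made-up two-element weight tensor: an in-place update of a tensor that requires gradients is rejected by autograd outside `torch.no_grad`, and goes through inside it. The learning rate `lr = 0.1` is a hypothetical value picked so the arithmetic is easy to follow.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w * w).sum()        # d(loss)/dw = 2w = [2, 4]
loss.backward()

lr = 0.1  # hypothetical learning rate, chosen for easy arithmetic

# Outside no_grad, autograd refuses the in-place update of a leaf tensor
try:
    w -= lr * w.grad
    raised = False
except RuntimeError:
    raised = True

# Inside no_grad, the update is applied and is not recorded by autograd
with torch.no_grad():
    w -= lr * w.grad        # [1 - 0.2, 2 - 0.4] = [0.8, 1.6]
    w.grad.zero_()          # reset for the next iteration

print(raised, w, w.grad)
```

After the update, `w` still has `requires_grad=True`, so the next forward pass is tracked as usual.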

In [41]:
print(w)
print(b)

tensor([[-0.1592, -0.2592,  0.7338],
        [ 0.4247,  0.3097,  0.2470]], requires_grad=True)
tensor([ 0.2941, -0.7256], requires_grad=True)


In [42]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(3633.2859, grad_fn=<DivBackward0>)


## Train for multiple epochs


In [50]:
# Train for 500 epochs
for i in range(500):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [51]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(1.6790, grad_fn=<DivBackward0>)


In [52]:
# Predictions
preds

tensor([[ 57.1435,  70.2951],
        [ 82.7697,  99.8428],
        [117.4487, 134.8602],
        [ 20.7309,  37.5597],
        [103.1616, 117.2373]], grad_fn=<AddBackward0>)

In [53]:
# Targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
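
The steps above can also be collected into a single function. The following is only a sketch under stated assumptions: the function name `train`, its `seed` parameter, the list of recorded losses and the `new_region` input are all hypothetical additions, and the exact numbers it produces depend on the random initialization.

```python
import torch

def train(inputs, targets, epochs=500, lr=1e-5, seed=0):
    """Gradient descent on a linear model; returns weights, biases, loss history."""
    torch.manual_seed(seed)  # assumption: fixed seed for repeatable runs
    w = torch.randn(targets.shape[1], inputs.shape[1], requires_grad=True)
    b = torch.randn(targets.shape[1], requires_grad=True)
    losses = []
    for _ in range(epochs):
        preds = inputs @ w.t() + b                    # 1. generate predictions
        diff = preds - targets
        loss = torch.sum(diff * diff) / diff.numel()  # 2. calculate the MSE loss
        loss.backward()                               # 3. compute gradients
        with torch.no_grad():
            w -= w.grad * lr                          # 4. adjust weights and biases
            b -= b.grad * lr
            w.grad.zero_()                            # 5. reset gradients
            b.grad.zero_()
        losses.append(loss.item())
    return w.detach(), b.detach(), losses

# Same training data as in the notebook
inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.],
                        [103., 119.]])

w, b, losses = train(inputs, targets)
print(losses[0], losses[-1])

# Hypothetical new region: temp 75, rainfall 63, humidity 44
new_region = torch.tensor([[75., 63., 44.]])
new_pred = new_region @ w.t() + b
print(new_pred)
```

Recording the loss at every epoch makes it possible to confirm that training is actually reducing the error before trusting predictions for a new region.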