## Gradient Descent and Linear Regression with PyTorch

Before we begin, we need to install the required libraries. The installation of PyTorch differs based on your operating system and cloud environment. Detailed installation instructions are available at https://pytorch.org.

## Introduction to Linear Regression

We begin with one of the foundational algorithms in machine learning: *linear regression*. We will create a model that predicts crop yields for apples and oranges (*target variables*) by looking at the average temperature, rainfall, and humidity (*input variables or features*) in a region. Here's the training data:

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)

In a linear regression model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant, known as a bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall and humidity:

![linear-regression-graph](https://i.imgur.com/4DJ9f8X.png)
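
To make the weighted sum concrete, here is a minimal sketch in plain Python. The weights and bias (`w_apple`, `b_apple`) are made-up values chosen only for illustration; they are not the values the model will learn.

```python
# Hypothetical weights and bias, chosen only for illustration (not learned values)
w_apple = [0.5, 0.25, 0.5]  # weights for temp, rainfall, humidity
b_apple = 1.0               # bias

# First row of the training data: temp=73, rainfall=67, humidity=43
features = [73.0, 67.0, 43.0]

# Weighted sum of the inputs, offset by the bias
yield_apple = sum(w * x for w, x in zip(w_apple, features)) + b_apple
print(yield_apple)  # 75.75
```

The actual yield for that row is 56, so these made-up weights are a poor fit. Training is the process of finding better ones.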

The *learning* part of linear regression is to figure out a set of weights and biases `w11, w12,... w23, b1 & b2` using the training data, to make accurate predictions for new data. The _learned_ weights will be used to predict the yields for apples and oranges in a new region using the average temperature, rainfall, and humidity for that region.

We will _train_ our model by adjusting the weights slightly many times to make better predictions, using an optimization technique called *gradient descent*. Let's begin by importing NumPy and PyTorch.

In [1]:
import numpy as np
import torch

## Training data

We can represent the training data using two matrices: `inputs` and `targets`, each with one row per observation, and one column per variable.

In [2]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

The input and target variables have been separated because we will operate on them separately.

Let's convert the arrays to PyTorch tensors. Tensors store the input data, and they can also store the weights and biases. They are designed for hardware acceleration, and they support backpropagation through automatic differentiation.

In [4]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
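
The automatic differentiation mentioned above can be seen on a tiny example. The sketch below uses arbitrary scalar values (`x`, `w`, `b` are not part of the crop-yield data) and asks PyTorch for the derivatives of `y = w * x + b`.

```python
import torch

# A single input value and two trainable parameters (values are arbitrary)
x = torch.tensor(3.0)
w = torch.tensor(4.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)

# y = w * x + b
y = w * x + b

# Compute dy/dw and dy/db by automatic differentiation
y.backward()

print(w.grad)  # tensor(3.), since dy/dw = x
print(b.grad)  # tensor(1.), since dy/db = 1
```

Only tensors created with `requires_grad=True` receive a `.grad` value; `x` does not.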


## Linear regression model from scratch

The weights and biases (`w11, w12,... w23, b1 & b2`) can also be represented as matrices, initialized as random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e., yield of apples, and similarly, the second for oranges.

In [5]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.6104, -0.1047, -0.1037],
        [-0.6573, -0.5032,  0.4781]], requires_grad=True)
tensor([ 0.7999, -1.1757], requires_grad=True)


`torch.randn` creates a tensor with the given shape, with elements picked randomly from a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean 0 and standard deviation 1.
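
Because the initial values are random, every run of the notebook starts from different weights. If repeatable results are wanted, one option is to fix the seed with `torch.manual_seed` before calling `torch.randn`. The seed value `0` below is an arbitrary choice.

```python
import torch

# Fixing the seed makes the "random" initial values repeatable
torch.manual_seed(0)
w1 = torch.randn(2, 3)

torch.manual_seed(0)
w2 = torch.randn(2, 3)

print(w1.shape)             # torch.Size([2, 3])
print(torch.equal(w1, w2))  # True
```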

Our *model* is simply a function that performs a matrix multiplication of the `inputs` and the weights `w` (transposed) and adds the bias `b` (replicated for each observation).

![matrix-mult](https://i.imgur.com/WGXLFvA.png)

We can define the model as follows:

In [6]:
def model(x):
    return torch.matmul(x, w.t()) + b
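
The shapes involved are worth checking once. The sketch below uses placeholder random values rather than the training data and confirms that a `(5, 3)` input multiplied by the transposed `(2, 3)` weight matrix gives one row of two predictions per observation, with the bias broadcast across rows.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)   # 5 observations, 3 features (placeholder values)
w = torch.randn(2, 3)   # one row of weights per target
b = torch.randn(2)      # one bias per target

# (5, 3) @ (3, 2) -> (5, 2); b of shape (2,) is broadcast across the 5 rows
out = x @ w.t() + b
print(out.shape)  # torch.Size([5, 2])

# The first column is the weighted sum for the first target, computed row by row
manual = torch.stack([(x[i] * w[0]).sum() + b[0] for i in range(5)])
matches = torch.allclose(out[:, 0], manual, atol=1e-5)
print(matches)
```

The `@` operator is equivalent to `torch.matmul` for these tensors.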

## Train the model using gradient descent

We reduce the loss and improve our model using the gradient descent optimization algorithm. We can _train_ the model using the following steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

Let's implement the above step by step.

In [7]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[-55.2292, -62.3154],
        [-70.5920, -74.6742],
        [-72.3427, -98.0612],
        [-69.7954, -72.1684],
        [-58.6238, -61.3710]], grad_fn=<AddBackward0>)


In [8]:
# Calculate the loss
def mse(preds, targets):
    return torch.mean((preds - targets) ** 2)

loss = mse(preds, targets)
print(loss)

tensor(25272.2266, grad_fn=<MeanBackward0>)
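
Steps 3 to 5 can be sketched for a single iteration as follows. This is a self-contained illustration, not the notebook's own cell: it recreates the data, uses a fixed seed, and assumes a learning rate of `1e-5`.

```python
import torch

torch.manual_seed(0)
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

def mse(preds, targets):
    return torch.mean((preds - targets) ** 2)

loss_before = mse(inputs @ w.t() + b, targets)

# 3. Compute gradients of the loss w.r.t. w and b
loss_before.backward()

# 4. Take one small step against the gradient (learning rate is an assumption)
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    # 5. Reset the gradients so they do not accumulate into the next step
    w.grad.zero_()
    b.grad.zero_()

loss_after = mse(inputs @ w.t() + b, targets)
print(loss_before.item(), loss_after.item())
```

With a step this small, the second printed loss should be lower than the first.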


## Train for multiple epochs

To reduce the loss, we repeat the process of computing gradients and adjusting the weights and biases many times. Each iteration is called an _epoch_. Let's train the model for 100 epochs.

In [9]:
# Train for 100 epochs
learning_rate = 1e-5

for i in range(100):
    # compute the prediction
    preds = model(inputs)
    # compute the loss
    loss = mse(preds, targets)
    # backpropagate (compute the gradient of the loss function) 
    loss.backward()
    # disable gradient calculation 
    with torch.no_grad():
        #update weights and bias
        w -= w.grad * learning_rate
        b -= b.grad * learning_rate
        w.grad.zero_()
        b.grad.zero_()
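
The calls to `.grad.zero_()` matter because PyTorch adds each new gradient to whatever is already stored in `.grad`. A small sketch with an arbitrary two-element tensor shows the accumulation:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw of sum(3 * w) is 3 for each element
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added to the old one
(3 * w).sum().backward()
second = w.grad.clone()

print(first)   # tensor([3., 3.])
print(second)  # tensor([6., 6.])

# Resetting in place clears the accumulated values
w.grad.zero_()
print(w.grad)  # tensor([0., 0.])
```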


Let's verify that the loss is now lower:

In [10]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(63.2188, grad_fn=<MeanBackward0>)


The loss is now much lower than its initial value. Let's look at the model's predictions and compare them with the targets.

In [11]:
# Predictions
preds

tensor([[ 59.7533,  72.5421],
        [ 81.7903, 103.6481],
        [115.4822, 122.5922],
        [ 35.1220,  51.2776],
        [ 93.1611, 115.7651]], grad_fn=<AddBackward0>)

In [12]:
# Targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

The predictions are now quite close to the target variables. We can get even better results by training for a few more epochs. 

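## Using PyTorch built-in functions

PyTorch provides built-in utilities for datasets, layers, loss functions, and optimizers, so we can rebuild the same model with less manual code.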

In [13]:
from torch.utils.data import TensorDataset, DataLoader
# define a dataset
dataset = TensorDataset(inputs, targets)
# define a dataloader
train_dl = DataLoader(dataset, batch_size=5)
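
With `batch_size=5` the loader returns all five rows in one batch. To see how batching behaves with a smaller size, here is a sketch on placeholder data. The names `xs`, `ys`, `ds`, `dl` and the choice of `batch_size=2` with `shuffle=True` are illustrative assumptions.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data: 5 observations, 3 features, 2 targets
xs = torch.arange(15, dtype=torch.float32).reshape(5, 3)
ys = torch.arange(10, dtype=torch.float32).reshape(5, 2)

ds = TensorDataset(xs, ys)

# batch_size=2 is an arbitrary choice; shuffle=True reorders the rows each epoch
dl = DataLoader(ds, batch_size=2, shuffle=True)

# Collect the shapes of each (inputs, targets) batch
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in dl]
for shapes in batch_shapes:
    print(shapes)
```

Five rows split into batches of two give three batches, the last one holding a single row.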

In [14]:
# define the model
from torch import nn
model = nn.Linear(3, 2)
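
`nn.Linear(3, 2)` holds a weight matrix and a bias vector with the same shapes as the `w` and `b` defined by hand earlier. The sketch below uses a separate layer named `layer` and random placeholder inputs to check this.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)  # 3 input features, 2 outputs

print(layer.weight.shape)  # torch.Size([2, 3])
print(layer.bias.shape)    # torch.Size([2])

# The layer computes x @ weight.T + bias
x = torch.randn(5, 3)
same = torch.allclose(layer(x), x @ layer.weight.t() + layer.bias, atol=1e-5)
print(same)
```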

In [15]:
# define the loss function (with the pytorch library)
loss_fn = nn.MSELoss()
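
With its default settings, `nn.MSELoss` computes the same quantity as the hand-written `mse` function: the mean of the squared differences over all elements. A quick check on small made-up tensors:

```python
import torch
from torch import nn

preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0, 1.0], [1.0, 1.0]])

# Differences are 0, 1, 2, 3; squares are 0, 1, 4, 9; their mean is 3.5
manual = torch.mean((preds - targets) ** 2)
builtin = nn.MSELoss()(preds, targets)

print(manual)   # tensor(3.5000)
print(builtin)  # tensor(3.5000)
```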

In [16]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in train_dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

Some things to note above:

* We use the data loader defined earlier to get batches of data for every iteration.

* Instead of updating parameters (weights and biases) manually, we use `opt.step` to perform the update and `opt.zero_grad` to reset the gradients to zero.

* We've also added a log statement that prints the loss from the last batch of data for every 10th epoch to track training progress. `loss.item` returns the actual value stored in the loss tensor.
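
To connect `opt.step` with the manual update used earlier, the sketch below compares the two on a small layer with random placeholder data. It assumes plain `optim.SGD` with no momentum or weight decay, and the learning rate `0.1` is arbitrary.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
lr = 0.1  # arbitrary learning rate for the illustration
layer = nn.Linear(3, 2)
opt = optim.SGD(layer.parameters(), lr=lr)

x = torch.randn(4, 3)
y = torch.randn(4, 2)

loss = nn.functional.mse_loss(layer(x), y)
loss.backward()

# What a manual update would produce: parameter minus lr times gradient
expected_w = layer.weight.detach() - lr * layer.weight.grad
expected_b = layer.bias.detach() - lr * layer.bias.grad

opt.step()       # plain SGD applies the same rule to every parameter
opt.zero_grad()  # clears the gradients before the next batch

w_matches = torch.allclose(layer.weight, expected_w)
b_matches = torch.allclose(layer.bias, expected_b)
print(w_matches, b_matches)
```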

Let's train the model for 100 epochs.

In [20]:
from torch import optim
opt = optim.SGD(model.parameters(), lr=learning_rate)

In [21]:
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], Loss: 606.1872
Epoch [20/100], Loss: 330.1665
Epoch [30/100], Loss: 287.4241
Epoch [40/100], Loss: 253.6417
Epoch [50/100], Loss: 223.9557
Epoch [60/100], Loss: 197.8042
Epoch [70/100], Loss: 174.7640
Epoch [80/100], Loss: 154.4640
Epoch [90/100], Loss: 136.5773
Epoch [100/100], Loss: 120.8163


Let's generate predictions using our model and verify that they're close to our targets.

In [22]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 61.7957,  72.3059],
        [ 83.6548, 101.0742],
        [107.9572, 128.8286],
        [ 47.3774,  48.7076],
        [ 89.1850, 112.9808]], grad_fn=<AddmmBackward0>)

In [23]:
# Compare with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

Indeed, the predictions are quite close to our targets. We have trained a reasonably good model to predict crop yields for apples and oranges by looking at the average temperature, rainfall, and humidity in a region. We can use it to make predictions of crop yields for new regions by passing a batch containing a single row of input.

In [24]:
model(torch.tensor([[75, 63, 44.]]))

tensor([[59.9864, 70.1107]], grad_fn=<AddmmBackward0>)
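
The `grad_fn` in the output shows that PyTorch still tracked this computation for backpropagation. When only predictions are needed, one common option is to wrap the call in `torch.no_grad()`. The sketch below uses an untrained stand-in layer named `untrained`, so its numeric outputs are meaningless; only the shape and the missing gradient tracking are of interest.

```python
import torch
from torch import nn

torch.manual_seed(0)
untrained = nn.Linear(3, 2)  # stand-in for the trained model; outputs are meaningless

new_region = torch.tensor([[75.0, 63.0, 44.0]])  # temp, rainfall, humidity

# Inside no_grad, PyTorch does not record operations for backpropagation
with torch.no_grad():
    pred = untrained(new_region)

print(pred.shape)          # torch.Size([1, 2])
print(pred.requires_grad)  # False
```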

To conclude, the linear regression model was trained to predict the crop yields for apples and oranges using three input features. The predictions track the targets closely, but they do not entirely coincide. For example, for the fourth observation the model predicted an apple yield of about 47, roughly twice the actual yield of 22.

Based on a single row of new data, the model predicted the yield of apples to be almost 60 and the yield of oranges to be about 70.