<a href="https://colab.research.google.com/github/BKeita-collab/PDM/blob/main/Assignment_PDM7_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Gradient Descent and Linear Regression with PyTorch

As you go through this notebook, you will find a **???** in certain places. Your job is to replace the **???** with appropriate code or values, to ensure that the notebook runs properly end-to-end.


Before we begin, we need to install the required libraries. The installation of PyTorch may differ based on your operating system / cloud environment. You can find detailed installation instructions at https://pytorch.org.

## Introduction to Linear Regression

We start with one of the foundational algorithms in machine learning: *linear regression*. You will create a model that predicts crop yields for apples and oranges (*target variables*) by looking at the average temperature, rainfall, and humidity (*input variables or features*) in a region. Here's the training data:

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)

In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as a *bias*:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall and humidity:

![linear-regression-graph](https://i.imgur.com/4DJ9f8X.png)
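
To make the formula concrete, here is a minimal sketch that evaluates both equations for one example region (temp 73, rainfall 67, humidity 43). The weight and bias values are made up for illustration; they are not learned values.

```python
import numpy as np

# Hypothetical weights and biases, chosen only to illustrate the arithmetic
w = np.array([[0.5, 0.25, 0.0],    # w11, w12, w13 (apples)
              [0.25, 0.5, 0.5]])   # w21, w22, w23 (oranges)
b = np.array([1.0, 2.0])           # b1, b2

# One example region: temp, rainfall, humidity
x = np.array([73.0, 67.0, 43.0])

# Each yield is a weighted sum of the inputs plus a bias
yield_apple = w[0, 0] * x[0] + w[0, 1] * x[1] + w[0, 2] * x[2] + b[0]
yield_orange = w[1, 0] * x[0] + w[1, 1] * x[1] + w[1, 2] * x[2] + b[1]

# The same computation written as a matrix-vector product
yields = w @ x + b
print(yield_apple, yield_orange)   # 54.25 75.25
print(yields)
```

Writing the two equations as a single matrix product is what lets the model later handle all regions and both targets in one operation.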

The *learning* part of linear regression is to figure out a set of weights `w11, w12,... w23, b1 & b2` using the training data, to make accurate predictions for new data. The _learned_ weights will be used to predict the yields for apples and oranges in a new region using the average temperature, rainfall, and humidity for that region. 

You will _train_ the model by adjusting the weights slightly many times to make better predictions, using an optimization technique called *gradient descent*. Let's begin by importing NumPy and PyTorch.

In [2]:
import numpy as np
import torch

## Training data

You can represent the training data using two matrices: `inputs` and `targets`, each with one row per observation, and one column per variable.

In [3]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [4]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

The input and target variables have been separated because you will operate on them separately.

Let's convert the arrays to PyTorch tensors.

In [5]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs.shape)
print(targets)

torch.Size([5, 3])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
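
One detail worth knowing: `torch.from_numpy` does not copy the data, so the tensor and the array share memory. A small sketch, using a throwaway array `a` that is not part of the notebook:

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0], dtype='float32')
t = torch.from_numpy(a)      # shares memory with `a`, keeps dtype float32

a[0] = 10.0                  # modifying the array...
print(t)                     # ...is visible through the tensor

t_copy = torch.tensor(a)     # torch.tensor makes an independent copy
a[1] = 20.0
print(t_copy)                # unchanged by the second modification
```

If the NumPy array will be modified later and the tensor should stay unaffected, `torch.tensor` is the safer choice.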


## Linear regression model from scratch

The weights and biases (`w11, w12,... w23, b1 & b2`) can also be represented as matrices, initialized as random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e., yield of apples, and similarly, the second for oranges.

In [6]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.7384, -0.8493, -0.8118],
        [ 1.8652,  0.2408, -1.9276]], requires_grad=True)
tensor([ 0.1275, -0.7642], requires_grad=True)


`torch.randn` creates a tensor with the given shape, with elements picked randomly from a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean 0 and standard deviation 1.
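
Because the values are random, the numbers printed above will differ from run to run. A minimal sketch of how to make them reproducible, assuming repeatable results are wanted (the seed value 42 is arbitrary):

```python
import torch

torch.manual_seed(42)                       # fix the random number generator
w1 = torch.randn(2, 3, requires_grad=True)

torch.manual_seed(42)                       # same seed, same values
w2 = torch.randn(2, 3, requires_grad=True)

print(torch.equal(w1, w2))                  # True
print(w1.shape)                             # torch.Size([2, 3])
```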

Our *model* is simply a function that performs a matrix multiplication of the `inputs` and the weights `w` (transposed) and adds the bias `b` (replicated for each observation).

![matrix-mult](https://i.imgur.com/WGXLFvA.png)

We can define the model as follows:

In [7]:
# Linear model: (N, 3) inputs @ (3, 2) transposed weights + (2,) bias -> (N, 2) predictions
def model(x):
    return x @ w.T + b
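
To see what the matrix form computes, the sketch below compares it with an explicit loop over observations and targets. The small tensors `x`, `w_demo`, and `b_demo` are illustrative values, not the notebook's random weights.

```python
import torch

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])          # 2 observations, 3 features
w_demo = torch.tensor([[1., 0., 0.],
                       [0., 1., 1.]])        # 2 targets, 3 features
b_demo = torch.tensor([0.5, -0.5])           # one bias per target

# Matrix form: (2, 3) @ (3, 2) -> (2, 2); b_demo is broadcast across rows
out = x @ w_demo.T + b_demo

# Equivalent explicit loop over observations (i) and targets (j)
manual = torch.zeros(2, 2)
for i in range(2):
    for j in range(2):
        manual[i, j] = (x[i] * w_demo[j]).sum() + b_demo[j]

print(out)
print(torch.allclose(out, manual))           # True
```

The bias has shape `(2,)`, so broadcasting adds it to every row of the `(N, 2)` result; this is the "replicated for each observation" behavior described above.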

## Train the model using gradient descent

We can reduce the loss and improve the model using the gradient descent optimization algorithm. Concretely, we _train_ the model using the following steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero
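
As a preview, the five steps can be sketched for a single parameter update on a toy problem. The tiny dataset, the starting value of `w_toy`, and the learning rate `lr` below are assumptions made for illustration.

```python
import torch

# Toy data: y = 2 * x, with a single weight and no bias
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w_toy = torch.tensor(0.0, requires_grad=True)
lr = 0.1

preds = w_toy * x                              # 1. generate predictions
loss = ((preds - y) ** 2).mean()               # 2. calculate the loss (MSE)
loss.backward()                                # 3. compute d(loss)/d(w_toy)

with torch.no_grad():                          # updates must not be tracked
    grad_value = w_toy.grad.item()
    w_toy -= lr * w_toy.grad                   # 4. step against the gradient
    w_toy.grad.zero_()                         # 5. reset the gradient

new_loss = ((w_toy * x - y) ** 2).mean()
print(loss.item(), grad_value, w_toy.item(), new_loss.item())
```

The gradient is negative here, so subtracting it moves `w_toy` upward toward the true value of 2, and the loss after the step is smaller than before.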

Let's implement the above step by step.

In [8]:
print('size of input = {}, w = {} and b = {}'.format(inputs.shape, w.shape, b.shape))

size of input = torch.Size([5, 3]), w = torch.Size([2, 3]) and b = torch.Size([2])


In [9]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[-145.5931,   68.6492],
        [-193.7695,   66.8022],
        [-225.0146,   81.9852],
        [-141.7527,  128.5263],
        [-189.1894,   16.1284]], grad_fn=<AddBackward0>)


In [16]:
# Calculate the loss: mean squared error over all elements
def mse(preds, targets):
    error = preds - targets
    return torch.sum(error ** 2) / error.numel()

loss = mse(preds, targets)
print(loss)

tensor(36940.7148, grad_fn=<DivBackward0>)
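
As a sanity check, the hand-written loss should agree with PyTorch's built-in `torch.nn.functional.mse_loss`, which averages the squared error over all elements by default. A short sketch with made-up tensors `p` and `t`:

```python
import torch
import torch.nn.functional as F

def mse(preds, targets):
    error = preds - targets
    return torch.sum(error ** 2) / error.numel()

p = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
t = torch.tensor([[0.0, 2.0], [3.0, 6.0]])

manual = mse(p, t)            # (1 + 0 + 0 + 4) / 4 = 1.25
builtin = F.mse_loss(p, t)    # default reduction='mean'
print(manual, builtin)
```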


## Train for multiple epochs

To reduce the loss, we repeat the process of adjusting the weights and biases using the gradients multiple times. Each pass over the full training set is called an _epoch_. Let's train the model for 100 epochs.

In [17]:
# Train for 100 epochs
learning_rate = 1e-5

for i in range(100):
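    # 1. Generate predictions
    preds = model(inputs)
    # 2. Calculate the loss
    loss = mse(preds, targets)
    # 3. Compute gradients of the loss w.r.t. w and b (stored in w.grad and b.grad)
    loss.backward()
    # 4-5. Update the parameters, then reset the gradients; no_grad keeps
    #      these in-place updates out of the autograd graph
    with torch.no_grad():
        w -= w.grad * learning_rate
        b -= b.grad * learning_rate
        w.grad.zero_()
        b.grad.zero_()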


Let's verify that the loss is now lower:

In [18]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(851.0043, grad_fn=<DivBackward0>)


The loss is now much lower than its initial value. Let's look at the model's predictions and compare them with the targets.

In [19]:
# Predictions
preds

tensor([[ 61.7426,  81.5097],
        [ 81.0986,  90.1357],
        [113.8552, 138.9080],
        [ 47.1276, 101.0369],
        [ 84.8203,  63.2995]], grad_fn=<AddBackward0>)

In [20]:
# Targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

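The predictions are now much closer to the targets than the initial ones, although some rows are still far off (for example, the fourth row predicts about 101 for oranges against a target of 37). We can get better results by training for more epochs.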

## How would you do this with PyTorch's built-in functions?

Hint: use the `nn.Linear` class and the `mse_loss` function of PyTorch.

Hint 2: don't forget `TensorDataset` and `DataLoader`.

In [21]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in train_dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

Some things to note above:

* We use the data loader `train_dl` (created a few cells below) to get batches of data for every iteration.

* Instead of updating parameters (weights and biases) manually, we use `opt.step` to perform the update and `opt.zero_grad` to reset the gradients to zero.

* We've also added a log statement that prints the loss from the last batch of data for every 10th epoch to track training progress. `loss.item` returns the actual value stored in the loss tensor.
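
For plain SGD without momentum, `opt.step()` performs the same update as the manual `w -= w.grad * learning_rate` used earlier. The sketch below checks this on a single hypothetical parameter `p` with a made-up loss:

```python
import torch

p = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([p], lr=0.1)

loss = (p ** 2).sum()                  # gradient of this loss is 2 * p
loss.backward()

expected = p.detach() - 0.1 * p.grad   # manual update, computed before stepping
opt.step()                             # in-place update of p
opt.zero_grad()                        # clear the gradient for the next iteration

print(p)
print(torch.allclose(p.detach(), expected))   # True
```

Other optimizers (or SGD with momentum or weight decay) apply different update rules, which is one reason to go through the `opt` interface instead of updating parameters by hand.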

Before training, we need a data loader, a model, a loss function, and an optimizer. We then train the model for 100 epochs.

In [22]:
import torch.nn.functional as F 
import torch.nn as nn

In [23]:
# Create the training dataset and data loader
batch_size = 5
train_ds = torch.utils.data.TensorDataset(inputs, targets)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size, shuffle=True)
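
With `batch_size = 5` and only five rows, every epoch consists of a single batch containing the whole dataset. To illustrate how batching works, here is a sketch with a smaller, hypothetical batch size of 2 on dummy data (`xs` and `ys` are placeholders, not the crop data):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

xs = torch.arange(15, dtype=torch.float32).reshape(5, 3)   # 5 rows, 3 features
ys = torch.arange(10, dtype=torch.float32).reshape(5, 2)   # 5 rows, 2 targets

ds = TensorDataset(xs, ys)
dl = DataLoader(ds, batch_size=2, shuffle=True)

# 5 rows with batch_size=2 gives batches of 2, 2, and 1 rows
batch_sizes = [xb.shape[0] for xb, yb in dl]
print(len(dl), batch_sizes)   # 3 [2, 2, 1]
```

`shuffle=True` changes which rows land in which batch on each pass, but not the number or size of the batches.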

In [24]:
loss_fn = F.mse_loss
model = nn.Linear(3, 2)

opt = torch.optim.SGD(model.parameters(), lr=1e-5)
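
`nn.Linear(3, 2)` creates and initializes the weights and bias for us, with the same shapes as the manual `w` and `b` from earlier, and computes `x @ weight.T + bias`. A quick check (the name `layer` is used here to avoid overwriting `model`):

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)             # 3 input features, 2 outputs
print(layer.weight.shape)           # torch.Size([2, 3])
print(layer.bias.shape)             # torch.Size([2])

x = torch.tensor([[73., 67., 43.]])
manual = x @ layer.weight.T + layer.bias

# Compare the layer's output with the explicit formula (small float tolerance)
same = torch.allclose(layer(x), manual, atol=1e-4)
print(same)
```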

In [25]:
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], Loss: 850.6973
Epoch [20/100], Loss: 603.5648
Epoch [30/100], Loss: 531.2391
Epoch [40/100], Loss: 470.3055
Epoch [50/100], Loss: 416.6523
Epoch [60/100], Loss: 369.3583
Epoch [70/100], Loss: 327.6643
Epoch [80/100], Loss: 290.9031
Epoch [90/100], Loss: 258.4865
Epoch [100/100], Loss: 229.8971


Let's generate predictions using our model and verify that they're close to our targets.

In [26]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 62.4071,  74.7856],
        [ 83.5678,  99.8720],
        [107.1723, 127.5762],
        [ 50.8556,  62.9362],
        [ 87.0088, 102.5644]], grad_fn=<AddmmBackward0>)

In [27]:
# Compare with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

Indeed, the predictions are fairly close to our targets, although the fourth row is still noticeably off. We have trained a reasonably good model to predict crop yields for apples and oranges by looking at the average temperature, rainfall, and humidity in a region. We can use it to make predictions of crop yields for new regions by passing a batch containing a single row of input.

In [28]:
model(torch.tensor([[75, 63, 44.]]))

tensor([[60.7363, 72.8951]], grad_fn=<AddmmBackward0>)
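
When the model is only used for prediction, gradient tracking is unnecessary. A minimal sketch, using a freshly initialized (untrained) `nn.Linear` named `stand_in` in place of the trained model, so the printed numbers are not meaningful:

```python
import torch
import torch.nn as nn

stand_in = nn.Linear(3, 2)          # hypothetical stand-in for the trained model
new_region = torch.tensor([[75., 63., 44.]])

with torch.no_grad():               # disable gradient tracking for inference
    pred = stand_in(new_region)

apples, oranges = pred[0].tolist()  # unpack the single row into two floats
print(pred.requires_grad)           # False
print(f"apples: {apples:.1f}, oranges: {oranges:.1f}")
```

The result then has no `grad_fn` attached, and unpacking the row gives plain Python floats that are easier to report.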

Comment on the results: what yields of apples and oranges does the model predict?

The model predicts a yield of about 60.7 for apples and 72.9 for oranges.