
# **Linear Regression**


*   yield_apple = w11 × temp + w12 × rainfall + w13 × humidity + b1
*   yield_orange = w21 × temp + w22 × rainfall + w23 × humidity + b2
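
The two equations amount to one matrix-vector product plus a bias vector. The sketch below checks this for a single region. The names `w_example` and `b_example` and their values are assumptions made up for illustration; they are not learned weights.

```python
import numpy as np

# Hypothetical weights (one row per fruit) and biases, chosen only for illustration
w_example = np.array([[0.5, 0.2, 0.1],    # w11, w12, w13 (apples)
                      [0.3, 0.4, 0.2]])   # w21, w22, w23 (oranges)
b_example = np.array([1.0, 2.0])          # b1, b2

# One region: temp, rainfall, humidity
x = np.array([73.0, 67.0, 43.0])

# Written out term by term, as in the two equations
yield_apple = (w_example[0, 0] * x[0] + w_example[0, 1] * x[1]
               + w_example[0, 2] * x[2] + b_example[0])
yield_orange = (w_example[1, 0] * x[0] + w_example[1, 1] * x[1]
                + w_example[1, 2] * x[2] + b_example[1])

# The same computation as a single matrix-vector product
yields = w_example @ x + b_example

print(yield_apple, yield_orange)
print(yields)
```

Both print statements should show the same pair of numbers, which is why the rest of the notebook can work with one weight matrix and one bias vector.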

In [None]:
import numpy as np
import torch

In [None]:
# inputs (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')
inputs

array([[ 73.,  67.,  43.],
       [ 91.,  88.,  64.],
       [ 87., 134.,  58.],
       [102.,  43.,  37.],
       [ 69.,  96.,  70.]], dtype=float32)

In [None]:
# targets (apples, oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')
targets

array([[ 56.,  70.],
       [ 81., 101.],
       [119., 133.],
       [ 22.,  37.],
       [103., 119.]], dtype=float32)

In [None]:
# convert to tensors 
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [None]:
# weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.6334, -2.4306, -0.6939],
        [-0.7253, -0.3460, -0.0641]], requires_grad=True)
tensor([-0.2577, -0.8253], requires_grad=True)


We define the model as a function that multiplies the inputs by the transposed weight matrix and adds the bias.
* `@` represents matrix multiplication in PyTorch.
* `.t()` returns the transpose of a matrix.

In [None]:
def model(x):
  return x @ w.t() + b

In [None]:
preds = model(inputs)
print(preds)

tensor([[-239.1823,  -79.7169],
        [-316.1978, -101.3862],
        [-421.3069, -114.0183],
        [-195.0533,  -92.0624],
        [-325.8714,  -88.5814]], grad_fn=<AddBackward0>)
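
To see why the transpose is needed, it can help to look at the shapes. This is a small sketch on freshly created random tensors; `x_demo`, `w_demo`, and `b_demo` are hypothetical names used so the notebook's own variables stay untouched.

```python
import torch

x_demo = torch.randn(5, 3)   # 5 rows, 3 input features
w_demo = torch.randn(2, 3)   # 2 outputs, 3 weights per output
b_demo = torch.randn(2)      # one bias per output

# (5, 3) @ (3, 2) -> (5, 2); the bias is broadcast across the 5 rows
out = x_demo @ w_demo.t() + b_demo

print(w_demo.t().shape)
print(out.shape)
```

Each row of `out` is the pair of predictions for one row of `x_demo`.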


In [None]:
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## **Loss Function**

In [None]:
def mse(t1, t2):
  diff = t1-t2
  return torch.sum(diff*diff)/diff.numel()

In [None]:
loss = mse(preds, targets)
loss

tensor(95201.3594, grad_fn=<DivBackward0>)

## **Computing gradient**

In [None]:
loss.backward()

In [None]:
print(w)
print(w.grad)

tensor([[-0.6334, -2.4306, -0.6939],
        [-0.7253, -0.3460, -0.0641]], requires_grad=True)
tensor([[-31286.3164, -35527.3438, -21500.6543],
        [-15664.9121, -17283.7949, -10604.7236]])


In [None]:
print(b)
print(b.grad)

tensor([-0.2577, -0.8253], requires_grad=True)
tensor([-375.7224, -187.1531])
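
As a sanity check, the gradients from `.backward()` can be compared with the hand-derived gradient of the mean squared error. The sketch below assumes the same model and loss definitions as above but runs on small random tensors (`x`, `t`, `w_demo`, `b_demo` are illustrative names) in double precision.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3, dtype=torch.float64)
t = torch.randn(5, 2, dtype=torch.float64)
w_demo = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
b_demo = torch.randn(2, dtype=torch.float64, requires_grad=True)

# Same model and loss as in the notebook
diff = x @ w_demo.t() + b_demo - t
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()

# Hand-derived gradients: d(loss)/dw = (2/n) * diff^T x, d(loss)/db = (2/n) * column sums of diff
n = diff.numel()
d = diff.detach()
grad_w_manual = (2.0 / n) * (d.t() @ x)
grad_b_manual = (2.0 / n) * d.sum(dim=0)

print(torch.allclose(w_demo.grad, grad_w_manual))
print(torch.allclose(b_demo.grad, grad_b_manual))
```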


We don't need to track gradients while updating the weights and biases, so we perform the update inside a `torch.no_grad()` block.

In [None]:
with torch.no_grad():
  w -= w.grad * 1e-5
  b -= b.grad * 1e-5
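
The following sketch illustrates what the `torch.no_grad()` block is for. It uses a throwaway tensor `w_demo` (a hypothetical name) and shows that the same in-place update is rejected outside the block.

```python
import torch

w_demo = torch.ones(3, requires_grad=True)
(w_demo * 0.5).sum().backward()   # gives w_demo.grad == [0.5, 0.5, 0.5]

# Outside no_grad, an in-place update of a leaf tensor that requires grad raises an error
try:
    w_demo -= w_demo.grad * 0.1
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Inside no_grad, the update is applied and not recorded in the graph
with torch.no_grad():
    w_demo -= w_demo.grad * 0.1

print(w_demo)
```

After the update, `w_demo` still has `requires_grad=True`, so it can be used in the next forward pass as usual.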

In [None]:
# verify that the loss decreased
preds = model(inputs)
loss = mse(preds, targets)
loss

tensor(64608.0430, grad_fn=<DivBackward0>)

We need to reset the gradients to zero by invoking the `.zero_()` method. Otherwise, the next call to `loss.backward()` will add the new gradients to the existing values instead of replacing them.

In [None]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
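
The accumulation behaviour is easy to reproduce on a tiny example. In this sketch, `w_demo` is a hypothetical two-element tensor and the "loss" is simply the sum of squares, whose gradient is `2 * w_demo`.

```python
import torch

w_demo = torch.tensor([1.0, 2.0], requires_grad=True)

(w_demo ** 2).sum().backward()
first = w_demo.grad.clone()          # gradient after one backward pass

(w_demo ** 2).sum().backward()       # second pass without resetting
accumulated = w_demo.grad.clone()    # new gradient was added to the old one

w_demo.grad.zero_()
(w_demo ** 2).sum().backward()
after_reset = w_demo.grad.clone()    # back to the single-pass gradient

print(first, accumulated, after_reset)
```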


## **Train for Multiple Epochs**

In [None]:
for i in range(500):
  preds = model(inputs)
  loss = mse(preds, targets)
  loss.backward()
  with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

Once again, let's verify that the loss is now much lower.

In [None]:
preds = model(inputs)
loss = mse(preds, targets)
print(loss)
preds

tensor(5.7484, grad_fn=<DivBackward0>)


tensor([[ 57.8161,  70.5323],
        [ 82.6219,  99.2217],
        [116.7020, 135.8913],
        [ 25.1034,  38.9462],
        [100.2465, 115.3301]], grad_fn=<AddBackward0>)

# **Linear regression using PyTorch built-ins**

In [None]:
import torch.nn as nn

In [None]:
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

## **Dataset and DataLoader**
We'll create a `TensorDataset`, which allows access to rows from `inputs` and `targets` as tuples and provides standard APIs for working with many different types of datasets in PyTorch.

In [None]:
from torch.utils.data import TensorDataset

In [None]:
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

* The `TensorDataset` allows us to access a small section of the training data using array indexing notation (`[0:3]`).
* In the code above, it returns a tuple with two elements: the first contains the inputs for the selected rows, and the second contains the targets.
* We'll also create a `DataLoader`, which can split the data into batches of a predefined size while training.
* It also provides other utilities like shuffling and random sampling of the data.

In [None]:
from torch.utils.data import DataLoader

In [None]:
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

In [None]:
for xb, yb in train_dl:
  print(xb)
  print(yb)
  break

tensor([[ 88., 134.,  59.],
        [ 92.,  87.,  64.],
        [103.,  43.,  36.],
        [ 68.,  96.,  71.],
        [ 87., 134.,  58.]])
tensor([[118., 132.],
        [ 82., 100.],
        [ 20.,  38.],
        [104., 118.],
        [119., 133.]])


* In each iteration, the data loader returns one batch of data with the given batch size.
* `shuffle=True` shuffles the training data before creating batches.
* Shuffling randomizes the input to the optimization algorithm, which often leads to a faster reduction in the loss.
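
The batching behaviour can be checked with a small sketch. It uses made-up tensors of the same shape as the training data (`x_demo`, `y_demo`, and `demo_dl` are hypothetical names), so that 15 rows with a batch size of 5 should give 3 batches per pass.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data with the same shapes as the notebook's inputs and targets
x_demo = torch.arange(45, dtype=torch.float32).reshape(15, 3)
y_demo = torch.arange(30, dtype=torch.float32).reshape(15, 2)
demo_dl = DataLoader(TensorDataset(x_demo, y_demo), batch_size=5, shuffle=True)

print(len(demo_dl))   # number of batches per pass over the data

seen = []
for xb, yb in demo_dl:
    print(xb.shape, yb.shape)
    seen.append(xb)

# Every row appears exactly once per pass, only the order changes
all_rows = torch.cat(seen)
```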

## **nn.Linear**
Instead of initializing the weights and biases manually, we can define the model using the `nn.Linear` class from PyTorch, which does it automatically.

In [None]:
# define model
model = nn.Linear(3,2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[-0.4555,  0.3598,  0.1509],
        [ 0.4053, -0.2122,  0.3945]], requires_grad=True)
Parameter containing:
tensor([-0.5450, -0.1811], requires_grad=True)


PyTorch models also have a helpful `.parameters()` method, which returns an iterator over all the weight and bias tensors present in the model. For our linear regression model, we have one weight matrix and one bias vector.

In [None]:
# parameters
list(model.parameters())

[Parameter containing:
 tensor([[-0.4555,  0.3598,  0.1509],
         [ 0.4053, -0.2122,  0.3945]], requires_grad=True),
 Parameter containing:
 tensor([-0.5450, -0.1811], requires_grad=True)]
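
When the shapes matter more than the values, `named_parameters()` gives a more compact view. A minimal sketch on a separate layer (`demo_model` is a hypothetical name, created here so the notebook's `model` is not modified):

```python
import torch.nn as nn

demo_model = nn.Linear(3, 2)

# Name, shape, and gradient flag of each parameter
names = []
for name, param in demo_model.named_parameters():
    names.append(name)
    print(name, tuple(param.shape), param.requires_grad)

# Total number of trainable values: 2 * 3 weights + 2 biases
n_params = sum(p.numel() for p in demo_model.parameters())
print(n_params)
```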

In [None]:
preds = model(inputs)
preds

tensor([[ -3.2035,  32.1481],
        [ -0.6786,  43.2712],
        [ 16.7890,  29.5201],
        [-25.9540,  46.6268],
        [ 13.1265,  35.0250],
        [ -4.0188,  32.7656],
        [ -0.8875,  43.8779],
        [ 16.4843,  30.3199],
        [-25.1387,  46.0093],
        [ 13.7329,  35.0143],
        [ -3.4124,  32.7549],
        [ -1.4939,  43.8886],
        [ 16.9979,  28.9133],
        [-26.5604,  46.6375],
        [ 13.9419,  34.4075]], grad_fn=<AddmmBackward0>)

## **Loss Function**
Instead of defining a loss function manually, we can use the built-in loss function `mse_loss`.

In [None]:
import torch.nn.functional as F

In [None]:
loss_fn = F.mse_loss

In [None]:
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(5336.6011, grad_fn=<MseLossBackward0>)
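
With its default settings, `F.mse_loss` averages the squared differences over all elements, which is the same quantity as the hand-written `mse` from the first part. The sketch below checks this on random tensors; `mse_manual` is a copy of that function under a different name, and `a` and `b` are made-up inputs.

```python
import torch
import torch.nn.functional as F

def mse_manual(t1, t2):
    # Same computation as the hand-written mse defined earlier
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

torch.manual_seed(0)
a = torch.randn(15, 2)
b = torch.randn(15, 2)

manual = mse_manual(a, b)
builtin = F.mse_loss(a, b)
print(manual, builtin)
```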


## **Optimizer**
* Instead of manually manipulating the model's weights and biases using gradients, we can use the optimizer `optim.SGD`.
* SGD is short for "*stochastic gradient descent*".
* The term *stochastic* indicates that the samples are selected in random batches instead of as a single group.

In [None]:
# define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

Note that `model.parameters()` is passed as an argument to `optim.SGD` so that the optimizer knows which tensors should be modified during the update step. We also specify a learning rate, which controls the amount by which the parameters are modified.
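
For plain SGD (no momentum or weight decay, as configured here), one optimizer step should match the manual update `p - lr * p.grad` used in the first part. A minimal sketch on a separate model and random data; `demo_model`, `demo_opt`, `x`, and `t` are hypothetical names, and the learning rate is chosen only for the demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
lr = 1e-2
demo_model = nn.Linear(3, 2)
demo_opt = torch.optim.SGD(demo_model.parameters(), lr=lr)

x = torch.randn(5, 3)
t = torch.randn(5, 2)
loss = F.mse_loss(demo_model(x), t)
loss.backward()

# What the manual rule would produce for each parameter
expected = [p.detach() - lr * p.grad for p in demo_model.parameters()]

# Let the optimizer perform the update instead
demo_opt.step()

matches = all(torch.allclose(p.detach(), e)
              for p, e in zip(demo_model.parameters(), expected))
print(matches)
```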

## **Train the model**
We are now ready to train the model. We'll follow the same process to implement gradient descent:
1. Generate predictions
2. Calculate the loss
3. Compute gradients w.r.t. the weights and biases
4. Adjust the weights by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero.

The only change is that we'll work on batches of data instead of processing the entire training data in every iteration. Let's define a utility function `fit` that trains the model for a given number of epochs.

In [None]:
# utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):
  
  # repeat for given number of epochs
  for epoch in range(num_epochs):

    # trains with batches of data
    for xb, yb in train_dl:

      # 1. Generate predictions
      pred = model(xb)

      # 2. Calculate the loss
      loss = loss_fn(pred, yb)
      
      # 3. Compute gradients w.r.t the weights and biases
      loss.backward()
      
      # 4. Update parameters using gradients
      opt.step()

      # 5. Reset the gradients to zero.
      opt.zero_grad()

    # print the progress (loss of the last batch in this epoch)
    if (epoch + 1) % 10 == 0:
      print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))

Some things to note above:
* We use the data loader defined earlier to get batches of data for every iteration.
* Instead of updating parameters (weights and biases) manually, we use `opt.step()` to perform the update and `opt.zero_grad()` to reset the gradients to zero.
* `loss.item()` returns the actual value stored in the loss tensor as a Python number.

In [None]:
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], Loss: 362.7525
Epoch [20/100], Loss: 225.6245
Epoch [30/100], Loss: 255.9174
Epoch [40/100], Loss: 192.4297
Epoch [50/100], Loss: 83.0999
Epoch [60/100], Loss: 53.7823
Epoch [70/100], Loss: 21.5872
Epoch [80/100], Loss: 14.6056
Epoch [90/100], Loss: 4.1818
Epoch [100/100], Loss: 11.5203
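
The printed value is the loss of the last batch in each epoch, which is one reason it can go up between epoch 90 and epoch 100 even though training is progressing. A steadier number is the loss averaged over the whole data loader. The helper `mean_loss` below is a hypothetical addition, not part of the original `fit`, and it is demonstrated on random placeholder data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def mean_loss(model, loss_fn, dl):
    """Average loss over every sample in the data loader (hypothetical helper)."""
    total, count = 0.0, 0
    with torch.no_grad():                     # evaluation only, no gradients needed
        for xb, yb in dl:
            # weight each batch's mean loss by its number of rows
            total += loss_fn(model(xb), yb).item() * len(xb)
            count += len(xb)
    return total / count

# Placeholder data and model, only to show the helper running
torch.manual_seed(0)
x_demo = torch.randn(15, 3)
y_demo = torch.randn(15, 2)
demo_dl = DataLoader(TensorDataset(x_demo, y_demo), batch_size=5, shuffle=True)
demo_model = nn.Linear(3, 2)

avg = mean_loss(demo_model, F.mse_loss, demo_dl)
print(avg)
```

Calling something like `mean_loss(model, loss_fn, train_dl)` inside the progress check would report the average over all 15 rows instead of the final batch of 5.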


In [None]:
preds = model(inputs)
preds

tensor([[ 57.3666,  71.7949],
        [ 80.7312, 100.3341],
        [120.2504, 130.5171],
        [ 23.7869,  46.3426],
        [ 97.7741, 113.1330],
        [ 56.1554,  70.9384],
        [ 80.3240, 100.3134],
        [120.4226, 131.1196],
        [ 24.9982,  47.1991],
        [ 98.5782, 113.9687],
        [ 56.9595,  71.7742],
        [ 79.5199,  99.4776],
        [120.6576, 130.5378],
        [ 22.9828,  45.5069],
        [ 98.9854, 113.9894]], grad_fn=<AddmmBackward0>)

In [None]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])