#### Common Basics of Linear Regression and why we're doing it

In this tutorial, we'll discuss one of the foundational algorithms in machine learning: *Linear regression*. We'll create a model that predicts crop yields for apples and oranges (*target variables*) by looking at the average temperature, rainfall, and humidity (*input variables or features*) in a region. Here's the training data:

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)

In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as the bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```
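
As a minimal sketch of these two formulas, the snippet below evaluates them for a single region from the table. The weights and biases are made-up placeholder values chosen only to keep the arithmetic easy to follow; they are not learned values.

```python
# One region from the training table
temp, rainfall, humidity = 73.0, 67.0, 43.0

# Hypothetical weights and biases, for illustration only
w11, w12, w13, b1 = 0.5, 0.25, 0.0, 1.0   # apples
w21, w22, w23, b2 = 0.0, 0.5, 1.0, 2.0    # oranges

# Each yield is a weighted sum of the inputs plus a bias
yield_apple = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2

print(yield_apple, yield_orange)  # 54.25 78.5
```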

Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall and humidity:

![linear-regression-graph](https://i.imgur.com/4DJ9f8X.png)

The *learning* part of linear regression is to figure out a set of weights `w11, w12, ..., w23` and biases `b1`, `b2` from the training data, so that the model makes accurate predictions for new data. The _learned_ weights and biases are then used to predict the yields of apples and oranges in a new region from that region's average temperature, rainfall, and humidity.

We'll _train_ our model by adjusting the weights slightly many times to make better predictions, using an optimization technique called *gradient descent*. Let's begin by importing Numpy and PyTorch.

In [1]:
import numpy as np
import torch

# Training Data
Here we create two arrays: one for the independent variables (the inputs) and one for the target variables.

In [6]:
# Input (temp, rainfall, humidity) from above table.
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [7]:
# Targets (apples, oranges) from above table
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [8]:
# Convert numpy arrays to pytorch tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [9]:
inputs,targets

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.],
         [102.,  43.,  37.],
         [ 69.,  96.,  70.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.],
         [ 22.,  37.],
         [103., 119.]]))

The weights form a 2x3 matrix (one row per target, one column per input feature), and the biases `b1` and `b2` form a vector of length 2. <br>
We have no good starting point yet, so let's initialize them with random values.

In [10]:
w = torch.rand(2,3,requires_grad=True)
b = torch.rand(2,requires_grad=True)
w,b

(tensor([[0.8489, 0.6545, 0.4170],
         [0.0779, 0.9282, 0.1102]], requires_grad=True),
 tensor([0.4345, 0.4415], requires_grad=True))

`torch.rand` creates a tensor with the given shape, with elements picked randomly from a uniform distribution over the interval [0, 1). Its sibling `torch.randn` instead samples from a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean 0 and standard deviation 1.
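
Since `torch.rand` and `torch.randn` are easy to mix up, here is a small sketch contrasting the two. The seed and the sample size of 1000 are arbitrary choices for this illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# torch.rand: uniform samples from the half-open interval [0, 1)
u = torch.rand(2, 3)

# torch.randn: samples from a standard normal distribution (may be negative)
n = torch.randn(1000)

print(u)
print(bool((u >= 0).all()), bool((u < 1).all()))
print(n.mean().item(), n.std().item())  # roughly 0 and 1
```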

Our *model* is simply a function that performs a matrix multiplication of the `inputs` and the weights `w` (transposed) and adds the bias `b` (replicated for each observation).

![matrix-mult](https://i.imgur.com/WGXLFvA.png)

We can define the model as follows:

In [12]:
def model(x):
    return x @ w.t() + b

# The function returns the predicted values of the targets.
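
To make the shapes concrete, here is a small sketch of the same expression with hand-picked weights. The values of `w` and `b` are illustrative assumptions, chosen so the result can be checked by hand.

```python
import torch

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])   # 2 observations, 3 features
w = torch.tensor([[1., 0., 0.],
                  [0., 1., 1.]])      # 2 targets, 3 features (illustrative values)
b = torch.tensor([10., 20.])          # one bias per target

# (2, 3) @ (3, 2) -> (2, 2); b is broadcast across the rows
out = x @ w.t() + b

print(out.shape)
print(out)  # first row: [73 + 10, 67 + 43 + 20] = [83, 130]
```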

In [15]:
preds = model(inputs)
# Compare preds with the actual targets
preds, targets
# They won't match yet, because preds were computed with random weights and biases

(tensor([[124.1837,  73.0576],
         [161.9645,  96.2668],
         [186.1733, 137.9933],
         [130.5915,  52.3771],
         [151.0271, 102.6406]], grad_fn=<AddBackward0>),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.],
         [ 22.,  37.],
         [103., 119.]]))

### Calculating Loss
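The loss measures how well the model is doing by comparing its predictions with the actual targets. A common choice for regression is the mean squared error (MSE): the average of the squared differences between predictions and targets.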

In [18]:
def mse(t1,t2):
    diff = t1-t2
    return torch.sum(diff*diff)/diff.numel()


In [19]:
loss = mse(preds, targets)
loss

tensor(3037.6011, grad_fn=<DivBackward0>)
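
As a quick sanity check, the sketch below runs the same `mse` function on a tiny example that can be verified by hand, and compares it with PyTorch's built-in `F.mse_loss`. The input values are made up for illustration.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

preds = torch.tensor([[1., 2.], [3., 4.]])
targets = torch.tensor([[1., 0.], [0., 4.]])

# Differences are 0, 2, 3, 0 -> squares 0, 4, 9, 0 -> mean 13 / 4 = 3.25
manual = mse(preds, targets)
builtin = F.mse_loss(preds, targets)  # default reduction is 'mean'

print(manual.item(), builtin.item())
```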

# Compute Gradients

In [21]:
loss.backward()

In [23]:
w, w.grad

(tensor([[0.8489, 0.6545, 0.4170],
         [0.0779, 0.9282, 0.1102]], requires_grad=True),
 tensor([[6515.8926, 5994.8887, 3877.8923],
         [ 133.3140,  -90.3685,  -91.6077]]))

In [24]:
b, b.grad

(tensor([0.4345, 0.4415], requires_grad=True), tensor([74.5880,  0.4671]))
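
For a linear model with MSE loss, the gradients can also be written in closed form, which gives a way to check what autograd computes. The sketch below assumes the same model and loss as above, but uses only the first three rows of the data and a fixed seed to stay short.

```python
import torch

torch.manual_seed(0)
inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.]])
w = torch.rand(2, 3, requires_grad=True)
b = torch.rand(2, requires_grad=True)

preds = inputs @ w.t() + b
diff = preds - targets
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()

# Closed-form gradients of the MSE loss for a linear model
with torch.no_grad():
    w_grad_manual = 2 * diff.t() @ inputs / diff.numel()
    b_grad_manual = 2 * diff.sum(dim=0) / diff.numel()

print(torch.allclose(w.grad, w_grad_manual, rtol=1e-3, atol=1e-3))
print(torch.allclose(b.grad, b_grad_manual, rtol=1e-3, atol=1e-3))
```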

# Adjust weights and biases under gradient descent

### Train the model using gradient descent

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero
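
Step 5 matters because PyTorch adds new gradients to whatever is already stored in `.grad`. This is likely why the `w.grad` values printed in step 3 of this walkthrough are twice those printed in the earlier gradient section: `loss.backward()` had already been called once without a reset. A minimal sketch of the effect, using a toy tensor that is not part of the tutorial's data:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

loss = (w * x).sum()
loss.backward()
first = w.grad.clone()     # gradient of sum(w * x) w.r.t. w is x

# A second backward pass adds to the existing .grad instead of replacing it
loss = (w * x).sum()
loss.backward()
second = w.grad.clone()    # now 2 * x

w.grad.zero_()             # reset before the next iteration
print(first, second, w.grad)
```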

In [28]:
# 1
preds = model(inputs)
preds

tensor([[124.1837,  73.0576],
        [161.9645,  96.2668],
        [186.1733, 137.9933],
        [130.5915,  52.3771],
        [151.0271, 102.6406]], grad_fn=<AddBackward0>)

In [29]:
# 2
loss = mse(preds, targets)
loss

tensor(3037.6011, grad_fn=<DivBackward0>)

In [30]:
# 3
loss.backward()
w.grad, b.grad

(tensor([[13031.7852, 11989.7773,  7755.7847],
         [  266.6281,  -180.7370,  -183.2153]]),
 tensor([149.1760,   0.9342]))

In [31]:
# 4
with torch.no_grad():
    w -= w.grad*1e-5
    b -= b.grad*1e-5
    w.grad.zero_()
    b.grad.zero_()

In [32]:
w, b

(tensor([[0.7186, 0.5346, 0.3394],
         [0.0752, 0.9301, 0.1120]], requires_grad=True),
 tensor([0.4330, 0.4415], requires_grad=True))

In [33]:
# checking loss again
preds = model(inputs)
loss = mse(preds, targets)
loss

tensor(1501.1707, grad_fn=<DivBackward0>)

The loss has dropped significantly after just one small adjustment to the weights and biases.

# Train for multiple Epochs

An epoch is one full pass over the training data, after which the weights have been adjusted slightly. We'll train the model for 100 epochs.

In [35]:
# training for 100 epochs 
for i in range(100):
    preds = model(inputs)
    loss = mse(preds,targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad*1e-5
        b -= b.grad*1e-5
        w.grad.zero_()
        b.grad.zero_()


In [36]:
# Let's check the loss again
preds = model(inputs)
loss = mse(preds, targets)
loss

tensor(145.4011, grad_fn=<DivBackward0>)

The loss is now much lower, so the predictions are getting close to the actual targets. We can improve the model further by training it for more epochs.
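
One way to watch this happen is to record the loss after every epoch. The sketch below wraps the same loop in a helper; the name `train`, the seed, and the choice of 500 epochs are assumptions made for this illustration.

```python
import torch

torch.manual_seed(0)
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])
w = torch.rand(2, 3, requires_grad=True)
b = torch.rand(2, requires_grad=True)

def train(w, b, num_epochs, lr=1e-5):
    """Hypothetical helper: full-batch gradient descent, returns loss per epoch."""
    history = []
    for _ in range(num_epochs):
        preds = inputs @ w.t() + b
        loss = torch.mean((preds - targets) ** 2)
        loss.backward()
        with torch.no_grad():
            w -= w.grad * lr   # in-place update of the caller's tensor
            b -= b.grad * lr
            w.grad.zero_()
            b.grad.zero_()
        history.append(loss.item())
    return history

history = train(w, b, 500)
print(history[0], history[99], history[-1])
```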

In [73]:
# after more epochs we have
preds = model(inputs)
loss = mse(preds, targets)
loss

tensor(0.6595, grad_fn=<DivBackward0>)

In [71]:
preds, targets

(tensor([[ 57.2562,  70.4212],
         [ 81.9959, 100.3047],
         [119.0279, 133.6108],
         [ 21.1696,  37.1921],
         [101.5859, 118.4862]], grad_fn=<AddBackward0>),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.],
         [ 22.,  37.],
         [103., 119.]]))

# Linear Regression using PyTorch built in functions

What we did above was a series of basic matrix operations. This pattern is so common in deep learning that PyTorch provides built-in functions and classes to create and train models for us.

In [76]:
import torch.nn as nn
# torch.nn contains PyTorch's building blocks for neural networks

In [77]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70],
                   [74, 66, 43],
                   [91, 87, 65],
                   [88, 134, 59],
                   [101, 44, 37],
                   [68, 96, 71],
                   [73, 66, 44],
                   [92, 87, 64],
                   [87, 135, 57],
                   [103, 43, 36],
                   [68, 97, 70]], dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119],
                    [57, 69],
                    [80, 102],
                    [118, 132],
                    [21, 38],
                    [104, 118],
                    [57, 69],
                    [82, 100],
                    [118, 134],
                    [20, 38],
                    [102, 120]], dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)


### DataSet and DataLoader

`TensorDataset` allows access to rows from `inputs` and `targets` as tuples, and provides standard APIs for working with many different types of datasets in PyTorch. <br>
It lets us access a small section of the training data using array indexing notation (`[0:3]` in the code below). The result is a tuple with two elements: the first contains the input variables for the selected rows, and the second contains the targets.

In [78]:
from torch.utils.data import TensorDataset

In [79]:
train_ds = TensorDataset(inputs,targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))
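
A short sketch of the other common ways to index a `TensorDataset`, using only the first four rows of the data to keep it small:

```python
import torch
from torch.utils.data import TensorDataset

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.]])

ds = TensorDataset(inputs, targets)

print(len(ds))       # number of rows
x0, y0 = ds[0]       # a single (input, target) pair
xs, ys = ds[1:3]     # a slice returns a tuple of two tensors
print(x0.shape, y0.shape)
print(xs.shape, ys.shape)
```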

`DataLoader` splits the data into batches of a predefined size while training. It also provides other utilities such as shuffling and random sampling of the data.

In [80]:
from torch.utils.data import DataLoader

In [81]:
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True) 
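
To see what a data loader yields, the sketch below iterates over one with the same batch size. The tensors hold placeholder values generated with `torch.arange`, not the crop data.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 15 dummy rows with 3 features and 2 targets each (placeholder values)
inputs = torch.arange(45, dtype=torch.float32).reshape(15, 3)
targets = torch.arange(30, dtype=torch.float32).reshape(15, 2)

dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)

# Each iteration yields one (inputs, targets) batch
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in dl]

print(len(dl))        # batches per epoch: 15 rows / 5 per batch = 3
print(batch_shapes)
```

With `shuffle=True` the rows are drawn in a different order on every pass, so the contents of each batch change from epoch to epoch while the shapes stay the same.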

### nn.Linear

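Instead of initializing the weights and biases manually, we can define the model using `nn.Linear`, which creates and initializes them automatically.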

In [91]:
model = nn.Linear(3,2) # takes two arguments: the number of input features and the number of outputs
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.4944,  0.0873, -0.2832],
        [ 0.5323, -0.5287, -0.2924]], requires_grad=True)
Parameter containing:
tensor([0.4476, 0.5324], requires_grad=True)
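
The layer stores its weights with shape (outputs, inputs), just like the hand-written `w`, and computes the same expression as the earlier model. A small sketch to confirm this, with a variable name `layer` chosen only for this example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)   # 3 input features, 2 outputs

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])

# The layer computes x @ weight.T + bias, like the hand-written model
manual = x @ layer.weight.t() + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual, atol=1e-4))
```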


### Generating Predictions

In [92]:
preds = model(inputs)
preds

tensor([[ 30.2068,  -8.6064],
        [ 34.9909, -16.2683],
        [ 38.7263, -40.9651],
        [ 44.1495,  21.2748],
        [ 23.1129, -33.9636],
        [ 30.6139,  -7.5454],
        [ 34.6204, -16.0320],
        [ 38.9375, -40.7252],
        [ 43.7424,  20.2138],
        [ 22.3353, -34.7883],
        [ 29.8363,  -8.3701],
        [ 35.3981, -15.2073],
        [ 39.0968, -41.2015],
        [ 44.9271,  22.0995],
        [ 22.7058, -35.0246]], grad_fn=<AddmmBackward0>)

### Loss function

In [93]:
import torch.nn.functional as F

In [94]:
loss_fn = F.mse_loss

In [95]:
loss = loss_fn(preds, targets)
loss

tensor(8990.0195, grad_fn=<MseLossBackward0>)

### Optimizer

In [99]:
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
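
With default settings (no momentum, no weight decay), `opt.step()` should perform the same update we wrote by hand earlier: each parameter minus the learning rate times its gradient. A minimal sketch checking this on a single data row:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

x = torch.tensor([[73., 67., 43.]])
y = torch.tensor([[56., 70.]])

loss = F.mse_loss(model(x), y)
loss.backward()

# What the manual update rule would produce
expected_w = model.weight.detach() - 1e-5 * model.weight.grad
expected_b = model.bias.detach() - 1e-5 * model.bias.grad

opt.step()        # updates every parameter in place
opt.zero_grad()   # clears gradients for the next iteration

print(torch.allclose(model.weight, expected_w))
print(torch.allclose(model.bias, expected_b))
```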

### Train the model

We are now ready to train the model. We'll follow the same process to implement gradient descent:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

The only change is that we'll work with batches of data instead of processing the entire training data in every iteration. Let's define a utility function `fit` that trains the model for a given number of epochs.

In [100]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in train_dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

In [101]:
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], Loss: 1687.1840
Epoch [20/100], Loss: 650.8481
Epoch [30/100], Loss: 723.2008
Epoch [40/100], Loss: 137.0951
Epoch [50/100], Loss: 233.8211
Epoch [60/100], Loss: 235.2502
Epoch [70/100], Loss: 235.9969
Epoch [80/100], Loss: 152.6842
Epoch [90/100], Loss: 107.7626
Epoch [100/100], Loss: 89.6423
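
The printed loss does not fall steadily because it belongs to the last batch of each epoch, and the batches are reshuffled every time. One possible alternative is to report the average loss over all batches in an epoch. The sketch below is a hypothetical variant of `fit` (the name `fit_avg`, the batch size of 2, and the seed are assumptions), run on the first five rows of the data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def fit_avg(num_epochs, model, loss_fn, opt, train_dl):
    """Hypothetical variant of fit that returns the mean loss per epoch."""
    epoch_losses = []
    for epoch in range(num_epochs):
        total, count = 0.0, 0
        for xb, yb in train_dl:
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()
            # Weight each batch loss by its size so uneven batches are handled
            total += loss.item() * len(xb)
            count += len(xb)
        epoch_losses.append(total / count)
    return epoch_losses

torch.manual_seed(0)
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])
train_dl = DataLoader(TensorDataset(inputs, targets), batch_size=2, shuffle=True)

model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
losses = fit_avg(100, model, F.mse_loss, opt, train_dl)
print(losses[0], losses[-1])
```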


In [102]:
preds = model(inputs)
preds

tensor([[ 58.9351,  72.6955],
        [ 79.0166,  97.9575],
        [122.0980, 134.9642],
        [ 31.2932,  50.1938],
        [ 90.5719, 106.8570],
        [ 57.8481,  71.8458],
        [ 78.2131,  97.4590],
        [122.0916, 135.3167],
        [ 32.3801,  51.0435],
        [ 90.8552, 107.2082],
        [ 58.1315,  72.1970],
        [ 77.9297,  97.1078],
        [122.9016, 135.4627],
        [ 31.0098,  49.8426],
        [ 91.6588, 107.7067]], grad_fn=<AddmmBackward0>)

In [103]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])
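
Once trained, the model can be used to predict yields for a region it has not seen. The sketch below is self-contained, so it retrains a small model on the first five rows with full-batch updates; the new region's values (75, 63, 44) and the 200 training steps are made-up assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])

model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

# Short full-batch training run
for _ in range(200):
    loss = F.mse_loss(model(inputs), targets)
    loss.backward()
    opt.step()
    opt.zero_grad()

# Hypothetical new region: temp, rainfall, humidity
new_region = torch.tensor([[75., 63., 44.]])
with torch.no_grad():              # no gradients needed for inference
    prediction = model(new_region)

print(prediction)  # one row: predicted apple and orange yields
```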