Import the required libraries.

In [1]:
import numpy as np
import torch

## Introduction to Linear Regression

In this tutorial, we'll discuss one of the foundational algorithms in machine learning: *Linear regression*. We'll create a model that predicts crop yields for apples and oranges (*target variables*) by looking at the average temperature, rainfall, and humidity (*input variables or features*) in a region. Here's the training data:

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)

In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as the *bias*:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall and humidity:

![linear-regression-graph](https://i.imgur.com/4DJ9f8X.png)
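
To make the weighted sum concrete, here is a minimal sketch that evaluates the apple equation for the first row of the training data. The weights and bias (`w_apple`, `b_apple`) are made-up illustrative values, not learned parameters.

```python
import numpy as np

# Hypothetical weights and bias for the apple equation
# (illustrative values only, not learned from the data)
w_apple = np.array([0.3, 0.5, 0.2])   # stands in for w11, w12, w13
b_apple = 1.0                          # stands in for b1

# First row of the training data: temp=73, rainfall=67, humidity=43
x = np.array([73.0, 67.0, 43.0])

# Weighted sum of the inputs, offset by the bias
yield_apple = w_apple @ x + b_apple
print(yield_apple)
```

Training, covered later in this tutorial, is the process of replacing such guessed weights with values that make the predictions match the targets.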

In [2]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [4]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
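
One detail worth knowing about `torch.from_numpy`: the resulting tensor shares memory with the NumPy array, so no copy is made. The sketch below uses small throwaway arrays (`arr`, `t`, `t_copy` are names chosen for illustration) to show the difference from `torch.tensor`, which copies the data.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0], dtype='float32')
t = torch.from_numpy(arr)    # shares memory with arr, no copy is made
t_copy = torch.tensor(arr)   # torch.tensor copies the data instead

arr[0] = 10.0                # modify the NumPy array in place

print(t)                     # reflects the change made to arr
print(t_copy)                # keeps the original values
```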


## Linear regression model from scratch

The weights and biases (`w11, w12,... w23, b1 & b2`) can also be represented as matrices, initialized as random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e., yield of apples, and similarly, the second for oranges.
`torch.randn` creates a tensor with the given shape, with elements picked randomly from a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean 0 and standard deviation 1.
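
As a quick sanity check of that description, the sketch below draws a large sample with `torch.randn` and inspects its mean and standard deviation. The seed and sample size are arbitrary choices made for this illustration.

```python
import torch

torch.manual_seed(0)            # fix the seed so the draw is repeatable
sample = torch.randn(100_000)   # many draws from the standard normal distribution

print(sample.mean())            # should be close to 0
print(sample.std())             # should be close to 1
```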

Our *model* is simply a function that performs a matrix multiplication of the `inputs` and the weights `w` (transposed) and adds the bias `b` (replicated for each observation).

![matrix-mult](https://i.imgur.com/WGXLFvA.png)


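Pay close attention to the dimensions: `inputs` has shape 5×3, `w` has shape 2×3 (so its transpose is 3×2), and `b` has 2 elements, which gives a 5×2 output with one row per observation and one column per target.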
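
The shape bookkeeping can be checked directly. This is a small sketch with random placeholder tensors; the `_demo` names are chosen here to avoid clashing with the notebook's own variables.

```python
import torch

x_demo = torch.randn(5, 3)   # 5 observations, 3 input variables
w_demo = torch.randn(2, 3)   # one row of weights per target variable
b_demo = torch.randn(2)      # one bias per target variable

# (5, 3) @ (3, 2) -> (5, 2); b_demo is broadcast across the 5 rows
out_demo = x_demo @ w_demo.t() + b_demo
print(x_demo.shape, w_demo.t().shape, out_demo.shape)
```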

In [5]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[ 1.2310,  1.2493,  0.2229],
        [ 0.1696, -0.3330,  1.4909]], requires_grad=True)
tensor([ 0.7164, -1.3752], requires_grad=True)


Define the model as a function that takes the input values and returns the predicted outputs. When we compare the predictions with the targets, they will be far off, because the weights and biases were initialized with random values.

In [6]:
def model(x):
    return x @ w.t() + b

In [7]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[183.8644,  52.8070],
        [236.9379,  80.1767],
        [288.1444,  55.2365],
        [188.2416,  56.7716],
        [221.1887,  82.7267]], grad_fn=<AddBackward0>)


In [8]:
# Compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


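Define a loss function that computes the mean squared error (MSE): the average of the squared differences between the predictions and the targets. It uses `torch.sum` to add up the squared differences and `.numel()` to get the number of elements in the tensor (see the PyTorch documentation for both).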

In [9]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

In [10]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(11936.3604, grad_fn=<DivBackward0>)
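
As a cross-check, the hand-written `mse` should agree with PyTorch's built-in `torch.nn.functional.mse_loss`, whose default reduction is the mean. The sketch below repeats the `mse` definition so it runs on its own and uses two small made-up tensors (`a_demo`, `b_demo`).

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    # Mean of the squared element-wise differences
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Made-up values for illustration
a_demo = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b_demo = torch.tensor([[1.5, 2.0], [2.0, 6.0]])

manual = mse(a_demo, b_demo)
builtin = F.mse_loss(a_demo, b_demo)   # default reduction='mean'
print(manual, builtin)
```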


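Next comes gradient descent: we want to minimize the loss with respect to the weights and biases. Calling `.backward()` on the loss computes its gradient with respect to every tensor that has `requires_grad=True` and stores the result in that tensor's `.grad` attribute.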

In [11]:
# Compute gradients
loss.backward()

In [12]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[ 1.2310,  1.2493,  0.2229],
        [ 0.1696, -0.3330,  1.4909]], requires_grad=True)
tensor([[12670.3340, 12689.8594,  7942.5425],
        [-2080.3181, -3207.3501, -1677.9714]])


In [13]:
w
w.grad

tensor([[12670.3340, 12689.8594,  7942.5425],
        [-2080.3181, -3207.3501, -1677.9714]])
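
For this particular loss the gradients also have a simple closed form: with `diff = preds - targets` and `N = diff.numel()`, the gradient with respect to `w` is `(2 / N) * diff.t() @ x`, and with respect to `b` it is `(2 / N) * diff.sum(dim=0)`. The sketch below compares that formula with what autograd computes, using random placeholder data and `_chk` names chosen for this illustration.

```python
import torch

torch.manual_seed(0)
x_chk = torch.randn(5, 3)                        # placeholder inputs
y_chk = torch.randn(5, 2)                        # placeholder targets
w_chk = torch.randn(2, 3, requires_grad=True)
b_chk = torch.randn(2, requires_grad=True)

# Same model and MSE loss as in the tutorial
diff = x_chk @ w_chk.t() + b_chk - y_chk
loss_chk = torch.sum(diff * diff) / diff.numel()
loss_chk.backward()                              # autograd gradients

# Closed-form gradients of the MSE loss
with torch.no_grad():
    grad_w = 2 * diff.t() @ x_chk / diff.numel()
    grad_b = 2 * diff.sum(dim=0) / diff.numel()

print(torch.allclose(w_chk.grad, grad_w, atol=1e-5))
print(torch.allclose(b_chk.grad, grad_b, atol=1e-5))
```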

We multiply the gradients with a very small number (`10^-5` in this case) to ensure that we don't modify the weights by a very large amount. We want to take a small step in the downhill direction of the gradient, not a giant leap. This number is called the *learning rate* of the algorithm. 

We use `torch.no_grad` to indicate to PyTorch that we shouldn't track, calculate, or modify gradients while updating the weights and biases.

In [14]:
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
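
The effect of `torch.no_grad` can be seen on a toy parameter. In this sketch (`p` is a throwaway tensor, not part of the model), operations inside the block are not recorded for backpropagation, and the in-place update is permitted even though `p` requires gradients.

```python
import torch

p = torch.ones(3, requires_grad=True)

tracked = p * 2                  # recorded in the autograd graph
print(tracked.requires_grad)     # True

with torch.no_grad():
    untracked = p * 2            # not recorded
    p -= 0.1                     # in-place update of a leaf tensor is allowed here

print(untracked.requires_grad)   # False
print(p.requires_grad)           # True: p itself still requires gradients
```

Outside the `no_grad` block, the same in-place update on `p` would raise a `RuntimeError`.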

In [15]:
# Recompute predictions with the updated weights, then verify that the loss is lower
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(8274.6318, grad_fn=<DivBackward0>)


Before we proceed, we reset the gradients to zero by invoking the `.zero_()` method. We need to do this because PyTorch accumulates gradients. Otherwise, the next time we invoke `.backward` on the loss, the new gradient values are added to the existing gradients, which may lead to unexpected results.

In [16]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
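
The accumulation behaviour is easy to observe on a toy example. In this sketch (`p` is a throwaway tensor), calling `.backward()` twice without resetting doubles the stored gradient, and `.zero_()` clears it.

```python
import torch

p = torch.tensor([1.0, 2.0], requires_grad=True)

(p * p).sum().backward()
first = p.grad.clone()       # gradient of sum(p^2) is 2 * p

(p * p).sum().backward()     # second backward pass without resetting
second = p.grad.clone()      # new gradient was added to the old one

p.grad.zero_()               # reset in place
print(first, second, p.grad)
```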


## Train the model using gradient descent

As seen above, we reduce the loss and improve our model using the gradient descent optimization algorithm. Thus, we can train the model using the following steps:

1. Generate predictions
2. Calculate the loss
3. Compute gradients with respect to the weights and biases
4. Adjust the weights and biases by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero

Let's implement the above step by step.

In [17]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[162.6960,  57.1963],
        [209.1561,  85.9664],
        [255.5086,  62.3177],
        [166.9210,  60.8938],
        [194.7026,  88.4160]], grad_fn=<AddBackward0>)


In [18]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(8274.6318, grad_fn=<DivBackward0>)


In [19]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[10487.3379, 10350.7148,  6497.7358],
        [-1625.0422, -2712.1716, -1373.8179]])
tensor([121.5969, -21.0419])


In [20]:
# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

In [21]:
print(w)
print(b)

tensor([[ 0.9994,  1.0189,  0.0785],
        [ 0.2067, -0.2738,  1.5214]], requires_grad=True)
tensor([ 0.7138, -1.3747], requires_grad=True)


In [22]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(5804.3794, grad_fn=<DivBackward0>)


In [23]:
# Train for 500 epochs
for i in range(500):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [24]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(29.3550, grad_fn=<DivBackward0>)


In [25]:
print(preds)
print(targets)

tensor([[ 58.0759,  69.9971],
        [ 78.4679, 104.5862],
        [125.7347, 124.5493],
        [ 25.2519,  36.9604],
        [ 93.1210, 125.7102]], grad_fn=<AddBackward0>)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
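
The training steps can also be bundled into a reusable function. The following is a minimal sketch, not part of the original notebook: the name `fit`, its parameters, the fixed seed, and the `history` list are all assumptions made for illustration. It uses the same data, model, loss, and learning rate as above.

```python
import torch

def fit(x, y, epochs, lr=1e-5, seed=0):
    """Hypothetical helper: train a linear model with plain gradient descent."""
    torch.manual_seed(seed)                       # repeatable initialization
    w = torch.randn(y.shape[1], x.shape[1], requires_grad=True)
    b = torch.randn(y.shape[1], requires_grad=True)
    history = []                                  # loss recorded at every epoch
    for _ in range(epochs):
        preds = x @ w.t() + b                     # 1. generate predictions
        diff = preds - y
        loss = torch.sum(diff * diff) / diff.numel()   # 2. MSE loss
        loss.backward()                           # 3. compute gradients
        with torch.no_grad():
            w -= w.grad * lr                      # 4. adjust weights and biases
            b -= b.grad * lr
            w.grad.zero_()                        # 5. reset gradients
            b.grad.zero_()
        history.append(loss.item())
    return w, b, history

# Same training data as in the tutorial
X = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.],
                  [87., 134., 58.],
                  [102., 43., 37.],
                  [69., 96., 70.]])
Y = torch.tensor([[56., 70.],
                  [81., 101.],
                  [119., 133.],
                  [22., 37.],
                  [103., 119.]])

w_fit, b_fit, history = fit(X, Y, epochs=100)
print(history[0], history[-1])   # loss at the first and last epoch
```

Recording the loss per epoch makes it easy to confirm that training is actually making progress. The exact numbers depend on the random initialization, so they will differ from the outputs shown above.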


