PyTorch basics - Linear Regression from scratch

In [1]:
'''
Author: Dhruv B Kakadiya

'''


In [2]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [16]:
# importing libraries

import numpy as np

In [17]:
import torch

We'll create a model that predicts crop yields for apples and oranges (*target variables*) by looking at the average temperature, rainfall, and humidity (*input variables or features*) in different regions.

Here's the training data:

>Temp | Rainfall | Humidity | Apples | Oranges
>--- | --- | --- | --- | ---
> 73 | 67 | 43 | 56 | 70
> 91 | 88 | 64 | 81 | 101
> 87 | 134 | 58 | 119 | 133
> 102 | 43 | 37 | 22 | 37
> 69 | 96 | 70 | 103 | 119

In a **linear regression** model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant known as a bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

In other words, the yield of apples is modeled as a linear (or planar) function of temperature, rainfall, and humidity.
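To make the weighted sum concrete, here is a tiny worked example for the first region in the table. The weights and bias below are hypothetical values picked by hand for illustration; they are not learned by this notebook.

```python
# Hypothetical weights and bias, chosen only for illustration
w11, w12, w13, b1 = 0.3, 0.2, 0.5, 0.0

# First region from the training data
temp, rainfall, humidity = 73, 67, 43

# 0.3*73 + 0.2*67 + 0.5*43 + 0 = 21.9 + 13.4 + 21.5 = 56.8
yield_apple = w11 * temp + w12 * rainfall + w13 * humidity + b1
print(round(yield_apple, 2))  # 56.8
```

These hand-picked weights land near the actual target of 56; training is the process of finding such weights automatically for every region at once.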



**Our objective**: Find a suitable set of *weights* and *biases* using the training data, to make accurate predictions.

In [18]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [19]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [20]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Linear Regression Model (from scratch)

The *weights* and *biases* can also be represented as matrices, initialized with random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e. the yield for apples, and similarly the second row and element are used for oranges.
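A minimal sketch of this row-per-target layout, assuming a fixed random seed (the notebook itself does not set one). It writes the apple prediction as the explicit weighted sum `w11 * temp + w12 * rainfall + w13 * humidity + b1` and checks that it matches a matrix-vector product using only the first row of `w` and the first element of `b`.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the sketch is reproducible

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])

# Row 0 of w and b[0] belong to apples; row 1 and b[1] belong to oranges
w = torch.randn(2, 3)
b = torch.randn(2)

# Apple yield for every region, written out feature by feature
temp, rainfall, humidity = inputs[:, 0], inputs[:, 1], inputs[:, 2]
apples_explicit = w[0, 0] * temp + w[0, 1] * rainfall + w[0, 2] * humidity + b[0]

# The same computation as a matrix-vector product with the first row of w
apples_matrix = inputs @ w[0] + b[0]

print(w.shape, b.shape)  # torch.Size([2, 3]) torch.Size([2])
print(torch.allclose(apples_explicit, apples_matrix, atol=1e-3))  # True
```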

In [21]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[ 0.7317, -1.2505,  0.3836],
        [ 0.9070,  0.0806,  0.5828]], requires_grad=True)
tensor([ 0.0287, -1.2243], requires_grad=True)


The *model* is simply a function that performs a matrix multiplication of the input `x` and the weights `w` (transposed) and adds the bias `b` (replicated for each observation).

$$
X \times W^T + b =
\left[ \begin{array}{ccc}
73 & 67 & 43 \\
91 & 88 & 64 \\
\vdots & \vdots & \vdots \\
69 & 96 & 70
\end{array} \right]
\times
\left[ \begin{array}{cc}
w_{11} & w_{21} \\
w_{12} & w_{22} \\
w_{13} & w_{23}
\end{array} \right]
+
\left[ \begin{array}{cc}
b_{1} & b_{2} \\
b_{1} & b_{2} \\
\vdots & \vdots \\
b_{1} & b_{2}
\end{array} \right]
$$
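In code, the replicated bias matrix never has to be built: PyTorch broadcasting adds the length-2 vector `b` to every row of the `(5, 2)` product. The sketch below (fixed seed assumed) builds the replicated matrix explicitly with `Tensor.repeat` and confirms both forms agree.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed for reproducibility

X = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.],
                  [87., 134., 58.],
                  [102., 43., 37.],
                  [69., 96., 70.]])
W = torch.randn(2, 3)  # one row per target
b = torch.randn(2)     # one bias per target

# Broadcasting: b of shape (2,) is added to every row of X @ W.T (shape (5, 2))
preds_broadcast = X @ W.t() + b

# Explicitly replicate b into a (5, 2) matrix, one copy per observation
B = b.repeat(X.shape[0], 1)
preds_explicit = torch.matmul(X, W.t()) + B

print(preds_broadcast.shape)                             # torch.Size([5, 2])
print(torch.allclose(preds_broadcast, preds_explicit))   # True
```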

In [22]:
# Define the model
def model(x):
    return x @ w.t() + b
  
# Generate predictions
preds = model(inputs)
print(preds)
print(targets)

tensor([[-13.8492,  95.4441],
        [-18.8846, 125.7003],
        [-81.6368, 122.2839],
        [ 35.0809, 116.3151],
        [-42.6845, 109.8883]], grad_fn=<AddBackward0>)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Loss Function

We can compare the predictions with the actual targets, using the following method: 
* Calculate the difference between the two matrices (`preds` and `targets`).
* Square all elements of the difference matrix to remove negative values.
* Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the **mean squared error** (MSE).
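The three steps can be traced on a small hypothetical example where the arithmetic is easy to follow by hand. The result is also compared with PyTorch's built-in `torch.nn.functional.mse_loss`, whose default `reduction='mean'` performs the same averaging.

```python
import torch
import torch.nn.functional as F

# Small hypothetical predictions and targets
preds = torch.tensor([[1., 2.],
                      [3., 4.]])
targets = torch.tensor([[1., 0.],
                        [5., 4.]])

diff = preds - targets   # step 1: difference -> [[0, 2], [-2, 0]]
squared = diff * diff    # step 2: square     -> [[0, 4], [4, 0]]
mse = squared.mean()     # step 3: average    -> 8 / 4 = 2.0

print(mse.item())                         # 2.0
print(F.mse_loss(preds, targets).item())  # 2.0
```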


In [23]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(8425.2305, grad_fn=<DivBackward0>)


## Compute Gradients

With PyTorch, we can automatically compute the gradient (derivative) of the `loss` w.r.t. the weights and biases, because they have `requires_grad` set to `True`.

More on autograd: https://pytorch.org/docs/stable/autograd.html#module-torch.autograd

In [24]:
# Compute gradients
loss.backward()

# Gradients for weights
print(w)
print(w.grad)
print(b)
print(b.grad)

tensor([[ 0.7317, -1.2505,  0.3836],
        [ 0.9070,  0.0806,  0.5828]], requires_grad=True)
tensor([[ -8072.3750, -10755.6621,  -6149.3979],
        [  2126.8538,    995.6473,    870.0427]])
tensor([ 0.0287, -1.2243], requires_grad=True)
tensor([-100.5948,   21.9263])
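The gradients computed by `loss.backward()` can be cross-checked by hand. For the mean squared error over all `N` entries of `preds`, the derivative with respect to each prediction is `2 * (preds - targets) / N`; multiplying by the inputs gives the weight gradient, and summing over the rows gives the bias gradient. A self-contained sketch, assuming a fixed seed:

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed for reproducibility

X = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.],
                  [87., 134., 58.],
                  [102., 43., 37.],
                  [69., 96., 70.]])
Y = torch.tensor([[56., 70.],
                  [81., 101.],
                  [119., 133.],
                  [22., 37.],
                  [103., 119.]])

w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

preds = X @ w.t() + b
loss = ((preds - Y) ** 2).mean()
loss.backward()  # autograd fills w.grad and b.grad

# Hand-derived gradients of the MSE
with torch.no_grad():
    dL_dpreds = 2 * (preds - Y) / preds.numel()  # shape (5, 2)
    grad_w = dL_dpreds.t() @ X                   # shape (2, 3)
    grad_b = dL_dpreds.sum(dim=0)                # shape (2,)

print(torch.allclose(w.grad, grad_w, rtol=1e-4, atol=1e-2))
print(torch.allclose(b.grad, grad_b, rtol=1e-4, atol=1e-2))
```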


A key insight from calculus is that the gradient indicates the rate of change of the loss, or the slope of the loss function w.r.t. the weights and biases. 

* If a gradient element is **positive**,
    * **increasing** the element's value slightly will **increase** the loss.
    * **decreasing** the element's value slightly will **decrease** the loss.

* If a gradient element is **negative**,
    * **increasing** the element's value slightly will **decrease** the loss.
    * **decreasing** the element's value slightly will **increase** the loss.

For small changes, the increase or decrease in the loss is approximately proportional to the magnitude of the gradient.
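These rules can be checked on a hypothetical one-parameter loss, `L(w) = (3w - 6)^2` at `w = 1`, where the gradient is `2 * (3w - 6) * 3 = -18`. Since the gradient is negative, nudging `w` up should lower the loss, nudging it down should raise it, and the change should be roughly `gradient * eps`.

```python
import torch

# Hypothetical one-parameter loss, used only to illustrate the sign rules
def loss_fn(w):
    return (3 * w - 6) ** 2

w = torch.tensor(1.0, requires_grad=True)
loss = loss_fn(w)  # (3 - 6)^2 = 9
loss.backward()
print(w.grad.item())  # -18.0, a negative gradient

eps = 0.01
with torch.no_grad():
    up = loss_fn(w + eps)    # increase w slightly
    down = loss_fn(w - eps)  # decrease w slightly

print(loss.item(), up.item(), down.item())
# The change is close to the first-order estimate gradient * eps = -0.18
print(up.item() - loss.item())
```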

In [25]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[-13.8492,  95.4441],
        [-18.8846, 125.7003],
        [-81.6368, 122.2839],
        [ 35.0809, 116.3151],
        [-42.6845, 109.8883]], grad_fn=<AddBackward0>)


In [26]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(8425.2305, grad_fn=<DivBackward0>)


In [27]:
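# Compute gradients (w.grad and b.grad were not reset after the previous
# backward() call, so these gradients are added to the earlier ones)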
loss.backward()

# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

print(w)

tensor([[ 0.8931, -1.0354,  0.5066],
        [ 0.8644,  0.0607,  0.5654]], requires_grad=True)
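PyTorch accumulates into `.grad` across `backward()` calls instead of overwriting it. Because `w.grad` was not reset after the `loss.backward()` in `In [24]`, the update above used twice the gradient printed earlier (for example, 0.7317 + 2 × 8072.3750 × 1e-5 ≈ 0.8931). The sketch below uses a hypothetical scalar parameter to show the accumulation and the in-place reset with `zero_()`.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# d(3w)/dw = 3, so the first backward() stores 3 in w.grad
(3 * w).backward()
g1 = w.grad.item()
print(g1)  # 3.0

# A second backward() adds to the existing gradient
(3 * w).backward()
g2 = w.grad.item()
print(g2)  # 6.0

# Reset in place, as the training step does, then backpropagate again
w.grad.zero_()
(3 * w).backward()
g3 = w.grad.item()
print(g3)  # 3.0
```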


In [28]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(4722.3564, grad_fn=<DivBackward0>)


In [29]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()
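A variant of the loop above that also records the loss at every epoch makes the steady decrease visible. This is a self-contained sketch assuming a fixed seed, so the exact numbers will differ from the notebook's outputs; the model, loss, and learning rate of 1e-5 mirror the cells above.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed; the notebook does not set one

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.],
                        [103., 119.]])

w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
lr = 1e-5  # same step size as the notebook

def model(x):
    return x @ w.t() + b

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

history = []
for epoch in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    history.append(loss.item())  # loss before this epoch's update
    loss.backward()
    with torch.no_grad():
        w -= w.grad * lr
        b -= b.grad * lr
        w.grad.zero_()  # reset so gradients do not accumulate across epochs
        b.grad.zero_()

print(f"epoch 0 loss: {history[0]:.2f}, epoch 99 loss: {history[-1]:.2f}")
```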

In [30]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(612.4523, grad_fn=<DivBackward0>)


In [31]:
# Print predictions
preds

tensor([[ 66.2592,  76.0827],
        [ 90.7835, 103.3186],
        [ 84.5672, 117.6493],
        [ 74.1543,  71.6410],
        [ 85.8753, 103.2966]], grad_fn=<AddBackward0>)
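As a sanity check on where gradient descent is heading, the exact least-squares fit can be computed in closed form. This swaps in a different technique from the notebook's gradient descent: it uses `torch.linalg.lstsq` (present in recent PyTorch releases; an assumption about your installed version) with a column of ones appended so the bias is fitted as an extra weight. Its MSE is the lowest any choice of `w` and `b` can reach on this data, so it should come out below the 612.45 reached after 100 epochs above.

```python
import torch

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]], dtype=torch.float64)
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.],
                        [103., 119.]], dtype=torch.float64)

# Append a column of ones so the bias becomes an extra weight
ones = torch.ones(inputs.shape[0], 1, dtype=torch.float64)
X = torch.cat([inputs, ones], dim=1)  # shape (5, 4)

# Best least-squares fit of X @ theta to targets
theta = torch.linalg.lstsq(X, targets).solution  # shape (4, 2)

w_opt = theta[:3].t()  # shape (2, 3), same layout as w in the notebook
b_opt = theta[3]       # shape (2,)

preds_opt = inputs @ w_opt.t() + b_opt
mse_opt = ((preds_opt - targets) ** 2).mean()
print(preds_opt)
print(mse_opt.item())
```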