### PyTorch

PyTorch is a popular open-source machine learning library based on the Torch library. It is accompanied by three domain libraries, torchvision, torchaudio, and torchtext, for computer vision, audio, and text respectively.

It provides two high-level features:

* Tensor computation (like NumPy) with strong GPU acceleration.
* Deep neural networks built on a tape-based autograd system.
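The two features above can be illustrated with a minimal sketch. The tensor values and the names `a`, `b_vec`, `dot`, `x`, and `y` are made up for illustration, and the GPU is used only if one happens to be available.

```python
import numpy as np
import torch

# NumPy-like tensor computation: element-wise arithmetic and reductions.
a = torch.tensor([1.0, 2.0, 3.0])
b_vec = torch.from_numpy(np.array([4.0, 5.0, 6.0], dtype=np.float32))
dot = (a * b_vec).sum()  # 1*4 + 2*5 + 3*6 = 32

# Move a tensor to the GPU only if one is available; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a_dev = a.to(device)

# Autograd: operations on x are recorded, then backward() computes dy/dx.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x  # dy/dx = 2x + 2, which is 8 at x = 3
y.backward()

print(dot.item(), x.grad.item())
```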

**Topics Covered**

- Building a simple model.
- Computing derivatives.
- Training and computing the loss.
- Using autograd.
- Turning a tensor into a learnable parameter.
- Training and updating parameters using autograd.

**Note**: I assume the reader has a basic understanding of simple machine learning models.

### Importing Libraries

In [None]:
import os
import numpy as np

import torch

from PIL import Image
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

### How to create a Model

* Create a linear regression model.
* Map x to y.
* Turn lists into tensors.
* Create a function for the basic line equation.
* Create a function to measure the loss using mean squared error.

We declare the vectors t_c (temperature in Celsius) and t_u (an unknown variable along which the temperature in Celsius changes).
To be precise, t_u is x, i.e. the feature, and t_c is the target/label.

In [None]:
t_c = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
t_c = torch.tensor(t_c)
t_u = torch.tensor(t_u)

In [None]:
"""Returns the line equation w * t_u + b, where w is the weight (slope) and b is the intercept (bias term)."""

def model(t_u, w, b):
    return w * t_u + b

In [None]:
"""Loss function: measures the difference between the estimated and the actual values.
This particular loss is the mean squared error (MSE)."""

def loss(t_p, t_c):
    return torch.mean((t_p-t_c)**2)
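
As a quick sanity check of the mean squared error, here is a small worked example on made-up numbers. The names `mse`, `t_p_demo`, and `t_c_demo` are hypothetical and exist only for this sketch; the result is compared against PyTorch's `torch.nn.functional.mse_loss`.

```python
import torch

def mse(t_p, t_c):
    # Same formula as the loss function above.
    return torch.mean((t_p - t_c) ** 2)

t_p_demo = torch.tensor([1.0, 2.0, 3.0])  # made-up predictions
t_c_demo = torch.tensor([1.0, 1.0, 1.0])  # made-up targets

# Differences are 0, 1, 2; squares are 0, 1, 4; their mean is 5/3.
manual = mse(t_p_demo, t_c_demo)
builtin = torch.nn.functional.mse_loss(t_p_demo, t_c_demo)

print(manual.item(), builtin.item())
```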

In [None]:
"""Initializing Weight and Bias"""

w = torch.ones(())
b = torch.zeros(())

In [None]:
t_p = model(t_u, w, b)  # argument order matches the definition model(t_u, w, b)
Loss = loss(t_p, t_c)
print(f'Loss on Untrained Model: {Loss}')

In machine learning, the objective is to build an optimized function that estimates the actual target. Here, that function depends on the weight, the bias, and the input. Thus, we optimize the weight and bias so that the function estimates the target accurately.

To optimize the parameters, we use the popular gradient descent algorithm. The idea is to compute the rate of change of the loss with respect to each parameter and to modify each parameter in the direction of decreasing loss.

The loss function measures the difference between our estimated output and the actual ground truth.
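To make the update rule concrete, here is a minimal sketch of gradient descent on a toy one-parameter function. The function `f`, its derivative `df`, the step size `lr_demo`, and the starting point are all assumptions chosen for illustration; they are not part of the temperature example.

```python
def f(w):
    # Toy loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def df(w):
    # Derivative of f with respect to w.
    return 2.0 * (w - 3.0)

w_demo = 0.0   # arbitrary starting point
lr_demo = 0.1  # learning rate (step size)

for step in range(50):
    # Move against the gradient, i.e. in the direction of decreasing loss.
    w_demo = w_demo - lr_demo * df(w_demo)

print(w_demo, f(w_demo))
```

Each step shrinks the distance to the minimum by a constant factor here, so `w_demo` ends up very close to 3.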

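**Loss Rate of Change in w and b**

The cells below estimate the rate of change of the loss with a central finite difference: in the neighborhood of the current values of w and b, we nudge one parameter up and down by a small delta and measure how the loss responds. If the rate of change with respect to w is negative, the loss falls as w grows, so we need to increase w to minimize the loss; if it is positive, we need to decrease w. The same reasoning applies to b.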

In [None]:
delta = 0.1

loss_rate_of_change_w = (loss(model(t_u, w + delta, b), t_c) - loss(model(t_u, w - delta, b), t_c)) / (2.0 * delta)

learning_rate = 1e-2
w = w - learning_rate * loss_rate_of_change_w

print(f'Updated W: {w}')

In [None]:
loss_rate_of_change_b = (loss(model(t_u, w, b + delta), t_c) -
                         loss(model(t_u, w, b - delta), t_c)) / (2.0 * delta)
b = b - learning_rate * loss_rate_of_change_b
print(f'Updated b: {b}')
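
The finite-difference estimates can be cross-checked against autograd. The sketch below redefines the data, `model`, and `loss` so that it runs on its own; the names `w0`, `b0`, `fd_w`, and `fd_b` are hypothetical. Because this loss is quadratic in w and b, the central difference should agree with the exact gradient up to floating-point error.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss(t_p, t_c):
    return torch.mean((t_p - t_c) ** 2)

# Exact gradients at w = 1, b = 0 via autograd.
w0 = torch.tensor(1.0, requires_grad=True)
b0 = torch.tensor(0.0, requires_grad=True)
loss(model(t_u, w0, b0), t_c).backward()

# Central finite differences at the same point.
delta = 0.1
with torch.no_grad():
    fd_w = (loss(model(t_u, w0 + delta, b0), t_c) -
            loss(model(t_u, w0 - delta, b0), t_c)) / (2.0 * delta)
    fd_b = (loss(model(t_u, w0, b0 + delta), t_c) -
            loss(model(t_u, w0, b0 - delta), t_c)) / (2.0 * delta)

print(f'w: finite difference {fd_w.item():.2f}, autograd {w0.grad.item():.2f}')
print(f'b: finite difference {fd_b.item():.2f}, autograd {b0.grad.item():.2f}')
```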

### Computing Derivative

To compute the derivative of the loss with respect to the parameters, we can apply the chain rule to the partial derivatives. Readers unfamiliar with it may want to review basic calculus first.

d loss / d w = (d loss / d t_p) * (d t_p / d w)
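
Written out for the mean squared error and the linear model used here, the chain rule gives the expressions below. The notation is an assumption made for this sketch: N is the number of samples, L is the loss, and the index i runs over samples.

```latex
\begin{align}
L &= \frac{1}{N}\sum_{i=1}^{N}\left(t_{p,i} - t_{c,i}\right)^2,
  \qquad t_{p,i} = w\,t_{u,i} + b \\
\frac{\partial L}{\partial w}
  &= \sum_{i=1}^{N}\frac{\partial L}{\partial t_{p,i}}\,\frac{\partial t_{p,i}}{\partial w}
   = \frac{2}{N}\sum_{i=1}^{N}\left(t_{p,i} - t_{c,i}\right)t_{u,i} \\
\frac{\partial L}{\partial b}
  &= \sum_{i=1}^{N}\frac{\partial L}{\partial t_{p,i}}\,\frac{\partial t_{p,i}}{\partial b}
   = \frac{2}{N}\sum_{i=1}^{N}\left(t_{p,i} - t_{c,i}\right)
\end{align}
```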

In [None]:
def d_loss(t_p, t_c):
    # Derivative of mean((t_p - t_c)**2) w.r.t. each prediction: 2 * (t_p - t_c) / N
    dsq_diff = 2 * (t_p - t_c) / t_p.size(0)

    return dsq_diff

In [None]:
"""Differentiating the linear equation (our model) w.r.t. the parameter w:
   d(w*t_u + b)/dw = t_u + 0 = t_u"""

# The argument order matches model(t_u, w, b) and the call in grad_fn.
def d_model_dw(t_u, w, b):
    return t_u

In [None]:
"""Differentiating the linear equation w.r.t. the parameter b: d(w*t_u + b)/db = 0 + 1.0 = 1.0"""

# The argument order matches model(t_u, w, b) and the call in grad_fn.
def d_model_db(t_u, w, b):
    return 1.0

In [None]:
"""grad_fn - Computes the gradient of the loss w.r.t. w and b using the chain rule.
The parameter update with the learning rate happens in the training loop."""

def grad_fn(t_u, t_c, t_p, w, b):
    dloss_dtp = d_loss(t_p, t_c)
    dloss_dw = dloss_dtp * d_model_dw(t_u, w, b)
    dloss_db = dloss_dtp * d_model_db(t_u, w, b)

    # Sum the per-sample contributions and stack them as [dloss/dw, dloss/db].
    return torch.stack([dloss_dw.sum(), dloss_db.sum()])
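
A hand-written gradient is easy to get subtly wrong, so it is worth comparing it against autograd once. The sketch below is self-contained; `analytic_grad` and `params_check` are hypothetical names, and the function is a compact restatement of the chain-rule steps above, not a replacement for `grad_fn`.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss(t_p, t_c):
    return torch.mean((t_p - t_c) ** 2)

def analytic_grad(t_u, t_c, w, b):
    # Chain rule: dloss/dt_p times dt_p/dw (= t_u) and dt_p/db (= 1).
    t_p = model(t_u, w, b)
    dloss_dtp = 2 * (t_p - t_c) / t_p.size(0)
    return torch.stack([(dloss_dtp * t_u).sum(), dloss_dtp.sum()])

# Reference gradient from autograd at w = 1, b = 0.
params_check = torch.tensor([1.0, 0.0], requires_grad=True)
loss(model(t_u, *params_check), t_c).backward()

manual = analytic_grad(t_u, t_c, *params_check.detach())
print(f'analytic: {manual}')
print(f'autograd: {params_check.grad}')
```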

**Creating Training Loop**

The training loop iterates over the training data and learns the parameters (w and b) by updating them based on the gradient of the loss.

In [None]:
def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):  # exactly n_epochs updates
        w, b = params
        t_p = model(t_u, w, b)  # Forward pass

        Loss = loss(t_p, t_c)
        grad = grad_fn(t_u, t_c, t_p, w, b)  # Backward pass

        params = params - learning_rate * grad
        if epoch % 10 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(Loss)))
            print(f'params: {params}')
            print(f'grad: {grad}\n')
    return params

**We run the training loop for 100 epochs. In each epoch we pass the entire dataset through the model, calculate the loss, and update the parameters.**

In [None]:
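# If the loss grows instead of shrinking, the learning rate is too large for the
# unscaled t_u; try a smaller value such as 1e-4.
params = training_loop(n_epochs = 100, learning_rate = 1e-2,
                       params = torch.tensor([1.0, 0.0]), t_u = t_u, t_c = t_c)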

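Every tenth epoch, the printout shows how w and b change in response to the loss. If the parameters stop changing noticeably after some point (in my run, around the 20th epoch), the gradient has become close to zero, so further updates at this learning rate have almost no effect. This does not guarantee the best possible fit: the result still depends on the learning rate and on the scale of the input t_u.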
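
How strongly the outcome depends on the learning rate can be checked with a small self-contained sketch. The helper `fit` is a hypothetical, compact version of the training loop with the analytic gradient written inline; the two learning rates are chosen only for comparison.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def fit(learning_rate, n_epochs=100):
    # Plain gradient descent on the unscaled input, starting from w = 1, b = 0.
    params = torch.tensor([1.0, 0.0])
    for _ in range(n_epochs):
        w, b = params
        t_p = w * t_u + b
        dloss_dtp = 2 * (t_p - t_c) / t_p.size(0)
        grad = torch.stack([(dloss_dtp * t_u).sum(), dloss_dtp.sum()])
        params = params - learning_rate * grad
    w, b = params
    return torch.mean((w * t_u + b - t_c) ** 2)

loss_large = fit(1e-2)  # steps overshoot and the loss blows up
loss_small = fit(1e-4)  # steps are small enough to stay stable

print(f'lr=1e-2: {loss_large.item()}')
print(f'lr=1e-4: {loss_small.item()}')
```

With the unscaled t_u, the gradient for w is much larger than the gradient for b, so a step size that suits b is far too big for w. Scaling the input, as done later for the autograd version, is one way around this.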

### PyTorch - Autograd Function

PyTorch gives every tensor a **.grad** attribute. If a tensor is created with **requires_grad=True**, PyTorch tracks the operations applied to it, so it can be used as a learnable parameter. Before any backward pass, **.grad** is None; after calling Loss.backward(), it holds the gradient of the loss with respect to that tensor.

Loss.backward() computes the gradient of the loss with respect to every tensor that requires gradients and adds it to that tensor's **.grad**; it does not update the parameters themselves. Because the gradients are accumulated rather than overwritten, we reset **.grad** to zero with **zero_()** before each backward pass; **otherwise the gradient values from previous epochs would pile up in .grad.**

In [None]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
print(f'Learnable Parameter: {params.requires_grad}')
print(f'Gradient before backward(): {params.grad}')

In [None]:
Loss = loss(model(t_u, *params), t_c)
Loss.backward()
print(f'Parameter\'s Gradient: {params.grad}')

In [None]:
if params.grad is not None:
    params.grad.zero_()
print(f'Turning Gradient to Zero: {params.grad}')
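
The accumulation behaviour is easy to see on a toy tensor. In this sketch, `x_acc` and the function `(x_acc ** 2).sum()` are made up for illustration; its gradient is `2 * x_acc`.

```python
import torch

x_acc = torch.tensor([1.0, 2.0], requires_grad=True)

(x_acc ** 2).sum().backward()
first = x_acc.grad.clone()        # gradient 2x = [2, 4]

(x_acc ** 2).sum().backward()     # second backward pass without zeroing
accumulated = x_acc.grad.clone()  # gradients were added: [4, 8]

x_acc.grad.zero_()                # reset before the next backward pass
(x_acc ** 2).sum().backward()
after_reset = x_acc.grad.clone()  # back to [2, 4]

print(first, accumulated, after_reset)
```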

In [None]:
def training_loop_AG(n_epochs, learning_rate, params, t_u, t_c):
    
    for epoch in range(1, n_epochs+1):
        
        if params.grad is not None:
            params.grad.zero_()
            
        t_p = model(t_u, *params)
        Loss = loss(t_p, t_c)
        Loss.backward()
        
        with torch.no_grad():
            params -= learning_rate * params.grad
            
        if epoch % 10 == 0:
            print(f'epoch {epoch}, Loss: {Loss.item():.4f}')
            
    return params

In [None]:
"""Scaling down the feature so that the gradients for w and b have comparable magnitudes."""
t_un = 0.1 * t_u

In [None]:
training_loop_AG(n_epochs = 100, learning_rate = 1e-2,
                 params = torch.tensor([1.0, 0.0], requires_grad=True), t_u = t_un, t_c = t_c)

From the above result, we can notice that at the 100th epoch the from-scratch code written earlier produced a loss of 29.66, while the autograd version gives a loss of 22.14. Note that the two runs are not directly comparable: the autograd loop was trained on the scaled feature t_un, whereas the from-scratch loop used the unscaled t_u.

Because the loss is still decreasing at every epoch, increasing the number of epochs should give better results.
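
The effect of training longer can be sketched as follows. The helper `fit_autograd` is a hypothetical, silent variant of `training_loop_AG` that returns the final parameters and the last computed loss, and the choice of 5000 epochs is an assumption made for illustration.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u  # scaled feature, as above

def fit_autograd(n_epochs, learning_rate=1e-2):
    params = torch.tensor([1.0, 0.0], requires_grad=True)
    for _ in range(n_epochs):
        if params.grad is not None:
            params.grad.zero_()  # avoid accumulating gradients across epochs
        w, b = params
        loss_value = torch.mean((w * t_un + b - t_c) ** 2)
        loss_value.backward()
        with torch.no_grad():
            params -= learning_rate * params.grad
    return params.detach(), loss_value.item()

_, loss_100 = fit_autograd(100)
params_5000, loss_5000 = fit_autograd(5000)

print(f'Loss after 100 epochs:  {loss_100:.4f}')
print(f'Loss after 5000 epochs: {loss_5000:.4f}, params: {params_5000}')
```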

### Thanks For Reading. For Feedback, reach out on Github. Please don't spam.