### PyTorch

PyTorch is a popular open-source machine learning library based on the Torch library. It is accompanied by three domain libraries: torchvision, torchaudio, and torchtext, for computer vision, audio, and text respectively.

It provides two high-level features:

* Tensor computation (like NumPy) with strong GPU acceleration.
* Deep neural networks built on a tape-based autograd system.
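
The two features above can be illustrated with a minimal sketch. The tensors and values here are made up for illustration and are not part of the notebook's dataset.

```python
import torch

# Tensor computation: element-wise arithmetic, much like NumPy arrays.
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
dot = (a * b).sum()  # 1*4 + 2*5 + 3*6 = 32

# Autograd: operations on x are recorded so the gradient can be computed later.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # y = x^2 + 2x
y.backward()         # dy/dx = 2x + 2, which is 8 at x = 3

print(dot.item(), x.grad.item())
```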

**Topics Covered**

- Using PyTorch Optimizer.
- Splitting Dataset.
- Training without no_grad.
- Training with no_grad.
- Creating Polynomial Model.
- Building Neural Network Using nn.Module.
- Building Neural Network With One Hidden Layer.
- Finding total number of parameters in the model.
- Building Sequential Model using OrderedDict and Named Layers.
- Training and Predicting on validation set.

**Note**: This notebook assumes the reader has a basic understanding of simple machine learning models. Readers are encouraged to read notebook V first for a clear understanding of terms such as model, loss function, and autograd.

### Importing Libraries

In [None]:
import os
import numpy as np
from collections import OrderedDict

import torch
from torch import optim
import torch.nn as nn

import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

### Optimizer

Optimizers are the algorithms used to update the weights. PyTorch provides a wide range of optimizers, from SGD to Adam and many more. Check the docstrings for more details. A training loop built around an optimizer follows these steps:
* Iterate through the epochs.
* Run the model on the inputs to get the predicted values.
* Pass the predicted values to the loss function to calculate the loss.
* Reset the existing gradients of the parameters to zero.
* Backpropagate the loss to compute the gradient of each parameter.
* Let the optimizer apply the update to the parameters.
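
To make the last step concrete, here is a minimal sketch of a single optimizer step on toy data (the tensors `x`, `y`, and `w` are hypothetical and not the notebook's data). It assumes plain SGD with no momentum or weight decay, in which case the update is simply `w - lr * grad`, and compares that hand-computed value with what `optimizer.step()` produces.

```python
import torch
from torch import optim

# Toy data: y = 2x, with parameters w = [slope, intercept] starting at [1, 0].
w = torch.tensor([1.0, 0.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
lr = 0.1

optimizer = optim.SGD([w], lr=lr)

loss = torch.mean((w[0] * x + w[1] - y) ** 2)  # forward pass and MSE loss
optimizer.zero_grad()                          # clear any stale gradients
loss.backward()                                # fill w.grad

# Plain SGD rule computed by hand, before the optimizer touches w.
expected = w.detach() - lr * w.grad

optimizer.step()                               # in-place update of w
matches = torch.allclose(w.detach(), expected)
print(matches)
```

Other optimizers such as Adam expose the same `zero_grad()` / `step()` interface but apply a different update rule, so the hand-computed comparison only holds for this plain SGD setting.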

In [None]:
"""
Declare vectors t_c (temperature in Celsius) and t_u (an unknown variable along which the temperature in Celsius changes).
To be precise, t_u is x, i.e. the feature, and t_c is the target/label.
"""

t_c = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
t_c = torch.tensor(t_c)
t_u = torch.tensor(t_u)

In [None]:
"""Returns the line equation, where w is the weight (slope) and b is the intercept (bias term)."""

def model(t_u, w, b):
    return w*t_u + b

In [None]:
"""Loss function: measures the difference between the estimated and actual values.
This loss function is the Mean Squared Error."""

def loss(t_p, t_c):
    return torch.mean((t_p-t_c)**2)

In [None]:
def training_loop_optim(n_epochs, optimizer, params, t_u, t_c):
    for epoch in range(1, n_epochs+1):
        
        t_p = model(t_u, *params)
        Loss = loss(t_p, t_c)
        optimizer.zero_grad()
        
        Loss.backward()
        optimizer.step()
        
        if epoch % 10 == 0:
            print(f'Epoch: {epoch}, Loss: {Loss}')
        
    return params

In [None]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)

In [None]:
"""Scaling down the feature."""
t_un = 0.1 * t_u
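
The notebook does not say why the feature is scaled. One plausible motivation (an assumption on my part) is that with the raw values the gradient with respect to `w` is far larger in magnitude, which makes a learning rate such as `1e-2` hard to use. The sketch below compares the initial gradients for raw and scaled inputs; `initial_gradient` is a hypothetical helper, not part of the notebook.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def initial_gradient(features):
    """Gradient of the MSE loss w.r.t. (w, b) at the starting point w=1, b=0."""
    params = torch.tensor([1.0, 0.0], requires_grad=True)
    w, b = params
    loss = torch.mean((w * features + b - t_c) ** 2)
    loss.backward()
    return params.grad

grad_raw = initial_gradient(t_u)           # gradient with unscaled inputs
grad_scaled = initial_gradient(0.1 * t_u)  # gradient with inputs scaled by 0.1

print(f'Raw:    {grad_raw}')
print(f'Scaled: {grad_scaled}')
```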

In [None]:
training_loop_optim(n_epochs=100, optimizer=optimizer, params=params, t_u=t_un, t_c=t_c)

### Splitting Dataset into Train and Validation

* We split the dataset by indexing: a random permutation of the indices selects which observations go to the training set and which go to the validation set.

In [None]:
n_samples = t_u.shape[0]
n_val = int(0.2 * n_samples)
shuffled_indices = torch.randperm(n_samples)
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]
print(f'Indices of Training Samples: {train_indices}')
print(f'Indices of validation Samples: {val_indices}')
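
Because `torch.randperm` draws a new permutation on every call, the split above changes from run to run. A minimal sketch of a reproducible variant, assuming a seeded `torch.Generator` is acceptable (the helper `split_indices` and the seed value are hypothetical):

```python
import torch

n_samples = 11
n_val = int(0.2 * n_samples)  # 2 validation samples

def split_indices(seed):
    """Return (train_indices, val_indices) from a seeded random permutation."""
    g = torch.Generator().manual_seed(seed)
    shuffled = torch.randperm(n_samples, generator=g)
    return shuffled[:-n_val], shuffled[-n_val:]

train_a, val_a = split_indices(0)
train_b, val_b = split_indices(0)

# The same seed gives the same split.
print(torch.equal(train_a, train_b), torch.equal(val_a, val_b))
```

One caveat with this slicing pattern: if `n_val` were 0, `shuffled[:-n_val]` would be empty rather than the full set, so very small datasets need a guard.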

* Scale down the training and validation samples by a factor of 0.1.

In [None]:
train_t_u, train_t_c = t_u[train_indices], t_c[train_indices]
val_t_u, val_t_c = t_u[val_indices], t_c[val_indices] 
train_t_un = 0.1 * train_t_u
val_t_un = 0.1 * val_t_u

In the cell below, we write the training_loop_grad function to train the model and simultaneously validate it on the validation dataset.

Note: while performing inference on the validation set, we should disable gradient calculation; this first loop does not do so. Also note the loss at the end of the 100th epoch.

[PyTorch Documentation](https://pytorch.org/docs/stable/generated/torch.no_grad.html)
>Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True
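
The effect described in the quote can be observed directly. This is a minimal sketch with made-up tensors: the same computation is run once normally and once inside `torch.no_grad()`, and only the first result carries a `grad_fn`.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

# Normal forward pass: the result is attached to the autograd graph.
tracked = (w * x).sum()

# Same computation with gradient tracking disabled.
with torch.no_grad():
    untracked = (w * x).sum()

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn is None)
```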

In [None]:
def training_loop_grad(n_epochs, optimizer, params, train_t_u, val_t_u, train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)
        train_loss = loss(train_t_p, train_t_c)
        val_t_p = model(val_t_u, *params)
        val_loss = loss(val_t_p, val_t_c)

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Training loss {train_loss.item():.4f},"
                    f" Validation loss {val_loss.item():.4f} \n")

In [None]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)

In [None]:
training_loop_grad(n_epochs = 100, optimizer = optimizer,
params = params, train_t_u = train_t_un, val_t_u = val_t_un,
train_t_c = train_t_c, val_t_c = val_t_c)

In the cell below, we write the training_loop_no_grad function to train the model and simultaneously validate it on the validation dataset. Here, torch.no_grad() disables gradient tracking for the validation pass.

See `torch.set_grad_enabled` for enabling and disabling gradients.
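
As a minimal sketch of how `torch.set_grad_enabled` might be used, the snippet below switches tracking on or off with a boolean flag, so one forward function can serve both training and validation. The function `forward` and the flag `is_train` are hypothetical names chosen for illustration.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

def forward(is_train):
    # Gradient tracking is enabled only when is_train is True.
    with torch.set_grad_enabled(is_train):
        return (w * x).sum()

train_out = forward(True)   # attached to the autograd graph
val_out = forward(False)    # detached, as under torch.no_grad()

print(train_out.requires_grad, val_out.requires_grad)
```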

In [None]:
def training_loop_no_grad(n_epochs, optimizer, params, train_t_u, val_t_u,train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)
        train_loss = loss(train_t_p, train_t_c)
        
        with torch.no_grad():
            val_t_p = model(val_t_u, *params)
            val_loss = loss(val_t_p, val_t_c)
            assert not val_loss.requires_grad

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
        
        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Training loss {train_loss.item():.4f},"
                    f" Validation loss {val_loss.item():.4f} \n")

In [None]:
training_loop_no_grad(n_epochs = 100, optimizer = optimizer,
params = params, train_t_u = train_t_un, val_t_u = val_t_un,
train_t_c = train_t_c, val_t_c = val_t_c)

### Polynomial Equation - w2 * t_u ** 2 + w1 * t_u + b

In [None]:
"""
Three parameters: w1, w2, b (stored in that order). The inputs are t_u and its square (t_u**2).

For instance, in the equation 2X^2 + 4X + 5, the coefficients (2, 4, 5) are (w2, w1, b).

We use built-in classes for the loss calculation and for updating the parameters.
"""

params = torch.randn(3, requires_grad=True)
criterion = torch.nn.MSELoss()
optimizer = optim.Adam([params], lr=1e-2)

In [None]:
def polyModel(t_u, w1, w2, b):
    return w2 * t_u ** 2 + w1 * t_u + b
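
A quick worked check of the polynomial, using the example equation 2X^2 + 4X + 5. Note that the function signature takes the parameters in the order (w1, w2, b), so the coefficient of the squared term comes second. The input values are made up for illustration.

```python
import torch

def polyModel(t_u, w1, w2, b):
    return w2 * t_u ** 2 + w1 * t_u + b

# 2x^2 + 4x + 5 means w2 = 2, w1 = 4, b = 5, stored in (w1, w2, b) order.
params = torch.tensor([4.0, 2.0, 5.0])
x = torch.tensor([0.0, 1.0, 3.0])

y = polyModel(x, *params)
print(y)  # values: 5, 11, 35
```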

In [None]:
def train(n_epochs, optimizer, params, train_t_u, val_t_u,train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        w1, w2, b = params
        train_t_p = polyModel(train_t_u, w1, w2, b)
        train_loss = criterion(train_t_p, train_t_c)
        
        with torch.no_grad():
            val_t_p = polyModel(val_t_u, *params)
            val_loss = criterion(val_t_p, val_t_c)
            assert not val_loss.requires_grad

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
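        if epoch % 10 == 0:
            # Report the validation loss as well, since it is computed above.
            print(f'Epoch: {epoch}, Training loss: {train_loss.item():.4f}, '
                  f'Validation loss: {val_loss.item():.4f}, Parameters: {params} \n')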

In [None]:
train(n_epochs = 100, optimizer = optimizer, params=params,
train_t_u = train_t_un, val_t_u = val_t_un,
train_t_c = train_t_c, val_t_c = val_t_c)

**Note:** As the complexity of the model increases (here, to a polynomial), its performance on the training data improves. For the polynomial model at epoch 10, we have a training loss of 4.48, while for the linear model at epoch 10, we have a training loss of 20.08. Because the parameters and the train/validation split are randomly generated, exact values vary between runs.

### Building Neural Nets Using nn.Module

**nn.Module** is the base class for all neural network modules. Here, neural network modules refers to layers such as linear layers, convolution layers, and LSTM layers, which are required to build the neural network architecture.

For our previous linear model, we used one feature and a target variable. Here, we'll build the same model using nn.Module.
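
Besides using ready-made layers, a model can also be written by subclassing nn.Module directly. The following is a minimal sketch of that pattern; the class name `LinearModel` is hypothetical and not used elsewhere in this notebook.

```python
import torch
import torch.nn as nn

class LinearModel(nn.Module):
    """One feature in, one prediction out: the same shape as w * t_u + b."""

    def __init__(self):
        super().__init__()
        # Layers assigned as attributes are registered as submodules,
        # so their weights appear in .parameters().
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

net = LinearModel()
out = net(torch.ones(4, 1))  # batch of 4 samples, 1 feature each
n_params = sum(p.numel() for p in net.parameters())

print(out.shape, n_params)
```

Calling `net(...)` rather than `net.forward(...)` is the usual convention, since the call also runs any registered hooks.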

In [None]:
"""Dummy linear model applied to one sample with one feature.
Note that declaring the Linear layer automatically creates the weight tensor;
the bias term is optional."""

x = torch.ones(1)
l_model = nn.Linear(1,1, bias=True)
l_model(x)
print(f'Weight of the Linear Layer: {l_model.weight.item()} \n')
print(f'Bias of the Linear Layer: {l_model.bias.item()}')

**Model which takes a batch of samples with one feature**

In [None]:
"""
Creating a batch of samples, each with one feature.
"""
x = torch.ones(10, 1)
print(f'Passing set of samples through Linear Layer: \n{l_model(x)}')

### Building Neural Network with One Hidden Layer

* PyTorch has an nn.Sequential module, similar to the Sequential model in Keras, in which we add layers in their order of operation.
* Finding the total number of parameters in the model.
* Assigning names to layers, which is helpful when you want to retrain a particular layer.
* Training the network with an optimizer and a criterion (loss) function.
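
As one illustration of why named layers help, the sketch below freezes the hidden layer by name so that only the output layer would be updated during retraining. This is an assumed use case rather than something the notebook does later; the variable `net` is a stand-in model built for this example.

```python
from collections import OrderedDict

import torch.nn as nn

net = nn.Sequential(OrderedDict([
    ('hidden_linear', nn.Linear(1, 9)),
    ('hidden_activation', nn.Tanh()),
    ('output_linear', nn.Linear(9, 1)),
]))

# Freeze every parameter whose name starts with the hidden layer's name.
for name, param in net.named_parameters():
    if name.startswith('hidden_linear'):
        param.requires_grad = False

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)
```

An optimizer would then typically be given only the trainable parameters, for example `filter(lambda p: p.requires_grad, net.parameters())`.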

In [None]:
x = t_u.unsqueeze(1)
y = t_c.unsqueeze(1)

**Sequential Model**

* First Linear Layer is the hidden layer with 10 neurons.
* Tanh is the activation function.
* Second Linear layer is the output layer with 1 neuron (Label).
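
The parameter count for this architecture can be worked out by hand: a linear layer has `in_features * out_features` weights plus `out_features` biases, and Tanh has no parameters. A small sketch checking that arithmetic against PyTorch (`linear_param_count` is a hypothetical helper):

```python
import torch.nn as nn

def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# Hidden layer: 1*10 + 10 = 20. Output layer: 10*1 + 1 = 11. Total: 31.
expected = linear_param_count(1, 10) + linear_param_count(10, 1)

net = nn.Sequential(nn.Linear(1, 10), nn.Tanh(), nn.Linear(10, 1))
actual = sum(p.numel() for p in net.parameters())

print(expected, actual)
```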

In [None]:
seq_model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Tanh(),
    nn.Linear(10, 1)
    )

In [None]:
print(f'Total Number Of Parameters: {sum(p.numel() for p in seq_model.parameters() if p.requires_grad)}')

In [None]:
for name, param in seq_model.named_parameters():
    print(f'Parameter Detail: {name, param.shape}')

**Sequential Model Using OrderedDict With Named Layers**

In [None]:
seq_model = nn.Sequential(OrderedDict([
    ('hidden_linear', nn.Linear(1, 9)),
    ('hidden_activation', nn.Tanh()),
    ('output_linear', nn.Linear(9, 1))
    ]))
print(seq_model)

In [None]:
"""Named Layers"""
for name, param in seq_model.named_parameters():
    print(name, param.shape)

In [None]:
"""Random Initialization Of Weight and Bias Term"""

print(f'Weight Initialized: {seq_model.output_linear.weight}')
print(f'Bias Initialized: {seq_model.output_linear.bias.item()}')
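
The printed values are random because nn.Linear initializes its parameters on construction. According to the PyTorch documentation for nn.Linear, both weight and bias are drawn from a uniform distribution on (-sqrt(k), sqrt(k)) with k = 1/in_features. A minimal sketch that checks this bound for a layer shaped like the output layer above (`layer` is a stand-in created for this example):

```python
import math

import torch
import torch.nn as nn

layer = nn.Linear(9, 1)

# Documented bound: sqrt(1 / in_features), i.e. 1/3 for in_features = 9.
bound = 1 / math.sqrt(layer.in_features)

# Small tolerance for float32 rounding of the bound.
weights_ok = layer.weight.abs().max().item() <= bound + 1e-6
bias_ok = layer.bias.abs().max().item() <= bound + 1e-6

print(bound, weights_ok, bias_ok)
```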

In [None]:
def training_loop(n_epochs, optimizer, model, loss_fn, t_u_train, t_u_val,
t_c_train, t_c_val):
    for epoch in range(1, n_epochs + 1):
        t_p_train = model(t_u_train)
        loss_train = loss_fn(t_p_train, t_c_train)
        with torch.no_grad():
            t_p_val = model(t_u_val)
            loss_val = loss_fn(t_p_val, t_c_val)
        optimizer.zero_grad()
        loss_train.backward()
        optimizer.step()

        if epoch == 1 or epoch % 10 == 0:
            print(f"Epoch {epoch}, Training loss {loss_train.item():.4f},"
            f" Validation loss {loss_val.item():.4f}")

**Using Trained Sequential Model to predict on the validation samples.**

In [None]:
optimizer = optim.SGD(seq_model.parameters(), lr=1e-3)

training_loop(n_epochs = 100, optimizer = optimizer,model = seq_model, loss_fn = nn.MSELoss(),
t_u_train = train_t_un.unsqueeze(1), t_u_val = val_t_un.unsqueeze(1),
t_c_train = train_t_c.unsqueeze(1), t_c_val = val_t_c.unsqueeze(1))

print('\nValidation Set\'s Predicted Output: ', seq_model(val_t_un.unsqueeze(1)),"\n")
print('Validation Set\'s Ground Truth:', val_t_c.unsqueeze(1), '\n')
print('Hidden Layer Weight Gradients:', seq_model.hidden_linear.weight.grad)

### Thanks For Reading. For Feedback, reach out on Github. Please don't spam.