## Hyperparameter Tuning

In this notebook we will experiment with arguably the most important hyperparameter in deep learning, the *learning rate*. The notebook follows the code in the article <a href="https://www.geeksforgeeks.org/adjusting-learning-rate-of-a-neural-network-in-pytorch/">Adjusting Learning Rate of a Neural Network in PyTorch</a> by Herumb Shandilya.

In [1]:
# First, importing modules
import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau
import numpy as np
from tqdm.notebook import trange

We will train a simple network to recognize handwritten digits using the MNIST dataset. First, we need to load the data, which we have to pre-download and copy to the HPC. An alternative is to download it from a login node, as follows.

* Open a terminal and SSH into `login1` or `login2`.
* Load an appropriate module to use torch, e.g., `hubpy3.7-tf2.3`.

```module load hubpy3.7-tf2.3```

* Change directory to your data directory.
* Start `ipython` and load the dataset using the `download=True` flag.
* In the Jupyter notebook running on the HPC node, change the data load command appropriately to point to the location of the dataset.


In [2]:
# LOADING DATA
train = datasets.MNIST(root='/WAVE/projects/COEN-342-Sp21/data/', train = True, download = False, transform=ToTensor())
valid = datasets.MNIST(root='/WAVE/projects/COEN-342-Sp21/data/', train = False, download = False, transform=ToTensor())

trainloader = DataLoader(train, batch_size= 32, shuffle=True)
validloader = DataLoader(valid, batch_size= 32, shuffle=True)

**Creating the model**

We are defining a neural network with the following architecture:

- Input Layer: 784 nodes. MNIST images are 28×28, i.e. 784 pixels, so each flattened image becomes a 784-element input to the network.
- Hidden Layer 1: 64 nodes
- Hidden Layer 2: 32 nodes
- Output Layer: 10 nodes, for 10 classes i.e. numbers 0-9

`nn.Linear()`, the linear layer, applies a linear transformation to the incoming data. For those familiar with TensorFlow, it is much like the Dense layer.
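
To make the linear transformation concrete, here is a minimal sketch that compares the output of a small `nn.Linear` layer with the same computation written out by hand. The layer sizes (4 inputs, 3 outputs) are arbitrary and chosen only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A linear layer mapping 4 input features to 3 output features.
layer = nn.Linear(4, 3)
x = torch.randn(2, 4)  # a batch of 2 samples

y = layer(x)
# nn.Linear computes x @ W^T + b, with W of shape (out_features, in_features).
y_manual = x @ layer.weight.T + layer.bias

print(y.shape)                                  # torch.Size([2, 3])
print(torch.allclose(y, y_manual, atol=1e-6))   # True
```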

In the `forward()` method, we start by flattening the image and then pass it through each layer, applying the `ReLU` activation after each hidden layer. The output layer returns raw class scores (logits) with no activation. After that, we create our neural network instance, and we're ready to go.

In [3]:
# CREATING OUR MODEL
class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        self.fc1 = nn.Linear(28*28,64)
        self.fc2 = nn.Linear(64,32)
        self.out = nn.Linear(32,10)
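        # Note: these two attributes are not used by the training loop, which
        # builds its own optimizer (lr = 0.1) and loss in a later cell.
        self.lr = 0.01
        self.loss = nn.CrossEntropyLoss()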
    
    def forward(self,x):
        batch_size, _, _, _ = x.size()
        x = x.view(batch_size,-1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

model = Net()
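
As a quick sanity check, the sketch below builds a stand-in network with the same layer sizes as `Net` and pushes a dummy batch through it. Using `nn.Sequential` here is an assumption made only to keep the example self-contained; it is not how the notebook defines the model.

```python
import torch
from torch import nn

# Stand-in with the same layer sizes as Net above.
net = nn.Sequential(
    nn.Flatten(),            # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 10),       # raw scores (logits), one per digit class
)

dummy = torch.zeros(32, 1, 28, 28)   # one batch of blank "images"
logits = net(dummy)
n_params = sum(p.numel() for p in net.parameters())

print(logits.shape)  # torch.Size([32, 10])
print(n_params)      # 52650 = (784*64 + 64) + (64*32 + 32) + (32*10 + 10)
```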

Finally, we check to see if a GPU is available and, if it is, send the model to the GPU memory.

In [4]:
# Send the model to GPU if available
if torch.cuda.is_available():
    print("Using GPU for training.")
    model = model.cuda()

Using GPU for training.
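
A common alternative to calling `.cuda()` behind an `if` is to pick a `torch.device` once and use `.to(device)` everywhere, which also works on machines without a GPU. The sketch below uses a small stand-in model (an assumption, not the notebook's `Net`).

```python
import torch
from torch import nn

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)      # hypothetical stand-in model
batch = torch.zeros(32, 784).to(device)    # data must live on the same device

out = model(batch)
print(out.device.type == device.type)  # True
```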


**Defining Criterion and Optimizer**

Optimizers define how the weights of the neural network are updated. In this tutorial we'll use the Stochastic Gradient Descent (SGD) optimizer. Optimizers take the model parameters and the learning rate as input arguments. There are various other optimizers you can try, such as Adam and Adagrad.
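
To see what a single SGD update does, consider the following minimal sketch with one scalar weight and a toy loss. Both are made up for illustration and are not part of the MNIST model.

```python
import torch
from torch.optim import SGD

# A single scalar "weight" and the toy loss L(w) = w^2, so dL/dw = 2w.
w = torch.tensor([1.0], requires_grad=True)
optimizer = SGD([w], lr=0.1)

toy_loss = (w ** 2).sum()
toy_loss.backward()    # w.grad is now 2 * 1.0 = 2.0
optimizer.step()       # w <- w - lr * grad = 1.0 - 0.1 * 2.0

print(w.item())  # approximately 0.8
```

A larger learning rate would move `w` further in one step, which is the effect this notebook explores.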

The criterion is the loss that you want to minimize, which in this case is `CrossEntropyLoss()`, a combination of `log_softmax()` and `NLLLoss()`. Different losses are applicable to different types of problems, so be sure to understand the assumptions of the loss and whether they apply to the problem you are trying to solve.
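
The relationship between `CrossEntropyLoss()`, `log_softmax()` and `NLLLoss()` can be checked numerically. The logits and labels in this sketch are random placeholders.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])    # arbitrary class indices

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, nll, atol=1e-6))  # True
```

This is why the model's `forward()` returns raw scores: the softmax is applied inside the loss.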

In [5]:
# SETTING OPTIMIZER, LOSS AND SCHEDULER
optimizer = SGD(model.parameters(), lr = 0.1)
loss = nn.CrossEntropyLoss()
scheduler = ReduceLROnPlateau(optimizer, 'min', patience = 5)
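
`ReduceLROnPlateau` lowers the learning rate when the monitored value stops improving. The sketch below feeds it a made-up validation loss that never improves, using a dummy parameter in place of the model. With `patience = 5` and the default `factor` of 0.1, the learning rate should drop from 0.1 to about 0.01 once more than five consecutive epochs show no improvement.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

param = torch.zeros(1, requires_grad=True)   # dummy parameter, not a real model
optimizer = SGD([param], lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience=5)

lrs = []
for epoch in range(8):
    fake_valid_loss = 1.0                    # a loss that never improves
    scheduler.step(fake_valid_loss)
    lrs.append(optimizer.param_groups[0]['lr'])

# The first call sets the baseline, the next five are tolerated,
# and the seventh call triggers the reduction.
print(lrs)
```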

**Training Neural Network with Validation**

The training step in PyTorch looks almost identical every time you train a model. But before implementing it, let's learn about the two modes of the model object:

- Training Mode: Set by `model.train()`, it tells the model that it is being trained, so layers such as dropout, which behave differently during training and testing, can behave accordingly.
- Evaluation Mode: Set by `model.eval()`, it tells the model that it is being tested.
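
The difference between the two modes is easiest to see on a dropout layer. Dropout is used here purely as an illustration; it is not part of the `Net` defined in this notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                 # training mode: entries are zeroed at random
y_train = drop(x)            # survivors are scaled by 1 / (1 - p) = 2

drop.eval()                  # evaluation mode: dropout is the identity
y_eval = drop(x)

print(drop.training)             # False
print(torch.equal(y_eval, x))    # True
print(y_train.unique())          # only the values 0 and 2 can appear
```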

For each training step, the following operations must occur, in order:

- Move data to GPU (Optional)
- Clear the gradients using `optimizer.zero_grad()`
- Make a forward pass
- Calculate the loss
- Perform a backward pass using `.backward()` to calculate the gradients
- Take optimizer step using `optimizer.step()` to update the weights
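
The steps above can be sketched as a single function. The stand-in model, the name `train_step`, and the synthetic batch are assumptions made to keep the example self-contained; the notebook's own loop uses `Net` and the MNIST loaders.

```python
import torch
from torch import nn
from torch.optim import SGD

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)        # hypothetical stand-in model
criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.1)


def train_step(data, label):
    """Run one optimization step and return the batch loss as a float."""
    data, label = data.to(device), label.to(device)   # 1. move data
    optimizer.zero_grad()                             # 2. clear gradients
    output = model(data)                              # 3. forward pass
    step_loss = criterion(output, label)              # 4. compute loss
    step_loss.backward()                              # 5. backward pass
    optimizer.step()                                  # 6. update weights
    return step_loss.item()


# Synthetic batch: 32 random "images" with random labels.
data = torch.randn(32, 784)
label = torch.randint(0, 10, (32,))

before = model.weight.detach().clone()
batch_loss = train_step(data, label)
print(batch_loss)
print(torch.equal(before, model.weight.detach()))  # False: the weights moved
```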

Finally, we keep track of the lowest validation loss seen so far and save the model whenever it improves on the best saved model.
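
The save-on-improvement logic, and how the saved weights could be restored later, can be sketched as follows. The validation losses, the stand-in model, and the file name `best.pth` are made up for illustration.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)                               # hypothetical stand-in model
path = os.path.join(tempfile.mkdtemp(), "best.pth")   # hypothetical file name

fake_valid_losses = [3.0, 2.0, 2.5, 1.5]   # made-up values for illustration
min_valid_loss = float("inf")
n_saves = 0

for valid_loss in fake_valid_losses:
    if valid_loss < min_valid_loss:
        min_valid_loss = valid_loss
        torch.save(model.state_dict(), path)   # overwrite the previous best
        n_saves += 1

print(min_valid_loss, n_saves)  # 1.5 3

# Later, restore the best weights into a freshly built model.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))
print(torch.equal(restored.weight, model.weight))  # True
```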

In [6]:
# TRAINING THE NEURAL NETWORK
nepochs = 25
min_valid_loss = np.inf
for e in trange(nepochs):
    train_loss, valid_loss = 0.0, 0.0
    
    # Set model to training mode
    model.train()
    for data, label in trainloader:
        # Transfer Data to GPU if available
        if torch.cuda.is_available():
            data, label = data.cuda(), label.cuda()
        # Clear the gradients
        optimizer.zero_grad()
        # Forward Pass (the model returns class scores, not targets)
        output = model(data)
        # Find the Loss
        train_step_loss = loss(output, label)
        # Calculate gradients
        train_step_loss.backward()
        # Update Weights
        optimizer.step()
        # Accumulate the sum of per-sample losses for this batch
        train_loss += train_step_loss.item() * data.size(0)

    # Set model to inference/evaluation mode
    model.eval()
    # Gradients are not needed for validation
    with torch.no_grad():
        for data, label in validloader:
            if torch.cuda.is_available():
                data, label = data.cuda(), label.cuda()

            output = model(data)
            valid_step_loss = loss(output, label)

            # Accumulate the sum of per-sample losses for this batch
            valid_loss += valid_step_loss.item() * data.size(0)
    
    curr_lr = optimizer.param_groups[0]['lr']

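    # Note: the accumulated sums are divided by the number of batches, not the
    # number of samples, so the printed losses are about batch_size times the
    # mean per-sample loss.
    print(f'Epoch {e}\t \
            Training Loss: {train_loss/len(trainloader)}\t \
            Validation Loss:{valid_loss/len(validloader)}\t \
            LR:{curr_lr}')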
    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving the model.')
        min_valid_loss = valid_loss
        # Saving State Dict
        torch.save(model.state_dict(), 'mnist.pth')
    scheduler.step(valid_loss/len(validloader))
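
A note on the numbers this loop prints: each batch loss is multiplied by the batch size and summed, but the sum is then divided by the number of batches, `len(validloader)`, rather than by the number of samples. The sketch below works through the arithmetic for the epoch-0 validation values reported in this cell's output, assuming the standard 10,000-image MNIST test split and the batch size of 32 used above.

```python
import math

n_samples = 10_000            # size of the MNIST test split (assumed standard)
batch_size = 32
n_batches = math.ceil(n_samples / batch_size)   # what len(validloader) returns

summed_loss = 1590.404516     # epoch-0 value from the "Validation Loss Decreased" line
printed_loss = summed_loss / n_batches          # what the epoch line reports
mean_loss = summed_loss / n_samples             # mean per-sample loss

print(n_batches)                 # 313
print(round(printed_loss, 4))    # 5.0812
print(round(mean_loss, 4))       # 0.159
```

Dividing by `len(validloader.dataset)` instead would report the mean per-sample loss directly.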

Epoch 0	             Training Loss: 12.023097681681316	             Validation Loss:5.081164587229585	             LR:0.1
Validation Loss Decreased(inf--->1590.404516) 	 Saving the model.
Epoch 1	             Training Loss: 4.5322415368715925	             Validation Loss:3.7228228597642894	             LR:0.1
Validation Loss Decreased(1590.404516--->1165.243555) 	 Saving the model.
Epoch 2	             Training Loss: 3.31743111812671	             Validation Loss:4.258071411769992	             LR:0.1
Epoch 3	             Training Loss: 2.6708932235181333	             Validation Loss:2.7514589838088512	             LR:0.1
Validation Loss Decreased(1165.243555--->861.206662) 	 Saving the model.
Epoch 4	             Training Loss: 2.2209462110926705	             Validation Loss:2.9226300595476986	             LR:0.1
Epoch 5	             Training Loss: 1.9061553725590308	             Validation Loss:2.691448827074787	             LR:0.1
Validation Loss Decreased(861.206662--->842.423483) 	 