In this tutorial, we will build a deep learning model that classifies handwritten digits. We will use the MNIST dataset included in the torchvision package.
 
The mandatory first step is basic data preprocessing. Using the transforms utility from the torchvision package, we will perform the two basic preprocessing operations listed below (these are explained in more detail in the context of CNNs).
 
- Transform the raw dataset into tensors.
- Normalize the dataset.

We will also import the dataset from the torchvision package.

In [1]:
import torch
from torch import nn
from torchvision.datasets import MNIST

In [2]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # MNIST images have a single channel, so one mean and one std are needed
    transforms.Normalize((0.5,), (0.5,))
    ])
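
As a rough illustration of what these two transforms do to a single pixel, the sketch below reproduces the arithmetic in plain Python. It assumes an 8-bit grayscale intensity in the range 0 to 255 and a mean and standard deviation of 0.5; `to_unit_range` and `normalize` are hypothetical helper names used only for this illustration, not torchvision functions.

```python
def to_unit_range(pixel):
    """Mimic the scaling done by ToTensor: map an 8-bit intensity to [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize: subtract the mean, then divide by the standard deviation."""
    return (value - mean) / std


# Darkest, mid-gray and brightest pixel values (assumed 8-bit inputs)
normalized = [normalize(to_unit_range(p)) for p in (0, 128, 255)]
print(normalized)
```

With these settings the darkest pixel maps to -1.0 and the brightest to 1.0, so the inputs end up roughly centered around zero.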

For simplicity, we only use the training dataset in this tutorial.

The test and validation datasets were used in the previous tutorial.

In [3]:
# choose the training dataset
train_data = MNIST(root='data', train=True,
                   download=True, transform=transform)

In [4]:
#size of train dataset
len(train_data)

60000

In [5]:
# how many samples per batch to load
batch_size = 50

In [6]:
import numpy as np
from torch.utils.data import DataLoader

# data loader preparation
train_loader = DataLoader(train_data, batch_size=batch_size)

DataLoader creates an iterable over the batches, which we will use inside the training loop.
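
To make the batching idea concrete, here is a minimal plain-Python sketch of sequential batching without shuffling. It is not how DataLoader is implemented; `iter_batches` is a hypothetical helper, and the list of integers merely stands in for the 60000 training samples.

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each holding at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Stand-in for the dataset: 60000 sample indices, grouped 50 at a time
sample_indices = list(range(60000))
batches = list(iter_batches(sample_indices, batch_size=50))

print(len(batches))     # number of batches per epoch
print(len(batches[0]))  # number of samples in the first batch
```

Because 60000 is divisible by 50, every batch is full and one epoch consists of 1200 iterations of the training loop.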

In [7]:
# Let's check the shape of the input/target data
for data, target in train_loader:
    print(data.shape)
    print(target.shape)
    break

torch.Size([50, 1, 28, 28])
torch.Size([50])


## Batch Normalization:

Batch normalization is added after each linear layer but before the non-linear activation function.

In [8]:
from torch import nn, optim
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 512)
        self.bn1 = nn.BatchNorm1d(num_features=512) # batch norm layer 1
        self.fc2 = nn.Linear(512, 256)
        self.bn2 = nn.BatchNorm1d(num_features=256) # batch norm layer 2
        self.fc3 = nn.Linear(256, 128)
        self.bn3 = nn.BatchNorm1d(num_features=128)  # batch norm layer 3                         
        self.fc4 = nn.Linear(128, 56)
        self.bn4 = nn.BatchNorm1d(num_features=56)   # batch norm layer 4 
        self.fc5 = nn.Linear(56, 10)
        
        #drop out with 0.3 probability
        self.dropout = nn.Dropout(p=0.3)
        
    def forward(self, x):
        # input tensor is flattened 
        x = x.view(x.shape[0], -1)
        
        # applied dropout layer
        x = self.dropout(F.relu(self.bn1(self.fc1(x))))
        x = self.dropout(F.relu(self.bn2(self.fc2(x))))
        x = self.dropout(F.relu(self.bn3(self.fc3(x))))
        x = self.dropout(F.relu(self.bn4(self.fc4(x))))
        
        #no dropout at the output layer
        x = self.fc5(x)
        
        return x
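
The normalization performed by each batch norm layer during training can be sketched by hand for a single feature. This is a simplified illustration under stated assumptions: it omits the learnable scale and shift parameters and the running statistics that nn.BatchNorm1d maintains, and `batch_norm_1d` is a hypothetical helper name. The `eps` value matches the nn.BatchNorm1d default of 1e-5.

```python
import math


def batch_norm_1d(values, eps=1e-5):
    """Normalize one feature across a batch to roughly zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    # Biased (population) variance, as used for normalization during training
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Outputs of a linear layer for one feature across a batch of four samples
pre_activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm_1d(pre_activations)

# ReLU is applied after normalization, matching the order used in the model
activated = [max(0.0, v) for v in normalized]
print(normalized)
print(activated)
```

Normalizing before the activation means roughly half of the values in this batch fall below zero and are clipped by ReLU, regardless of the original scale of the linear layer's outputs.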

In [9]:
model = Model()

In [10]:
# Loss Function and Optimizer

criterion = nn.CrossEntropyLoss()

from torch import optim

optimizer = optim.SGD(model.parameters(), lr=0.05)

### Learning Rate Scheduler:

Here we will use StepLR, the step learning rate scheduler. Below is the reference and an example from the PyTorch documentation:

```python
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
```

Sets the learning rate of each parameter group to the initial lr decayed by gamma every step_size epochs. 

Parameters:

- optimizer (Optimizer) – Wrapped optimizer.
- step_size (int) – Period of learning rate decay.
- gamma (float) – Multiplicative factor of learning rate decay. Default: 0.1.
- last_epoch (int) – The index of last epoch. Default: -1.

Example, using `scheduler = StepLR(optimizer, step_size=5, gamma=0.1)` and assuming the optimizer uses lr = 0.05 for all groups:

- lr = 0.05     if epoch < 5
- lr = 0.005    if 5 <= epoch < 10
- lr = 0.0005   if 10 <= epoch < 15

The example shows that the initial learning rate defined in the <b>optimizer</b> step was 0.05 and is reduced after every 5 epochs. At each reduction the learning rate is multiplied by a factor of 0.1 (the value defined in gamma).
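
The decay rule described above can be sketched in plain Python. This is a minimal closed-form illustration rather than PyTorch's implementation; `step_lr` is a hypothetical helper name, and the epoch index is assumed to be zero-based as in the example.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Closed-form step decay: multiply base_lr by gamma once per completed period."""
    return base_lr * gamma ** (epoch // step_size)


# Reproduce the example: lr = 0.05, step_size = 5, gamma = 0.1, 15 epochs
schedule = [step_lr(0.05, step_size=5, gamma=0.1, epoch=e) for e in range(15)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

Calling the same function with step_size=10, the value used for the scheduler created in this tutorial, keeps the rate at 0.05 for the first ten epochs and lowers it to 0.005 for the remaining five.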

Details on other learning rate schedulers are available here: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

In [11]:
# Creating LR scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

In [12]:
print(torch.cuda.is_available())

False


In [13]:
# Create a device object: use the GPU if one is available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


In [14]:
# Move the model to the available device
model.to(device);

<b>Training the Model:</b>

Note: scheduler.step() must be called once per epoch so that the learning rate decays as training progresses.

In [15]:
%%time
for epoch in range(1, 16):  # run the model for 15 epochs
    train_loss = []
    ## training part
    model.train()
    for data, target in train_loader:

        # Move input and label tensors to the available device
        data, target = data.to(device), target.to(device)

        # Reshape the input data before sending it into the model
        data = data.view(data.shape[0], -1)

        optimizer.zero_grad()

        ## 1. forward propagation
        output = model(data)

        ## 2. loss calculation
        loss = criterion(output, target)

        ## 3. backward propagation
        loss.backward()

        ## 4. weight optimization
        optimizer.step()

        train_loss.append(loss.item())

    # Step the LR scheduler once per epoch, after the optimizer updates
    # (the call order expected since PyTorch 1.1)
    scheduler.step()

    print("Epoch:", epoch, "Training Loss: ", np.mean(train_loss))

Epoch: 1 Training Loss:  0.591511638003091
Epoch: 2 Training Loss:  0.27450931679923085
Epoch: 3 Training Loss:  0.21339781960472465
Epoch: 4 Training Loss:  0.18062251378704483
Epoch: 5 Training Loss:  0.1567279348600035
Epoch: 6 Training Loss:  0.13593307446455583
Epoch: 7 Training Loss:  0.1244783455634024
Epoch: 8 Training Loss:  0.11661969905040072
Epoch: 9 Training Loss:  0.1055293292296119
Epoch: 10 Training Loss:  0.09655629589455202
Epoch: 11 Training Loss:  0.0824143057805486
Epoch: 12 Training Loss:  0.07365655489324126
Epoch: 13 Training Loss:  0.06981207851921985
Epoch: 14 Training Loss:  0.06699953551093737
Epoch: 15 Training Loss:  0.0663414308197874
Wall time: 10min 49s
