In this tutorial, we will build a deep learning model that classifies handwritten digits. We will use the MNIST dataset included in the torchvision package.
 
The mandatory first step is basic data preprocessing. Using the transforms utility from the torchvision package, we will apply the two basic preprocessing operations listed below (these are explained in more detail in the context of CNNs).
 
- Transform the raw dataset into tensors.
- Normalize the dataset.

We will also import the dataset from the torchvision package.

In [1]:
import torch
from torchvision.datasets import MNIST

In [2]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # MNIST images have a single channel
    ])
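
To make the two operations concrete, here is a minimal NumPy sketch of the arithmetic involved. It assumes that ToTensor rescales 8-bit pixel values from [0, 255] to [0, 1] and that Normalize with mean 0.5 and standard deviation 0.5 then computes (x - mean) / std. The `pixels` array is made-up illustration data, not real MNIST pixels.

```python
import numpy as np

# Hypothetical 8-bit grayscale pixel values (illustration only, not MNIST)
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Step 1: the range change ToTensor applies: [0, 255] -> [0.0, 1.0]
scaled = pixels.astype(np.float32) / 255.0

# Step 2: what Normalize(mean=0.5, std=0.5) computes: (x - mean) / std
mean, std = 0.5, 0.5
normalized = (scaled - mean) / std

print(normalized.min(), normalized.max())
```

After both steps the pixel values lie in [-1, 1], centered around zero.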

## Validation & Test Dataset:
When creating a machine learning model, the ultimate goal is for it to be accurate on new data, not just the data you are using to build it. The idea of a validation set is to check whether the model is overfitting.

As this blog post by fast.ai says (https://www.fast.ai/2017/11/13/validation-sets/) about the different datasets:

- The training set is used to train a given model
- The validation set is used to choose between models (for instance, does a random forest or a neural net work better for your problem? do you want a random forest with 40 trees or 50 trees?)
- The test set tells you how you’ve done. If you’ve tried out a lot of different models, you may get one that does well on your validation set just by chance, and having a test set helps make sure that is not the case.

In [3]:
# choose the training and test datasets
train_data = MNIST(root='data', train=True,
                                   download=True, transform=transform)
test_data = MNIST(root='data', train=False,
                                  download=True, transform=transform)

In [4]:
#size of train and test data
len(train_data) , len(test_data)

(60000, 10000)

## DataLoaders:

A few terms to understand first:

When the number of training examples is very large, we don't pass the whole dataset through the network at once. Instead, we split it into batches and do one forward pass and one backward pass per batch.

- <b>epoch</b>: one forward pass + one backward pass over all training examples.
- <b>batch size</b>: the number of training examples in one forward/backward pass.
- <b>iterations</b>: the number of batches needed to complete one epoch. If you have 1000 training examples (rows) and your batch size is 100, you need 10 iterations to complete one epoch.
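
The relationship between dataset size, batch size, and iterations can be sketched in a few lines of plain Python. The helper name `iterations_per_epoch` is hypothetical and used only for illustration; the ceiling accounts for a final batch that may be smaller than the rest.

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once.

    The last batch may be smaller than batch_size, hence the ceiling.
    """
    return math.ceil(num_examples / batch_size)

print(iterations_per_epoch(1000, 100))  # the example from the list above
print(iterations_per_epoch(1050, 100))  # a partial final batch still counts
```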

PyTorch's DataLoader is responsible for creating and managing batches, and it makes iterating over them easier.

In [5]:
# how many samples per batch to load
batch_size = 50
# fraction of the training set to use for validation
valid_size = 0.2

In [6]:
import numpy as np
from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import DataLoader

# Here we use a subset of the training set for validation:
# shuffle the training indices and split them into training and validation indices
num_train = len(train_data)
ix = list(range(num_train))
np.random.shuffle(ix)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = ix[split:], ix[:split]

# create sampler objects using SubsetRandomSampler
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# data loaders preparation
train_loader = DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler)
valid_loader = DataLoader(train_data, batch_size=batch_size, 
    sampler=valid_sampler)
test_loader = DataLoader(test_data, batch_size=batch_size)

DataLoader creates an iterable over the batches, which we will use inside the training loop.
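
As a quick sanity check on the index split, the following NumPy-only sketch repeats the same arithmetic and confirms that the training and validation indices do not overlap. The helper name `split_indices` and the fixed seed are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def split_indices(num_examples, valid_fraction, seed=0):
    """Shuffle all indices and hold out the first valid_fraction for validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_examples)
    split = int(np.floor(valid_fraction * num_examples))
    return indices[split:], indices[:split]

train_idx, valid_idx = split_indices(60000, 0.2)
print(len(train_idx), len(valid_idx))

# The two index sets must not share any element
overlap = np.intersect1d(train_idx, valid_idx)
print(len(overlap))
```

With 60000 training images and a validation fraction of 0.2, this leaves 48000 images for training and 12000 for validation.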

In [7]:
# Let's check the shape of the input/target data
for data, target in train_loader:
    print(data.shape)
    print(target.shape)
    break

torch.Size([50, 1, 28, 28])
torch.Size([50])


## Dropout:
Dropout is a simple but effective regularization technique in which randomly selected neurons are ignored (“dropped out”) during training. Their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass. Dropout is used to reduce the '<b>overfitting problem</b>' and is most useful in deep networks. We specify the dropout probability (the probability of switching off each neuron) when configuring the layer.

Dropout is applied only during the training phase; we switch it off during the validation and test phases.
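
The difference between the two phases can be seen on a toy input. This is a minimal sketch, assuming a vector of ones as input and a fixed seed purely for repeatability. It also shows one detail worth knowing: during training, nn.Dropout rescales the surviving activations by 1 / (1 - p) so that their expected value stays the same.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed only so the sketch is repeatable
p = 0.3
dropout = nn.Dropout(p=p)
x = torch.ones(1000)  # toy input, not real activations

dropout.train()        # training mode: dropout is active
train_out = dropout(x)

dropout.eval()         # evaluation mode: dropout passes the input through
eval_out = dropout(x)

kept = train_out != 0
print("fraction kept in training:", kept.float().mean().item())
print("value of a kept unit:", train_out[kept][0].item())
print("eval output unchanged:", torch.equal(eval_out, x))
```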

In [8]:
from torch import nn, optim
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 128)
        self.fc4 = nn.Linear(128, 56)
        self.fc5 = nn.Linear(56, 10)
        
        #drop out with 0.3 probability
        self.dropout = nn.Dropout(p=0.3)
        
    def forward(self, x):
        # input tensor is flattened 
        x = x.view(x.shape[0], -1)
        
        # applied dropout layer
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        x = self.dropout(F.relu(self.fc4(x)))
        
        #no dropout at the output layer
        x = self.fc5(x)
        
        return x
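
The input size of 784 for the first layer comes from flattening each 1 x 28 x 28 image. A minimal sketch of that flattening step, using a dummy batch of zeros with the same shape as one batch from the loader:

```python
import torch

batch = torch.zeros(50, 1, 28, 28)      # same shape as one batch of MNIST images
flat = batch.view(batch.shape[0], -1)   # keep the batch dimension, flatten the rest
print(flat.shape)                       # torch.Size([50, 784])
```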

In [9]:
model = Model()

In [10]:
# CrossEntropyLoss expects raw logits, which is why the model has no softmax layer
criterion = nn.CrossEntropyLoss()

# optim was already imported in In [8]
optimizer = optim.SGD(model.parameters(), lr=0.01)
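
Note that the model returns raw scores (logits) from its last layer, with no softmax. That works because nn.CrossEntropyLoss applies log-softmax internally. The sketch below checks this on made-up logits and labels by comparing it with an explicit log-softmax followed by the negative log-likelihood loss.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical raw scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])   # hypothetical class labels

# Loss computed directly from logits
loss_ce = nn.CrossEntropyLoss()(logits, target)

# Same loss computed in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(loss_ce.item(), loss_manual.item())
```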

<b>Training the Model:</b>

In [11]:
for epoch in range(1, 16): ## run the model for 15 epochs
    train_loss, valid_loss = [], []
    ## training part 
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        ## 1. forward propagation
        output = model(data)
        
        ## 2. loss calculation
        loss = criterion(output, target)
        
        ## 3. backward propagation
        loss.backward()
        
        ## 4. weight optimization
        optimizer.step()
        
        train_loss.append(loss.item())
        
    ## evaluation part
    with torch.no_grad():
        model.eval()
        for data, target in valid_loader:
            output = model(data)
            loss = criterion(output, target)
            valid_loss.append(loss.item())
    print ("Epoch:", epoch, "Training Loss: ", np.mean(train_loss), "Valid Loss: ", np.mean(valid_loss))

Epoch: 1 Training Loss:  2.1871625877916814 Valid Loss:  1.667852665980657
Epoch: 2 Training Loss:  1.1969949032490452 Valid Loss:  0.6726877966274818
Epoch: 3 Training Loss:  0.7181357967667281 Valid Loss:  0.4598126212755839
Epoch: 4 Training Loss:  0.5422619129065425 Valid Loss:  0.34589868066832424
Epoch: 5 Training Loss:  0.4389304482378066 Valid Loss:  0.3017367086062829
Epoch: 6 Training Loss:  0.3676402278554936 Valid Loss:  0.24089869467231134
Epoch: 7 Training Loss:  0.3171034183508406 Valid Loss:  0.20431646563423175
Epoch: 8 Training Loss:  0.27840177100539826 Valid Loss:  0.18733948226242017
Epoch: 9 Training Loss:  0.2446655814012047 Valid Loss:  0.17578503578746071
Epoch: 10 Training Loss:  0.22932315228584532 Valid Loss:  0.17078397002769635
Epoch: 11 Training Loss:  0.20719245270205042 Valid Loss:  0.15078213748444494
Epoch: 12 Training Loss:  0.19494042920608384 Valid Loss:  0.14527840617811308
Epoch: 13 Training Loss:  0.17940984132001175 Valid Loss:  0.1354785961370

A few points to note:

- <b>torch.no_grad()</b>: disables the autograd engine. This reduces memory usage and speeds up computation, but you cannot backpropagate through anything computed inside it. We generally don't need backpropagation in the validation and test phases.
- <b>model.eval()</b>: switches off dropout for the validation phase.
- <b>model.train()</b>: puts the model back into training mode, switching dropout on again.

If the training loss and the validation loss are very close, there is less overfitting. In the run above the validation loss is even lower than the training loss, which is common when dropout is active only while the training loss is being measured.
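
These switches can be observed directly on a small module. The sketch below uses a hypothetical single linear layer rather than the tutorial's model; it checks the `training` flag that model.eval() and model.train() toggle, and whether outputs are tracked by autograd inside and outside torch.no_grad().

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)   # hypothetical tiny model, for illustration only
x = torch.randn(3, 4)

layer.eval()
flag_after_eval = layer.training       # False: evaluation mode
with torch.no_grad():
    out_no_grad = layer(x)             # computed without building a graph

layer.train()
flag_after_train = layer.training      # True: training mode
out_grad = layer(x)                    # tracked by autograd

print(flag_after_eval, out_no_grad.requires_grad)
print(flag_after_train, out_grad.requires_grad)
```

The two mechanisms are independent: eval() changes how layers such as dropout behave, while no_grad() only stops gradient tracking.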

## Test the network
We now evaluate performance on the test dataset and also check the class-wise accuracy.

In [12]:
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval() # prep model for evaluation (switches off dropout)

with torch.no_grad(): # gradients are not needed for testing
    for data, target in test_loader:
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update test loss (criterion returns the batch mean, so weight by batch size)
        test_loss += loss.item()*data.size(0)
        # convert output logits to predicted class
        _, pred = torch.max(output, 1)
        # compare predictions to true label
        correct = pred.eq(target.view_as(pred))
        # calculate test accuracy for each object class
        # (len(target) rather than batch_size, in case the last batch is smaller)
        for i in range(len(target)):
            label = target[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no test examples)' % (str(i)))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))

# Code Credit : Udacity

Test Loss: 0.117385

Test Accuracy of     0: 98% (966/980)
Test Accuracy of     1: 99% (1124/1135)
Test Accuracy of     2: 96% (999/1032)
Test Accuracy of     3: 95% (963/1010)
Test Accuracy of     4: 95% (933/982)
Test Accuracy of     5: 94% (846/892)
Test Accuracy of     6: 97% (938/958)
Test Accuracy of     7: 95% (981/1028)
Test Accuracy of     8: 95% (929/974)
Test Accuracy of     9: 95% (963/1009)

Test Accuracy (Overall): 96% (9642/10000)
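
The per-class bookkeeping in the test loop can also be done without an explicit Python loop. The following is an alternative sketch, not the code used to produce the results above: it counts totals and correct predictions per class with np.bincount. The helper name `per_class_accuracy` and the small arrays are made up for illustration.

```python
import numpy as np

def per_class_accuracy(preds, targets, num_classes):
    """Return (correct, total) counts per class as integer arrays."""
    preds = np.asarray(preds)
    targets = np.asarray(targets)
    # how many examples of each class appear in the labels
    total = np.bincount(targets, minlength=num_classes)
    # how many of those were predicted correctly
    correct = np.bincount(targets[preds == targets], minlength=num_classes)
    return correct, total

# Hypothetical predictions and labels for a 3-class problem
preds = [0, 1, 1, 2, 2, 2]
targets = [0, 1, 2, 2, 2, 0]
correct, total = per_class_accuracy(preds, targets, num_classes=3)

for c in range(3):
    print("class", c, ":", correct[c], "/", total[c])
```

In the tutorial's setting, `preds` and `targets` would be the predictions and labels collected over all test batches.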
