The previous section showed how to train a neural network from scratch. In this notebook we use nn.Module and nn.Parameter together with the autograd and optim utilities that PyTorch provides. We subclass nn.Module, a class that keeps track of state such as the parameters of its layers. nn.Module provides a number of attributes and methods (such as .parameters() and .zero_grad()) that we will use throughout.
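
As a minimal sketch of how nn.Module keeps track of state, the hypothetical `TinyLinear` class below (not part of this notebook's model) wraps two tensors in nn.Parameter and then inspects what the module has registered. Note that, depending on the PyTorch version, .zero_grad() either sets the gradients to None or fills them with zeros.

```python
import torch
from torch import nn

class TinyLinear(nn.Module):
    """Hypothetical module used only for illustration."""
    def __init__(self):
        super().__init__()
        # Tensors wrapped in nn.Parameter are registered on the module automatically
        self.weight = nn.Parameter(torch.ones(3))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return (x * self.weight).sum() + self.bias

tiny = TinyLinear()
param_names = [name for name, _ in tiny.named_parameters()]
print(param_names)

# Run a backward pass so that gradients exist, then clear them
tiny(torch.tensor([1.0, 2.0, 3.0])).sum().backward()
grad_before = tiny.weight.grad.clone()
tiny.zero_grad()
# After zero_grad() the gradient is None or all zeros, depending on the version
print(grad_before, tiny.weight.grad)
```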

In this tutorial, we build a deep learning model that classifies handwritten digits. We use the MNIST dataset, which is available through the torchvision package.
 
The first step is basic data preprocessing. Using the transforms utility from the torchvision package, we apply the two operations below (they are explained in more detail in the CNN section).
 
- Transform the raw dataset into tensors.
- Normalize the dataset.
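
The arithmetic behind these two operations can be sketched in plain Python. This assumes 8-bit pixel intensities (0 to 255) and the mean and standard deviation of 0.5 used in the transform below; the helper names `to_unit_range` and `normalize` are hypothetical and only mimic what the torchvision transforms do per pixel.

```python
def to_unit_range(pixel):
    # ToTensor-style scaling: 8-bit intensity 0..255 -> 0.0..1.0
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    # Normalize-style shift and scale: (value - mean) / std
    return (value - mean) / std

raw_pixels = [0, 255]                       # darkest and brightest pixel
scaled = [to_unit_range(p) for p in raw_pixels]
normalized = [normalize(v) for v in scaled]
print(normalized)  # [-1.0, 1.0]
```

With these settings the pixel values end up in the range -1 to 1.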

We also import the dataset from the torchvision package.

In [1]:
import torch
from torchvision.datasets import MNIST

In [2]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # MNIST images have a single channel, so one mean and one std are given
    transforms.Normalize((0.5,), (0.5,))
    ])

In [3]:
# choose the training and test datasets
train_data = MNIST(root='data', train=True,
                   download=True, transform=transform)
test_data = MNIST(root='data', train=False,
                  download=True, transform=transform)

In [4]:
# size of train and test data
len(train_data), len(test_data)

(60000, 10000)

In [5]:
# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 50
# fraction of the training set to use as validation
valid_size = 0.2

In [6]:
import numpy as np
from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import DataLoader

# obtain training indices that will be used for validation
num_train = len(train_data)
ix = list(range(num_train))
np.random.shuffle(ix)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = ix[split:], ix[:split]

# create sampler objects using SubsetRandomSampler
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# data loaders preparation
train_loader = DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)
valid_loader = DataLoader(train_data, batch_size=batch_size, 
    sampler=valid_sampler, num_workers=num_workers)
test_loader = DataLoader(test_data, batch_size=batch_size, 
    num_workers=num_workers)
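
The sizes produced by this split can be checked with plain arithmetic. The sketch below only reuses the values set above (60000 training images, valid_size = 0.2, batch_size = 50); the names `n_valid`, `n_train`, `train_batches` and `valid_batches` are introduced here for illustration.

```python
import math

num_train = 60000
valid_size = 0.2
batch_size = 50

# Same split computation as in the cell above
split = int(math.floor(valid_size * num_train))
n_valid = split
n_train = num_train - split

# Number of batches each loader yields per epoch
train_batches = math.ceil(n_train / batch_size)
valid_batches = math.ceil(n_valid / batch_size)
print(n_train, n_valid, train_batches, valid_batches)  # 48000 12000 960 240
```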

In [7]:
# Let's check the shape of the input/target data
for data, target in train_loader:
    print(data.shape)
    print(target.shape)
    break

torch.Size([50, 1, 28, 28])
torch.Size([50])


As we can see, the input data is not flattened, so it cannot be passed directly through the linear layers. We need to reshape each batch to (batch size, number of features), where the number of features is 1*28*28 = 784. The shape of the target data is as expected.
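
A minimal sketch of that reshape, using a dummy tensor of zeros in place of a real batch (the names `batch` and `flat` are illustrative):

```python
import torch

# Dummy batch with the same shape as one batch from train_loader
batch = torch.zeros(50, 1, 28, 28)

# Keep the batch dimension and let -1 infer the remaining size (1*28*28)
flat = batch.view(batch.shape[0], -1)
print(flat.shape)  # torch.Size([50, 784])
```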

### Architecture

We create a new class, which inherits from the base class Module in the nn package, to define the architecture of the neural network.

- Layer definitions should be inside the constructor of the class.
- The forward propagation step should be inside the forward method.

Activations (ReLU, sigmoid, tanh, softmax, etc.) and loss functions (such as cross entropy) are available in torch.nn.functional. This module contains the function versions of operations that torch.nn provides as classes.

The syntax of nn.Linear() is nn.Linear(input size, output size).

The architecture below has 784 nodes (28*28 pixels) in the input layer, 256 in the hidden layer, and 10 in the output layer (the digits 0-9). Inside the forward function, we apply the ReLU activation function, available in torch.nn.functional, to the hidden layer.
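
To make the layer sizes concrete, the sketch below builds two standalone nn.Linear layers with the same sizes and counts their parameters. The variables `fc1`, `fc2` and `n_params` here are illustrative and separate from the model class defined next.

```python
import torch
from torch import nn

fc1 = nn.Linear(784, 256)   # input layer -> hidden layer
fc2 = nn.Linear(256, 10)    # hidden layer -> output layer

# nn.Linear stores its weight as (output size, input size)
print(fc1.weight.shape)  # torch.Size([256, 784])

# Weights plus biases: 784*256 + 256 + 256*10 + 10
n_params = sum(p.numel() for layer in (fc1, fc2) for p in layer.parameters())
print(n_params)  # 203530
```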

In [8]:
from torch import nn, optim
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
        
    def forward(self, x):
        # input tensor is flattened 
        x = x.view(x.shape[0], -1)
        
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        
        return x

In [9]:
model = Model()

Now we define the loss function and the optimizer.

- <u><b>Loss Function</b></u>:

Here we use the CrossEntropyLoss() function. By convention the loss is assigned to a variable named `criterion`. For MNIST classification we generally use a softmax function to predict class probabilities, and with a softmax output cross-entropy is the natural choice of loss.

A few things to keep in mind when using CrossEntropyLoss() in PyTorch:

    - CrossEntropyLoss criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
    - The input is expected to contain scores for each class.

This means we need to pass the raw output of our network into the loss, not the output of a softmax function. That is why there is no activation after self.fc2(x).
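
The equivalence stated above can be checked numerically. This is a small sketch on random scores; the tensors `logits` and `targets` are made up for illustration and are not MNIST data.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)             # raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])    # made-up class labels

# CrossEntropyLoss applied directly to the raw scores
ce = nn.CrossEntropyLoss()(logits, targets)

# The same loss in two explicit steps: LogSoftmax followed by NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

same = torch.allclose(ce, nll)
print(same)  # True
```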

- <u><b>Optimizer</b></u>:

PyTorch also has a package with various optimization algorithms, torch.optim. Instead of manually updating each parameter, we can call the step method of our optimizer to take an optimization step. A few of the optimizers available in PyTorch are:

    - optim.Adam
    - optim.RMSprop
    - optim.SGD
    - optim.Adagrad

We need to pass the model parameters (accessible through model.parameters()) to the optimizer so that it knows which tensors to update.
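
As a rough sketch of what the step method does for plain SGD (no momentum or weight decay), the example below compares one optimizer step with the manual update w - lr * grad. The tensor `w`, the toy loss and the learning rate of 0.1 are assumptions chosen for illustration, not the values used in this notebook.

```python
import torch
from torch import optim

w = torch.tensor([1.0, 2.0], requires_grad=True)
demo_opt = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()                      # gradient is 2*w = [2, 4]

# Manual update computed before the optimizer changes w
expected = (w - 0.1 * w.grad).detach().clone()

demo_opt.step()                      # in-place update of w
print(w.detach(), expected)          # both are approximately [0.8, 1.6]
```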

In [10]:
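# cross-entropy loss on the raw class scores produced by the model
criterion = nn.CrossEntropyLoss()

# plain stochastic gradient descent over all model parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)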

<b>Training the Model:</b>

A few steps to note:

- <b>optimizer.zero_grad()</b>: zeroes out the gradients from the previous training step so that gradients do not accumulate. This should be done before calculating the gradients for each batch.
- <b>criterion(output, target)</b>: we feed in the model's predicted values along with the actual values to calculate the loss.
- <b>optimizer.step()</b>: once we call loss.backward(), the gradients are calculated; this step uses those gradients to update the weights with the learning rate defined in optim.SGD(model.parameters(), lr=0.01).
- <b>torch.no_grad()</b>: deactivates the autograd engine. This reduces memory usage and speeds up computation, but you cannot backpropagate inside it. We generally don't need backpropagation in the validation and test phases.
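
Two of these points can be seen on a single scalar. The sketch below uses a made-up tensor `w` to show that gradients accumulate across backward calls until they are reset, and that results computed under torch.no_grad() are not tracked by autograd.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()          # 2.0

(2 * w).backward()             # no reset in between: gradients add up
accumulated = w.grad.item()    # 4.0

w.grad.zero_()                 # reset, as optimizer.zero_grad() would
(2 * w).backward()
after_reset = w.grad.item()    # 2.0

with torch.no_grad():
    y = 2 * w                  # computed without building a graph
tracks_grad = y.requires_grad  # False

print(first, accumulated, after_reset, tracks_grad)
```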

In [11]:
for epoch in range(1, 11): ## run the model for 10 epochs
    train_loss, valid_loss = [], []
    ## training part 
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        ## 1. forward propagation
        output = model(data)
        
        ## 2. loss calculation
        loss = criterion(output, target)
        
        ## 3. backward propagation
        loss.backward()
        
        ## 4. weight optimization
        optimizer.step()
        
        train_loss.append(loss.item())
        
    ## evaluation part
    model.eval()
    with torch.no_grad():
        for data, target in valid_loader:
            output = model(data)
            loss = criterion(output, target)
            valid_loss.append(loss.item())
    print("Epoch:", epoch, "Training Loss: ", np.mean(train_loss), "Valid Loss: ", np.mean(valid_loss))

Epoch: 1 Training Loss:  0.7252699347989012 Valid Loss:  0.40598030549784503
Epoch: 2 Training Loss:  0.36131680904266733 Valid Loss:  0.33169983032469946
Epoch: 3 Training Loss:  0.3144735790944348 Valid Loss:  0.29959434183935324
Epoch: 4 Training Loss:  0.2850455914158374 Valid Loss:  0.2795327842701226
Epoch: 5 Training Loss:  0.2613767802792912 Valid Loss:  0.262356000362585
Epoch: 6 Training Loss:  0.2414350723537306 Valid Loss:  0.24396263679179053
Epoch: 7 Training Loss:  0.2229606965285105 Valid Loss:  0.22929387559803824
Epoch: 8 Training Loss:  0.20704525047913194 Valid Loss:  0.21653890158049763
Epoch: 9 Training Loss:  0.1923978082719259 Valid Loss:  0.20423881189587215
Epoch: 10 Training Loss:  0.18020653621448826 Valid Loss:  0.19436424857315918


In [12]:
# Test the trained network

# initialize test loss and per-class counters for accuracy
test_loss = 0.0
class_correct = [0.0] * 10
class_total = [0.0] * 10

model.eval() # prep model for evaluation

# gradients are not needed for evaluation
with torch.no_grad():
    for data, target in test_loader:
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update test loss (loss is a batch mean, so weight it by the batch size)
        test_loss += loss.item() * data.size(0)
        # the predicted class is the index of the largest output score
        _, pred = torch.max(output, 1)
        # compare predictions to true labels
        correct = pred.eq(target)
        # accumulate per-class counts; len(target) also handles a smaller final batch
        for i in range(len(target)):
            label = target[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no test examples)' % (str(i)))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))

# Code Credit : Udacity

Test Loss: 0.180090

Test Accuracy of     0: 98% (967/980)
Test Accuracy of     1: 98% (1117/1135)
Test Accuracy of     2: 93% (965/1032)
Test Accuracy of     3: 95% (969/1010)
Test Accuracy of     4: 93% (921/982)
Test Accuracy of     5: 91% (817/892)
Test Accuracy of     6: 96% (927/958)
Test Accuracy of     7: 93% (964/1028)
Test Accuracy of     8: 91% (894/974)
Test Accuracy of     9: 93% (946/1009)

Test Accuracy (Overall): 94% (9487/10000)
