In this tutorial, we will build a deep learning model that classifies handwritten digits. We will use the MNIST dataset included in the torchvision package.
 
The mandatory first step is basic data preprocessing. Using the transforms utility from the torchvision package, we will perform the two basic preprocessing operations listed below (these are explained in more detail in the context of CNNs).
 
- Transform the raw dataset into tensors.
- Normalize the dataset.

We will also import the dataset from the torchvision package.

In [25]:
import torch
from torch import nn
from torchvision.datasets import MNIST

In [26]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # 0.1307 and 0.3081 are the global mean and standard deviation of the MNIST dataset.
    transforms.Normalize((0.1307,), (0.3081,))
    ])
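
To make the two steps concrete, the sketch below reproduces their arithmetic on a tiny made-up 2x2 "image" using plain tensor operations rather than torchvision itself. For 8-bit image input, `ToTensor()` rescales pixel values from [0, 255] to [0.0, 1.0], and `Normalize(mean, std)` then computes `(x - mean) / std` per channel. The `raw_pixels` values are invented purely for illustration.

```python
import torch

# Hypothetical 8-bit grayscale "image" (values invented for illustration).
raw_pixels = torch.tensor([[0, 255], [128, 64]], dtype=torch.uint8)

# Step 1: what ToTensor() does to the values: scale [0, 255] -> [0.0, 1.0]
# and add a leading channel dimension, giving (C, H, W).
scaled = raw_pixels.to(torch.float32).div(255).unsqueeze(0)

# Step 2: what Normalize((0.1307,), (0.3081,)) does: (x - mean) / std.
mean, std = 0.1307, 0.3081
normalized = (scaled - mean) / std

print(scaled.shape)      # torch.Size([1, 2, 2])
print(normalized.min())  # a black pixel (0) maps to -mean / std
print(normalized.max())  # a white pixel (255) maps to (1 - mean) / std
```

After normalization the pixel values are no longer confined to [0, 1]; across the whole MNIST training set they are centered near zero with a standard deviation near one.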

For the sake of simplicity, we use only the training dataset in this tutorial.

The test and validation datasets were used in the previous tutorial.

In [27]:
# choose the training dataset
train_data = MNIST(root='data', train=True,
                   download=True, transform=transform)

In [28]:
# size of the training dataset
len(train_data)

60000

In [29]:
# how many samples per batch to load
batch_size = 50
# fraction of the training set to use as validation (defined here but not used below)
valid_size = 0.2

In [30]:
import numpy as np
from torch.utils.data import DataLoader

# data loader preparation
train_loader = DataLoader(train_data, batch_size=batch_size)

DataLoader creates an iterable over all the batches, which we will use inside the training loop.
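
As a rough illustration of how a DataLoader splits a dataset into batches, the sketch below wraps a small random stand-in dataset (not MNIST) in a loader. The names `toy_data` and `toy_loader` and the size of 120 samples are assumptions, chosen so that the last batch comes out partial. With MNIST's 60,000 samples and a batch size of 50, the same arithmetic gives 1,200 batches per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 120 random "images" shaped like MNIST, with random labels.
images = torch.randn(120, 1, 28, 28)
labels = torch.randint(0, 10, (120,))
toy_data = TensorDataset(images, labels)

toy_loader = DataLoader(toy_data, batch_size=50)

# len() of a DataLoader is the number of batches, not the number of samples.
print(len(toy_loader))  # 3 batches: 50 + 50 + 20

batch_sizes = [data.shape[0] for data, target in toy_loader]
print(batch_sizes)      # [50, 50, 20]
```

Passing `shuffle=True` is a common choice for training, so that each epoch sees the samples in a different order; the loader in this tutorial keeps the default, unshuffled order.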

In [31]:
# Let's check the shape of the input/target data
for data, target in train_loader:
    print(data.shape)
    print(target.shape)
    break

torch.Size([50, 1, 28, 28])
torch.Size([50])


## nn.Sequential:

Below we define the deep neural network architecture with the help of nn.Sequential.

In [32]:
model = nn.Sequential(nn.Linear(784, 512),
                      nn.ReLU(),
                      nn.Dropout(0.3),
                      nn.Linear(512, 128),
                      nn.ReLU(),
                      nn.Dropout(0.5),
                      nn.Linear(128, 10))
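
The first Linear layer expects 784 input features because each 1x28x28 image is flattened into a vector (1 * 28 * 28 = 784), and the last layer produces 10 scores, one per digit. The sketch below passes a random batch through an untrained copy of the same layer stack to check the shapes. `dummy_batch` is a hypothetical stand-in for a real batch, and putting `nn.Flatten` in front is shown only as an alternative to reshaping by hand; it is not what this tutorial does.

```python
import torch
from torch import nn

# Same layer stack as above, rebuilt here so the snippet runs on its own.
net = nn.Sequential(nn.Linear(784, 512),
                    nn.ReLU(),
                    nn.Dropout(0.3),
                    nn.Linear(512, 128),
                    nn.ReLU(),
                    nn.Dropout(0.5),
                    nn.Linear(128, 10))

# Random stand-in for one batch from train_loader: [batch, channel, height, width].
dummy_batch = torch.randn(50, 1, 28, 28)

# Flatten every image to a 784-long vector before the first Linear layer.
flat = dummy_batch.view(dummy_batch.shape[0], -1)
print(flat.shape)       # torch.Size([50, 784])
print(net(flat).shape)  # torch.Size([50, 10])

# Alternative: let the model do the flattening itself.
net_with_flatten = nn.Sequential(nn.Flatten(), *net)
print(net_with_flatten(dummy_batch).shape)  # torch.Size([50, 10])
```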

In [33]:
# Loss Function and Optimizer

criterion = nn.CrossEntropyLoss()

from torch import optim

optimizer = optim.SGD(model.parameters(), lr=0.01)
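
`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class labels, which is why the model above ends in a plain Linear layer with no softmax. The sketch below checks, on made-up `logits` and `targets`, that the loss equals a log-softmax followed by the negative log-likelihood loss.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up logits for 4 samples and 10 classes, plus made-up integer labels.
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# The same value computed in two explicit steps.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss, manual))  # True
```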

## GPU Support:
First, check whether your GPU is available to PyTorch:

In [34]:
print(torch.cuda.is_available())

False


In [35]:
# Create a device object: the GPU if one is available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


In [36]:
# Move the model to the available device
model.to(device);

<b>Training the Model:</b>

In [37]:
%%time
for epoch in range(1, 16): ## run the model for 15 epochs
    train_loss = []
    ## training part 
    model.train()
    for data, target in train_loader:
        
        # Move input and label tensors to the available device
        data, target = data.to(device), target.to(device)
        
        # Reshape each 1x28x28 image into a flat 784-element vector before sending it into the model
        data = data.view(data.shape[0], -1)
        
        optimizer.zero_grad()
        
        ## 1. forward propagation
        output = model(data)
        
        ## 2. loss calculation
        loss = criterion(output, target)
        
        ## 3. backward propagation
        loss.backward()
        
        ## 4. weight optimization
        optimizer.step()
        
        train_loss.append(loss.item())
        
    print("Epoch:", epoch, "Training Loss: ", np.mean(train_loss))

Epoch: 1 Training Loss:  1.0559210053210457
Epoch: 2 Training Loss:  0.4704279328820606
Epoch: 3 Training Loss:  0.3786520552448928
Epoch: 4 Training Loss:  0.3260291467048228
Epoch: 5 Training Loss:  0.28475500740421317
Epoch: 6 Training Loss:  0.25306461652430395
Epoch: 7 Training Loss:  0.23030062804309032
Epoch: 8 Training Loss:  0.21214023194275797
Epoch: 9 Training Loss:  0.19736038566101344
Epoch: 10 Training Loss:  0.1836888543376699
Epoch: 11 Training Loss:  0.17243863903917372
Epoch: 12 Training Loss:  0.1637192848386864
Epoch: 13 Training Loss:  0.15155088650450732
Epoch: 14 Training Loss:  0.14521185631630942
Epoch: 15 Training Loss:  0.1373947342322208
Wall time: 5min 56s
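
The loop above calls `model.train()` because Dropout behaves differently in training and evaluation mode. The sketch below illustrates the difference on a small hypothetical network (not the tutorial's model): in training mode Dropout zeroes random activations, while in evaluation mode it is disabled, so repeated forward passes give identical results. Wrapping inference in `torch.no_grad()` is a common way to skip gradient tracking.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Small hypothetical network with a Dropout layer (not the tutorial's model).
net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Dropout(0.5), nn.Linear(128, 10))
x = torch.randn(8, 784)

# Training mode: Dropout zeroes random activations, so outputs vary between calls.
net.train()
train_out_a, train_out_b = net(x), net(x)

# Evaluation mode: Dropout is a no-op, so outputs are deterministic.
net.eval()
with torch.no_grad():  # no gradients are needed for inference
    eval_out_a, eval_out_b = net(x), net(x)

print(torch.equal(train_out_a, train_out_b))  # almost certainly False
print(torch.equal(eval_out_a, eval_out_b))    # True
```

When a trained model is later used for prediction, calling `model.eval()` first keeps Dropout from perturbing the outputs.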


## Save & Load The Model:
Now that the model has been trained, we will save it and load it again for future use.

In [40]:
print("printing our model: \n\n", model)

printing our model: 

 Sequential(
  (0): Linear(in_features=784, out_features=512, bias=True)
  (1): ReLU()
  (2): Dropout(p=0.3)
  (3): Linear(in_features=512, out_features=128, bias=True)
  (4): ReLU()
  (5): Dropout(p=0.5)
  (6): Linear(in_features=128, out_features=10, bias=True)
)


To see the weights and biases of the model:

The parameters of a PyTorch model are stored in the model's state_dict. The state_dict contains the weights and biases of each layer, and the names of these entries can be listed with <b>state_dict().keys()</b>.

Below, the weight and bias keys of every layer are printed out:

In [41]:
print("Models layer keys: \n\n", model.state_dict().keys())

Models layer keys: 

 odict_keys(['0.weight', '0.bias', '3.weight', '3.bias', '6.weight', '6.bias'])
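
The keys jump from 0 to 3 to 6 because the ReLU and Dropout layers at positions 1, 2, 4 and 5 have no learnable parameters, so they contribute nothing to the state_dict. The sketch below rebuilds the same architecture (untrained, so the snippet runs on its own), lists the shape of each entry, and counts the learnable values.

```python
from torch import nn

# Same layer stack as the tutorial's model, rebuilt here for a standalone check.
net = nn.Sequential(nn.Linear(784, 512),
                    nn.ReLU(),
                    nn.Dropout(0.3),
                    nn.Linear(512, 128),
                    nn.ReLU(),
                    nn.Dropout(0.5),
                    nn.Linear(128, 10))

# Map each state_dict key to the shape of its tensor.
# Linear weights are stored as (out_features, in_features).
shapes = {name: tuple(tensor.shape) for name, tensor in net.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)

# Total number of learnable values across all layers.
total_params = sum(p.numel() for p in net.parameters())
print(total_params)  # 468874
```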


#### Weight and Bias Values

In [42]:
for params, values in model.state_dict().items(): 
    print(params, ":", values)
    break

0.weight : tensor([[ 0.0233,  0.0113,  0.0321,  ...,  0.0232,  0.0103,  0.0278],
        [-0.0340,  0.0217, -0.0018,  ...,  0.0128,  0.0121,  0.0353],
        [-0.0092,  0.0054,  0.0285,  ...,  0.0012,  0.0117, -0.0085],
        ...,
        [-0.0331,  0.0108,  0.0326,  ..., -0.0342, -0.0133,  0.0317],
        [ 0.0123, -0.0274,  0.0345,  ..., -0.0264,  0.0374, -0.0304],
        [ 0.0237, -0.0239, -0.0043,  ...,  0.0268,  0.0203, -0.0229]])


The model's state_dict can be saved with torch.save, which also takes the path of the file to write, here <b>model.pth</b>:

In [43]:
torch.save(model.state_dict(), 'model.pth')

The saved state_dict can be loaded back with <b>torch.load()</b>, given the path of the saved file:

In [44]:
state_dict = torch.load('model.pth')
print(state_dict.keys())

odict_keys(['0.weight', '0.bias', '3.weight', '3.bias', '6.weight', '6.bias'])


To load the state dict into a new model, call <b>model.load_state_dict(state_dict)</b>.

In [45]:
model.load_state_dict(state_dict)

<b>Please Note:</b> Loading the state dict works only if the new model's architecture is exactly the same as the saved model's architecture.
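
The full round trip can be sketched as follows. This is a minimal illustration under a few assumptions: `build_model` is a hypothetical helper that recreates the architecture used above, an in-memory buffer stands in for the `model.pth` file, and `map_location="cpu"` is included to show one way of loading a checkpoint on a machine without a GPU. The last part shows that a different architecture is rejected with a `RuntimeError`.

```python
import io

import torch
from torch import nn


def build_model():
    """Hypothetical helper that rebuilds the architecture used above."""
    return nn.Sequential(nn.Linear(784, 512),
                         nn.ReLU(),
                         nn.Dropout(0.3),
                         nn.Linear(512, 128),
                         nn.ReLU(),
                         nn.Dropout(0.5),
                         nn.Linear(128, 10))


trained = build_model()  # stands in for the trained model

# Save to an in-memory buffer here; a path such as 'model.pth' works the same way.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" places the tensors on the CPU even if they were saved from a GPU.
state_dict = torch.load(buffer, map_location="cpu")

# Load the saved parameters into a freshly constructed model.
restored = build_model()
restored.load_state_dict(state_dict)

# Switch both models to evaluation mode so Dropout does not randomize the outputs.
trained.eval()
restored.eval()
x = torch.randn(2, 784)
with torch.no_grad():
    same_output = torch.equal(trained(x), restored(x))
print(same_output)  # True

# A model with a different architecture cannot load this state dict.
mismatched = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
try:
    mismatched.load_state_dict(state_dict)
    load_failed = False
except RuntimeError as err:
    load_failed = True
    print("load failed:", type(err).__name__)
```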