# Tutorial 3 - Model

In this tutorial we prepare a very simple PyTorch CNN model to classify the images we reviewed in `Tutorial 2`. Specific details are not explained in depth, as that would turn this into a separate Deep Learning tutorial, but the most crucial parts of the model are commented.

If you have no experience with PyTorch, my recommendation is to stick around just to understand how the deployment concept will be delivered. If you are interested in learning more, I heavily recommend visiting the following links (I managed to learn PyTorch with them quite quickly and complete my MSc):

* https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials
* https://deeplizard.com/learn/playlist/PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG

In [1]:
# Let us load the main packages and their components
import torch                                 # For PyTorch
import torch.nn as nn                        # Neural Network components of PyTorch
import torch.nn.functional as F              # PyTorch's functional versions of various Deep Learning components
import torch.optim as optim                  # PyTorch's optimization functionality for the training process
from torchvision import datasets, transforms # Datasets and image transformations

In [16]:
# First, select the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device('cpu') # But we will choose the CPU either way for deployment reasons
# Weights trained on the CPU load on a CPU-only deployment machine without any device remapping
# If you do not have CUDA or a GPU, you will simply keep working on the CPU; it should not be an issue
print("The following device is used: {}".format(device))

The following device is used: cpu


In [17]:
# Now we need to prepare the training and testing datasets. As this is a new project, they have to be re-downloaded.
# Alternatively, you can locate where your earlier downloaded training data was stored and re-use it.

# Loading the training set
mnist_training = datasets.MNIST(root = "./data",                    # Location where the data will be saved
                                train = True,                       # It is a training set
                                transform = transforms.ToTensor(),  # We will transform the data to actual torch Tensors
                                download = True)                    # We want to download it again


# Loading the testing set
mnist_testing = datasets.MNIST(root = "./data",                     # Same folder as above; nothing is re-downloaded
                               train = False,                       # This is the testing set, so False
                               transform = transforms.ToTensor())   # Data is converted to torch Tensors

In [18]:
# Now take a quick glance at how the data looks once it is stored as tensors
print(mnist_training.data[0])
print(mnist_training.targets[0])

tensor([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,  18,
          18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   

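As you can see, not much changed: `.data` still holds the raw pixel values from 0 to 255. This format will, however, make our lives easier when working with PyTorch. Note that `transforms.ToTensor()` is applied only when samples are read through the dataset or a dataloader, and at that point it also re-scales the pixel values into the range from 0 to 1. Centring the values further (for example with `transforms.Normalize`) would also be good practice, but for such a simple example this step is omitted.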
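To make the idea of re-scaling and centring concrete, here is a minimal sketch in plain Python. The pixel values are copied from the row printed above; `MEAN` and `STD` are assumed, purely illustrative values and were not computed from MNIST.

```python
# Raw 8-bit pixel values, copied from the row printed above
raw_pixels = [0, 3, 18, 126, 136, 175, 255]

# Step 1: re-scale from the range [0, 255] to the range [0, 1]
scaled = [p / 255 for p in raw_pixels]

# Step 2: centre the values around zero
# MEAN and STD are assumed, illustrative values (not MNIST statistics)
MEAN, STD = 0.5, 0.5
normalized = [(s - MEAN) / STD for s in scaled]

print([round(s, 3) for s in scaled])
print([round(n, 3) for n in normalized])
```

With these assumed values the darkest pixel maps to -1 and the brightest to 1, which is what "values in the middle" amounts to in practice.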

In [19]:
# Sizes of our samples:
print("The training set is {} datapoints long\nThe testing set is {} datapoints long".format(len(mnist_training), len(mnist_testing)))

The training set is 60000 datapoints long
The testing set is 10000 datapoints long


Now prepare the dataloaders that will feed the data to the model in batches. This should save on processing memory, as the model only handles one batch at a time.

In [20]:
# Choose the batch size: how many images are loaded into the network at a time
BATCH = 16

# Loader for the training data
train_loader = torch.utils.data.DataLoader(dataset = mnist_training,  # Specify our training data
                                           batch_size = BATCH,        # Specify the batch size
                                           shuffle = True)            # Shuffle the data every epoch for better training

# Loader for the testing data
test_loader = torch.utils.data.DataLoader(dataset = mnist_testing,    # Specify our testing data
                                          batch_size = BATCH,         # Specify the batch size
                                          shuffle = False)            # No need to shuffle for evaluation
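As a quick sanity check, the number of batches each loader yields per epoch can be worked out by hand. The sketch below uses only the dataset sizes printed earlier and the batch size chosen above, and assumes the default `drop_last=False` behaviour of `DataLoader`, where a final partial batch is kept.

```python
import math

BATCH = 16                 # batch size chosen in the tutorial
TRAIN_SIZE = 60000         # training samples, as printed above
TEST_SIZE = 10000          # testing samples, as printed above

# With drop_last=False a loader yields ceil(N / batch_size) batches per epoch
train_steps = math.ceil(TRAIN_SIZE / BATCH)
test_steps = math.ceil(TEST_SIZE / BATCH)

# Size of the final training batch (equal to BATCH when N divides evenly)
last_train_batch = TRAIN_SIZE - (train_steps - 1) * BATCH

print(train_steps, test_steps, last_train_batch)  # 3750 625 16
```

The value 3750 is the step count that `len(train_loader)` reports during training.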

Now that everything is prepared for feeding the data, we can create a model that will predict the classes. It will be a simple 3-layer convolutional network: two convolutional layers followed by one fully connected layer.

In [21]:
# Create a network
class Network(nn.Module):
    def __init__(self):                                             # Define the layers when the object is created
        super(Network, self).__init__()                             # Initialise the parent nn.Module class
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels = 1,
                      out_channels = 16,
                      kernel_size = 5,
                      stride = 1,
                      padding = 2),                                     # First convolutional layer
            nn.ReLU(),                                                  # ReLU activation applied
            nn.MaxPool2d(kernel_size=2, 
                         stride=2)                                      # Pooling applied
            )
        
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels = 16,
                      out_channels = 32,
                      kernel_size = 5,
                      stride = 1,
                      padding = 2),                                     # Second convolutional layer
            nn.ReLU(),                                                  # ReLU activation applied
            nn.MaxPool2d(kernel_size = 2,
                         stride = 2)                                    # Pooling applied 
            )
        
        self.fc1 = nn.Linear(in_features = 7*7*32,                  # For more information on calculations: https://www.deeplearningwizard.com/deep_learning/practical_pytorch/pytorch_convolutional_neuralnetwork/
                             out_features = 10)                     # A fully connected output layer (Dense layer)
    
    # A method for passing data through the network
    # t is the input tensor (a batch of images)
    def forward(self, t):
        out = self.conv1(t)
        out = self.conv2(out)
        out = out.reshape(out.size(0), -1) # Flatten the feature maps before they go to the Dense layer
        out = self.fc1(out)
        return out
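The `7*7*32` passed to the fully connected layer follows from how each layer changes the image size. Here is a small sketch of that arithmetic; `out_size` is a helper name introduced only for this illustration, and it applies the usual output-size formula for convolution and pooling layers without dilation.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                              # MNIST images are 28x28
size = out_size(size, kernel=5, stride=1, padding=2)   # conv1 keeps 28
size = out_size(size, kernel=2, stride=2)              # pool1 halves to 14
size = out_size(size, kernel=5, stride=1, padding=2)   # conv2 keeps 14
size = out_size(size, kernel=2, stride=2)              # pool2 halves to 7

flat_features = size * size * 32                       # 32 channels after conv2
print(size, flat_features)  # 7 1568
```

A padding of 2 with a 5x5 kernel keeps the spatial size unchanged, so only the two pooling layers shrink the image, from 28 to 14 to 7.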

The network class has been defined. Now it is time to instantiate a network object, but a few other things also have to be set up, such as an optimizer for training, a learning rate and similar. Let's do all of that now.

In [22]:
# Set some hyperparameters
EPOCHS = 10                       # Number of epochs to train the network for
LEARNING_RATE = 0.001             # The learning rate the optimizer will use

In [23]:
# Now create a network object
model = Network().to(device)
# The model is moved to the chosen device (here the CPU), so that it can process the data there

In [24]:
# Inspect the model to check that it was created as expected
model

Network(
  (conv1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=1568, out_features=10, bias=True)
)

In [25]:
# Now set a few conditions on how the network will learn
CRITERION = nn.CrossEntropyLoss()                 # Loss function (criterion) for classification
OPTIMIZER = torch.optim.Adam(model.parameters(),  # Optimizer the network will use
                             lr = LEARNING_RATE)
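`nn.CrossEntropyLoss` takes the raw scores (logits) of the network, which is why `forward` ends with a plain linear layer and no softmax. Here is a minimal plain-Python sketch of what the loss computes for a single sample; the function name and example scores are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

scores = [0.0, 5.0, 0.0]                      # illustrative scores for 3 classes

confident = cross_entropy(scores, target=1)   # true class has the highest score
wrong = cross_entropy(scores, target=0)       # true class has a low score
uniform = cross_entropy([0.0, 0.0, 0.0], target=2)

print(round(confident, 4), round(wrong, 4), round(uniform, 4))
```

The loss is small when the true class gets the largest score and grows as that score falls behind the others. For a batch, PyTorch averages these per-sample values by default.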

In [26]:
# Now prepare the training loop for the network. Training loops are often structured similarly

# First put the network in training mode
model.train()

# Set conditions for the network to train
# Number of steps (batches) per epoch
total_step = len(train_loader)

# Train epoch by epoch
for epoch in range(EPOCHS):
    epoch_loss = 0.0
    for i, (images, labels) in enumerate(train_loader): # images - data // labels - targets (classes)
        images = images.to(device) # Put the values on the device PyTorch will process them on (here it is the CPU)
        labels = labels.to(device) # Put the values on the device PyTorch will process them on (here it is the CPU)
        
        # Forward pass
        outputs = model(images)            # Let the network pass the information through it
        loss = CRITERION(outputs, labels)  # Calculate the loss of predictions the network makes
        epoch_loss += loss.item()          # Accumulate the epoch loss as a plain float, so no computation graph is kept
        
        # Backward and optimize
        OPTIMIZER.zero_grad()   # Reset the gradients left over from the previous step
        loss.backward()         # Backpropagate the loss
        OPTIMIZER.step()        # Update the weights
        
        # Give a bit of a visual cue on what is going on with the network to the one who trains it
        if (i + 1) % 500 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch + 1, EPOCHS, i + 1, total_step, loss.item()))
        if (i + 1 == total_step):
            print("Total Loss per Epoch: {}\n".format(round(epoch_loss, 2)))

Epoch [1/10], Step [500/3750], Loss: 0.1854
Epoch [1/10], Step [1000/3750], Loss: 0.0292
Epoch [1/10], Step [1500/3750], Loss: 0.2691
Epoch [1/10], Step [2000/3750], Loss: 0.0983
Epoch [1/10], Step [2500/3750], Loss: 0.0022
Epoch [1/10], Step [3000/3750], Loss: 0.1077
Epoch [1/10], Step [3500/3750], Loss: 0.1953
Total Loss per Epoch: 465.56

Epoch [2/10], Step [500/3750], Loss: 0.0047
Epoch [2/10], Step [1000/3750], Loss: 0.0017
Epoch [2/10], Step [1500/3750], Loss: 0.0619
Epoch [2/10], Step [2000/3750], Loss: 0.0975
Epoch [2/10], Step [2500/3750], Loss: 0.0103
Epoch [2/10], Step [3000/3750], Loss: 0.0275
Epoch [2/10], Step [3500/3750], Loss: 0.0001
Total Loss per Epoch: 173.17

Epoch [3/10], Step [500/3750], Loss: 0.0048
Epoch [3/10], Step [1000/3750], Loss: 0.0027
Epoch [3/10], Step [1500/3750], Loss: 0.0093
Epoch [3/10], Step [2000/3750], Loss: 0.0595
Epoch [3/10], Step [2500/3750], Loss: 0.0088
Epoch [3/10], Step [3000/3750], Loss: 0.0248
Epoch [3/10], Step [3500/3750], Loss: 0.000

As can be seen, the loss has been decreasing over time. Now we can test the accuracy of our predictions.
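The per-step losses printed above are noisy because each is computed on a single batch of 16 images, so the epoch totals are the easier numbers to compare. Dividing them by the number of steps gives an average loss per batch. A small sketch using only the first two epoch totals from the log above:

```python
TOTAL_STEPS = 3750                  # steps per epoch, from the log above
epoch_totals = [465.56, 173.17]     # "Total Loss per Epoch" for epochs 1 and 2

# Average loss per batch in each epoch
averages = [round(total / TOTAL_STEPS, 4) for total in epoch_totals]

print(averages)  # [0.1241, 0.0462]
```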

In [27]:
# Put the model into the evaluation mode
model.eval()

with torch.no_grad(): # Disable gradient tracking, as it is not needed for evaluation
    correct = 0       # Correct predictions
    total = 0         # Total predictions
    
    # Now go through the testing set
    for images, labels in test_loader:
        images = images.to(device)      # images - data // labels - targets (classes)
        labels = labels.to(device)      # Put everything to a device
        outputs = model(images)
        _, predicted = torch.max(outputs, 1) # The class with the highest score is the prediction
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))

Test Accuracy of the model on the 10000 test images: 99.05 %
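The accuracy calculation boils down to picking the class with the highest score for each image and counting how often it matches the label. A plain-Python sketch of the same idea, with made-up scores for four samples and three classes:

```python
def predict(logits_row):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(logits_row)), key=lambda k: logits_row[k])

# Made-up network outputs, for illustration only
batch_logits = [
    [2.0, 0.1, -1.0],   # predicted class 0
    [0.3, 1.5, 0.2],    # predicted class 1
    [0.0, 0.2, 3.1],    # predicted class 2
    [1.2, 0.9, 0.1],    # predicted class 0
]
labels = [0, 1, 2, 1]   # the last sample is misclassified

predicted = [predict(row) for row in batch_logits]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)

print(predicted, accuracy)  # [0, 1, 2, 0] 75.0
```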


We have created a model with a 99.05% accuracy rate on the test set. Seems quite good. Now it is time to save it and move on to making it deployable to the real world.

In [28]:
# Save the model
torch.save(model.state_dict(), 'model.ckpt')
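Since only the `state_dict` is saved, loading it back requires first creating an object with the same architecture. Here is a minimal sketch of that round trip. To stay short it uses a tiny `nn.Linear` stand-in and a temporary file rather than the tutorial's `Network` and `model.ckpt`, so treat those names as illustrative.

```python
import os
import tempfile

import torch
import torch.nn as nn

# A tiny stand-in model (hypothetical, only to keep the example short)
source = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.ckpt")
    torch.save(source.state_dict(), path)            # save only the weights

    # Create a fresh object with the same architecture, then load the weights
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location="cpu")     # remap tensors to the CPU if they were saved elsewhere
    restored.load_state_dict(state)

restored.eval()  # switch to evaluation mode before making predictions

same_weights = torch.equal(source.weight, restored.weight)
print(same_weights)  # True
```

Passing `map_location` lets a checkpoint saved on a GPU machine load on a CPU-only one.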

In [29]:
# Essentially what is being saved is the set of learned weights and biases of each layer
model.state_dict()

# Next time we will reload them into a new model

OrderedDict([('conv1.0.weight',
              tensor([[[[ 1.1127e-01, -2.8221e-01, -4.1433e-01, -3.0182e-01,  1.2833e-01],
                        [-6.2853e-01, -4.6515e-01, -1.8315e-01, -2.7313e-02,  3.4319e-01],
                        [-3.3926e-01, -2.1037e-01, -1.3936e-01,  1.7931e-01,  1.0549e-01],
                        [-4.2705e-01, -1.3187e-01,  1.5330e-02,  3.8126e-01,  1.7717e-01],
                        [ 1.9778e-01,  2.3199e-02,  2.3313e-01,  1.2574e-01,  2.0287e-02]]],
              
              
                      [[[ 2.5177e-01,  3.9931e-02,  1.7255e-01, -2.3502e-01, -9.1032e-02],
                        [-8.6563e-02,  8.0684e-02,  1.9807e-01, -7.1137e-02, -7.4522e-01],
                        [ 6.0676e-02, -4.1934e-01,  1.0396e-01,  1.2970e-01, -2.4970e-01],
                        [-1.4771e-01, -2.8255e-01, -9.8548e-02,  2.6374e-01,  4.1852e-01],
                        [ 2.6158e-01,  3.3006e-01,  1.0052e-02, -1.5940e-01,  9.1756e-02]]],
              
         