# PyTorch 101: Building Neural Networks 

In this tutorial, we are going to build a ResNet-based neural network to classify the CIFAR-10 dataset. Before we begin, let me say that the purpose of this tutorial is not to achieve the best possible accuracy on the task, but to show you how to use PyTorch.

In [0]:
import torch
import torch.nn as nn
import torch.utils.data
import torch.optim as optim
import numpy as np
import pickle
import os
from PIL import Image
import random
import time
import torchvision

cuda_available = torch.cuda.is_available()


## A Simple Neural Network
In this tutorial, we will be implementing a very simple neural network.   

### Building the Network
`torch.nn.Module` is the cornerstone of designing neural networks in PyTorch. By subclassing it, you can implement a single layer (a fully connected layer, a convolutional layer, a pooling layer, an activation function) or an entire neural network. (From now on, I'll refer to it simply as `nn.Module`.)

Multiple `nn.Module` objects can be strung together to form a bigger `nn.Module` object, which is how we can implement a neural network using many layers. In fact, `nn.Module` can be used to represent an arbitrary function f in PyTorch.

The `nn.Module` class has two methods that you have to override.
1. `__init__` function. This function is invoked when you create an instance of the `nn.Module`. Here you define the various parameters of a layer, such as the number of filters and the kernel size for a convolutional layer, or the dropout probability for a dropout layer.

2. `forward` function. This is where you define how your output is computed. This function doesn't need to be called explicitly; it is run by calling the `nn.Module` instance like a function with the input as its argument.

In [20]:
class MyLayer(nn.Module):
  def __init__(self, param):
    super().__init__()
    self.param = param 
  
  def forward(self, x):
    return x * self.param
  
myLayerObject = MyLayer(5)
output = myLayerObject(torch.Tensor([5, 4, 3]))    # calls forward implicitly
print(output)

tensor([25., 20., 15.])
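
Note that `param` in `MyLayer` is a plain Python number, so it is a fixed setting rather than something an optimiser can learn. As a minimal sketch of the difference, the hypothetical `ScaleLayer` below (a name made up for illustration, not part of the tutorial's network) wraps its value in `nn.Parameter`, which registers it as a trainable parameter of the module.

```python
import torch
import torch.nn as nn


class ScaleLayer(nn.Module):  # hypothetical layer, for illustration only
    def __init__(self, init_scale):
        super().__init__()
        # Wrapping a tensor in nn.Parameter registers it with the module,
        # so it shows up in .parameters() and can be updated by an optimiser.
        self.scale = nn.Parameter(torch.tensor(float(init_scale)))

    def forward(self, x):
        return x * self.scale


layer = ScaleLayer(5)
output = layer(torch.tensor([5.0, 4.0, 3.0]))

# Names of all registered trainable parameters
param_names = [name for name, _ in layer.named_parameters()]

print(output)        # same values as before (25, 20, 15), now tracked by autograd
print(param_names)   # ['scale']
```

Built-in layers such as `nn.Conv2d` and `nn.Linear` register their weights in the same way, which is why we never have to list them by hand.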


Another widely used and important class is `nn.Sequential`. When instantiating this class, we pass it `nn.Module` objects in a particular sequence. The object returned by `nn.Sequential` is itself an `nn.Module` object. When this object is run with an input, it sequentially runs the input through all the `nn.Module` objects we passed to it, in the same order as we passed them.

In [0]:
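combinedNetwork = nn.Sequential(MyLayer(5), MyLayer(10))

# The input must be a tensor; a plain Python list would be repeated by `*` rather than scaled
output = combinedNetwork(torch.Tensor([3, 4]))

# equivalent to..
# out = MyLayer(5)(torch.Tensor([3, 4]))
# out = MyLayer(10)(out)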
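
The same pattern works with PyTorch's built-in layers. The sketch below chains two `nn.Linear` layers with a `nn.ReLU` in between and checks that calling the container gives the same result as calling the layers one by one. The layer sizes are arbitrary values picked for the example.

```python
import torch
import torch.nn as nn

# A small multi-layer perceptron built from built-in modules
mlp = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

x = torch.randn(3, 4)          # batch of 3 examples with 4 features each
out = mlp(x)

# nn.Sequential can be indexed, so we can run the layers by hand
manual = mlp[2](mlp[1](mlp[0](x)))

print(out.shape)                    # torch.Size([3, 2])
print(torch.allclose(out, manual))  # True
```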

Let us now start implementing our classification network. We will make use of convolutional and pooling layers, as well as a custom implemented residual block.

While PyTorch provides many layers out of the box with its `torch.nn` module, we will have to implement the residual block ourselves. Before implementing the neural network, we implement the ResNet block.

In [0]:
class ResidualBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        
        # Conv Layer 1
        self.conv1 = nn.Conv2d(
            in_channels=in_channels, out_channels=out_channels,
            kernel_size=(3, 3), stride=stride, padding=1, bias=False
        )
        self.bn1 = nn.BatchNorm2d(out_channels)
        
        # Conv Layer 2
        self.conv2 = nn.Conv2d(
            in_channels=out_channels, out_channels=out_channels,
            kernel_size=(3, 3), stride=1, padding=1, bias=False
        )
        self.bn2 = nn.BatchNorm2d(out_channels)
    
        # Shortcut connection to downsample the residual.
        # In case the output dimensions of the residual block are not the same
        # as its input, have a convolutional layer downsample the tensor
        # being brought forward, with appropriate striding and filters
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(
                    in_channels=in_channels, out_channels=out_channels,
                    kernel_size=(1, 1), stride=stride, bias=False
                ),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = nn.ReLU()(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = nn.ReLU()(out)
        return out
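
To see why the shortcut needs its own convolution, here is a small shape check. It is only a sketch: it uses two stand-alone `nn.Conv2d` layers with the same settings as the main path and the shortcut of a block with `stride=2`, rather than the `ResidualBlock` class itself, and the input size is chosen for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)   # [B, C, H, W]

# Same settings as conv1 of a block that goes from 64 to 128 channels with stride 2
main_path = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
# Same settings as the 1x1 convolution in the shortcut
shortcut = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

main_out = main_path(x)
shortcut_out = shortcut(x)

print(main_out.shape)       # torch.Size([1, 128, 16, 16])
print(shortcut_out.shape)   # torch.Size([1, 128, 16, 16])

# The shapes agree, so the element-wise addition in forward() is valid.
# Adding x directly would fail, since x is still [1, 64, 32, 32].
summed = main_out + shortcut_out
```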

Now, we can define our full network. 

In [0]:
class ResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet, self).__init__()
        
        # Initial input conv
        self.conv1 = nn.Conv2d(
            in_channels=3, out_channels=64, kernel_size=(3, 3),
            stride=1, padding=1, bias=False
        )
        self.bn1 = nn.BatchNorm2d(64)
        
        # Create blocks
        self.block1 = self._create_block(64, 64, stride=1)
        self.block2 = self._create_block(64, 128, stride=2)
        self.block3 = self._create_block(128, 256, stride=2)
        self.block4 = self._create_block(256, 512, stride=2)
        self.linear = nn.Linear(512, num_classes)
    
    # A block is just two residual blocks for ResNet18
    def _create_block(self, in_channels, out_channels, stride):
        return nn.Sequential(
            ResidualBlock(in_channels, out_channels, stride),
            ResidualBlock(out_channels, out_channels, 1)
        )

    def forward(self, x):
        out = nn.ReLU()(self.bn1(self.conv1(x)))
        out = self.block1(out)
        out = self.block2(out)
        out = self.block3(out)
        out = self.block4(out)
        out = nn.AvgPool2d(4)(out)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
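
The last three lines of `forward` are easier to follow with concrete shapes. For a 32 x 32 input, the three blocks with `stride=2` halve the spatial size three times (32, 16, 8, 4), so the tensor leaving `block4` should have shape `[B, 512, 4, 4]`. The sketch below starts from a random tensor of that assumed shape instead of running the full network.

```python
import torch
import torch.nn as nn

features = torch.randn(8, 512, 4, 4)      # assumed output of block4 for a batch of 8

pooled = nn.AvgPool2d(4)(features)        # average over each 4x4 map -> [8, 512, 1, 1]
flat = pooled.view(pooled.size(0), -1)    # flatten per example       -> [8, 512]
logits = nn.Linear(512, 10)(flat)         # one score per class       -> [8, 10]

print(pooled.shape, flat.shape, logits.shape)
```

This is why the final layer is `nn.Linear(512, num_classes)`: after pooling, each image is described by exactly 512 numbers.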

## Loading Data
We first start by downloading the CIFAR-10 dataset into the same directory as our code file.

In [0]:
!wget http://pjreddie.com/media/files/cifar.tgz
!tar xzf cifar.tgz

--2019-05-01 13:18:06--  http://pjreddie.com/media/files/cifar.tgz
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://pjreddie.com/media/files/cifar.tgz [following]
--2019-05-01 13:18:07--  https://pjreddie.com/media/files/cifar.tgz
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 168584360 (161M) [application/octet-stream]
Saving to: ‘cifar.tgz’


2019-05-01 13:18:09 (65.5 MB/s) - ‘cifar.tgz’ saved [168584360/168584360]



We now read the labels of the classes present in the CIFAR dataset.

In [0]:
data_dir = "cifar/train/"

with open("cifar/labels.txt") as label_file:
    labels = label_file.read().split()
    label_mapping = dict(zip(labels, list(range(len(labels)))))
     
    

We then write a preprocessing function that takes in a `PIL.Image` and will

1. Randomly flip the image horizontally with a probability of 0.5
2. Normalise the image with the mean and standard deviation of the CIFAR dataset
3. Reshape it from H x W x C to C x H x W.


In [0]:
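def preprocess(image):
    # np.array on a PIL image gives an array of shape H x W x C
    image = np.array(image)

    # Flip horizontally (reverse the width axis) with probability 0.5
    if random.random() > 0.5:
        image = image[:, ::-1, :]

    cifar_mean = np.array([0.4914, 0.4822, 0.4465]).reshape(1,1,-1)
    cifar_std  = np.array([0.2023, 0.1994, 0.2010]).reshape(1,1,-1)

    # The statistics above are for pixel values in [0, 1], so rescale from [0, 255] first
    image = (image / 255.0 - cifar_mean) / cifar_std

    # H x W x C -> C x H x W
    image = image.transpose(2,0,1)
    return image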
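
The three steps listed above are easier to check on a tiny array than on a real image. The sketch below uses a made-up 2 x 3 "image" with 3 channels, and assumes the layout NumPy gives for a PIL RGB image (height, width, channels, with values in the range 0 to 255).

```python
import numpy as np

# Toy image: height 2, width 3, 3 channels (H x W x C)
image = np.arange(2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)

# 1. Horizontal flip: reverse the width axis (axis 1)
flipped = image[:, ::-1, :]

# 2. Normalise each channel; the statistics are for values scaled to [0, 1]
mean = np.array([0.4914, 0.4822, 0.4465]).reshape(1, 1, -1)
std = np.array([0.2023, 0.1994, 0.2010]).reshape(1, 1, -1)
normalised = (flipped / 255.0 - mean) / std

# 3. Move channels first: H x W x C -> C x H x W
chw = normalised.transpose(2, 0, 1)

print(image.shape)   # (2, 3, 3)
print(chw.shape)     # (3, 2, 3)
```

The first column of `flipped` is the last column of `image`, which is what a left-to-right mirror should do.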
    


### Input Format

The input format for images is `[B C H W]`, where `B` is the batch size, `C` is the number of channels, `H` is the height and `W` is the width.

PyTorch provides two classes for building input pipelines to load data.

1. `torch.utils.data.Dataset`, which we will just refer to as the `Dataset` class from now on.
2. `torch.utils.data.DataLoader`, which we will just refer to as the `DataLoader` class from now on.


### torch.utils.data.Dataset

`Dataset` is a class that loads the data and lets you retrieve examples by index, so that you can iterate over it. It also lets you incorporate data augmentation techniques into the input pipeline.

If you want to create a `Dataset` object for your data, you need to override three functions.

1. `__init__` function. Here, you define things related to your dataset. Most importantly, the location of your data. You can also define the various data augmentations you want to apply.
2. `__len__` function. Here, you just return the length of the dataset.
3. `__getitem__` function. The function takes as an argument an index `i` and returns a data example. This function is called with a different `i` for every example that the `DataLoader` fetches during our training loop.

Here is an implementation of our dataset object for the CIFAR dataset.


In [0]:
class Cifar10Dataset(torch.utils.data.Dataset):
    def __init__(self, data_dir, data_size = 0, transforms = None):
        files = os.listdir(data_dir)
        files = [os.path.join(data_dir,x) for x in files]
        
        
        if data_size < 0 or data_size > len(files):
            raise ValueError("Data size should be between 0 to number of files in the dataset")
        
        if data_size == 0:
            data_size = len(files)
        
        self.data_size = data_size
        self.files = random.sample(files, self.data_size)
        self.transforms = transforms
        
    def __len__(self):
        return self.data_size
    
    def __getitem__(self, idx):
        image_address = self.files[idx]
        image = Image.open(image_address)
        image = preprocess(image)
        label_name = image_address[:-4].split("_")[-1]
        label = label_mapping[label_name]
        
        image = image.astype(np.float32)
        
        if self.transforms:
            image = self.transforms(image)

        return image, label
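
If the file handling above obscures the pattern, here is the same three-method structure on a toy problem. `SquaresDataset` is a hypothetical class written for this illustration; it keeps nothing on disk and simply pairs each index with its square.

```python
import torch
import torch.utils.data


class SquaresDataset(torch.utils.data.Dataset):  # hypothetical toy dataset
    def __init__(self, size):
        # Store whatever is needed to find the data later
        self.size = size

    def __len__(self):
        # Number of examples in the dataset
        return self.size

    def __getitem__(self, idx):
        # Build and return one (input, label) pair for the given index
        x = torch.tensor([float(idx)])
        y = idx * idx
        return x, y


dataset = SquaresDataset(5)
sample, label = dataset[3]      # indexing calls __getitem__(3)

print(len(dataset))   # 5
print(sample, label)  # tensor([3.]) 9
```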
        

### torch.utils.data.DataLoader
The `DataLoader` class facilitates
1. Batching of data
2. Shuffling of data
3. Loading multiple examples at the same time using worker processes
4. Prefetching, that is, while the GPU crunches the current batch, the `DataLoader` can load the next batch into memory in the meantime. This means the GPU doesn't have to wait for the next batch, which speeds up training.

You instantiate a `DataLoader` object with a `Dataset` object. Then you can iterate over the `DataLoader` instance just like you do with any Python iterable.


In [0]:
trainset = Cifar10Dataset(data_dir = "cifar/train/", transforms=None)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)


testset = Cifar10Dataset(data_dir = "cifar/test/", transforms=None)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=True, num_workers=2)
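
To see what iterating over a `DataLoader` yields without downloading anything, the sketch below wraps ten made-up examples in a `TensorDataset` and loops over them in batches of four. It uses `num_workers=0` (loading in the main process) only to keep the example simple.

```python
import torch
import torch.utils.data

# Ten toy examples: one feature each, labelled 0..9
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)   # shape [10, 1]
targets = torch.arange(10)                                      # shape [10]
dataset = torch.utils.data.TensorDataset(features, targets)

loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = []
seen = []
for inputs, labels in loader:
    # Each iteration yields one batch: inputs is [B, 1], labels is [B]
    batch_sizes.append(inputs.shape[0])
    seen.extend(labels.tolist())

print(batch_sizes)    # [4, 4, 2] (the last batch holds the leftovers)
print(sorted(seen))   # every example appears exactly once per epoch
```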


## Training and Evaluation

We now define an optimiser for our training. We use a cross entropy loss with a momentum-based SGD optimisation algorithm. Our learning rate is decayed by a factor of 0.1 at the 150th and 200th epochs.

For this we use the `torch.optim` module, which provides us with the `SGD` class that implements mini-batch stochastic gradient descent with momentum. We pass `clf.parameters()` to `SGD`. `clf.parameters()` is an iterator over all the trainable parameters of our network, and by passing them to `SGD` we make sure that at each step, `SGD` updates them.

We also use the `optim.lr_scheduler.MultiStepLR` class. We pass our optimiser object to this class. By calling `step` on an object of this class, we make sure that the learning rate is updated for the optimiser accordingly.

In [0]:
clf = ResNet()
if cuda_available:
    clf = clf.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(clf.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 200], gamma=0.1)
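
To make the schedule concrete, the sketch below records the learning rate over 250 simulated epochs with the same `milestones` and `gamma`. The model is a throwaway `nn.Linear` and no real training happens; `optimizer.step()` is called only so that the optimiser and scheduler are stepped in the order PyTorch expects.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)   # placeholder model, not the ResNet above
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 200], gamma=0.1)

lrs = []
for epoch in range(250):
    # Learning rate that would be used during this epoch
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()      # no gradients exist, so this changes nothing
    scheduler.step()      # advance the schedule by one epoch

print(lrs[0], lrs[149])   # 0.1 for the first 150 epochs
print(lrs[150])           # roughly 0.01 from epoch 150
print(lrs[200])           # roughly 0.001 from epoch 200
```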

We finally train for 10 epochs. You can increase the number of epochs. This might take a while on a GPU. Again, the idea of this tutorial is to show how PyTorch works and not to attain the best accuracy.


We evaluate classification accuracy every epoch.
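
The accuracy computation boils down to taking the index of the largest score in each row and comparing it with the target. Here is a minimal sketch on four made-up score vectors for a three-class problem.

```python
import torch

# Made-up network outputs: 4 examples, 3 classes
outputs = torch.tensor([[2.0, 0.5, 0.1],
                        [0.1, 0.2, 3.0],
                        [0.3, 1.5, 0.2],
                        [1.0, 0.9, 0.8]])
targets = torch.tensor([0, 2, 0, 0])

# torch.max along dim 1 returns (values, indices); the indices are the predicted classes
_, predicted = torch.max(outputs, 1)

correct = predicted.eq(targets).sum().item()   # number of matching predictions
total = targets.size(0)
accuracy = 100.0 * correct / total

print(predicted)   # tensor([0, 2, 1, 0])
print(accuracy)    # 75.0
```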

In [0]:
for epoch in range(10):
    losses = []
    # Train
    start = time.time()
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        # The inputs and the network must be on the same device, otherwise PyTorch raises an error
        if cuda_available:
            inputs, targets = inputs.cuda(), targets.cuda()

        # Gradients accumulate across calls to backward(), so reset them
        # to zero before computing the gradients for this batch
        optimizer.zero_grad()

        outputs = clf(inputs)                 # Compute output
        loss = criterion(outputs, targets)    # Compute loss
        loss.backward()                       # Compute gradients
        optimizer.step()                      # Update clf.parameters() using the computed gradients

        losses.append(loss.item())

    # Update the learning rate once per epoch, after the optimiser steps
    scheduler.step()

    # Evaluate
    # A network has an eval mode and a train mode. Layers like Batch Norm and Dropout
    # behave differently during inference (eval()) and training (train())
    clf.eval()

    total = 0
    correct = 0

    # Run inference inside the `torch.no_grad()` context manager so that no graph
    # is created and memory is saved (we don't need graphs as we don't backprop)
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            if cuda_available:
                inputs, targets = inputs.cuda(), targets.cuda()

            outputs = clf(inputs)
            _, predicted = torch.max(outputs, 1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

        print('Epoch : %d Test Acc : %.3f' % (epoch, 100.*correct/total))
        print('--------------------------------------------------------------')
    clf.train()

Epoch : 0 Loss : 2.310 Time : 0.477 seconds 
Epoch : 100 Loss : 1.962 Time : 19.268 seconds 
Epoch : 200 Loss : 1.830 Time : 20.404 seconds 
Epoch : 300 Loss : 1.735 Time : 21.622 seconds 
Epoch : 0 Test Acc : 46.000
--------------------------------------------------------------
Epoch : 0 Loss : 1.269 Time : 0.355 seconds 
Epoch : 100 Loss : 1.335 Time : 20.406 seconds 
Epoch : 200 Loss : 1.282 Time : 20.475 seconds 
Epoch : 300 Loss : 1.236 Time : 20.828 seconds 
Epoch : 1 Test Acc : 55.000
--------------------------------------------------------------
Epoch : 0 Loss : 0.983 Time : 0.354 seconds 
Epoch : 100 Loss : 1.011 Time : 20.605 seconds 
Epoch : 200 Loss : 0.999 Time : 20.538 seconds 
Epoch : 300 Loss : 0.979 Time : 20.531 seconds 
Epoch : 2 Test Acc : 59.000
--------------------------------------------------------------
Epoch : 0 Loss : 0.788 Time : 0.364 seconds 
Epoch : 100 Loss : 0.852 Time : 20.558 seconds 
Epoch : 200 Loss : 0.852 Time : 20.571 seconds 
Epoch : 300 Loss : 

KeyboardInterrupt: ignored
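
The comment in the training loop about gradients being accumulated is worth seeing in isolation. The sketch below uses a single made-up tensor `w` instead of a network, and `w.grad.zero_()` stands in for what `optimizer.zero_grad()` does for every parameter it manages.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

# First backward pass: d/dw of sum(w * x) is x
(w * x).sum().backward()
first = w.grad.clone()          # tensor([3., 4.])

# Second backward pass without resetting: the new gradient is added to the old one
(w * x).sum().backward()
accumulated = w.grad.clone()    # tensor([6., 8.])

# Reset, then compute again: we are back to the gradient of a single pass
w.grad.zero_()
(w * x).sum().backward()
after_reset = w.grad.clone()    # tensor([3., 4.])

print(first, accumulated, after_reset)
```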