##The Resnet Research paper can be accessed from here https://arxiv.org/pdf/1512.03385v1.pdf

In [1]:
import torch
import numpy as np

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
  print('CUDA is not available.  Training on CPU ...')
else:
  print('CUDA is available!  Training on GPU ...')

CUDA is available!  Training on GPU ...


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
import logging
DRIVE_PATH = './drive/MyDrive/DynamicResNet'
DROPOUT_LOC = 'LastLayer'

#**Downloading the CIFAR10 datset and loading the data in Normalized form as torch.FloatTensor datatype and generating a validation set by dividing the training set in 80-20 ratio**
#**CIFAR10**
The CIFAR10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

Here are the classes in the dataset:
1. airplane
2. automobile
3. bird
4. cat
5. deer
6. dog
7. frog
8. horse
9. ship
10. truck

The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

More can be read from their page at https://www.cs.toronto.edu/~kriz/cifar.html

#**Image Augmentation**
In this cell, we perform some simple data augmentation by randomly flipping and cropping the given image data. We do this by defining a torchvision transform, and you can learn about all the transforms that are used to pre-process and augment data from the [PyTorch documentation](https://pytorch.org/docs/stable/torchvision/transforms.html)

In [4]:
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20
# percentage of training set to use as validation
valid_size = 0.2

# convert data to a normalized torch.FloatTensor
print('==> Preparing data..')
#Image augmentation is used to train the model
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
#Only the data is normalaized we do not need to augment the test data
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# choose the training and test datasets
train_data = datasets.CIFAR10('data', train=True,
                              download=True, transform=transform_train)
test_data = datasets.CIFAR10('data', train=False,
                             download=True, transform=transform_test)

# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, 
    sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, 
    num_workers=num_workers)

# specify the image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']


==> Preparing data..
Files already downloaded and verified
Files already downloaded and verified


#**Defining the Network Architecture**
In this section the entire Research Paper is implemented to define the Residual Network approach taken by the researchers

NOTE:

Output volume for a convolutional layer
To compute the output size of a given convolutional layer we can perform the following calculation (taken from Stanford's cs231n course):

We can compute the spatial size of the output volume as a function of the input volume size (W), the kernel/filter size (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. The correct formula for calculating how many neurons define the output_W is given by (W−F+2P)/S+1.

For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output.

In [11]:
import torch.backends.cudnn as cudnn
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Dropout

class BasicBlock(nn.Module):
  expansion = 1
  def __init__(self, in_planes, planes, stride=1):
    super(BasicBlock, self).__init__()
    self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)

    self.shortcut = nn.Sequential()
    if stride != 1 or in_planes != self.expansion*planes:
      self.shortcut = nn.Sequential(
          nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
          nn.BatchNorm2d(self.expansion*planes)
      )
  
  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.bn2(self.conv2(out))
    out += self.shortcut(x)
    out = F.relu(out)
    return out

class BottleNeck(nn.Module):
  expansion = 4

  def __init__(self, in_planes, planes, stride=1):
    super(BottleNeck, self).__init__()
    self.conv1 = nn.Conv2d(in_planes , planes, kernel_size=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)
    self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
    self.bn3 = nn.BatchNorm2d(self.expansion*planes)

    self.shortcut = nn.Sequential()
    if stride != 1 or in_planes != self.expansion*planes :
      self.shortcut = nn.Sequential(
          nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
          nn.BatchNorm2d(self.expansion*planes)
      )

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = F.relu(self.bn2(self.conv2(out)))
    out = self.bn3(self.conv3(out))
    out += self.shortcut(x)
    out = F.relu(out)
    return out

class ResNet(nn.Module):
  def __init__(self, block, num_blocks, dropout_rate, dropout_location, logger_path, num_classes=10, curr_epoch=0):
    super(ResNet, self).__init__()
    self.in_planes = 64

    #### variables we added
    """
    :param dropout_location: "last", "middle", "all"
    """
    self.dropout_rate = dropout_rate
    self.dropout_location = dropout_location
    self.curr_epoch = curr_epoch
    ####

    #### logger
    # self.logger = logging.getLogger(__name__)
    # self.logger.setLevel(logging.DEBUG)
    
    # formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # fh = logging.FileHandler(logger_path)
    # fh.setLevel(logging.DEBUG)
    # fh.setFormatter(formatter)
    # self.logger.addHandler(fh)
    ####

    self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn1 = nn.BatchNorm2d(64)
    if self.dropout_location == 'all':
      self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1, is_dropout=True)
      self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2, is_dropout=True)
      self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2, is_dropout=True)
      self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2, is_dropout=True)
    else:
      self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1, is_dropout=False)
      self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2, is_dropout=False)
      if self.dropout_location == 'middle': 
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2, is_dropout=True)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2, is_dropout=False)
      else: # self.dropout_location == 'last':
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2, is_dropout=False)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2, is_dropout=True)      
    self.linear = nn.Linear(512*block.expansion, num_classes)

  def get_dropout_rate(self):
      return self.dropout_rate

  def _make_layer(self, block, planes, num_blocks, stride, is_dropout):
    strides = [stride] + [1]*(num_blocks-1)
    layers = []
    for stride in strides:
      layers.append(block(self.in_planes, planes, stride))
      self.in_planes = planes * block.expansion
    if is_dropout:
      return nn.Sequential(*layers, Dropout(self.dropout_rate))
    return nn.Sequential(*layers)

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.layer1(out)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)
    out = self.linear(out)
    return out

def ResNet18(rate, location, path):
    return ResNet(BasicBlock, [2, 2, 2, 2], dropout_rate=rate, dropout_location=location, logger_path=path)

def ResNet34(rate, location, path):
    return ResNet(BasicBlock, [3, 4, 6, 3], dropout_rate=rate, dropout_location=location, logger_path=path)

def ResNet50(rate, location, path):
    return ResNet(Bottleneck, [3, 4, 6, 3], dropout_rate=rate, dropout_location=location, logger_path=path)

def ResNet101(rate, location, path):
    return ResNet(Bottleneck, [3, 4, 23, 3], dropout_rate=rate, dropout_location=location, logger_path=path)

def ResNet152(rate, location, path):
    return ResNet(Bottleneck, [3, 8, 36, 3], dropout_rate=rate, dropout_location=location, logger_path=path)

In [19]:
dropout_rate = 0.0
dropout_location = 'last'
logger_path = f"{DRIVE_PATH}/example.log"
net = ResNet18(dropout_rate, dropout_location, logger_path)

print(net)

if train_on_gpu:
  net = torch.nn.DataParallel(net)
  cudnn.benchmark = True

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=

#**Specifying the Loss Function and Optimizer**
We use CrossEntropyLoss as Loss function and

[Stochastic Gradient Descent](https://leon.bottou.org/publications/pdf/compstat-2010.pdf) as Optimizer with momentum and weight decay specified by the research paper of ResNet

In [20]:
import torch.optim as optim
# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()

# specify optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001)

In [21]:
def save_checkpoint(epoch, model, optimizer, loss):
    """Saves model checkpoint"""
    torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': loss.state_dict(),
                'dropout':model.get_dropout_rate(),
                }, f'{DRIVE_PATH}/{DROPOUT_LOC}/ResNet18_{model.get_dropout_rate()}.pth')

In [22]:
#### Load model checkpoint
def load_checkpoint(path, dropout_location, logger_path):
  try:
    checkpoint = torch.load(path)
    curr_epoch = checkpoint.get('epoch',0)
    dropout_rate = checkpoint.get('dropout',0.0)
    net = ResNet18(dropout_rate=dropout_rate ,curr_epoch=curr_epoch+1, dropout_location=dropout_location, logger_path=logger_path)
    #### Load model and optimizer state dictionaries
    net.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    criterion.load_state_dict(checkpoint['loss'])
    return net
  except Exception as e:
    print(e)

In [23]:
model_map = {}
for dropout_val in [0.0,0.1,0.2,0.3,0.4,0.5]:
  net_path = f'{DRIVE_PATH}/{DROPOUT_LOC}/ResNet18_{dropout_val}'
  logger_path = f"{net_path}.log"
  net = load_checkpoint(f'{net_path}.pth', 
                        dropout_location=dropout_location, logger_path=logger_path)
  if net is None:
    net = ResNet18(dropout_val, dropout_location, logger_path)
    # net.apply(weights_init) ??????????????
  else:
    print("loaded model from disk")  
  if train_on_gpu:
    net = torch.nn.DataParallel(net)
    cudnn.benchmark = True

  model_map[dropout_val] = net

[Errno 2] No such file or directory: './drive/MyDrive/DynamicResNet/LastLayer/ResNet18_0.0.pth'
[Errno 2] No such file or directory: './drive/MyDrive/DynamicResNet/LastLayer/ResNet18_0.1.pth'
[Errno 2] No such file or directory: './drive/MyDrive/DynamicResNet/LastLayer/ResNet18_0.2.pth'
[Errno 2] No such file or directory: './drive/MyDrive/DynamicResNet/LastLayer/ResNet18_0.3.pth'
[Errno 2] No such file or directory: './drive/MyDrive/DynamicResNet/LastLayer/ResNet18_0.4.pth'
[Errno 2] No such file or directory: './drive/MyDrive/DynamicResNet/LastLayer/ResNet18_0.5.pth'


#**Training Loop**
Here we train the architecture on training data and check its validation loss by using the validation set and saving the model only if there is an improvement ie decrease in the validation loss.

In [24]:
# number of epochs to train the model
n_epochs = 1

valid_loss_min = np.Inf # track change in validation loss

for epoch in range(1, n_epochs+1):
  # keep track of training and validation loss
  train_loss = 0.0
  valid_loss = 0.0
    
  ###################
  # train the model #
  ###################
  net.train()
  for batch_idx, (data, target) in enumerate(train_loader):
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
      data, target = data.cuda(), target.cuda()
    # clear the gradients of all optimized variables
    optimizer.zero_grad()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = net(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()
    # perform a single optimization step (parameter update)
    optimizer.step()
    # update training loss
    train_loss += loss.item()*data.size(0)
        
  ######################    
  # validate the model #
  ######################
  net.eval()
  for batch_idx, (data, target) in enumerate(valid_loader):
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
      data, target = data.cuda(), target.cuda()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = net(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # update average validation loss 
    valid_loss += loss.item()*data.size(0)
    
  # calculate average losses
  train_loss = train_loss/len(train_loader.sampler)
  valid_loss = valid_loss/len(valid_loader.sampler)

  epoch_info = 'Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(epoch, train_loss, valid_loss)      
  # print training/validation statistics 
  print(epoch_info)


  logging.info(epoch_info)

  # save model if validation loss has decreased
  if valid_loss <= valid_loss_min:
    print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
    valid_loss_min,
    valid_loss))
    torch.save(net.state_dict(), 'ResNet18.pt')
    valid_loss_min = valid_loss

Epoch: 1 	Training Loss: 2.407289 	Validation Loss: 2.401601
Validation loss decreased (inf --> 2.401601).  Saving model ...


#**Loading the Best Model**

In [None]:
net.load_state_dict(torch.load('ResNet18.pt'))

#**Testing Loop**
The real test of the model architecture how well does the model recognizes the image and what is the accuracy on the test data

In [None]:
# track test loss
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

net.eval()
# iterate over test data
for batch_idx, (data, target) in enumerate(test_loader):
  # move tensors to GPU if CUDA is available
  if train_on_gpu:
    data, target = data.cuda(), target.cuda()
  # forward pass: compute predicted outputs by passing inputs to the model
  output = net(data)
  # calculate the batch loss
  loss = criterion(output, target)
  # update test loss 
  test_loss += loss.item()*data.size(0)
  # convert output probabilities to predicted class
  _, pred = torch.max(output, 1)    
  # compare predictions to true label
  correct_tensor = pred.eq(target.data.view_as(pred))
  correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
  # calculate test accuracy for each object class
  for i in range(batch_size):
    label = target.data[i]
    class_correct[label] += correct[i].item()
    class_total[label] += 1

# average test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
  if class_total[i] > 0:
    print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
        classes[i], 100 * class_correct[i] / class_total[i],
        np.sum(class_correct[i]), np.sum(class_total[i])))
  else:
    print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))