# Assignment 3

# Instructions

1. You have to use only this notebook for all your code.
2. All the results and plots should be mentioned in this notebook.
3. For final submission, submit this notebook along with the report ( usual 2-4 pages, latex typeset, which includes the challenges faces and details of additional steps, if any)
4. Marking scheme
    -  **60%**: Your code should be able to detect bounding boxes using resnet 18, correct data loading and preprocessing. Plot any 5 correct and 5 incorrect sample detections from the test set in this notebook for both the approached (1 layer and 2 layer detection), so total of 20 plots.
    -  **20%**: Use two layers (multi-scale feature maps) to detect objects independently as in SSD (https://arxiv.org/abs/1512.02325).  In this method, 1st detection will be through the last layer of Resnet18 and the 2nd detection could be through any layer before the last layer. SSD uses lower resolution layers to detect larger scale objects. 
    -  **20%**: Implement Non-maximum suppression (NMS) (should not be imported from any library) on the candidate bounding boxes.
    
5. Report AP for each of the three class and mAP score for the complete test set.

In [15]:
from __future__ import division, print_function, unicode_literals
import numpy as np
import torch
import os
import cv2
import copy
import time
import random
from PIL import Image
import torch.nn as nn
import torch.utils.data
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from torch.utils.data.sampler import SubsetRandomSampler
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
%matplotlib inline
plt.ion()
# Import other modules if required
# Can use other libraries as well

print(torch.__version__)
resnet_input = 224 #size of resnet18 input images

1.0.1.post2


In [16]:
# Choose your hyper-parameters using validation data
batch_size = 128
num_epochs = 5
learning_rate =  0.001
hyp_momentum = 0.9
validation_split = 0.1

## Build the data
Use the following links to locally download the data:
<br/>Training and validation:
<br/>http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
<br/>Testing data:
<br/>http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
<br/>The dataset consists of images from 20 classes, with detection annotations included. The JPEGImages folder houses the images, and the Annotations folder has the object-wise labels for the objects in one xml file per image. You have to extract the object information, i.e. the [xmin, ymin] (the top left x,y co-ordinates) and the [xmax, ymax] (the bottom right x,y co-ordinates) of only the objects belonging to the three classes(aeroplane, bottle, chair). For parsing the xml file, you can import xml.etree.ElementTree for you. <br/>
<br/> Organize the data as follows:
<br/> For every image in the dataset, extract/crop the object patch from the image one by one using their respective co-ordinates:[xmin, ymin, xmax, ymax], resize the image to resnet_input, and store it with its class label information. Do the same for training/validation and test datasets. <br/>
##### Important
You also have to collect data for an extra background class which stands for the class of an object which is not a part of any of the 20 classes. For this, you can crop and resize any random patches from an image. A good idea is to extract patches that have low "intersection over union" with any object present in the image frame from the 20 Pascal VOC classes. The number of background images should be roughly around those of other class objects' images. Hence the total classes turn out to be four. This is important for applying the sliding window method later.


In [17]:
classes = ('__background__',
           'aeroplane',
           'bottle',
           'chair'
           )


In [34]:
def get_iou(boxA, boxB):
    # print(boxA, boxB)
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])

    # compute the area of intersection rectangle
    interArea = max(0.0, xB - xA + 1.0) * max(0.0, yB - yA + 1.0)

    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = (boxA[2] - boxA[0] + 1.0) * (boxA[3] - boxA[1] + 1.0)
    boxBArea = (boxB[2] - boxB[0] + 1.0) * (boxB[3] - boxB[1] + 1.0)

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the interesection area
    iou = 1.0*interArea / float(boxAArea + boxBArea - interArea)

    # return the intersection over union value
    return iou

In [19]:
class voc_dataset(torch.utils.data.Dataset): # Extend PyTorch's Dataset class
    def __init__(self, root_dir, train, transform=None):
        # Begin
        self.root_dir = root_dir
        self.transform = transform
        self.allowed_labels = ['aeroplane', 'bottle', 'chair']
        self.dataset = []
        self.train = train
        self.make_dataset()
    
    def make_dataset(self):
        images_folder = os.path.join(self.root_dir, 'VOC2007/JPEGImages/')
        objects_folder = os.path.join(self.root_dir, 'VOC2007/Annotations/')
        bg_entries = []
        for image_name in os.listdir(images_folder):
            # print(image_name)
            data_coords = []
            image_file = os.path.join(images_folder, image_name)
            objects_file = os.path.join(objects_folder, image_name.split(".")[0]+'.xml')
            tree = ET.parse(objects_file)
            # img = cv2.imread(image_file)
            img = Image.open(image_file).convert('RGB')
            root = tree.getroot()
            # iterate over all objects in the image
            for obj in root.findall('object'):
                label = obj.find('name').text
                # skip this object if it isnt one of the required ones
                if label not in self.allowed_labels:
                    continue
                bndbox = obj.find('bndbox')
                coord = []
                for cd in bndbox:
                    coord.append(int(cd.text))
                # coord is in order: xmin, ymin, xmax, ymax
                
                # exract the object image from the complete image
                object_img = img.crop(tuple(coord))
                if self.transform:
                    object_img = self.transform(object_img)
                
                # save the object and the its label in our dataset
                label_vec = np.array([i==classes.index(label) for i in range(4)]).astype(int)
                data_entry = {'image': object_img, 'label':classes.index(label)}
                self.dataset.append(data_entry)
                data_coords.append(coord)
                # print(label)
            
            # get random backgrounds from the image
            width = int(root.find('./size/width').text)
            height = int(root.find('./size/height').text)
            maxTries = 4
            try:
                while(maxTries>0):
                    maxTries -= 1
                    isValidBG = True
                    x1 = random.randint(0, width-resnet_input-1)
                    y1 = random.randint(0, height-resnet_input-1)
                    x2, y2 = x1+resnet_input-1, y1+resnet_input-1
                    bg_image = img.crop((x1, y1, x2, y2))
                    if self.transform:
                        bg_image = self.transform(bg_image)
                    # check for IoU match of the backgroud with extracted object images 
                    for data_c in data_coords:
                        if get_iou([x1,y1,x2,y2], data_c) >= 0.5:
                            isValidBG = False
                            break
                    if isValidBG == False:
                        continue
                    bg_label = '__background__'
                    label_vec = np.array([i==classes.index(bg_label) for i in range(4)]).astype(int)
                    bg_entries.append({'image':bg_image, 'label':classes.index(label)})
                    break
            except ValueError:
                pass
        
        # randomly pick the required number of background images
        random.shuffle(bg_entries)
        numToAdd = int(len(self.dataset)/len(self.allowed_labels))+1
        self.dataset = self.dataset + bg_entries[:numToAdd]
        random.shuffle(self.dataset)
        
    def __len__(self):
        # Begin
        return len(self.dataset)
        
    def __getitem__(self, idx):
       # Begin
        return self.dataset[idx]
    


In [10]:
def build_dataset():
    # Begin
    dataset = voc_dataset('./trial', 0)
    print(len(dataset))

In [164]:
# to run the build dataset funcion
build_dataset()

6


## Train the netwok
<br/>You can train the network on the created dataset. This will yield a classification network on the 4 classes of the VOC dataset. 

In [20]:
composed_transform = transforms.Compose([transforms.Resize((resnet_input,resnet_input)),
                                         transforms.RandomHorizontalFlip(),
                                         transforms.ToTensor(),])


In [12]:
train_dataset = voc_dataset(root_dir='./train', train=True, transform=composed_transform) # Supply proper root_dir
test_dataset = voc_dataset(root_dir='./test', train=False, transform=composed_transform) # Supply proper root_dir

# to split training in train and val sets
dataset_size = len(train_dataset)
indices = list(range(dataset_size))
val_size = int(np.floor(validation_split * dataset_size))
train_indices, val_indices = indices[val_size:], indices[:val_size]

# Creating PT data samplers and loaders:
train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)

tr_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, 
                                           sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                                sampler=valid_sampler)


# train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

In [13]:
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
data_loaders = {'train': tr_loader, 'val':val_loader, 'test':test_loader}

### Fine-tuning
Use the pre-trained network to fine-tune the network in the following section:

In [21]:
resnet18 = models.resnet18(pretrained=True)

resnet18.fc = torch.nn.Linear(resnet18.fc.in_features, 4)

print(resnet18)
# Add code for using CUDA here
# CUDA has not been used

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Co

In [22]:
criterion = torch.nn.CrossEntropyLoss()
# Update if any errors occur
optimizer = torch.optim.SGD(resnet18.parameters(), learning_rate, hyp_momentum)

In [31]:
#One Layer Detection
def train(model, dataloaders, criterion, optimizer, num_epochs):
    # Begin
    val_acc_history = []
    data_size = {'train':dataset_size - val_size, 'val':val_size}
    
    print(len(dataloaders['val'].dataset))
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        
        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            print(phase)
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for item in dataloaders[phase]:

                inputs = item['image']
                labels = item['label']
                
                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                
                    # Get model outputs and calculate loss
                    outputs = model(Variable(inputs))
                    loss = criterion(outputs, Variable(labels))
                    
                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in trainig phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
                
            epoch_loss = running_loss / data_size[phase]
            epoch_acc = (1.0 * running_corrects.item()) / data_size[phase]

            # print validation phase statistics
            if phase == 'val':
                print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)
                
            torch.save(model.state_dict(), 'one_layer_models/epoch_'+str(epoch)+'.wts')
        
        print () 
    
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    
    # test accuracy
    model.eval()
    running_loss = 0.0
    running_corrects = 0
    
    for item in dataloaders['test']:
        inputs = item['image']
        labels = item['label']
        
        with torch.set_grad_enabled(False):
            outputs = model(Variable(inputs))
            loss = criterion(outputs, Variable(labels))
            _, preds = torch.max(outputs, 1)
            
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)
        
    epoch_loss = running_loss / len(dataloaders[phase].dataset)
    epoch_acc = (1.0 * running_corrects.item()) / len(dataloaders['test'].dataset)

    print('{} loss: {:.4f} Acc: {:.4f}'.format('test', epoch_loss, epoch_acc))
    print()
    
    # save the best model weights
    torch.save(model.state_dict(), 'one_layer_best_model.wts')
    
    return model, val_acc_history
        
    
    

In [54]:
%time train(resnet18, data_loaders, criterion, optimizer, num_epochs)

2997
Epoch 1/5
train
val
val Loss: 0.4186 Acc: 0.8763

Epoch 2/5
train
val
val Loss: 0.2217 Acc: 0.9298

Epoch 3/5
train
val
val Loss: 0.1766 Acc: 0.9365

Epoch 4/5
train
val
val Loss: 0.1600 Acc: 0.9465

Epoch 5/5
train
val
val Loss: 0.1566 Acc: 0.9431

Best val Acc: 0.946488
test loss: 0.2097 Acc: 0.9206

CPU times: user 1h 10min 27s, sys: 8min 4s, total: 1h 18min 32s
Wall time: 41min 50s


(ResNet(
   (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
   (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
   (relu): ReLU(inplace)
   (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
   (layer1): Sequential(
     (0): BasicBlock(
       (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
       (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
       (relu): ReLU(inplace)
       (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
       (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     )
     (1): BasicBlock(
       (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
       (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
       (relu): ReLU(inplace)


In [23]:
# two layer model v1 architecture
class SSDTypeResnet(nn.Module):
    def __init__(self, resnet18):
        super(SSDTypeResnet, self).__init__()
        rc = list(resnet18.children())
        self.tillLayer2resnet = nn.Sequential(*rc[:6])
        self.avgpool2 = nn.AdaptiveAvgPool2d((1, 1))
        self.fc2 = torch.nn.Linear(128, 4)
        self.afterLayer2resnet = nn.Sequential(*rc[6:8])
        self.avgpool = rc[8]
        self.fc = rc[9]
        print(self.tillLayer2resnet)
        print(self.afterLayer2resnet)
        
 
    def forward(self, x):
        x = self.tillLayer2resnet(x)
        
        # added a prediction from layer 2 as well
        y = self.avgpool2(x)
        y = y.view(y.size(0), -1)
        out2 = self.fc2(y)
        
        x = self.afterLayer2resnet(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        out1 = self.fc(x)

        return out1, out2


ssdResnet = SSDTypeResnet(resnet18) 

# aT = resnet18.state_dict()['conv1.weight']
# bT = ssdResnet.state_dict()['tillLayer2resnet.0.weight']

print(ssdResnet)

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace)
  (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, ker

In [26]:
# two layer model v2 architecture
class SSDTypeResnet2(nn.Module):
    def __init__(self, resnet18):
        super(SSDTypeResnet2, self).__init__()
        rc = list(resnet18.children())
        self.tillLayer3resnet = nn.Sequential(*rc[:7])
        self.layer4first = rc[7][0]
        self.avgpool2 = nn.AdaptiveAvgPool2d((1, 1))
        self.fc2 = torch.nn.Linear(512, 4)
        self.layer4second = rc[7][1]
        self.avgpool = rc[8]
        self.fc = rc[9]
        
 
    def forward(self, x):
        x = self.tillLayer3resnet(x)
        x = self.layer4first(x)
        
        # added a prediction from the conv layer of layer 4 also
        y = self.avgpool2(x)
        y = y.view(y.size(0), -1)
        out2 = self.fc2(y)
        
        x = self.layer4second(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        out1 = self.fc(x)

        return out1, out2


ssdResnet2 = SSDTypeResnet2(resnet18) 

# aT = resnet18.state_dict()['conv1.weight']
# bT = ssdResnet.state_dict()['tillLayer2resnet.0.weight']

print(ssdResnet2)

SSDTypeResnet2(
  (tillLayer3resnet): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats

In [27]:
# two layer model v3 architecture
class SSDTypeResnet3(nn.Module):
    def __init__(self, resnet18):
        super(SSDTypeResnet3, self).__init__()
        rc = list(resnet18.children())
        self.tillLayer3resnet = nn.Sequential(*rc[:7])
        self.layer4first = rc[7][0]
        self.avgpool2 = nn.AdaptiveAvgPool2d((1, 1))
        self.layer4second = rc[7][1]
        self.avgpool = rc[8]
        self.fc = nn.Linear(1024,4)
        
 
    def forward(self, x):
        x = self.tillLayer3resnet(x)
        x = self.layer4first(x)
        
        y = self.avgpool2(x)
        y = y.view(y.size(0), -1)

        x = self.layer4second(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        
        # concatenated vectors from the last layer and from after 
        # the second conv layer in layer 4 of resnet 
        # and predicted using fc layer on that feature
        x = torch.cat((x,y),1)
        out = self.fc(x)

        return out
    

ssdResnet3 = SSDTypeResnet3(resnet18) 

# aT = resnet18.state_dict()['conv1.weight']
# bT = ssdResnet.state_dict()['tillLayer2resnet.0.weight']

print(ssdResnet3)

SSDTypeResnet3(
  (tillLayer3resnet): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats

In [29]:
#Two Layer Detection (SSD)
def train_two_layer(model, dataloaders, criterion, optimizer, num_epochs):
    # Begin
    val_acc_history = []
    data_size = {'train':dataset_size - val_size, 'val':val_size}
    
    print(len(dataloaders['val'].dataset))
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        
        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            print(phase)
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for item in dataloaders[phase]:
                inputs = item['image']
                labels = item['label']
                
                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                
                    # Get model outputs and calculate loss
                    # output1, output2 = model(Variable(inputs))
                    output1 = model(Variable(inputs))
                    loss = criterion(output1, Variable(labels))
                    # loss1 = criterion(output1, Variable(labels))
                    #loss2 = criterion(output2, Variable(labels))
                    #loss = loss1 + loss2
                    
                    _, preds1 = torch.max(output1, 1)
                    #_, preds2 = torch.max(output2, 1)

                    # backward + optimize only if in trainig phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                
                sum1 = torch.sum(preds1 == labels.data)
                #sum2 = torch.sum(preds2 == labels.data)
                running_corrects += (sum1)
                
            epoch_loss = running_loss / data_size[phase]
            epoch_acc = (1.0 * running_corrects.item()) / ( data_size[phase])

            if phase == 'val':
                print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)
                
            torch.save(model.state_dict(), 'two_layer_models/epoch_'+str(epoch)+'.wts')
        
        print () 
    
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    
    # test accuracy
    model.eval()
    running_loss = 0.0
    running_corrects = 0
    
    for item in dataloaders['test']:
        inputs = item['image']
        labels = item['label']
        
        with torch.set_grad_enabled(False):
            # output1, output2 = model(Variable(inputs))
            output1 = model(Variable(inputs))
            loss = criterion(output1, Variable(labels))
            # loss2 = criterion(output2, Variable(labels))
            # loss = loss1 + loss2
            _, pred1 = torch.max(output1, 1)
            # _, pred2 = torch.max(output2, 1)
            
        running_loss += loss.item() * inputs.size(0)
        sum1 = torch.sum(pred1 == labels.data)
        # sum2 = torch.sum(pred2 == labels.data)
        running_corrects += (sum1)
        
    epoch_loss = running_loss / len(dataloaders[phase].dataset)
    epoch_acc = (1.0 * running_corrects.item()) / (len(dataloaders['test'].dataset))

    print('{} loss: {:.4f} Acc: {:.4f}'.format('test', epoch_loss, epoch_acc))
    print()
    
    # save the best model weights
    torch.save(model.state_dict(), 'two_layer_best_model.wts')
    
    return model, val_acc_history
        
    
    

In [18]:
%time train_two_layer(ssdResnet, data_loaders, criterion, optimizer, num_epochs)

2997
Epoch 1/5
train
val
val Loss: 1.7407 Acc: 0.7140

Epoch 2/5
train
val
val Loss: 1.5611 Acc: 0.7207

Epoch 3/5
train
val
val Loss: 1.4920 Acc: 0.7291

Epoch 4/5
train
val
val Loss: 1.4847 Acc: 0.7291

Epoch 5/5
train
val
val Loss: 1.4450 Acc: 0.7341

Best val Acc: 0.734114
test loss: 1.4340 Acc: 0.7424

CPU times: user 1h 11min 47s, sys: 8min 44s, total: 1h 20min 32s
Wall time: 43min


(SSDTypeResnet(
   (tillLayer2resnet): Sequential(
     (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
     (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (2): ReLU(inplace)
     (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
     (4): Sequential(
       (0): BasicBlock(
         (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
         (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
         (relu): ReLU(inplace)
         (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
         (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
       )
       (1): BasicBlock(
         (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
         (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, tra

In [26]:
%time train_two_layer(ssdResnet2, data_loaders, criterion, optimizer, num_epochs)

2997
Epoch 1/5
train
val
val Loss: 1.5269 Acc: 0.6137

Epoch 2/5
train
val
val Loss: 1.5080 Acc: 0.6622

Epoch 3/5
train
val


KeyboardInterrupt: 

In [30]:
# two layer v3
%time train_two_layer(ssdResnet3, data_loaders, criterion, optimizer, num_epochs)

2997
Epoch 1/5
train
val
val Loss: 0.8354 Acc: 0.7860

Epoch 2/5
train
val
val Loss: 0.5551 Acc: 0.8863

Epoch 3/5
train
val
val Loss: 0.4512 Acc: 0.9097

Epoch 4/5
train
val
val Loss: 0.3910 Acc: 0.9064

Epoch 5/5
train
val
val Loss: 0.3482 Acc: 0.9197

Best val Acc: 0.919732
test loss: 0.3581 Acc: 0.9229

CPU times: user 1h 11min 20s, sys: 8min 6s, total: 1h 19min 27s
Wall time: 40min 56s


(SSDTypeResnet3(
   (tillLayer3resnet): Sequential(
     (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
     (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (2): ReLU(inplace)
     (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
     (4): Sequential(
       (0): BasicBlock(
         (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
         (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
         (relu): ReLU(inplace)
         (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
         (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
       )
       (1): BasicBlock(
         (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
         (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, tr

# Testing and Accuracy Calculation
For applying detection, use a slding window method to test the above trained trained network on the detection task:<br/>
Take some windows of varying size and aspect ratios and slide it through the test image (considering some stride of pixels) from left to right, and top to bottom, detect the class scores for each of the window, and keep only those which are above a certain threshold value. There is a similar approach used in the paper -Faster RCNN by Ross Girshick, where he uses three diferent scales/sizes and three different aspect ratios, making a total of nine windows per pixel to slide. You need to write the code and use it in testing code to find the predicted boxes and their classes.

In [49]:
#function to run through sliding windows over the image
def sliding_window(img, model, predict_class, sthreshold=0.98):
    selec_threshold = sthreshold
    boxes_a = []
    boxes_b = []
    boxes_c = []
    
    # the various window sizes and aspect ratios to be chosen
    window_sizes = [128, 224, 480]
    aspect_ratios = [1, 0.5, 2]
    strides = [32]
    for window_size in window_sizes:
        for aspect_ratio in aspect_ratios:
            for stride in strides:
                img_batch = []
                bndbox_batch = []
                for y in range(0, img.size[1], stride):
                    for x in range(0, img.size[0], stride):
                        x1 = x
                        y1 = y
                        x2 = x1 + window_size*aspect_ratio
                        y2 = y1 + window_size
                        # check if it extends outside image borders
                        if x2 > img.size[0]:
                            break
                           
                        # crop the image, and append it to the batch
                        cropped_img = img.crop((x1,y1,x2,y2))
                        rescaled_img = composed_transform(cropped_img)
                        img_batch.append(rescaled_img)
                        bndbox_batch.append([x1, y1, x2, y2])
                        
                    # check if it extends outside image borders
                    if y2 > img.size[1]:
                        break
                
                if len(img_batch) == 0:
                    continue
                
                # predict the class labels of the img windows
                prediction_batch = predict_class(torch.stack(img_batch))
                # segregate the separate bounding boxes of the different classes
                for pred, bnd in zip(prediction_batch, bndbox_batch):
                    if pred[0] == classes.index('__background__'):
                        continue
                    elif pred[0] == classes.index('aeroplane') and pred[1] > selec_threshold: 
                        boxes_a.append([pred[1]] + bnd)
                    elif pred[0] == classes.index('bottle') and pred[1] > selec_threshold: 
                        boxes_b.append([pred[1]] + bnd)
                    elif pred[0] == classes.index('chair') and pred[1] > selec_threshold:
                        boxes_c.append([pred[1]] + bnd)
                        
    return boxes_a, boxes_b, boxes_c

                        
                
                

Apply non_maximum_supression to reduce the number of boxes. You are free to choose the threshold value for non maximum supression, but choose wisely [0,1].

In [46]:
# Non Maximum Suppression
def non_maximum_supression(boxes,image,threshold = 0.01):
    keep_boxes = []
    # if the bBoxes list is empty, then return empty list
    if len(boxes) == 0:
        return []
    boxes = np.array(boxes)
    # the get list of separate coordinates 
    x1 = boxes[:, 1]
    y1 = boxes[:, 2]
    x2 = boxes[:, 3]
    y2 = boxes[:, 4]
    # get the probability scores of all boxes
    scores = boxes[:, 0]
    # get the areas of all boxes
    areas = (x2 - x1 + 1)*(y2 - y1 + 1)
    
    # sort in decreasing order of probability scores
    order = scores.argsort()[::-1]
    
    # decide which all boxes to keep
    keep = []
    while order.size > 0:
        # take the highest probabilyt box remaining
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # discard all boxes that match this box considerably
        inds = np.where(ovr <= threshold)[0]
        order = order[inds + 1]
    for args in keep:
        keep_boxes.append(boxes[args].tolist() + [image])
        

    return keep_boxes
    

Test the trained model on the test dataset.

In [250]:
# One layer detection
def test(model):
    
    # load best model weights
    model_wts = torch.load('one_layer_v2/one_layer_best_model.wts')
    
    model.load_state_dict(model_wts)
    model.eval()
    print("Model Loaded")
    
    pred_bndboxes = [[],[],[],[]]
    actual = {}
    pr = [[],[],[],[]]  
    re = [[],[],[],[]]  
    
    # function to predict the class of given images, using our trained model
    def predict_class(inputs):
        # get the raw output from the model
        with torch.set_grad_enabled(False):
            outputs = model(Variable(inputs))
        _, preds = torch.max(outputs, 1)
        preds = preds.tolist()
        # convert to probability scores
        prob_layer = nn.Softmax(dim=1)
        probs = prob_layer(outputs).tolist()
        out_probs = np.amax(probs, axis=1)
        # return bbox along with the prob score
        return list(zip(preds, out_probs))
        
        
    # print bounding box on the image    
    def print_boxes(img, bbs, color):
        for bb in bbs:
            cv2.rectangle(img, (int(bb[1]), int(bb[2])), (int(bb[3]), int(bb[4])), color, 2)
        return img
        
    count = 0
    allowed_labels = ['aeroplane', 'bottle', 'chair']
    label_counts = [0,0,0,0]
    test_images_folder = os.path.join('./test', 'VOC2007/JPEGImages/')
    test_objects_folder = os.path.join('./test', 'VOC2007/Annotations/')
    # read the test images
    for idx, image_name in enumerate(os.listdir(test_images_folder)):
        image_file = os.path.join(test_images_folder, image_name)
        objects_file = os.path.join(test_objects_folder, image_name.split(".")[0]+'.xml')
        tree = ET.parse(objects_file)
        img = Image.open(image_file).convert('RGB')
        root = tree.getroot()
        labels = []
        bndboxes = []
        for obj in root.findall('object'):
            label = obj.find('name').text
            if label not in allowed_labels:
                continue
            label_counts[classes.index(label)] += 1
            bndbox = obj.find('bndbox')
            coord = []
            for cd in bndbox:
                coord.append(int(cd.text))
            labels.append(label)
            bndboxes.append(coord)
            
        # ignore the image if no object present in it
        if len(labels) == 0:
            continue
        count = count + 1
        print(idx+1, ": ", count, ": ", image_name)
    
        # run the sliding window approach to extract all possible bounding boxes with objects
        boxes_a, boxes_b, boxes_c = sliding_window(img, model, predict_class)
        
        # run NMS separately on all boxes of a specific class to get the reduced number of boxes
        pred_boxes = [[],[],[],[]]
        pred_boxes[classes.index('aeroplane')] = non_maximum_supression(boxes_a, image_name)
        pred_boxes[classes.index('bottle')] = non_maximum_supression(boxes_b, image_name)
        pred_boxes[classes.index('chair')] = non_maximum_supression(boxes_c, image_name)
             
        # save the predicted object Bboxes in the image to the global list
        for idx, class_name in enumerate(classes):
            if class_name == '__background__':
                continue
            pred_bndboxes[idx] = pred_bndboxes[idx] + pred_boxes[idx]
            
        # print the bBoxes on the image and store it
        img = cv2.imread(image_file)
        img = print_boxes(img, pred_boxes[classes.index('aeroplane')], (255,0,0)) #blue 
        img = print_boxes(img, pred_boxes[classes.index('bottle')], (0,255,0))    #green
        img = print_boxes(img, pred_boxes[classes.index('chair')], (0,0,255))     #red
        cv2.imwrite("outputs/"+image_name, img)
        actual[image_name] = [[],[],[],[]]
        
        # save the actual object bBoxes in the image to the global
        for lbl, bd in zip(labels, bndboxes):
            actual[image_name][classes.index(lbl)].append(bd)
    
    # calculate the precision and recall vectors for each class
    correct = [[],[],[],[]]
    for idx, class_name in enumerate(classes):
        if class_name == '__background__':
            continue
        pr[idx], re[idx], correct[idx] = calculate_precision_recall(pred_bndboxes[idx], actual, idx, label_counts[idx])
    
    # get the Average precision for each class
    ap = [0, 0, 0, 0]
    for idx, class_name in enumerate(classes):
        if class_name == '__background__':
            continue
        # print(pr[idx],re[idx])
        ap[idx] = calculate_average_precision(pr[idx], re[idx])
        
    # print the average precision for all classes
    # and the correctly predicted images for all classes
    ap_sum = 0.0
    for idx, class_name in enumerate(classes):
        if class_name == '__background__':
            continue
        print("Average Precision for class "+class_name+" is "+str(ap[idx]))
        print("Correctly predicted images for class "+class_name+" are "+str(correct[idx]))
        ap_sum += ap[idx]
     
    # get the mAP score as the average of AP of all classes
    mAP = ap_sum/(len(ap)-1)
    
    print("MAP:",mAP)
    return mAP
        
            
        
        

In [32]:
# function to calculate AP over a class
def calculate_average_precision(pr, re):
    sre, spr = zip(*sorted(zip(re, pr)))
    ipr = []       # to store the interpolated precision
    
    # to get the interpolated precision from the sorted precision values
    for i, p in enumerate(reversed(spr)):
        if i == 0:
            ipr.append(p)
        elif p <= ipr[i-1]:
            ipr.append(ipr[i-1])
        else:
            ipr.append(p)
        
    ipr.reverse()
    
    # the values of recall at which the ipr is taken
    markers= [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
    mark = 0
    ipr_sum = 0.0
    count = 0
    
    # to get sum of ipr at all these recall values
    for i,r in enumerate(sre):
        if r >= markers[mark]:
            ipr_sum += ipr[i]
            while(mark <= 10 and markers[mark] <= r):
                mark += 1
            count += 1
        if mark > 10:
            break
    
    ap = ipr_sum/count
    return ap
    
    

In [33]:
# function to calculate the precision and recall at each instance of the class
def calculate_precision_recall(pred_bndboxes, actual, label_idx, label_count):
    
    tp = 0
    # sort the predictions in decreasing order of probability scores
    pred_sorted = sorted(pred_bndboxes, key=lambda x: x[0])
    precision = []
    recall = []
    correct = []
    
    # iterating over the predictions 
    for idx, prb in enumerate(pred_sorted):
        flag = 0
        img = prb[5]
        # check the IoU match of the predicition with the
        # objects in that image
        for acb in actual[img][label_idx]:
            iou = get_iou(acb, prb[1:5])
            
            # count as a True Positive (tp) only if IoU >= 0.5
            if iou >= 0.5:
                flag = 1
                correct.append(img)
                break
        tp = tp + flag 
    
        # get precision and recall at this rank in the list
        precision.append(1.0*tp/(idx+1))
        recall.append(1.0*tp/label_count)
        
    # return the complete precision and recall vectors
    # return the correctly predicted images
    return precision, recall, correct

In [251]:
%time test(resnet18)

Model Loaded
9 :  1 :  009570.jpg
10 :  2 :  004055.jpg
16 :  3 :  004964.jpg
23 :  4 :  005537.jpg
29 :  5 :  000573.jpg
36 :  6 :  005622.jpg
53 :  7 :  009742.jpg
54 :  8 :  001921.jpg
56 :  9 :  005725.jpg
72 :  10 :  007348.jpg
74 :  11 :  005043.jpg
77 :  12 :  002907.jpg
81 :  13 :  000316.jpg
86 :  14 :  007738.jpg
89 :  15 :  005907.jpg
92 :  16 :  006263.jpg
93 :  17 :  009012.jpg
99 :  18 :  009663.jpg
102 :  19 :  008131.jpg
103 :  20 :  004262.jpg
117 :  21 :  005827.jpg
119 :  22 :  003544.jpg
121 :  23 :  004744.jpg
126 :  24 :  005286.jpg
127 :  25 :  005218.jpg
128 :  26 :  007564.jpg
137 :  27 :  000277.jpg
142 :  28 :  000642.jpg
151 :  29 :  005103.jpg
152 :  30 :  003649.jpg
157 :  31 :  008646.jpg
159 :  32 :  005050.jpg
163 :  33 :  002583.jpg
165 :  34 :  005474.jpg
175 :  35 :  009846.jpg
179 :  36 :  008110.jpg
195 :  37 :  009076.jpg
206 :  38 :  005994.jpg
211 :  39 :  003532.jpg
218 :  40 :  000696.jpg
225 :  41 :  007698.jpg
227 :  42 :  006780.jpg
229 :  

1678 :  328 :  006546.jpg
1692 :  329 :  008440.jpg
1693 :  330 :  004045.jpg
1698 :  331 :  009356.jpg
1706 :  332 :  007806.jpg
1709 :  333 :  000216.jpg
1722 :  334 :  003144.jpg
1729 :  335 :  001086.jpg
1730 :  336 :  005294.jpg
1734 :  337 :  002777.jpg
1737 :  338 :  007335.jpg
1744 :  339 :  006830.jpg
1745 :  340 :  009929.jpg
1748 :  341 :  006142.jpg
1756 :  342 :  003488.jpg
1765 :  343 :  005323.jpg
1778 :  344 :  006051.jpg
1779 :  345 :  000008.jpg
1780 :  346 :  003574.jpg
1788 :  347 :  008198.jpg
1789 :  348 :  008563.jpg
1791 :  349 :  003221.jpg
1796 :  350 :  001814.jpg
1799 :  351 :  009154.jpg
1802 :  352 :  004032.jpg
1804 :  353 :  006923.jpg
1811 :  354 :  009222.jpg
1833 :  355 :  009802.jpg
1840 :  356 :  001783.jpg
1843 :  357 :  008153.jpg
1847 :  358 :  005226.jpg
1853 :  359 :  005313.jpg
1856 :  360 :  002301.jpg
1857 :  361 :  006592.jpg
1868 :  362 :  000817.jpg
1872 :  363 :  005216.jpg
1877 :  364 :  004645.jpg
1879 :  365 :  007202.jpg
1894 :  366 

3250 :  644 :  006954.jpg
3254 :  645 :  001629.jpg
3255 :  646 :  008791.jpg
3266 :  647 :  009853.jpg
3269 :  648 :  009435.jpg
3270 :  649 :  007832.jpg
3277 :  650 :  003819.jpg
3282 :  651 :  009824.jpg
3284 :  652 :  002207.jpg
3289 :  653 :  003776.jpg
3292 :  654 :  004712.jpg
3293 :  655 :  006888.jpg
3297 :  656 :  002560.jpg
3301 :  657 :  002707.jpg
3306 :  658 :  007785.jpg
3314 :  659 :  008751.jpg
3320 :  660 :  005279.jpg
3321 :  661 :  005976.jpg
3333 :  662 :  006274.jpg
3342 :  663 :  006752.jpg
3352 :  664 :  003067.jpg
3358 :  665 :  000327.jpg
3360 :  666 :  009478.jpg
3370 :  667 :  006895.jpg
3385 :  668 :  001025.jpg
3397 :  669 :  006195.jpg
3419 :  670 :  001105.jpg
3423 :  671 :  002809.jpg
3428 :  672 :  000652.jpg
3437 :  673 :  001996.jpg
3439 :  674 :  009329.jpg
3440 :  675 :  007391.jpg
3446 :  676 :  001720.jpg
3451 :  677 :  005464.jpg
3457 :  678 :  005491.jpg
3467 :  679 :  003049.jpg
3472 :  680 :  000157.jpg
3485 :  681 :  003707.jpg
3493 :  682 

0.16989973129835356

In [51]:
#Two Layer Detection
def test_two_layer(model):
    model_wts = torch.load('two_layer_v3/two_layer_best_model.wts')
    # load best model weights
    model.load_state_dict(model_wts)
    model.eval()
    print("Model Loaded")
    
    pred_bndboxes = [[],[],[],[]]
    actual = {}
    pr = [[],[],[],[]]  
    re = [[],[],[],[]]  
    
    # function to predict the class of given images, using our trained model
    def predict_class(inputs):
        # get the raw output from the model
        with torch.set_grad_enabled(False):
            outputs = model(Variable(inputs))
        _, preds = torch.max(outputs, 1)
        preds = preds.tolist()
        # convert to probability scores
        prob_layer = nn.Softmax(dim=1)
        probs = prob_layer(outputs).tolist()
        out_probs = np.amax(probs, axis=1)
        # return bbox along with the prob score
        return list(zip(preds, out_probs))
    
    # print the bBoxes on the image
    def print_boxes(img, bbs, color):
        for bb in bbs:
            cv2.rectangle(img, (int(bb[1]), int(bb[2])), (int(bb[3]), int(bb[4])), color, 2)
        return img
        
    count = 0
    allowed_labels = ['aeroplane', 'bottle', 'chair']
    label_counts = [0,0,0,0]
    test_images_folder = os.path.join('./test', 'VOC2007/JPEGImages/')
    test_objects_folder = os.path.join('./test', 'VOC2007/Annotations/')
    # read the test images
    for idx, image_name in enumerate(os.listdir(test_images_folder)):
        image_file = os.path.join(test_images_folder, image_name)
        objects_file = os.path.join(test_objects_folder, image_name.split(".")[0]+'.xml')
        tree = ET.parse(objects_file)
        img = Image.open(image_file).convert('RGB')
        root = tree.getroot()
        labels = []
        bndboxes = []
        for obj in root.findall('object'):
            label = obj.find('name').text
            if label not in allowed_labels:
                continue
            label_counts[classes.index(label)] += 1
            bndbox = obj.find('bndbox')
            coord = []
            for cd in bndbox:
                coord.append(int(cd.text))
            labels.append(label)
            bndboxes.append(coord)
        
        # ignore the image if no object present in it
        if len(labels) == 0:
            continue
        count = count + 1
        print(idx+1, ": ", count, ": ", image_name)
    
        # run the sliding window approach to extract all possible bounding boxes with objects
        boxes_a, boxes_b, boxes_c = sliding_window(img, model, predict_class, sthreshold=0.85)
        
        # run NMS separately on all boxes of a specific class to get the reduced number of boxes
        pred_boxes = [[],[],[],[]]
        pred_boxes[classes.index('aeroplane')] = non_maximum_supression(boxes_a, image_name)
        pred_boxes[classes.index('bottle')] = non_maximum_supression(boxes_b, image_name)
        pred_boxes[classes.index('chair')] = non_maximum_supression(boxes_c, image_name)
             
        # save the predicted object Bboxes in the image to the global list
        for idx, class_name in enumerate(classes):
            if class_name == '__background__':
                continue
            pred_bndboxes[idx] = pred_bndboxes[idx] + pred_boxes[idx]
        
        # print the bBoxes on the image and store it
        img = cv2.imread(image_file)
        img = print_boxes(img, pred_boxes[classes.index('aeroplane')], (255,0,0)) #blue 
        img = print_boxes(img, pred_boxes[classes.index('bottle')], (0,255,0))    #green
        img = print_boxes(img, pred_boxes[classes.index('chair')], (0,0,255))     #red
        cv2.imwrite("outputs/"+image_name, img)
        actual[image_name] = [[],[],[],[]]
        
        # save the actual object bBoxes in the image to the global
        for lbl, bd in zip(labels, bndboxes):
            actual[image_name][classes.index(lbl)].append(bd)
        
    # calculate the precision and recall vectors for each class
    correct = [[],[],[],[]]
    for idx, class_name in enumerate(classes):
        if class_name == '__background__':
            continue
        pr[idx], re[idx], correct[idx] = calculate_precision_recall(pred_bndboxes[idx], actual, idx, label_counts[idx])
    
    # get the Average precision for each class
    ap = [0, 0, 0, 0]
    for idx, class_name in enumerate(classes):
        if class_name == '__background__':
            continue
        # print(pr[idx],re[idx])
        ap[idx] = calculate_average_precision(pr[idx], re[idx])
        
    
    # print the average precision for all classes
    # and the correctly predicted images for all classes
    ap_sum = 0.0
    for idx, class_name in enumerate(classes):
        if class_name == '__background__':
            continue
        print("Average Precision for class "+class_name+" is "+str(ap[idx]))
        print("Correctly predicted images for class "+class_name+" are "+str(correct[idx]))
        ap_sum += ap[idx]
         
    # get the mAP score as the average of AP of all classes
    mAP = ap_sum/(len(ap)-1)
    
    print("MAP:",mAP)
    return mAP

In [52]:
%time test_two_layer(ssdResnet3)

Model Loaded
9 :  1 :  009570.jpg
10 :  2 :  004055.jpg
16 :  3 :  004964.jpg
23 :  4 :  005537.jpg
29 :  5 :  000573.jpg
36 :  6 :  005622.jpg
53 :  7 :  009742.jpg
54 :  8 :  001921.jpg
56 :  9 :  005725.jpg
72 :  10 :  007348.jpg
74 :  11 :  005043.jpg
77 :  12 :  002907.jpg
81 :  13 :  000316.jpg
86 :  14 :  007738.jpg
89 :  15 :  005907.jpg
92 :  16 :  006263.jpg
93 :  17 :  009012.jpg
99 :  18 :  009663.jpg
102 :  19 :  008131.jpg
103 :  20 :  004262.jpg
117 :  21 :  005827.jpg
119 :  22 :  003544.jpg
121 :  23 :  004744.jpg
126 :  24 :  005286.jpg
127 :  25 :  005218.jpg
128 :  26 :  007564.jpg
137 :  27 :  000277.jpg
142 :  28 :  000642.jpg
151 :  29 :  005103.jpg
152 :  30 :  003649.jpg
157 :  31 :  008646.jpg
159 :  32 :  005050.jpg
163 :  33 :  002583.jpg
165 :  34 :  005474.jpg
175 :  35 :  009846.jpg
179 :  36 :  008110.jpg
195 :  37 :  009076.jpg
206 :  38 :  005994.jpg
211 :  39 :  003532.jpg
218 :  40 :  000696.jpg
225 :  41 :  007698.jpg
227 :  42 :  006780.jpg
229 :  

1678 :  328 :  006546.jpg
1692 :  329 :  008440.jpg
1693 :  330 :  004045.jpg
1698 :  331 :  009356.jpg
1706 :  332 :  007806.jpg
1709 :  333 :  000216.jpg
1722 :  334 :  003144.jpg
1729 :  335 :  001086.jpg
1730 :  336 :  005294.jpg
1734 :  337 :  002777.jpg
1737 :  338 :  007335.jpg
1744 :  339 :  006830.jpg
1745 :  340 :  009929.jpg
1748 :  341 :  006142.jpg
1756 :  342 :  003488.jpg
1765 :  343 :  005323.jpg
1778 :  344 :  006051.jpg
1779 :  345 :  000008.jpg
1780 :  346 :  003574.jpg
1788 :  347 :  008198.jpg
1789 :  348 :  008563.jpg
1791 :  349 :  003221.jpg
1796 :  350 :  001814.jpg
1799 :  351 :  009154.jpg
1802 :  352 :  004032.jpg
1804 :  353 :  006923.jpg
1811 :  354 :  009222.jpg
1833 :  355 :  009802.jpg
1840 :  356 :  001783.jpg
1843 :  357 :  008153.jpg
1847 :  358 :  005226.jpg
1853 :  359 :  005313.jpg
1856 :  360 :  002301.jpg
1857 :  361 :  006592.jpg
1868 :  362 :  000817.jpg
1872 :  363 :  005216.jpg
1877 :  364 :  004645.jpg
1879 :  365 :  007202.jpg
1894 :  366 

3250 :  644 :  006954.jpg
3254 :  645 :  001629.jpg
3255 :  646 :  008791.jpg
3266 :  647 :  009853.jpg
3269 :  648 :  009435.jpg
3270 :  649 :  007832.jpg
3277 :  650 :  003819.jpg
3282 :  651 :  009824.jpg
3284 :  652 :  002207.jpg
3289 :  653 :  003776.jpg
3292 :  654 :  004712.jpg
3293 :  655 :  006888.jpg
3297 :  656 :  002560.jpg
3301 :  657 :  002707.jpg
3306 :  658 :  007785.jpg
3314 :  659 :  008751.jpg
3320 :  660 :  005279.jpg
3321 :  661 :  005976.jpg
3333 :  662 :  006274.jpg
3342 :  663 :  006752.jpg
3352 :  664 :  003067.jpg
3358 :  665 :  000327.jpg
3360 :  666 :  009478.jpg
3370 :  667 :  006895.jpg
3385 :  668 :  001025.jpg
3397 :  669 :  006195.jpg
3419 :  670 :  001105.jpg
3423 :  671 :  002809.jpg
3428 :  672 :  000652.jpg
3437 :  673 :  001996.jpg
3439 :  674 :  009329.jpg
3440 :  675 :  007391.jpg
3446 :  676 :  001720.jpg
3451 :  677 :  005464.jpg
3457 :  678 :  005491.jpg
3467 :  679 :  003049.jpg
3472 :  680 :  000157.jpg
3485 :  681 :  003707.jpg
3493 :  682 

0.40353290655275126