<a href="https://colab.research.google.com/github/JitindraFartiyal/Object-Detection/blob/object-detection-v1/Yolov1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Connect to Google Drive to load the dataset. This step is only required if you are running on Google Colab and reading the dataset from Google Drive.


In [88]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


Importing all libraries

In [0]:
from comet_ml import Experiment

import os
import pandas as pd
import numpy as np
import math

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import pdb
import matplotlib.pyplot as plt
import cv2

from collections import OrderedDict 
from google.colab.patches import cv2_imshow
from torch.utils.data import Dataset, DataLoader
from skimage import io, transform
from torchvision import transforms, datasets, utils


The class names in the label files ('Car', 'Cyclist', ...) must be converted into integers. The labels are not fed into the model, so no one-hot or other encoding is needed at this point; integer indices simply make the labels easier to work with.

In [0]:
# Integer index for every object type that appears in KITTI label files
CLASS_TO_INDEX = {'Car': 0, 'Cyclist': 1, 'Pedestrian': 2, 'Tram': 3, 'Truck': 4,
                  'Van': 5, 'DontCare': 6, 'Misc': 7, 'Person_sitting': 8}

def class_encoding(label):
  # Replace the class name in the first column by its integer index (in place)
  for i in range(label.shape[0]):
    class_name = label.iloc[i,0]
    if class_name in CLASS_TO_INDEX:
      label.iloc[i,0] = CLASS_TO_INDEX[class_name]


The KITTI label format differs from the YOLO label format: KITTI stores each bounding box by its top-left and bottom-right corners, whereas YOLO describes a box by its center and size. We therefore convert every KITTI label into YOLO format.

---


Note: we rescale the bounding-box coordinates from the original image size to the size of the resized image fed to the network (`input_image_size`), because the labels are compared with the model's output, which is expressed in that resized frame.
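As a worked illustration of this conversion, here is a minimal sketch for a single box. `kitti_box_to_yolo` is a hypothetical helper, not part of the notebook; it follows the same arithmetic and the same output order (center_x, center_y, height, width) as `transform_label` below.

```python
def kitti_box_to_yolo(top_left_x, top_left_y, bottom_right_x, bottom_right_y,
                      image_width, image_height, input_image_size):
    """Convert one KITTI corner box into (center_x, center_y, height, width),
    rescaled from the original image to an input_image_size x input_image_size frame."""
    width = bottom_right_x - top_left_x
    height = bottom_right_y - top_left_y
    center_x = top_left_x + width / 2
    center_y = top_left_y + height / 2
    # x-values are scaled by the image width, y-values by the image height
    return ((center_x / image_width) * input_image_size,
            (center_y / image_height) * input_image_size,
            (height / image_height) * input_image_size,
            (width / image_width) * input_image_size)


# Hypothetical box on a KITTI-sized image (1242 x 375 pixels), rescaled to 224 x 224
print(kitti_box_to_yolo(100, 150, 300, 250, 1242, 375, 224))
```

Because the original image is not square, horizontal and vertical values are scaled by different factors.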

In [0]:
def transform_label(label, number_of_classes, image, input_image_size):
  
  # For bounding boxes the coordinate origin is the top-left corner of the image (y grows downwards),
  # not the bottom-left corner we are used to from ordinary plots
  top_left_x = label[:,1]
  top_left_y = label[:,2]
  bottom_right_x = label[:,3]
  bottom_right_y = label[:,4]

  height = bottom_right_y - top_left_y
  width = bottom_right_x - top_left_x
  center_x = top_left_x + width/2
  center_y = top_left_y + height/2

  # Rescale the box from the original image to the resized network input (input_image_size X input_image_size),
  # so that labels and model outputs are in the same frame when computing the loss.
  # Resulting column order: class, center_x, center_y, height, width
  label[:,1] = (center_x/image.shape[1])*input_image_size
  label[:,2] = (center_y/image.shape[0])*input_image_size
  label[:,3] = (height /image.shape[0])*input_image_size
  label[:,4] = (width/image.shape[1])*input_image_size

  # Append one class-probability column per class (one-hot encoding of the class index)
  target = np.zeros((label.shape[0],label.shape[1] + number_of_classes)) 
  target[:,0:5] = label
  
  for i in range(0,label.shape[0]):
    class_index = int(target[i,0])
    # Classes outside [0, number_of_classes), e.g. DontCare, Misc and Person_sitting when
    # number_of_classes is 6, keep all-zero class probabilities
    if 0 <= class_index < number_of_classes:
      target[i,5 + class_index] = 1

  return target


Next we preprocess the data, i.e. bring every input of our Convolutional Neural Network (CNN) into a uniform form.
Each sample is a dictionary holding an image and its label. Only the image is fed to the CNN; the label is used to compute the loss. Most of the preprocessing (resizing, normalization, mean subtraction, etc.) therefore applies to the image.
We also pad the labels so that every label tensor has the same size and the samples can be collated into a batch.
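The padding idea can be sketched in plain Python. This is an illustration under the assumption, taken from `BatchPadding` and `yolo_loss` below, that padding rows are all zeros and that the real labels end at the first all-zero row; `pad_labels` and `real_labels` are hypothetical helper names.

```python
def pad_labels(label_rows, pad, row_length):
    """Pad a variable-length list of label rows with all-zero rows up to `pad` rows."""
    if len(label_rows) > pad:
        raise ValueError(f"{len(label_rows)} labels do not fit into {pad} padded rows")
    padding = [[0.0] * row_length for _ in range(pad - len(label_rows))]
    return [list(row) for row in label_rows] + padding


def real_labels(padded_rows):
    """Recover the real rows by stopping at the first all-zero row, as the loss does."""
    rows = []
    for row in padded_rows:
        if sum(row) == 0:
            break
        rows.append(row)
    return rows


labels = [[0, 10.0, 20.0, 5.0, 5.0], [5, 50.0, 60.0, 8.0, 4.0]]
padded = pad_labels(labels, pad=4, row_length=5)
print(len(padded), real_labels(padded) == labels)
```

The scheme relies on a real label row never summing to zero, which holds here because box sizes are positive.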

In [0]:
class Resize(object):

  def __init__(self, input_image_size):
    # Input image size is the side length of the square image fed to our CNN model. In this notebook it is 224, i.e. [224 X 224]
    self.input_image_size = (input_image_size, input_image_size)
  
  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    image = transform.resize(image, self.input_image_size, preserve_range=True, anti_aliasing=True)

    return {'image' : image, 'label' : label}

class ToTensor(object):
  
  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    image = image.transpose((2, 0, 1)) # Converting the image form from (H X W X C) into (C X H X W)

    # Without .float(), torch.from_numpy() keeps NumPy's float64 dtype, which does not match the
    # model's weights and raises a dtype error. PyTorch modules use float32 by default,
    # so we convert the image (and the label) to float32 here.
    return {'image' : torch.from_numpy(image).float(),
            'label' : torch.from_numpy(label).float()
            }

class Normalization(object):

  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    
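    # Standardize each colour channel of the (H X W X C) image to zero mean and unit variance.
    # axis=(0, 1) averages over height and width; the small epsilon avoids division by zero.
    image_mean = np.mean(image, axis = (0, 1))
    image_std = np.std(image, axis = (0, 1))
    image = (image - image_mean) / (image_std + 1e-8)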

    return {'image' : image, 'label' : label}

# Label files contain different numbers of objects, so stacking them into a single batch fails
# because of the variable dimensions. We therefore pad every label to `pad` rows filled with zeros;
# the loss function stops reading labels at the first all-zero row.
class BatchPadding(object):

  def __init__(self, pad):
    self.pad = pad
  
  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    batched_label = np.zeros((self.pad,label.shape[1]))
    batched_label[0:label.shape[0],:] = label

    return {'image' : image, 'label' : batched_label}


Next we define a class for our dataset. For our problem, object detection for self-driving cars, we use the KITTI dataset.
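`KittiDataset` reads each label file with pandas and keeps columns 0 and 4 to 7. A plain-Python sketch of the same selection for one line is shown below; the column layout in the comment reflects our understanding of the KITTI object label format, and the sample line and the helper names are made up for illustration.

```python
# Assumed KITTI object label columns: 0 type, 1 truncated, 2 occluded, 3 alpha,
# 4-7 2D bbox (left, top, right, bottom), 8-10 dimensions, 11-13 location, 14 rotation_y
def parse_kitti_label_line(line):
    """Return (class_name, top_left_x, top_left_y, bottom_right_x, bottom_right_y)."""
    fields = line.split()
    class_name = fields[0]
    left, top, right, bottom = (float(v) for v in fields[4:8])
    return class_name, left, top, right, bottom


def parse_kitti_labels(text):
    """Parse every non-empty line of a label file's contents."""
    return [parse_kitti_label_line(line) for line in text.splitlines() if line.strip()]


sample = ("Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59\n"
          "DontCare -1 -1 -10 800.00 170.00 830.00 190.00 -1 -1 -1 -1000 -1000 -1000 -10\n")
for row in parse_kitti_labels(sample):
    print(row)
```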


In [0]:
class KittiDataset(Dataset):

    def __init__(self, labels_dir, images_dir, number_of_classes, input_image_size, transform=None):
      self.labels_dir = labels_dir
      self.images_dir = images_dir
      self.number_of_classes = number_of_classes
      self.input_image_size = input_image_size
      self.transform = transform

      self.labels_dict = {}
      self.filename = []
      self.__init__dataset()

    def __init__dataset(self):
      print('...............Initializing Dataset...............')
      
      index = 0
      for file in os.listdir(self.labels_dir):
        print('Reading label file : ' + file + '...')
        
        label_path = self.labels_dir + '/' + file
        label = pd.read_csv(filepath_or_buffer=label_path, sep=' ', header=None, index_col=False)
        
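        # Taking out the relevant features (class and 2D bounding-box corners) from the label dataframe;
        # .copy() avoids pandas' SettingWithCopyWarning when class_encoding modifies it
        label = label.iloc[:,[0,4,5,6,7]].copy()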
        label.columns = ['Class','TopLeftX','TopLeftY','BottomRightX','BottomRightY'] 
    
        # Class Encoding
        # Car=0, Cyclist=1, Pedestrian=2, Tram=3, Truck=4, Van=5.......
        class_encoding(label)

        self.labels_dict[index] = label
        self.filename.append(file[0:6])
        index = index + 1

    def __len__(self):
      return len(self.labels_dict)

    def __getitem__(self, index):
      image_path = self.images_dir + '/' + self.filename[index] + '.png'
      image = io.imread(image_path)
      
      label = self.labels_dict[index]
      label = label.to_numpy(dtype = np.float16) 
      
      # Convert the label into YOLO format (class, center_x, center_y, height, width, class_prob1 ..... class_probn)
      target = transform_label(label, self.number_of_classes, image, self.input_image_size)

      data_sample = {'image' : image, 'label' : target}
      
      if self.transform:
        data_sample = self.transform(data_sample)
        
      return data_sample


After creating the dataset class, we create our CNN model class. It uses a pretrained ResNet-50 backbone followed by our own fully connected head. We freeze the pretrained ResNet-50 residual stages: their ImageNet weights already extract useful features for classifying objects, and we do not want backpropagation to overwrite them.
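The head `fc1` outputs `input_grids * number_of_cnn_output` values, which the training code reshapes to `[N X GRIDS X (5 * B + C)]`. The sketch below spells out that layout: per grid cell, `B` blocks of (confidence, x, y, h, w) followed by `C` class probabilities, which is how `yolo_loss` and `validation` index the output. The helper names are hypothetical.

```python
def cnn_output_size(grid_size, bounding_boxes, classes):
    """Values per grid cell (5*B + C) and length of the flat fc1 output (S*S*(5*B + C))."""
    per_cell = 5 * bounding_boxes + classes
    return per_cell, grid_size * grid_size * per_cell


def box_field_index(bounding_box, field):
    """Position of a box field inside one cell vector; field 0=confidence, 1=x, 2=y, 3=h, 4=w."""
    return bounding_box * 5 + field


def class_probability_index(bounding_boxes, class_index):
    """Class probabilities follow the B box blocks inside one cell vector."""
    return 5 * bounding_boxes + class_index


def split_into_cells(flat_output, grid_cells, per_cell):
    """Row-major split of one sample's flat output into grid_cells vectors of per_cell values,
    mirroring what output.view(N, grid_cells, per_cell) does for each sample."""
    return [flat_output[c * per_cell:(c + 1) * per_cell] for c in range(grid_cells)]


# Settings used in main(): 7 x 7 grid, 2 boxes per cell, 6 classes
per_cell, total = cnn_output_size(7, 2, 6)
print(per_cell, total)
print(box_field_index(1, 0), class_probability_index(2, 0))
```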

In [0]:
class Net(nn.Module):
    def __init__(self, input_grids, number_of_cnn_output):

      super(Net, self).__init__()
      print('...............Initializing Convolutional Neural Network...............')
      self.resnet50 = torchvision.models.resnet50(pretrained = True) # Using Resnet50 architecture
      
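      # Freezing the four pretrained residual stages (layer1 to layer4); the stem (conv1, bn1) and fc stay trainable.
      # Assigning `module.requires_grad = False` only creates a plain attribute on the module and freezes nothing;
      # requires_grad_(False) switches off gradients for every parameter inside the module.
      self.resnet50.layer1.requires_grad_(False)
      self.resnet50.layer2.requires_grad_(False)
      self.resnet50.layer3.requires_grad_(False)
      self.resnet50.layer4.requires_grad_(False)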

      # Adding new Fully Connected and Sigmoid layer
      self.number_of_filters = self.resnet50.fc.out_features
      self.input_grids = input_grids
      self.number_of_cnn_output = number_of_cnn_output
      
      self.leaky_relu = nn.LeakyReLU(negative_slope=0.01)
      self.batch_norm_fc = nn.BatchNorm1d(num_features=self.number_of_filters)

      self.fc1 = nn.Linear(self.number_of_filters, input_grids*number_of_cnn_output, bias=True)
      self.sigmoid = nn.Sigmoid()

    def forward(self,x):
      x = self.resnet50.conv1(x)
      x = self.resnet50.bn1(x)
      x = self.resnet50.relu(x)
      x = self.resnet50.maxpool(x)

      x = self.resnet50.layer1(x)
      x = self.resnet50.layer2(x)
      x = self.resnet50.layer3(x)
      x = self.resnet50.layer4(x)
      x = self.resnet50.avgpool(x)

      x = x.view(-1,self.resnet50.fc.in_features)
      x = self.resnet50.fc(x)

      x = self.leaky_relu(x)
      x = self.batch_norm_fc(x)

      x = self.fc1(x)
      x = self.sigmoid(x)

      return x


After prediction, we get many bounding boxes for the same object. To decide which one to keep, we need a measure of how well a predicted box matches the ground-truth box. For this we use Intersection over Union (IOU):


```
IOU = (area of intersection) / (area of bounding box1 + area of bounding box2 - area of intersection) 
```
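A small worked example of the formula, as a sketch: `iou` is a hypothetical helper that treats coordinates as continuous, whereas `calculate_IOU` below adds `+1` to widths and heights (a pixel-inclusive convention) and a small epsilon in the denominator, so its results differ slightly.

```python
def iou(box_a, box_b):
    """IOU of two (x1, y1, x2, y2) boxes with continuous coordinates (no +1 pixel term)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Two 2 x 2 boxes overlapping in a 1 x 1 square: 1 / (4 + 4 - 1) = 1/7
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```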


In [0]:
def calculate_IOU(b1X, b1Y, b2X, b2Y, b3X, b3Y, b4X, b4Y):

  # b1X, b1Y, b2X, b2Y corresponds to topleft and bottom right coordinates of bounding box1 
  # b3X, b3Y, b4X, b4Y corresponds to topleft and bottom right coordinates of bounding box2
  xA = max(b1X,b3X)
  yA = max(b1Y,b3Y)
  xB = min(b2X,b4X)
  yB = min(b2Y,b4Y)
 
  area_intersection = max(0,xB-xA+1) * max(0,yB-yA+1)
  area_of_boundingbox1 = (b2X-b1X+1) * (b2Y-b1Y+1)
  area_of_boundingbox2 = (b4X-b3X+1) * (b4Y-b3Y+1)
 
  iou = area_intersection/float(area_of_boundingbox1 + area_of_boundingbox2 - area_intersection + 0.0001)
  return iou


After calculating the IOU of every predicted bounding box in a grid cell with the ground-truth box, we return the index of the box with the highest IOU; that box is responsible for detecting the object.


---
Remember that the predicted coordinates are normalized: x, y are offsets inside the grid cell (between 0 and 1, in units of the cell size), and h, w are between 0 and 1 relative to the image height and width.
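To make that encoding concrete, here is a sketch that decodes each predicted box of one grid cell into corner coordinates and picks the one with the highest IOU against the ground truth. It follows the conventions of `find_highest_IOU` below (box fields ordered x, y, h, w; the cell's x index is `grid_cell // grid` and its y index `grid_cell % grid`), but uses an IOU without the `+1` pixel term. `decode_box`, `iou` and `responsible_box` are illustrative names, not part of the notebook.

```python
def decode_box(box, grid_cell, grid, grid_offset):
    """box = (x, y, h, w) predicted for one cell; returns (x1, y1, x2, y2) in network pixels."""
    x, y, h, w = box
    # Same cell-index convention as find_highest_IOU
    cell_left = (grid_cell // grid) * grid_offset
    cell_top = (grid_cell % grid) * grid_offset
    center_x = x * grid_offset + cell_left
    center_y = y * grid_offset + cell_top
    image_size = grid * grid_offset          # h and w are fractions of the whole image
    height, width = h * image_size, w * image_size
    return (center_x - width / 2, center_y - height / 2,
            center_x + width / 2, center_y + height / 2)


def iou(a, b):
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def responsible_box(predicted_boxes, ground_truth_corners, grid_cell, grid, grid_offset):
    """Index of the predicted box with the highest IOU against the ground truth."""
    best_index, best_iou = 0, 0.0
    for index, box in enumerate(predicted_boxes):
        overlap = iou(decode_box(box, grid_cell, grid, grid_offset), ground_truth_corners)
        if overlap > best_iou:
            best_index, best_iou = index, overlap
    return best_index


# 7 x 7 grid on a 224 x 224 input, so each cell is 32 pixels wide
boxes = [(0.5, 0.5, 0.25, 0.5), (0.5, 0.5, 0.1, 0.1)]
print(decode_box(boxes[0], 8, 7, 32.0))
print(responsible_box(boxes, (37.0, 37.0, 59.0, 59.0), 8, 7, 32.0))
```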


In [0]:
def find_highest_IOU(predicted_grid_output, ground_truth_grid_output, bounding_boxes, grid_cell, grid_offset, input_grids):
  grid = np.sqrt(input_grids)
  max_iou = 0
  max_iou_index = 0
  
  x_offset = int(grid_cell/grid)
  y_offset = int(grid_cell%grid)

  topLeftX = x_offset * grid_offset
  topLeftY = y_offset * grid_offset

  for number_of_bbox in range(0,bounding_boxes):

    predicted_center_x = (predicted_grid_output[(number_of_bbox*5) + 1].item() * grid_offset) + topLeftX
    predicted_center_y = (predicted_grid_output[(number_of_bbox*5) + 2].item() * grid_offset) + topLeftY
    predicted_height = predicted_grid_output[(number_of_bbox*5) + 3].item() * grid_offset * grid
    predicted_width = predicted_grid_output[(number_of_bbox*5) + 4].item() * grid_offset * grid
    
    predicted_topLeftX = predicted_center_x - predicted_width/2
    predicted_topLeftY = predicted_center_y - predicted_height/2 
    predicted_bottomRightX = predicted_center_x + predicted_width/2
    predicted_bottomRightY = predicted_center_y + predicted_height/2
    
    ground_truth_topLeftX = ground_truth_grid_output[1].item() - ground_truth_grid_output[4].item()/2
    ground_truth_topLeftY = ground_truth_grid_output[2].item() - ground_truth_grid_output[3].item()/2
    ground_truth_bottomRightX = ground_truth_grid_output[1].item() + ground_truth_grid_output[4].item()/2
    ground_truth_bottomRightY = ground_truth_grid_output[2].item() + ground_truth_grid_output[3].item()/2

    iou = calculate_IOU(predicted_topLeftX,predicted_topLeftY,predicted_bottomRightX,predicted_bottomRightY,
                        ground_truth_topLeftX,ground_truth_topLeftY,ground_truth_bottomRightX,ground_truth_bottomRightY)

    if(iou > max_iou):
      max_iou = iou
      max_iou_index = number_of_bbox

  return max_iou_index


Now we compute the loss function. It consists of three parts:

*   Classification loss: if an object is present in the grid cell, the squared error of the class probabilities
*   Localization loss: if an object is present, the squared error of the bounding-box coordinates (center offsets and square roots of height and width)
*   Confidence loss: the squared error of the box confidence, both when an object is present and when it is not

Finally, the localization loss is weighted by lambda_coord and the no-object confidence loss by lambda_noobject. Most grid cells contain no object, so lambda_noobject (0.5 in `main`) keeps those cells from dominating the loss, while lambda_coord (5) gives more weight to accurate box coordinates.
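Written out, the objective the loss function is meant to compute can be sketched as follows (our own notation, not from the original notebook). $N$ is the batch size, $S^2$ = `input_grids`, $B$ = `bounding_boxes`, $K$ = `classes`; $\mathbb{1}_i^{obj}$ is 1 when a labelled object's center falls inside cell $i$; $j$ is the predicted box of that cell with the highest IOU against the ground truth. Unhatted symbols are predictions (box confidence $C_{ib}$, offsets $x, y$, relative sizes $h, w$, class probabilities $p_i(k)$) and hatted symbols are the ground-truth values, with $\hat p_i$ one-hot. The sample index $n$ is dropped inside the sum.

$$
\begin{aligned}
\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=0}^{S^2-1}\Bigg(
&\mathbb{1}_i^{obj}\sum_{k=0}^{K-1}\big(p_i(k)-\hat p_i(k)\big)^2 \\
&+ \lambda_{coord}\,\mathbb{1}_i^{obj}\Big[(x_{ij}-\hat x_i)^2+(y_{ij}-\hat y_i)^2\Big] \\
&+ \lambda_{coord}\,\mathbb{1}_i^{obj}\Big[\big(\sqrt{h_{ij}}-\sqrt{\hat h_i}\big)^2+\big(\sqrt{w_{ij}}-\sqrt{\hat w_i}\big)^2\Big] \\
&+ \mathbb{1}_i^{obj}\,\big(1-C_{ij}\big)^2
+ \lambda_{noobj}\,\big(1-\mathbb{1}_i^{obj}\big)\sum_{b=0}^{B-1}C_{ib}^2\Bigg)
\end{aligned}
$$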







In [0]:
def yolo_loss(batched_output, batched_label, input_grids, grid_offset, bounding_boxes, classes, lambda_coord, lambda_noobject):
  
  total_loss = torch.tensor([0], dtype=torch.float)
  grid = np.sqrt(input_grids)

  for batch_number in range(batched_output.size()[0]):

    classification_loss = torch.tensor([0], dtype=torch.float)
    localization_loss_centerpoint = torch.tensor([0], dtype=torch.float)
    localization_loss_aspect_ratio = torch.tensor([0], dtype=torch.float)
    confidence_loss_object = torch.tensor([0], dtype=torch.float)
    confidence_loss_noobject = torch.tensor([0], dtype=torch.float)

    for grid_cell in range(batched_output.size()[1]):
      
      predicted_grid_output = batched_output[batch_number,grid_cell,:]
     
      # Logic to get the center coordinates of grid cell
      x_offset = int(grid_cell / grid)
      y_offset = int(grid_cell % grid)
     
      grid_cell_center_x = (x_offset*grid_offset) + (grid_offset/2)
      grid_cell_center_y = (y_offset*grid_offset) + (grid_offset/2)
     
      object_present = -1
      ground_truth_grid_output = torch.Tensor()
       
      for index in range(0,batched_label.size()[1]):
        ground_truth_grid_output = batched_label[batch_number,index,:]
        if (ground_truth_grid_output.sum() == 0):
          break
        
        ground_truth_center_x = ground_truth_grid_output[1].item()
        ground_truth_center_y = ground_truth_grid_output[2].item()

        object_class = ground_truth_grid_output[0].item() # Stores which object is present in the grid cell which is responsible for detecting

        # Finding whether grid detects an object or not
        if(object_class >= 0 and object_class < classes and ground_truth_center_x < (grid_cell_center_x+(grid_offset/2)) and ground_truth_center_x >= (grid_cell_center_x-(grid_offset/2))
            and ground_truth_center_y < (grid_cell_center_y+(grid_offset/2)) and ground_truth_center_y >= (grid_cell_center_y-(grid_offset/2))):
          object_present = object_class
          break
            
      # Calculating classification loss
      if(object_present != -1):
        partial_classification_loss = torch.tensor([0], dtype=torch.float)
        
        for target_class in range(classes):
          if(target_class != object_present):
            partial_classification_loss = partial_classification_loss + (predicted_grid_output[5*bounding_boxes+target_class]) ** 2 
      
        classification_loss = classification_loss + partial_classification_loss + (1 - predicted_grid_output[5*bounding_boxes + int(object_present)])**2

        # Calculating which bounding box has highest IOU with ground truth bounding box
        highest_iou_bbox_index = find_highest_IOU(predicted_grid_output, ground_truth_grid_output, bounding_boxes, grid_cell, grid_offset, input_grids)

        # Calculating localization loss of center points and aspect ratio

        ground_truth_height = (ground_truth_grid_output[3])/(grid * grid_offset)
        ground_truth_width = (ground_truth_grid_output[4])/(grid * grid_offset)
        ground_truth_center_x = ((ground_truth_grid_output[1]) % grid_offset)/grid_offset
        ground_truth_center_y = ((ground_truth_grid_output[2]) % grid_offset)/grid_offset

        # Use the responsible (highest-IOU) predicted box, not always the first box of the cell
        box_start = highest_iou_bbox_index*5
        localization_loss_centerpoint = localization_loss_centerpoint + (predicted_grid_output[box_start+1]-ground_truth_center_x)**2 + (predicted_grid_output[box_start+2]-ground_truth_center_y)**2
        localization_loss_aspect_ratio = localization_loss_aspect_ratio + (torch.sqrt(predicted_grid_output[box_start+3])-torch.sqrt(ground_truth_height))**2 + (torch.sqrt(predicted_grid_output[box_start+4])-torch.sqrt(ground_truth_width))**2
        
        # Calculating Confidence loss, if object detected
        confidence_loss_object = confidence_loss_object + (1 - predicted_grid_output[highest_iou_bbox_index*5])**2
              
      # Calculating Confidence loss, if object not detected
      else:
        for number_of_bounding_box in range(bounding_boxes):
          confidence_loss_noobject = confidence_loss_noobject + (predicted_grid_output[number_of_bounding_box*5])**2

    total_loss = total_loss + classification_loss + lambda_coord*localization_loss_centerpoint + lambda_coord*localization_loss_aspect_ratio + confidence_loss_object + lambda_noobject*confidence_loss_noobject 

  batch_loss = total_loss/batched_output.size()[0] 
  
  return batch_loss

Our CNN, dataset and loss function are now defined, so we can train the model.
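`train` calls `scheduler.step()` after every batch, so the `CyclicLR` scheduler created in `main` changes the learning rate per batch, not per epoch. The sketch below shows the triangular schedule (CyclicLR's default `mode='triangular'`, assuming `step_size_down` is left at its default, equal to `step_size_up`) with the notebook's `base_lr=0.000001`, `max_lr=0.001` and `step_size_up=250`: the rate climbs linearly for 250 batches, falls back over the next 250, and repeats. `triangular_lr` is a hypothetical helper written from the triangular-policy formula, not PyTorch's implementation.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Learning rate of the triangular cyclical policy after `iteration` scheduler steps."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)   # 1 at the cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


for batch in (0, 125, 250, 375, 500):
    print(batch, triangular_lr(batch, 0.000001, 0.001, 250))
```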

In [0]:
def train(model, optimizer, scheduler, training_dataloader, training_batch_size, input_grids, grid_offset, number_of_cnn_output, bounding_boxes, classes, lambda_coord, lambda_noobject):
  
  # model.train() puts layers such as dropout and batch norm into training mode;
  # they behave differently in training mode and in evaluation mode
  model.train()
  batch_loss = 0

  for batch_index, batched_sample in enumerate(training_dataloader):

    # Inputs and labels are data, not parameters, so they do not need gradients
    batched_image = batched_sample['image'].float()
    batched_label = batched_sample['label'].float()
    batched_output = model(batched_image)
    batched_output = batched_output.view(training_batch_size, input_grids, number_of_cnn_output) # Convert the output size into [N X GRIDS X (5 * B + C)]

    loss = yolo_loss(batched_output, batched_label, input_grids, grid_offset, bounding_boxes, classes, lambda_coord, lambda_noobject)
    print('Training Loss for batch_index : {} is {}'.format(batch_index,loss))
    batch_loss = batch_loss + loss.item()
   
    optimizer.zero_grad()    
    loss.backward()
    optimizer.step()
    scheduler.step()

  return batch_loss/len(training_dataloader)

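Non-Maximum Suppression (NMS) removes duplicate bounding boxes: among boxes that overlap with an IOU at or above `iou_threshold`, only the one with the highest score should be kept.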
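A minimal sketch of greedy NMS on plain lists, assuming boxes as (x1, y1, x2, y2) tuples with one score each and a continuous-coordinate IOU. Like the notebook's version it is class-agnostic (boxes of different classes can suppress each other); `greedy_nms` and `iou` are hypothetical names.

```python
def iou(a, b):
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # A box survives only if it does not overlap an already kept (higher-scored) box too much
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(greedy_nms(boxes, [0.9, 0.8, 0.7], iou_threshold=0.4))
```

Comparing each box only against boxes that were already kept guarantees that a lower-scored box can never remove a higher-scored one.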

In [0]:
def non_max_suppression(bbox_list, iou_threshold):

  # bbox_list maps each box's score to its coordinates; visit boxes from highest to lowest score
  kept_bbox_list = OrderedDict()
  for key in sorted(bbox_list, reverse=True):
    bbox1 = bbox_list[key]

    # Keep a box only if it does not overlap too much with an already kept (higher-scored) box.
    # Comparing only against kept boxes ensures a lower-scored box never removes a higher-scored one.
    suppressed = False
    for kept_key in kept_bbox_list:
      bbox2 = kept_bbox_list[kept_key]
      iou = calculate_IOU(bbox1['tlx'], bbox1['tly'],bbox1['brx'],bbox1['bry'],bbox2['tlx'],bbox2['tly'],bbox2['brx'],bbox2['bry'])
      if(iou >= iou_threshold):
        suppressed = True
        break

    if not suppressed:
      kept_bbox_list[key] = bbox1

  return kept_bbox_list
  

After each training epoch we validate the model. The validation loss is used to tune hyperparameters and to detect overfitting. The function below also draws the predicted boxes on the original images and saves them.
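To draw a detection, the validation code maps a box from the 224 x 224 network frame back to pixels of the original image and scores it as (highest class probability) x (box confidence). The sketch below reproduces that arithmetic on plain numbers; `to_original_pixels` and `detection_score` are hypothetical helpers, and `int()` truncation is kept to match the notebook.

```python
def to_original_pixels(center_x, center_y, height, width, network_size, image_width, image_height):
    """Map a box from the network_size x network_size frame to integer corner pixels
    (top_left_x, top_left_y, bottom_right_x, bottom_right_y) of the original image."""
    return (int(((center_x - width / 2) / network_size) * image_width),
            int(((center_y - height / 2) / network_size) * image_height),
            int(((center_x + width / 2) / network_size) * image_width),
            int(((center_y + height / 2) / network_size) * image_height))


def detection_score(class_probabilities, box_confidence):
    """Best class index and its score: max class probability times box confidence."""
    best_class = max(range(len(class_probabilities)), key=lambda c: class_probabilities[c])
    return best_class, class_probabilities[best_class] * box_confidence


# A 56 x 56 box centred in the 224 x 224 frame, drawn on a KITTI-sized 1242 x 375 image
print(to_original_pixels(112, 112, 56, 56, 224, 1242, 375))
print(detection_score([0.1, 0.8, 0.1], 0.9))
```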

In [0]:
def validation(model, validation_dataloader, validation_batch_size, classes, input_grids, grid_offset, number_of_cnn_output, saving_results_path, bounding_boxes, validation_images_dir, lambda_coord, lambda_noobject, object_detected_threshold, box_confidence_threshold, iou_threshold):

  # model.eval() switches layers such as dropout and batch norm to evaluation mode;
  # torch.no_grad() below additionally disables gradient tracking

  model.eval() 
  batch_loss = 0
  # Take the file names from the dataset itself, so that each prediction is drawn on the
  # image it was made for (os.listdir order is not guaranteed to match the dataset order)
  filename = validation_dataloader.dataset.filename
  image_index = 0
  grid = np.sqrt(input_grids)

  with torch.no_grad():
    for batch_index, batched_validation_sample in enumerate(validation_dataloader):
      validation_image = batched_validation_sample['image']
      validation_label = batched_validation_sample['label']
      output = model(validation_image)
      output = output.view(validation_batch_size, input_grids, number_of_cnn_output)

      image_path = validation_images_dir + '/' + filename[image_index] + '.png'
      original_image = io.imread(image_path)
      
      loss = yolo_loss(output, validation_label, input_grids, grid_offset, bounding_boxes, classes, lambda_coord, lambda_noobject)
      print('Validation Loss for batch_index : {} is {}'.format(batch_index,loss))
      batch_loss += loss.item()     
      
      for batch_number in range(validation_batch_size):
        bbox_list = OrderedDict()
        for grid_cell in range(input_grids):
          grid_output = output[batch_number,grid_cell,:]
          
          # Logic to get the center, top-left and bottom-right coordinates of grid cell
          x_offset = int(grid_cell / grid)
          y_offset = int(grid_cell % grid)

          grid_cell_topleftX = x_offset * grid_offset 
          grid_cell_topleftY = y_offset * grid_offset

          for number_of_bounding_box in range(bounding_boxes):
            if(grid_output[number_of_bounding_box*5] > object_detected_threshold):

              class_probabilities = grid_output[5*bounding_boxes:]
              max_class_probabilites, index = torch.max(class_probabilities,0)
              
              center_x = grid_output[(number_of_bounding_box*5)+1] * grid_offset + grid_cell_topleftX
              center_y = grid_output[(number_of_bounding_box*5)+2] * grid_offset + grid_cell_topleftY
              height = grid_output[(number_of_bounding_box*5)+3] * grid * grid_offset
              width = grid_output[(number_of_bounding_box*5)+4] * grid * grid_offset

              top_left_x = int((((center_x - (width/2))/(grid*grid_offset))*original_image.shape[1]).item())
              top_left_y = int((((center_y - (height/2))/(grid*grid_offset))*original_image.shape[0]).item())
              bottom_right_x = int((((center_x + (width/2))/(grid*grid_offset))*original_image.shape[1]).item())
              bottom_right_y = int((((center_y + (height/2))/(grid*grid_offset))*original_image.shape[0]).item())
              
              predicted_strength = max_class_probabilites.item() * grid_output[number_of_bounding_box*5].item()
              accuracy = round(predicted_strength * 100,2)

              if(top_left_x >= 0 and top_left_y >= 0 and bottom_right_x >= 0 and bottom_right_y >= 0 and predicted_strength >= box_confidence_threshold):
                bbox_list[predicted_strength]={'tlx':top_left_x,'tly':top_left_y,'brx':bottom_right_x,'bry':bottom_right_y,'accuracy':accuracy,'index':index.item()}
        
        updated_bbox_list = non_max_suppression(bbox_list,iou_threshold)      
        name = 'Unknown'
        color = (130,130,130)
        
        for key in updated_bbox_list:
          bbox = updated_bbox_list[key]
          object_class = bbox['index']
          
          top_left_x = bbox['tlx']
          top_left_y = bbox['tly']
          bottom_right_x = bbox['brx']
          bottom_right_y = bbox['bry']
        
          if(object_class == 0):
            name = 'Car'
            color = (255,255,255)
          elif(object_class == 1):
            name = 'Cyclist'
            color = (0,0,255)
          elif(object_class == 2):
            name = 'Pedestrian'
            color = (0,255,0)
          elif(object_class == 3):
            name = 'Tram'
            color = (255,0,0)
          elif(object_class == 4):
            name = 'Truck'
            color = (0,255,255)
          elif(object_class == 5):
            name = 'Van' 
            color = (255,0,255)

          name = name + '(' + str(bbox['accuracy']) + ')'               
                
          original_image = cv2.rectangle(original_image,(top_left_x,top_left_y),(bottom_right_x,bottom_right_y),color,2)
          original_image = cv2.rectangle(original_image,(top_left_x,top_left_y-30),(top_left_x+125,top_left_y),color,cv2.FILLED)
          cv2.putText(original_image, name, (top_left_x, top_left_y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0), 1)
        
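        save_image_path = saving_results_path + str(batch_index) + '.png'
        # skimage loads images as RGB while OpenCV writes BGR, so convert before saving
        file_saved = cv2.imwrite(save_image_path, cv2.cvtColor(original_image, cv2.COLOR_RGB2BGR))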

      image_index +=1

  return batch_loss/len(validation_dataloader)

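Finally, we write the `main` function, which sets the hyperparameters, builds the datasets, model, optimizer and scheduler, and runs the training and validation loop.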

In [0]:
def main():
    print('...............Main Function starts...............')
    
    # Training Settings
    base_lr = 0.000001 # Hyper parameters
    max_lr = 0.001 # Hyper parameters
    momentum = 0.9 # Hyper parameters
    epochs = 100  # Hyper parameters
    training_batch_size = 2 # Hyper parameters
    validation_batch_size = 1
    
    object_detected_threshold = 0.25 # Model Parameters
    box_confidence_threshold = 0.5 # Model Parameters
    iou_threshold = 0.4 # Model Parameters

    input_image_size = 224 # Model Parameters
    input_grids = 7*7 # Model Parameters
    grid_offset = input_image_size/np.sqrt(input_grids) # Model Parameters
    bounding_boxes = 2 # Model Parameters
    
    classes = 6 # Model Parameters
    number_of_cnn_output = (5*bounding_boxes) + classes # Model Parameters
    lambda_coord = 5 # Model Parameters
    lambda_noobject = 0.5 # Model Parameters
    
    save_model = False
    seed = 1
    logging = True
    steps_completed = 0
    last_epoch_loss = 0
    number_of_training_data = 0
    training_loss = 0

    torch.manual_seed(seed)

    # Comet ML Settings for visualizing loss function and hyper parameters
    if(logging):
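      # Read the Comet API key from the COMET_API_KEY environment variable instead of hard-coding it in the notebook
      experiment = Experiment(api_key=os.environ.get("COMET_API_KEY"), project_name="object-detection", workspace="jayfartiyal")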
      hyper_parameters = {"lr": base_lr, "epochs": epochs, "batch_size":training_batch_size} 
      experiment.log_parameters(hyper_parameters)

    # Images and labels Directory
    training_labels_dir = r'/content/gdrive/My Drive/kitti_single_nano/training/label_2'
    training_images_dir = r'/content/gdrive/My Drive/kitti_single_nano/training/image_2'
    validation_images_dir = r'/content/gdrive/My Drive/kitti_single_nano/validation/image_2'
    validation_labels_dir = r'/content/gdrive/My Drive/kitti_single_nano/validation/label_2'
    saving_model_path = r'/content/gdrive/My Drive/kitti_single_nano/validation/resnet50_class6_sgd_clr_version1.cnn.pt'
    saving_results_path = r'/content/gdrive/My Drive/kitti_single_nano/validation/results/image'

    if(save_model == False):
      #Inititalizing model and optimizer
      model = Net(input_grids, number_of_cnn_output)
      optimizer = optim.SGD(model.parameters(), lr=base_lr, momentum=momentum)
      scheduler = optim.lr_scheduler.CyclicLR(optimizer=optimizer,base_lr=base_lr,max_lr=max_lr,step_size_up=250)

      print('...............Convolutional Neural Network model and optimizer has been initialized...............')

      # Retrieving model and optimizer states if present 
      if(os.path.isfile(saving_model_path)):
        print('Previous Model state found...............')
        
        checkpoint = torch.load(saving_model_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        steps_completed = checkpoint['steps_completed']
        last_epoch_loss = checkpoint['last_epoch_loss']
        
        print('Previous Model and optimizer states has been retrieved...............')
        print('{} steps completed..........'.format(steps_completed))
        print('Last epoch cycle loss : {}..........'.format(last_epoch_loss))          
      else:
        print('Previous Model state not found !!!...............')

      save_model = True # After the finish of the program, it should save the model
    
    # Creating transform to apply on training dataset
    training_dataset_transform = transforms.Compose([
                                         BatchPadding(100),
                                         Resize(input_image_size),
                                         Normalization(),
                                         ToTensor()])
    
    # Creating transform to apply on validation dataset
    validation_dataset_transform = transforms.Compose([
                                         BatchPadding(100),
                                         Resize(input_image_size),
                                         Normalization(),
                                         ToTensor()])
    
    # Creating training and validation dataset instance
    training_dataset = KittiDataset(labels_dir=training_labels_dir, images_dir=training_images_dir, number_of_classes=classes, input_image_size=input_image_size, transform=training_dataset_transform)
    validation_dataset = KittiDataset(labels_dir=validation_labels_dir, images_dir=validation_images_dir, number_of_classes=classes, input_image_size=input_image_size, transform=validation_dataset_transform)
    number_of_training_data = len(training_dataset)

    training_dataloader = DataLoader(dataset=training_dataset, batch_size=training_batch_size, shuffle=True, drop_last=True)
    validation_dataloader = DataLoader(dataset=validation_dataset, batch_size=validation_batch_size)
    
    print('...............Training and Validation Dataloader initialized...............')
    print('...............Training is starting...............')
    
    for epoch in range(0,epochs):
      training_loss = train(model, optimizer, scheduler, training_dataloader, training_batch_size, input_grids, grid_offset, number_of_cnn_output, bounding_boxes, classes, lambda_coord, lambda_noobject)
      print('Training Loss for epoch :{} is {}'.format(epoch,training_loss))
      
      # Logging training loss for hyper parameter tuning
      if(logging):
        experiment.log_metric("Training Loss", training_loss)
      
      print('................Validation is starting.................')
      validation_loss = validation(model, validation_dataloader, validation_batch_size, classes, input_grids, grid_offset, number_of_cnn_output, saving_results_path, bounding_boxes, validation_images_dir, lambda_coord, lambda_noobject, object_detected_threshold, box_confidence_threshold, iou_threshold)
      print('Validation Loss for epoch :{} is {}'.format(epoch,validation_loss))

      # Logging validation loss for hyper parameter tuning
      if(logging):
        experiment.log_metric("Validation Loss", validation_loss)

    steps_completed += int(((number_of_training_data)/training_batch_size)*epochs)

    if (save_model):
        torch.save({
          'steps_completed': steps_completed,
          'model_state_dict': model.state_dict(),
          'optimizer_state_dict': optimizer.state_dict(),
          'last_epoch_loss' : training_loss}, saving_model_path)
        print('..............Convolutional Neural Network Model parameters are saved...............')
          

    print('...............Model is now trained over : {} steps...............'.format(steps_completed))        
    print('...............Last epoch cycle loss : {} ...............'.format(last_epoch_loss))
    print('...............Current cycle last epoch loss : {} ...............'.format(training_loss))

if __name__ == '__main__':
  main()

