<a href="https://colab.research.google.com/github/JitindraFartiyal/Object-Detection/blob/object-detection-v1/Yolo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Connect to Google Drive to load the dataset. This step is only required if you are running in Google Colab and reading the dataset from Google Drive.


In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


Importing all libraries

In [0]:
from comet_ml import Experiment
import os
import pandas as pd
import numpy as np
import math
import torch
import pdb
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
import cv2
from collections import OrderedDict 
from google.colab.patches import cv2_imshow
from torch.utils.data import Dataset, DataLoader
from skimage import io, transform
from torchvision import transforms, datasets, utils


We need to convert the class names ['Car','Cyclist'....] in the label file into integers. Because the labels are not fed into our model as input, we do not need one-hot encoding or other encoding techniques. We are simply converting them for ease of use.

In [0]:
def class_encoding(label):
   
  for i in range(label.shape[0]):
    if label.iloc[i,0] == 'Car':
      label.iloc[i,0] = 1
    elif label.iloc[i,0] == 'Cyclist':
      label.iloc[i,0] = 2
    elif label.iloc[i, 0] == 'DontCare':
      label.iloc[i, 0] = 3
    elif label.iloc[i,0] == 'Misc':
      label.iloc[i,0] = 4
    elif label.iloc[i,0] == 'Pedestrian':
      label.iloc[i,0] = 5
    elif label.iloc[i,0] == 'Person_sitting':
      label.iloc[i,0] = 6
    elif label.iloc[i,0] == 'Tram':
      label.iloc[i,0] = 7
    elif label.iloc[i,0] == 'Truck':
      label.iloc[i,0] = 8
    elif label.iloc[i,0] == 'Van':
      label.iloc[i,0] = 9 
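
The same encoding can also be expressed as a dictionary lookup. The sketch below is an illustrative alternative on plain Python lists, not the notebook's own implementation; the names `CLASS_TO_ID` and `encode_classes` are hypothetical.

```python
# Hypothetical lookup table mirroring the integer codes used in class_encoding
CLASS_TO_ID = {
    'Car': 1, 'Cyclist': 2, 'DontCare': 3, 'Misc': 4, 'Pedestrian': 5,
    'Person_sitting': 6, 'Tram': 7, 'Truck': 8, 'Van': 9,
}

def encode_classes(class_names):
    # Translate each class name into its integer code; unknown names raise a KeyError
    return [CLASS_TO_ID[name] for name in class_names]

print(encode_classes(['Car', 'Van', 'Pedestrian']))
```

For a pandas column, `Series.map` accepts such a dictionary directly, which avoids the explicit row loop.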


The KITTI dataset uses a different label file format from YOLO. We need to convert our KITTI label files into the YOLO label format.

---


Note : We are rescaling the coordinates of our bounding boxes to the output image size, which is [225 X 225], and not the input image size, which is [270 X 270], because we need to compare the labels with the output of our model

In [0]:
def transform_label(label, number_of_classes):
  
  # For bounding boxes, the coordinate system does not start at the bottom-left corner as it usually does in mathematics; it starts at the top-left corner
  top_left_x = label[:,1]
  top_left_y = label[:,2]
  bottom_right_x = label[:,3]
  bottom_right_y = label[:,4]

  height = bottom_right_y - top_left_y
  width = bottom_right_x - top_left_x
  center_x = top_left_x + width/2
  center_y = top_left_y + height/2

  # Reducing the scale [1242 X 375] of the coordinates of bounding box in the label file into output image scale [225 X 225]. 
  # We need to do this, so that at training and testing, we can compute loss easily, if all are in the same scale. 
  label[:,1] = (center_x/1242)*225
  label[:,2] = (center_y/375)*225
  label[:,3] = (height /375)*225
  label[:,4] = (width/1242)*225

  # Adding class probability columns
  target = np.zeros((label.shape[0],label.shape[1] + number_of_classes)) 
  target[:,0:5] = label
  
  for i in range(0,label.shape[0]):
    if(target[i,0:1] == 1): # Prob_Class(Car) = 1 and rest 0, if Car is detected 
      target[i,5:6] = 1
    elif(target[i,0:1] == 2): # Prob_Class(Cyclist) = 1 and rest 0, if Cyclist is detected
      target[i,6:7] = 1
    elif(target[i,0:1] == 3): # Prob_Class(DontCare) = 1 and rest 0, if DontCare is detected
      target[i,7:8] = 1  
    elif(target[i,0:1] == 4): # Prob_Class(Misc) = 1 and rest 0, if Misc is detected
      target[i,8:9] = 1
    elif(target[i,0:1] == 5): # Prob_Class(Pedestrian) = 1 and rest 0, if Pedestrian is detected
      target[i,9:10] = 1
    elif(target[i,0:1] == 6): # Prob_Class(Person_sitting) = 1 and rest 0, if Person_sitting is detected 
      target[i,10:11] = 1  
    elif(target[i,0:1] == 7): # Prob_Class(Tram) = 1 and rest 0, if Tram is detected
      target[i,11:12] = 1
    elif(target[i,0:1] == 8): # Prob_Class(Truck) = 1 and rest 0, if Truck is detected 
      target[i,12:13] = 1
    elif(target[i,0:1] == 9): # Prob_Class(Van) = 1 and rest 0, if Van is detected 
      target[i,13:14] = 1
  return target
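
As a worked example of the coordinate conversion, the sketch below applies the same arithmetic to a single box. The helper name `kitti_box_to_yolo` and the sample box are hypothetical; the scales [1242 X 375] and [225 X 225] are the ones used above.

```python
def kitti_box_to_yolo(x1, y1, x2, y2, src_w=1242, src_h=375, out_size=225):
    # Corner coordinates (top-left origin) -> center, height, width
    width = x2 - x1
    height = y2 - y1
    center_x = x1 + width / 2
    center_y = y1 + height / 2
    # Rescale from the source image scale to the square output scale
    return (center_x / src_w * out_size,
            center_y / src_h * out_size,
            height / src_h * out_size,
            width / src_w * out_size)

# A box covering the central half of the image in each direction
print(kitti_box_to_yolo(310.5, 93.75, 931.5, 281.25))
```

The result is ordered as (center_x, center_y, height, width), matching columns 1 to 4 of the transformed label.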


We need to preprocess the data, i.e. bring the input to our Convolutional Neural Network (CNN) into a uniform form.
Here, each sample is a dictionary holding an image and its label. Only the image is fed into our CNN model; the label is used for calculating the loss. Most preprocessing (resizing, normalizing, mean subtraction, etc.) therefore applies only to the image, but one step is common to both image and label: converting them into tensors for further calculation.
Here, the image is only resized to [270 X 270] and mean-subtracted.

In [0]:
class Resize(object):

  def __init__(self, input_image_size):
    # Input image size is the size of the image that we are putting it into our CNN Model. In this case, it is [270 X 270]
    self.input_image_size = (input_image_size,input_image_size)
  
  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    image = transform.resize(image,self.input_image_size,preserve_range=True)

    return {'image' : image, 'label' : label}

class ToTensor(object):
  
  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    image = image.transpose((2, 0, 1)) # Converting the image form from (H X W X C) into (C X H X W)

    # The resized image and the label are floating-point NumPy arrays (typically Float64), and torch.from_numpy() keeps that dtype.
    # The CNN model parameters are Float32 by default, so a Float64 (Double) input would throw a dtype mismatch error
    # unless the whole model were converted to Double. Calling float() converts the input to Float32 here instead.
    return {'image' : torch.from_numpy(image).float(),
            'label' : torch.from_numpy(label).float()}

class MeanSubtraction(object):

  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    image = np.absolute(image - np.mean(image)) # Subtract the mean over all pixels and channels, then take the absolute value of the result. Refer to the NumPy manual to know more.

    return {'image' : image, 'label' : label}

# Label files contain different numbers of objects, so stacking samples into a single batch during training throws an error because of
# the variable dimensions. One solution is to pad every label array with zero rows up to a fixed, arbitrarily chosen length.
class BatchPadding(object):

  def __init__(self, pad):
    self.pad = pad
  
  def __call__(self, data_sample):
    image, label = data_sample['image'], data_sample['label']
    batched_label = np.zeros((self.pad,label.shape[1]))
    batched_label[0:label.shape[0],:] = label

    return {'image' : image, 'label' : batched_label}
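
The padding idea can be illustrated on plain Python lists. This is a minimal sketch under the assumption that each label is a list of equal-width rows; the helper name `pad_labels` is hypothetical.

```python
def pad_labels(rows, pad, row_width):
    # Reject samples with more objects than the fixed length allows
    if len(rows) > pad:
        raise ValueError('sample has more labels than the padding length')
    # Append all-zero rows until the sample has exactly `pad` rows
    padding = [[0.0] * row_width for _ in range(pad - len(rows))]
    return [list(row) for row in rows] + padding

sample = [[1.0, 112.5, 112.5, 30.0, 60.0]]
padded = pad_labels(sample, pad=3, row_width=5)
print(len(padded), padded[-1])
```

Every sample then has the same number of rows, so a batch can be stacked, and an all-zero row marks the end of the real labels.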


Here, we define a class for our dataset. For our problem of object detection for self-driving cars, we are using the KITTI dataset.


In [0]:
class KittiDataset(Dataset):

    def __init__(self, labels_dir, images_dir, number_of_classes, transform=None):
      
      self.labels_dir = labels_dir
      self.images_dir = images_dir
      self.number_of_classes = number_of_classes
      self.transform = transform

      self.labels_dict = {}
      self.filename = []
      self.__init__dataset()

    def __init__dataset(self):
      
      print('..........Initializing Dataset..........')
      
      index = 0
      for file in os.listdir(self.labels_dir):

        print('Reading label file : ' + file + '...')
        
        label_path = self.labels_dir + '/' + file
        label = pd.read_csv(filepath_or_buffer=label_path, sep=' ', header=None, index_col=False)
        
        # Extracting the relevant columns (class and 2D bounding box) from the label dataframe
        label = label.iloc[:,[0,4,5,6,7]] 
        label.columns = ['Class','TopLeftX','TopLeftY','BottomRightX','BottomRightY'] 
    
        # Class Encoding
        # Car=1, Cyclist=2, DontCare=3, Misc=4, Pedestrian=5, Person_sitting=6, Tram=7, Truck=8, Van=9 
        class_encoding(label)

        self.labels_dict[index] = label
        self.filename.append(file[0:6])
        index = index + 1

    def __len__(self):
      return len(self.labels_dict)

    def __getitem__(self, index):
      image_path = self.images_dir + '/' + self.filename[index] + '.png'
      image = io.imread(image_path)
      
      label = self.labels_dict[index]
      label = label.to_numpy(dtype = np.float32) # float16 would round pixel coordinates as large as 1242, so float32 is used
      
      # Convert the label into YOLO format (class, center_x, center_y, height, width, class_prob1 ..... class_probn)
      target = transform_label(label, self.number_of_classes)

      data_sample = {'image' : image, 'label' : target}
      
      if self.transform:
        data_sample = self.transform(data_sample)
        
      return data_sample
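
To make the column selection `[0,4,5,6,7]` concrete, the sketch below parses one label line by hand. The helper name `parse_kitti_line` and the sample line are made up for illustration; the field layout (class name first, 2D box in fields 4 to 7) follows the KITTI object label format.

```python
def parse_kitti_line(line):
    # A KITTI object label has 15 space-separated fields; field 0 is the class
    # name and fields 4-7 are the 2D box (left, top, right, bottom) in pixels
    fields = line.split()
    name = fields[0]
    left, top, right, bottom = (float(value) for value in fields[4:8])
    return name, left, top, right, bottom

# Hypothetical label line, made up for illustration
line = 'Car 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56'
print(parse_kitti_line(line))
```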


After creating the dataset class, we now need to create our CNN model class, where we define our CNN layers and the forward pass

In [0]:
class Net(nn.Module):
    def __init__(self, gridX, gridY, grids_values):
      
        super(Net, self).__init__()
        print('..........Initializing Convolutional Neural Network..........')
      
        # Fully Connected layer settings - Includes the size of Fully Connected layer
        self.gridX = gridX
        self.gridY = gridY
        self.grids_values = grids_values
        
        # Initialization of Convolutional layers, Batch Normalization layers and Dropout layers
        self.cnnlayer1 = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=1, stride=1, padding=0, bias=True)),
          ('relu1', nn.ReLU(True)),
          ('batch_norm_conv1', nn.BatchNorm2d(num_features=64))
         ]))
        
        self.cnnlayer2 = nn.Sequential(OrderedDict([
          ('conv2', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1, stride=1, padding=0, bias=True)),
          ('relu2', nn.ReLU(True)),
          ('batch_norm_conv2', nn.BatchNorm2d(num_features=128))
         ]))
        
        self.cnnlayer3 = nn.Sequential(OrderedDict([
          ('conv3', nn.Conv2d(in_channels=128, out_channels=256, kernel_size=1, stride=1, padding=0, bias=True)),
          ('relu3', nn.ReLU(True)),
          ('batch_norm_conv3', nn.BatchNorm2d(num_features=256))
         ]))
        
        self.cnnlayer4 = nn.Sequential(OrderedDict([
          ('conv4', nn.Conv2d(in_channels=256, out_channels=512, kernel_size=1, stride=1, padding=0, bias=True)),
          ('relu4', nn.ReLU(True)),
          ('batch_norm_conv4', nn.BatchNorm2d(num_features=512)),
          ('dropout_conv5', nn.Dropout2d(0.05))
         ]))

        self.cnnlayer5 = nn.Sequential(OrderedDict([
          ('conv5', nn.Conv2d(in_channels=512, out_channels=256, kernel_size=1, stride=1, padding=0, bias=True)),
          ('relu5', nn.ReLU(True)),
          ('batch_norm_conv5', nn.BatchNorm2d(num_features=256)),
          ('max_pool_conv5', nn.MaxPool2d(kernel_size=2, stride=2))
         ]))
        
        self.cnnlayer6 = nn.Sequential(OrderedDict([
          ('conv6', nn.Conv2d(in_channels=256, out_channels=128, kernel_size=1, stride=1, padding=0, bias=True)),
          ('relu6', nn.ReLU(True)),
          ('batch_norm_conv6', nn.BatchNorm2d(num_features=128)),
          ('max_pool_conv6', nn.MaxPool2d(kernel_size=3, stride=3))
         ]))
        
        self.cnnlayer7 = nn.Sequential(OrderedDict([
          ('conv7', nn.Conv2d(in_channels=128, out_channels=64, kernel_size=1, stride=1, padding=0, bias=True)),
          ('relu7', nn.ReLU(True)),
          ('batch_norm_conv7', nn.BatchNorm2d(num_features=64)),
          ('max_pool7', nn.MaxPool2d(kernel_size=3, stride=3))
         ]))
        
        self.fc1 = nn.Linear(self.gridX*self.gridY*64, self.gridX*self.gridY*32, bias=True)
        self.batch_norm_fc1 = nn.BatchNorm1d(num_features=self.gridX*self.gridY*32)
        self.dropout2 = nn.Dropout(0.5) # Plain Dropout, because the input here is a flattened [N X features] tensor and not a feature map
        self.fc2 = nn.Linear(self.gridX*self.gridY*32, self.gridX*self.gridY*self.grids_values, bias=True)


    def forward(self,x):
        x = self.cnnlayer1(x)
        x = self.cnnlayer2(x)
        x = self.cnnlayer3(x)
        x = self.cnnlayer4(x)

        x = self.cnnlayer5(x)
        x = self.cnnlayer6(x)
        x = self.cnnlayer7(x)

        x = x.view(-1,self.gridX*self.gridY*64)
        x = self.batch_norm_fc1(F.relu(self.fc1(x)))
        x = self.fc2(self.dropout2(x))
        x = torch.sigmoid(x) # F.sigmoid is deprecated in favour of torch.sigmoid

        return x
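
The layer sizes can be checked with simple arithmetic. The sketch below is a hedged sanity check rather than part of the model: the helper name `trace_shapes` is hypothetical, and the default of 19 values per grid cell assumes 2 bounding boxes and 9 classes (5 * 2 + 9), as configured later in the main function.

```python
def trace_shapes(input_size=270, pools=(2, 3, 3), grid_values=19):
    # 1x1 convolutions with stride 1 keep the spatial size; only pooling shrinks it
    size = input_size
    for kernel in pools:
        # Max pooling with kernel_size == stride and no padding floors the division
        size = size // kernel
    flattened = size * size * 64        # input features of fc1
    hidden = size * size * 32           # output features of fc1
    output = size * size * grid_values  # output features of fc2
    return size, flattened, hidden, output

print(trace_shapes())
```

A [270 X 270] input therefore ends up as a [15 X 15] feature map, which is why the grid is 15 X 15.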


After prediction, we will get many bounding boxes for a single class. To choose among them, we need a measure of how closely each predicted bounding box matches the ground truth bounding box. This measure is called Intersection over Union (IOU).


```
IOU = (area of intersection) / (area of bounding box1 + area of bounding box2 - area of intersection) 
```


In [0]:
def calculate_IOU(b1X, b1Y, b2X, b2Y, b3X, b3Y, b4X, b4Y):

  # b1X, b1Y, b2X, b2Y corresponds to topleft and bottom right coordinates of bounding box1 
  # b3X, b3Y, b4X, b4Y corresponds to topleft and bottom right coordinates of bounding box2
  xA = max(b1X,b3X)
  yA = max(b1Y,b3Y)
  xB = min(b2X,b4X)
  yB = min(b2Y,b4Y)

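  # Clamp each side at zero so that boxes which do not overlap give an intersection of 0
  area_intersection = max(0, xB-xA+1) * max(0, yB-yA+1)
  area_of_boundingbox1 = (b2X-b1X+1) * (b2Y-b1Y+1)
  area_of_boundingbox2 = (b4X-b3X+1) * (b4Y-b3Y+1) # Height of box2 is b4Y-b3Y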

  iou = area_intersection/(area_of_boundingbox1 + area_of_boundingbox2 - area_intersection)
  return iou
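
As a quick sanity check of the formula, here is a small worked example on made-up boxes. It is a sketch with one stated difference from `calculate_IOU`: coordinates are treated as continuous, so the `+1` pixel convention is omitted. The name `iou_continuous` is hypothetical.

```python
def iou_continuous(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with a top-left origin
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

print(iou_continuous((0, 0, 10, 10), (5, 5, 15, 15)))    # overlap 25, union 175
print(iou_continuous((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
```

The first pair overlaps in a 5 X 5 square, so the IOU is 25 / (100 + 100 - 25); the second pair does not overlap and yields 0.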


After calculating the IOU of all the bounding boxes, we need to return the bounding box whose IOU is the highest.


---
Remember that the coordinates of the bounding box are scaled, i.e. x, y are offsets with respect to the grid cell, and h, w are scaled between 0 and 1 with respect to the image height and width


In [0]:
def find_highest_IOU(predicted_grid_output, ground_truth_grid_output, bounding_boxes, grid_cell, grid_offset, grids):
  max_iou = 0
  max_iou_index = 0
  
  # Logic to get the top-left coordinates of grid cell
  x_offset = int(grid_cell / grid_offset)
  y_offset = int(grid_cell % grid_offset)
  grid_x_offset = (x_offset*grid_offset) 
  grid_y_offset = (y_offset*grid_offset)   
  
  for number_of_bounding_box in range(0,bounding_boxes):

    # We will first calculate the center,height,width coordinates and then find topleft and bottomright coordinates of the bounding box
    predicted_center_x = (predicted_grid_output[(number_of_bounding_box*5) + 1].item()*grid_offset) + grid_x_offset
    predicted_center_y = (predicted_grid_output[(number_of_bounding_box*5) + 2].item()*grid_offset) + grid_y_offset
    predicted_height = predicted_grid_output[(number_of_bounding_box*5) + 3].item()*grids
    predicted_width= predicted_grid_output[(number_of_bounding_box*5) + 4].item()*grids

    predicted_topLeftX = predicted_center_x - predicted_width/2
    predicted_topLeftY = predicted_center_y - predicted_height/2
    predicted_bottomRightX = predicted_center_x + predicted_width/2
    predicted_bottomRightY = predicted_center_y + predicted_height/2

    ground_truth_center_x = ground_truth_grid_output[1].item()
    ground_truth_center_y = ground_truth_grid_output[2].item()
    ground_truth_height = ground_truth_grid_output[3].item()
    ground_truth_width = ground_truth_grid_output[4].item()

    ground_truth_topLeftX = ground_truth_center_x - ground_truth_width/2
    ground_truth_topLeftY = ground_truth_center_y - ground_truth_height/2
    ground_truth_bottomRightX = ground_truth_center_x + ground_truth_width/2
    ground_truth_bottomRightY = ground_truth_center_y + ground_truth_height/2

    iou = calculate_IOU(predicted_topLeftX,predicted_topLeftY,predicted_bottomRightX,predicted_bottomRightY,
                        ground_truth_topLeftX,ground_truth_topLeftY,ground_truth_bottomRightX,ground_truth_bottomRightY)

    if(iou > max_iou):
      max_iou = iou
      max_iou_index = number_of_bounding_box

  return max_iou_index
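
The decoding step used above, from grid-relative predictions to absolute corner coordinates, can be sketched in isolation. The helper name `decode_prediction` and the sample values are hypothetical; the defaults assume the 15 X 15 grid on the [225 X 225] scale, and the cell index is split the same way as in `find_highest_IOU`.

```python
def decode_prediction(grid_cell, x, y, h, w, cells_per_side=15, cell_size=15, image_size=225):
    # Split the flat cell index, following the convention used in find_highest_IOU
    x_index = grid_cell // cells_per_side
    y_index = grid_cell % cells_per_side
    # x, y are offsets inside the cell; h, w are fractions of the image size
    center_x = x * cell_size + x_index * cell_size
    center_y = y * cell_size + y_index * cell_size
    height = h * image_size
    width = w * image_size
    # Convert center/size to top-left and bottom-right corners
    return (center_x - width / 2, center_y - height / 2,
            center_x + width / 2, center_y + height / 2)

print(decode_prediction(16, 0.5, 0.5, 0.2, 0.4))
```

Corners may fall outside the [0, 225] range, as in this example; the sketch does not clip them.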


Now, we will calculate the loss function. It comprises three losses :

*   Classification Loss : if object is detected, the mean squared error loss of class probabilities
*   Localization Loss : if object is detected, the mean squared error loss of coordinates of bounding box
*   Confidence Loss : the mean squared error loss of box confidence, when object is detected and when it is not

In the end, we weight the localization loss by lambda_coord and the no-object confidence loss by lambda_noobject, which compensates for the imbalance between grid cells with and without objects and reduces the effect of background noise
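
Written out for one image, the three losses and the two weights combine roughly as follows. This is a sketch in notation chosen here, not taken from the notebook: $i$ runs over grid cells, $c$ over classes, $j$ over the $B$ boxes of a cell, hatted symbols are ground-truth (target) values, and in cells containing an object the unhatted $x, y, h, w, C$ belong to the predicted box with the highest IOU. The result is averaged over the batch.

```latex
\begin{aligned}
L ={}& \sum_{i} \mathbb{1}_{i}^{\text{obj}} \sum_{c} \bigl(p_i(c) - \hat{p}_i(c)\bigr)^2 \\
&+ \lambda_{\text{coord}} \sum_{i} \mathbb{1}_{i}^{\text{obj}} \bigl[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2\bigr] \\
&+ \lambda_{\text{coord}} \sum_{i} \mathbb{1}_{i}^{\text{obj}} \bigl[(\sqrt{h_i} - \sqrt{\hat{h}_i})^2 + (\sqrt{w_i} - \sqrt{\hat{w}_i})^2\bigr] \\
&+ \sum_{i} \mathbb{1}_{i}^{\text{obj}} \bigl(C_i - \hat{C}_i\bigr)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i} \mathbb{1}_{i}^{\text{noobj}} \sum_{j=1}^{B} C_{ij}^2
\end{aligned}
```

The square roots on height and width make an error of a given size count more for small boxes than for large ones.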







In [0]:
def yolo_loss(batched_output, batched_label, grids, bounding_boxes, classes, lambda_coord, lambda_noobject):
  total_loss = 0
  grid_offset = np.sqrt(grids) # The image is divided into 15 X 15 grid cells; on the [225 X 225] label scale each cell is also 15 units wide

  for batch_number in range(0,batched_output.size()[0]):
    classification_loss = 0
    localization_loss_centerpoint = 0
    localization_loss_aspect_ratio = 0
    confidence_loss_object = 0
    confidence_loss_noobject = 0

    for grid_cell in range(0,batched_output.size()[1]):
      
      predicted_grid_output = batched_output[batch_number,grid_cell,:]

      # Logic to get the center coordinates of grid cell
      x_offset = int(grid_cell / grid_offset)
      y_offset = int(grid_cell % grid_offset)
      grid_cell_center_x = (x_offset*grid_offset) + (grid_offset/2)
      grid_cell_center_y = (y_offset*grid_offset) + (grid_offset/2)

      object_present = 0
      ground_truth_grid_output = torch.Tensor()
       
      for index in range(0,batched_label.size()[1]):
        ground_truth_grid_output = batched_label[batch_number,index,:]
        
        if (ground_truth_grid_output.sum() == 0):
          break
        
        ground_truth_center_x = ground_truth_grid_output[1].item()
        ground_truth_center_y = ground_truth_grid_output[2].item()
        ground_truth_height = ground_truth_grid_output[3].item()
        ground_truth_width = ground_truth_grid_output[4].item()

        # Finding whether grid detects an object or not
        if(ground_truth_center_x <= (grid_cell_center_x+(grid_offset/2)) and ground_truth_center_x >= (grid_cell_center_x-(grid_offset/2))
            and ground_truth_center_y <= (grid_cell_center_y+(grid_offset/2)) and ground_truth_center_y >= (grid_cell_center_y-(grid_offset/2))):
          object_present = 1
          break
            
      # Calculating classification loss
      # Predictions are kept as tensors (no .item()) so that gradients can flow back to the model
      if(object_present == 1):

        partial_classification_loss = 0
        for object_class in range(0,classes):
          partial_classification_loss += (predicted_grid_output[5*bounding_boxes+object_class] - ground_truth_grid_output[5+object_class]) ** 2 
      
        classification_loss += partial_classification_loss

        # Calculating which bounding box has highest IOU with ground truth bounding box
        highest_iou_bbox_index = find_highest_IOU(predicted_grid_output, ground_truth_grid_output, bounding_boxes, grid_cell, grid_offset, grids)

        # Calculating localization loss of center points and aspect ratio
        grid_cell_topleftX  = grid_cell_center_x-(grid_offset/2)
        grid_cell_topleftY  = grid_cell_center_y-(grid_offset/2)

        predicted_center_x = predicted_grid_output[(highest_iou_bbox_index*5)+1]*grid_offset + grid_cell_topleftX
        predicted_center_y = predicted_grid_output[(highest_iou_bbox_index*5)+2]*grid_offset + grid_cell_topleftY
        predicted_height = predicted_grid_output[(highest_iou_bbox_index*5)+3]*grids
        predicted_width = predicted_grid_output[(highest_iou_bbox_index*5)+4]*grids
     
        localization_loss_centerpoint += (predicted_center_x-ground_truth_grid_output[1])**2 + (predicted_center_y-ground_truth_grid_output[2])**2
        localization_loss_aspect_ratio += (torch.sqrt(predicted_height)-torch.sqrt(ground_truth_grid_output[3]))**2 + (torch.sqrt(predicted_width)-torch.sqrt(ground_truth_grid_output[4]))**2

        # Calculating Confidence loss, if object detected
        # The target confidence is 1; column 0 of the label holds the class id (1-9), not a confidence
        confidence_loss_object += (predicted_grid_output[highest_iou_bbox_index*5]-1.0)**2
              
      # Calculating Confidence loss, if object not detected
      else:
        for number_of_bounding_box in range(0,bounding_boxes):
          confidence_loss_noobject += predicted_grid_output[number_of_bounding_box*5]**2

    total_loss += classification_loss + lambda_coord*localization_loss_centerpoint + lambda_coord*localization_loss_aspect_ratio + confidence_loss_object + lambda_noobject*confidence_loss_noobject 
  
  # Return the loss as a tensor that is still attached to the computation graph, so that loss.backward() reaches the model parameters
  return total_loss/batched_output.size()[0]

Our convolutional neural network, dataset, and loss function are now defined. Now, we will train our model

In [0]:
def train(model, optimizer, training_dataloader, batch_size, grids, grids_values, bounding_boxes, classes, lambda_coord, lambda_noobject):
  
  # model.train() is a built-in PyTorch method, and it is important to call it before training,
  # as layers such as dropout and batch norm work differently in training mode than in evaluation mode
  model.train()
  batch_loss = 0
  
  for batch_index, batched_sample in enumerate(training_dataloader):
    
    batched_image = batched_sample['image']
    batched_label = batched_sample['label']
    batched_output = model(batched_image)
    batched_output = batched_output.view(batch_size, grids, grids_values) # Reshape the output to [N X GRIDS (15 X 15 = 225) X (5 * B + C)]

    loss = yolo_loss(batched_output, batched_label, grids, bounding_boxes, classes, lambda_coord, lambda_noobject)
    print('Loss for batch_index : {} is {}'.format(batch_index,loss.item()))
    batch_loss = batch_loss + loss.item()
    
    optimizer.zero_grad()    
    loss.backward()
    optimizer.step()

  return batch_loss/len(training_dataloader)

We have trained our model; now we will validate it. Validation is required to tune our model parameters and hyperparameters.

In [0]:
def validation(model, testing_dataloader, batch_size, object_detected_threshold, box_confidence_threshold, bounding_boxes, grids,grids_values):

  # model.eval() is a built-in PyTorch method, and it is important to call it before validation,
  # as layers such as dropout and batch norm work differently in evaluation mode than in training mode
  model.eval()
  grid_offset = np.sqrt(grids) # The image is divided into 15 X 15 grid cells; on the [225 X 225] label scale each cell is also 15 units wide

  with torch.no_grad():

    for batch_index, batched_validation_sample in enumerate(testing_dataloader):
      
      print('Validating for batch index : {} .....'.format(batch_index))

      validation_image = batched_validation_sample['image']
      validation_label = batched_validation_sample['label']
      output = model(validation_image)
      output = output.view(batch_size, grids,grids_values)

      batched_images = validation_image.permute(0,2,3,1) # Converting the images back from [N X C X H X W] to [N X H X W X C]

      for batch_number in range(0,batch_size):
        # Index into a separate variable so that the whole batch is not overwritten by its first image
        original_image = batched_images[batch_number,:,:,:]
        original_image = original_image.numpy()
        original_image = transform.resize(original_image,(375,1242),preserve_range=True)
        original_image = np.ascontiguousarray(original_image, dtype=np.uint8)
        
        for grid_cell in range(0,grids):

          grid_output = output[batch_number,grid_cell,:]
          
          # # Logic to get the center, top-left and bottom-right coordinates of grid cell
          x_offset = int(grid_cell / grid_offset)
          y_offset = int(grid_cell % grid_offset)
        
          grid_cell_center_x = (x_offset*grid_offset) + (grid_offset/2)
          grid_cell_center_y = (y_offset*grid_offset) + (grid_offset/2)
        
          grid_cell_topleftX  = grid_cell_center_x-(grid_offset/2)
          grid_cell_topleftY  = grid_cell_center_y-(grid_offset/2)
        
          for number_of_bounding_box in range(0,bounding_boxes):
          
            if(grid_output[number_of_bounding_box*5] >= object_detected_threshold):
            
              class_probabilities = grid_output[5*bounding_boxes:]
              max_class_probabilites, index = torch.max(class_probabilities,0)

              if(max_class_probabilites.item() >= box_confidence_threshold):
                accuracy = max_class_probabilites.item()*100
                
                center_x = grid_output[(number_of_bounding_box*5)+1]*grid_offset + grid_cell_topleftX
                center_y = grid_output[(number_of_bounding_box*5)+2]*grid_offset + grid_cell_topleftY
                height = grid_output[(number_of_bounding_box*5)+3]*grids
                width = grid_output[(number_of_bounding_box*5)+4]*grids

                # cv2 drawing functions expect plain Python ints, so the tensor coordinates are converted with int()
                top_left_x = int(((center_x - (width/2))/grids)*original_image.shape[1])
                top_left_y = int(((center_y - (height/2))/grids)*original_image.shape[0])
                bottom_right_x = int(((center_x + (width/2))/grids)*original_image.shape[1])
                bottom_right_y = int(((center_y + (height/2))/grids)*original_image.shape[0])

                # torch.max returns a 0-based index into the class probabilities, which are stored in the order of the class encoding
                class_names = ['Car', 'Cyclist', 'DontCare', 'Misc', 'Pedestrian', 'Person_sitting', 'Tram', 'Truck', 'Van']
                name = class_names[index.item()]

                # Append the class probability (in percent) to the label of every class
                name = name + '     (' + str(accuracy) + ')'
                
                original_image = cv2.rectangle(original_image,(top_left_x,top_left_y),(bottom_right_x,bottom_right_y),(0,0,255),3)
                cv2.putText(original_image, name, (top_left_x, top_left_y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255,0,0), 2)

        save_path = r'/content/gdrive/My Drive/kitti_single_mini/validation/results/image' + str(batch_index) + '.jpeg'
        cv2.imwrite(save_path,original_image)
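
The rescaling from the [225 X 225] output scale back to the original [1242 X 375] image can be checked on its own. The sketch below uses a hypothetical helper name, `box_to_pixels`, and a made-up box; it mirrors the arithmetic above and truncates to integers so the corners could be handed to a drawing function.

```python
def box_to_pixels(center_x, center_y, height, width, out_size=225, img_w=1242, img_h=375):
    # Normalise by the output scale, then stretch to the original image size;
    # int() truncates so the corners are whole pixel positions
    left = int((center_x - width / 2) / out_size * img_w)
    top = int((center_y - height / 2) / out_size * img_h)
    right = int((center_x + width / 2) / out_size * img_w)
    bottom = int((center_y + height / 2) / out_size * img_h)
    return left, top, right, bottom

print(box_to_pixels(112.5, 112.5, 112.5, 112.5))
```

For this centered box the corners land at one quarter and three quarters of the image width and height.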

We will finish our program by writing a main function

In [0]:
def main():
    print('..........Main Function starts..........')
    
    # Training Settings
    lr = 0.000001 # Hyper parameters
    betas = (0.9,0.999) # Hyper parameters
    epochs = 5 # Hyper parameters
    training_batch_size = 4 # Hyper parameters
    
    validation_batch_size = 1
    object_detected_threshold = 0.5 # Model Parameters
    box_confidence_threshold = 0.7 # Model Parameters
    grids = 15*15 # Model Parameters
    bounding_boxes = 2 # Model Parameters
    classes = 9 # Model Parameters
    grids_values = (5*bounding_boxes) + classes # Model Parameters
    lambda_coord = 2 # Model Parameters
    lambda_noobject = 0.5 # Model Parameters
    save_model = False
    seed = 1
    logging = True
    steps_completed = 0
    last_epoch_loss = 0
    number_of_training_data = 0
    training_loss = 0

    torch.manual_seed(seed)
    

    # Comet ML Settings for visualizing loss function and hyper parameters
    if(logging):
      experiment = Experiment(api_key=os.environ.get("COMET_API_KEY"), project_name="object-detection", workspace="jayfartiyal") # Read the API key from the environment instead of hard-coding the secret
      hyper_params = {"lr": lr, "epochs": epochs, "batch_size":training_batch_size} 
      experiment.log_parameters(hyper_params)

    # Images and labels Directory
    labels_dir = r'/content/gdrive/My Drive/kitti_single_mini/training/label_2'
    images_dir = r'/content/gdrive/My Drive/kitti_single_mini/training/image_2'
    validation_images_dir = r'/content/gdrive/My Drive/kitti_single_mini/validation/image_2'
    validation_labels_dir = r'/content/gdrive/My Drive/kitti_single_mini/validation/label_2'
    saving_model_path = r'/content/gdrive/My Drive/kitti_single_mini/validation/layer9class2.cnn.pt'


    if(images_dir.find('micro') != -1):
      number_of_training_data = 100
    elif(images_dir.find('mini') != -1):
      number_of_training_data = 250
    elif(images_dir.find('small') != -1):
      number_of_training_data = 500
    elif(images_dir.find('medium') != -1):
      number_of_training_data = 1000


    if(save_model == False):
      
      # Initializing model and optimizer
      gridX = int(math.sqrt(grids))
      gridY = gridX
      model = Net(gridX,gridY,grids_values)
      optimizer = optim.Adam(model.parameters(), lr=lr, betas=betas)
      print('..........Convolutional Neural Network model and optimizer has been initialized..........')

      # Retrieving model and optimizer states if present 
      if(os.path.isfile(saving_model_path)):

        print('.....Previous Model state found.....')
        
        checkpoint = torch.load(saving_model_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        steps_completed = checkpoint['steps_completed']
        last_epoch_loss = checkpoint['last_epoch_loss']
        
        print('.....Previous Model and optimizer states has been retrieved.....')
        print('{} steps completed'.format(steps_completed))
        print('Last epoch cycle loss : {}'.format(last_epoch_loss))

      else:
        print('.....Previous Model state not found !!!.....')

      save_model = True # After the finish of the program, it should save the model
    
    # Creating transform to apply on training dataset
    training_dataset_transform = transforms.Compose([
                                         BatchPadding(100),
                                         Resize(270),
                                         MeanSubtraction(),
                                         ToTensor()])
    
    # Creating transform to apply on validation dataset
    validation_dataset_transform = transforms.Compose([
                                         BatchPadding(100),
                                         Resize(270),
                                         ToTensor()])
    
    # Creating training and validation dataset instance
    training_dataset = KittiDataset(labels_dir=labels_dir, images_dir=images_dir,number_of_classes=classes, transform=training_dataset_transform)
    validation_dataset = KittiDataset(labels_dir=validation_labels_dir, images_dir=validation_images_dir, number_of_classes=classes, transform=validation_dataset_transform)
   
    training_dataloader = DataLoader(dataset=training_dataset, batch_size=training_batch_size, shuffle=True, drop_last=True)
    validation_dataloader = DataLoader(dataset=validation_dataset, batch_size=validation_batch_size)
    
    print('..........Training and Validation Dataloader initialized..........')
    
    print('.....Training is starting.....')
    
    for epoch in range(0,epochs):
      print('Training dataset for epoch : {}'.format(epoch))
      
      training_loss = train(model, optimizer, training_dataloader, training_batch_size, grids, grids_values, bounding_boxes, classes, lambda_coord, lambda_noobject)
      
      print('Loss for epoch :{} is {}'.format(epoch,training_loss))

      # Logging training loss for hyper parameter tuning
      if(logging):
        experiment.log_metric("Training Loss", training_loss )
      
    print('.....Validation is starting.....')
    validation(model, validation_dataloader, validation_batch_size, object_detected_threshold, box_confidence_threshold, bounding_boxes, grids, grids_values)
    #experiment.log_metric("Validation accuracy", accuracy.item())
   
    steps_completed += int(((number_of_training_data)/training_batch_size)*epochs)
    if (save_model):
        torch.save({
          'steps_completed': steps_completed,
          'model_state_dict': model.state_dict(),
          'optimizer_state_dict': optimizer.state_dict(),
          'last_epoch_loss' : training_loss}, saving_model_path)
          

    print('..........Model is now trained over : {} steps..........'.format(steps_completed))        
    print('..........Last epoch cycle loss : {}'.format(last_epoch_loss))
    print('..........Current cycle last epoch loss : {}'.format(training_loss))

if __name__ == '__main__':
  main()

