<a href="https://colab.research.google.com/github/CN-Nandhini/OneApi-FakeReviewDetection/blob/main/nandhini_ML_Research_Engineer_Take_Home_Assessment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Design and Train an object detector to detect objects

You have to design and implement a Training Pipeline that can train, test and visualize the model using the dataset provided.

## Assignment Protocols

- We expect it to take ~3 hours, with an extra 15 min for clear loom explanation(s)
  - The assessment is timeboxed at 4 hours total in a single block. So please plan accordingly
- You need to use Google Collaboratory to run and edit this notebook
- You can only use Python as a programming Language
- You need to use Pytorch, (and you cannot use PyTorch Lightning)
- You cannot take help from any other person
- You can use Google to search for references
- You can not search on google for design-related things, like what should be loss function, or what should be model architecture.
  - But you can use pre-trained backbones from PyTorch
- Record a 5-10 mins of code walkthrough of the work you have done. You can use Loom Platform (https://www.loom.com) to record the video.
  - Design Decisions
    - Model Design which layers and activation functions you used and why
    - Loss function, which loss functions you used and why
    - What metrics in test function you would update and why?
  - Any optimizations you have made to the codebase
  - How you implemented resume functionality, what were the things you thought would be needed to resume training from exact same point
  - Explain what parts of the assessment are completed and what is missing?
  - Make sure to submit the screen recording link in the submission after you are done recording
  - Please note that the free plan on Loom only allows for videos up to 5 minutes in length. As such, you may need to record two separate 5-minute videos.
- [NO SUBMISSION WILL BE ACCEPTED WITHOUT]
  - Trained best model weights
  - Visualize Function in the Notebook
  - Code Walk-through video

## Task Details
Design a Training Pipeline to train a object detector with following specs or assumptions:
- Implement & Design Model
  - You can use any backbone
    - Either from PyTorch (torhvision) or any resource online
    - But you need to design head your self (head means how you will use features of the back bone and get the desired outputs)
  - Model needs to detect one object in each image
  - Model should output following for each image passed as input:
    - Whether we have an object or not
    - Where is the object?
      - The bounding box output format should be xmin, ymin, xmax, ymax
      - It is not necessary the model is trained to output exactly this format but the visualize function which shows output should output in this format
    - Either the object is a cat or dog?
    - And which specie the object belongs to? There are in total 9 species:
      - Cat [3 species]:
        - Abyssinian
        - Birman
        - Persian
      - Dog [6 species]
        - american_bulldog
        - american_pit_bull_terrier
        - basset_hound
        - beagle
        - chihuahua
        - pomeranian
- Implement Custom Dataloader
  - This is obvious as dataset is in a unique format any predifined dataloader wont work
  - Follow best practices of writing custom dataloaders
  - Details of the format of the dataset are defined in the Dataset Details section below
  - Add needed pre-processing that you think would help train a better model or would help as we are using pre-trained weights as starting point
  - Add augmentations that you think would help train a better model
- Implement Loss Function
  - Design and implement a loss function that can handle all of the outputs we have
  - You can use pytorch built-in loss functions
  - There are many scenarios which you need to handle, which one can understand from the dataset details and the model design
- Update Resume Training Functionality using the best weights
  - Current script does not have save best weights functionality
  - The code should be able to resume training from exactly same point from where the training was stopped if model weights file is passed
  - Keep in mind you can not resume training from same point by just loading weights of the model
- Implement a visualize function [Most important, without this no submission will be accepted]
  - The input of the function should be path of a folder with images and the weight file
    - Also the output folder path to save outputs
  - This function should return a dictionary of dictionaries with following details for each image:
    - {
        "has_object": True,
        "cat_or_dog": "cat",
        "specie": "persian",
        "xmin": 10,
        "ymin": 10,
        "xmax": 10,
        "ymax": 10
    }
  - And in case there is no object it should have 0 for bbox values, "NA" for "cat_or_dog" and "specie", and False for "has_object".
  - Values of the returned dictionary should be like explained above and keys should be image names including the extension ".jpg" or ".jpeg"
  - Should save output image with bounding box drawn on it, with same name input image but place in the output folder
- Try to train the best model
- Test function is already implemented but needs updates.
  - [For classification outputs] Kindly choose best metrics (from the torch metrics library) according to the problem you are solving, and update the code to use that metrics
    - You might also have to update the post-process function for heads according to model and loss function design
  - [For Bounding Box output] If you have time you can update code to add metrics for BBOX output (this will be a plus), otherwise explain what metrics you would have used for BBOX and why?

## Dataset Details
The dataset has in total 1041 images. Each image has a single object which is either a cat or a dog.
- There are multiple species for both cat and dog.
- The number of images falling in each specie is as follows:
  - basset_hound: 93
  - Birman: 93
  - pomeranian: 93
  - american_pit_bull_terrier: 93
  - american_bulldog: 93
  - Abyssinian: 92
  - beagle: 93
  - Persian: 93
  - chihuahua: 93
  - empty: 142
- The dataset has two folders:
  - images
    - Inside images folder we have 986 images in .jpg folder
  - labels
    - Inside labels folder we have 899 .xml files each file with details of image labels
    - For any image that does not have a cat or dog, there is no corresponding xml file

## Deliverable
- Updated Colab Based Jupyter Notebook:
  - With all the required functionality Implemented
  - Which one can train the model without any errors
  - One should achieve same metrics (Almost same metrics) if I run training using this collab notebook
    - Set default values for everything accordingly in the notebook
  - During evaluation we will just run the notebook and use the best weights the notebook saves automatically
- Best weights you have trained
  - We will Evaluate your weights against hold-out test we have and compare results
  - We will use visualize function to generate outputs for each image
  - Upload weights in an easily downloadable location like, Dropbox, Google Drive, Github, etc
- A video code-walk through explaining your design decisions including but not limited to:
  - Design Decisions
    - Model Design which layers and activation functions you used and why
    - Loss function, which loss functions you used and why
    - What metrics in test function you would update and why?
  - Any optimizations you have made to the codebase
  - How you implemented resume functionality, what were the things you thought would be needed to resume training from exact same point
  - Explain what parts of the assessment are completed and what is missing?

## Evaluation Criteria
 - Design Decisions
 - Completeness: Did you include all features?
 - Correctness: Does the solution (all deliverables) work in sensible, thought-out ways?
 - Maintainability: Is the code written in a clean, maintainable way?
 - Testing: Is the solution adequately tested?
 - Documentation: Is the codebase well-documented and has proper steps to run any of the deliverables?

## Extra Points
- Any Updates in the notebook (Bugs/Implementation Mistakes etc)

## How to submit
- Please upload the Notebook for this project to GitHub, and post a link to your repository below [repo link box, on the left of submit button].
  - Create a new GitHub repository from scratch
  - Add the final Colab/Jupyter notebook to the repository
- Please upload video and your final best weights on Google Drive or any other platform, and paste the link to the folder with both video and model in the text box just above the submit button.
- Please paste the commit Id of the latest commit of your Github Repo, which should not be later than 4 hours of time when the repo was created.
  - **Please note the submission without the commit id will not be considered.**

# Install Required Modules

In [127]:
! pip install bs4 lxml kaggle torchmetrics



# Download Dataset from Kaggle

In [10]:
import os
os.environ['KAGGLE_USERNAME'] = 'bilalyousaf0014'
os.environ['KAGGLE_KEY'] = '11031bc21c5e3ec23585dbe17dc4267d'

In [11]:
!kaggle datasets download -d bilalyousaf0014/ml-engineer-assessment-dataset

Downloading ml-engineer-assessment-dataset.zip to /content
 97% 76.0M/78.6M [00:03<00:00, 27.3MB/s]
100% 78.6M/78.6M [00:03<00:00, 22.3MB/s]


In [12]:
! unzip /content/ml-engineer-assessment-dataset.zip

Archive:  /content/ml-engineer-assessment-dataset.zip
  inflating: assessment_dataset/images/00001.jpeg  
  inflating: assessment_dataset/images/00008.jpeg  
  inflating: assessment_dataset/images/00017.jpeg  
  inflating: assessment_dataset/images/00022.jpeg  
  inflating: assessment_dataset/images/00048.jpeg  
  inflating: assessment_dataset/images/00055.jpeg  
  inflating: assessment_dataset/images/001.jpeg  
  inflating: assessment_dataset/images/1.jpeg  
  inflating: assessment_dataset/images/1001524.jpeg  
  inflating: assessment_dataset/images/1005343.jpeg  
  inflating: assessment_dataset/images/1049854.jpeg  
  inflating: assessment_dataset/images/1072860.jpeg  
  inflating: assessment_dataset/images/1120419.jpeg  
  inflating: assessment_dataset/images/1146885.jpeg  
  inflating: assessment_dataset/images/2.jpeg  
  inflating: assessment_dataset/images/3.jpeg  
  inflating: assessment_dataset/images/4.jpeg  
  inflating: assessment_dataset/images/5.jpeg  
  inflating: assessm

# MODEL IMPLEMENTATION:

In [128]:
import os
import torch
import torch.nn as nn
import numpy as np

In [129]:
pip install efficientnet_pytorch



In [130]:
import random
from efficientnet_pytorch import EfficientNet

manualSeed = random.randint(1, 10000)
random.seed(manualSeed)
torch.manual_seed(manualSeed)
torch.cuda.manual_seed_all(manualSeed)

# CUSTOM DATALOADER IMPLEMENTATION

In [131]:
train_list0 = np.load('/content/assessment_dataset/train_list.npy', allow_pickle=True).tolist()
val_list0 = np.load('/content/assessment_dataset/val_list.npy', allow_pickle=True).tolist()

In [132]:

train_list1= list(train_list0)
val_list1 = list(val_list0)
print (train_list1[0])
print (len(train_list1))
print (len(val_list1))


Birman_156
832
208


In [133]:
def removefromList(filelist):
  print (len(filelist))
  for filename in filelist:
    path = os.path.join('/content/assessment_dataset', "images/")
    extension = [".jpeg", ".jpg", ".png"]
    bFileExist = False
    for j in range(len(extension)):
      imagename = path + filename + extension[j]
      #print (imagename)
      if (os.path.exists(imagename)):
        bFileExist = True
        break;
    if bFileExist==False:
        filelist.remove(filename)
        print(f"{imagename} does not exist!")
  print (len(filelist))
  return filelist

In [134]:
def removefromListxmlnotexist(filelist):
  print (len(filelist))
  for filename in filelist:
    path = os.path.join('/content/assessment_dataset', "labels/")
    bFileExist = False
    xmlname = path + filename + ".xml"
    print (xmlname)
    if (os.path.exists(xmlname)):
      bFileExist = True
    else:
      filelist.remove(filename)
      print(f"{xmlname} does not exist!")
  print (len(filelist))
  return filelist

In [139]:
train_list2 = removefromList(train_list1)
val_list2 = removefromList(val_list1)
train_list = removefromListxmlnotexist (train_list2)
val_list = removefromListxmlnotexist (val_list2)

print (len(train_list))
print (len(val_list))
train_list2 = removefromList(train_list1)
val_list2 = removefromList(val_list1)
train_list = removefromListxmlnotexist (train_list2)
val_list = removefromListxmlnotexist (val_list2)
print (len(train_list))
print (len(val_list))

679
679
166
166
679
/content/assessment_dataset/labels/Birman_156.xml
/content/assessment_dataset/labels/Birman_181.xml
/content/assessment_dataset/labels/pomeranian_134.xml
/content/assessment_dataset/labels/basset_hound_175.xml
/content/assessment_dataset/labels/basset_hound_168.xml
/content/assessment_dataset/labels/beagle_146.xml
/content/assessment_dataset/labels/american_pit_bull_terrier_100.xml
/content/assessment_dataset/labels/american_pit_bull_terrier_118.xml
/content/assessment_dataset/labels/american_pit_bull_terrier_189.xml
/content/assessment_dataset/labels/Abyssinian_160.xml
/content/assessment_dataset/labels/american_pit_bull_terrier_107.xml
/content/assessment_dataset/labels/Abyssinian_169.xml
/content/assessment_dataset/labels/Abyssinian_195.xml
/content/assessment_dataset/labels/pomeranian_102.xml
/content/assessment_dataset/labels/Birman_18.xml
/content/assessment_dataset/labels/Birman_145.xml
/content/assessment_dataset/labels/basset_hound_111.xml
/content/assessme

In [140]:
from bs4 import BeautifulSoup

def read_xml_file(path):
    with open(path, 'r') as f:
        data = f.read()
    bs_data = BeautifulSoup(data, 'xml')
    return {
        "filename": bs_data.find("filename").text,
        "cat_or_dog": bs_data.find("name").text,
        "xmin": int(bs_data.find("xmin").text),
        "ymin": int(bs_data.find("ymin").text),
        "xmax": int(bs_data.find("xmax").text),
        "ymax": int(bs_data.find("ymax").text),
        "specie": "_".join(path.split(os.sep)[-1].split("_")[:-1])
    }

In [13]:
import albumentations as A



In [141]:
specie2id= {'basset_hound': 0,'Birman': 1,'pomeranian': 2,'american_pit_bull_terrier': 3,'american_bulldog': 4,'Abyssinian': 5,'beagle': 6,'Persian': 7,'chihuahua': 8,'empty':9}
catdog2id = {'cat': 0,'dog': 1, 'NA':2}

id2specie = {v: k for k, v in specie2id.items()}
id2catDog = {v: k for k, v in catdog2id.items()}

#Not **Used** Augmentation for bounding box

In [28]:
import cv2
class Augmenter(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample, flip_x=0.5):
        if np.random.rand() < flip_x:
            image, annots = sample['img'], sample['annot']
            image = image[:, ::-1, :]

            rows, cols, channels = image.shape

            x1 = annots[:, 0].copy()
            x2 = annots[:, 2].copy()

            x_tmp = x1.copy()

            annots[:, 0] = cols - x2
            annots[:, 2] = cols - x_tmp

            sample = {'img': image, 'annot': annots}

        return sample

class Resizer(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample, common_size=224):
        image, annots = sample['img'], sample['annot']
        height, width, _ = image.shape
        if height > width:
            scale = common_size / height
            resized_height = common_size
            resized_width = int(width * scale)
        else:
            scale = common_size / width
            resized_height = int(height * scale)
            resized_width = common_size

        image = cv2.resize(image, (resized_width, resized_height))

        new_image = np.zeros((common_size, common_size, 3))
        new_image[0:resized_height, 0:resized_width] = image
        annots[:, :4] *= scale

        return {'img': torch.from_numpy(new_image), 'annot': torch.from_numpy(annots), 'scale': scale}

class Normalizer(object):

    def __init__(self):
        self.mean = np.array([[[0.485, 0.456, 0.406]]])
        self.std = np.array([[[0.229, 0.224, 0.225]]])

    def __call__(self, sample):
        image, annots = sample['img'], sample['annot']

        return {'img': ((image.astype(np.float32) - self.mean) / self.std), 'annot': annots}

In [142]:
from PIL import Image
import os
from albumentations.pytorch.transforms import ToTensor


import torchvision.transforms as transforms


class SpeciesDataset():

  def __init__(self, dataset_path, images_list, imgtransform, train=False):
    super(SpeciesDataset, self).__init__()

    self.images_list = images_list
    self.dataset_path = dataset_path
    self.transform = imgtransform
    #image_label_data = read_xml_file("path_to_xml_file")
  def __len__(self):
    return len(self.images_list)

  def __getitem__(self, index):
    image = None
    label = None
    name = self.images_list[index]
    #print (name)

    labelfile = os.path.join(self.dataset_path, "labels", self.images_list[index]+".xml")

    image = None
    if os.path.exists(labelfile):
      label_data = read_xml_file(labelfile)
      file_name = label_data['filename']
      #print(file_name)
      img_name = os.path.join(self.dataset_path, "images", file_name)
      image = Image.open(img_name).convert('RGB')
      image = self.transform(image)
      #print('aftertransform', file_name)

      classid = 0 if label_data['cat_or_dog'] == 'cat' else 1
      bbox = [label_data['xmin'], label_data['ymin'], label_data['xmax'],
                label_data['ymax']]
      stype = specie2id[label_data['specie']]
      objExist=1
    else:
      print ("xmlnot found", labelfile)
      classid = 2
      bbox = [0, 0, 0, 0]
      stype = 9
      objExist =0

    # if (objExist == 1):
    #   print (image.shape)
    # else:
    #   print('none object', name)
    #return {'image': image,'objExist':objExist, 'bbox': bbox, 'category_id': classid}

    return image, {"bbox":bbox,"object":objExist, "cat_or_dog":classid, "specie":stype}

In [143]:
from torch.utils.data import DataLoader
import torchvision.transforms as transforms

root_path = "/content/assessment_dataset"
batch_size = 8
train_transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
val_transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

#train_transform = transforms.Compose([Normalizer(), Augmenter(), Resizer(), transforms.])
#val_transform = transforms.Compose([Normalizer(), Resizer()])

train_dataset = SpeciesDataset(root_path, train_list, train_transform, train=True)
training_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)

val_dataset = SpeciesDataset(root_path, val_list, val_transform, train=False)
val_loader = DataLoader(val_dataset, batch_size=1, shuffle=False, num_workers=0)


In [144]:
import torchvision
from torchvision import models
import torch.nn.functional as F

class resnetModel(nn.Module):
  def __init__(self):
    super(resnetModel, self).__init__()

    resnet = models.resnet34(pretrained=True)
    layers = list(resnet.children())[:8]
    self.features1 = nn.Sequential(*layers[:6])
    self.features2 = nn.Sequential(*layers[6:])
    self.objectavl = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 1), nn.Sigmoid())

    self.cat_or_dog = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 3))
    self.specie = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 10))
    self.bb = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 4))
  def forward(self, x):
    x = self.features1(x)
    x = self.features2(x)
    x = F.relu(x)
    x = nn.AdaptiveAvgPool2d((1,1))(x)
    x = x.view(x.shape[0], -1)
    return self.objectavl(x), self.cat_or_dog(x), self.specie(x), self.bb(x)

#not Used.. EfficientNet Tried

In [35]:
import torchvision
class Model(nn.Module):

  def __init__(self):
    super(Model, self).__init__()
    self.efficient = EfficientNet.from_pretrained('efficientnet-b0')
    self.efficient.eval()

    ### Initialize the required Layers
    self.have_object = nn.Sequential(
            nn.Conv2d(1280, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(output_size=1),
            nn.Flatten(),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )
    self.cat_or_dog = nn.Sequential(
            nn.Conv2d(1280, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(output_size=1),
            nn.Flatten(),
            nn.Linear(32, 3),  nn.LogSoftmax(dim=1)
        )
    self.specie = nn.Sequential(
            nn.Conv2d(1280, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(output_size=1),
            nn.Flatten(),
            nn.Linear(32, 10), nn.Dropout(0.1), nn.LogSoftmax(dim=1)
        )
    self.bbox = nn.Sequential(nn.Conv2d(1280, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(output_size=1),
            nn.Flatten(),
            nn.Linear(32, 4))
    ### Initialize the required Layers

  def forward(self, input):
    print ("forward1", input.shape)
    input = input.view(-1, input.size(-3), input.size(-2), input.size(-1))
    print ("forward2", input.shape)

    endpoints = self.efficient.extract_endpoints(input)
    layer1 = endpoints['reduction_6'] #([1, 1280, 7, 7])

    objectscore = self.have_object(layer1)
    catordog = self.cat_or_dog (layer1)
    stype = self.specie (layer1)
    bboxregr = self.bbox
      ### Write Forward Calls for the Model

    return {
          "bbox": bboxregr,
          "object": objectscore,
          "cat_or_dog": catordog,
          "specie": stype
      }


# TRAINING LOOP IMPLEMENTATION

In [145]:


#model = Model()
model = resnetModel()

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print (device, num_params)

optimizer = torch.optim.Adam(model.parameters(), lr = 0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, 0.99)



cpu 21298002


In [146]:
have_object_loss = nn.BCELoss()
specie_loss = nn.CrossEntropyLoss()
cat_or_dog_loss = nn.CrossEntropyLoss()
xmin_loss = nn.MSELoss()
ymin_loss = nn.MSELoss()
xmax_loss = nn.MSELoss()
ymax_loss = nn.MSELoss()

model.train()


resnetModel(
  (features1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats

In [154]:
def train_one_epoch(epoch_index):
      running_loss = 0.
      last_loss = 0.

      # Here, we use enumerate(training_loader) instead of
      # iter(training_loader) so that we can track the batch
      # index and do some intra-epoch reporting
      #last_loss = 0
      for i, data in enumerate(training_loader):

          # Every data instance is an input + label pair
          inputimage, labels  = data
          batch = inputimage.shape[0]
          #print ("batchsize", batch)
          inputimage = inputimage.to(device)
          # Make predictions for this batch
          #optimizer.zero_grad()
          out_object, out_catdog,out_stype, out_bb  = model(inputimage)
          y_class = F.one_hot(labels["cat_or_dog"],3).type(torch.FloatTensor).to(device)
          y_bb = torch.stack(labels["bbox"]).transpose(0, 1).type(torch.FloatTensor).to(device) #labels['bbox'].cuda
          y_stype = F.one_hot(labels["specie"],10).type(torch.FloatTensor).to(device)
          y_object = labels["object"].type(torch.FloatTensor).to(device)

          # Compute the loss and its gradients
          loss_have_object = have_object_loss(out_object.squeeze(), y_object)
          loss_specie = specie_loss(out_stype, y_stype)
          loss_cat_or_dog = cat_or_dog_loss(out_catdog, y_class)


          loss_xmin = xmin_loss(out_bb[0], y_bb[0])
          loss_ymin = ymin_loss(out_bb[1], y_bb[1])
          loss_xmax = xmax_loss(out_bb[2], y_bb[2])
          loss_ymax = ymax_loss(out_bb[3], y_bb[3])

          loss =  loss_have_object+loss_specie+loss_cat_or_dog+loss_xmin+loss_ymin+ loss_xmax+loss_ymax# Consolidate all individual losses
          optimizer.zero_grad()
          #stypecorr = out_stype.squeeze().eq(y_stype).sum().item()
          classcorracc = (torch.argmax(out_catdog, 1) == torch.argmax(y_class, 1)).float().mean()
          speciescorracc = (torch.argmax(out_stype, 1) == torch.argmax(y_stype, 1)).float().mean()
          objcorrr = out_object.squeeze().eq(torch.round(y_object)).sum().item()
          if (i == 0):
            print (loss_have_object.item(), loss_specie.item(), loss_cat_or_dog.item(), loss_xmin.item(), loss_ymin.item(), loss_xmax.item(), loss_ymax.item(),loss.item())
          loss.backward()
                # update weights
          optimizer.step()
                # print progress
          # Gather data and report
          running_loss += loss.item()
          if i % 10 == 0:
              last_loss = running_loss / 10 # loss per batch
              running_loss = 0.
      return last_loss
#train_one_epoch (2)

In [153]:
import copy
def Test(model):
    model.eval()
    total = 0
    sum_loss = 0
    speciescorrect = 0
    classcorrect = 0
    objectscore = 0
    for i, data in enumerate(val_loader):

          # Every data instance is an input + label pair
          inputimage, labels  = data
          batch = inputimage.shape[0]
          #print ("batchsize", batch)
          inputimage = inputimage.to(device)
          # Make predictions for this batch
          #optimizer.zero_grad()
          out_object, out_catdog,out_stype, out_bb  = model(inputimage)
          y_class = F.one_hot(labels["cat_or_dog"],3).type(torch.FloatTensor).to(device)
          y_bb = torch.stack(labels["bbox"]).transpose(0, 1).type(torch.FloatTensor).to(device) #labels['bbox'].cuda
          y_stype = F.one_hot(labels["specie"],10).type(torch.FloatTensor).to(device)
          y_object = labels["object"].type(torch.FloatTensor).to(device)

          # Compute the loss and its gradients
          loss_have_object = have_object_loss(out_object.squeeze(), y_object)
          loss_specie = specie_loss(out_stype, y_stype)
          loss_cat_or_dog = cat_or_dog_loss(out_catdog, y_class)


          loss_xmin = xmin_loss(out_bb[0], y_bb[0])
          loss_ymin = ymin_loss(out_bb[1], y_bb[1])
          loss_xmax = xmax_loss(out_bb[2], y_bb[2])
          loss_ymax = ymax_loss(out_bb[3], y_bb[3])

          loss =  loss_have_object+loss_specie+loss_cat_or_dog+loss_xmin+loss_ymin+ loss_xmax+loss_ymax# Consolidate all individual losses
          sum_loss += loss.item()
          classcorrect += (torch.argmax(out_catdog, 1) == torch.argmax(y_class, 1)).float().mean()
          speciescorrect += (torch.argmax(out_stype, 1) == torch.argmax(y_stype, 1)).float().mean()
          objectscore += out_object.squeeze().eq(torch.round(y_object)).sum().item()
          #lossitem = float(loss.item())
          #history.append(lossitem)

    lossitem = sum_loss/len(val_loader)
    if lossitem < bestloss:
      bestloss = lossitem
      best_weights = copy.deepcopy(model.state_dict())
    return classcorrect/len(val_loader), speciescorrect/len(val_loader), objectscore/len(val_loader)

In [152]:
bestloss = -np.Infinity
best_weights = None

def train (epochs):
  for i in range(epochs):

    epoch_loss = train_one_epoch(i)
    print(f' Epoch {i} Loss : {epoch_loss}')
    metrics = Test(model)

nepochs = 5
train (nepochs)
torch.save(best_weights, "best_model.pth")


0.135262131690979 2.635745048522949 0.6998916864395142 15339.5673828125 24267.32421875 7160.931640625 18211.08203125 64982.37890625
0.1285443753004074 2.3250932693481445 0.6391024589538574 14682.9609375 37627.7109375 19183.265625 12367.568359375 83864.6015625
0.09361433237791061 2.234076499938965 0.7603674530982971 48884.9375 58592.26953125 8796.2958984375 40891.3515625 157167.9375
0.07052665948867798 2.890582323074341 1.3923566341400146 68160.359375 23692.203125 15232.74609375 52176.69140625 159266.34375
0.0533633828163147 2.4233853816986084 0.5688310861587524 14644.818359375 74601.1015625 76210.109375 89748.40625 255207.484375
0.04239656776189804 2.7349038124084473 1.4674261808395386 40488.2890625 43136.859375 23047.28125 25319.712890625 131996.390625
0.03431083634495735 2.3090505599975586 0.5533962845802307 33925.5 21986.22265625 54898.66796875 22117.890625 132931.1875
0.02833399549126625 2.5744214057922363 0.9828304648399353 46730.11328125 19256.041015625 66341.6875 49333.671875 18

ValueError: ignored

## Initializations

In [None]:


################################## HELPER CODE PROVIDED BY HIRING TEAM ##################################
"""
This codebase is provided to help you finish the assessment in time.
Yes, this code is not optimized, and properly formated. And there can be bugs, but best option
for you is to use it as it is until you are done with all other aspects of the codebase. And then
If you think that you are not able to achieve good results becuase this test function is problematic,
then you can update it.
"""
import torchmetrics

def test(model, val_loader):

  def __tl(x):
    return x.tolist()

  def __tn(x):
    return x.detach().cpu().numpy()

  def __tnl(x):
    return (x.detach().cpu().numpy()).tolist()

  def post_process_object(x):
    return torch.where(x > 0.5, 1.0, 0.0).squeeze(1)

  def post_process_cat_or_dog(x):
    return torch.where(x > 0.5, 1.0, 0.0).squeeze(1)

  def post_process_specie(x):
    return torch.argmax(x, dim=1)

  def post_process_xmin(x):
    return x

  def post_process_ymin(x):
    return x

  def post_process_xmax(x):
    return x

  def post_process_ymax(x):
    return x

  metric_object = torchmetrics.Accuracy(task="binary")
  metric_cat_or_dog = torchmetrics.Accuracy(task="binary")
  metric_specie = torchmetrics.Accuracy(task="multiclass", num_classes=9)

  output_list = {
      "object": [],
      "cat_or_dog": [],
      "specie": [],
      "xmin": [],
      "ymin": [],
      "xmax": [],
      "ymax": [],
  }
  labels_list = {
      "object": [],
      "cat_or_dog": [],
      "specie": [],
      "xmin": [],
      "ymin": [],
      "xmax": [],
      "ymax": [],
  }

  for i, data in enumerate(val_loader):
    inputs, labels = data
    if torch.cuda.is_available():
      inputs = inputs.cuda()

    # Make predictions for this batch
    outputs = model(inputs)

    is_object = __tnl(labels["have_object"])
    width = __tn(labels["width"])
    height = __tn(labels["height"])
    output_list["object"].extend(__tnl(post_process_object(outputs["object"])))
    labels_list["object"].extend(__tnl(labels["have_object"]))

    if is_object[0] == 1.0:
      output_list["cat_or_dog"].extend(
        __tnl(post_process_cat_or_dog(outputs["cat_or_dog"]))
      )
      labels_list["cat_or_dog"].extend(
        __tnl(labels["cat_or_dog"])
      )
      output_list["specie"].extend(
        __tnl(post_process_specie(outputs["specie"]))
      )
      labels_list["specie"].extend(__tnl(labels["specie"]))
      output_list["xmin"].extend(
        __tl(__tn(post_process_xmin(outputs["bbox"][:, 0]))*width)
      )
      labels_list["xmin"].extend(__tl(__tn(labels["xmin"])*width))
      output_list["ymin"].extend(
          __tl(__tn(post_process_ymin(outputs["bbox"][:, 1]))*height)
      )
      labels_list["ymin"].extend(
          __tl(__tn(labels["ymin"])*height)
      )
      output_list["xmax"].extend(
          __tl(__tn(post_process_xmax(outputs["bbox"][:, 2]))*width)
      )
      labels_list["xmax"].extend(__tl(__tn(labels["xmax"])*width))
      output_list["ymax"].extend(__tl(__tn(post_process_ymax(outputs["bbox"][:, 3]))*height))
      labels_list["ymax"].extend(__tl(__tn(labels["ymax"])*height))

  score_object = metric_object(torch.tensor(output_list["object"]), torch.tensor(labels_list["object"]))
  score_cat_or_dog = metric_cat_or_dog(torch.tensor(output_list["cat_or_dog"]), torch.tensor(labels_list["cat_or_dog"]))
  score_specie = metric_specie(torch.tensor(output_list["specie"]), torch.tensor(labels_list["specie"]))
  score_bbox = None
  return score_object, score_cat_or_dog, score_specie, score_bbox
################################## HELPER CODE PROVIDED BY HIRING TEAM ##################################


def train(epochs, model_weights):

  # Initialize Model and Optimizer
  model = Model()
  optimizer = None

  # Initialize Loss Functions
  have_object_loss = None
  specie_loss = None
  cat_or_dog_loss = None
  bbox_loss = None # Not necessary you need to apply function to all coordinates together, You can have separete loss functions for all coordinates too
  # Below or Above
  xmin_loss = None
  ymin_loss = None
  xmax_loss = None
  ymax_loss = None

  training_dataset = CustomDataset("/content/assessment_dataset", images_list=train_list)
  training_loader = None

  def train_one_epoch(epoch_index, tb_writer):
      running_loss = 0.
      last_loss = 0.

      # Here, we use enumerate(training_loader) instead of
      # iter(training_loader) so that we can track the batch
      # index and do some intra-epoch reporting
      for i, data in enumerate(training_loader):
          # Every data instance is an input + label pair
          inputs, labels = data

          # Make predictions for this batch
          outputs = model(inputs)

          # Compute the loss and its gradients
          loss_have_object = have_object_loss(outputs["object"], None)
          loss_specie = specie_loss(outputs["specie"], None)
          loss_cat_or_dog = cat_or_dog_loss(outputs["cat_or_dog"], None)

          loss_bbox = bbox_loss(outputs["bbox"], None)
          # Above or Below
          loss_xmin = xmin_loss(outputs["bbox"], None)
          loss_ymin = ymin_loss(outputs["bbox"], None)
          loss_xmax = xmax_loss(outputs["bbox"], None)
          loss_ymax = ymax_loss(outputs["bbox"], None)

          loss =  # Consolidate all individual losses

          # Gather data and report
          running_loss += loss.item()
          if i % 10 == 0:
              last_loss = running_loss / 10 # loss per batch
              running_loss = 0.
      return last_loss

  for i in range(epochs):

    epoch_loss = train_one_epoch(i, None)
    print(f' Epoch {i} Loss : {epoch_loss}')

    torch.save("model.pth", model.state_dict())
    metrics = test(model)
    print(metrics)

In [None]:
from PIL import ImageDraw

def visualize(model_weights, image_folder_path, output_folder="output"):

  model = Model()
  model.load_state_dict(torch.load(model_weights))

  try:
    image = Image.open(os.path.join("/content/assessment_dataset/images", image_name+".jpg"))
  except:
    image = Image.open(os.path.join("/content/assessment_dataset/images", image_name+".jpeg"))

  preprocess = None
  output = model()
  return {}