# HW2

In [None]:
# Name: Seyyid Osman Sevgili    
# ID: 504221565

## PART I - Convolutional Neural Networks [12 pts]

In this part of the assignment, we will first implement the convolution operation in 2 dimensions, then move to a deep learning framework for faster computation on GPUs.

In this assignment, we will use the same API as in Assignment 1, where you implemented most of the required layers. You will add the `Conv2d` layer in `DL/CNN.py`.

In [None]:
from DL.CNN import Conv2d
from DL.checker.checks import *
import numpy as np
from DL.regularizers import Dropout, MaxPool2d, AveragePool2d, BatchNorm, BatchNorm2d
%load_ext autoreload
%autoreload 2
%reload_ext autoreload

In [None]:
# Additional imports
import matplotlib.pyplot as plt
import warnings
import itertools
from tqdm import tqdm

### Convolutional Layer
Implement and call the forward and backward passes for the convolutional layer in conv2d.
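
As a rough guide before implementing the layer, the sketch below computes the spatial output size of a convolution and shows how one input window can be sliced from a zero-padded array. The helper name `conv_output_size` is hypothetical and not part of the `DL` package; the shapes mirror the forward-pass check in this notebook.

```python
import numpy as np

def conv_output_size(size, kernel_size, stride, padding):
    """Output length along one spatial axis: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Shapes from the forward-pass check: 4x4 inputs, kernel 4, stride 2, padding 1.
h_out = conv_output_size(4, kernel_size=4, stride=2, padding=1)
w_out = conv_output_size(4, kernel_size=4, stride=2, padding=1)
print(h_out, w_out)  # 2 2

# Zero-pad only the two spatial axes of an (N, C, H, W) array.
x = np.arange(2 * 3 * 4 * 4, dtype=float).reshape(2, 3, 4, 4)
x_pad = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))

# Window that feeds output position (i, j) for every sample and channel.
i, j, stride, k = 1, 1, 2, 4
window = x_pad[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
print(window.shape)  # (2, 3, 4, 4)
```

Each output value is then the sum of such a window multiplied elementwise with one filter, plus that filter's bias.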

#### Forward Pass  [6 pts]

In [None]:
conv = Conv2d(in_size=3, out_size=3, kernel_size=4, stride=2, padding=1)
x_shape = (2, 3, 4, 4)
w_shape = (3, 3, 4, 4)
x = np.linspace(-0.1, 0.4, num=np.prod(x_shape)).reshape(x_shape)
conv.W = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)
conv.b = np.linspace(-0.2, 0.3, num=3)


out = conv.forward(x)
# difference should be around 2e-8
print('Testing conv_forward_naive')
relError = rel_error(out, "CNN_forward")
print(f'difference: {relError}')
assert 2.9e-8 > relError

#### Backward Pass  [6 pts]

In [None]:
np.random.seed(250)
conv = Conv2d(in_size=1, out_size=2, stride=1, padding=1, kernel_size=3)

x = np.random.randn(3, 1, 6, 6)
conv.W = np.random.randn(2, 1, 3, 3)
conv.b = np.random.randn(2,)
dout = np.random.randn(3, 2, 6, 6)

dx_num = grad_check(lambda _: conv.forward(x), x, dout)
dw_num = grad_check(lambda _: conv.forward(x), conv.W, dout)
db_num = grad_check(lambda _: conv.forward(x), conv.b, dout)

out = conv.forward(x)
dx, dw, db = conv.backward(dout)

print(f'dx error: {rel_error(dx, dx_num)}')
print(f'dw error: {rel_error(dw, dw_num)}')
print(f'db error: {rel_error(db, db_num)}')

## PART II - Regularizers and Pooling  [32 pts]

You are going to implement regularization and pooling techniques that were, until recently, widely used in convolutional networks, such as **Max Pooling** and **Dropout**.

Find the `Dropout`, `MaxPool2d`, `AveragePool2d`, `BatchNorm` and `BatchNorm2d` classes in **`DL/regularizers.py`** and complete the implementation of the `forward` and `backward` methods for each of them.

### Dropout layer

As we covered in class, dropout is a well-known regularization technique for preventing overfitting in neural networks. It zeroes out some of the hidden-layer outputs at random. We recommend scaling the surviving outputs by the dropout factor in the forward pass, as common implementations do, so that no rescaling is needed at test time. Recall that this is called **Inverted Dropout**.

For more information on dropout, you can check the paper below.

**Improving neural networks by preventing co-adaptation of feature detectors**, Hinton et al.
https://arxiv.org/pdf/1207.0580.pdf
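
The following is a minimal sketch of inverted dropout, not the reference solution. It assumes that `p` is the probability of dropping a unit (consistent with the "dropout rate" printed in the check below) and uses hypothetical function names; the `Dropout` class in `DL/regularizers.py` may organize this differently.

```python
import numpy as np

def inverted_dropout_forward(x, p, mode="train", seed=None):
    """Assumption: p is the drop probability. Returns the output and the mask."""
    if mode != "train":
        return x, None  # test time: identity, no rescaling needed
    rng = np.random.default_rng(seed)
    # Keep each unit with probability 1 - p and scale survivors by 1 / (1 - p)
    # so that the expected value of the output matches the input.
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask, mask

def inverted_dropout_backward(dout, mask):
    """The gradient flows only through the kept units, with the same scaling."""
    return dout if mask is None else dout * mask

x = np.ones((200, 500))
out, mask = inverted_dropout_forward(x, p=0.3, seed=0)
zero_frac = (out == 0).mean()
print(f"fraction zeroed: {zero_frac:.3f}, output mean: {out.mean():.3f}")

out_test, _ = inverted_dropout_forward(x, p=0.3, mode="test")
```

With this scaling, the fraction of zeroed outputs in training should be close to `p`, while the output mean stays close to the input mean.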

#### Forward pass  [3 pts]

In [None]:
np.random.seed(250)

x = np.random.randn(500, 2000) + 250
for p in [0.3, 0.5, 0.8]:
    dropout = Dropout(p=p)
    dropout.mode = 'train'
    out = dropout.forward(x)
    dropout.mode = 'test'
    out_test = dropout.forward(x)

    print(f'Dropout rate is: {p}')
    print(f'Fraction of the input zeroed out in training: {(out == 0).mean():.5f}, in testing: {(out_test == 0).mean():.5f}')

# You can check whether your implementation is correct by looking at the fraction of outputs set to zero

#### Backward pass  [3 pts]

In [None]:
dropout = Dropout(p=0.75)
np.random.seed(250)
x = np.random.randn(12, 12) + 11
dout = np.random.randn(*x.shape)


out = dropout.forward(x,seed=250)
dx = dropout.backward(dout)
dx_num = grad_check(lambda xx: dropout.forward(xx, seed=250), x, dout)

relError = rel_error(dx, dx_num)
print(f'Error on dx {relError}')
assert 5e-10 > relError

### MaxPool
#### Forward Pass  [3 pts]

In [None]:
x_shape = (3, 3, 7, 7)
x = np.linspace(-0.2, 0.4, num=np.prod(x_shape)).reshape(x_shape)
maxPool = MaxPool2d(stride = 2, pool_width = 3, pool_height = 3)
out = maxPool.forward(x)

relError = rel_error(out, "maxpool_forward")
print(f'Error: {relError}')
assert 1e-6 > relError

#### Backward pass  [3 pts]

In [None]:
np.random.seed(250)
x = np.random.randn(8, 1, 10, 10)
dout = np.random.randn(8, 1, 5, 5)
max_pool = MaxPool2d(pool_height=2, pool_width=2, stride=2)
dx_num = grad_check(lambda x: max_pool.forward(x), x, dout)

out = max_pool.forward(x)
dx = max_pool.backward(dout)

# Your error should be around 1e-12
print('Testing max_pool_backward_naive function:')
relError = rel_error(dx, dx_num)
print(f'dx error: {relError}')
assert 5e-12 > relError

### AveragePool
#### Forward Pass  [3 pts]

In [None]:
x_shape = (3, 3, 7, 7)
x = np.linspace(-0.2, 0.4, num=np.prod(x_shape)).reshape(x_shape)
average_pool = AveragePool2d(stride = 2, pool_width = 3, pool_height = 3)
out = average_pool.forward(x)

relError = rel_error(out, "averagepool_forward")
print(f'Error: {relError}')
assert 2e-7 > relError

#### Backward Pass  [3 pts]

In [None]:
np.random.seed(250)
x = np.random.randn(8, 1, 10, 10)
dout = np.random.randn(8, 1, 5, 5)
average_pool = AveragePool2d(pool_height=2, pool_width=2, stride=2)
dx_num = grad_check(lambda x: average_pool.forward(x), x, dout)

out = average_pool.forward(x)
dx = average_pool.backward(dout)

# Your error should be around 1e-11
print('Testing average_pool_backward_naive function:')
relError = rel_error(dx, dx_num)
print(f'dx error: {relError}')
assert 5e-10 > relError

### Batch Normalization 1D

#### Forward Pass  [3 pts]
First read and understand the paper:

S. Ioffe, C. Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
https://arxiv.org/pdf/1502.03167.pdf

Implement the forward and backward passes for the Batch Normalization technique.
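
To make the role of `gamma` and `beta` concrete, here is a minimal sketch of the training-mode forward pass only. It assumes the biased batch variance and `eps=1e-5`, omits running statistics, and uses a hypothetical function name; it is not the `BatchNorm` class you are asked to complete.

```python
import numpy as np

def batchnorm_forward_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature (biased) batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, roughly unit std
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal((180, 3)) + 9.0
gamma = np.array([3.0, 2.0, 1.0])
beta = np.array([4.0, 2.0, 5.0])

out = batchnorm_forward_train(x, gamma, beta)
print(out.mean(axis=0))  # close to beta
print(out.std(axis=0))   # close to gamma
```

After normalization, `beta` sets the per-feature mean of the output and `gamma` sets its standard deviation.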

In [None]:
# You should understand how the gamma and beta parameters affect the output

# An example of a single hidden layer with ReLU activation.
np.random.seed(250)
N, D1, D2 = 180, 60, 3
X = np.random.randn(N, D1)
W1 = np.random.randn(D1, D2)
a = np.maximum(0, X.dot(W1))

bn1 = BatchNorm(D2)

print('Without using batchnorm')
print(f'\t mean of each feature/channel: {a.mean(axis=0)}')
print(f'\t stds of each feature/channel: {a.std(axis=0)}')


print('Stats after batch normalization with gamma=1, beta=0')
normalized = bn1.forward(a)
print(f'\t mean: {normalized.mean(axis=0)}')
print(f'\t std: {normalized.std(axis=0)}')


bn1.gamma = np.array([3.0, 2.0, 1.0])
bn1.beta = np.array([4, 2, 5])
normalized  = bn1.forward(a)
print('Stats after batch normalization with arbitrary parameters')
print(f'\t mean: {normalized.mean(axis=0)}')
print(f'\t std: {normalized.std(axis=0)}')

#### Backward pass  [5 pts]

In [None]:
# Gradient check batchnorm backward pass
np.random.seed(250)
N, D = 20, 6
x = 3 * np.random.randn(N, D) + 9

bn1 = BatchNorm(D)
gamma = np.random.randn(D)
beta = np.random.randn(D)
dout = np.random.randn(N, D)

fx = lambda x: bn1.forward(x, gamma=gamma, beta=beta)
fg = lambda a: bn1.forward(x, gamma=a, beta=beta)
fb = lambda b: bn1.forward(x, gamma=gamma, beta=b)

dx_num = grad_check(fx, x, dout)
da_num = grad_check(fg, gamma.copy(), dout)
db_num = grad_check(fb, beta.copy(), dout)

bn1.forward(x, gamma=gamma, beta=beta)
dx, dgamma, dbeta = bn1.backward(dout)

relError = rel_error(dx_num, dx)
print(f'dx error: {relError}')
assert 1e-7 > relError

relError = rel_error(da_num, dgamma)
print(f'dgamma error: {relError}')
assert 1e-10 > relError

relError = rel_error(db_num, dbeta)
print(f'dbeta error: {relError}')
assert 1e-11 > relError

### Batch Normalization 2D
#### Forward Pass  [2 pts]

Implement `BatchNorm2d`. It computes statistics per channel over the batch and spatial dimensions, as in PyTorch's `BatchNorm2d`. You can use the PyTorch documentation as a reference.
https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html
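
The per-channel statistics can be sketched as follows. This is an illustrative training-mode forward pass under the same assumptions as the 1D case (biased variance, `eps=1e-5`, no running statistics), with a hypothetical function name. It also shows that the result matches 1D batch normalization applied to the input reshaped to `(N*H*W, C)`.

```python
import numpy as np

def batchnorm2d_forward_train(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W); gamma and beta: (1, C, 1, 1)."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)   # one mean per channel
    var = x.var(axis=(0, 2, 3), keepdims=True)   # one variance per channel
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
N, C, H, W = 8, 3, 5, 5
x = rng.standard_normal((N, C, H, W))
gamma = np.array([3.0, 2.0, 1.0]).reshape(1, C, 1, 1)
beta = np.array([4.0, 2.0, 5.0]).reshape(1, C, 1, 1)
out = batchnorm2d_forward_train(x, gamma, beta)

# Equivalent view: move channels last, flatten to (N*H*W, C), normalize per column.
flat = x.transpose(0, 2, 3, 1).reshape(-1, C)
flat_hat = (flat - flat.mean(axis=0)) / np.sqrt(flat.var(axis=0) + 1e-5)
out_flat = gamma.reshape(C) * flat_hat + beta.reshape(C)
out_flat = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)

print(np.allclose(out, out_flat))  # True
```

The reshape view can be convenient if you want to reuse your 1D implementation for both the forward and backward passes.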

In [None]:
np.random.seed(250)
N, C, H, W = 180, 3, 5, 5  # Batch size, channels, height, width

# Generating random data with the correct shape
X = np.random.randn(N, C, H, W)

scale = np.array([2., 3., 1.]).reshape((1,C,1,1))
shift = np.array([1., 0.5, 3.]).reshape((1,C,1,1))
# Per-channel affine transformation so that each channel has a different mean and std
a = scale*X + shift

# Creating a BatchNorm2d instance for the specified number of channels
bn2d = BatchNorm2d(C)

print('Without using BatchNorm2d')
print(f'\t mean of each channel: {a.mean(axis=(0,2,3))}')
print(f'\t std of each channel: {a.std(axis=(0,2,3))}')

# Using BatchNorm2d with default parameters (gamma=1, beta=0)
normalized = bn2d.forward(a)
print('\nStats after BatchNorm2d normalization with gamma=1, beta=0')
print(f'\t mean: {normalized.mean(axis=(0, 2, 3))}')
print(f'\t std: {normalized.std(axis=(0, 2, 3))}')

# Changing gamma and beta parameters
bn2d.gamma = np.array([3.0, 2.0, 1.0]).reshape((1,C,1,1))
bn2d.beta = np.array([4, 2, 5]).reshape((1,C,1,1))
normalized = bn2d.forward(a)

print('\nStats after BatchNorm2d normalization with arbitrary parameters')
print(f'\t mean: {normalized.mean(axis=(0, 2, 3))}')
print(f'\t std: {normalized.std(axis=(0, 2, 3))}')

#### Backward pass  [4 pts]

In [None]:
np.random.seed(250)
N, C, H, W = 10, 3, 4, 4  # Batch size, channels, height, width

x = 3 * np.random.randn(N, C, H, W) + 7

bn2d = BatchNorm2d(C)
gamma = np.random.randn(C).reshape((1,C,1,1))
beta = np.random.randn(C).reshape((1,C,1,1))
dout = np.random.randn(N, C, H, W)

# Function to be used for numerical gradient calculation
fx = lambda x: bn2d.forward(x, gamma=gamma, beta=beta)
fg = lambda g: bn2d.forward(x, gamma=g, beta=beta)
fb = lambda b: bn2d.forward(x, gamma=gamma, beta=b)

# Gradient check for dx (input)
dx_num = grad_check(fx, x, dout)

# Gradient check for dgamma (gamma parameter)
da_num = grad_check(fg, gamma.copy(), dout)

# Gradient check for dbeta (beta parameter)
db_num = grad_check(fb, beta.copy(), dout)

# Perform the backward pass to get gradients from the BatchNorm2d layer
bn2d.forward(x, gamma=gamma, beta=beta)
dx, dgamma, dbeta = bn2d.backward(dout)

# Calculate relative errors for each gradient
rel_error_dx = rel_error(dx_num, dx)
print(f'dx error: {rel_error_dx}')
assert rel_error_dx < 1e-7

rel_error_dgamma = rel_error(da_num, dgamma)
print(f'dgamma error: {rel_error_dgamma}')
assert rel_error_dgamma < 1e-10

rel_error_dbeta = rel_error(db_num, dbeta)
print(f'dbeta error: {rel_error_dbeta}')
assert rel_error_dbeta < 1e-11

# PART III - Convolutional Neural Networks vs ResNets [28 pts]

You can use Google Colab for the rest of the homework.

## Environment setup
Follow the tutorial about how to utilize Google Colab but **don't install PyTorch** as mentioned in the blog post.

English:
https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d


In [None]:
!nvidia-smi
# This command should print GPU status information if a GPU runtime is selected.
# If you run into memory issues, you can also use it to check your model's GPU memory usage.

In [None]:
import numpy as np
import random
import os
import matplotlib.pyplot as plt

import torch
from torch import nn, optim
from torchvision import transforms, datasets
from torch.utils.data import DataLoader, Dataset
import torch.nn.functional as F
from torchvision import utils

from tqdm import tqdm


In [None]:
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    # benchmark mode lets cuDNN pick algorithms at run time, which can break reproducibility
    torch.backends.cudnn.benchmark = False
seed_everything(0)

In [None]:
def fetch_dataloader():
    # using random crops and horizontal flip for train set
    train_transformer = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),  # randomly flip image horizontally
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])

    # transformer for dev set
    dev_transformer = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])

    # ************************************************************************************
    # `datasets` is imported from torchvision above; `torchvision` itself is not imported
    trainset = datasets.CIFAR10(root='./data/data-cifar10', train=True,
                                download=True, transform=train_transformer)
    devset = datasets.CIFAR10(root='./data/data-cifar10', train=False,
                              download=True, transform=dev_transformer)

    
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                              shuffle=True, num_workers=0)

    devloader = torch.utils.data.DataLoader(devset, batch_size=64,
                                            shuffle=False, num_workers=0)
    
    return trainloader, devloader


### Implement a Deep Neural Network with Convolutional Neural Network Architectures [4 pts]

Use Convolutional Neural Network, Linear and Activation Layers to design a CNN.

In [None]:
class CNN(torch.nn.Module):
    """
    Implement a Convolutional Neural Network with (at least two) convolutional, linear and activation layers you have implemented
    """
    def __init__(self):
        """
        Implement architecture
        """
        super(CNN, self).__init__()
        # YOUR CODE STARTS


        # YOUR CODE ENDS

    def forward(self, x):
        """
        Implement forward-pass
        """

        # YOUR CODE STARTS


        # YOUR CODE ENDS

        return x

### Implement a ResNet with residual blocks [4 pts]

Residual Networks introduce skip connections to improve training dynamics. First, implement a simplified ResNet block. Then, use three residual blocks to build a small ResNet. Use layers you implemented above.
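
The idea of a skip connection can be illustrated independently of any framework. The sketch below is a toy NumPy residual block in which dense layers stand in for the convolutions; all names (`residual_block`, `w_skip`) are hypothetical, and it is only meant to show the `relu(F(x) + shortcut(x))` pattern, not the block you should submit.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, w1, w2, w_skip=None):
    """Toy residual block: out = relu(F(x) + shortcut(x)).

    F is two linear maps with a ReLU in between. The shortcut is the identity
    when shapes match, or a linear projection (w_skip) when they do not.
    """
    branch = relu(x @ w1) @ w2
    shortcut = x if w_skip is None else x @ w_skip
    return relu(branch + shortcut)

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal((4, 8)))  # non-negative input

# With an all-zero residual branch, the block passes its input through unchanged.
w_zero = np.zeros((8, 8))
out_identity = residual_block(x, w_zero, w_zero)
print(np.allclose(out_identity, x))  # True

# When the feature size changes (8 -> 16), the shortcut needs a projection.
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 16))
w_skip = rng.standard_normal((8, 16))
out_proj = residual_block(x, w1, w2, w_skip)
print(out_proj.shape)  # (4, 16)
```

In a convolutional block, the same role is typically played by a 1x1 convolution on the shortcut whenever the stride or the number of channels changes.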



In [None]:
class customResNet(torch.nn.Module):
    def __init__(self):
        """
        Implement architecture
        """
        super(customResNet, self).__init__()
                
        self.conv1 = ## TODO ##
        self.bn1 = ## TODO ##
        self.AvgPool2d = ## TODO ##
        self.relu = ## TODO ##
        
        self.layer1 = self.make_block(64, stride=1)
        self.layer2 = self.make_block(128, stride=2)
        self.layer3 = self.make_block(256, stride=2)
        self.linear = ## TODO ##

        
    def make_block(self, ch=64, stride=2):
    
        # YOUR CODE STARTS

        # YOUR CODE ENDS    

        
    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.AvgPool2d(out)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
    

### Initialize your networks

In [None]:
networks = {"CNN": CNN(), "customResNet": customResNet()}

### Initialization of the optimizers: You can change the parameters according to your needs

In [None]:
optimizers = {
    "CNN": optim.Adam(networks["CNN"].parameters(), lr=1e-3),
    "customResNet": optim.Adam(networks["customResNet"].parameters(), lr=1e-3),
    }

### Initialization of the losses: You can change the parameters according to your needs

In [None]:
losses = {
    "CNN": torch.nn.CrossEntropyLoss(reduction='sum'),
    "customResNet": torch.nn.CrossEntropyLoss(reduction='sum'),
    }

### Setting the training length: You can change the parameters according to your needs

In [None]:
epochs = {
    "CNN": 15,
    "customResNet": 15
    }

### Training Loop [5 pts]

In [None]:
def train(network, optimizer, loss_fn, train_loader, epochs=10, verbose=True, device="cpu"):
    """
    Implement training loop
    """
    train_data = None
    # YOUR CODE STARTS


    # YOUR CODE ENDS
    return network, train_data

### Evaluation based on Accuracy [3 pts]

In [None]:
def evalf(network, test_loader, device="cpu"):
    """
    Implement evaluation for accuracy
    """
    val_data = None

    # YOUR CODE STARTS


    # YOUR CODE ENDS

    return accuracy, val_data

### Combine everything

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
train_data, val_data = {}, {}
train_loader, test_loader = fetch_dataloader()

for arch in ["CNN", "customResNet"]:
    network, optimizer, loss_fn = networks[arch], optimizers[arch], losses[arch]

    network, train_data[arch] = train(network, optimizer, loss_fn, train_loader, epochs=epochs[arch], verbose=False, device=device)
    accuracy, val_data[arch] = evalf(network, test_loader, device=device)
    
    print(f"for {arch}|\t Accuracy: {accuracy:.5f}")

### Visualize the First Two Convolution Layer Filters/Kernels of the CNN Model and comment on the appearance of the filters.  [3 pts]
Display the filters/kernels in the first and second convolutional layers. You can change the code below if needed.

In [None]:
# visualize the first two conv layer outputs, filter an example image with first conv filters
# Code here
conv_layers[0] # First layer conv filters
conv_layers[1] # Second layer conv filters
# Code here


## Compare the results [9 pts]

In this part, you will compare the results of both methods. While answering the questions below, consider the following points:

- What are the advantages and disadvantages of these models?
- In what scenarios would you use a CNN versus a custom ResNet?
- Is early stopping necessary during training? If so, why?
    
Also, feel free to use any library you want.

#### Plot loss and accuracy curves per training epoch for both train and test sets. Make sure you put legend and labels. Comment on the results. [3 pts]

#### Visualize a few example test outputs with true and predicted labels (both models). Make comments [3 pts]

#### Compare number of parameters of the models. Comment your insights [3 pts]

# PART IV Object Detection [28 pts]

Object detection is a computer vision task where the goal is to identify and localize objects within an image. 
Unlike image classification, where the focus is on assigning a single label to an image, object detection involves:
- Predicting **what** objects are present (class labels).
- Predicting **where** the objects are located (bounding boxes).

#### Image Classification vs Object Detection:

Image classification answers the question: **What is in the image?**

Object detection answers the questions: **What objects are in the image, and where are they located?**

#### Why Object Detection?
Object detection is used in a variety of real-world applications, such as:
- **Autonomous vehicles**: Detecting pedestrians, vehicles, and traffic signs.
- **Medical imaging**: Identifying abnormalities in scans (e.g., tumors).
- **Surveillance**: Recognizing and tracking people or objects in video feeds.
- **Retail**: Counting products or monitoring shelves.


#### Components of Object Detection:

A basic object detection network typically consists of two outputs:
1. **Class Prediction**: A probability distribution over possible object classes.
2. **Bounding Box Regression**: Coordinates of the bounding box enclosing the object.


- The bounding box is represented as `[x_min, y_min, x_max, y_max]`, where:
  - `(x_min, y_min)` is the top-left corner.
  - `(x_max, y_max)` is the bottom-right corner.
- The network predicts both **what** (class label) and **where** (bounding box coordinates).

### Define a Basic Object Detection Network
The network should:
- Take an image as input.
- Output a class label and four bounding box coordinates.

**Hints**:
- The final layer should output both the class probabilities and bounding box coordinates.
- Use the CNN you trained in the previous part as the backbone of your object detector. Add two heads to your backbone model, where one of the heads will predict the object class and the other will predict the object location. 


**Steps**:
- Implement a network that consists of a backbone and two heads.
- Build forward and backward propagation for the network.
- Use separate loss functions for classification (cross-entropy) and bounding box regression (mean squared error).
- Train the network on the object detection dataset.
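
The two-loss setup in the steps above can be sketched in NumPy as follows. This is only an illustration of how the terms combine: the function names and the `box_weight` factor are hypothetical, and the boxes are assumed to be normalized to `[0, 1]` so that both terms have a comparable scale.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over the batch, computed with a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def detection_loss(class_logits, labels, pred_boxes, gt_boxes, box_weight=1.0):
    """Total loss = classification loss + box_weight * box regression loss (MSE)."""
    cls_loss = softmax_cross_entropy(class_logits, labels)
    box_loss = np.mean((pred_boxes - gt_boxes) ** 2)
    return cls_loss + box_weight * box_loss, cls_loss, box_loss

# Two samples, two classes (non-plate = 0, plate = 1).
class_logits = np.array([[2.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 1])

# Boxes as [x_min, y_min, x_max, y_max], assumed normalized to [0, 1].
gt_boxes = np.array([[0.4, 0.4, 0.6, 0.6], [0.1, 0.2, 0.5, 0.6]])
pred_boxes = gt_boxes + 0.1  # every coordinate is off by 0.1

total, cls_loss, box_loss = detection_loss(class_logits, labels, pred_boxes, gt_boxes)
print(f"cls: {cls_loss:.4f}, box: {box_loss:.4f}, total: {total:.4f}")
```

In PyTorch, the corresponding building blocks would be `nn.CrossEntropyLoss` and `nn.MSELoss`, with gradients for both heads and the backbone obtained from a single `backward()` call on the summed loss.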


### Dataset

You will detect license plates:
https://ieee-dataport.org/open-access/cd-lp-compressed-domain-license-plate-detection-database

The dataset contains 2,400 vehicle images for license plate detection. It has 3 subsets, of which you will use only the pixel-domain images (2,400 images). Since you will predict license plate coordinates, the number of classes is 2 (plate and non-plate objects).

#### Implement the network [10 pts]

In [None]:
# code here

#### Train the network [5 pts]

In [None]:
# code here

#### Evaluate the model and plot precision, recall, and mAP scores for different IoU thresholds [5 pts]

Evaluate the network using **evaluate()** below for IoU thresholds from 0.1 to 1.0 in steps of 0.1 ([0.1; 0.1; 1]):
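
As a small worked example of what the threshold sweep means, the sketch below builds the threshold grid and checks at which thresholds a single predicted box would still count as a match. The helper `box_iou` is a hypothetical, self-contained stand-in for the IoU function defined in the next cell, and the grid assumes the thresholds are 0.1, 0.2, ..., 1.0.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Threshold grid 0.1, 0.2, ..., 1.0 (rounded to avoid floating-point drift).
thresholds = np.round(np.arange(1, 11) * 0.1, 1)

# Intersection is 90 x 90 = 8100, union is 10000 + 10000 - 8100 = 11900.
iou = box_iou([50, 50, 150, 150], [40, 40, 140, 140])
print(round(iou, 4))  # 0.6807

# The prediction counts as a true positive only while iou >= threshold.
hits = [bool(iou >= t) for t in thresholds]
print(sum(hits))  # 6
```

Here the prediction is a match for thresholds up to 0.6 and a miss from 0.7 onward, which is why precision and recall typically fall as the IoU threshold increases.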



In [None]:

import numpy as np

def calculate_iou(pred_box, gt_box):
    """
    Calculate Intersection over Union (IoU) between two bounding boxes.
    Args:
        pred_box (list): [x_min, y_min, x_max, y_max] for the predicted box.
        gt_box (list): [x_min, y_min, x_max, y_max] for the ground truth box.
    Returns:
        float: IoU value.
    """
    # Compute intersection
    x_min_inter = max(pred_box[0], gt_box[0])
    y_min_inter = max(pred_box[1], gt_box[1])
    x_max_inter = min(pred_box[2], gt_box[2])
    y_max_inter = min(pred_box[3], gt_box[3])

    inter_area = max(0, x_max_inter - x_min_inter) * max(0, y_max_inter - y_min_inter)
    
    # Compute union
    pred_area = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    gt_area = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union_area = pred_area + gt_area - inter_area

    return inter_area / union_area if union_area > 0 else 0

def evaluate(predictions, ground_truths, iou_threshold=0.5):
    """
    Evaluate object detection metrics: Precision, Recall, and mAP.
    Args:
        predictions (list): List of predicted bounding boxes [[x_min, y_min, x_max, y_max, class], ...].
        ground_truths (list): List of ground truth boxes [[x_min, y_min, x_max, y_max, class], ...].
        iou_threshold (float): IoU threshold to consider a prediction correct.
    Returns:
        dict: Precision, Recall, and mAP scores.
    """
    tp = 0  # True positives
    fp = 0  # False positives
    fn = 0  # False negatives

    matched_gt = set()  # Keep track of matched ground truths

    for pred in predictions:
        pred_box, pred_class = pred[:4], pred[4]
        max_iou = 0
        matched = None

        for i, gt in enumerate(ground_truths):
            gt_box, gt_class = gt[:4], gt[4]

            # Only consider matching predictions of the same class
            if pred_class == gt_class:
                iou = calculate_iou(pred_box, gt_box)
                if iou > max_iou and iou >= iou_threshold and i not in matched_gt:
                    max_iou = iou
                    matched = i

        if matched is not None:
            tp += 1
            matched_gt.add(matched)
        else:
            fp += 1

    fn = len(ground_truths) - len(matched_gt)

    precision = tp / (tp + fp) if tp + fp > 0 else 0
    recall = tp / (tp + fn) if tp + fn > 0 else 0

    # For simplicity, mAP can be computed as Precision in this example
    # In real scenarios, mAP requires averaging precision at multiple recall thresholds
    mAP = precision

    return {"Precision": precision, "Recall": recall, "mAP": mAP}

# Example data
predictions = [
    [50, 50, 150, 150, "car"],  # [x_min, y_min, x_max, y_max, class]
    [30, 30, 120, 120, "car"],
]

ground_truths = [
    [40, 40, 140, 140, "car"],  # [x_min, y_min, x_max, y_max, class]
    [60, 60, 170, 170, "car"],
]

# Evaluate
results = evaluate(predictions, ground_truths, iou_threshold=0.5)
print("Evaluation Results:")
print(results)


In [None]:
# code here

### Comparison [8 pts]
Try different backbones: the ResNet you trained and an untrained (only initialized) CNN.
Do backbones with different architectures affect the object detection performance?

Compare the pretrained and untrained backbones: in your experiments, do pretrained backbones make object detection easier? If not, why?

In [None]:
# Compare and comment here 