# Computer Vision Assignment 1 Part 2
---

Semester: **Fall 2023**

Due date: **October 5th, 2023, 11:59 PM EST.**

## Introduction
---
This assignment requires you to participate in a Kaggle competition with the rest of the class on the [German Traffic Sign Recognition Benchmark](http://benchmark.ini.rub.de/?section=gtsrb). The objective is to produce a model that gives the highest possible accuracy on the test portion of this dataset. You can register for the competition using the [private link](https://www.kaggle.com/t/f198351e92ff46d5a839fd73d22e9cbc).

Skeleton code is provided in the Colab below. It contains code for training a simple default model and evaluating it on the test set. The evaluation script produces a file `gtsrb_kaggle.csv` that lists the IDs of the test set images along with their predicted labels. Upload this file to the Kaggle webpage, which will then report a test accuracy score.

Your goal is to implement a new model architecture that improves upon the baseline performance. You are free to implement any approach covered in class or from research papers. This part counts for 50% of the overall grade for Assignment 1. Grading will depend on your Kaggle performance and rank, as well as the novelty of the architecture.

## Rules
---
You should make a copy of this Colab (`File->Save a copy in Drive`). Please start the assignment early and don’t be afraid to ask for help from either the TAs or myself. You are allowed to collaborate with other students in terms of discussing ideas and possible solutions. However, you must code up the solution yourself, i.e. you must write your own code. Copying your friend's code and just changing all the names of the variables is NOT ALLOWED! You are not allowed to use solutions from similar assignments in courses from other institutions, or those found elsewhere on the web.

Your solutions should be submitted via the Brightspace system. This should include a brief description (in the Colab) explaining the model architectures you explored, citing any relevant papers or techniques that you used. You should also include convergence plots of training accuracy vs epoch for relevant models.
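
One way to collect the numbers for such a convergence plot is to record the training accuracy at the end of every epoch and plot the list once training finishes. The sketch below is only an illustration: `record_epoch` and `plot_convergence` are hypothetical helper names that are not part of the skeleton code, and it assumes matplotlib is available (as it is in Colab).

```python
def record_epoch(history, correct, total):
    """Append this epoch's training accuracy (a fraction in [0, 1]) to history."""
    history.append(correct / total)
    return history


def plot_convergence(history, label="model"):
    """Plot training accuracy against epoch number (epochs start at 1)."""
    # Imported here so the bookkeeping above has no plotting dependency.
    import matplotlib.pyplot as plt

    epochs = range(1, len(history) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history, marker="o", label=label)
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Training accuracy")
    ax.legend()
    return fig
```

Inside a training loop you would count the correct predictions for the epoch, call `record_epoch(history, correct, total)`, and call `plot_convergence(history)` after the last epoch. Calling it once per model with a different `label` gives one plot per architecture.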

## Important Details
---
• You are only allowed 8 submissions to the Kaggle evaluation server per day. This is to prevent over-fitting on the test dataset. So be sure to start the assignment early!

• You are NOT ALLOWED to use the test set labels during training in any way. Doing so will be regarded as cheating and penalized accordingly.

• The evaluation metric is accuracy, i.e. the fraction of test set examples where the predicted label agrees with the ground truth label.

• You should be able to achieve a test accuracy of at least 95%.

• **Extra important:** Please use your NYU NetID as your team name on Kaggle, so the TAs can figure out which user you are on the leaderboard.
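
The accuracy metric described above can be written out in a few lines. This is a minimal sketch in plain Python, not the Kaggle server's implementation; the function name `accuracy` and the example labels are made up for illustration.

```python
def accuracy(predicted, truth):
    """Fraction of examples whose predicted label equals the ground-truth label."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("cannot compute accuracy on an empty set")
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


# Three of the four illustrative labels agree.
print(accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
```

Since you do not have the test labels, you can only compute this yourself on the training and validation sets; the test accuracy comes from Kaggle.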

# Dataset Preparation
___

1.  Download [`dataset.zip`](https://cs.nyu.edu/~fergus/teaching/vision/dataset.zip) from the course website to your local machine.
2.  Unzip the file. You should see a `dataset` directory with three subfolders: `training`, `validation`, and `testing`.
3.  Go to Google Drive (on your NYU account) and make a new directory (say `cv_kaggle_assignment`).
4.  Upload each of the three subfolders to it.
5.  Run the code block below. It will ask for permission to mount your Google Drive (NYU account) so this Colab can access it. Paste the authorization code into the box as requested.


In [1]:
# Load the Drive helper and mount
# from google.colab import drive
# drive.mount('/content/drive', force_remount=True)
%cd  'cv_kaggle_assignment/'

/scratch/tl2546/DL_HW/cv_kaggle_assignment
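
After changing into the assignment directory, it can be worth confirming that the subfolders from step 2 are actually there before any data is loaded. The helper below is a hypothetical sketch using only the standard library; `missing_subfolders` is not part of the skeleton code, and the default folder names are the ones listed in the preparation steps.

```python
from pathlib import Path


def missing_subfolders(root, expected=("training", "validation", "testing")):
    """Return the expected subfolder names that are not directories under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

`missing_subfolders(".")` returns an empty list when the layout matches. Pass a different `expected` tuple if your folder names differ; the data-loading cell in this notebook, for instance, reads from `train/` rather than `training/`.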


# Dataloader

In [8]:
import torch
from torch.utils.data import Dataset
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import torchvision
import numpy as np 

batch_size = 32
momentum = 0.9
lr = 0.01
epochs = 5
log_interval = 100

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_transforms = transforms.Compose([
                                    transforms.Resize((32, 32)),
                                    # transforms.ColorJitter(hue=.01, saturation=.01, contrast=.01),
                                    transforms.RandomRotation(10, interpolation=transforms.InterpolationMode.BILINEAR),
                                    transforms.GaussianBlur(3, sigma=(0.1, 0.5)),  # Smaller kernel for blur
                                    normalize
                                ])

class MyDataset(Dataset):

    def __init__(self, X_path="X.pt", y_path="y.pt", transform = train_transforms):

        self.X = torch.load(X_path).squeeze(1)
        self.y = torch.load(y_path).squeeze(1)
        self.transform = transform

    def __len__(self):
        return self.X.size(0)

    def __getitem__(self, idx):
        if self.transform:
            return self.transform(self.X[idx]), self.y[idx]
        return self.X[idx], self.y[idx]

train_dataset = MyDataset(X_path="train/X.pt", y_path="train/y.pt", transform = None)
aug_dataset = MyDataset(X_path="train/X.pt", y_path="train/y.pt", transform = train_transforms)
aug_dataset = torch.utils.data.Subset(aug_dataset, indices= np.random.permutation(len(train_dataset))[:len(train_dataset)//5])

comb_dataset = torch.utils.data.ConcatDataset([train_dataset, aug_dataset])
val_dataset = MyDataset(X_path="validation/X.pt", y_path="validation/y.pt")

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, num_workers=1)

comb_loader = torch.utils.data.DataLoader(
    comb_dataset, batch_size=batch_size, shuffle=True, num_workers=1)

# Shuffling is unnecessary for evaluation; the metrics do not depend on batch order
val_loader = torch.utils.data.DataLoader(
    val_dataset, batch_size=batch_size, shuffle=False, num_workers=1)


In [3]:
import matplotlib.pyplot as plt
import numpy as np 

num_classes = 43

def imshow(img):
    img = img / 2 + 0.5     # unnormalize (inverse of a mean 0.5, std 0.5 normalization)
    img = torch.clamp(img, 0, 1)
    fig, ax = plt.subplots()
    # Reorder from (C, H, W) to (H, W, C), the layout matplotlib expects
    ax.imshow(img.permute(1, 2, 0).numpy())
    ax.set_facecolor('white')  # Set the background color of the axes to white
    ax.axis('off')
    plt.show()

class_examples = {i: [] for i in range(num_classes)}

for img, label in comb_dataset:
    if len(class_examples[label.item()]) < 10:
        class_examples[label.item()].append(img)
    if all(len(class_examples[i]) == 10 for i in class_examples):
        break

In [None]:
for i in class_examples.keys():
    # print(f'Class: {i}')
    imshow(torchvision.utils.make_grid(class_examples[i],))

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F

nclasses = 43 # GTSRB has 43 classes

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(500, 50)
        self.fc2 = nn.Linear(50, nclasses)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 500)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x,dim=1)


In [17]:
import torch
import torch.nn as nn
import torch.nn.functional as F

nclasses = 43 # GTSRB has 43 classes

# class Net(nn.Module):
#     def __init__(self):
#         super(Net, self).__init__()
#         self.conv1 = nn.Conv2d(3, 32, kernel_size=5, padding = 2)
#         self.conv1_skip = nn.Conv2d(3, 32, kernel_size = 1)
#         self.bn1 = nn.BatchNorm2d(32)
#         self.conv2 = nn.Conv2d(32, 64, kernel_size=5, padding = 2)
#         self.conv2_skip = nn.Conv2d(32, 64, kernel_size = 1)
#         self.bn2 = nn.BatchNorm2d(64)
#         self.conv2_drop = nn.Dropout2d(0.2)
#         # self.conv3 = nn.Conv2d(128, 256, kernel_size=3)
#         # self.bn3 = nn.BatchNorm2d(256)

#         self.fc1 = nn.Linear(64*64, 80)
#         self.fc2 = nn.Linear(80, nclasses)


#     def forward(self, x):
#         out = F.gelu(self.bn1(self.conv1(x)))
#         out = F.max_pool2d(F.gelu(out.detach() + self.conv1_skip(x)), 2)
#         x = F.gelu(out)
                
#         out = F.gelu(self.bn2(self.conv2(x)))
#         out = F.max_pool2d(F.gelu(out.detach() + self.conv2_skip(x)), 2)
#         x = F.gelu(out)
#         x = x.view(-1, 64*64)
        
#         x = F.gelu(self.fc1(x))
#         x = F.dropout(x, training=self.training)
#         x = self.fc2(x)
#         return F.log_softmax(x, dim=1)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv2_drop = nn.Dropout2d(0.2)
        
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3)
        self.bn3 = nn.BatchNorm2d(128)
        self.conv3_drop = nn.Dropout2d(0.2)
        
        self.fc1 = nn.Linear(128*4, 200)
        self.fc2 = nn.Linear(200, nclasses)

    def forward(self, x):
        x = F.gelu(F.max_pool2d(F.gelu(self.bn1(self.conv1(x))), 2))
        x = F.gelu(F.max_pool2d(self.conv2_drop(F.gelu(self.bn2(self.conv2(x)))), 2))
        x = F.gelu(F.max_pool2d(self.conv3_drop(F.gelu(self.bn3(self.conv3(x)))), 2))
        
        x = x.view(-1, 128*4)
        x = F.gelu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x,dim=1)
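
The input size of the first fully connected layer in each `Net` above (500 and `128*4`) follows from the convolution and pooling arithmetic. The sketch below reproduces that calculation. It assumes 32x32 input images (the size used in `transforms.Resize((32, 32))`), no padding, stride 1 convolutions, and 2x2 max pooling after every convolution; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (floor division, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling (stride equals window)."""
    return size // window


def flattened_features(input_size, kernels, out_channels):
    """Feature count after a stack of conv -> 2x2 max-pool stages."""
    size = input_size
    for k in kernels:
        size = pool_out(conv_out(size, k))
    return out_channels * size * size


print(flattened_features(32, [5, 5], 20))      # baseline Net: 500
print(flattened_features(32, [3, 4, 3], 128))  # three-layer Net: 512 == 128*4
```

If you change a kernel size, add padding, or feed a different image size, rerun the arithmetic and update `fc1` and the `x.view(...)` call to match.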

# Training

In [None]:
model = Net()

optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode = 'min', patience=3)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(comb_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(comb_loader.dataset),
                100. * batch_idx / len(comb_loader), loss.item()))

def validation():
    model.eval()
    validation_loss = 0
    correct = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for data, target in val_loader:
            output = model(data)
            validation_loss += F.nll_loss(output, target, reduction="sum").item() # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    validation_loss /= len(val_loader.dataset)
    print('\nValidation set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        validation_loss, correct, len(val_loader.dataset),
        100. * correct / len(val_loader.dataset)))
    acc = correct / len(val_loader.dataset)
    return validation_loss, acc


for epoch in range(1, epochs + 1):
    train(epoch)
    validation_loss, val_acc = validation()
    lr_scheduler.step(validation_loss)
    model_file = 'model_' + str(epoch) + '.pth'
    torch.save(model.state_dict(), model_file)
    print('\nSaved model to ' + model_file + '.')


Validation set: Average loss: 0.4357, Accuracy: 3286/3870 (85%)


Saved model to model_1.pth.

Validation set: Average loss: 0.3453, Accuracy: 3520/3870 (91%)


Saved model to model_2.pth.

Validation set: Average loss: 0.4767, Accuracy: 3443/3870 (89%)


Saved model to model_3.pth.

Validation set: Average loss: 0.2221, Accuracy: 3671/3870 (95%)


Saved model to model_4.pth.
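
The log above shows that validation loss does not fall monotonically (epoch 3 is worse than epoch 2), so the checkpoint to evaluate is not necessarily the last one saved. A minimal sketch of picking the checkpoint with the lowest validation loss follows; `best_epoch` is a hypothetical helper, and the loss values are copied from the log above.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("no validation losses recorded")
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Average validation losses copied from the log above (epochs 1 to 4).
logged_losses = [0.4357, 0.3453, 0.4767, 0.2221]
epoch = best_epoch(logged_losses)
print(f"model_{epoch}.pth")  # model_4.pth
```

The selection uses only validation results, which is consistent with the rule against using test labels. The chosen file can then be restored with `model.load_state_dict(torch.load(...))` before running the evaluation cell.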


# Evaluate and Submit to Kaggle



In [57]:
import pickle
import pandas as pd

outfile = 'gtsrb_kaggle.csv'

dataframe_dict = {"Filename" : [], "ClassId": []}

test_data = torch.load('testing/test.pt')
# Use a context manager so the pickle file is closed after loading
with open('testing/file_ids.pkl', 'rb') as f:
    file_ids = pickle.load(f)
model.eval() # Don't forget to put your model on eval mode !

with torch.no_grad():  # inference only, so skip gradient tracking
    for i, data in enumerate(test_data):
        data = data.unsqueeze(0)  # add a batch dimension

        output = model(data)
        pred = output.argmax(dim=1).item()  # index of the max log-probability
        file_id = file_ids[i][0:5]
        dataframe_dict['Filename'].append(file_id)
        dataframe_dict['ClassId'].append(pred)

df = pd.DataFrame(data=dataframe_dict)
df.to_csv(outfile, index=False)
print("Written to csv file {}".format(outfile))

Written to csv file gtsrb_kaggle.csv
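
With only a limited number of submissions per day, a quick local check of the CSV before uploading can save an attempt. The sketch below uses only the standard library. The function name `check_submission` is hypothetical, the header names are the ones written by the cell above, and the valid label range of 0 to 42 is an assumption based on the 43 classes and zero-based predictions.

```python
import csv


def check_submission(path, num_classes=43):
    """Return the number of data rows if the file looks well-formed, else raise."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["Filename", "ClassId"]:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        rows = 0
        for row in reader:
            class_id = int(row["ClassId"])
            if not 0 <= class_id < num_classes:
                raise ValueError(f"ClassId {class_id} out of range in row {rows + 1}")
            rows += 1
    return rows
```

`check_submission('gtsrb_kaggle.csv')` should return the number of test images; comparing it with `len(test_data)` catches a truncated file.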


# Submitting to Kaggle

Now download the CSV file `gtsrb_kaggle.csv` from your Google Drive and submit it to Kaggle to check the performance of your model.

**Extra important:** Please use your NYU NetID as your team name on Kaggle, or your submissions will not be evaluated.  
You can rename your team easily from the Team tab: https://www.kaggle.com/competitions/nyu-computer-vision-csci-ga2271-2022/team.