In [1]:
%load_ext autoreload
%autoreload 2

# Assignment 2: Designing an Architecture for Recognizing UVA Historical Landmarks
![UVA Grounds](https://sworld.co.uk/img/img/131/photoAlbum/5284/originals/0.jpg)

The UVA Grounds are known for their Jeffersonian architecture, which has drawn praise throughout the University of Virginia's history, and for their place in U.S. history as a model for college and university campuses throughout the country.

In this assignment, you will attempt to build an image recognition system that classifies different buildings and landmarks on Grounds. You will earn 100 points for this assignment if you successfully transfer at least 3 existing architectures, plus 10 bonus points if your classifier exceeds 94% accuracy.

To make it easier for you, some code has been provided to help you process the data; you may modify it to fit your needs. You must submit the .ipynb file via UVA Canvas using the following file name format: yourcomputingID_assignment_2.ipynb

Best of luck, and have fun!

## TA's notes
Requirements:
- You must be able to implement a complete training/validation loop with a final test. The basic logic is similar to the FashionMNIST in previous tutorials.
- You must use 3 models. You can create and train all of them from scratch, or you can use transfer learning (for example, torchvision.models.resnet18) for one, two, or all of them. CNN-specific architectures (e.g., LeNet, AlexNet, ResNet variants, GoogLeNet, VGG) are likely to be helpful. The most widely used CNN architectures include VGG-16 and ResNet-FPN-X101, but they may be too big for Google Colab GPUs, so try smaller variants if that's the case. You could also try a state-of-the-art Vision Transformer (ViT), but it is technically not a CNN, so your other two models must both be CNNs. ViTs might also be too big for the Google Colab GPUs. Most of the above-mentioned architectures have pretrained versions in [torchvision](https://pytorch.org/vision/stable/models.html).
- You can use the Sequential model creation instead of the class API, but that can make adding/changing various modules more difficult.
- Please split off your own validation set, as we did in previous tutorials.
- Please do NOT change the random `SEED` or the test dataset/dataloader so I can verify your performance.
- Please also plot your training and validation loss curves. This way, you and I can both understand the learning process and identify any problems.
- Please make your 3 models sufficiently different. For example, it's not enough to add one layer and call that a different model.
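
The requirements above (a training/validation loop and loss curves) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the assignment solution: `run_epoch`, `fit`, and the toy linear model are hypothetical names introduced here, and the real loaders and models would take their place.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, loss_func, optimizer=None):
    """Run one pass over `loader`; train if an optimizer is given, otherwise evaluate."""
    training = optimizer is not None
    model.train(training)
    total_loss, total_seen = 0.0, 0
    with torch.set_grad_enabled(training):
        for inputs, labels in loader:
            outputs = model(inputs)
            loss = loss_func(outputs, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Weight each batch by its size so the result is a per-sample average
            total_loss += loss.item() * inputs.size(0)
            total_seen += inputs.size(0)
    return total_loss / total_seen


def fit(model, train_loader, val_loader, num_epochs, lr=0.01):
    """Train for `num_epochs` and return the per-epoch training and validation losses."""
    loss_func = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = {"train_loss": [], "val_loss": []}
    for _ in range(num_epochs):
        history["train_loss"].append(run_epoch(model, train_loader, loss_func, optimizer))
        history["val_loss"].append(run_epoch(model, val_loader, loss_func))
    return history


# Synthetic stand-in data (assumption for illustration): 60 samples, 8 features, 3 classes
torch.manual_seed(0)
features = torch.randn(60, 8)
targets = torch.randint(0, 3, (60,))
train_loader = DataLoader(TensorDataset(features[:40], targets[:40]), batch_size=10, shuffle=True)
val_loader = DataLoader(TensorDataset(features[40:], targets[40:]), batch_size=10)

history = fit(nn.Linear(8, 3), train_loader, val_loader, num_epochs=3)
print(history)
```

The two lists in `history` can then be passed to `plt.plot`, one line per split, to draw the loss curves.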

# Load Packages

In [2]:
import sys
import sklearn
import os
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from functools import partial
import PIL
import PIL.Image
import torch
import torchvision.models as models
import torch.nn as nn
from torch import optim

# Random Seed for Reproducibility

In [3]:
from torch import manual_seed as torch_manual_seed
import random
import numpy as np

from torch.cuda import max_memory_allocated, set_device, manual_seed_all
from torch.backends import cudnn

def setup_seed(seed):
    torch_manual_seed(seed)
    manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    cudnn.deterministic = True

SEED = 42
setup_seed(SEED)
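
One quick way to see why the seed matters is to split the same data twice with identically seeded generators and compare the results. The sketch below uses a plain list as a stand-in dataset (an assumption for illustration); `random_split` only needs the length of the data and index access.

```python
from torch import Generator
from torch.utils.data import random_split

SEED = 42  # same value as in the notebook
items = list(range(100))  # hypothetical stand-in for a dataset

# Two independent splits, each driven by a fresh generator with the same seed
first_train, first_test = random_split(items, [80, 20], generator=Generator().manual_seed(SEED))
second_train, second_test = random_split(items, [80, 20], generator=Generator().manual_seed(SEED))

same_split = first_test.indices == second_test.indices
print(f"Identical test indices across runs: {same_split}")
```

Changing `SEED` would produce a different test subset, which is why the seed has to stay fixed for results to be comparable.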

# Import Dataset
The full dataset is huge (over 37 GB), with more than 13K images across 18 classes, so it takes a while to download, extract, and process. To save you time and effort, a subset of the data has been resized and compressed to only 379 MB and stored on a Firebase server. This subset is the dataset your grade will be benchmarked on. If you are up for a challenge (and perhaps bonus points), contact the instructor for the full dataset!

In [4]:
# Download dataset from FirebaseStorage
#!curl -L "https://firebasestorage.googleapis.com/v0/b/uva-landmark-images.appspot.com/o/dataset.zip?alt=media&token=e1403951-30d6-42b8-ba4e-394af1a2ddb7" -o "dataset.zip"

In [5]:
# Extract content
#!unzip -q "dataset.zip"

In [6]:
from sklearn.datasets import load_files

data_dir = "dataset/"
batch_size = 32
# IMPORTANT: Depending on which pre-trained model you choose, you will need to change these dimensions accordingly
img_height = 150
img_width = 150


In [7]:
from torch.utils.data import DataLoader, random_split
from torch import Generator
from torchvision.transforms import ToTensor
from torchvision.datasets import ImageFolder


TEST_RATIO = 0.2
BATCH_SIZE = 10

# Load the full dataset from disk (it is split into train/val/test below)
dataset_all = ImageFolder(
    data_dir,
    transform=ToTensor(),
)

size_all = len(dataset_all)
print(f'Before splitting the full dataset into train and test: len(dataset_all)={size_all}')


size_test = int(size_all * TEST_RATIO)
size_train = size_all - size_test

dataset_train, dataset_test = random_split(dataset_all, [size_train, size_test], generator=Generator().manual_seed(SEED))
print(f'After splitting the full dataset into train and test: len(dataset_train)={len(dataset_train)}. len(dataset_test)={len(dataset_test)}')

VAL_RATIO = 0.2
size_val = int(size_train * VAL_RATIO)
size_train = size_train - size_val

dataset_train, dataset_val = random_split(dataset_train, [size_train, size_val], generator=Generator().manual_seed(SEED))
print(f'After splitting the full dataset into train and val: len(dataset_train)={len(dataset_train)}. len(dataset_val)={len(dataset_val)}')

# NOTE that you must not use the test dataset for model selection


Before splitting the full dataset into train and test: len(dataset_all)=14286
After splitting the full dataset into train and test: len(dataset_train)=11429. len(dataset_test)=2857
After splitting the full dataset into train and val: len(dataset_train)=9144. len(dataset_val)=2285
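
The sizes printed above follow from simple integer arithmetic. Note that the validation ratio is applied to what remains after the test split, so the overall proportions are roughly 64% train, 16% validation, and 20% test. The helper `split_sizes` below is a hypothetical name used only to reproduce that arithmetic.

```python
def split_sizes(total, test_ratio, val_ratio):
    """Return (train, val, test) sizes using the same integer arithmetic as the cell above."""
    size_test = int(total * test_ratio)
    size_train_val = total - size_test
    # The validation ratio applies to the remaining train+val portion, not to the total
    size_val = int(size_train_val * val_ratio)
    size_train = size_train_val - size_val
    return size_train, size_val, size_test


train_size, val_size, test_size = split_sizes(14286, test_ratio=0.2, val_ratio=0.2)
print(train_size, val_size, test_size)  # 9144 2285 2857
```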


In [8]:
loaders = {
    'train': DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True, num_workers=1),
    'test': DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=True, num_workers=1),
    'val': DataLoader(dataset_val, batch_size=BATCH_SIZE, shuffle=True, num_workers=1),
}

# It's your turn: Building a classifier for UVA Landmark Dataset
You may design your own architecture AND re-use any of the existing frameworks.

Best of luck!

In [9]:
from torch.nn import Module

num_classes = 18

In [10]:
# YOUR CODE STARTS HERE. Feel free to modify anything.

# Feel free to rename them!
class MyResNet(Module):
    def __init__(self):
        super().__init__()
        # Load ResNet-18 with its default pre-trained ImageNet weights
        self.resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

        # Freeze the pre-trained layers so that only the new head is trained
        # (skip this loop to fine-tune the whole network instead)
        for param in self.resnet18.parameters():
            param.requires_grad = False

        # Replace the final fully connected layer with one sized for this task;
        # the new layer is trainable by default
        num_ftrs = self.resnet18.fc.in_features
        self.resnet18.fc = nn.Linear(num_ftrs, num_classes)

    def forward(self, X):
        return self.resnet18(X)

In [11]:
class MyVGG(Module):
    def __init__(self):
        super().__init__()
        # Load VGG-11 with its default pre-trained ImageNet weights
        self.vgg11 = models.vgg11(weights=models.VGG11_Weights.DEFAULT)

        # Freeze the pre-trained layers so that only the new head is trained
        # (skip this loop to fine-tune the whole network instead)
        for param in self.vgg11.parameters():
            param.requires_grad = False

        # Replace the last classifier layer with one sized for this task;
        # the new layer is trainable by default
        num_ftrs = self.vgg11.classifier[-1].in_features
        self.vgg11.classifier[-1] = nn.Linear(num_ftrs, num_classes)

    def forward(self, X):
        return self.vgg11(X)

In [12]:
class MyCodedNet(Module):
    def __init__(self):
        super().__init__()
        # ImageFolder loads RGB images, so the first convolution takes 3 input channels
        self.conv1 = nn.Conv2d(3, 64, stride = (2, 2), kernel_size = 3, padding = 1)
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 64, kernel_size = 3, padding = 1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 32, kernel_size = 3, padding = 1)
        self.bn3 = nn.BatchNorm2d(32)
        self.conv4 = nn.Conv2d(32, 16, kernel_size = 3, padding = 1)
        self.bn4 = nn.BatchNorm2d(16)
        self.pool = nn.MaxPool2d(2, 2)
        # Pool to a fixed 6x9 grid so that fc1's input size holds for any input resolution
        self.adaptive_pool = nn.AdaptiveAvgPool2d((6, 9))
        self.fc1 = nn.Linear(16 * 6 * 9, 128)
        self.bn5 = nn.BatchNorm1d(128)
        self.fc2 = nn.Linear(128, 64)
        self.bn6 = nn.BatchNorm1d(64)
        self.fc3 = nn.Linear(64, num_classes)

    def forward(self, X):
        # Each stage consumes the previous stage's output (x), not the raw input (X)
        x = self.pool(torch.relu(self.bn1(self.conv1(X))))
        x = self.pool(torch.relu(self.bn2(self.conv2(x))))
        x = self.pool(torch.relu(self.bn3(self.conv3(x))))
        x = self.pool(torch.relu(self.bn4(self.conv4(x))))
        x = self.adaptive_pool(x)
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        x = torch.relu(self.bn5(self.fc1(x)))
        x = torch.relu(self.bn6(self.fc2(x)))
        return self.fc3(x)  # raw logits, as expected by nn.CrossEntropyLoss
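
The `in_features` of the first fully connected layer has to equal the flattened size of whatever the convolution and pooling stages produce, and that size depends on the input resolution. One way to find it is a dummy forward pass. The sketch below uses a stack with the same convolution and pooling shapes as `MyCodedNet` (batch norm is omitted because it does not change shapes) and assumes a 150x150 RGB input; `flattened_size` is a hypothetical helper name.

```python
import torch
import torch.nn as nn


def flattened_size(feature_extractor, input_shape):
    """Return the per-sample feature count produced for an input of shape (C, H, W)."""
    feature_extractor.eval()
    with torch.no_grad():
        dummy = torch.zeros(1, *input_shape)  # a single all-zero image
        return feature_extractor(dummy).flatten(1).shape[1]


# Same convolution/pooling shapes as MyCodedNet, with a 3-channel first layer
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
)

# Assumed 150x150 RGB input, matching img_height and img_width earlier in the notebook
size_150 = flattened_size(features, (3, 150, 150))
print(size_150)
```

For a 150x150 input the spatial size shrinks 150, 75, 37, 18, 9, 4 through the stack, giving 16 x 4 x 4 = 256 features per image.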


### Implementing CNN 1

In [13]:
my_res_net = MyResNet()

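# Optimize only the parameters left trainable (the new fc head); frozen ones receive no gradients
optimizer = optim.Adam([p for p in my_res_net.parameters() if p.requires_grad], lr = 0.01)
loss_func = nn.CrossEntropyLoss()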



In [14]:
num_epochs = 10
my_res_net.train()
for epoch in range(num_epochs):  # loop over the dataset multiple times

    for i, data in enumerate(loaders['train'], 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = my_res_net(inputs)
        loss = loss_func(outputs, labels)
        loss.backward()
        optimizer.step()

        if i % 60 == 0:    # print every 60 mini-batches
            print ('Epoch [{}/{}], Loss: {:.4f}' 
                       .format(epoch + 1, num_epochs, loss.item()))

print('Finished Training')
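
The loop above only reports the training loss. A possible companion for measuring accuracy on the validation or test loader is sketched below; `evaluate_accuracy` is a hypothetical helper, and the toy check uses an identity "model" on one-hot inputs rather than a real network.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader, device="cpu"):
    """Return the fraction of samples in `loader` that `model` classifies correctly."""
    model.to(device)
    model.eval()  # use running batch-norm statistics and disable dropout
    correct, seen = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            seen += labels.size(0)
    return correct / seen


# Toy check: an identity "model" on one-hot inputs predicts every label correctly
labels = torch.tensor([0, 1, 2, 1, 0, 2])
one_hot_inputs = torch.eye(3)[labels]
toy_loader = DataLoader(TensorDataset(one_hot_inputs, labels), batch_size=4)
toy_accuracy = evaluate_accuracy(nn.Identity(), toy_loader)
print(toy_accuracy)
```

In the notebook this could be called as `evaluate_accuracy(my_res_net, loaders['val'])` after each epoch for model selection, and once on `loaders['test']` at the very end. Remember to call `my_res_net.train()` again before resuming training.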
