# Deep Learning for Computer Vision: HW 5

## Computer Science: COMS W 4995 006

## Due: November 10, 2022

### Problem: Telling Cats from Dogs using ResNet18

1. Here you will build a classifier that can distinguish between pictures of dogs and cats. You will use a ConvNet (ResNet18) that was pre-trained on ImageNet. Your task will be to re-architect the network to solve your problem. You are required to do this in PyTorch or TensorFlow. To do this you will:

    a) Make training, validation, and test sets for your dataset by using images from the link below, with 10,000 images of cats and 10,000 images of dogs. Use 8,000 images of each category for training, 1,000 of each category for validation, and 1,000 images of each category for testing. You are to randomly shuffle the data and choose the splits yourself.

    b) Take the ResNet18 network architecture. See https://pytorch.org/vision/stable/models.html.
    
    c) Load in the pre-trained weights. See again https://pytorch.org/vision/stable/models.html. 
    
    d) Add a fully connected layer followed by a final sigmoid layer **to replace** the last fully connected layer and the 1000-category softmax layer that were used when the network was trained on ImageNet.
    
    e) Freeze all layers except the last two that you added.
    
    f) Fine-tune the network on your cats vs. dogs image data.
    
    g) Evaluate the accuracy on the test set.
    
    h) Unfreeze all layers.
    
    i) Continue fine-tuning the network on your cats vs. dogs image data.
    
    j) Evaluate the accuracy on the test set.
    
    k) Use your validation set throughout to decide on when to stop training the network at various stages.
    
    l) Comment your code and make sure to include accuracy, a few sample mistakes, and anything else you would like to add.
    
    m) Experiment with what you keep and what you replace as part of your network surgery. Does the training work better if you do not remove the last fully connected layer?
    
    n) Try this using any other CNN network you like. Take whatever path you like to get to your final model. Evaluate its accuracy. Do not ask which network you should use.


2. (Extra Credit): Repeat the assignment but replace ResNet18 with a Vision Transformer (ViT). See https://huggingface.co/docs/transformers/model_doc/vit and https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer. 


3. (Extra Credit): Repeat the assignment but replace ResNet18 with BEiT. See https://huggingface.co/docs/transformers/model_doc/beit.

Downloads: You can get your image data from:
https://www.kaggle.com/c/dogs-vs-cats/data. 
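
The code cells below assume the images are already arranged in `train/`, `valid/`, and `test/` folders with one subfolder per class, which is the layout `datasets.ImageFolder` reads. The random split from step (a) is not shown in the notebook, so the following is only a minimal sketch of how it could be done. It assumes the Kaggle archive has been extracted into a flat folder of files named like `cat.0.jpg` and `dog.0.jpg`; the function name `split_dataset`, its parameters, and the output folder names are illustrative and not part of the original work.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, dst_dir, counts=(8000, 1000, 1000), seed=0):
    """Randomly split flat 'cat.N.jpg' / 'dog.N.jpg' files into ImageFolder-style folders.

    counts gives the number of images per class for train, valid, and test.
    Returns a dict mapping (split, label) to the number of files copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    sizes = {}
    for label in ("cat", "dog"):
        files = sorted(src.glob(f"{label}.*.jpg"))
        if len(files) < sum(counts):
            raise ValueError(f"need {sum(counts)} '{label}' images, found {len(files)}")
        rng.shuffle(files)  # random order, so each split is a random sample
        start = 0
        for split, n in zip(("train", "valid", "test"), counts):
            out = dst / split / label
            out.mkdir(parents=True, exist_ok=True)
            for f in files[start:start + n]:
                shutil.copy2(f, out / f.name)
            sizes[(split, label)] = n
            start += n  # consecutive slices never overlap
    return sizes
```

Because each split takes a consecutive slice of the shuffled list, no image can land in more than one partition.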




In [1]:
############################################## IMPORTS ##############################################
import os
import copy
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

import torchvision
from torchvision import datasets, transforms


In [2]:
######################################### DATA ACQUISITION & TORCH CONFIG ##############################################
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.25, 0.25, 0.25])

dataPreprocessingTransformations = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
    'valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
}

data_dir = '/Users/blake/Documents/COMS 4995 - DL CV/HW5/dogs-vs-cats/'
imgDatasets = {i: datasets.ImageFolder(os.path.join(data_dir, i),dataPreprocessingTransformations[i])
                  for i in ['train','valid','test']}

dataloaders = {i: torch.utils.data.DataLoader(imgDatasets[i], batch_size=64,shuffle=True, num_workers=0) for i in ['train','valid','test']}

partitionLens = {i: len(imgDatasets[i]) for i in ['train', 'valid','test']} # data is already split up into correct 8000-1000-1000 partitions
labelStrings = imgDatasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


In [3]:
############################################## FUNCTION DEFINITION ##############################################

def train(model, loss, optimizer, scheduler, numEpochs):
    savedModels = copy.deepcopy(model.state_dict())
    bestAccuracy = 0

    for epoch in range(numEpochs):
        print('Epoch {}/{}'.format(epoch+1, numEpochs))
        print('-' * 10)

        # Each epoch has a training and validation phase

        for phase in ['train', 'valid']:

            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            numCorrect = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:

                # Reduce overhead by sending data to GPU
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Forward Prop
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    curLoss = loss(outputs, labels)

                    # Back Prop (if needed/training)
                    if phase == 'train':
                        optimizer.zero_grad()
                        curLoss.backward()
                        optimizer.step()

                # Keep Track of Accuracy
                numCorrect += torch.sum(preds == labels.data)

            # Step the decay scheduler forward if we're training for the epoch
            if phase == 'train':
                scheduler.step()

            # Obtain the net accuracy at this epoch
            netAccuracy = float(numCorrect) / partitionLens[phase]

            print('{} accuracy: {:.2f}%'.format(phase, 100*netAccuracy))

            # Deep copy model if it has the best accuracy
            if phase == 'valid' and netAccuracy > bestAccuracy:
                savedModels = copy.deepcopy(model.state_dict())
                bestAccuracy = netAccuracy


    print('Best validation accuracy: {:.2f}%'.format(100*bestAccuracy))

    # Return best model
    model.load_state_dict(savedModels)
    return model
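
Step (k) asks for the validation set to decide when to stop training. The `train` function above keeps the weights with the best validation accuracy but always runs for the full `numEpochs`. One possible extension is a patience counter that ends training once validation accuracy stops improving. The sketch below is an assumption about how that could look, not part of the original solution; `EarlyStopper`, `patience`, and `min_delta` are hypothetical names, and the accuracies in the example are made up.

```python
class EarlyStopper:
    """Signals a stop when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, valid_accuracy):
        if valid_accuracy > self.best + self.min_delta:
            self.best = valid_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Example with made-up accuracies (not results from this notebook)
stopper = EarlyStopper(patience=2)
history = [0.90, 0.95, 0.94, 0.95, 0.96]
stopped_at = None
for epoch, acc in enumerate(history, start=1):
    if stopper.should_stop(acc):
        stopped_at = epoch  # two epochs in a row without improvement
        break
```

Inside `train`, the call would go right after the validation accuracy is computed, with a `break` out of the epoch loop when it returns `True`.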

In [4]:
def test(model):
    model.eval()   # Evaluation mode

    numCorrect = 0

    # Disable gradient tracking only while evaluating (the context manager restores the previous state afterwards)
    with torch.no_grad():

        # Iterate over data.
        for inputs, labels in dataloaders["test"]:

            # Move the batch to the selected device (GPU if available)
            inputs = inputs.to(device)
            labels = labels.to(device)

            # Forward prop
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            # Keep track of number correct
            numCorrect += torch.sum(preds == labels)

    netAccuracy = float(numCorrect) / partitionLens["test"]

    print('Testing Accuracy: {:.2f}%'.format(100*netAccuracy))
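
Step (l) asks for a few sample mistakes, and `test` only reports an aggregate accuracy. The following is a minimal sketch of how misclassified samples could be collected. The helper `collect_mistakes` and the toy `AlwaysZero` model are hypothetical and exist only for illustration. The sketch assumes the loader does not shuffle, so that positions map back to dataset indices.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def collect_mistakes(model, loader, device, max_items=8):
    """Return up to max_items (dataset_index, predicted, actual) triples for misclassified samples."""
    model.eval()
    mistakes = []
    seen = 0  # number of samples processed so far
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = model(inputs).argmax(dim=1)
            wrong = (preds != labels).nonzero(as_tuple=True)[0]
            for i in wrong.tolist():
                if len(mistakes) < max_items:
                    mistakes.append((seen + i, preds[i].item(), labels[i].item()))
            seen += labels.size(0)
            if len(mistakes) >= max_items:
                break
    return mistakes


class AlwaysZero(nn.Module):
    """Toy model that predicts class 0 for every input."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 2, device=x.device)
        logits[:, 0] = 1.0
        return logits


toy_data = TensorDataset(torch.zeros(6, 3), torch.tensor([0, 1, 0, 1, 1, 0]))
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=False)
mistakes = collect_mistakes(AlwaysZero(), toy_loader, torch.device("cpu"))
print(mistakes)  # [(1, 0, 1), (3, 0, 1), (4, 0, 1)]
```

With an `ImageFolder` dataset, each returned index can be mapped to a file path through the dataset's `samples` list. The notebook's test loader is built with `shuffle=True`, so a separate non-shuffled loader would be needed for the indices to line up.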


In [5]:
########################################## DOMAIN TRANSFERRED NETWORK ##############################################
## Train the last layer only

# Obtain model
newModel = torchvision.models.resnet18(pretrained=True)

# Freeze all layers
for param in newModel.parameters():
    param.requires_grad = False

# Replace last layer with new, unfrozen layer
oldFCInputCount = newModel.fc.in_features
totalClassCount = 2
newModel.fc = nn.Linear(oldFCInputCount, totalClassCount)


# Send to GPU to reduce overhead
newModel = newModel.to(device)

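# Define the loss: cross-entropy over the two class logits (softmax is applied inside the loss, so no explicit sigmoid layer is used)
loss = nn.CrossEntropyLoss()

# Define optimizer for back propagation (only the new fc layer is trained at this stage)
optimizer_conv = optim.SGD(newModel.fc.parameters(), lr=1e-3, momentum=0.8)

# Use a step-decay scheduler: the learning rate is multiplied by gamma every step_size epochs
exponentialDecayScheduler = lr_scheduler.StepLR(optimizer_conv, step_size=8, gamma=1e-1)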

# Train last layer
newModel = train(newModel, loss, optimizer_conv, exponentialDecayScheduler, numEpochs=2)

#########################################################################################################
## Train all layers

# Unfreeze all layers
for param in newModel.parameters():
    param.requires_grad = True

# Send to GPU to reduce overhead
newModel = newModel.to(device)

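# Define the loss: cross-entropy over the two class logits (softmax is applied inside the loss, so no explicit sigmoid layer is used)
loss = nn.CrossEntropyLoss()

# Define optimizer for back propagation over all parameters, so the unfrozen layers are actually updated
optimizer_conv = optim.SGD(newModel.parameters(), lr=1e-5, momentum=0.8)

# Use a step-decay scheduler: the learning rate is multiplied by gamma every step_size epochs
exponentialDecayScheduler = lr_scheduler.StepLR(optimizer_conv, step_size=8, gamma=1e-2)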

# Train all layers
newModel = train(newModel, loss, optimizer_conv,exponentialDecayScheduler, numEpochs=2)

#########################################################################################################
# Test model on test data
test(newModel)


# The model is fairly accurate; however, more training epochs would probably increase the accuracy further.
# The sample mistakes are not inherently ambiguous images and could likely be resolved by training longer.
# I am not increasing the number of epochs much because of my own computational limits.
# I believe the network isn't fine-tuned enough on the dog/cat data and is still close to the pretrained ResNet18,
# which leaves more room for error due to the top-5 class limit.



Epoch 1/2
----------
train accuracy: 87.04%
valid accuracy: 96.70%
Epoch 2/2
----------
train accuracy: 92.88%
valid accuracy: 97.10%
Best validation accurracy: 97.102897%
Epoch 1/2
----------
train accuracy: 93.44%
valid accuracy: 97.10%
Epoch 2/2
----------
train accuracy: 93.01%
valid accuracy: 97.00%
Best validation accurracy: 97.102897%
Testing Accuracy: 97.42%
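
Step (d) describes a fully connected layer followed by a sigmoid, whereas the cell above uses a two-output layer with `nn.CrossEntropyLoss`. Both are reasonable for a two-class problem. A minimal sketch of the sigmoid variant follows, as an illustration rather than the method used for the results above. It uses `nn.BCEWithLogitsLoss`, which applies the sigmoid inside the loss, and random tensors stand in for the 512-dimensional pooled ResNet18 features. The names `head`, `features`, and `targets`, and the label order, are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

in_features = 512                   # ResNet18's fc.in_features
head = nn.Linear(in_features, 1)    # one logit per image instead of two
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one step

features = torch.randn(4, in_features)    # stand-in for pooled backbone features
targets = torch.tensor([0., 1., 1., 0.])  # float labels: 0 = cat, 1 = dog (assumed order)

logits = head(features).squeeze(1)  # shape (4,)
loss = criterion(logits, targets)

# The sigmoid is applied explicitly only when probabilities or predictions are needed
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).long()
```

To use this head in `train`, the labels would need to be converted with `.float()`, and the predictions would come from the 0.5 threshold rather than from `torch.max`.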


In [7]:
########################## DOMAIN TRANSFERRED NETWORK + AVERAGE POOLING REPLACEMENT ##############################
## Train the last layer only

# Obtain model
newModel = torchvision.models.resnet18(pretrained=True)

# Freeze all layers
for param in newModel.parameters():
    param.requires_grad = False

# Extra replacement of the average pooling layer: pool to a 5x5 map instead of 1x1 (less averaging, more spatial detail kept)
newModel.avgpool = nn.AdaptiveAvgPool2d((5,5))

# Replace last layer with new, unfrozen layer
oldFCInputCount = newModel.fc.in_features * 25 # multiplied by 25 because each channel is now pooled to a 5x5 map
totalClassCount = 2
newModel.fc = nn.Linear(oldFCInputCount, totalClassCount)


# Send to GPU to reduce overhead
newModel = newModel.to(device)

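# Define the loss: cross-entropy over the two class logits (softmax is applied inside the loss, so no explicit sigmoid layer is used)
loss = nn.CrossEntropyLoss()

# Define optimizer for back propagation (only the new fc layer is trained at this stage)
optimizer_conv = optim.SGD(newModel.fc.parameters(), lr=1e-3, momentum=0.8)

# Use a step-decay scheduler: the learning rate is multiplied by gamma every step_size epochs
exponentialDecayScheduler = lr_scheduler.StepLR(optimizer_conv, step_size=8, gamma=1e-1)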

# Train last layer
newModel = train(newModel, loss, optimizer_conv, exponentialDecayScheduler, numEpochs=1)

Epoch 1/1
----------
train accuracy: 90.25%
valid accuracy: 89.21%
Best validation accurracy: 89.210789%
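
The factor of 25 in the cell above follows from the pooled feature-map size. For a 224x224 input, ResNet18 produces a 512-channel 7x7 map before pooling. The default adaptive average pool reduces this to 1x1 (512 features), while a 5x5 pool keeps 25 values per channel. The following sketch checks the arithmetic on a random tensor that stands in for the backbone output; the variable names are illustrative.

```python
import torch
import torch.nn as nn

feature_map = torch.randn(1, 512, 7, 7)  # stand-in for ResNet18's last conv output at 224x224 input

default_pool = nn.AdaptiveAvgPool2d((1, 1))  # ResNet18's original pooling
larger_pool = nn.AdaptiveAvgPool2d((5, 5))   # the replacement used above

default_flat = torch.flatten(default_pool(feature_map), 1)
larger_flat = torch.flatten(larger_pool(feature_map), 1)

print(default_flat.shape)  # torch.Size([1, 512])
print(larger_flat.shape)   # torch.Size([1, 12800]), i.e. 512 * 25
```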


In [8]:
#########################################################################################################
## Train all layers

# Unfreeze all layers
for param in newModel.parameters():
    param.requires_grad = True

# Send to GPU to reduce overhead
newModel = newModel.to(device)

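# Define the loss: cross-entropy over the two class logits (softmax is applied inside the loss, so no explicit sigmoid layer is used)
loss = nn.CrossEntropyLoss()

# Define optimizer for back propagation over all parameters, so the unfrozen layers are actually updated
optimizer_conv = optim.SGD(newModel.parameters(), lr=1e-5, momentum=0.8)

# Use a step-decay scheduler: the learning rate is multiplied by gamma every step_size epochs
exponentialDecayScheduler = lr_scheduler.StepLR(optimizer_conv, step_size=8, gamma=1e-2)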

# Train all layers
newModel = train(newModel, loss, optimizer_conv,exponentialDecayScheduler, numEpochs=1)

Epoch 1/1
----------
train accuracy: 91.19%
valid accuracy: 97.60%
Best validation accurracy: 97.602398%
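
Setting `requires_grad = True` makes parameters eligible for updates, but an optimizer only steps the parameters it was constructed with. A quick check like the sketch below can confirm that the optimizer covers everything intended when moving from head-only training to full fine-tuning. The helpers `count_params` and `optimized_param_count` are hypothetical names, and the small `nn.Sequential` model is a toy stand-in for a backbone plus head.

```python
import torch.nn as nn
import torch.optim as optim


def count_params(params):
    """Total number of scalar values in an iterable of parameter tensors."""
    return sum(p.numel() for p in params)


def optimized_param_count(optimizer):
    """Number of scalar values the optimizer will actually update."""
    return sum(p.numel() for group in optimizer.param_groups for p in group["params"])


# Toy stand-in for a backbone plus a classification head
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
head = model[2]

head_only = optim.SGD(head.parameters(), lr=1e-3, momentum=0.8)
whole_net = optim.SGD(model.parameters(), lr=1e-5, momentum=0.8)

trainable = count_params(p for p in model.parameters() if p.requires_grad)

print(trainable)                         # 46  (8*4 + 4 + 4*2 + 2)
print(optimized_param_count(head_only))  # 10  (4*2 weights + 2 biases)
print(optimized_param_count(whole_net))  # 46
```

If the trainable count and the optimized count differ, some unfrozen layers are receiving gradients but never being updated.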


In [9]:
#########################################################################################################
# Test model on test data
test(newModel)

Testing Accuracy: 97.39%


In [15]:
########################## DOMAIN TRANSFERRED NETWORK OF RESNET34 ##############################
## Train the last layer only

# Obtain model
newModel = torchvision.models.resnet34(pretrained=True)

# Freeze all layers
for param in newModel.parameters():
    param.requires_grad = False

# Replace last layer with new, unfrozen layer
oldFCInputCount = newModel.fc.in_features
totalClassCount = 2
newModel.fc = nn.Linear(oldFCInputCount, totalClassCount)


# Send to GPU to reduce overhead
newModel = newModel.to(device)

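# Define the loss: cross-entropy over the two class logits (softmax is applied inside the loss, so no explicit sigmoid layer is used)
loss = nn.CrossEntropyLoss()

# Define optimizer for back propagation (only the new fc layer is trained at this stage)
optimizer_conv = optim.SGD(newModel.fc.parameters(), lr=1e-3, momentum=0.8)

# Use a step-decay scheduler: the learning rate is multiplied by gamma every step_size epochs
exponentialDecayScheduler = lr_scheduler.StepLR(optimizer_conv, step_size=8, gamma=1e-1)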

# Train last layer
newModel = train(newModel, loss, optimizer_conv, exponentialDecayScheduler, numEpochs=1)


Epoch 1/1
----------
train accuracy: 87.49%
valid accuracy: 97.90%
Best validation accurracy: 97.902098%


In [16]:
#########################################################################################################
## Train all layers

# Unfreeze all layers
for param in newModel.parameters():
    param.requires_grad = True

# Send to GPU to reduce overhead
newModel = newModel.to(device)

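# Define the loss: cross-entropy over the two class logits (softmax is applied inside the loss, so no explicit sigmoid layer is used)
loss = nn.CrossEntropyLoss()

# Define optimizer for back propagation over all parameters, so the unfrozen layers are actually updated
optimizer_conv = optim.SGD(newModel.parameters(), lr=1e-5, momentum=0.8)

# Use a step-decay scheduler: the learning rate is multiplied by gamma every step_size epochs
exponentialDecayScheduler = lr_scheduler.StepLR(optimizer_conv, step_size=8, gamma=1e-2)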

# Train all layers
newModel = train(newModel, loss, optimizer_conv,exponentialDecayScheduler, numEpochs=1)

Epoch 1/1
----------
train accuracy: 92.92%
valid accuracy: 97.70%
Best validation accurracy: 97.702298%


In [17]:
#########################################################################################################
# Test model on test data
test(newModel)


Testing Accuracy: 97.44%


In [18]:
# Note: due to computational limitations, I only used one epoch for the last two parts and two epochs for the first
# part. This entire segment of code took two hours for my computer to run so I didn't want to push its limits too far.
# I can guarantee the code is fully capable of taking in more epochs and runs efficiently with good hardware!