# Computer Vision Homework 3: Big vs Small Models

## Brief

Due date: Nov 13, 2023

Required files: `homework-3.ipynb`, `report.pdf`

To download the Jupyter notebook from Colab, refer to the Colab tutorial we provided.


## Code for Problem 1 and Problem 2

### Import Packages

In [32]:
import glob
import os
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.optim as optim

from PIL import Image
from torch.utils.data import DataLoader, Dataset, RandomSampler
from torchvision import transforms, models, datasets
from tqdm import tqdm

%matplotlib inline

### Check GPU Environment

In [33]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using {device} device')

Using cuda device


In [34]:
! nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-5c20ef1f-44cc-ffcf-fc6f-5a67e132a31f)


### Set the Seed to Reproduce the Result

In [35]:
def set_all_seed(seed):
    # Seed every random number generator used in this notebook.
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)


set_all_seed(123)

### Create Dataset and Dataloader

In [36]:
batch_size = 256

mean=(0.485, 0.456, 0.406)
std=(0.229, 0.224, 0.225)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

train_dataset = datasets.CIFAR10(root='data', train=True, download=True, transform=train_transform)
valid_dataset = datasets.CIFAR10(root='data', train=False, download=True, transform=test_transform)

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, pin_memory=True)
valid_dataloader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False, pin_memory=True)

sixteenth_train_sampler = RandomSampler(train_dataset, num_samples=len(train_dataset)//16)
half_train_sampler = RandomSampler(train_dataset, num_samples=len(train_dataset)//2)

sixteenth_train_dataloader = DataLoader(train_dataset, batch_size=batch_size, sampler=sixteenth_train_sampler)
half_train_dataloader = DataLoader(train_dataset, batch_size=batch_size, sampler=half_train_sampler)

Files already downloaded and verified
Files already downloaded and verified
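As a sanity check on the loaders above, the number of batches per epoch follows from the dataset size and `batch_size`. The sketch below assumes the standard CIFAR-10 split (50,000 training and 10,000 test images) and the `DataLoader` default `drop_last=False`; the helper name `batches_per_epoch` is made up for illustration.

```python
import math

BATCH_SIZE = 256
TRAIN_SIZE = 50_000  # standard CIFAR-10 training split (assumed)
TEST_SIZE = 10_000   # standard CIFAR-10 test split (assumed)


def batches_per_epoch(num_samples, batch_size=BATCH_SIZE):
    """Batches yielded per epoch with drop_last=False (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# Subset sizes requested from the two RandomSampler objects.
sixteenth_size = TRAIN_SIZE // 16
half_size = TRAIN_SIZE // 2

for name, n in [("sixteenth", sixteenth_size), ("half", half_size),
                ("full train", TRAIN_SIZE), ("test", TEST_SIZE)]:
    print(f"{name:>10}: {n:6d} samples, {batches_per_epoch(n):3d} batches")
```

The full-train and test counts (196 and 40) match the totals shown by the `tqdm` progress bars in the training output. Note that `RandomSampler` draws a fresh random subset on every pass over the loader, so over many epochs the "sixteenth" loader still touches most of the training set; if a fixed subset is wanted, `torch.utils.data.Subset` with a fixed index list is one option.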


### Load Models

In [37]:
# HINT: Remember to change the model to 'resnet50' and the weights to weights="IMAGENET1K_V1" when needed.
model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet201', weights=None)

# Background: The original torchvision models are designed for the ImageNet dataset and predict 1000 classes.
# TODO: Change the output of the model to 10 classes.
# NOTE: DenseNet exposes its final layer as `classifier`; ResNet exposes it as `fc`.
# Assigning to `model.fc` on a DenseNet would leave the 1000-class head in place.
model.classifier = nn.Linear(in_features=1920, out_features=10, bias=True)
model = model.to(device)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0


### Training and Testing Models

In [38]:
# TODO: Fill in the code cell according to the pytorch tutorial we gave.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
def train(dataloader, model, loss_fn, optimizer):
    num_batches = len(dataloader)
    epoch_loss = 0
    correct = 0
    seen = 0  # samples actually drawn this epoch

    model.train()

    for X, y in tqdm(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()
        pred = pred.argmax(dim=1, keepdim=True)
        correct += pred.eq(y.view_as(pred)).sum().item()
        seen += y.size(0)

    avg_epoch_loss = epoch_loss / num_batches
    # Divide by the samples seen, not len(dataloader.dataset): a sampler-based
    # loader (e.g. sixteenth_train_dataloader) visits only part of the dataset.
    avg_acc = correct / seen

    return avg_epoch_loss, avg_acc


def test(dataloader, model, loss_fn):
    num_batches = len(dataloader)
    size = len(dataloader.dataset)
    epoch_loss = 0
    correct = 0

    model.eval()

    with torch.no_grad():
        for X, y in tqdm(dataloader):
            X, y = X.to(device), y.to(device)

            pred = model(X)

            epoch_loss += loss_fn(pred, y).item()
            pred = pred.argmax(dim=1, keepdim=True)
            correct += pred.eq(y.view_as(pred)).sum().item()

    avg_epoch_loss = epoch_loss / num_batches
    avg_acc = correct / size

    return avg_epoch_loss, avg_acc
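When a loader uses a sampler that draws only part of the dataset, `len(dataloader.dataset)` still returns the full dataset size, so the accuracy denominator should be the number of samples actually seen in the epoch. The arithmetic sketch below uses made-up counts to show how large the gap can be; the helper `epoch_accuracy` is hypothetical and not part of the notebook.

```python
def epoch_accuracy(correct_per_batch, batch_sizes):
    """Accuracy over the samples actually seen in one epoch."""
    seen = sum(batch_sizes)
    return sum(correct_per_batch) / seen


# Hypothetical epoch on a loader that draws 3125 of 50000 samples:
# 12 full batches of 256 plus one final batch of 53.
batch_sizes = [256] * 12 + [53]
# Illustrative counts only (80% of each batch correct, rounded down).
correct_per_batch = [int(0.8 * b) for b in batch_sizes]

acc_seen = epoch_accuracy(correct_per_batch, batch_sizes)
acc_full = sum(correct_per_batch) / 50_000  # dividing by len(dataloader.dataset)

print(f"accuracy over seen samples:      {acc_seen:.3f}")
print(f"accuracy over full dataset size: {acc_full:.3f}")
```

With the one-sixteenth loader, dividing by the full dataset size would understate training accuracy by roughly a factor of 16, which would distort any comparison across dataset sizes.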


## Code for Problem 3

In [39]:
# TODO: Try to achieve the best performance given all training data using whatever model and training strategy.
# (New) (You cannot use the model that was pretrained on CIFAR10)
epochs_d = 100
densenet_acc_plot = []
densenet_test_acc_plot = []
densenet_loss_plot = []
densenet_test_loss_plot = []
# HINT: Remember to change the model to 'resnet50' and the weights to weights="IMAGENET1K_V1" when needed.
model_densenet = torch.hub.load('pytorch/vision:v0.10.0', 'densenet201', weights=None)
# Background: The original torchvision models are designed for the ImageNet dataset and predict 1000 classes.
# TODO: Change the output of the model to 10 classes.
# DenseNet's final layer is `classifier` (ResNet's is `fc`), so replace that attribute.
model_densenet.classifier = nn.Linear(in_features=1920, out_features=10, bias=True)
model_densenet = model_densenet.to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_densenet.parameters(), lr=1e-3)
for epoch in range(epochs_d):
    train_loss, train_acc = train(train_dataloader, model_densenet, loss_fn, optimizer)
    test_loss, test_acc = test(valid_dataloader, model_densenet, loss_fn)
    print(f"Epoch {epoch + 1:2d}: Loss = {train_loss:.4f} Acc = {train_acc:.2f} Test_Loss = {test_loss:.4f} Test_Acc = {test_acc:.2f}")
    densenet_acc_plot.append(train_acc)
    densenet_test_acc_plot.append(test_acc)
    densenet_loss_plot.append(train_loss)
    densenet_test_loss_plot.append(test_loss)
print("Done!")
plt.figure(figsize=(10,5))
plt.title("Accuracy")
plt.plot(densenet_acc_plot,label="densenet201 train acc")
plt.plot(densenet_test_acc_plot,label="test acc")
plt.xlabel("epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
plt.figure(figsize=(10,5))
plt.title("Loss")
plt.plot(densenet_loss_plot,label="densenet201 train loss")
plt.plot(densenet_test_loss_plot,label="test loss")
plt.xlabel("epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0
100%|██████████| 196/196 [00:59<00:00,  3.27it/s]
100%|██████████| 40/40 [00:08<00:00,  4.95it/s]

Epoch  1: Loss = 1.6303 Acc = 0.42 Test_Loss = 1.5806 Test_Acc = 0.46
Epoch  2: Loss = 1.2225 Acc = 0.56 Test_Loss = 1.2624 Test_Acc = 0.57
Epoch  3: Loss = 1.0367 Acc = 0.63 Test_Loss = 0.9990 Test_Acc = 0.65
Epoch  4: Loss = 0.9283 Acc = 0.67 Test_Loss = 0.9631 Test_Acc = 0.67
Epoch  5: Loss = 0.8447 Acc = 0.70 Test_Loss = 0.8328 Test_Acc = 0.71
Epoch  6: Loss = 0.7734 Acc = 0.73 Test_Loss = 0.7749 Test_Acc = 0.73
Epoch  7: Loss = 0.7101 Acc = 0.75 Test_Loss = 0.7342 Test_Acc = 0.74
Epoch  8: Loss = 0.6613 Acc = 0.77 Test_Loss = 0.7504 Test_Acc = 0.74
Epoch  9: Loss = 0.6322 Acc = 0.78 Test_Loss = 0.7296 Test_Acc = 0.75
Epoch 10: Loss = 0.5931 Acc = 0.79 Test_Loss = 0.6488 Test_Acc = 0.77
Epoch 11: Loss = 0.5658 Acc = 0.80 Test_Loss = 0.6283 Test_Acc = 0.78
Epoch 12: Loss = 0.5329 Acc = 0.81 Test_Loss = 0.6469 Test_Acc = 0.78
Epoch 13: Loss = 0.5077 Acc = 0.82 Test_Loss = 0.6274 Test_Acc = 0.79
Epoch 14: Loss = 0.4867 Acc = 0.83 Test_Loss = 0.6806 Test_Acc = 0.77
Epoch 15: Loss = 0.4765 Acc = 0.83 Test_Loss = 0.6179 Test_Acc = 0.79
Epoch 16: Loss = 0.4437 Acc = 0.84 Test_Loss = 0.5786 Test_Acc = 0.80
Epoch 17: Loss = 0.4228 Acc = 0.85 Test_Loss = 0.6392 Test_Acc = 0.79
Epoch 18: Loss = 0.4326 Acc = 0.85 Test_Loss = 0.6650 Test_Acc = 0.79
Epoch 19: Loss = 0.4034 Acc = 0.86 Test_Loss = 0.5438 Test_Acc = 0.82

 74%|███████▍  | 146/196 [00:44<00:15,  3.27it/s]

KeyboardInterrupt: ignored
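Because the loop was interrupted during epoch 20, the plotting calls that follow it in the same cell did not run. If only the printed output is available, the curves can still be recovered from it. A minimal sketch, assuming the log lines keep the format produced by the `print` call in the training loop; `EPOCH_LINE` and `parse_log` are names introduced here for illustration.

```python
import re

# Matches lines printed by the training loop, e.g.
# "Epoch  1: Loss = 1.6303 Acc = 0.42 Test_Loss = 1.5806 Test_Acc = 0.46"
EPOCH_LINE = re.compile(
    r"Epoch\s+(\d+): Loss = ([\d.]+) Acc = ([\d.]+) "
    r"Test_Loss = ([\d.]+) Test_Acc = ([\d.]+)"
)


def parse_log(text):
    """Return a list of per-epoch dicts parsed from the printed log text."""
    history = []
    for match in EPOCH_LINE.finditer(text):
        epoch, loss, acc, test_loss, test_acc = match.groups()
        history.append({
            "epoch": int(epoch),
            "loss": float(loss),
            "acc": float(acc),
            "test_loss": float(test_loss),
            "test_acc": float(test_acc),
        })
    return history


# Two lines copied from the output above.
sample = """
Epoch  1: Loss = 1.6303 Acc = 0.42 Test_Loss = 1.5806 Test_Acc = 0.46
Epoch 19: Loss = 0.4034 Acc = 0.86 Test_Loss = 0.5438 Test_Acc = 0.82
"""
history = parse_log(sample)
print(history[-1]["test_acc"])
```

The resulting list can be passed to `plt.plot`, for example `plt.plot([h["test_acc"] for h in history], label="test acc")`.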

## Problems

1. (30%) Finish the remaining code for Problem 1 and Problem 2 according to the hints. (2 code cells in total.)
2. Train a small model (resnet18) and a big model (resnet50) from scratch on `sixteenth_train_dataloader`, `half_train_dataloader`, and `train_dataloader`, respectively.
3. (30%) Achieve the best performance you can with all the training data, using any model and training strategy.  
  (You cannot use a model that was pretrained on CIFAR10.)
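Problem 2 amounts to six training runs: two models times three data loaders. One way to keep them organized is to enumerate the combinations up front. This is only a sketch; the dictionary layout and the helper `experiment_grid` are assumptions, while the model and loader names are the ones used in this notebook.

```python
from itertools import product

MODEL_NAMES = ["resnet18", "resnet50"]
# Loader names as defined in the notebook, mapped to the fraction of training data.
LOADERS = {
    "sixteenth_train_dataloader": 1 / 16,
    "half_train_dataloader": 1 / 2,
    "train_dataloader": 1.0,
}


def experiment_grid():
    """List every (model, loader, fraction) combination to train from scratch."""
    return [
        {"model": m, "loader": name, "fraction": frac}
        for m, (name, frac) in product(MODEL_NAMES, LOADERS.items())
    ]


for run in experiment_grid():
    print(run)
```

Each run should start from a freshly constructed model and optimizer so that weights learned in one run do not carry over into the next.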



## Discussion

Write down your insights in the report. The file name should be `report.pdf`.
For each discussion item below, present the results graphically, as shown in Fig. 1, and discuss them.

- (30%) The relationship between accuracy, model size, and training dataset size.  
    (6 models in total: the small model trained on one sixteenth, one half, and all of the data, and the big model trained on the same three subsets. If your results differ from Fig. 1, explain the possible reasons.)
- (10%) What happens if we train the ResNet with ImageNet-initialized weights (`weights="IMAGENET1K_V1"`)?
Explain why the relationship changes in this way.
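To draw one accuracy curve per model against the training-set fraction, the six results can be grouped by model first. A minimal sketch, assuming results are stored in a dictionary keyed by `(model, fraction)`; that layout and the helper `group_by_model` are illustrative choices, not part of the assignment.

```python
from collections import defaultdict


def group_by_model(results):
    """Turn {(model, fraction): accuracy} into {model: [(fraction, accuracy), ...]}.

    Each list is sorted by fraction so it can be passed straight to a line plot.
    """
    series = defaultdict(list)
    for (model, fraction), accuracy in results.items():
        series[model].append((fraction, accuracy))
    return {model: sorted(points) for model, points in series.items()}


# Fill this in with your own measured test accuracies; it is left empty here
# on purpose so that no numbers are implied.
results = {}
print(group_by_model(results))
```

Once `results` is filled in, each model's curve can be drawn with `plt.plot(*zip(*points), label=model)`, which puts the fraction on the x-axis and the accuracy on the y-axis.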

Hint: You can try different hyperparameter combinations when training the models.

## Credits

1. [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html)