# Dealing with Overfitting

In this homework, you will address overfitting using various techniques. The methods you can try are as follows:

1. Change the loss function.
2. Change the model architecture.
3. Add or modify data, for example by augmenting or splitting the training data.
4. ...

Additionally, you will learn about logging and version control in this homework.

Feel free to ask your friends for assistance and send a message to the TA if needed.

In [1]:
# download the CIFAR-10.1 test set (uncomment the lines below to run)
# !wget -P data/cifar10_1 https://github.com/modestyachts/CIFAR-10.1/raw/master/datasets/cifar10.1_v6_labels.npy
# !wget -P data/cifar10_1 https://github.com/modestyachts/CIFAR-10.1/raw/master/datasets/cifar10.1_v6_data.npy

In [1]:
# import required libraries
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
from torch.utils.data import DataLoader
from pathlib import Path
import torch.nn as nn
import numpy as np
from pytorchcv.model_provider import get_model
from torchinfo import summary
from tqdm import tqdm

from utils import Cifar10_1

In [2]:
# define some hyperparameters
DATA_PATH = './data'

# change these hyperparameters if needed
BATCH_SZIE = 32
EPOCH = 10

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# check device (CPU, GPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


In [3]:
# TODO
# augment data here
# https://pytorch.org/vision/master/transforms.html
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    # add augmentation here...
])

In [4]:
# load dataset

# cifar10
trainset   = torchvision.datasets.CIFAR10(root=DATA_PATH, train=True, download=True, transform=transform)
testset_1  = torchvision.datasets.CIFAR10(root=DATA_PATH, train=False, download=True, transform=transform)
testset_2  = Cifar10_1(root=f'{DATA_PATH}/cifar10_1/', transform=transform)
print(trainset.data.shape)
print(testset_1.data.shape)
print(testset_2.data.shape)

Files already downloaded and verified
Files already downloaded and verified
(50000, 32, 32, 3)
(10000, 32, 32, 3)
(2000, 32, 32, 3)


In [None]:
# TODO
# modify your training data here


In [None]:
# TODO
# split your training data if needed
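
A minimal sketch of one way to split the training data, assuming a 90/10 train/validation ratio (the ratio and the names `full_train`, `train_subset`, and `val_subset` are illustrative, not part of the assignment). A small random `TensorDataset` stands in for `trainset` so the example runs without downloading CIFAR-10.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for `trainset` so the sketch runs without downloading CIFAR-10
full_train = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))

val_size = len(full_train) // 10         # hold out 10% (an assumed ratio)
train_size = len(full_train) - val_size

# A fixed generator seed makes the split reproducible across runs
generator = torch.Generator().manual_seed(0)
train_subset, val_subset = random_split(
    full_train, [train_size, val_size], generator=generator
)

print(len(train_subset), len(val_subset))
```

Both subsets share the transform of the underlying dataset, so if the training transform contains random augmentation, the validation subset is augmented too unless it is built from a second dataset object with a non-random transform.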

In [9]:
# TODO
# select a model you want (small one or large one)
# https://github.com/osmr/imgclsmob/blob/master/pytorch/README.md
model_name = 'nin'
# model_name = 'densenet100_k24'
# model_name = 'preresnet56'
model = get_model(name=model_name + '_cifar10', pretrained=True)
model = model.to(device)

Downloading /home/yuzhong/.torch/models/nin_cifar10-0743-795b0824.pth.zip from https://github.com/osmr/imgclsmob/releases/download/v0.0.175/nin_cifar10-0743-795b0824.pth.zip...


In [10]:
summary(model, input_size=(1, 3, 32, 32), device=device)

Layer (type:depth-idx)                   Output Shape              Param #
CIFARNIN                                 [1, 10]                   --
├─Sequential: 1-1                        [1, 192, 8, 8]            --
│    └─Sequential: 2-1                   [1, 96, 32, 32]           --
│    │    └─NINConv: 3-1                 [1, 192, 32, 32]          14,592
│    │    └─NINConv: 3-2                 [1, 160, 32, 32]          30,880
│    │    └─NINConv: 3-3                 [1, 96, 32, 32]           15,456
│    └─Sequential: 2-2                   [1, 192, 16, 16]          --
│    │    └─MaxPool2d: 3-4               [1, 96, 16, 16]           --
│    │    └─Dropout: 3-5                 [1, 96, 16, 16]           --
│    │    └─NINConv: 3-6                 [1, 192, 16, 16]          460,992
│    │    └─NINConv: 3-7                 [1, 192, 16, 16]          37,056
│    │    └─NINConv: 3-8                 [1, 192, 16, 16]          37,056
│    └─Sequential: 2-3                   [1, 192, 8, 8]     

### Version Control


Before starting the training process, it is important to set up version control to track the changes and ensure reproducibility.

Here's what you need to do:
1. Give the experiment a name or assign it a universally unique identifier (UUID) so you can identify and reference it later.

2. Log all the hyperparameters used for the experiment.
    - This includes parameters like learning rate, batch size, optimizer choice, regularization strength, etc.
    - Make sure to record these hyperparameters before starting the training process.

3. Save the code used for the experiment.
    - This ensures that the exact code version used for training is preserved.
    - You can create a separate branch or tag in your version control system (e.g., Git) specifically for this experiment.


By following these steps, you establish a systematic approach to version control, allowing you to reproduce and compare results accurately.
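
The three steps above could be sketched with the standard library alone, as below. The function names `current_git_commit` and `start_experiment`, the `experiments` directory, and the JSON layout are all hypothetical choices. Recording the Git commit hash is one simple way to tie a run to a code version, and it assumes the notebook lives inside a Git repository (the field is `None` otherwise).

```python
import json
import subprocess
import uuid
from datetime import datetime, timezone
from pathlib import Path


def current_git_commit():
    """Return the current Git commit hash, or None when Git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None


def start_experiment(hparams, log_dir="experiments"):
    """Write the run's metadata to <log_dir>/<experiment_id>.json before training."""
    record = {
        "experiment_id": uuid.uuid4().hex,                      # step 1: unique identifier
        "started_at": datetime.now(timezone.utc).isoformat(),
        "git_commit": current_git_commit(),                     # step 3: code version
        "hparams": hparams,                                     # step 2: hyperparameters
    }
    path = Path(log_dir) / f"{record['experiment_id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2))
    return record, path


# Example values; replace them with the settings actually used in the run
record, path = start_experiment(
    {"model_name": "nin", "batch_size": 32, "epochs": 10, "lr": 1e-3, "momentum": 0.9}
)
print(record["experiment_id"], path)
```

Writing the record before training starts means the hyperparameters are preserved even if the run crashes partway through.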

In [None]:
# TODO
# version control

In [11]:
# dataloader
trainloader  = DataLoader(trainset, batch_size=BATCH_SZIE, shuffle=True, num_workers=2)
testloader_1 = DataLoader(testset_1, batch_size=BATCH_SZIE, shuffle=False, num_workers=2)
testloader_2 = DataLoader(testset_2, batch_size=BATCH_SZIE, shuffle=False, num_workers=2)


In [12]:
# TODO
# define loss function
# https://blog.csdn.net/weixin_36670529/article/details/105670337
criterion = nn.CrossEntropyLoss()

# or define your own loss function here
# https://discuss.pytorch.org/t/custom-loss-functions/29387
# https://rowantseng.medium.com/pytorch-%E8%87%AA%E5%AE%9A%E7%BE%A9%E6%90%8D%E5%A4%B1%E5%87%BD%E6%95%B8-custom-loss-c12e8741968b
# https://androidkt.com/how-to-add-l1-l2-regularization-in-pytorch-loss-function/

# Do not modify this
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
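
Two loss changes that are often tried against overfitting are label smoothing and an explicit L2 penalty. The sketch below shows both under stated assumptions: the smoothing value `0.1` and the `weight_decay` strength are illustrative, `l2_penalty` and `toy_model` are hypothetical names, and the `label_smoothing` argument requires PyTorch 1.10 or later. Because the optimizer cell must not be modified, the penalty is added to the loss instead of being set through the optimizer.

```python
import torch
import torch.nn as nn

# Option 1: label smoothing (the `label_smoothing` argument needs PyTorch >= 1.10)
smoothed_criterion = nn.CrossEntropyLoss(label_smoothing=0.1)


def l2_penalty(model):
    """Sum of squared values over all trainable parameters."""
    return sum(p.pow(2).sum() for p in model.parameters() if p.requires_grad)


# Tiny stand-in model and batch so the sketch runs on its own
toy_model = nn.Linear(8, 10)
x = torch.randn(4, 8)
y = torch.randint(0, 10, (4,))

# Option 2: add a weighted L2 penalty to the loss
weight_decay = 5e-4  # assumed strength; tune it on held-out data
outputs = toy_model(x)
loss = smoothed_criterion(outputs, y) + weight_decay * l2_penalty(toy_model)
loss.backward()
print(loss.item())
```

In the notebook's training loop, the same sum would replace the plain `criterion(outputs, y)` call, with `model` in place of `toy_model`.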

In [13]:
# train
def train(e):
    model.train()
    num_data = 0
    correct = 0
    loss_all = 0

    for x, y in tqdm(trainloader):
        # move the batch (inputs x, labels y) to the target device
        x, y = x.to(device), y.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward pass
        outputs = model(x)

        # compute loss here
        loss = criterion(outputs, y)
        # maybe you need to add L2 loss here
        # https://androidkt.com/how-to-add-l1-l2-regularization-in-pytorch-loss-function/


        # back prop
        loss.backward()
        optimizer.step()

        # log
        num_data += y.size(0)
        loss_all += loss.item()
        pred = outputs.argmax(dim=1)
        correct += pred.eq(y.view(-1)).sum().item()

    print(f'epoch: [{e}], loss: {loss_all/len(trainloader):.4f}, acc: {correct/num_data:.4f}')

# start training
for e in range(EPOCH):
    train(e)


100%|██████████| 1563/1563 [00:12<00:00, 122.36it/s]
epoch: [0], loss: 0.1942, acc: 0.9395
100%|██████████| 1563/1563 [00:12<00:00, 123.95it/s]
epoch: [1], loss: 0.1668, acc: 0.9450
100%|██████████| 1563/1563 [00:12<00:00, 123.87it/s]
epoch: [2], loss: 0.1544, acc: 0.9496
100%|██████████| 1563/1563 [00:12<00:00, 121.89it/s]
epoch: [3], loss: 0.1472, acc: 0.9506
100%|██████████| 1563/1563 [00:12<00:00, 123.19it/s]
epoch: [4], loss: 0.1418, acc: 0.9528
100%|██████████| 1563/1563 [00:12<00:00, 123.70it/s]
epoch: [5], loss: 0.1345, acc: 0.9551
100%|██████████| 1563/1563 [00:12<00:00, 122.50it/s]
epoch: [6], loss: 0.1299, acc: 0.9572
100%|██████████| 1563/1563 [00:12<00:00, 121.98it/s]
epoch: [7], loss: 0.1238, acc: 0.9582
100%|██████████| 1563/1563 [00:12<00:00, 122.22it/s]
epoch: [8], loss: 0.1218, acc: 0.9598
100%|██████████| 1563/1563 [00:12<00:00, 122.13it/s]
epoch: [9], loss: 0.1182, acc: 0.9600

In [14]:
# evaluate
@torch.no_grad()  # gradients are not needed during evaluation
def test(model, test_loader, loss_fun, device):
    model.eval()
    test_loss = 0
    correct = 0

    for data, target in test_loader:
        data = data.to(device)
        target = target.to(device).long()

        output = model(data)

        # accumulate the per-batch mean loss and the number of correct predictions
        test_loss += loss_fun(output, target).item()
        pred = output.argmax(dim=1)

        correct += pred.eq(target.view(-1)).sum().item()

    # mean loss per batch, accuracy over the whole dataset
    return test_loss / len(test_loader), correct / len(test_loader.dataset)



loss, acc = test(model, testloader_1, criterion, device)
print(f"on testset 1: loss: {loss:.4f}, acc: {acc:.4f}")

loss, acc = test(model, testloader_2, criterion, device)
print(f"on testset 2: loss: {loss:.4f}, acc: {acc:.4f}")


on testset 1: loss: 0.2312, acc: 0.9243
on testset 2: loss: 0.5226, acc: 0.8335


In [None]:
# record the results to a log file (e.g., a Google Sheet)
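
A local CSV file is one simple alternative to a spreadsheet for recording results. The sketch below is a minimal version under stated assumptions: the helper `append_result`, the file name `results.csv`, the column names, and the experiment label `nin_baseline` are all hypothetical. The metric values are the ones printed by the evaluation cell above.

```python
import csv
from pathlib import Path


def append_result(row, log_path="results.csv"):
    """Append one experiment's results as a CSV row, writing the header first if needed."""
    path = Path(log_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return path


# Metric values copied from the evaluation output; the experiment name is a placeholder
log_path = append_result({
    "experiment": "nin_baseline",
    "test1_loss": 0.2312, "test1_acc": 0.9243,
    "test2_loss": 0.5226, "test2_acc": 0.8335,
})
print(log_path)
```

Appending one row per run keeps all experiments in a single file, which makes it easy to compare the gap between the two test sets across runs.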