# Deep Learning Mini-Project - Fall 2022
#### - by Eduardo Calzadilla
#### _New York University - CS-GY 6953 / ECE-GY 7123_



We started from the models in [kuangliu's GitHub repository](https://github.com/kuangliu/pytorch-cifar). We tested the included architectures as-is for 2 training epochs each and picked the two with the highest test accuracy. For brevity, and because of computational constraints, we do not run all of the training in this notebook; the corresponding code is left commented out so that similar accuracy scores can be reproduced later.

GoogLeNet and MobileNetV2 performed best on three criteria: test accuracy relative to the number of trainable parameters in this notebook's runs, the accuracy scores reported in the original repository, and capacity for hyperparameter tuning (setting aside other potential strategies such as data augmentation). Both were then trained for 25 epochs on a CPU runtime and a GPU runtime. Based on the training and test accuracy scores and the CPU and GPU training times in minutes, we selected the GoogLeNet structure for the remainder of the study.
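
One way to read "accuracy over trainable params" is as test accuracy divided by the parameter count in millions. The sketch below assumes that reading. The function name and the two entries `model_a` and `model_b` are hypothetical placeholders with made-up numbers, not results from this study.

```python
def accuracy_per_million_params(test_acc_percent, n_params):
    """Test accuracy (in percent) per one million trainable parameters."""
    return test_acc_percent / (n_params / 1e6)


# Hypothetical placeholder values for illustration only (not measured results).
candidates = {
    "model_a": (70.0, 6_000_000),  # (test accuracy %, trainable parameters)
    "model_b": (66.0, 2_200_000),
}

# Rank candidates from most to least parameter-efficient.
ranked = sorted(
    candidates,
    key=lambda name: accuracy_per_million_params(*candidates[name]),
    reverse=True,
)
print(ranked)
```

With a metric like this, a slightly less accurate model can still rank first if it is much smaller, which is why it is only one of the criteria listed above.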

The structure was then altered to reduce the number of parameters while increasing the test accuracy. Each change was run for 5 epochs, and the highest-performing structure was kept. Variables outside the model's structure were then also tuned until a high test accuracy (>80%) was achieved.
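
The change-and-keep procedure described above can be sketched as a greedy search. This is a minimal illustration under stated assumptions, not the exact procedure used in the study: `greedy_structure_search`, the dictionary "configs", and `toy_score` are all hypothetical, and in practice `evaluate` would be a short 5-epoch training run returning test accuracy.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Config = TypeVar("Config")


def greedy_structure_search(
    base: Config,
    changes: Iterable[Callable[[Config], Config]],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Try one change at a time and keep it only if the score improves."""
    best, best_score = base, evaluate(base)
    for change in changes:
        candidate = change(best)
        score = evaluate(candidate)  # stand-in for a short training run
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_score(cfg):
    # Made-up scoring function so the sketch runs without training anything.
    return 100.0 - abs(cfg["width"] - 96) - 2 * abs(cfg["blocks"] - 5)


changes = [
    lambda c: {**c, "width": 96},
    lambda c: {**c, "blocks": 9},
    lambda c: {**c, "blocks": 5},
]
best_cfg, best_score = greedy_structure_search(
    {"width": 192, "blocks": 9}, changes, toy_score
)
print(best_cfg, best_score)
```

A greedy search only needs one short run per change, which suits a limited compute budget, but it can miss combinations of changes that help only when applied together.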

The final architecture is shown at the end of the notebook, and the test results are summarized in the figures in the report.

In [1]:
%pip install torch
%pip install torchvision
%pip install python-math
%pip install torch-summary

Note: you may need to restart the kernel to use updated packages.


In [2]:
# import system related libraries
from pathlib import Path
import sys
import os

# establish path
module_path = "gitFolder"
if module_path not in sys.path:
    sys.path.append(module_path)

os.getcwd()

'/home/studio-lab-user/gitFolder'

In [3]:
# importing relevant torch libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn
from torchsummary import summary

# torchvision libraries
import torchvision
import torchvision.transforms as transforms

#data prep and math libraries
import argparse
import math
import time

# import local libraries
from models import * # models defined according to https://github.com/kuangliu/pytorch-cifar
from utils import progress_bar


In [4]:
# Data prep
print('==> Preparing data..')
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=128, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=100, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

==> Preparing data..
Files already downloaded and verified
Files already downloaded and verified
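
To make the `transforms.Normalize` step concrete, the sketch below applies the same per-channel formula, `(value - mean) / std`, to a single RGB pixel in plain Python. It assumes the pixel has already been scaled to `[0, 1]`, as `ToTensor()` does for 8-bit images. The helper names `normalize_pixel` and `denormalize_pixel` are illustrative and not part of torchvision.

```python
# Per-channel statistics used in the transforms above.
CIFAR_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR_STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb, mean=CIFAR_MEAN, std=CIFAR_STD):
    """Apply (value - mean) / std to each channel of one pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb_norm, mean=CIFAR_MEAN, std=CIFAR_STD):
    """Invert the normalization, e.g. before displaying an image."""
    return tuple(v * s + m for v, m, s in zip(rgb_norm, mean, std))


pixel = (0.5, 0.5, 0.5)  # a mid-gray pixel after scaling to [0, 1]
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print(normalized, restored)
```

A pixel equal to the channel means maps to zero in every channel, so the network sees inputs centered around zero with roughly unit spread.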


In [5]:
# Build model
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# send to device, and match with parallel connection
def send_to_device(net):
    net = net.to(device)
    if device == 'cuda':
        net = torch.nn.DataParallel(net)
        cudnn.benchmark = True
        print('\t ==> Model sent to gpu...')
    else:
        print('\t ==> Model sent to cpu...')
        
    return net


print('==> Building model...')
# Res_net18 = ResNet18()
Google_net = GoogLeNet()
Google_net5 = GoogLeNet5()
# Dense_net = DenseNet121()
# Mobile_net = MobileNet()
# Mobile_netV2 = MobileNetV2()
# SEN_net18 = SENet18()
# simple_DLA = SimpleDLA()


# Res_net18 = send_to_device(Res_net18)
Google_net = send_to_device(Google_net)
Google_net5 = send_to_device(Google_net5)
# Dense_net = send_to_device(Dense_net)
# Mobile_net = send_to_device(Mobile_net)
# Mobile_netV2 = send_to_device(Mobile_netV2)
# SEN_net18 = send_to_device(SEN_net18)
# simple_DLA = send_to_device(simple_DLA)

==> Building model...
	 ==> Model sent to cpu...
	 ==> Model sent to cpu...


In [6]:
# number of parameters in selected model
print('Number of trainable parameters: ', sum(p.numel() for p in Google_net5.parameters() if p.requires_grad))

Number of trainable parameters:  3799530


In [16]:
summary(Google_net5, (3,32,32)) #expand below for the model architecture

Layer (type:depth-idx)                   Output Shape              Param #
├─Sequential: 1-1                        [-1, 96, 30, 30]          --
|    └─Conv2d: 2-1                       [-1, 96, 30, 30]          7,296
|    └─BatchNorm2d: 2-2                  [-1, 96, 30, 30]          192
|    └─ReLU: 2-3                         [-1, 96, 30, 30]          --
├─Inception: 1-2                         [-1, 128, 30, 30]         --
|    └─Sequential: 2-4                   [-1, 32, 30, 30]          --
|    |    └─Conv2d: 3-1                  [-1, 32, 30, 30]          3,104
|    |    └─BatchNorm2d: 3-2             [-1, 32, 30, 30]          64
|    |    └─ReLU: 3-3                    [-1, 32, 30, 30]          --
|    └─Sequential: 2-5                   [-1, 64, 30, 30]          --
|    |    └─Conv2d: 3-4                  [-1, 48, 30, 30]          4,656
|    |    └─BatchNorm2d: 3-5             [-1, 48, 30, 30]          96
|    |    └─ReLU: 3-6                    [-1, 48, 30, 30]          --
|    
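
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for `Conv2d` (weights plus one bias per output channel) and `BatchNorm2d` (one scale and one shift per channel). The 5x5 kernel and padding of 1 for the first convolution are inferred from the printed numbers, not read from the model source, so treat them as assumptions.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Trainable parameters of a square-kernel Conv2d layer."""
    return out_ch * in_ch * kernel * kernel + (out_ch if bias else 0)


def batchnorm2d_params(channels):
    """Trainable parameters of BatchNorm2d: one scale and one shift per channel."""
    return 2 * channels


def conv2d_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# First convolution: 3 -> 96 channels, assumed 5x5 kernel with padding 1.
print(conv2d_params(3, 96, 5))               # Conv2d: 2-1
print(conv2d_out_size(32, 5, padding=1))     # spatial size after the first conv
print(batchnorm2d_params(96))                # BatchNorm2d: 2-2

# 1x1 convolutions inside the first Inception block.
print(conv2d_params(96, 32, 1))              # Conv2d: 3-1
print(conv2d_params(96, 48, 1))              # Conv2d: 3-4
```

These values agree with the `Param #` and `Output Shape` columns above (7,296, 30, 192, 3,104 and 4,656), which is a quick sanity check when editing layer widths to shrink the model.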

In [17]:
best_acc = 0  # best test accuracy
start_epoch = 0  # start from epoch 0 or last checkpoint epoch

# Training and testing functions
def train(model, epoch):
    print('\nEpoch: %d' % epoch)
    model.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Train Loss: %.3f | Train Acc: %.3f%% (%d/%d)'
                     % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))


def test(model, epoch):
    global best_acc
    model.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Test Loss: %.3f | Test Acc: %.3f%% (%d/%d)'
                         % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

    # Save checkpoint.
    acc = 100.*correct/total
    if acc > best_acc:
        print('Saving..')
        state = {
            'net': model.state_dict(),
            'acc': acc,
            'epoch': epoch,
        }
        os.makedirs('checkpoint', exist_ok=True)
        torch.save(state, './checkpoint/ckpt.pth')
        best_acc = acc
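
The checkpoint written by `test` can be used to resume training, which is what the `start_epoch` comment hints at. The sketch below is one possible way to do that, assuming the same `'net'`, `'acc'`, and `'epoch'` keys. The helpers `save_checkpoint` and `load_checkpoint` and the tiny `nn.Linear` model are hypothetical stand-ins so the example runs quickly.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(model, acc, epoch, path):
    """Save weights plus bookkeeping, using the same keys as test()."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save({'net': model.state_dict(), 'acc': acc, 'epoch': epoch}, path)


def load_checkpoint(model, path, device='cpu'):
    """Restore weights; return the saved accuracy and the epoch to resume from."""
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint['net'])
    return checkpoint['acc'], checkpoint['epoch'] + 1


tiny = nn.Linear(4, 2)  # hypothetical stand-in for the real network
with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, 'checkpoint', 'ckpt.pth')
    save_checkpoint(tiny, acc=12.5, epoch=3, path=ckpt_path)

    restored = nn.Linear(4, 2)
    resumed_acc, resume_epoch = load_checkpoint(restored, ckpt_path)

print(resumed_acc, resume_epoch)
```

Two caveats apply. A model wrapped in `torch.nn.DataParallel` saves its keys with a `module.` prefix, so it should be loaded into a model wrapped the same way. Also, this sketch resumes at the epoch after the saved one, which is a design choice rather than something the notebook specifies.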

In [22]:
# Conduct training and testing
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(Google_net5.parameters(), lr=0.05,
                      momentum=0.9, weight_decay=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(start_epoch, 25):  # 25 epochs; lower to 10 for a faster run
    begin = time.time()
    train(Google_net5, epoch)
    print('==> Time to complete training epoch: ', time.time() - begin)
    
    test_begin = time.time()
    test(Google_net5, epoch)
    print('==> Time to complete test epoch: ', time.time() - test_begin)
    scheduler.step()
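
To see what the scheduler does over this run, the sketch below evaluates the closed-form cosine annealing schedule documented for `CosineAnnealingLR`, assuming `eta_min=0` (the default) and one `scheduler.step()` per epoch. The function `cosine_annealing_lr` is an illustrative helper, and the `t_max=25` variant is a hypothetical alternative, not a setting used in this notebook.

```python
import math


def cosine_annealing_lr(epoch, base_lr=0.05, t_max=200, eta_min=0.0):
    """Closed-form cosine annealing: the learning rate after `epoch` steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


# Schedule as configured above: lr=0.05, T_max=200, but only 25 epochs are run.
lr_after_25 = cosine_annealing_lr(25)

# Hypothetical alternative: T_max matched to the 25-epoch run length.
lr_after_25_matched = cosine_annealing_lr(25, t_max=25)

print(f"T_max=200: lr after 25 epochs = {lr_after_25:.4f}")
print(f"T_max=25:  lr after 25 epochs = {lr_after_25_matched:.4f}")
```

With `T_max=200` the learning rate only falls from 0.05 to about 0.048 over 25 epochs, so the run trains at a nearly constant rate. Matching `T_max` to the number of epochs would anneal the rate all the way to zero by the end of the run.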



Epoch: 0
==> Time to complete training epoch:  84.27412939071655
==> Time to complete test epoch:  5.993894338607788

Epoch: 1
==> Time to complete training epoch:  83.85359740257263
==> Time to complete test epoch:  5.971439361572266

Epoch: 2
==> Time to complete training epoch:  83.94187808036804
==> Time to complete test epoch:  6.010834693908691

Epoch: 3
==> Time to complete training epoch:  84.56193137168884
==> Time to complete test epoch:  6.062749862670898

Epoch: 4
==> Time to complete training epoch:  86.34221506118774
==> Time to complete test epoch:  6.207306861877441

Epoch: 5
==> Time to complete training epoch:  86.80048203468323
==> Time to complete test epoch:  6.185855150222778

Epoch: 6
==> Time to complete training epoch:  86.72444152832031
==> Time to complete test epoch:  6.177858829498291

Epoch: 7
==> Time to complete training epoch:  86.6077036857605
==> Time to complete test epoch:  6.186498165130615

Epoch: 8
==> Time to complete training epoch:  86.522280