# Deep Learning Homework 3
<p> Thomas Dougherty <br>
9/28/2023 <br>
Deep Learning</p>


### Load Data

In [8]:
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torch.utils.data import random_split

import torchvision
from torchvision import transforms
from torchvision import models
from torchvision.datasets import CIFAR10


# download training data
train_data = CIFAR10(root="./train/",
                     train=True,
                     download=True,
                     transform=None)


Files already downloaded and verified


### Accessing the Data

`How do you access the label?` <br>
<p> The label is the second element (index [1]) of the tuple returned by train_data[i]. </p>

In [9]:
label = train_data[16][1]
data_class = train_data.classes[train_data[16][1]]
print(f'Data Label: {label}')                       # print the label of the tuple
print(f'Class: {data_class}')    # print the class of the image

Data Label: 9
Class: truck


`What method is called when you index into a Dataset?`

Indexing calls the dataset's `__getitem__(self, index: int)` method. It takes the integer index and returns a tuple containing the image and its target class.
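As a rough illustration of the indexing protocol, here is a plain-Python stand-in. The class `ToyDataset` and its contents are hypothetical and are not torchvision code; the sketch only shows that `obj[i]` is routed to `obj.__getitem__(i)`, which is the mechanism a map-style Dataset relies on.

```python
class ToyDataset:
    """Hypothetical stand-in for a map-style dataset (not torchvision code)."""

    def __init__(self, images, targets):
        self.images = images
        self.targets = targets
        self.calls = []          # records every index passed to __getitem__

    def __getitem__(self, index: int):
        self.calls.append(index)
        return self.images[index], self.targets[index]

    def __len__(self):
        return len(self.images)


toy = ToyDataset(images=["img0", "img1", "img2"], targets=[3, 9, 1])
image, target = toy[1]           # same as toy.__getitem__(1)
print(image, target, len(toy), toy.calls)
```

The `calls` list is only there to make the dispatch visible; a real dataset would not need it.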

`Is CIFAR10 a class that is derived from the Dataset class?`

Yes. CIFAR10 derives from Dataset through torchvision's VisionDataset, and it provides its own `__getitem__` and `__len__` implementations.

`Inheritance Tree` <br>
![Alt text](dataset-inherit.jpg)

### Data Transforms

Pre-process the data before running through a neural net

In [10]:
# TRAINING DATA

# taking mean and std values from the book
train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.Normalize(mean=(0.4914, 0.4822, 0.4465),
                        std=(0.2023, 0.1994, 0.201)),
])

train_data_xform = CIFAR10(root="./train/",
                     train=True,
                     transform=train_transform)

data, label = train_data[16]
print(f'Training data without transforms: {data}')
print(f'Data Label: {label}')

data, label = train_data_xform[16]
print(f'Training data with transforms: {data}')
print(f'Data Label: {label}')


# TESTING DATA
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        (0.4914, 0.4822, 0.4465),
        (0.2023, 0.1994, 0.2010))
    ])

test_data = CIFAR10(root="./test/",
                     train=False,
                     transform=test_transform,
                     download=True)

Training data without transforms: <PIL.Image.Image image mode=RGB size=32x32 at 0x7F6741977A30>
Data Label: 9
Training data with transforms: tensor([[[ 2.1652,  2.1652,  2.1652,  ..., -2.4291, -2.4291, -2.4291],
         [ 2.1652,  2.1652,  2.1845,  ..., -2.4291, -2.4291, -2.4291],
         [ 2.1458,  2.1652,  2.2233,  ..., -2.4291, -2.4291, -2.4291],
         ...,
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291]],

        [[ 2.2625,  2.2625,  2.2625,  ..., -2.4183, -2.4183, -2.4183],
         [ 2.2625,  2.2625,  2.2821,  ..., -2.4183, -2.4183, -2.4183],
         [ 2.2428,  2.2625,  2.3215,  ..., -2.4183, -2.4183, -2.4183],
         ...,
         [-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         [-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         [-2.4183, -2.4183, -2.4183,  ..., -2.41

`When you instantiate train_data the second time, with the transform, try without download=True. Look at the API. What does it say?`
<p>If download is True, the dataset is downloaded into the root directory, unless it is already there, in which case it is not downloaded again. Without download=True, the constructor only checks that the files are present and intact; if they are missing or corrupted, it raises a RuntimeError.</p>

`What is the difference between training and testing transforms? Training is supposed to ”see” more data variability and that is why we provide augmentations of the original data through transforms. Why do you think the test dataset has a different transform?`

The training transform adds augmentations (random horizontal flips and random crops), whereas the test data is unaltered aside from conversion to a tensor and normalization. The augmentations expose the model to more variability during training, which helps it generalize to unseen data. The test transform is kept deterministic so that evaluation measures performance on the original, unmodified images and gives repeatable results; randomly augmenting the test data would make the reported accuracy noisy and less representative of real-world performance.

`Please do` <br>
`data, label = train_data[index] in both cases (with and without transforms).`<br>
`Why is your result different when you apply transforms?`

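Before transforms are applied, the data is raw: train_data[index] returns a PIL image object, and printing it shows only a summary (color mode, size, and memory address). With the transform, the image is converted to a tensor and normalized, so printing it shows the numeric pixel values in the form a neural network expects. Because the training transform includes a random flip and a random crop, the tensor can also differ each time the same index is accessed.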
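To connect the printed tensor values to the transform, the sketch below reproduces the arithmetic of ToTensor (scale to [0, 1]) followed by Normalize for a single pixel. The helper `normalize_pixel` is a hypothetical name used only for illustration. A raw value of 0 (a black pixel, or the zero padding added by RandomCrop) maps to the -2.4291 and -2.4183 seen in the red and green channels of the output, and a raw red value of 237 maps to 2.1652.

```python
# Per-channel statistics copied from the transform above.
mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)


def normalize_pixel(value_0_255: int, channel: int) -> float:
    """Mimic ToTensor (scale to [0, 1]) followed by Normalize for one pixel."""
    scaled = value_0_255 / 255.0
    return (scaled - mean[channel]) / std[channel]


black_red = normalize_pixel(0, channel=0)
black_green = normalize_pixel(0, channel=1)
bright_red = normalize_pixel(237, channel=0)
print(round(black_red, 4), round(black_green, 4), round(bright_red, 4))
```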

### Data Batching
<p> I've noticed that a bigger batch size results in faster training times, likely because it takes better advantage of GPU parallelism. However, it also leads to a higher loss value. The larger batch size produces a smoother average, since extreme or outlying data points have less influence over the loss value. The learning rate and the number of epochs have to be adjusted in order to get accurate results. </p>

In [11]:
train_set, val_set = random_split(
                      train_data_xform, 
                      [40000, 10000])

batch_size = 512

# Dataloader does all the work of shuffling data between batches and training cycles (epochs)

train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size,           # Bigger batch size results in faster training times, and higher memory usage
    shuffle = True)


val_loader = torch.utils.data.DataLoader(
    val_set,
    batch_size,
    shuffle = True)

# create data batches
data_batch, labels_batch = next(iter(train_loader))

print(data_batch.size())
print(labels_batch.size())

test_loader = torch.utils.data.DataLoader(
    test_data,
    batch_size,
    shuffle = False) # set shuffle to false for the testing data for repeatable results


torch.Size([512, 3, 32, 32])
torch.Size([512])
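As a quick check on how many batches each loader yields per epoch, here is a small arithmetic sketch. It assumes the split sizes used above (40,000 / 10,000) plus the 10,000-image CIFAR10 test set, and the DataLoader default of keeping the final partial batch; the dictionary names are illustrative.

```python
import math

batch_size = 512
split_sizes = {"train": 40000, "val": 10000, "test": 10000}

summary = {}
for name, n_samples in split_sizes.items():
    n_batches = math.ceil(n_samples / batch_size)           # partial last batch is kept
    last_batch = n_samples - (n_batches - 1) * batch_size   # size of that final batch
    summary[name] = (n_batches, last_batch)
    print(f"{name}: {n_batches} batches, last batch holds {last_batch} samples")
```

Because the last batch is smaller than 512, the number of batches times the batch size is larger than the number of samples.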


### Model Design

In [12]:
vgg16 = models.vgg16(pretrained=True)     # faster convergence on pre-trained models

# Print layers of the neural network
print("Neural Network Layers: ")
print(vgg16)



Neural Network Layers: 
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=

In [13]:
# Replace the final Linear layer with a new definition.
# There are 10 classes in the CIFAR10 dataset, so the new layer has 10 output features.
vgg16.classifier[-1] = nn.Linear(4096, 10)

device = "cuda" if torch.cuda.is_available() else "cpu"   # fall back to CPU when no GPU is present

model_vgg = vgg16.to(device=device)
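The head swap can be tried on a much smaller model. The sketch below uses a tiny, hypothetical stand-in for `vgg16.classifier` (the layer sizes 32 and 16 are made up) and reads `in_features` from the existing layer rather than hard-coding it, which is one possible way to keep the replacement in sync with the layer before it.

```python
import torch
from torch import nn

# Tiny hypothetical stand-in for vgg16.classifier (the real one is much larger).
classifier = nn.Sequential(
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 1000),   # original head: 1000 ImageNet classes
)

in_features = classifier[-1].in_features      # read the size instead of hard-coding it
classifier[-1] = nn.Linear(in_features, 10)   # new head: 10 CIFAR10 classes

logits = classifier(torch.randn(4, 32))       # batch of 4 fake feature vectors
print(logits.shape)
```

The new layer starts from random weights, so it is the part of the network that has to be learned from scratch during fine-tuning.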

### Model Training

<p> I spent too much time trying to debug the training loop when it kept crashing my laptop. I tried again on my desktop, and it turned out my laptop just couldn't handle it. The training loop took around 20-25 minutes to complete with the original parameters, but I managed to get it down to around 6 minutes. It was fun playing around with this to make the loop run more efficiently while trying to maintain around 80% accuracy. </p>
<br>
Training: Find patterns in a given data set <br>
Validation: Tune hyperparameters and evaluate the model on a separate portion of the data set <br>
Testing: Assess performance of the model with unseen data

In [14]:
N_EPOCHS = 12

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_vgg.parameters(),
                      lr=0.010, 
                      momentum=0.9)

for epoch in range(N_EPOCHS):

    # TRAINING
    train_loss = 0.0
    model_vgg.train()
    print("TRAINING") 
    for inputs, labels in train_loader:
        inputs = inputs.to(device) 
        labels = labels.to(device)

        optimizer.zero_grad()   # forget errors of previous pass, let's start fresh

        outputs = model_vgg(inputs)
        loss = criterion(outputs, labels)   # compute loss
        
        loss.backward()         # backpropagation; compute gradients
        
        optimizer.step()        # adjust parameters based on gradient 

        train_loss += loss.item()
    print("Epoch: {} Training Loss: {}".format(epoch, 
                  train_loss/len(train_loader)))
    
    # VALIDATION
    val_loss = 0.0
    model_vgg.eval()
    print("VALIDATION")
    with torch.no_grad():   # no gradients needed for validation
        for inputs, labels in val_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model_vgg(inputs)
            loss = criterion(outputs, labels)

            val_loss += loss.item()
    print("Epoch: {} Train Loss: {} Val Loss: {}".format(
                  epoch, 
                  train_loss/len(train_loader), 
                  val_loss/len(val_loader)))

TRAINING
Epoch: 0 Training Loss: 1.0472834789300267
VALIDATION
Epoch: 0 Train Loss: 1.0472834789300267 Val Loss: 0.7708707064390182
TRAINING
Epoch: 1 Training Loss: 0.6240437034564682
VALIDATION
Epoch: 1 Train Loss: 0.6240437034564682 Val Loss: 0.5693194717168808
TRAINING
Epoch: 2 Training Loss: 0.4981967591786686
VALIDATION
Epoch: 2 Train Loss: 0.4981967591786686 Val Loss: 0.5287001490592956
TRAINING
Epoch: 3 Training Loss: 0.4372568458695955
VALIDATION
Epoch: 3 Train Loss: 0.4372568458695955 Val Loss: 0.45301304906606676
TRAINING
Epoch: 4 Training Loss: 0.38682634241973296
VALIDATION
Epoch: 4 Train Loss: 0.38682634241973296 Val Loss: 0.4452690929174423
TRAINING
Epoch: 5 Training Loss: 0.350644817835168
VALIDATION
Epoch: 5 Train Loss: 0.350644817835168 Val Loss: 0.42349734008312223
TRAINING
Epoch: 6 Training Loss: 0.3198577593776244
VALIDATION
Epoch: 6 Train Loss: 0.3198577593776244 Val Loss: 0.40564699172973634
TRAINING
Epoch: 7 Training Loss: 0.2948030256017854
VALIDATION
Epoch: 7 T

### Testing

In [15]:
num_correct = 0.0

model_vgg.eval()            # evaluation mode: disables dropout
with torch.no_grad():       # no gradients needed for testing
    for x_test_batch, y_test_batch in test_loader:
        y_test_batch = y_test_batch.to(device)
        x_test_batch = x_test_batch.to(device)

        y_pred_batch = model_vgg(x_test_batch)
        _, predicted = torch.max(y_pred_batch, 1)
        num_correct += (predicted == y_test_batch).float().sum()

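# Note: len(test_loader) * test_loader.batch_size is 20 * 512 = 10,240, but the test
# set has 10,000 images (the last batch is partial), so this understates accuracy;
# dividing by len(test_loader.dataset) would give the exact value.
vgg_accuracy = num_correct/(len(test_loader)*test_loader.batch_size)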

print(len(test_loader), test_loader.batch_size)
print("Test Accuracy: {}".format(vgg_accuracy))

20 512
Test Accuracy: 0.83935546875


### VGG16 vs LeNet5

##### LeNet class definition

In [16]:
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5) # <1>
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, int(x.nelement() / x.shape[0]))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

device = "cuda" if torch.cuda.is_available() else "cpu"   # fall back to CPU when no GPU is present
model_lenet = LeNet5().to(device=device)
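The `16 * 5 * 5` input size of `fc1` follows from the spatial sizes in the forward pass. A minimal sketch of that arithmetic, assuming stride-1 unpadded 5x5 convolutions and 2x2 non-overlapping pooling as in the class above (the helper names are illustrative):

```python
def conv_out(size: int, kernel: int) -> int:
    """Output width of a stride-1, unpadded convolution."""
    return size - kernel + 1


def pool_out(size: int, window: int = 2) -> int:
    """Output width of non-overlapping max pooling."""
    return size // window


size = 32                                   # CIFAR10 images are 32x32
size = pool_out(conv_out(size, kernel=5))   # conv1 + pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, kernel=5))   # conv2 + pool: 14 -> 10 -> 5
flattened = 16 * size * size                # 16 channels leave conv2
print(size, flattened)
```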

##### LeNet training loop 
<p> Noticed that an excessively high learning rate tends to lead to poor performance. Probably from larger adjustments to model parameters and overshooting the optimal solution.</p>
<p> It also tends to lead to a loss plateau. </p>
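The overshooting effect can be seen on a toy problem. The sketch below minimizes f(x) = x**2 with plain gradient descent (no momentum, which is a simplification compared with the SGD optimizer used here); the function name and the learning rates 0.1 and 1.1 are chosen only for illustration.

```python
def gradient_descent(lr: float, steps: int = 20, start: float = 1.0) -> float:
    """Minimize f(x) = x**2 with plain gradient descent; returns the final x."""
    x = start
    for _ in range(steps):
        grad = 2.0 * x       # derivative of x**2
        x = x - lr * grad    # each step multiplies x by (1 - 2 * lr)
    return x


small = gradient_descent(lr=0.1)   # factor 0.8 per step: shrinks toward 0
large = gradient_descent(lr=1.1)   # factor -1.2 per step: oscillates and grows
print(abs(small), abs(large))
```

With the small rate the iterate moves steadily toward the minimum at 0; with the large rate every step jumps past the minimum and lands farther away than it started.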

In [17]:
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model_lenet.parameters(),
                      lr=0.015,  
                      momentum=0.9)

# TRAINING
for epoch in range(N_EPOCHS):

    # Training 
    train_loss = 0.0
    model_lenet.train()
    for inputs, labels in train_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()

        outputs = model_lenet(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    # Validation
    val_loss = 0.0
    model_lenet.eval()
    with torch.no_grad():   # no gradients needed for validation
        for inputs, labels in val_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model_lenet(inputs)
            loss = criterion(outputs, labels)

            val_loss += loss.item()

    print("Epoch: {} Train Loss: {} Val Loss: {}".format(
                  epoch, 
                  train_loss/len(train_loader), 
                  val_loss/len(val_loader)))


Epoch: 0 Train Loss: 2.2516923886311204 Val Loss: 2.0793440103530885
Epoch: 1 Train Loss: 1.9201205199277853 Val Loss: 1.7683249354362487
Epoch: 2 Train Loss: 1.7125612814215165 Val Loss: 1.7482620179653168
Epoch: 3 Train Loss: 1.6314312310158452 Val Loss: 1.5617895245552063
Epoch: 4 Train Loss: 1.540311656420744 Val Loss: 1.501370108127594
Epoch: 5 Train Loss: 1.49169121361986 Val Loss: 1.5180115044116973
Epoch: 6 Train Loss: 1.4591672646848461 Val Loss: 1.4335203528404237
Epoch: 7 Train Loss: 1.4107440906234934 Val Loss: 1.4246290266513824
Epoch: 8 Train Loss: 1.3928079891808425 Val Loss: 1.3620448529720306
Epoch: 9 Train Loss: 1.3453128051154222 Val Loss: 1.328314608335495
Epoch: 10 Train Loss: 1.308727303637734 Val Loss: 1.4007271468639373
Epoch: 11 Train Loss: 1.2928386715394031 Val Loss: 1.267656636238098


##### Testing

In [18]:
num_correct = 0.0

model_lenet.eval()          # evaluation mode
with torch.no_grad():       # no gradients needed for testing
    for x_test_batch, y_test_batch in test_loader:
        y_test_batch = y_test_batch.to(device)
        x_test_batch = x_test_batch.to(device)

        y_pred_batch = model_lenet(x_test_batch)
        _, predicted = torch.max(y_pred_batch, 1)
        num_correct += (predicted == y_test_batch).float().sum()

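# Note: len(test_loader) * test_loader.batch_size is 20 * 512 = 10,240, but the test
# set has 10,000 images (the last batch is partial), so this understates accuracy;
# dividing by len(test_loader.dataset) would give the exact value.
lenet_accuracy = num_correct/(len(test_loader)*test_loader.batch_size)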

print(len(test_loader), test_loader.batch_size)
print("LeNet5 Test Accuracy: {}".format(lenet_accuracy))
print("VGG16 Test Accuracy: {}".format(vgg_accuracy))

20 512
LeNet5 Test Accuracy: 0.5594726800918579
VGG16 Test Accuracy: 0.83935546875
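The accuracies above divide by `len(test_loader) * test_loader.batch_size` (20 * 512 = 10,240), while the CIFAR10 test set holds 10,000 images. As a hedged back-of-the-envelope check, the sketch below recovers the number of correct predictions from the printed figures and divides by 10,000 instead; the variable names are illustrative, and the only inputs are the numbers printed above.

```python
n_batches, batch_size = 20, 512          # printed by the test cells
n_test_images = 10000                    # size of the CIFAR10 test split

reported = {"VGG16": 0.83935546875, "LeNet5": 0.5594726800918579}

corrected = {}
for name, accuracy in reported.items():
    n_correct = round(accuracy * n_batches * batch_size)   # undo the 10,240 denominator
    corrected[name] = (n_correct, n_correct / n_test_images)
    print(f"{name}: {n_correct} correct, accuracy {n_correct / n_test_images:.4f}")
```

Under that assumption both models score about two percentage points higher than printed, and the gap between them is essentially unchanged.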


`Please compare these two performances on CIFAR10. Why is one better than another?`
<p> VGG16 took longer to train but was much more accurate. LeNet5 had faster iterations but would need many more epochs to approach VGG16's accuracy, and with so few layers it may not reach it at all. VGG16 is a much deeper network: 16 weight layers (13 convolutional and 3 fully connected) plus 5 max-pooling layers. This depth lets it learn more complex patterns than LeNet5's 2 convolutional and 3 linear layers. VGG16 also started from pretrained weights, while LeNet5 was trained from scratch. </p>
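To put the size difference in numbers, here is a small sketch that counts the trainable parameters of LeNet5 from the layer shapes in its class definition, and of the replacement VGG16 head, `nn.Linear(4096, 10)`. The helper names are illustrative. For scale, the standard VGG16 has roughly 138 million parameters; that figure is quoted from memory and is not computed here.

```python
def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    return c_in * c_out * kernel * kernel + c_out   # weights + biases


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out                     # weights + biases


lenet5_layers = {
    "conv1": conv_params(3, 6, 5),
    "conv2": conv_params(6, 16, 5),
    "fc1": linear_params(16 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
lenet5_total = sum(lenet5_layers.values())
vgg16_new_head = linear_params(4096, 10)   # the replaced final layer alone
print(lenet5_layers, lenet5_total, vgg16_new_head)
```

The whole of LeNet5 (about 62 thousand parameters) is the same order of magnitude as the single 10-class output layer attached to VGG16.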