# Deep Learning Homework 3
<p> Thomas Dougherty <br>
9/28/2023 <br>
Deep Learning</p>


### Load Data

In [1]:
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F

import torchvision
from torchvision import transforms
from torchvision import models
from torchvision.datasets import CIFAR10


# download training data
train_data = CIFAR10(root="./train/",
                     train=True,
                     download=True,
                     transform=None)

print(train_data)


Files already downloaded and verified
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./train/
    Split: Train


### Accessing the Data

`How do you access the label?` <br>
<p> train_data[i] returns an (image, label) tuple, so the label is element [1] of that tuple and can be read with train_data[i][1]. </p>

In [2]:
label = train_data[16][1]
data_class = train_data.classes[label]    # map the integer label to its class name
print(f'Data Label: {label}')    # print the label of the tuple
print(f'Class: {data_class}')    # print the class of the image

Data Label: 9
Class: truck


`What method is called when you index into a Dataset?`

Indexing into a Dataset calls its `__getitem__(self, index)` method, so `train_data[16]` is equivalent to `train_data.__getitem__(16)`. For CIFAR10, the method takes an integer index and returns an `(image, target)` tuple, where target is the integer class label.
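
As a rough illustration of that protocol, here is a pure-Python sketch. `ToyDataset` and its fields are hypothetical stand-ins (not torch or torchvision code) that mimic how a map-style dataset such as CIFAR10 answers `len()` and `[]`:

```python
# Hypothetical stand-in for a map-style dataset such as CIFAR10 (not torch code)
class ToyDataset:
    def __init__(self, images, targets, classes):
        self.images = images      # stand-ins for PIL images
        self.targets = targets    # integer class indices
        self.classes = classes    # index -> class name

    def __len__(self):
        # len(dataset) calls this
        return len(self.images)

    def __getitem__(self, index: int):
        # dataset[index] calls this and returns an (image, target) tuple
        return self.images[index], self.targets[index]


CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
toy = ToyDataset(images=["img0", "img1"], targets=[9, 1], classes=CLASSES)

image, label = toy[0]                     # same as ToyDataset.__getitem__(toy, 0)
print(image, label, toy.classes[label])   # label 9 maps to "truck"
```

The bracket syntax is only sugar: Python looks up `__getitem__` on the object's class and calls it with the index.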

`Is CIFAR10 a class that is derived from the Dataset class?`

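Yes. CIFAR10 subclasses torchvision's `VisionDataset`, which in turn subclasses `torch.utils.data.Dataset`. The base classes do not provide working versions of the indexing methods (`Dataset.__getitem__` raises `NotImplementedError`), so CIFAR10 overrides `__getitem__` and `__len__` with its own implementations.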

`Inheritance Tree`
<p> CIFAR10 → VisionDataset (torchvision.datasets) → Dataset (torch.utils.data) → Generic (typing) → object </p>
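
One way to check an inheritance tree instead of reading source code is to walk the class's method resolution order (`__mro__`). The sketch below defines a small helper, `inheritance_chain` (a hypothetical name), and demonstrates it on toy classes that only mirror the shape of the real hierarchy:

```python
def inheritance_chain(cls):
    """Return the names of cls and its ancestors in method resolution order."""
    return [klass.__name__ for klass in cls.__mro__]


# Hypothetical stand-ins mirroring the shape of the real hierarchy
class ToyDataset:
    def __getitem__(self, index):
        raise NotImplementedError   # subclasses must implement indexing


class ToyVisionDataset(ToyDataset):
    pass


class ToyCIFAR10(ToyVisionDataset):
    def __getitem__(self, index):
        return ("image", 0)         # overrides the base-class method


print(" -> ".join(inheritance_chain(ToyCIFAR10)))
```

In the notebook, `inheritance_chain(CIFAR10)` should print the real chain; with recent torchvision releases this is expected to be CIFAR10, VisionDataset, Dataset, Generic, object, though the exact list may vary by version.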

### Data Transforms

Pre-process the data before running it through the neural network.

In [3]:
# TRAINING DATA

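# taking mean and std values from the book
train_transform = transforms.Compose([
    transforms.ToTensor(),                  # PIL image -> float tensor scaled to [0, 1]
    transforms.RandomHorizontalFlip(),      # augmentation: random left-right flip
    transforms.RandomCrop(32, padding=4),   # augmentation: zero-pad by 4, crop back to 32x32
    transforms.Normalize(mean=(0.4914, 0.4822, 0.4465),
                         std=(0.2023, 0.1994, 0.2010)),
])

# download is omitted here; the files were already downloaded above
train_data_xform = CIFAR10(root="./train/",
                           train=True,
                           transform=train_transform)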

data, label = train_data[16]
print(f'Training data without transforms: {data}')
print(f'Data Label: {label}')

data, label = train_data_xform[16]
print(f'Training data with transforms: {data}')
print(f'Data Label: {label}')


# TESTING DATA
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        (0.4914, 0.4822, 0.4465),
        (0.2023, 0.1994, 0.2010))
    ])

test_data = CIFAR10(root="./test/",
                     train=False,
                     transform=test_transform,
                     download=True)

Training data without transforms: <PIL.Image.Image image mode=RGB size=32x32 at 0x7FA62FCB61D0>
Data Label: 9
Training data with transforms: tensor([[[ 2.2039,  2.2427,  2.2621,  ..., -2.4291, -2.4291, -2.4291],
         [ 2.2233,  2.3009,  2.3202,  ..., -2.4291, -2.4291, -2.4291],
         [ 2.2621,  2.3202,  2.3590,  ..., -2.4291, -2.4291, -2.4291],
         ...,
         [ 0.2461,  0.1879,  0.1685,  ..., -2.4291, -2.4291, -2.4291],
         [ 0.2461,  0.2073,  0.1491,  ..., -2.4291, -2.4291, -2.4291],
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291]],

        [[ 2.3018,  2.3411,  2.3608,  ..., -2.4183, -2.4183, -2.4183],
         [ 2.3215,  2.3805,  2.4198,  ..., -2.4183, -2.4183, -2.4183],
         [ 2.3805,  2.4198,  2.4591,  ..., -2.4183, -2.4183, -2.4183],
         ...,
         [ 0.2564,  0.1974,  0.1778,  ..., -2.4183, -2.4183, -2.4183],
         [ 0.2564,  0.2171,  0.1581,  ..., -2.4183, -2.4183, -2.4183],
         [-2.4183, -2.4183, -2.4183,  ..., -2.41

`When you instantiate train_data the second time, with the transform, try without download=True. Look at the API. What does it say?`
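<p>With download=True, the dataset is downloaded into the root directory if it is not already there; if it is, the files are checked and "Files already downloaded and verified" is printed. download defaults to False, in which case the constructor only checks that the files already exist in root and raises a RuntimeError if they are missing or corrupted. The second instantiation works without download=True because the first call already downloaded the data to ./train/.</p>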

`What is the difference between training and testing transforms? Training is supposed to "see" more data variability and that is why we provide augmentations of the original data through transforms. Why do you think the test dataset has a different transform?`

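The training transform adds random augmentations (horizontal flips and padded random crops), whereas the test transform only converts the image to a tensor and normalizes it with the same channel means and standard deviations. Augmentation exposes the model to more variability during training so that it generalizes better to unseen data. The test set, by contrast, should measure performance on unaltered images like the ones the model will face in practice, and it needs the same normalization so its inputs match what the model was trained on. Randomly augmenting the test data would make the accuracy non-repeatable and would measure performance on distorted images rather than on the real data distribution.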

`Please do` <br>
`data, label = train_data[index] in both cases (with and without transforms).`<br>
`Why is your result different when you apply transforms?`

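Without transforms, `train_data[index]` returns the image as a PIL `Image` object, so printing it only shows the object's description (color mode, size, and memory address) rather than pixel values. With the transform, `ToTensor()` converts the image to a float tensor of shape (3, 32, 32) scaled to [0, 1], the random flip and crop augment it, and `Normalize` subtracts each channel's mean and divides by its standard deviation. The result is a tensor of numbers that the model can consume directly.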
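
The long runs of -2.4291 (channel 0) and -2.4183 (channel 1) in the transformed output are consistent with zero-valued pixels, such as the zero padding that `RandomCrop(32, padding=4)` adds before cropping. The sketch below reproduces that arithmetic for a single value; `normalize_pixel`, `MEAN`, and `STD` are illustrative names, and the function only mimics `ToTensor()` followed by `Normalize()` for one channel:

```python
def normalize_pixel(value_uint8: int, mean: float, std: float) -> float:
    """Mimic ToTensor() then Normalize() for one 8-bit channel value."""
    x = value_uint8 / 255.0      # ToTensor scales 0..255 to 0.0..1.0
    return (x - mean) / std      # Normalize: (x - mean) / std, per channel


MEAN = (0.4914, 0.4822, 0.4465)   # values used in train_transform above
STD = (0.2023, 0.1994, 0.2010)

# A zero-valued pixel (e.g. padding) in each of the three channels
for c in range(3):
    print(f"channel {c}: {normalize_pixel(0, MEAN[c], STD[c]):.4f}")
```

A zero input maps to `-mean / std`, which is the most negative value any channel can take after normalization.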

### Data Batching
<p> I've noticed that a bigger batch size results in faster training, likely because it takes better advantage of GPU parallelism. However, it also leads to a higher loss value after the same number of epochs. Each gradient is averaged over more samples, so it is smoother and extreme or outlying datapoints have less influence, but the model also takes fewer optimizer steps per epoch. The learning rate and the number of epochs have to be adjusted to get accurate results. </p>
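
One plausible contributor to the higher loss after a fixed number of epochs is simply how many parameter updates happen per epoch. This sketch assumes the 50,000 CIFAR10 training images and DataLoader's default `drop_last=False` (which keeps the final partial batch); `steps_per_epoch` is a hypothetical helper:

```python
import math

N_TRAIN = 50_000  # CIFAR10 training images, as printed by the dataset above


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    # drop_last=False keeps the final partial batch, hence the ceiling
    return math.ceil(n_samples / batch_size)


for bs in (64, 128, 256, 512):
    print(f"batch_size={bs:4d} -> {steps_per_epoch(N_TRAIN, bs)} optimizer steps per epoch")
```

With `batch_size=512`, each epoch makes roughly eight times fewer updates than with 64, which is one reason the learning rate or epoch count may need to grow with the batch size.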

In [4]:
# Dataloader does all the work of shuffling data between batches and training cycles (epochs)

train_loader = torch.utils.data.DataLoader(
    train_data_xform,
    batch_size = 512,           # Bigger batch size results in faster training times, and higher memory usage
    shuffle = True)

# pull a single batch from the loader to check its shape
data_batch, labels_batch = next(iter(train_loader))

print(data_batch.size())
print(labels_batch.size())

test_loader = torch.utils.data.DataLoader(
    test_data,
    batch_size = 512,
    shuffle = False) # set shuffle to false for the testing data for repeatable results


torch.Size([512, 3, 32, 32])
torch.Size([512])
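
Conceptually, a DataLoader with the default sampler shuffles the sample indices (when `shuffle=True`) and slices them into consecutive batches; it then collates the samples into tensors such as the `[512, 3, 32, 32]` batch above. The pure-Python sketch below covers only the indexing part, and `iterate_minibatches` is a hypothetical helper, not the DataLoader implementation:

```python
import random


def iterate_minibatches(n_samples, batch_size, shuffle, seed=None):
    """Yield lists of sample indices, similar in spirit to DataLoader's default sampling."""
    indices = list(range(n_samples))
    if shuffle:
        # with seed=None the order differs on every call (i.e. every epoch)
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # the final batch may be smaller (like drop_last=False)
        yield indices[start:start + batch_size]


for batch in iterate_minibatches(10, 4, shuffle=True, seed=0):
    print(batch)
```

Every index appears exactly once per epoch, which is why `shuffle=False` on the test loader gives the same batches on every run.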


### Model Design

In [5]:
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)     # ImageNet-pretrained weights (equivalent to the deprecated pretrained=True); fine-tuning converges faster

# Print layers of the neural network
print("Neural Network Layers: ")
print(vgg16)



Neural Network Layers: 
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=

In [6]:
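# replace the final Linear layer (4096 -> 1000 ImageNet classes) with a new definition
# There are 10 classes in the CIFAR10 dataset, so 10 possible output features
vgg16.classifier[-1] = nn.Linear(4096, 10)

# use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = vgg16.to(device=device)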
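
The replacement layer starts from random initialization while the rest of VGG16 keeps its pretrained weights. For a sense of scale, an `nn.Linear(in_features, out_features)` layer holds a weight matrix of shape `(out_features, in_features)` plus one bias per output. The helper `linear_params` below is a hypothetical illustration of that count:

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    # weight matrix (out_features x in_features) plus one bias per output
    return in_features * out_features + (out_features if bias else 0)


imagenet_head = linear_params(4096, 1000)  # VGG16's original last layer (1000 ImageNet classes)
cifar_head = linear_params(4096, 10)       # replacement layer for CIFAR10's 10 classes
print(imagenet_head, cifar_head)
```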

### Model Training

<p> I spent too much time trying to debug the training loop when it kept crashing my laptop. I tried again on my desktop, and it turned out my laptop just couldn't handle it. The training loop took around 20-25 minutes to complete with the original parameters, but I managed to get it down to around 6 minutes. It was fun experimenting to make the loop run more efficiently while keeping accuracy around 80%. </p>

In [7]:
print("Setting criterion...")
criterion = nn.CrossEntropyLoss()
print("Implementing stochastic gradient descent...")
optimizer = optim.SGD(model.parameters(),
                      lr=0.010, 
                      momentum=0.9)


N_EPOCHS = 8
for epoch in range(N_EPOCHS):
    print(f'Starting epoch {epoch}...')
    epoch_loss = 0.0
    for inputs, labels in train_loader:
        inputs = inputs.to(device) 
        labels = labels.to(device)

        optimizer.zero_grad()   # clear gradients accumulated by the previous step

        outputs = model(inputs)
        loss = criterion(outputs, labels)   # compute loss

        loss.backward()         # backpropagation; compute gradients

        optimizer.step()        # update parameters using the gradients

        epoch_loss += loss.item()
    print("Epoch: {} Loss: {}".format(epoch, 
                  epoch_loss/len(train_loader)))

Setting criterion...
Implementing stochastic gradient descent...
Starting epoch 0...
Epoch: 0 Loss: 1.001998674504611
Starting epoch 1...
Epoch: 1 Loss: 0.5679168841060327
Starting epoch 2...
Epoch: 2 Loss: 0.4683391150771355
Starting epoch 3...
Epoch: 3 Loss: 0.4046045754637037
Starting epoch 4...
Epoch: 4 Loss: 0.3539463248179883
Starting epoch 5...
Epoch: 5 Loss: 0.33032100450019447
Starting epoch 6...
Epoch: 6 Loss: 0.2879910795968406
Starting epoch 7...
Epoch: 7 Loss: 0.2691903699721609
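
`optim.SGD(..., momentum=0.9)` keeps a running "velocity" buffer for every parameter. The sketch below applies the update rule described in the `torch.optim.SGD` documentation (with `dampening=0`, `nesterov=False`, and no weight decay) to one scalar parameter minimizing f(theta) = theta**2. `sgd_momentum_step` is a hypothetical helper for illustration, not PyTorch's implementation:

```python
def sgd_momentum_step(param, grad, buf, lr=0.01, momentum=0.9):
    """One update in the style of torch.optim.SGD (dampening=0, nesterov=False)."""
    # the first step initializes the buffer with the raw gradient
    buf = grad if buf is None else momentum * buf + grad
    return param - lr * buf, buf


theta, buf = 5.0, None
for step in range(500):
    grad = 2 * theta            # gradient of f(theta) = theta**2
    theta, buf = sgd_momentum_step(theta, grad, buf)
print(theta)                    # close to the minimum at 0
```

Because the buffer accumulates past gradients, consistent gradient directions build up speed, which is part of why the same `lr=0.010` works for both models here.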


### Testing

In [8]:
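model.eval()            # put VGG16's dropout layers into inference mode
num_correct = 0.0

with torch.no_grad():   # gradients are not needed for evaluation
    for x_test_batch, y_test_batch in test_loader:
        y_test_batch = y_test_batch.to(device)
        x_test_batch = x_test_batch.to(device)

        y_pred_batch = model(x_test_batch)
        _, predicted = torch.max(y_pred_batch, 1)   # highest logit -> predicted class index
        num_correct += (predicted == y_test_batch).float().sum()

# len(test_loader) * batch_size = 20 * 512 = 10240 exceeds the 10000 test images
# (the last batch holds 272), so this ratio slightly understates accuracy;
# len(test_loader.dataset) gives the exact number of test images
vgg_accuracy = num_correct/(len(test_loader)*test_loader.batch_size)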

print(len(test_loader), test_loader.batch_size)
print("Test Accuracy: {}".format(vgg_accuracy))

20 512
Test Accuracy: 0.8656250238418579
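
The printed accuracy divides by `len(test_loader) * test_loader.batch_size`, but the CIFAR10 test split has 10,000 images and the final batch is only partially full. Assuming the printed value is exactly `num_correct / 10240`, the arithmetic below recovers the implied number of correct predictions and the accuracy over the real test-set size (the constant names are illustrative):

```python
import math

N_TEST = 10_000   # CIFAR10 test split size
BATCH = 512

n_batches = math.ceil(N_TEST / BATCH)           # matches len(test_loader) printed above
last_batch = N_TEST - (n_batches - 1) * BATCH   # images in the final, partial batch
denominator_used = n_batches * BATCH            # what the cell above divides by

reported = 0.8656250238418579                   # VGG16 "Test Accuracy" printed above
num_correct = round(reported * denominator_used)
print(n_batches, last_batch, num_correct, num_correct / N_TEST)
```

Dividing by `len(test_loader.dataset)` in the evaluation cell would report the corrected figure directly.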


### VGG16 vs LeNet5 on CIFAR10

##### LeNet class definition

In [9]:
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 RGB input channels, 6 filters, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, int(x.nelement() / x.shape[0]))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

device = "cuda" if torch.cuda.is_available() else "cpu"   # fall back to the CPU without a GPU
model = LeNet5().to(device=device)
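
The `16 * 5 * 5` input size of `fc1` follows from how each unpadded 5x5 convolution and each 2x2 max pool shrinks a 32x32 CIFAR10 image. The sketch below traces that with the standard output-size formula (stride 1 for the convolutions, stride equal to the kernel size for the pools, no padding or dilation) and also counts LeNet5's parameters; `conv2d_out`, `conv_params`, and `fc_params` are hypothetical helpers:

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    # output size for a conv or pooling layer with dilation=1
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k + c_out   # weights plus one bias per filter


def fc_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out           # weights plus one bias per output


s = 32                       # CIFAR10 images are 32x32
s = conv2d_out(s, 5)         # conv1 (5x5) -> 28
s = conv2d_out(s, 2, 2)      # 2x2 max pool -> 14
s = conv2d_out(s, 5)         # conv2 (5x5) -> 10
s = conv2d_out(s, 2, 2)      # 2x2 max pool -> 5
flat = 16 * s * s            # 16 feature maps of 5x5, fc1's in_features

total = (conv_params(3, 6, 5) + conv_params(6, 16, 5)
         + fc_params(flat, 120) + fc_params(120, 84) + fc_params(84, 10))
print(s, flat, total)
```

LeNet5 ends up with only tens of thousands of parameters, far fewer than the VGG16 classifier's final layer alone had before replacement.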

##### LeNet training loop 
<p> Using the same hyperparameters as the VGG16 run </p>

In [10]:
print("Setting criterion...")
criterion = nn.CrossEntropyLoss()
print("Implementing stochastic gradient descent...")
optimizer = optim.SGD(model.parameters(),
                      lr=0.010, 
                      momentum=0.9)


N_EPOCHS = 8
for epoch in range(N_EPOCHS):
    print(f'Starting epoch {epoch}...')
    epoch_loss = 0.0
    for inputs, labels in train_loader:
        inputs = inputs.to(device) 
        labels = labels.to(device)

        optimizer.zero_grad()   # clear gradients accumulated by the previous step

        outputs = model(inputs)
        loss = criterion(outputs, labels)   # compute loss

        loss.backward()         # backpropagation; compute gradients

        optimizer.step()        # update parameters using the gradients

        epoch_loss += loss.item()
    print("Epoch: {} Loss: {}".format(epoch, 
                  epoch_loss/len(train_loader)))

Setting criterion...
Implementing stochastic gradient descent...
Starting epoch 0...
Epoch: 0 Loss: 2.2525579953680235
Starting epoch 1...
Epoch: 1 Loss: 1.928471863269806
Starting epoch 2...
Epoch: 2 Loss: 1.7487856198330314
Starting epoch 3...
Epoch: 3 Loss: 1.6546363648103208
Starting epoch 4...
Epoch: 4 Loss: 1.5817009161929696
Starting epoch 5...
Epoch: 5 Loss: 1.534278276015301
Starting epoch 6...
Epoch: 6 Loss: 1.479981238744697
Starting epoch 7...
Epoch: 7 Loss: 1.4372382419449943


##### Testing

In [11]:
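model.eval()            # LeNet5 has no dropout, but eval mode is good practice
num_correct = 0.0

with torch.no_grad():   # gradients are not needed for evaluation
    for x_test_batch, y_test_batch in test_loader:
        y_test_batch = y_test_batch.to(device)
        x_test_batch = x_test_batch.to(device)

        y_pred_batch = model(x_test_batch)
        _, predicted = torch.max(y_pred_batch, 1)   # highest logit -> predicted class index
        num_correct += (predicted == y_test_batch).float().sum()

# len(test_loader) * batch_size = 10240 exceeds the 10000 test images,
# so this ratio slightly understates accuracy (same denominator as the VGG16 cell)
lenet_accuracy = num_correct/(len(test_loader)*test_loader.batch_size)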

print(len(test_loader), test_loader.batch_size)
print("LeNet5 Test Accuracy: {}".format(lenet_accuracy))
print("VGG16 Test Accuracy: {}".format(vgg_accuracy))

20 512
LeNet5 Test Accuracy: 0.525097668170929
VGG16 Test Accuracy: 0.8656250238418579


`Please compare these two performances on CIFAR10. Why is one better than another?`
<p> 