### HW 4, Part 2, Start
### CSCI 4270 and 6270, Spring 2024

This is starter code for HW 4, Part 2. Most important is the definition of the Dataset object for loading, separately, the train, validation and test image sets. Students can use as much or as little of this as they wish and can modify it in anyway they'd like

In [1]:
import numpy as np
import os 
import torch
import torchvision.transforms as transforms
from PIL import Image
from torch.utils.data import Dataset

In [2]:
def is_image(fn):
    extensions = ['.jpg', '.jpeg', '.png']
    return any(fn.lower().endswith(ext) for ext in extensions)

def find_images_in_folder(folder_path, verbose=False):
    full_image_paths = []
    # Iterate through all files in the folder
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        # Check if the file is an image
        if os.path.isfile(file_path) and is_image(filename):
            # Try opening the image
            try:
                im = Image.open(file_path)
                full_image_paths.append(file_path)
                if verbose:
                    print(f"Read image: {file_path}")
            except Exception as e:
                print(f"Error failed to read {filename}: {e}")
#     print(f'Returing {len(full_image_paths)} image paths')
    return full_image_paths

folder_path = "hw4_data/valid/ocean"
full_paths = find_images_in_folder(folder_path, verbose=False)


In [3]:
'''
Provide a Dataset object for the five class dataset.
'''

# These are empirically determined values to optimize image intensity rescaling prior to training
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

'''
The Dataset class we write must include the __init__, __len__ and __getitem__ (subscripting) 
methods.
'''
class HW4_Dataset(Dataset):
    def __init__(self, path, class_names, new_size=None, verbose=False):
        '''
        Produce a list of the full image paths and class indices for all images
        in the given set (found along the path).  Record a transform to be
        applied by the __getitem__ method to each image.
        '''
        self.full_image_paths = []
        self.class_names = class_names
        self.gt_class_idx = []
        for idx, nm in enumerate(class_names):
            folder_path = os.path.join(path, nm)
            image_paths = find_images_in_folder(folder_path, verbose)
            self.full_image_paths += image_paths
            self.gt_class_idx += [idx] * len(image_paths)

        if new_size is not None:
            self.transform = transforms.Compose([transforms.Resize(new_size),
                                                 transforms.ToTensor(),
                                                 transforms.Normalize(mean=MEAN, std=STD)])
        else:
            self.transform = transforms.Compose([transforms.ToTensor(),
                                                 transforms.Normalize(mean=MEAN, std=STD)])

    def __len__(self):
        return len(self.full_image_paths)

    def __getitem__(self, idx):
        fp = self.full_image_paths[idx]
        class_i = self.gt_class_idx[idx]
        one_hot = np.zeros(5)
        one_hot[class_i] = 1
        one_hot_vector = torch.tensor(one_hot)
        im = Image.open(fp)
        im = self.transform(im)
        return im, one_hot_vector
        
        
class_names = ['grass', 'ocean', 'redcarpet', 'road', 'wheatfield']

new_size = 240   # This reduces the original 240x360 images to 60x90.  Setting it to 240 leaves the images unchanged
# new_size = None # Setting new_size to None keeps the original image size.
verbose = False

# Form all three datasets.
train_dataset = HW4_Dataset("hw4_data/train", class_names, new_size=new_size, verbose=verbose)
valid_dataset = HW4_Dataset("hw4_data/valid", class_names, new_size=new_size, verbose=verbose)
test_dataset = HW4_Dataset("hw4_data/test", class_names, new_size=new_size, verbose=verbose)


In [4]:
'''
Explore the constructed dataset
'''
import random
import matplotlib.pyplot as plt

# Find and output the number of images
n = len(valid_dataset)
# print(f'The validation dataset has {n} images')

# Randomly shuffle the image indices
indices = list(range(n))
random.shuffle(indices)

# Get the image and the class id of the 0th image after the shuffle.
im, class_idx = valid_dataset[indices[0]]
# print(f'After the shuffle the 0th image has class index {class_idx}')

# Convert the image from an array back to a numpy 3d array
im_np = im.numpy().transpose((1, 2, 0))
# print(f'Image shape is {im_np.shape}')

# Before displaying the image rescale the intensities to be between 0 and 1
im_min = im_np.min()
im_max = im_np.max()
im_np = (im_np - im_min) / (im_max - im_min)

# Display the image
# plt.imshow(im_np)
# plt.axis('off')
# plt.show()
# print(class_idx)

In [6]:
from torch.utils.data import DataLoader
batch_size = 32
train_dataloader = DataLoader(train_dataset, batch_size=batch_size)
valid_dataloader = DataLoader(valid_dataset, batch_size=batch_size)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size)

for X, y in test_dataloader:
#     print(f"Shape of X [N, C, H, W]: {X.shape}")
#     print(f"Shape of y: {y.shape} {y.dtype}")
    break
    
device = "cuda" if torch.cuda.is_available() else "cpu"
# print(f"Using {device} device")

#conv nn from MNIST example
from torch import nn
class NeuralNetwork_Conv(nn.Module):
    def __init__(self):
        super(NeuralNetwork_Conv, self).__init__()
        self.conv_stack = nn.Sequential(
            #modified input (feature depth) to match input tensors depth
            #also doesn't change height/width
            nn.Conv2d(3, 16, 3, stride=1, padding=1),
            nn.ReLU(),
            #halves height/width (now tensor is 120x180)
            nn.MaxPool2d(2,2),
            nn.Conv2d(16, 32, 3, stride=1, padding=1),
            nn.ReLU(),
            #halves height/width, now 60x90
            nn.MaxPool2d(2,2),
            nn.Conv2d(32, 32, 3, stride=1, padding=1),
            nn.ReLU(),
        )

        self.fc_stack = nn.Sequential(
            nn.Flatten(),
            #singular fully connected, input is flattened img so 15x22 and output from convolutional
            nn.Linear(60*90*32, 128),
            nn.ReLU(),
            #output layer matches num classes
            nn.Linear(128, 5),
        )

    def forward(self, x):
        logits = self.fc_stack(self.conv_stack(x))
        return logits

# print()
model = NeuralNetwork_Conv().to(device)
# print(model)
# for p in model.parameters():
#     print(p.size())
    
# print()
# mb = torch.rand(batch_size, 3, 240, 360).to(device)
# logits = model.forward(mb)
# print(logits.size())

Shape of X [N, C, H, W]: torch.Size([32, 3, 240, 360])
Shape of y: torch.Size([32, 5]) torch.float64


In [10]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    training_loss = 0.0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        training_loss += loss.item()
    training_loss /= len(dataloader)
    return training_loss
    
        
def validate(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    #so that we do not compute gradients, only model performance
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y.argmax(1)).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    return correct

In [122]:
'''Learning Rate 1e-4, 3 conv (stride 1, padding 1) w/ 2 pools (2x2 kernel), 1 fully connected'''
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
epochs = 150
best_accuracy = 0
best_model = 'best_model.pth'
for t in range(epochs):
    training_loss = train(train_dataloader, model, loss_fn, optimizer)
    validation_accuracy = validate(valid_dataloader, model, loss_fn)
    if (validation_accuracy > best_accuracy):
        best_accuracy = validation_accuracy
        #save the model 
        torch.save(model, best_model)
        print("Epoch:", t)
        print("Training Loss:", training_loss)
        print("Accuracy:", best_accuracy)
        print()
print("Done!")

Epoch: 0
Training Loss: 1.6057454598672463
Accuracy: 0.22533333333333333

Epoch: 1
Training Loss: 1.5927176989163412
Accuracy: 0.27066666666666667

Epoch: 2
Training Loss: 1.5810231772674757
Accuracy: 0.30266666666666664

Epoch: 3
Training Loss: 1.5690563654470626
Accuracy: 0.32

Epoch: 4
Training Loss: 1.5557522725991226
Accuracy: 0.344

Epoch: 5
Training Loss: 1.541804754700173
Accuracy: 0.34933333333333333

Epoch: 6
Training Loss: 1.526509084947633
Accuracy: 0.36133333333333334

Epoch: 7
Training Loss: 1.5096931998815502
Accuracy: 0.39066666666666666

Epoch: 8
Training Loss: 1.4915984477283377
Accuracy: 0.4053333333333333

Epoch: 9
Training Loss: 1.471558003426727
Accuracy: 0.4093333333333333

Epoch: 10
Training Loss: 1.4497031015315742
Accuracy: 0.41733333333333333

Epoch: 11
Training Loss: 1.4256636186192433
Accuracy: 0.42533333333333334

Epoch: 12
Training Loss: 1.3996583958410405
Accuracy: 0.428

Epoch: 14
Training Loss: 1.341870140625785
Accuracy: 0.43466666666666665

Epoch: 15

In [77]:
'''Learning Rate 1e-2, 3 conv (stride 1, padding 1) w/ 2 pools (2x2 kernel), 1 fully connected'''
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
epochs = 50
best_accuracy = 0
#best_model = 'best_model.pth'
for t in range(epochs):
    training_loss = train(train_dataloader, model, loss_fn, optimizer)
    validation_accuracy = validate(valid_dataloader, model, loss_fn)
    if (validation_accuracy > best_accuracy):
        best_accuracy = validation_accuracy
        #torch.save(model, best_model)
        print("Epoch:", t)
        print("Training Loss:", training_loss)
        print("Accuracy:", best_accuracy)
        print()
print("Done!")

Epoch: 0
Training Loss: 0.31635332958014883
Accuracy: 0.2

Epoch: 1
Training Loss: 0.48589642325461213
Accuracy: 0.20266666666666666

Epoch: 20
Training Loss: 0.3716062876936629
Accuracy: 0.204

Epoch: 28
Training Loss: 0.26680702476068013
Accuracy: 0.20533333333333334

Epoch: 33
Training Loss: 0.22037400237304494
Accuracy: 0.21466666666666667

Epoch: 38
Training Loss: 0.1816286506332231
Accuracy: 0.228

Epoch: 42
Training Loss: 0.1581944456221547
Accuracy: 0.23733333333333334

Epoch: 43
Training Loss: 0.14258158562706819
Accuracy: 0.24266666666666667

Epoch: 49
Training Loss: 0.09976205417901468
Accuracy: 0.248

Done!


In [83]:
'''Learning Rate 1e-6, 3 conv (stride 1, padding 1) w/ 2 pools (2x2 kernel), 1 fully connected'''
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
epochs = 50
best_accuracy = 0
#best_model = 'best_model.pth'
for t in range(epochs):
    training_loss = train(train_dataloader, model, loss_fn, optimizer)
    validation_accuracy = validate(valid_dataloader, model, loss_fn)
    if (validation_accuracy > best_accuracy):
        best_accuracy = validation_accuracy
        #torch.save(model, best_model)
        print("Epoch:", t)
        print("Training Loss:", training_loss)
        print("Accuracy:", best_accuracy)
        print()
print("Done!")

Epoch: 0
Training Loss: 1.6130496804122674
Accuracy: 0.18666666666666668

Epoch: 4
Training Loss: 1.6122589665380391
Accuracy: 0.188

Epoch: 5
Training Loss: 1.6120633262124928
Accuracy: 0.18933333333333333

Epoch: 6
Training Loss: 1.6118676887430026
Accuracy: 0.19066666666666668

Epoch: 8
Training Loss: 1.6114773856532392
Accuracy: 0.192

Epoch: 10
Training Loss: 1.611088868287025
Accuracy: 0.19333333333333333

Epoch: 11
Training Loss: 1.610895669347409
Accuracy: 0.19466666666666665

Epoch: 13
Training Loss: 1.6105091728044278
Accuracy: 0.196

Epoch: 14
Training Loss: 1.6103161138228395
Accuracy: 0.19866666666666666

Epoch: 15
Training Loss: 1.6101230914281173
Accuracy: 0.2

Epoch: 16
Training Loss: 1.6099300114155717
Accuracy: 0.20133333333333334

Epoch: 17
Training Loss: 1.609737072252866
Accuracy: 0.20266666666666666

Epoch: 20
Training Loss: 1.6091577035459605
Accuracy: 0.20533333333333334

Epoch: 49
Training Loss: 1.6036491732028397
Accuracy: 0.20666666666666667

Done!


In [128]:
'''Neural Network with more conv layers, max pooling, and fully connected layers'''
class NeuralNetwork_MoreConv(nn.Module):
    def __init__(self):
        super(NeuralNetwork_MoreConv, self).__init__()
        self.conv_stack = nn.Sequential(
            #modified input (feature depth) to match input tensors depth
            #also doesn't change height/width
            nn.Conv2d(3, 16, 3, stride=1, padding=1),
            nn.ReLU(),
            #halves height/width, now tensor is 120x180
            nn.MaxPool2d(2,2),
            nn.Conv2d(16, 32, 3, stride=1, padding=1),
            nn.ReLU(),
            #60x90
            nn.MaxPool2d(2,2),
            nn.Conv2d(32, 32, 3, stride=1, padding=1),
            nn.ReLU(),
            #30x45
            nn.MaxPool2d(2,2),
            nn.Conv2d(32, 64, 3, stride=1, padding=1),
            nn.ReLU(),
            #15x22
            nn.MaxPool2d(2,2),
            nn.Conv2d(64, 128, 3, stride=1, padding=1),
            nn.ReLU(),
        )

        self.fc_stack = nn.Sequential(
            nn.Flatten(),
            #singular fully connected, input is flattened img so 15x22 and output from convolutional
            nn.Linear(15*22*128, 256),
            nn.ReLU(),
            #another fully connected layer
            nn.Linear(256, 128),
            nn.ReLU(),
            #another fully connected layer
            nn.Linear(128, 64),
            nn.ReLU(),
            #output layer, matches num classes
            nn.Linear(64, 5),
        )

    def forward(self, x):
        logits = self.fc_stack(self.conv_stack(x))
        return logits
    
model_conv = NeuralNetwork_MoreConv().to(device)
# print(model_conv)
# for p in model_conv.parameters():
#     print(p.size())
    
# print()
# mb = torch.rand(batch_size, 3, 240, 360).to(device)
# logits = model_conv.forward(mb)
# print(logits.size())

In [85]:
'''Learning Rate 1e-4, 5 conv (stride 1, padding 1) w/ 4 pools (2x2 kernel), 3 fully connected'''
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
epochs = 50
best_accuracy = 0
best_model_conv = 'best_model_conv.pth'
for t in range(epochs):
    training_loss = train(train_dataloader, model_conv, loss_fn, optimizer)
    validation_accuracy = validate(valid_dataloader, model_conv, loss_fn)
    if (validation_accuracy > best_accuracy):
        best_accuracy = validation_accuracy
        #save the model 
        torch.save(model_conv, best_model)
        print("Epoch:", t)
        print("Training Loss:", training_loss)
        print("Accuracy:", best_accuracy)
        print()
print("Done!")

Epoch: 0
Training Loss: 1.6097832491564932
Accuracy: 0.2

Epoch: 31
Training Loss: 1.6094334858949437
Accuracy: 0.20266666666666666

Epoch: 36
Training Loss: 1.6093823815385502
Accuracy: 0.204

Epoch: 38
Training Loss: 1.6093620294535702
Accuracy: 0.20666666666666667

Epoch: 40
Training Loss: 1.6093417504971677
Accuracy: 0.20933333333333334

Epoch: 43
Training Loss: 1.6093106651057798
Accuracy: 0.21066666666666667

Epoch: 46
Training Loss: 1.6092791597499991
Accuracy: 0.212

Epoch: 47
Training Loss: 1.6092686906231173
Accuracy: 0.21333333333333335

Epoch: 49
Training Loss: 1.609248060405706
Accuracy: 0.21466666666666667

Done!


In [130]:
'''Taken from part 1'''
def compute_accuracy(test_pairs, num_classes):
    test_pairs_np = np.array(test_pairs)
    correct = np.sum(test_pairs_np[:, 0] == test_pairs_np[:, 1])
    return correct / len(test_pairs)

def compute_per_class_accuracy(test_pairs, num_classes):
    test_pairs_np = np.array(test_pairs)
    true_labels = test_pairs_np[:, 0]
    predicted = test_pairs_np[:, 1]
    classes = np.bincount(true_labels)
    correct = np.bincount(true_labels[true_labels == predicted])
    
    return correct/classes

def compute_confusion_matrix(test_pairs, num_classes):
    test_pairs_np = np.array(test_pairs)
    confusion = np.bincount(test_pairs_np[:, 0] * num_classes + test_pairs_np[:, 1]).reshape(num_classes, num_classes)
    return confusion

In [133]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    predicted_labels = []
    true_labels = []
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            correct += (pred.argmax(1) == y.argmax(1)).type(torch.float).sum().item()
            
            predicted_labels.extend(pred.argmax(1).cpu())
            true_labels.extend(y.argmax(1).cpu())

    correct /= size
    return predicted_labels, true_labels

In [134]:
'''Final Test using best weights from original parameters (learning rate 1e-4, 3 conv, 2 pool, 1 layer)'''
loss_fn = nn.CrossEntropyLoss()
final_model = torch.load(best_model)
final_model.eval()
pred, true = test(test_dataloader, final_model, loss_fn)
pairs = np.column_stack((pred, true)).astype(int)

accuracy = compute_accuracy(pairs, len(class_names))
print(f'Overall Accuracy: {accuracy:.2f}')

per_class_accuracy = compute_per_class_accuracy(pairs, len(class_names))
print()
print('Per class accuracy')
for i, acc in enumerate(per_class_accuracy):
    print(f'{i}: {acc:4.2f}')
    
cm = compute_confusion_matrix(pairs, len(class_names))
print(f'\nConfusion matrix')
for i in range(len(class_names)):
    print(f'{i:2d}:', end='')
    for j in range(len(class_names)):
        print(f' {cm[i, j]:2d}', end='')
    print()

Overall Accuracy: 0.57

Per class accuracy
0: 0.96
1: 0.70
2: 0.98
3: 0.66
4: 0.38

Confusion matrix
 0: 25  0  0  0  1
 1: 29 85  0  4  4
 2:  1  0 87  1  0
 3: 23 19  3 92  3
 4: 72 46 60 53 142


Discussion: 

My experimental results are a bit lacking considering that I utilized only 50 epochs to check for the best parameters and weights. I realize that a smaller learning rate might take longer to converge and 50 epochs is too small to realize that convergence. However, a greater learning rate (1e-2) began to pick up much slower compared to 1e-4, which surprised me. As for design choices, I played with the number of convolutional layers and fully connected layers using the learning rate that outperformed the others within 50 epochs. Again, because I had limited the epochs to 50, the full potential of the network may not have been brought out since the accuracy was stuck at 0.2 and didn't have a chance to pick up. All experiments had the same loss function and optimizer function. In the end, the first set of parameters I experimented with outshined the rest within the 50 epochs so I chose those parameters to train using 150 epochs.

The classifier works well in classifying red carpets especially, which makes sense considering its distinctiveness. I would consider the ocean to be the 2nd most easily classifiable. This is mainly because the classifier was unable to correctly guess the grass backgrounds in previous best models for the final test. Still, the number of grass images are substantially lower than the others. As just mentioned, the classifier works poorly with grass backgrounds as well as wheat fields. This makes sense because of their similar features. We can see a higher incorrect predictions of grasslands for the wheatfields compared to other backgrounds and the singular incorrect prediction of a wheatfield for the grasslands.

If I had more time, I would definitely have run 200 epochs per experiment to figure out the best parameters, but the amount of time training these neural networks took much longer than expected on my local machine.