## M2.5 - Homework Assignment: Convolutional Neural Networks

#### Student: Antonio López Chávez - A01741741
#### Professor: PhD Jorge Mario Cruz Duarte

#### **Experiment 1 (40/100 points): Baseline Model**
#### This baseline model will be the one you must beat in the next experiments. So, implement a basic CNN model with the following architecture:
* Convolutional layer with 32 filters, 3×3 kernel size, and ReLU activation.
* MaxPooling layer with 2×2 pool size.
* Flatten layer.
* Dense layer with 128 units and ReLU activation.
* Dense output layer with softmax activation.
#### After compiling the model, train it on the CIFAR-10 dataset and report the accuracy. Consider that this model would be the one to beat in the following experiments.

In [1]:
# Very important to import these libraries first
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torch.nn.functional import relu, log_softmax

In [2]:
# Setting the seed to get reproducible results
torch.manual_seed(0)

# Using the correct mean and std for the CIFAR-10 dataset, usually used as 0.5 for both or 0.5 for mean and 0.25 for std
# Sources: TODO https://stackoverflow.com/questions/66678052/how-to-calculate-the-mean-and-the-std-of-cifar10-data
customize_image_transforming = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.247, 0.243, 0.261])
])

In [3]:
# Using GPU acceleration to speed up the training process (finally using the GPU for academic purposes!)
gpu_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [4]:
# Downloading the CIFAR-10 dataset with the customized image transformation (may take a while, especially the first time and with bad internet connection)
cifar_train_data = datasets.CIFAR10(root='./data', train=True, download=True, transform=customize_image_transforming)
cifar_test_data = datasets.CIFAR10(root='./data', train=False, download=True, transform=customize_image_transforming)

Files already downloaded and verified
Files already downloaded and verified


In [6]:
# Setting the batch size to 4 and shuffling the training data for better training (best practices)
cifar_train_loader = DataLoader(cifar_train_data, batch_size=4, shuffle=True)
cifar_test_loader = DataLoader(cifar_test_data, batch_size=4, shuffle=False)

In [9]:
# We now define the Convolutional Neural Network Layers
convolutional_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1).to(gpu_device)   # 32 filters, 3x3 kernel and ReLU activation function, directed to the gpu processor
max_pooling_layer = nn.MaxPool2d(2, 2).to(gpu_device)   # Max pooling layer with 2x2 pool size, directed to the gpu processor
dense_layer = nn.Linear(32 * 16 * 16, 128).to(gpu_device)   # Dense layer with 128 units and ReLU activation, directed to the gpu processor
dense_output_layer = nn.Linear(128, 10).to(gpu_device)  # Dense output layer, transforming the output to a prediction of 10 classes, directed to the gpu processor

In [10]:
# Loss and optimizer
loss_criterion = torch.nn.NLLLoss()
adam_optimizer = optim.Adam([{'params': convolutional_layer.parameters()},
                        {'params': dense_layer.parameters()},
                        {'params': dense_output_layer.parameters()}], lr=0.001)

#### Defining training and testing functions, as it is easier to invoke them for each epoch

In [29]:
# Training the model
def train(epoch):
    # It's only necessary to set the training mode for dropout and batch normalization layers
    for batch_index, (data, target) in enumerate(cifar_train_loader):
        data, target = data.to(gpu_device), target.to(gpu_device)
        adam_optimizer.zero_grad()
        tensor_x = relu(convolutional_layer(data))
        tensor_x = max_pooling_layer(tensor_x)
        tensor_x = tensor_x.view(-1, 32 * 16 * 16)  # Flatten layer
        tensor_x = relu(dense_layer(tensor_x))
        tensor_x = dense_output_layer(tensor_x)
        training_loss = log_softmax(tensor_x, dim=1)  # Apply log-softmax
        training_loss = loss_criterion(training_loss, target)
        training_loss.backward()
        adam_optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_index * len(data)}/{len(cifar_train_loader.dataset)} ({100. * batch_index / len(cifar_train_loader):.0f}%)]\tLoss: {training_loss.item():.6f}')


In [30]:
# Testing the model
def test():
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in cifar_test_loader:
            data, target = data.to(gpu_device), target.to(gpu_device)
            tensor_x = relu(convolutional_layer(data))
            tensor_x = max_pooling_layer(tensor_x)
            tensor_x = tensor_x.view(-1, 32 * 16 * 16)
            tensor_x = relu(dense_layer(tensor_x))
            tensor_x = dense_output_layer(tensor_x)
            test_loss_prob = log_softmax(tensor_x, dim=1)  # Apply log-softmax
            test_loss += loss_criterion(test_loss_prob, target).item()
            pred = test_loss_prob.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(cifar_test_loader.dataset)
    accuracy = 100. * correct / len(cifar_test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(cifar_test_loader.dataset)} ({accuracy:.0f}%)\n')


In [31]:
# Run training and testing for ten epochs
for epoch in range(1, 11):
    train(epoch)
    test()


Test set: Average loss: 0.2954, Accuracy: 5834/10000 (58%)


Test set: Average loss: 0.2846, Accuracy: 6071/10000 (61%)


Test set: Average loss: 0.2743, Accuracy: 6265/10000 (63%)


Test set: Average loss: 0.2702, Accuracy: 6373/10000 (64%)


Test set: Average loss: 0.2865, Accuracy: 6250/10000 (62%)


Test set: Average loss: 0.2887, Accuracy: 6363/10000 (64%)


Test set: Average loss: 0.3165, Accuracy: 6315/10000 (63%)


Test set: Average loss: 0.3312, Accuracy: 6378/10000 (64%)


Test set: Average loss: 0.3263, Accuracy: 6334/10000 (63%)


Test set: Average loss: 0.3529, Accuracy: 6216/10000 (62%)



#### **Experiment 2 (20/100 points): Changing Architecture**
#### Vary the architecture of your baseline model and report how each change impacts the performance. Consider adding more convolutional layers, varying the number of filters in each convolutional layer, including dropout layers to reduce overfitting, and so forth.

##### I'll be adding another convolutional layer, a dropout layer and an extra dense layer

In [30]:
# Layers for second experiment
convolutional_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1).to(gpu_device)   # 32 filters, 3x3 kernel and ReLU activation function, directed to the gpu processor
convolutional_layer2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1).to(gpu_device)    # 128 filters, 3x3 kernel and ReLU activation function, directed to the gpu processor
dropout_layer = nn.Dropout(0.25).to(gpu_device)   # Dropout layer with 25% probability, directed to the gpu processor
max_pooling_layer = nn.MaxPool2d(2, 2).to(gpu_device)   # Max pooling layer with 2x2 pool size, directed to the gpu processor
dense_layer = nn.Linear(64 * 16 * 16, 256).to(gpu_device)   # Dense layer with 256 units and ReLU activation, directed to the gpu processor
dense_layer2 = nn.Linear(256, 128).to(gpu_device)   # Dense layer with 128 units and ReLU activation, directed to the gpu processor
dense_output_layer = nn.Linear(128, 10).to(gpu_device)  # Dense output layer, transforming the output to a prediction of 10 classes, directed to the gpu processor

In [31]:
# Loss and optimizer
loss_criterion = torch.nn.NLLLoss()
adam_optimizer = optim.Adam([{'params': convolutional_layer.parameters()},
                        {'params': convolutional_layer2.parameters()},
                        {'params': dropout_layer.parameters()},
                        {'params': dense_layer.parameters()},
                        {'params': dense_layer2.parameters()},
                        {'params': dense_output_layer.parameters()}], lr=0.001)

In [32]:
# Training the model
def train(epoch):
    print(f'Training model for Experiment 2')
    for batch_index, (data, target) in enumerate(cifar_train_loader):
        data, target = data.to(gpu_device), target.to(gpu_device)
        adam_optimizer.zero_grad()
        tensor_x = relu(convolutional_layer(data))
        tensor_x = relu(convolutional_layer2(tensor_x))
        tensor_x = max_pooling_layer(tensor_x)
        tensor_x = tensor_x.view(-1, 64 * 16 * 16)  # Flatten layer
        tensor_x = relu(dense_layer(tensor_x))
        tensor_x = dropout_layer(tensor_x)
        tensor_x = relu(dense_layer2(tensor_x))
        tensor_x = dropout_layer(tensor_x)
        tensor_x = dense_output_layer(tensor_x)
        training_loss = log_softmax(tensor_x, dim=1)  # Apply log-softmax
        training_loss = loss_criterion(training_loss, target)
        training_loss.backward()
        adam_optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_index * len(data)}/{len(cifar_train_loader.dataset)} ({100. * batch_index / len(cifar_train_loader):.0f}%)]\tLoss: {training_loss.item():.6f}')

In [33]:
# Testing the model
def test():
    test_loss = 0
    correct = 0
    print(f'Testing model for Experiment 2')
    with torch.no_grad():
        for data, target in cifar_test_loader:
            data, target = data.to(gpu_device), target.to(gpu_device)
            tensor_x = relu(convolutional_layer(data))
            tensor_x = relu(convolutional_layer2(tensor_x))
            tensor_x = max_pooling_layer(tensor_x)
            tensor_x = tensor_x.view(-1, 64 * 16 * 16)  # Flatten layer
            tensor_x = relu(dense_layer(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = relu(dense_layer2(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = dense_output_layer(tensor_x)
            test_loss_prob = log_softmax(tensor_x, dim=1)  # Apply log-softmax
            test_loss += loss_criterion(test_loss_prob, target).item()
            pred = test_loss_prob.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(cifar_test_loader.dataset)
    accuracy = 100. * correct / len(cifar_test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(cifar_test_loader.dataset)} ({accuracy:.0f}%)\n')

In [34]:
# Run training and testing for ten epochs
for epoch in range(1, 11):
    train(epoch)
    test()

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.3131, Accuracy: 5802/10000 (58%)

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.3025, Accuracy: 5817/10000 (58%)

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.2952, Accuracy: 6001/10000 (60%)

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.2851, Accuracy: 6214/10000 (62%)

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.3029, Accuracy: 6311/10000 (63%)

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.3195, Accuracy: 6177/10000 (62%)

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.3293, Accuracy: 6270/10000 (63%)

Training model for Experiment 2
Testing model for Experiment 2

Test set: Average loss: 0.3650, Accuracy: 6155/10000 (62%)

Training

#### **Experiment 3 (20/100 points): Hyperparameter Tuning**
#### Explore different hyper-parameters, including learning rate, batch size, and number of epochs. Plus, use a validation set to tune the hyper-parameters and report the results on the test set.

##### I'll be increasing the batch size, decreasing the learning rate and increasing the number of epochs. The goal is to get better results than the previous experiments by also validating the data.

In [36]:
# Using numpy and SubsetRandomSampler for shuffling the dataset
import numpy as np
from torch.utils.data.sampler import SubsetRandomSampler

In [37]:
# Downloading the CIFAR-10 dataset with the customized image transformation (may take a while, especially the first time and with bad internet connection)
cifar_train_data = datasets.CIFAR10(root='./data', train=True, download=True, transform=customize_image_transforming)
cifar_test_data = datasets.CIFAR10(root='./data', train=False, download=True, transform=customize_image_transforming)

Files already downloaded and verified
Files already downloaded and verified


In [38]:
# Splitting the training set into training and validation sets
num_train = len(cifar_train_data)
indices = list(range(num_train))    # Indicess for shuffling
split = int(num_train * 0.8)  # 80% for training, 20% for validation
np.random.shuffle(indices)

# Splitting the indices for training and validation
train_idx, valid_idx = indices[:split], indices[split:]
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

In [39]:
# Altering the batch size in order to test the speed and efficiency of the model, as well as assigning workers and pin memory to speed up the process
# This function will be used for testing different batch sizes
def get_data_loaders(batch_size):
    train_loader = torch.utils.data.DataLoader(cifar_train_data, batch_size=batch_size, sampler=train_sampler, num_workers=4, pin_memory=True)
    valid_loader = torch.utils.data.DataLoader(cifar_train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=4, pin_memory=True)
    test_loader = torch.utils.data.DataLoader(cifar_test_data, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True)
    return train_loader, valid_loader, test_loader

In [40]:
# Layers for third experiment - Not much changes from the past experiment
convolutional_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1).to(gpu_device)   # 32 filters, 3x3 kernel and ReLU activation function, directed to the gpu processor
convolutional_layer2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1).to(gpu_device)    # 128 filters, 3x3 kernel and ReLU activation function, directed to the gpu processor
dropout_layer = nn.Dropout(0.25).to(gpu_device)   # Dropout layer with 25% probability, directed to the gpu processor
max_pooling_layer = nn.MaxPool2d(2, 2).to(gpu_device)   # Max pooling layer with 2x2 pool size, directed to the gpu processor
dense_layer = nn.Linear(64 * 16 * 16, 256).to(gpu_device)   # Dense layer with 256 units and ReLU activation, directed to the gpu processor
dense_layer2 = nn.Linear(256, 128).to(gpu_device)   # Dense layer with 128 units and ReLU activation, directed to the gpu processor
dense_output_layer = nn.Linear(128, 10).to(gpu_device)  # Dense output layer, transforming the output to a prediction of 10 classes, directed to the gpu processor

In [41]:
# Set loss criterion
loss_criterion = torch.nn.NLLLoss()

# Function to get the optimizer with a specific learning rate
def get_optimizer(learning_rate):
    return optim.Adam([{'params': convolutional_layer.parameters()},
                        {'params': convolutional_layer2.parameters()},
                        {'params': dropout_layer.parameters()},
                        {'params': dense_layer.parameters()},
                        {'params': dense_layer2.parameters()},
                        {'params': dense_output_layer.parameters()}], lr=learning_rate)

In [51]:
# Training the model
def train(epoch, train_loader, optimizer):
    # It's only necessary to set the training mode for dropout and batch normalization layers
    print(f'Training model for Experiment 3')
    for batch_index, (data, target) in enumerate(train_loader):
        data, target = data.to(gpu_device), target.to(gpu_device)
        optimizer.zero_grad()
        tensor_x = relu(convolutional_layer(data))
        tensor_x = relu(convolutional_layer2(tensor_x))
        tensor_x = max_pooling_layer(tensor_x)
        tensor_x = tensor_x.view(-1, 64 * 16 * 16)  # Flatten layer
        tensor_x = relu(dense_layer(tensor_x))
        tensor_x = dropout_layer(tensor_x)
        tensor_x = relu(dense_layer2(tensor_x))
        tensor_x = dropout_layer(tensor_x)
        tensor_x = dense_output_layer(tensor_x)
        training_loss = log_softmax(tensor_x, dim=1)  # Apply log-softmax
        training_loss = loss_criterion(training_loss, target)
        training_loss.backward()
        optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_index * len(data)}/{len(train_loader.dataset)} ({100. * batch_index / len(train_loader):.0f}%)]\tLoss: {training_loss.item():.6f}')

In [52]:
# Testing the model
def test(test_loader):
    test_loss = 0
    correct = 0
    print(f'Testing model for Experiment 3')
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(gpu_device), target.to(gpu_device)
            tensor_x = relu(convolutional_layer(data))
            tensor_x = relu(convolutional_layer2(tensor_x))
            tensor_x = max_pooling_layer(tensor_x)
            tensor_x = tensor_x.view(-1, 64 * 16 * 16)  # Flatten layer
            tensor_x = relu(dense_layer(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = relu(dense_layer2(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = dense_output_layer(tensor_x)
            test_loss_prob = log_softmax(tensor_x, dim=1)  # Apply log-softmax
            test_loss += loss_criterion(test_loss_prob, target).item()
            pred = test_loss_prob.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.0f}%)\n')
    return test_loss, accuracy

In [59]:
# Validation function
def validate_model(val_loader):
    print(f'Validating model for Experiment 3')
    valid_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(gpu_device), target.to(gpu_device)
            tensor_x = relu(convolutional_layer(data))
            tensor_x = relu(convolutional_layer2(tensor_x))
            tensor_x = max_pooling_layer(tensor_x)
            tensor_x = tensor_x.view(-1, 64 * 16 * 16)  # Flatten layer
            tensor_x = relu(dense_layer(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = relu(dense_layer2(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = dense_output_layer(tensor_x)
            validate_loss_prob = log_softmax(tensor_x, dim=1)  # Apply log-softmax
            valid_loss += loss_criterion(validate_loss_prob, target).item()
            pred = validate_loss_prob.argmax(dim=1, keepdim=True)
            total += target.size(0)
            correct += (pred == target).sum().item()
    accuracy = 100 * correct / len(val_loader.dataset)
    avg_loss = valid_loss / len(val_loader)
    return avg_loss, accuracy

In [60]:
# Set GPU device to improve performance and speed
gpu_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [61]:
# Hyper-parameter configurations, evaluating different batch sizes, learning rates and epochs
experiments = [
    {'learning_rate': 0.001, 'batch_size': 128, 'epochs': 10},  # This should be the default configuration
    {'learning_rate': 0.0005, 'batch_size': 64, 'epochs': 15},  # This should be the most accurate configuration
    {'learning_rate': 0.01, 'batch_size': 256, 'epochs': 5},    # This should be the fastest configuration
]

# Results dictionary
results = {}

for config in experiments:
    print(f"Running experiment with lr={config['learning_rate']}, batch_size={config['batch_size']}, epochs={config['epochs']}")

    # Get data loaders
    train_loader, valid_loader, test_loader = get_data_loaders(config['batch_size'])
    
    # Define loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = get_optimizer(learning_rate=config['learning_rate'])
    
    # Train the model
    for epoch in range(config['epochs']):
        train(epoch+1, train_loader, optimizer)
    
    # Validate the model
    val_loss, val_accuracy = validate_model(valid_loader)
    print(f'Validation loss: {val_loss:.4f}, Validation accuracy: {val_accuracy:.2f}%')
    
    # Test the model
    test_loss, test_accuracy = test(test_loader)
    
    # Save results
    results[config['learning_rate'], config['batch_size'], config['epochs']] = (val_loss, val_accuracy, test_loss, test_accuracy)

# Print results
for params, result in results.items():
    print(f"lr={params[0]}, batch_size={params[1]}, epochs={params[2]} -> Val Loss: {result[0]:.4f}, Val Acc: {result[1]:.2f}%, Test Loss: {result[2]:.4f}, Test Acc: {result[3]:.2f}%")

Running experiment with lr=0.001, batch_size=128, epochs=10
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Validating model for Experiment 3
Validation loss: 2.1995, Validation accuracy: 266.52%
Testing model for Experiment 3

Test set: Average loss: 0.0181, Accuracy: 6638/10000 (66%)

Running experiment with lr=0.0005, batch_size=64, epochs=15
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experiment 3
Training model for Experimen

#### **(Optional) Experiment 4 (10/100 points): Advanced Techniques**
#### Consider implementing one or more advanced techniques, such as batch normalisation, data augmentation, or transfer learning. (Even you could try more exotique architectures.) Hence, comparing the model’s performance with these techniques to the baseline model will be easy.

In [10]:
# Adding data augmentation: random crop, random horizontal flip, random rotation, color jitter and normalization (this was already done in the last experiments, but it's important to mention it here as well)
customize_image_transforming = transforms.Compose([
    transforms.RandomCrop(32, padding=1),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.25),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.247, 0.243, 0.261])
])

In [11]:
# Maintaining the batch size from previous experiment since it got better and faster results
cifar_train_loader = DataLoader(cifar_train_data, batch_size=256, shuffle=True, num_workers=4, pin_memory=True)
cifar_test_loader = DataLoader(cifar_test_data, batch_size=128, shuffle=False, num_workers=4, pin_memory=True)

In [16]:
# Layers for fourth experiment - Added batch normalization layers for the convolutional and dense layers
convolutional_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1).to(gpu_device)   # 32 filters, 3x3 kernel and ReLU activation function, directed to the gpu processor
convolutional_layer2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1).to(gpu_device)    # 128 filters, 3x3 kernel and ReLU activation function, directed to the gpu processor
dropout_layer = nn.Dropout(0.25).to(gpu_device)   # Dropout layer with 25% probability, directed to the gpu processor
max_pooling_layer = nn.MaxPool2d(2, 2).to(gpu_device)   # Max pooling layer with 2x2 pool size, directed to the gpu processor
dense_layer = nn.Linear(64 * 16 * 16, 256).to(gpu_device)   # Dense layer with 256 units and ReLU activation, directed to the gpu processor
dense_layer2 = nn.Linear(256, 128).to(gpu_device)   # Dense layer with 128 units and ReLU activation, directed to the gpu processor
dense_output_layer = nn.Linear(128, 10).to(gpu_device)  # Dense output layer, transforming the output to a prediction of 10 classes, directed to the gpu processor
batch_normalization_1 = nn.BatchNorm2d(32).to(gpu_device)
batch_normalization_2 = nn.BatchNorm2d(64).to(gpu_device)

In [17]:
# Loss and optimizer with a lower learning rate: 0.0005
loss_criterion = torch.nn.NLLLoss()
adam_optimizer = optim.Adam([{'params': convolutional_layer.parameters()},
                        {'params': convolutional_layer2.parameters()},
                        {'params': dropout_layer.parameters()},
                        {'params': max_pooling_layer.parameters()},
                        {'params': dense_layer.parameters()},
                        {'params': dense_layer2.parameters()},
                        {'params': dense_output_layer.parameters()},
                        {'params': batch_normalization_1.parameters()},
                        {'params': batch_normalization_2.parameters()},], lr=0.0005)

In [18]:
# Training the model for experiment 4
def train(epoch):
    # It's only necessary to set the training mode for dropout and batch normalization layers
    print(f'Training model for Experiment 4')
    for batch_index, (data, target) in enumerate(cifar_train_loader):
        data, target = data.to(gpu_device), target.to(gpu_device)
        adam_optimizer.zero_grad()
        # Adding batch normalization layers to the convolutional layers sequentially to make it easier to read
        tensor_x = relu(convolutional_layer(data))
        tensor_x = batch_normalization_1(tensor_x)
        tensor_x = relu(convolutional_layer2(tensor_x))
        tensor_x = batch_normalization_2(tensor_x)
        tensor_x = max_pooling_layer(tensor_x)
        tensor_x = tensor_x.view(-1, 64 * 16 * 16)  # Flatten layer
        tensor_x = relu(dense_layer(tensor_x))
        tensor_x = dropout_layer(tensor_x)
        tensor_x = relu(dense_layer2(tensor_x))
        tensor_x = dropout_layer(tensor_x)
        tensor_x = dense_output_layer(tensor_x)
        training_loss = log_softmax(tensor_x, dim=1)  # Apply log-softmax
        training_loss = loss_criterion(training_loss, target)
        training_loss.backward()
        adam_optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_index * len(data)}/{len(cifar_train_loader.dataset)} ({100. * batch_index / len(cifar_train_loader):.0f}%)]\tLoss: {training_loss.item():.6f}')

In [19]:
# Testing the model
def test():
    test_loss = 0
    correct = 0
    print(f'Testing model for Experiment 4')
    with torch.no_grad():
        for data, target in cifar_test_loader:
            data, target = data.to(gpu_device), target.to(gpu_device)
            # Adding batch normalization layers to the convolutional layers sequentially to make it easier to read
            tensor_x = relu(convolutional_layer(data))
            tensor_x = batch_normalization_1(tensor_x)
            tensor_x = relu(convolutional_layer2(tensor_x))
            tensor_x = batch_normalization_2(tensor_x)
            tensor_x = max_pooling_layer(tensor_x)
            tensor_x = tensor_x.view(-1, 64 * 16 * 16)  # Flatten layer
            tensor_x = relu(dense_layer(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = relu(dense_layer2(tensor_x))
            tensor_x = dropout_layer(tensor_x)
            tensor_x = dense_output_layer(tensor_x)
            test_loss_prob = log_softmax(tensor_x, dim=1)  # Apply log-softmax
            test_loss += loss_criterion(test_loss_prob, target).item()
            pred = test_loss_prob.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(cifar_test_loader.dataset)
    accuracy = 100. * correct / len(cifar_test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(cifar_test_loader.dataset)} ({accuracy:.0f}%)\n')

In [23]:
# Run training and testing for twenty epochs, same as experiment 3
for epoch in range(1, 21):
    train(epoch)
    test()

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0091, Accuracy: 5891/10000 (59%)

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0081, Accuracy: 6410/10000 (64%)

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0076, Accuracy: 6727/10000 (67%)

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0077, Accuracy: 6792/10000 (68%)

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0076, Accuracy: 6909/10000 (69%)

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0086, Accuracy: 6886/10000 (69%)

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0093, Accuracy: 6922/10000 (69%)

Training model for Experiment 4
Testing model for Experiment 4

Test set: Average loss: 0.0100, Accuracy: 6928/10000 (69%)

Training

#### Now something more interesting, let's do transfer learning with the RESnet-34 model

In [6]:
from torchvision import models

In [7]:
# Transformaciones diferentes para el entrenamiento y la prueba
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

In [8]:
# Load datasets with the new transformations
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

Files already downloaded and verified
Files already downloaded and verified


In [9]:
# Load the transformed datasets into the dataloaders
trainloader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True, num_workers=4, pin_memory=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=4, pin_memory=True)

In [10]:
# Load a pre-trained ResNet-34 model to do the transfer learning
resnet = models.resnet34(pretrained=True)

Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to C:\Users\alope/.cache\torch\hub\checkpoints\resnet34-b627a593.pth
100.0%


In [11]:
# Modify the final fully connected layer for CIFAR-10, which has 10 classes, so 10 outputs
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 10)
model = resnet.to(gpu_device) # Send the model to the GPU

In [12]:
# Freeze the model initial layers
for param in model.parameters():
    param.requires_grad = False

# Only the final layer parameters will be updated
for param in model.fc.parameters():
    param.requires_grad = True

In [13]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.0005)

In [14]:
# Training function
def train(epoch):
    print(f'Training model for Experiment 4 - Transfer Learning')
    model.train()
    for batch_idx, (data, target) in enumerate(trainloader):
        data, target = data.to(gpu_device), target.to(gpu_device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(trainloader.dataset)} ({100. * batch_idx / len(trainloader):.0f}%)]\tLoss: {loss.item():.6f}')

In [15]:
# Testing function
def test():
    print(f'Testing model for Experiment 4 - Transfer Learning')
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in testloader:
            data, target = data.to(gpu_device), target.to(gpu_device)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    
    test_loss /= len(testloader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(testloader.dataset)} ({100. * correct / len(testloader.dataset):.0f}%)\n')

In [16]:
# Run training and testing for 20 epochs
for epoch in range(1, 21):
    train(epoch)
    test()

Training model for Experiment 4 - Transfer Learning
Testing model for Experiment 4 - Transfer Learning

Test set: Average loss: 0.0189, Accuracy: 3692/10000 (37%)

Training model for Experiment 4 - Transfer Learning
Testing model for Experiment 4 - Transfer Learning

Test set: Average loss: 0.0178, Accuracy: 3951/10000 (40%)

Training model for Experiment 4 - Transfer Learning
Testing model for Experiment 4 - Transfer Learning

Test set: Average loss: 0.0177, Accuracy: 4008/10000 (40%)

Training model for Experiment 4 - Transfer Learning
Testing model for Experiment 4 - Transfer Learning

Test set: Average loss: 0.0176, Accuracy: 4069/10000 (41%)

Training model for Experiment 4 - Transfer Learning
Testing model for Experiment 4 - Transfer Learning

Test set: Average loss: 0.0173, Accuracy: 4183/10000 (42%)

Training model for Experiment 4 - Transfer Learning
Testing model for Experiment 4 - Transfer Learning

Test set: Average loss: 0.0173, Accuracy: 4197/10000 (42%)

Training model f