FREMIUD OTERO CORDERO
802-19-4359

#Set-Up

In [None]:
!pip install torchinfo

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

from torchinfo import summary


This cell selects the device for PyTorch operations. It uses a CUDA-enabled GPU if one is available; otherwise it checks for Apple's Metal Performance Shaders (MPS) backend and uses "mps" if available, falling back to "cpu" if neither CUDA nor MPS is present. The selected device is then printed.

In [None]:
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print("Device:", device)
device = torch.device(device)  # keep a torch.device object for later .to(device) and summary() calls
torch.cuda.is_available()

Device: cuda


True

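This cell defines the preprocessing pipeline and sets the batch size. `ToTensor()` converts each 28x28 grayscale image to a float tensor with values in [0, 1], and `Normalize((0.5,), (0.5,))` then maps those values to [-1, 1]. It then creates data loaders for the training and test sets so the data can be loaded efficiently in batches; the training loader reshuffles the data every epoch, while the test loader keeps a fixed order.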

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])

batch_size = 64

trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)

testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)
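As a quick check of what the preprocessing does, the sketch below reproduces the `Normalize((0.5,), (0.5,))` arithmetic in plain Python and counts the batches the loaders produce. It assumes the standard MNIST split of 60,000 training and 10,000 test images; `normalize` is a hypothetical helper written for this illustration, not a torchvision function.

```python
import math

def normalize(x, mean=0.5, std=0.5):
    """Same per-pixel formula that transforms.Normalize applies: (x - mean) / std."""
    return (x - mean) / std

# ToTensor() scales 8-bit pixel values 0..255 to [0, 1]; Normalize then maps [0, 1] to [-1, 1]
for raw in (0, 128, 255):
    scaled = raw / 255
    print(raw, round(scaled, 4), round(normalize(scaled), 4))

batch_size = 64
train_batches = math.ceil(60000 / batch_size)                 # 938 batches per training epoch
test_batches = math.ceil(10000 / batch_size)                  # 157 test batches
last_train_batch = 60000 - (train_batches - 1) * batch_size   # the final batch holds only 32 images
print(train_batches, test_batches, last_train_batch)
```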


#Define Convolutional Neural Networks:

Network #1: 4-layer network
* Layer 1 – convolution with 16 filters, each filter 5x5 with same padding
* Layer 2 – ReLU activation
* Layer 3 – flatten layer
* Layer 4 – fully connected layer with 10 neurons as output (no explicit softmax: the cross-entropy loss applies it internally)

In [None]:
class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 5, padding='same')
        self.relu = nn.ReLU()
        
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(16 * 28 * 28, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)

        x = self.flatten(x)
        x = self.fc1(x)
        return x
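The layer sizes in `Net1` can be checked by hand. This plain-Python sketch (the helpers `conv2d_params` and `linear_params` are hypothetical, written only for this check) assumes that `padding='same'` with stride 1 leaves the 28x28 image size unchanged, which is why `fc1` takes 16 * 28 * 28 inputs; the resulting counts agree with the torchinfo summary of Network #1 further down.

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Weights (out_ch * in_ch * k * k) plus one bias per output channel."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

def linear_params(in_features, out_features, bias=True):
    """Weight matrix plus one bias per output neuron."""
    return in_features * out_features + (out_features if bias else 0)

# padding='same' with stride 1 keeps the 28x28 spatial size, so 16 channels of 28x28 are flattened
flat = 16 * 28 * 28
conv1 = conv2d_params(1, 16, 5)
fc1 = linear_params(flat, 10)
print(flat, conv1, fc1, conv1 + fc1)  # 12544 416 125450 125866
```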


Network #2: 8-layer network
* Layer 1 – convolution with 6 filters, each filter 5x5 with same padding
* Layer 2 – ReLU activation
* Layer 3 – convolution with 16 filters, each filter 5x5 with same padding
* Layer 4 – ReLU activation
* Layer 5 – flatten layer
* Layer 6 – fully connected layer with 84 neurons as output
* Layer 7 – ReLU activation
* Layer 8 – fully connected layer with 10 neurons as output (no explicit softmax: the cross-entropy loss applies it internally)

In [None]:
class Net2(nn.Module):
    def __init__(self):
        super(Net2, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding='same')
        self.conv2 = nn.Conv2d(6, 16, 5, padding='same')
        self.relu = nn.ReLU()
        
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(16 * 28 * 28, 84)
        self.fc2 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.relu(x)

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
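Most of `Net2`'s parameters sit in its first fully connected layer, because that layer connects all 12,544 flattened activations to 84 neurons. The sketch below (plain Python, with hypothetical helper names) tallies each layer's parameters; the total of 1,057,202 matches the torchinfo summary of Network #2, and `fc1` accounts for about 99.7% of it.

```python
def conv2d_params(in_ch, out_ch, k):
    """Conv2d weights plus one bias per output channel."""
    return out_ch * in_ch * k * k + out_ch

def linear_params(n_in, n_out):
    """Linear weights plus one bias per output neuron."""
    return n_in * n_out + n_out

layers = {
    "conv1": conv2d_params(1, 6, 5),
    "conv2": conv2d_params(6, 16, 5),
    "fc1": linear_params(16 * 28 * 28, 84),  # 'same' padding keeps 28x28, so 12,544 inputs
    "fc2": linear_params(84, 10),
}
total = sum(layers.values())
for name, n in layers.items():
    print(f"{name:>5}: {n:>9,d} ({100 * n / total:.2f}%)")
print(f"total: {total:,d}")  # total: 1,057,202
```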


Network #3: 14-layer network
* Layer 1 – convolution with 6 filters, each filter 5x5 with same padding
* Layer 2 – batch normalization for 6 filters
* Layer 3 – ReLU activation
* Layer 4 – max pooling of size 2 (to halve the image height and width)
* Layer 5 – convolution with 16 filters, each filter 5x5 with same padding
* Layer 6 – batch normalization for 16 filters
* Layer 7 – ReLU activation
* Layer 8 – max pooling of size 2 (to halve the image height and width)
* Layer 9 – flatten layer
* Layer 10 – fully connected layer with 120 neurons as output
* Layer 11 – ReLU activation
* Layer 12 – fully connected layer with 84 neurons as output
* Layer 13 – ReLU activation
* Layer 14 – fully connected layer with 10 neurons as output (no explicit softmax: the cross-entropy loss applies it internally)

In [None]:
class Net3(nn.Module):
    def __init__(self):
        super(Net3, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding='same')
        self.bn1 = nn.BatchNorm2d(6)
        self.relu = torch.nn.ReLU()
        self.max_pool = torch.nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(6, 16, 5, padding='same')
        self.bn2 = nn.BatchNorm2d(16)
        
        self.flatten = torch.nn.Flatten()
        self.fc1 = nn.Linear(16 * 7 * 7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.max_pool(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)        
        x = self.max_pool(x)

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x
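Max pooling is what makes `Net3` so much smaller: each `MaxPool2d(2)` halves the height and width, so the flattened vector shrinks from 12,544 values (as in `Net2`) to 16 * 7 * 7 = 784. This sketch traces the image size and tallies the parameters in plain Python; it assumes each `BatchNorm2d` contributes one learnable scale and one shift per channel (12 and 32 parameters, as in the summary). The total is computed here because the summary output below is truncated.

```python
def pool_out(size, k=2):
    """MaxPool2d(k) with default stride k and no padding: floor(size / k)."""
    return size // k

size = 28                 # conv1 with padding='same' keeps 28x28
size = pool_out(size)     # 14
                          # conv2 with padding='same' keeps 14x14
size = pool_out(size)     # 7
flat = 16 * size * size   # input size of fc1

params = {
    "conv1": 6 * 1 * 5 * 5 + 6,
    "bn1": 2 * 6,             # scale and shift per channel
    "conv2": 16 * 6 * 5 * 5 + 16,
    "bn2": 2 * 16,
    "fc1": flat * 120 + 120,
    "fc2": 120 * 84 + 84,
    "fc3": 84 * 10 + 10,
}
total = sum(params.values())
print(size, flat, total)  # 7 784 107830
```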


Bonus Network: 16-layer network

* Layer 1 – Convolutional layer with 1 input channel, 6 output channels, a kernel size of 3x3, and padding of 2.
* Layer 2 – Batch normalization applied to the outputs of the previous convolutional layer.
* Layer 3 – ReLU activation function.
* Layer 4 – Max pooling of size 2x2.
* Layer 5 – Convolutional layer with 6 input channels, 16 output channels, a kernel size of 3x3, and padding of 2.
* Layer 6 – Batch normalization applied to the outputs of the previous convolutional layer.
* Layer 7 – ReLU activation function.
* Layer 8 – Max pooling of size 2x2.
* Layer 9 – Flatten layer to transform the data into a 1-dimensional tensor.
* Layer 10 – Fully connected layer with 16 * 8 * 8 = 1024 input neurons and 256 output neurons.
* Layer 11 – ReLU activation function.
* Layer 12 – Dropout layer with rate `dropout_rate` (0.2 by default).
* Layer 13 – Fully connected layer with 256 input neurons and 64 output neurons.
* Layer 14 – ReLU activation function.
* Layer 15 – Dropout layer with rate `dropout_rate` (0.2 by default).
* Layer 16 – Fully connected layer with 64 input neurons and 10 output neurons.



In [None]:
class Bonus(nn.Module):
    def __init__(self, dropout_rate=0.2):
        super(Bonus, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, padding=2)
        self.bn1 = nn.BatchNorm2d(6)
        self.relu = nn.ReLU()
        self.max_pool = nn.MaxPool2d(2)

        self.conv2 = nn.Conv2d(6, 16, 3, padding=2)
        self.bn2 = nn.BatchNorm2d(16)

        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(16 * 8 * 8, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(dropout_rate)


    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.max_pool(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.max_pool(x)
        
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc3(x)
        return x
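Unlike `padding='same'`, the Bonus network's 3x3 kernels with `padding=2` grow each feature map by 2 pixels. Using the standard convolution output-size formula (assuming stride 1 and dilation 1, the `nn.Conv2d` defaults), this sketch traces the spatial size through both convolution and pooling stages and confirms the 16 * 8 * 8 = 1024 inputs of `fc1`; the helper names are illustrative.

```python
def conv_out(size, k, padding=0, stride=1):
    """Conv2d output size (dilation 1): floor((size + 2*padding - k) / stride) + 1."""
    return (size + 2 * padding - k) // stride + 1

def pool_out(size, k=2):
    """MaxPool2d(k) with default stride k and no padding: floor(size / k)."""
    return size // k

sizes = []
size = 28
for step in (lambda s: conv_out(s, 3, padding=2), pool_out,
             lambda s: conv_out(s, 3, padding=2), pool_out):
    size = step(size)
    sizes.append(size)

flat = 16 * size * size            # flattened length fed to fc1
conv1_params = 6 * 1 * 3 * 3 + 6
conv2_params = 16 * 6 * 3 * 3 + 16
print(sizes, flat, conv1_params, conv2_params)  # [30, 15, 17, 8] 1024 60 880
```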



#Define evaluate and train functions:

The `train` function trains the model for one epoch: for each batch it runs a forward pass, computes the loss, backpropagates, and updates the parameters, while accumulating the loss and the number of correct predictions. It returns the average per-batch loss, the training accuracy in percent (measured while the weights are still changing during the epoch), and the list of individual batch losses.

In [None]:
def train(net, dataloader, criterion, optimizer, device):
    net.train()
    correct = 0
    running_loss = 0.0
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    loss_list=[]
    for i, data in enumerate(dataloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        _, predicted = torch.max(outputs.data, 1)
        correct += (predicted == labels).sum().item()

        running_loss += loss.item()
        loss_list.append(loss.item())

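        if i % 100 == 0:
            # Log every 100 batches; `current` is the number of samples processed before this batch.
            # A separate name is used so the `loss` tensor is not overwritten by a float.
            current = i * len(inputs)
            print(f"loss: {loss.item():>7f}  [{current:>5d}/{size:>5d}]")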

    accuracy = 100 * correct / size
    return running_loss / num_batches,accuracy,loss_list
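Two details of `train` follow from the loader sizes. With 60,000 images and a batch size of 64 there are 938 batches, so the `i % 100 == 0` check prints 10 progress lines per epoch, at sample offsets 0 to 57,600 (matching the logs below). And because `nn.CrossEntropyLoss` averages over each batch by default, `running_loss / num_batches` is a mean of batch means, in which the final 32-image batch counts as much as a full one. The batch losses in the second half of this sketch are made-up numbers, used only to show the effect.

```python
import math

size, batch_size = 60000, 64
num_batches = math.ceil(size / batch_size)            # len(trainloader) -> 938
logged = [i for i in range(num_batches) if i % 100 == 0]
offsets = [i * batch_size for i in logged]            # the `current` values printed
print(len(logged), offsets[0], offsets[-1])           # 10 0 57600

# Made-up batch losses: 937 full batches at 0.2 and a last (32-image) batch at 0.8
last = size - (num_batches - 1) * batch_size          # 32
batch_losses = [0.2] * (num_batches - 1) + [0.8]
batch_sizes = [batch_size] * (num_batches - 1) + [last]
mean_of_batch_means = sum(batch_losses) / num_batches                              # what train() reports
per_sample_mean = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / size     # weighted by batch size
print(round(mean_of_batch_means, 5), round(per_sample_mean, 5))
```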


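The `evaluate` function measures a trained model on a separate dataset. It switches the model to evaluation mode (so batch normalization uses its running statistics and dropout is disabled), turns off gradient tracking with `torch.no_grad()`, and iterates over the batches, accumulating the loss and the number of correct predictions. It returns the average per-batch loss and the accuracy in percent.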

In [None]:
def evaluate(net, dataloader, criterion, device):
    net.eval()
    correct = 0
    running_loss = 0.0
    size = len(dataloader.dataset)
    num_batches = len(dataloader)

    with torch.no_grad():
        for data in dataloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)

            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == labels).sum().item()

            loss = criterion(outputs, labels)
            running_loss += loss.item()

    accuracy = 100 * correct / size
    return running_loss / num_batches, accuracy
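Predictions are taken as the index of the largest output (logit), which is what `torch.max(outputs.data, 1)` returns as its second value. No softmax is needed for this step, because softmax preserves the ordering of the logits. A plain-Python sketch of the same accuracy calculation, with made-up logits and labels:

```python
def predict(logits):
    """Index of the largest logit, like the second value of torch.max(outputs, 1) for one row."""
    return max(range(len(logits)), key=lambda j: logits[j])

def accuracy(batch_logits, labels):
    """Percentage of rows whose predicted class equals the label."""
    correct = sum(predict(row) == y for row, y in zip(batch_logits, labels))
    return 100 * correct / len(labels)

logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.9], [1.5, 1.4, 0.0], [-0.5, 2.2, 0.1]]
labels = [0, 2, 1, 1]
print(accuracy(logits, labels))  # 75.0 (the third row predicts class 0 instead of 1)
```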


The `train_test_loop` function moves the model to the available device and trains it for the given number of epochs, printing the training accuracy and average loss after each epoch. After the last epoch it evaluates the model once on the test set and prints the test accuracy, the average test loss, and the training time (the evaluation itself is not timed). It returns the list of per-batch training losses.

In [None]:
import time

def train_test_loop(net, epochs, lr, criterion, optimizer):
    # `lr` is kept for compatibility with the calls below; the learning rate
    # actually used is the one already set on `optimizer`.
    device = torch.device("cuda" if torch.cuda.is_available()
                          else "mps" if torch.backends.mps.is_available() else "cpu")
    net.to(device)

    all_losses = []  # per-batch training losses from every epoch
    start_time = time.time()

    for epoch in range(epochs):
        print("Epoch: ", epoch+1)
        print("--------------")
        train_loss, train_accuracy, loss_list = train(net, trainloader, criterion, optimizer, device)
        all_losses.extend(loss_list)
        print(f"Train Error: \n Accuracy: {(train_accuracy):>0.2f}%, Avg loss: {train_loss:>8f} \n")

    # Only the training epochs are timed; the test evaluation below is excluded.
    elapsed_time = time.time() - start_time

    val_loss, test_accuracy = evaluate(net, testloader, criterion, device)
    print(f"Test Accuracy: {(test_accuracy):>0.2f}%, Avg Test loss: {val_loss:>8f}, Training Time: {elapsed_time/60:.2f} minutes \n")
    return all_losses
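Because `train` returns a fresh `loss_list` every epoch, simply reassigning it inside the loop keeps only the last epoch's batch losses; keeping a loss curve over all 20 epochs (20 x 938 = 18,760 entries with the loaders above) requires concatenating the lists. The sketch below shows the difference using a hypothetical stand-in, `fake_train_epoch`, in place of the real `train` function.

```python
def fake_train_epoch(epoch, num_batches=3):
    """Hypothetical stand-in for train(): returns (avg_loss, accuracy, per-batch losses)."""
    losses = [1.0 / (epoch + 1 + i) for i in range(num_batches)]
    return sum(losses) / num_batches, 0.0, losses

def run(epochs, keep_all):
    """Return either every epoch's batch losses or only the last epoch's."""
    all_losses = []
    loss_list = []
    for epoch in range(epochs):
        _, _, loss_list = fake_train_epoch(epoch)   # reassigned every epoch
        if keep_all:
            all_losses.extend(loss_list)            # accumulated across epochs
    return all_losses if keep_all else loss_list

print(len(run(20, keep_all=False)), len(run(20, keep_all=True)))  # 3 60
```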


#Run training and evaluation:

These cells train the four networks (Net1, Net2, Net3, and Bonus) with the same hyperparameters (20 epochs, learning rate 0.01, batch size 64), the same loss (cross-entropy), and the same optimizer type (Adam). A new Adam optimizer is created for each network, each network is trained with `train_test_loop`, and the returned per-batch losses are stored. The training progress is printed for each network, followed by a "Finished Training" message.

In [None]:
epochs = 20
lr= 0.01
criterion = nn.CrossEntropyLoss()


In [None]:
net1=Net1()
optimizer = optim.Adam(net1.parameters(), lr=lr)

print(f"Training Network 1")
print("--------------") 
train_losses1=train_test_loop(net1,epochs,lr,criterion,optimizer)
print(f"Finished Training Network 1\n")


Training Network 1
--------------
Epoch:  1
--------------
loss: 2.345009  [    0/60000]
loss: 0.239233  [ 6400/60000]
loss: 0.202744  [12800/60000]
loss: 0.011976  [19200/60000]
loss: 0.136911  [25600/60000]
loss: 0.223936  [32000/60000]
loss: 0.178793  [38400/60000]
loss: 0.028444  [44800/60000]
loss: 0.034234  [51200/60000]
loss: 0.053359  [57600/60000]
Train Error: 
 Accuracy: 94.84%, Avg loss: 0.225845 

Epoch:  2
--------------
loss: 0.135367  [    0/60000]
loss: 0.008215  [ 6400/60000]
loss: 0.018929  [12800/60000]
loss: 0.004535  [19200/60000]
loss: 0.063151  [25600/60000]
loss: 0.013442  [32000/60000]
loss: 0.185921  [38400/60000]
loss: 0.047552  [44800/60000]
loss: 0.030402  [51200/60000]
loss: 0.097579  [57600/60000]
Train Error: 
 Accuracy: 97.73%, Avg loss: 0.077627 

Epoch:  3
--------------
loss: 0.032922  [    0/60000]
loss: 0.009159  [ 6400/60000]
loss: 0.011327  [12800/60000]
loss: 0.018186  [19200/60000]
loss: 0.048579  [25600/60000]
loss: 0.140997  [32000/60000]
los

In [None]:
net2=Net2()
optimizer = optim.Adam(net2.parameters(), lr=lr)

print(f"Training Network 2")
print("--------------") 
train_losses2=train_test_loop(net2,epochs,lr,criterion,optimizer)
print(f"Finished Training Network 2\n")


Training Network 2
--------------
Epoch:  1
--------------
loss: 2.303062  [    0/60000]
loss: 0.257647  [ 6400/60000]
loss: 0.176573  [12800/60000]
loss: 0.156160  [19200/60000]
loss: 0.033404  [25600/60000]
loss: 0.196200  [32000/60000]
loss: 0.057554  [38400/60000]
loss: 0.204686  [44800/60000]
loss: 0.073617  [51200/60000]
loss: 0.072436  [57600/60000]
Train Error: 
 Accuracy: 95.20%, Avg loss: 0.160587 

Epoch:  2
--------------
loss: 0.101352  [    0/60000]
loss: 0.019056  [ 6400/60000]
loss: 0.042979  [12800/60000]
loss: 0.064706  [19200/60000]
loss: 0.149065  [25600/60000]
loss: 0.045363  [32000/60000]
loss: 0.177812  [38400/60000]
loss: 0.081487  [44800/60000]
loss: 0.104640  [51200/60000]
loss: 0.030801  [57600/60000]
Train Error: 
 Accuracy: 97.50%, Avg loss: 0.088685 

Epoch:  3
--------------
loss: 0.105157  [    0/60000]
loss: 0.012798  [ 6400/60000]
loss: 0.021179  [12800/60000]
loss: 0.007718  [19200/60000]
loss: 0.026105  [25600/60000]
loss: 0.128694  [32000/60000]
los

In [None]:
net3=Net3()
optimizer = optim.Adam(net3.parameters(), lr=lr)

print(f"Training Network 3")
print("--------------") 
train_losses3=train_test_loop(net3,epochs,lr,criterion,optimizer)
print(f"Finished Training Network 3\n")


Training Network 3
--------------
Epoch:  1
--------------
loss: 2.335370  [    0/60000]
loss: 0.252432  [ 6400/60000]
loss: 0.228032  [12800/60000]
loss: 0.022070  [19200/60000]
loss: 0.202308  [25600/60000]
loss: 0.015645  [32000/60000]
loss: 0.131420  [38400/60000]
loss: 0.034878  [44800/60000]
loss: 0.010058  [51200/60000]
loss: 0.028678  [57600/60000]
Train Error: 
 Accuracy: 93.97%, Avg loss: 0.185372 

Epoch:  2
--------------
loss: 0.091281  [    0/60000]
loss: 0.065126  [ 6400/60000]
loss: 0.053839  [12800/60000]
loss: 0.136430  [19200/60000]
loss: 0.172828  [25600/60000]
loss: 0.137644  [32000/60000]
loss: 0.037939  [38400/60000]
loss: 0.009870  [44800/60000]
loss: 0.006453  [51200/60000]
loss: 0.075747  [57600/60000]
Train Error: 
 Accuracy: 98.02%, Avg loss: 0.068314 

Epoch:  3
--------------
loss: 0.036647  [    0/60000]
loss: 0.019313  [ 6400/60000]
loss: 0.087830  [12800/60000]
loss: 0.162716  [19200/60000]
loss: 0.008370  [25600/60000]
loss: 0.023044  [32000/60000]
los

In [None]:
bonus=Bonus()
optimizer = optim.Adam(bonus.parameters(), lr=lr)

print(f"Bonus Network")
print("--------------") 
bonus_losses=train_test_loop(bonus,epochs,lr,criterion,optimizer)
print(f"Finished Training Bonus Network\n")


Bonus Network
--------------
Epoch:  1
--------------
loss: 2.334696  [    0/60000]
loss: 0.494700  [ 6400/60000]
loss: 0.399079  [12800/60000]
loss: 0.500582  [19200/60000]
loss: 0.176117  [25600/60000]
loss: 0.118454  [32000/60000]
loss: 0.227607  [38400/60000]
loss: 0.101551  [44800/60000]
loss: 0.172351  [51200/60000]
loss: 0.485190  [57600/60000]
Train Error: 
 Accuracy: 91.38%, Avg loss: 0.290357 

Epoch:  2
--------------
loss: 0.142917  [    0/60000]
loss: 0.111168  [ 6400/60000]
loss: 0.149090  [12800/60000]
loss: 0.036165  [19200/60000]
loss: 0.067338  [25600/60000]
loss: 0.020781  [32000/60000]
loss: 0.187166  [38400/60000]
loss: 0.074264  [44800/60000]
loss: 0.228162  [51200/60000]
loss: 0.117457  [57600/60000]
Train Error: 
 Accuracy: 96.94%, Avg loss: 0.113426 

Epoch:  3
--------------
loss: 0.020276  [    0/60000]
loss: 0.110360  [ 6400/60000]
loss: 0.001852  [12800/60000]
loss: 0.091792  [19200/60000]
loss: 0.139854  [25600/60000]
loss: 0.078840  [32000/60000]
loss: 0.

#Summary of Networks:

These cells use torchinfo's `summary` to inspect the architecture of each model. For every layer it reports the input and output shapes and the number of parameters, which helps in understanding the size and structure of the models.

In [None]:
#@title Network #1

summary(net1,input_size=(batch_size,1,28,28),device=device,col_names=['input_size', 'output_size','num_params'])



Layer (type:depth-idx)                   Input Shape               Output Shape              Param #
Net1                                     [64, 1, 28, 28]           [64, 10]                  --
├─Conv2d: 1-1                            [64, 1, 28, 28]           [64, 16, 28, 28]          416
├─ReLU: 1-2                              [64, 16, 28, 28]          [64, 16, 28, 28]          --
├─Flatten: 1-3                           [64, 16, 28, 28]          [64, 12544]               --
├─Linear: 1-4                            [64, 12544]               [64, 10]                  125,450
Total params: 125,866
Trainable params: 125,866
Non-trainable params: 0
Total mult-adds (M): 28.90
Input size (MB): 0.20
Forward/backward pass size (MB): 6.43
Params size (MB): 0.50
Estimated Total Size (MB): 7.13
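The size figures in this summary can be reproduced by hand, assuming 4-byte float32 values and torchinfo reporting megabytes as 10^6 bytes. For the mult-adds, the convention assumed below (each parameter counted once per output position for the convolution and once per sample for the linear layer) is an inference that happens to reproduce the printed 28.90 M; it is not taken from torchinfo's documentation.

```python
batch, bytes_per_float = 64, 4  # float32 values (assumption)

input_bytes = batch * 1 * 28 * 28 * bytes_per_float   # reported as 0.20 MB
params_bytes = 125_866 * bytes_per_float              # reported as 0.50 MB

# Assumed convention: conv parameters are reused at each of the 28x28 output positions,
# linear parameters are used once per sample.
conv_mult_adds = 416 * 28 * 28 * batch
linear_mult_adds = 125_450 * batch
total_mult_adds = conv_mult_adds + linear_mult_adds

print(round(input_bytes / 1e6, 2), round(params_bytes / 1e6, 2), round(total_mult_adds / 1e6, 2))  # 0.2 0.5 28.9
```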

In [None]:
#@title Network #2
summary(net2,input_size=(batch_size,1,28,28),device=device,col_names=['input_size', 'output_size','num_params'])

Layer (type:depth-idx)                   Input Shape               Output Shape              Param #
Net2                                     [64, 1, 28, 28]           [64, 10]                  --
├─Conv2d: 1-1                            [64, 1, 28, 28]           [64, 6, 28, 28]           156
├─ReLU: 1-2                              [64, 6, 28, 28]           [64, 6, 28, 28]           --
├─Conv2d: 1-3                            [64, 6, 28, 28]           [64, 16, 28, 28]          2,416
├─ReLU: 1-4                              [64, 16, 28, 28]          [64, 16, 28, 28]          --
├─Flatten: 1-5                           [64, 16, 28, 28]          [64, 12544]               --
├─Linear: 1-6                            [64, 12544]               [64, 84]                  1,053,780
├─ReLU: 1-7                              [64, 84]                  [64, 84]                  --
├─Linear: 1-8                            [64, 84]                  [64, 10]                  850
Total params: 1,057,202

In [None]:
#@title Network #3
summary(net3,input_size=(batch_size,1,28,28),device=device,col_names=['input_size', 'output_size','num_params'])

Layer (type:depth-idx)                   Input Shape               Output Shape              Param #
Net3                                     [64, 1, 28, 28]           [64, 10]                  --
├─Conv2d: 1-1                            [64, 1, 28, 28]           [64, 6, 28, 28]           156
├─BatchNorm2d: 1-2                       [64, 6, 28, 28]           [64, 6, 28, 28]           12
├─ReLU: 1-3                              [64, 6, 28, 28]           [64, 6, 28, 28]           --
├─MaxPool2d: 1-4                         [64, 6, 28, 28]           [64, 6, 14, 14]           --
├─Conv2d: 1-5                            [64, 6, 14, 14]           [64, 16, 14, 14]          2,416
├─BatchNorm2d: 1-6                       [64, 16, 14, 14]          [64, 16, 14, 14]          32
├─ReLU: 1-7                              [64, 16, 14, 14]          [64, 16, 14, 14]          --
├─MaxPool2d: 1-8                         [64, 16, 14, 14]          [64, 16, 7, 7]            --
├─Flatten: 1-9                 

In [None]:
#@title Bonus Network
summary(bonus,input_size=(batch_size,1,28,28),device=device,col_names=['input_size', 'output_size','num_params'])

Layer (type:depth-idx)                   Input Shape               Output Shape              Param #
Bonus                                    [64, 1, 28, 28]           [64, 10]                  --
├─Conv2d: 1-1                            [64, 1, 28, 28]           [64, 6, 30, 30]           60
├─BatchNorm2d: 1-2                       [64, 6, 30, 30]           [64, 6, 30, 30]           12
├─ReLU: 1-3                              [64, 6, 30, 30]           [64, 6, 30, 30]           --
├─MaxPool2d: 1-4                         [64, 6, 30, 30]           [64, 6, 15, 15]           --
├─Conv2d: 1-5                            [64, 6, 15, 15]           [64, 16, 17, 17]          880
├─BatchNorm2d: 1-6                       [64, 16, 17, 17]          [64, 16, 17, 17]          32
├─ReLU: 1-7                              [64, 16, 17, 17]          [64, 16, 17, 17]          --
├─MaxPool2d: 1-8                         [64, 16, 17, 17]          [64, 16, 8, 8]            --
├─Flatten: 1-9                    

#Which of the models had the least error on the validation (test) data? How long did it take to train each model?


Data From Test Run #1:

|Model|Accuracy (test data)|Average test loss|Training time|
|-|-|-|-|
|Model #1|98.00%|0.458179|5.05 minutes|
|Model #2|97.00%|0.488106|5.29 minutes|
|Model #3|98.80%|0.064194|5.27 minutes|
|Bonus   |98.69%|0.065839|6.00 minutes|


Data From Test Run #2:

|Model|Accuracy (test data)|Average test loss|Training time|
|-|-|-|-|
|Model #1|97.95%|0.442075|5.10 minutes|
|Model #2|97.31%|0.253610|5.25 minutes|
|Model #3|98.76%|0.067962|5.28 minutes|
|Bonus   |98.78%|0.063171|6.33 minutes|


Data From Test Run #3:

|Model|Accuracy (test data)|Average test loss|Training time|
|-|-|-|-|
|Model #1|97.53%|0.517097|5.57 minutes|
|Model #2|97.85%|0.375264|5.76 minutes|
|Model #3|98.99%|0.048471|5.74 minutes|
|Bonus   |98.74%|0.053674|5.84 minutes|

Average From Test Runs:

|Model|Accuracy (test data)|Average test loss|Training time|
|-|-|-|-|
|Model #1|97.83%|0.472450|5.24 minutes|
|Model #2|97.39%|0.372327|5.43 minutes|
|Model #3|98.85%|0.060209|5.43 minutes|
|Bonus   |98.74%|0.060895|6.06 minutes|
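The averages in the last table can be recomputed from the three runs. This sketch also reports how much each model's test accuracy varied between runs, which helps judge whether small differences between models are meaningful.

```python
# (test accuracy %, average test loss, training minutes) for runs #1, #2, #3
runs = {
    "Model #1": [(98.00, 0.458179, 5.05), (97.95, 0.442075, 5.10), (97.53, 0.517097, 5.57)],
    "Model #2": [(97.00, 0.488106, 5.29), (97.31, 0.253610, 5.25), (97.85, 0.375264, 5.76)],
    "Model #3": [(98.80, 0.064194, 5.27), (98.76, 0.067962, 5.28), (98.99, 0.048471, 5.74)],
    "Bonus":    [(98.69, 0.065839, 6.00), (98.78, 0.063171, 6.33), (98.74, 0.053674, 5.84)],
}

def averages(runs):
    """Column-wise mean of (accuracy, loss, minutes) for each model."""
    return {name: tuple(sum(col) / len(col) for col in zip(*rows)) for name, rows in runs.items()}

def spreads(runs):
    """Max minus min test accuracy across runs, in percentage points."""
    return {name: max(r[0] for r in rows) - min(r[0] for r in rows) for name, rows in runs.items()}

spread = spreads(runs)
for name, (acc, loss, minutes) in averages(runs).items():
    print(f"{name}: {acc:.2f}%  {loss:.6f}  {minutes:.2f} min  (accuracy spread {spread[name]:.2f} pts)")
```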


Each of the four models (Model #1, Model #2, Model #3, and the Bonus model) was trained and evaluated in three separate runs, recording its test accuracy, average test loss, and training time.

Accuracy, in this context, is the percentage of test images the model classified correctly. For example, an accuracy of 97.85% means the model predicted the correct digit for 97.85% of the 10,000 test images.

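The average test loss, on the other hand, is the cross-entropy loss averaged over the test batches. It reflects not only whether a prediction is right but how much probability the model assigns to the correct class, so confident mistakes are penalized heavily. Lower is better, but it does not count errors directly: Model #1 has a higher average accuracy than Model #2 (97.83% vs. 97.39%) yet also a higher average test loss (0.472450 vs. 0.372327).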
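A small worked example shows why loss and accuracy can disagree. The hypothetical `cross_entropy` helper below computes the per-sample loss that `nn.CrossEntropyLoss` averages (the negative log of the softmax probability of the true class). Two of the three made-up predictions are correct, yet their losses differ by more than a factor of 1,000, and a confident mistake costs far more than either. A few confidently wrong test images can therefore raise the average loss without changing the accuracy much, which is one possible reading of the Model #1 versus Model #2 results.

```python
import math

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# The true class is 0 in every case
confident_right = cross_entropy([8.0, 0.0, 0.0], 0)   # correct and confident
hesitant_right = cross_entropy([1.0, 0.5, 0.5], 0)    # correct but unsure
confident_wrong = cross_entropy([0.0, 8.0, 0.0], 0)   # wrong and confident
print(f"{confident_right:.5f} {hesitant_right:.5f} {confident_wrong:.5f}")
```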

Among the first three models, **Model #3** performed best: it had the highest average accuracy across all test runs (98.85%) and the lowest average test loss (0.060209), making it the model with the least error on the test data.

When the Bonus model is included, the choice of best model is between Model #3 and the Bonus. The Bonus had the second-highest average accuracy across all test runs (98.74%) and the second-lowest average test loss (0.060895), which suggests that these two are the most accurate of the four models in this setting. The gap between them is small (0.11 percentage points in average accuracy) and comparable to each model's run-to-run variation (Model #3 ranged from 98.76% to 98.99%, the Bonus from 98.69% to 98.78%), so Model #3's lead is modest.

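In terms of training time, Models #1 to #3 took roughly the same time to train (5.24 to 5.43 minutes on average), while the Bonus was the slowest at about 6.06 minutes. The first three models needed nearly the same time despite large differences in size (125,866 parameters for Network #1 versus 1,057,202 for Network #2, according to the summaries above). All models were trained on the same 60,000 images for 20 epochs with a batch size of 64, so each performed the same number of optimizer steps; for networks this small, the time is plausibly dominated by per-batch overhead such as data loading and host-to-GPU transfers rather than by the models' arithmetic.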
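To put the training times in per-step terms: every model performs 938 optimizer steps per epoch and 18,760 over 20 epochs. This sketch converts the average times from the table into seconds per epoch and milliseconds per step; these are wall-clock averages that include data loading and logging, not measurements of the networks' computation alone.

```python
import math

steps_per_epoch = math.ceil(60000 / 64)     # 938 optimizer steps per epoch
total_steps = 20 * steps_per_epoch          # 18,760 steps over 20 epochs
avg_minutes = {"Model #1": 5.24, "Model #2": 5.43, "Model #3": 5.43, "Bonus": 6.06}

def per_epoch_and_step(minutes):
    """Convert total training minutes to (seconds per epoch, milliseconds per step)."""
    seconds = minutes * 60
    return seconds / 20, 1000 * seconds / total_steps

for name, minutes in avg_minutes.items():
    s_epoch, ms_step = per_epoch_and_step(minutes)
    print(f"{name}: {s_epoch:.1f} s/epoch, {ms_step:.1f} ms/step")
```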