# Training the model
This notebook trains and evaluates the model in the following steps:

1. Load and transform the data
2. Load the data in batches using a custom data generator (`DataLoader`)
3. Define train and test functions
4. Define the model architecture and the training loop, and track the loss and accuracy of the model
5. Evaluate the trained model on the test set
6. Save the trained model weights

## 1. Load and transform the data

In this step we will load the data and split it into training, validation and testing sets.

We will split the training set 80/20 into training and validation subsets, and use the test set to evaluate the model after training.


In [1]:
from torchvision.transforms import  Resize, CenterCrop, ToTensor, Compose
from torchvision.datasets import ImageFolder
from torch.utils.data import random_split
import torch
import os


# Preprocessing: resize, center-crop to 224x224 and convert to a tensor
# (no augmentation or normalization is applied here)
trans = Compose([
    Resize(256),
    CenterCrop(224),
    ToTensor(),
])

# Load dataset from disk
data_dir = "../dataset"
train_dir = os.path.join(data_dir, "training_set")
test_dir = os.path.join(data_dir, "test_set")
training_set = ImageFolder(train_dir, transform=trans)
test_set = ImageFolder(test_dir, transform=trans)

# Split training dataset into training and validation sets
train_size = int(0.8 * len(training_set))  # 80% for training
val_size = len(training_set) - train_size  # 20% for validation
train_dataset, val_dataset = random_split(training_set, [train_size, val_size])
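
`random_split` draws a new random permutation on every run, so the validation subset changes between notebook sessions unless a seed is fixed. A minimal sketch of a reproducible 80/20 split follows. It assumes a toy `TensorDataset` of 100 samples and an arbitrary seed of 42, both illustrative and not part of this notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the ImageFolder dataset: 100 samples, 2 hypothetical classes
features = torch.arange(100, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(100) % 2
toy_set = TensorDataset(features, labels)

train_size = int(0.8 * len(toy_set))   # 80 samples
val_size = len(toy_set) - train_size   # 20 samples


def split(seed):
    # A seeded generator makes the random permutation repeatable
    generator = torch.Generator().manual_seed(seed)
    return random_split(toy_set, [train_size, val_size], generator=generator)


train_a, val_a = split(42)
train_b, val_b = split(42)

print(len(train_a), len(val_a))
# Same seed, same validation indices
print(list(val_a.indices) == list(val_b.indices))
```

With a fixed seed, validation accuracy can be compared across runs because the held-out samples stay the same.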



## 2. Load the data in batches using a custom data generator (`DataLoader`)

We create a custom data generator to load the data in batches, which avoids loading the entire dataset into memory at once. The same generator loads the training, validation and test data and sends each batch to the GPU (if available).

In [2]:
from torch.backends.mps import is_available as mps_is_available
from torch.utils.data import DataLoader
from torch.cuda import is_available
from tqdm import tqdm

device = (
    "cuda"
    if is_available()
    else "mps"
    if mps_is_available()
    else "cpu"
)
print(f"Using {device} device")

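class CustomDataLoader(DataLoader):
    """DataLoader that shows a tqdm progress bar and moves each batch to `device`."""

    def __iter__(self):
        for batch in tqdm(super().__iter__(), unit='batch', dynamic_ncols=True):
            # Move tensors (images and labels) to the selected device; leave other items as-is
            yield [item.to(device) if isinstance(item, torch.Tensor) else item for item in batch]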


# Create data loaders

params = {"batch_size": 64, "shuffle": True}
train_loader = CustomDataLoader(train_dataset, **params)
val_loader = CustomDataLoader(val_dataset, **params)
test_loader = CustomDataLoader(test_set, **params)

Using mps device
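
The same batch-to-device behaviour can also be obtained without subclassing `DataLoader`, by wrapping any loader in a small iterable. The sketch below takes that approach. `DeviceLoader` and the toy dataset are hypothetical names introduced only for illustration, and the device is pinned to CPU so the example runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class DeviceLoader:
    """Wrap a DataLoader and move every tensor in each batch to `device`."""

    def __init__(self, loader, device):
        self.loader = loader
        self.device = device
        # Expose the dataset so code that reads `dataloader.dataset` keeps working
        self.dataset = loader.dataset

    def __len__(self):
        return len(self.loader)

    def __iter__(self):
        for batch in self.loader:
            yield [
                item.to(self.device) if isinstance(item, torch.Tensor) else item
                for item in batch
            ]


# Toy data: 10 "images" of shape 3x8x8 with integer labels
toy_set = TensorDataset(torch.zeros(10, 3, 8, 8), torch.zeros(10, dtype=torch.long))
loader = DeviceLoader(DataLoader(toy_set, batch_size=4), torch.device("cpu"))

batch_shapes = [tuple(X.shape) for X, y in loader]
print(len(loader), batch_shapes)
```

Composition keeps the progress bar and the device transfer independent of the loader itself. The subclass used in this notebook is the shorter option when only one loader type is needed.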


## 3. Define train and test functions

We define the train and test functions to train and evaluate the model. The train function will be used to train the model on the training set and the test function will be used to evaluate the model on the validation and test sets.

In [7]:
from tqdm import tqdm
from torch import float as torch_float
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode. This matters for models with batch normalization
    # or dropout layers (ResNet-18, used in this notebook, contains batch normalization)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if batch % 100 == 0 and batch > 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            writer.add_scalar("Loss/train", loss, current)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test(dataloader, model, loss_fn):
    # Set the model to evaluation mode. This matters for models with batch normalization
    # or dropout layers (ResNet-18, used in this notebook, contains batch normalization)
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # torch.no_grad() disables gradient tracking during evaluation, which avoids
    # unnecessary computation and reduces memory usage
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch_float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f}")
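
To make the bookkeeping in `test` concrete, the sketch below computes the same two quantities, mean cross-entropy loss and accuracy, for a single hand-made batch. The logits and labels are invented for illustration.

```python
import torch
from torch.nn import CrossEntropyLoss

# Hypothetical logits for 3 samples and 2 classes, plus their true labels
pred = torch.tensor([[2.0, 1.0],
                     [0.0, 3.0],
                     [1.0, 0.0]])
y = torch.tensor([0, 1, 1])

loss_fn = CrossEntropyLoss()           # averages over the batch by default
batch_loss = loss_fn(pred, y).item()

# argmax(1) picks the highest-scoring class per row; compare with the labels
predicted_classes = pred.argmax(1)
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(f"loss: {batch_loss:.4f}, accuracy: {accuracy:.3f}")
```

Note that `test` divides the summed loss by the number of batches, not the number of samples. The result is a mean of per-batch means, which gives a smaller final batch slightly more weight than the others.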

## 4. Define the model architecture and train the model

We define the model architecture and train the model on the training set. We will use the SGD optimizer and the Cross Entropy Loss function to train the model.

In [8]:
from torchvision.models import resnet18
from torch.nn import CrossEntropyLoss
from torch.optim import SGD


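# Instantiate ResNet-18. Without a `weights` argument the model starts from
# random initialization, so no pretrained weights are loaded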
model = resnet18().to(device)

# Optimizers specified in the torch.optim package
optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Loss functions specified in the torch.nn package
loss_fn = CrossEntropyLoss()

# training loop
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}:  ")
    train(train_loader, model, loss_fn, optimizer)
    test(val_loader, model, loss_fn)
    print("-------------------------------")
writer.flush()
writer.close()


print("Training complete! \n ------------------------------- \nComputing test error...")
test(test_loader, model, loss_fn)
print("Done!")


Epoch 1:  


100%|██████████| 67/67 [00:36<00:00,  1.83batch/s]
100%|██████████| 17/17 [00:05<00:00,  2.90batch/s]


Test Error: 
 Accuracy: 77.3%, Avg loss: 0.534390
-------------------------------
Epoch 2:  


100%|██████████| 67/67 [00:35<00:00,  1.88batch/s]
100%|██████████| 17/17 [00:05<00:00,  2.94batch/s]


Test Error: 
 Accuracy: 77.8%, Avg loss: 0.500273
-------------------------------
Epoch 3:  


100%|██████████| 67/67 [00:35<00:00,  1.89batch/s]
100%|██████████| 17/17 [00:05<00:00,  2.88batch/s]


Test Error: 
 Accuracy: 77.2%, Avg loss: 0.487948
-------------------------------
Epoch 4:  


100%|██████████| 67/67 [23:04<00:00, 20.66s/batch] 
100%|██████████| 17/17 [00:09<00:00,  1.85batch/s]


Test Error: 
 Accuracy: 77.3%, Avg loss: 0.479376
-------------------------------
Epoch 5:  


100%|██████████| 67/67 [00:33<00:00,  1.98batch/s]
100%|██████████| 17/17 [00:07<00:00,  2.17batch/s]


Test Error: 
 Accuracy: 75.8%, Avg loss: 0.491938
-------------------------------
Training complete! 
 ------------------------------- 
Computing test error...


100%|██████████| 32/32 [00:11<00:00,  2.70batch/s]

Test Error: 
 Accuracy: 60.3%, Avg loss: 0.656415
Done!
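
The output above contains no `loss:` lines from `train`. That follows from the logging condition `batch % 100 == 0 and batch > 0`: with 67 batches per epoch the batch index never reaches 100, so nothing is printed or written to TensorBoard. The sketch below shows the arithmetic. The `log_every = 10` alternative is a hypothetical choice, not something the notebook uses.

```python
num_batches = 67  # batches per training epoch, as shown by the progress bars

# Condition used in train(): fires at batch indices 100, 200, ...
original = [b for b in range(num_batches) if b % 100 == 0 and b > 0]

# Hypothetical alternative: log every 10th batch
log_every = 10
alternative = [b for b in range(num_batches) if b % log_every == 0 and b > 0]

print(original)     # batch indices logged with the original condition
print(alternative)  # batch indices logged with the shorter interval
```

A logging interval smaller than the number of batches per epoch is needed for the `Loss/train` curve to appear in TensorBoard.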



