
# Task
Read the file "/content/assignment-2.txt" to understand the requirements for creating PyTorch DataLoaders for image classification, including defining transformations, creating a custom dataset, and instantiating DataLoaders for training, validation, and testing.

In [23]:
# Define the file path
file_path = "/content/assignment-2.txt"

# Read the entire content of the file
with open(file_path, 'r') as f:
    file_content = f.read()

# Print the file content
print(file_content)

Assignment 2 : Image Classification with Pretrained CNNs

	1- Introduction
	Convolutional Neural Networks (CNNs) have achieved remarkable performance on large-scale image
	classification tasks such as ImageNet. In this assignment, you will work on a 10-class classification problem using subsets of ImageNet. You are required to fine-tune pretrained models on this
	dataset, modify the classification head, and evaluate the performance of your models. The goal
	is to gain hands-on experience with transfer learning, model fine-tuning, and evaluation of deep
	learning models.
	2- Dataset Selection and Preparation
	The datasets for this assignment will be provided in a shared drive. Each student will be assigned
	a specific dataset number from the provided list (e.g., dataset1, dataset2, . . . , dataset10). You
	must work only with the dataset assigned to you.
	Each dataset folder contains three subfolders:
	• train/ – Training images organized into class-wise folders.
	• val/ – Validation im

## Define transformations

### Subtask:
Define the image transformations for training, validation, and testing using `torchvision.transforms`.


**Reasoning**:
Define the image transformations for training, validation, and testing as instructed.



In [24]:
import torchvision.transforms as transforms

# Define transformations for the training set with data augmentation
train_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1), # Added ColorJitter
    transforms.RandomRotation(degrees=15), # Added RandomRotation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Define transformations for the validation and test sets without data augmentation
val_test_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# For clarity, assign val_test_transforms to both val_transforms and test_transforms
val_transforms = val_test_transforms
test_transforms = val_test_transforms

print("Train transformations defined with additional augmentation:")
print(train_transforms)
print("\nValidation and Test transformations defined:")
print(val_test_transforms)

Train transformations defined with additional augmentation:
Compose(
    Resize(size=256, interpolation=bilinear, max_size=None, antialias=True)
    RandomCrop(size=(224, 224), padding=None)
    RandomHorizontalFlip(p=0.5)
    ColorJitter(brightness=(0.8, 1.2), contrast=(0.8, 1.2), saturation=(0.8, 1.2), hue=(-0.1, 0.1))
    RandomRotation(degrees=[-15.0, 15.0], interpolation=nearest, expand=False, fill=0)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)

Validation and Test transformations defined:
Compose(
    Resize(size=256, interpolation=bilinear, max_size=None, antialias=True)
    CenterCrop(size=(224, 224))
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
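To make the `Normalize` step concrete, here is a minimal pure-Python sketch of the per-channel arithmetic `(x - mean) / std` and its inverse. The helper names `normalize_pixel` and `denormalize_pixel` and the sample pixel are illustrative assumptions, not part of torchvision; the mean and std values are the ones passed to `transforms.Normalize` above.

```python
# Per-channel statistics passed to transforms.Normalize above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (x - mean) / std to one RGB pixel whose values are already in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]

def denormalize_pixel(rgb):
    """Invert the normalization: x * std + mean."""
    return [x * s + m for x, m, s in zip(rgb, MEAN, STD)]

pixel = [0.485, 1.0, 0.0]  # hypothetical pixel values after ToTensor()
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

A channel value equal to its mean maps to 0, and applying the inverse recovers the original pixel up to floating-point error, which is what any code that displays normalized images has to do before plotting.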


## Create custom dataset

### Subtask:
Create a custom PyTorch `Dataset` class to load images and labels from the dataset folders.


**Reasoning**:
Create a custom PyTorch Dataset class to load images and labels from the dataset folders.



In [25]:
from torch.utils.data import Dataset
import os
from PIL import Image

class CustomImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths = []
        self.labels = []
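        # Keep only sub-directories so stray files (e.g. .DS_Store) do not become classes
        self.classes = sorted(
            d for d in os.listdir(root_dir)
            if os.path.isdir(os.path.join(root_dir, d))
        )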
        self.class_to_idx = {cls_name: i for i, cls_name in enumerate(self.classes)}

        for class_name in self.classes:
            class_dir = os.path.join(root_dir, class_name)
            if os.path.isdir(class_dir):
                for img_name in os.listdir(class_dir):
                    img_path = os.path.join(class_dir, img_name)
                    if os.path.isfile(img_path):
                        self.image_paths.append(img_path)
                        self.labels.append(self.class_to_idx[class_name])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        label = self.labels[idx]
        image = Image.open(img_path).convert("RGB")

        if self.transform:
            image = self.transform(image)

        return image, label

print("CustomImageDataset class defined successfully.")

CustomImageDataset class defined successfully.
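The folder-to-label mapping can be checked without any images or PyTorch. The sketch below is a standard-library illustration of the same `root/<class>/<image>` scan; the function name `scan_image_folder` and the `cat`/`dog` tree are made up for the example. It keeps only sub-directories as classes, so a stray file at the top level does not receive a label index.

```python
import os
import tempfile

def scan_image_folder(root_dir):
    """Return (classes, class_to_idx, samples) for a root/<class>/<image> layout."""
    classes = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, fname)
            if os.path.isfile(path):
                samples.append((path, class_to_idx[name]))
    return classes, class_to_idx, samples

# Build a throwaway tree with two hypothetical classes and a stray top-level file.
with tempfile.TemporaryDirectory() as root:
    for cls, files in {"cat": ["a.jpg", "b.jpg"], "dog": ["c.jpg"]}.items():
        os.makedirs(os.path.join(root, cls))
        for fname in files:
            open(os.path.join(root, cls, fname), "w").close()
    open(os.path.join(root, "README.txt"), "w").close()

    classes, class_to_idx, samples = scan_image_folder(root)
    labels = [label for _, label in samples]

print(classes, class_to_idx, labels)
```

Because the class names are sorted before indexing, the same folder always yields the same label numbers, which matters when a saved model is reloaded later.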


## Instantiate datasets

### Subtask:
Create instances of the custom dataset for the training, validation, and test sets with the defined transformations.


**Reasoning**:
Create instances of the custom dataset for the training, validation, and test sets using the defined transformations and print the size of each dataset.



In [26]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [27]:
# Define the base directory for the dataset.
# Adjust this path to wherever your assigned dataset folder lives in Google Drive.
base_dir = "/content/drive/My Drive/dataset4"

# Create dataset instances for training, validation, and testing
train_dataset = CustomImageDataset(root_dir=os.path.join(base_dir, 'train'), transform=train_transforms)
val_dataset = CustomImageDataset(root_dir=os.path.join(base_dir, 'val'), transform=val_transforms)
#test_dataset = CustomImageDataset(root_dir=os.path.join(base_dir, 'test'), transform=test_transforms)

# Print the number of samples in each dataset
print(f"Number of samples in training dataset: {len(train_dataset)}")
print(f"Number of samples in validation dataset: {len(val_dataset)}")
#print(f"Number of samples in test dataset: {len(test_dataset)}")

Number of samples in training dataset: 11250
Number of samples in validation dataset: 450


## Create DataLoaders

### Subtask:
Create PyTorch `DataLoader` instances for the training, validation, and test sets.

**Reasoning**:
Create PyTorch `DataLoader` instances for the training, validation, and test sets using the instantiated datasets.

In [40]:
from torch.utils.data import DataLoader

# Define batch size (you can adjust this)
batch_size = 64

# Create DataLoaders for training, validation, and testing
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)
#test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"Train DataLoader created with batch size {batch_size}")
print(f"Validation DataLoader created with batch size {batch_size}")
#print(f"Test DataLoader created with batch size {batch_size}")

Train DataLoader created with batch size 64
Validation DataLoader created with batch size 64
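With 11250 training and 450 validation samples and a batch size of 64, the number of batches per epoch follows from simple arithmetic. The sketch below assumes the `DataLoader` default `drop_last=False`, so the final batch may be smaller; `batch_layout` is a hypothetical helper written for this illustration.

```python
import math

def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when drop_last is False."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

train_batches, train_last = batch_layout(11250, 64)
val_batches, val_last = batch_layout(450, 64)
print(train_batches, train_last)  # 176 50
print(val_batches, val_last)      # 8 2
```

The short final batch is why the loss is later accumulated as `loss.item() * inputs.size(0)` and divided by the dataset size rather than averaged per batch.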


## Verify DataLoaders

### Subtask:
Fetch a batch of data from the training DataLoader and display the shapes of the images and labels.

**Reasoning**:
Verify that the DataLoaders are working correctly by fetching a batch of data from the training DataLoader and displaying the shapes of the images and labels.

In [29]:
# Fetch a batch of data from the training DataLoader
images, labels = next(iter(train_dataloader))

# Print the shapes of the images and labels
print(f"Shape of images batch: {images.shape}")
print(f"Shape of labels batch: {labels.shape}")

# Display the first image in the batch (optional)
# import matplotlib.pyplot as plt
# import numpy as np
#
# # Helper function to show a batch of images
# def show_images(imgs):
#     grid = torchvision.utils.make_grid(imgs)
#     plt.imshow(grid.numpy().transpose((1, 2, 0)))
#     plt.title('Image Batch')
#     plt.axis('off')
#     plt.show()
#
# # Denormalize and show the first image
# # mean = np.array([0.485, 0.456, 0.406])
# # std = np.array([0.229, 0.224, 0.225])
# # img = images[0].numpy().transpose((1, 2, 0))
# # img = std * img + mean
# # img = np.clip(img, 0, 1)
# # plt.imshow(img)
# # plt.title(f"Label: {labels[0].item()}")
# # plt.axis('off')
# # plt.show()

Shape of images batch: torch.Size([64, 3, 224, 224])
Shape of labels batch: torch.Size([64])


# Task
Train and evaluate ResNet-18, ResNet-34, and VGG-19 models on the dataset located at "https://drive.google.com/drive/u/1/folders/1TnY_KitjtNLrC3qkjGM-v1CQEh4T9Jbg", following the requirements in "/content/assignment-2.txt". Comment out the test dataset part as it doesn't exist.

## Load pretrained models

### Subtask:
Load the specified pretrained models (ResNet-18, ResNet-34, VGG-19) from PyTorch.





In [41]:
import torchvision.models as models

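# `pretrained=True` is deprecated in recent torchvision releases; the `weights`
# argument below selects the same ImageNet-1k checkpoints explicitly.

# Load pretrained ResNet-18 model
resnet18_model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Load pretrained ResNet-34 model
resnet34_model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)

# Load pretrained VGG-19 model
vgg19_model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)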

print("Pretrained ResNet-18, ResNet-34, and VGG-19 models loaded successfully.")



Pretrained ResNet-18, ResNet-34, and VGG-19 models loaded successfully.


## Modify Classification Head

### Subtask:
Replace the classification head of each model to adapt it to the 10-class dataset.


In [42]:
import torch.nn as nn

num_classes = 10 # Based on the assignment description

# Modify ResNet-18 classification head and add Dropout
# Access the input features of the original fully connected layer
# Check if resnet18_model.fc is already a Sequential, if so, access the Linear layer within it
if isinstance(resnet18_model.fc, nn.Sequential):
    num_ftrs_resnet18 = resnet18_model.fc[1].in_features # Access the Linear layer after Dropout
else:
    num_ftrs_resnet18 = resnet18_model.fc.in_features # Access the original Linear layer

# Replace the fully connected layer with a Sequential block including Dropout
resnet18_model.fc = nn.Sequential(
    nn.Dropout(0.3), # Reduced Dropout rate to 0.3
    nn.Linear(num_ftrs_resnet18, num_classes)
)


# Modify ResNet-34 classification head
num_ftrs_resnet34 = resnet34_model.fc.in_features
resnet34_model.fc = nn.Linear(num_ftrs_resnet34, num_classes)

# Modify VGG-19 classification head
# VGG's classifier is a sequence of linear layers
num_ftrs_vgg19 = vgg19_model.classifier[6].in_features
vgg19_model.classifier[6] = nn.Linear(num_ftrs_vgg19, num_classes)

print("Classification heads modified successfully; Dropout(p=0.3) added to the ResNet-18 head.")

Classification heads modified successfully; Dropout(p=0.3) added to the ResNet-18 head.


## Define Training Components

### Subtask:
Define the loss function, optimizer, and optionally a learning rate scheduler.


In [43]:
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.nn as nn

# Define the loss function (Cross-Entropy Loss is common for classification)
criterion = nn.CrossEntropyLoss()

# Define optimizers for each model
# You can choose different optimizers or hyperparameters here
optimizer_resnet18 = optim.SGD(resnet18_model.parameters(), lr=0.001, momentum=0.9)
optimizer_resnet34 = optim.SGD(resnet34_model.parameters(), lr=0.001, momentum=0.9)
optimizer_vgg19 = optim.SGD(vgg19_model.parameters(), lr=0.001, momentum=0.9)

# Define learning rate schedulers (optional but recommended)
# StepLR multiplies the learning rate by gamma=0.1 every step_size=3 epochs
scheduler_resnet18 = lr_scheduler.StepLR(optimizer_resnet18, step_size=3, gamma=0.1)
scheduler_resnet34 = lr_scheduler.StepLR(optimizer_resnet34, step_size=3, gamma=0.1)
scheduler_vgg19 = lr_scheduler.StepLR(optimizer_vgg19, step_size=3, gamma=0.1)

print("Training components (Loss function, Optimizers, and Schedulers) defined successfully.")

Training components (Loss function, Optimizers, and Schedulers) defined successfully.
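With `step_size=3` and `gamma=0.1` as configured in the cell above, the learning rate in epoch `e` is `lr0 * gamma ** (e // step_size)`. The following sketch tabulates that closed form in plain Python for a 10-epoch run; `step_lr` is an illustrative helper, not a PyTorch function, and the sketch assumes the scheduler is stepped once per epoch.

```python
def step_lr(initial_lr, step_size, gamma, epoch):
    """Closed-form StepLR value for a zero-indexed epoch."""
    return initial_lr * gamma ** (epoch // step_size)

# Learning rate seen by each of 10 epochs with lr0=0.001, step_size=3, gamma=0.1.
schedule = [step_lr(0.001, 3, 0.1, epoch) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.0e}")
```

Under these settings the rate drops to 1e-4 at epoch 3, 1e-5 at epoch 6, and 1e-6 at epoch 9, so most of a 10-epoch run happens at a very small learning rate.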


## Implement Training and Validation Loops

### Subtask:
Write the code for training each model and validating its performance on the validation set.

**Reasoning**:
Write the code for a training and validation loop function that can be reused for all models.

In [44]:
import torch
import time
import copy

def train_model(model, criterion, optimizer, scheduler, train_loader, val_loader, num_epochs=25, device='cuda'):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    # To store history of loss and accuracy
    train_loss_history = []
    train_acc_history = []
    val_loss_history = []
    val_acc_history = []

    model.to(device)

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
                dataloader = train_loader
            else:
                model.eval()   # Set model to evaluate mode
                dataloader = val_loader

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloader:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / len(dataloader.dataset)
            epoch_acc = running_corrects.double() / len(dataloader.dataset)

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # Store history
            if phase == 'train':
                train_loss_history.append(epoch_loss)
                train_acc_history.append(epoch_acc.item()) # Convert to float for plotting
            else:
                val_loss_history.append(epoch_loss)
                val_acc_history.append(epoch_acc.item()) # Convert to float for plotting


            # deep copy the model if it's the best accuracy
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)

    return model, train_loss_history, train_acc_history, val_loss_history, val_acc_history

print("Training function 'train_model' modified to return history.")

Training function 'train_model' modified to return history.
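The loop above keeps the weights from the epoch with the highest validation accuracy, using a strict `>` comparison. The sketch below isolates that selection rule on a made-up accuracy history; `best_epoch` and the numbers are assumptions for illustration only.

```python
def best_epoch(val_acc_history):
    """Mimic the strict `>` comparison used to pick the checkpoint epoch."""
    best_acc, best_idx = 0.0, None
    for epoch, acc in enumerate(val_acc_history):
        if acc > best_acc:
            best_acc, best_idx = acc, epoch
    return best_idx, best_acc

history = [0.71, 0.84, 0.84, 0.80]  # made-up validation accuracies
print(best_epoch(history))
```

Because the comparison is strict, a later epoch that only ties the best accuracy does not replace the stored weights, so the earliest best epoch wins.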


## Implement Evaluation

### Subtask:
Write the code to evaluate the trained models on the test set (if available) and report the accuracy.

**Reasoning**:
Write the code for an evaluation function that can be reused for all models.

In [45]:
def evaluate_model(model, dataloader, device='cuda'):
    model.eval()  # Set model to evaluate mode
    running_corrects = 0
    total_samples = 0

    model.to(device)

    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            running_corrects += torch.sum(preds == labels.data)
            total_samples += labels.size(0)

    accuracy = running_corrects.double() / total_samples

    print(f'Accuracy: {accuracy:.4f}')
    return accuracy

print("Evaluation function 'evaluate_model' defined.")

Evaluation function 'evaluate_model' defined.


## Train and Evaluate Models

### Subtask:
Train each of the modified pretrained models and evaluate their performance.


In [15]:
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Train ResNet-18 (train_model returns the model plus four history lists, so unpack all five)
print("\nTraining ResNet-18:")
resnet18_model, resnet18_train_loss, resnet18_train_acc, resnet18_val_loss, resnet18_val_acc = train_model(
    resnet18_model, criterion, optimizer_resnet18, scheduler_resnet18,
    train_dataloader, val_dataloader, num_epochs=10, device=device)  # Reduced epochs for demonstration

# Evaluate ResNet-18 on the validation set
print("\nEvaluating ResNet-18 on validation set:")
evaluate_model(resnet18_model, val_dataloader, device=device)

# Train ResNet-34
print("\nTraining ResNet-34:")
resnet34_model, resnet34_train_loss, resnet34_train_acc, resnet34_val_loss, resnet34_val_acc = train_model(
    resnet34_model, criterion, optimizer_resnet34, scheduler_resnet34,
    train_dataloader, val_dataloader, num_epochs=10, device=device)  # Reduced epochs for demonstration

# Evaluate ResNet-34 on the validation set
print("\nEvaluating ResNet-34 on validation set:")
evaluate_model(resnet34_model, val_dataloader, device=device)

# Train VGG-19
print("\nTraining VGG-19:")
vgg19_model, vgg19_train_loss, vgg19_train_acc, vgg19_val_loss, vgg19_val_acc = train_model(
    vgg19_model, criterion, optimizer_vgg19, scheduler_vgg19,
    train_dataloader, val_dataloader, num_epochs=10, device=device)  # Reduced epochs for demonstration

# Evaluate VGG-19 on the validation set
print("\nEvaluating VGG-19 on validation set:")
evaluate_model(vgg19_model, val_dataloader, device=device)

Using device: cuda

Training ResNet-18:
Epoch 0/9
----------


KeyboardInterrupt: 

## Adjust Hyperparameters and Retrain

### Subtask:
Adjust the learning rate and add weight decay to the optimizers and retrain the models to evaluate the impact on overfitting.


In [35]:
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.nn as nn

# Define the loss function (Cross-Entropy Loss is common for classification)
criterion = nn.CrossEntropyLoss()

# Define optimizers for each model
# You can choose different optimizers or hyperparameters here
# Adjusted learning rate and added weight decay to combat overfitting
optimizer_resnet18 = optim.Adam(resnet18_model.parameters(), lr=0.001, weight_decay=1e-3) # Switched to Adam optimizer
optimizer_resnet34 = optim.Adam(resnet34_model.parameters(), lr=0.001, weight_decay=1e-3) # Switched to Adam optimizer
optimizer_vgg19 = optim.Adam(vgg19_model.parameters(), lr=0.001, weight_decay=1e-3) # Reverted VGG-19 learning rate to 0.001


# Define learning rate schedulers (optional but recommended)
# Example: StepLR decays the learning rate by a factor of 0.1 every 7 epochs
scheduler_resnet18 = lr_scheduler.StepLR(optimizer_resnet18, step_size=7, gamma=0.1)
scheduler_resnet34 = lr_scheduler.StepLR(optimizer_resnet34, step_size=7, gamma=0.1)
scheduler_vgg19 = lr_scheduler.StepLR(optimizer_vgg19, step_size=7, gamma=0.1)

print("Training components (Loss function, Optimizers, and Schedulers) redefined with adjusted hyperparameters.")

Training components (Loss function, Optimizers, and Schedulers) redefined with adjusted hyperparameters.


## Save Trained Models

### Subtask:
Save the state dictionary of the trained ResNet-18, ResNet-34, and VGG-19 models to .pth files.

**Reasoning**:
Save the state dictionary of each trained model as required by the assignment for submission.

In [None]:
import torch
import os

# Define the directory to save the models
# You might want to change this path to a specific folder in your Google Drive
save_dir = "/content/drive/My Drive/trained_models"
os.makedirs(save_dir, exist_ok=True)

# Save the state dictionary of each model
torch.save(resnet18_model.state_dict(), os.path.join(save_dir, 'resnet18_model.pth'))
torch.save(resnet34_model.state_dict(), os.path.join(save_dir, 'resnet34_model.pth'))
torch.save(vgg19_model.state_dict(), os.path.join(save_dir, 'vgg19_model.pth'))

print(f"Trained models saved to {save_dir}")

Trained models saved to /content/drive/My Drive/trained_models


**Reasoning**:
Retrain each model with the adjusted hyperparameters to evaluate their impact on performance and overfitting.

## Recreated: Modify Training Loop for Mixup and Checkpointing

### Subtask:
Recreate the `train_model` function with Mixup integration and checkpoint saving.

**Reasoning**:
Recreate the `train_model` function to ensure the latest version with Mixup and checkpointing is available for use.

In [46]:
import torch
import time
import copy
import numpy as np
import os
import torch.amp as amp # Import amp from torch.amp

def train_model(model, criterion, optimizer, scheduler, train_loader, val_loader, num_epochs=25, device='cuda', use_mixup=True, mixup_alpha=1.0, save_dir=None):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    # To store history of loss and accuracy
    train_loss_history = []
    train_acc_history = []
    val_loss_history = []
    val_acc_history = []

    model.to(device)

    # Initialize GradScaler for mixed precision
    scaler = amp.GradScaler('cuda') # Use 'cuda' argument

    # Create save directory if it doesn't exist
    if save_dir and not os.path.exists(save_dir):
        os.makedirs(save_dir)

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
                dataloader = train_loader
            else:
                model.eval()   # Set model to evaluate mode
                dataloader = val_loader

            running_loss = 0.0
            running_corrects = 0
            total_samples = 0

            # Iterate over data.
            for inputs, labels in dataloader:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Use autocast for mixed precision
                    with amp.autocast('cuda'): # Use 'cuda' argument
                        if phase == 'train' and use_mixup:
                            inputs, targets_a, targets_b, lam = mixup_data(inputs, labels, mixup_alpha, device)
                            outputs = model(inputs)
                            loss = mixup_criterion(criterion, outputs, targets_a, targets_b, lam)
                            _, preds = torch.max(outputs, 1)
                            corrects = (lam * preds.eq(targets_a.data).sum().item() + (1 - lam) * preds.eq(targets_b.data).sum().item())

                        else:
                            outputs = model(inputs)
                            _, preds = torch.max(outputs, 1)
                            loss = criterion(outputs, labels)
                            corrects = torch.sum(preds == labels.data)


                    # backward + optimize only if in training phase
                    if phase == 'train':
                        # Scale the loss and call backward()
                        scaler.scale(loss).backward()
                        # Unscale gradients and call optimizer.step()
                        scaler.step(optimizer)
                        # Update the scale for next iteration
                        scaler.update()


                # statistics
                running_loss += loss.item() * inputs.size(0)
                if phase == 'train' and use_mixup:
                     running_corrects += corrects
                else:
                     running_corrects += corrects.item()

                total_samples += inputs.size(0)


            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / total_samples
            epoch_acc = running_corrects / total_samples


            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # Store history
            if phase == 'train':
                train_loss_history.append(epoch_loss)
                train_acc_history.append(epoch_acc)
            else:
                val_loss_history.append(epoch_loss)
                val_acc_history.append(epoch_acc)


            # deep copy the model if it's the best accuracy and save checkpoint
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                if save_dir:
                    # Note: ResNet-18 and ResNet-34 share the class name 'ResNet',
                    # so their checkpoints are written to the same file name
                    model_name = model.__class__.__name__
                    save_path = os.path.join(save_dir, f'{model_name}_best_val_acc.pth')
                    torch.save(model.state_dict(), save_path)
                    print(f"Saved best model checkpoint to {save_path}")


        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)

    return model, train_loss_history, train_acc_history, val_loss_history, val_acc_history

print("Training function 'train_model' updated to use torch.amp.")

Training function 'train_model' updated to use torch.amp.


In [47]:
import numpy as np
import torch # Make sure torch is imported here too

def mixup_data(x, y, alpha=1.0, device='cuda'):
    '''Returns mixed inputs, pairs of targets, and lambda'''
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1

    batch_size = x.size()[0]
    index = torch.randperm(batch_size).to(device)

    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
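The Mixup helpers above can be traced on a toy batch without PyTorch. The sketch below is a standard-library re-implementation for illustration only: it swaps `np.random.beta` for `random.betavariate` and `torch.randperm` for `random.shuffle`, and the names `mixup_vectors` and `mixup_loss`, the toy inputs, and the loss values are all assumptions.

```python
import random

def mixup_vectors(xs, ys, alpha=1.0, rng=random):
    """Mix each sample with a randomly chosen partner from the same batch."""
    lam = rng.betavariate(alpha, alpha) if alpha > 0 else 1.0
    index = list(range(len(xs)))
    rng.shuffle(index)  # random permutation of the batch indices
    mixed = [
        [lam * a + (1 - lam) * b for a, b in zip(xs[i], xs[j])]
        for i, j in enumerate(index)
    ]
    ys_b = [ys[j] for j in index]
    return mixed, ys, ys_b, lam

def mixup_loss(loss_a, loss_b, lam):
    """Convex combination of the losses against the two target sets."""
    return lam * loss_a + (1 - lam) * loss_b

random.seed(0)
xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy "images" with two features each
ys = [0, 1, 2]
mixed, ys_a, ys_b, lam = mixup_vectors(xs, ys, alpha=1.0)
print(lam, ys_a, ys_b)
print(mixup_loss(2.0, 4.0, 0.25))  # 0.25 * 2.0 + 0.75 * 4.0 = 3.5
```

Each mixed sample is a convex combination of two inputs, and the loss uses the same weight `lam` for the two label sets, which is why the training accuracy in the loop is also computed as a `lam`-weighted count rather than an exact match count.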

## Retrain Models on Full Dataset with Enhanced Augmentation, Stronger Regularization, and Mixup (Fourth Pass)

### Subtask:
Retrain each model on the full training and validation datasets with enhanced per-image augmentation, current Adam settings (including increased weight decay), and Mixup data augmentation.

**Reasoning**:
Retrain each model on the full dataset with enhanced data augmentation and stronger regularization to further combat severe overfitting and evaluate the impact on validation accuracy.

In [None]:
import torch
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.nn as nn

# Ensure models are on the correct device and reinitialize optimizers/schedulers
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Re-define optimizers and schedulers with the Adam optimizer and current weight decay.
# Re-creating them resets the optimizer state (Adam moment estimates) and restarts
# the learning rate schedule before retraining.
optimizer_resnet18 = optim.Adam(resnet18_model.parameters(), lr=0.0001, weight_decay=1e-4) # ResNet-18 with reduced LR and reduced WD to 1e-4
optimizer_resnet34 = optim.Adam(resnet34_model.parameters(), lr=0.001, weight_decay=1e-3) # ResNet-34 with original weight decay
optimizer_vgg19 = optim.Adam(vgg19_model.parameters(), lr=0.001, weight_decay=1e-3) # VGG-19 with original weight decay

scheduler_resnet18 = lr_scheduler.StepLR(optimizer_resnet18, step_size=3, gamma=0.1) # Adjusted step_size
scheduler_resnet34 = lr_scheduler.StepLR(optimizer_resnet34, step_size=3, gamma=0.1) # Adjusted step_size
scheduler_vgg19 = lr_scheduler.StepLR(optimizer_vgg19, step_size=3, gamma=0.1) # Adjusted step_size


print(f"Focusing on training ResNet-18 on full training dataset of size {len(train_dataset)} and validation dataset of size {len(val_dataset)} with enhanced augmentation and current regularization settings.")


# Train ResNet-18 with enhanced augmentation and current regularization
print("\nRetraining ResNet-18:")
resnet18_model, resnet18_train_loss, resnet18_train_acc, resnet18_val_loss, resnet18_val_acc = train_model(
    resnet18_model, criterion, optimizer_resnet18, scheduler_resnet18, train_dataloader, val_dataloader, num_epochs=5, device=device, use_mixup=True) # Reduced epochs to 5

# Evaluate Retrained ResNet-18 on the validation set
print("\nEvaluating Retrained ResNet-18 on validation set:")
evaluate_model(resnet18_model, val_dataloader, device=device)

# Train ResNet-34 with enhanced augmentation and current regularization
# print("\nRetraining ResNet-34:")
# resnet34_model, resnet34_train_loss, resnet34_train_acc, resnet34_val_loss, resnet34_val_acc = train_model(
#     resnet34_model, criterion, optimizer_resnet34, scheduler_resnet34, train_dataloader, val_dataloader, num_epochs=5, device=device, use_mixup=True) # Using full dataloaders, enhanced augmentation (via train_transforms), and Mixup

# Evaluate Retrained ResNet-34 on the validation set
# print("\nEvaluating Retrained ResNet-34 on validation set:")
# evaluate_model(resnet34_model, val_dataloader, device=device)

# Train VGG-19 with enhanced augmentation and current regularization
# print("\nRetraining VGG-19:")
# vgg19_model, vgg19_train_loss, vgg19_train_acc, vgg19_val_loss, vgg19_val_acc = train_model(
#     vgg19_model, criterion, optimizer_vgg19, scheduler_vgg19, train_dataloader, val_dataloader, num_epochs=5, device=device, use_mixup=True) # Using full dataloaders, enhanced augmentation (via train_transforms), and Mixup

# Evaluate Retrained VGG-19 on the validation set
# print("\nEvaluating Retrained VGG-19 on validation set:")
# evaluate_model(vgg19_model, val_dataloader, device=device)

Using device: cuda
Focusing on training ResNet-18 on full training dataset of size 11250 and validation dataset of size 450 with enhanced augmentation and current regularization settings.

Retraining ResNet-18:
Epoch 0/4
----------
