
# Fine-tuning a Pre-trained Image Classification Model

In this project, you will work with a pre-trained image classification model and datasets such as MS COCO and Tiny ImageNet. The steps will guide you through loading the model, evaluating it, generating pseudo-labels, fine-tuning the model, and comparing results.

### Tasks
1. Load a pre-trained image classification model.
2. Evaluate its performance on MS COCO (you need to decide what the ground-truth label will be).
3. Use the pre-trained model to generate pseudo-labels for Tiny ImageNet.
4. Fine-tune the model using pseudo-labeled images.
5. Compare the model's performance before and after fine-tuning.

**Important**: At the end, write a report of adequate length, which will probably mean at least half a page. In the report, describe how you approached the task. It should cover:
- Difficulties you encountered that stem from the method (e.g. "not enough training samples to converge"), not technical ones (e.g. "I could not install a package over pip").
- The steps you took to alleviate these difficulties.
- A general description of what you did: how you understood the task and how you solved it, in plain language without code.
- Potential limitations of your approach: what could go wrong, and how the task could become harder with different data or slightly different conditions.
- Any idea you have for extending this work in an interesting way.



## Step 1: Setup and Load Required Libraries

We will begin by setting up the necessary environment and importing the required libraries.


In [None]:
# Import required libraries
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import CocoDetection, ImageFolder
from torch.utils.data import DataLoader
from torchvision.models import resnet18 # Depending on your hardware, you can use MobileNetV3-Small as an alternative. See https://pytorch.org/vision/main/models/mobilenetv3.html
import os

# Verify GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")


## Step 2: Load a Pre-trained Model

We will use ResNet-18, a lightweight pre-trained model that is efficient and performs well on classification tasks. You can also use MobileNet.


In [None]:
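# Load the ResNet-18 model with ImageNet weights
# (torchvision >= 0.13; older versions use resnet18(pretrained=True) instead)
model = resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT).to(device)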
model.eval()


## Step 3: Load and Evaluate the Model on MS COCO

### Steps:
1. Use torchvision's `CocoDetection` to load a small subset of MS COCO.
2. Run the evaluation on a few images and calculate the accuracy. (You can also use pycocotools for everything.)
3. Write code to handle image loading and transformations.

**Task:** Complete the data loader and evaluation function. Note: in MS COCO, each image can have several labels, and their bounding boxes differ in size. You need to decide how to handle the ground-truth label. Either you accept any label that belongs to an image as correct, or you determine a single best ground-truth label, for example from the number of objects of each class or from the sizes of their bounding boxes.
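The two single-label strategies mentioned above can be sketched in plain Python. This is only one possible reading of the task, not the expected solution. The function name `dominant_category` and its `strategy` argument are hypothetical; the sketch assumes each annotation is a COCO-style dict with a `category_id` and a `bbox` given as `[x, y, width, height]`. Note that COCO category ids are not ImageNet class indices, so a mapping between the two label spaces is still needed before comparing with the model's predictions.

```python
from collections import defaultdict


def dominant_category(annotations, strategy="area"):
    """Pick one category_id for an image from its COCO annotation list.

    Hypothetical helper, not part of torchvision or pycocotools.
    strategy="area":  category with the largest summed bounding-box area.
    strategy="count": category with the most annotated objects.
    Returns None for images without annotations.
    """
    if strategy not in ("area", "count"):
        raise ValueError(f"Unknown strategy: {strategy}")

    scores = defaultdict(float)
    for ann in annotations:
        if strategy == "area":
            # COCO bounding boxes are [x, y, width, height]
            _, _, width, height = ann["bbox"]
            scores[ann["category_id"]] += width * height
        else:
            scores[ann["category_id"]] += 1

    if not scores:
        return None
    # On ties, the category encountered first wins
    return max(scores, key=scores.get)


# Synthetic annotations: two small objects of category 1, one large of category 18
example = [
    {"category_id": 1, "bbox": [0, 0, 10, 10]},
    {"category_id": 1, "bbox": [20, 20, 10, 10]},
    {"category_id": 18, "bbox": [0, 0, 50, 40]},
]
by_area = dominant_category(example, "area")    # 18 (area 2000 vs. 200)
by_count = dominant_category(example, "count")  # 1 (two objects vs. one)
print(by_area, by_count)
```

The two strategies can disagree, as they do here, which is worth discussing in the report.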


In [None]:

# Define data transformations
coco_transform = transforms.Compose([
    transforms.Resize((128, 128)), # Resize images for faster processing
    transforms.ToTensor()
])

coco_dataset = CocoDetection(
    root='path/to/coco/train2017',
    annFile='path/to/coco/annotations/instances_train2017.json',
    transform=coco_transform
)

# Create a DataLoader
coco_loader = DataLoader(coco_dataset, batch_size=16, shuffle=True, num_workers=2)

# Define an evaluation function (to be implemented by students)
def evaluate_model_on_coco(model, data_loader):
    pass

# accuracy = evaluate_model_on_coco(model, coco_loader)
# print(f"Model accuracy on MS COCO: {accuracy:.2f}%")
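One practical detail when batching `CocoDetection` samples: each target is a list of annotation dicts whose length varies per image, which the default collate function of `DataLoader` typically cannot batch. A minimal sketch of a custom collate function follows; the name `coco_collate` is hypothetical, and the batch below is synthetic stand-in data rather than real COCO samples.

```python
import torch


def coco_collate(batch):
    """Stack images into one tensor and keep per-image annotation lists as-is.

    Hypothetical helper; assumes every image was already resized to the same shape.
    """
    images, targets = zip(*batch)
    return torch.stack(images, dim=0), list(targets)


# Synthetic stand-in for two samples with different numbers of objects
fake_batch = [
    (torch.zeros(3, 128, 128), [{"category_id": 18, "bbox": [0, 0, 10, 10]}]),
    (torch.ones(3, 128, 128), [{"category_id": 1, "bbox": [5, 5, 20, 20]},
                               {"category_id": 3, "bbox": [0, 0, 4, 4]}]),
]
images, targets = coco_collate(fake_batch)
print(images.shape, [len(t) for t in targets])
```

It could then be passed as `DataLoader(coco_dataset, batch_size=16, collate_fn=coco_collate)`, so that the evaluation function receives a batch tensor plus one annotation list per image.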



## Step 4: Generate Pseudo Labels for Tiny ImageNet

The goal here is to ignore the labels of Tiny ImageNet for now and instead create your own dataset using the model you loaded before. The model predicts the label of each image, and you need to save these predictions so that you can access them later for training.

### Steps:
1. Load the Tiny ImageNet dataset using torchvision's `ImageFolder`.
2. Use the pre-trained model to assign pseudo-labels to each image.
3. Save the pseudo-labeled dataset.

**Task:** Complete the loop to generate pseudo-labels.

Note: for good training results, you may want to add normalization to the transforms.

In [None]:
# Define data transformations for Tiny ImageNet
tiny_imagenet_transform = transforms.Compose([
    transforms.Resize((128, 128)), # Resize images for faster processing
    transforms.ToTensor() 
])

# Load Tiny ImageNet dataset (modify the path to point to your dataset location)
tiny_imagenet_dataset = ImageFolder(root='path/to/tiny-imagenet', transform=tiny_imagenet_transform)

# Create a DataLoader
tiny_imagenet_loader = DataLoader(tiny_imagenet_dataset, batch_size=16, shuffle=False, num_workers=2)

# Generate pseudo-labels
pseudo_labels = []
for images, _ in tiny_imagenet_loader:
    images = images.to(device)
    # Use the model to generate pseudo-labels for images
    pass

# Save pseudo-labeled dataset
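One way the pseudo-labeling loop could look is sketched below. It takes the arg-max class of the model's output as the pseudo-label and collects the labels in loader order, which only lines up with the dataset if the loader uses `shuffle=False`. The function name `generate_pseudo_labels`, the toy model, the synthetic data, and the file name in the last comment are all assumptions for illustration. With an ImageNet-pretrained model, the resulting labels are ImageNet-1k class indices, not the folder indices that `ImageFolder` assigns.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def generate_pseudo_labels(model, data_loader, device):
    """Return one predicted class index per image, in loader order (hypothetical helper)."""
    model.eval()
    all_labels = []
    for images, _ in data_loader:  # the dataset's own labels are ignored
        logits = model(images.to(device))
        all_labels.append(logits.argmax(dim=1).cpu())
    return torch.cat(all_labels)


# Tiny synthetic stand-ins for the real model and the Tiny ImageNet loader
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5)).to(device)
toy_data = TensorDataset(torch.randn(10, 3, 8, 8), torch.zeros(10, dtype=torch.long))
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=False)

pseudo_labels = generate_pseudo_labels(toy_model, toy_loader, device)
print(pseudo_labels.shape)

# The labels could then be stored for later use, for example:
# torch.save(pseudo_labels, "pseudo_labels.pt")  # hypothetical file name
```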


## Step 5: Fine-tune the Model

### Steps:
1. Train the pre-trained model on the pseudo-labeled Tiny ImageNet dataset.
2. Use a small number of epochs and reduced batch size for efficiency.

**Task:** Complete the training loop.
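Before training, the images have to be paired with their pseudo-labels. A minimal sketch of a wrapper dataset that does this is shown below, assuming the pseudo-labels are stored in the same order as the images of the base dataset. The class name `PseudoLabeledDataset` is hypothetical and the data is synthetic.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class PseudoLabeledDataset(Dataset):
    """Replace the labels of a base dataset with pseudo-labels (hypothetical helper)."""

    def __init__(self, base_dataset, pseudo_labels):
        if len(base_dataset) != len(pseudo_labels):
            raise ValueError("Need exactly one pseudo-label per image")
        self.base_dataset = base_dataset
        self.pseudo_labels = pseudo_labels

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, idx):
        image, _ = self.base_dataset[idx]  # discard the original label
        return image, int(self.pseudo_labels[idx])


# Synthetic stand-ins: 6 images with original label 0 and made-up pseudo-labels
base = TensorDataset(torch.randn(6, 3, 8, 8), torch.zeros(6, dtype=torch.long))
fake_pseudo_labels = torch.tensor([2, 0, 1, 4, 3, 2])
pseudo_labeled = PseudoLabeledDataset(base, fake_pseudo_labels)
pseudo_labeled_loader = DataLoader(pseudo_labeled, batch_size=3, shuffle=False)

first_images, first_labels = next(iter(pseudo_labeled_loader))
print(first_images.shape, first_labels)
```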


In [None]:
# Define training loop (students to complete)
def fine_tune_model(model, data_loader, optimizer, criterion, epochs=5):
    # Task for students: Implement the fine-tuning loop
    pass

# Example setup for optimizer and criterion
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# criterion = torch.nn.CrossEntropyLoss()

# Fine-tune the model 
# fine_tune_model(model, pseudo_labeled_loader, optimizer, criterion)
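For orientation, a plain supervised training loop could be structured as in the sketch below. It is one possible shape under simple assumptions (a single optimizer, no learning-rate schedule, no validation), not the expected solution. The name `fine_tune_sketch` and its extra `device` argument are hypothetical, and the model and data are tiny synthetic stand-ins.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def fine_tune_sketch(model, data_loader, optimizer, criterion, device, epochs=5):
    """Run a basic training loop and return the mean loss per epoch (hypothetical helper)."""
    epoch_losses = []
    for _ in range(epochs):
        model.train()  # enable training behavior of dropout / batch norm
        running_loss, seen = 0.0, 0
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            # Weight the batch loss by batch size for a correct epoch average
            running_loss += loss.item() * images.size(0)
            seen += images.size(0)
        epoch_losses.append(running_loss / seen)
    return epoch_losses


# Tiny synthetic stand-ins for the model and the pseudo-labeled loader
torch.manual_seed(0)
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5)).to(device)
toy_data = TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 5, (32,)))
toy_loader = DataLoader(toy_data, batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
losses = fine_tune_sketch(toy_model, toy_loader, optimizer, criterion, device, epochs=3)
print(losses)
```

Returning the per-epoch loss makes it easy to plot the training curve for the report.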


## Step 6: Evaluate and Compare Performance

### Steps:
1. Evaluate the model's performance on the Tiny ImageNet test set before and after fine-tuning.
2. Plot and compare the results.

**Task:** Implement the evaluation and comparison.


In [1]:
# Evaluate the fine-tuned model
# final_accuracy = evaluate_model_on_tiny_imagenet(model, test_loader)
# print(f"Model accuracy on Tiny ImageNet after fine-tuning: {final_accuracy:.2f}%")
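A top-1 accuracy function along the following lines could serve for both the "before" and the "after" measurement. The sketch assumes that the labels in the loader use the same index space as the model's outputs, which has to be ensured separately. The name `top1_accuracy` is hypothetical, and the model and data are synthetic.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def top1_accuracy(model, data_loader, device):
    """Return the percentage of images whose arg-max prediction equals the label."""
    model.eval()
    correct, total = 0, 0
    for images, labels in data_loader:
        predictions = model(images.to(device)).argmax(dim=1).cpu()
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total


# Synthetic check: labels equal to the model's own predictions give 100%,
# labels shifted by one class give 0%
torch.manual_seed(0)
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5)).to(device).eval()
inputs = torch.randn(12, 3, 8, 8)
with torch.no_grad():
    own_predictions = toy_model(inputs).argmax(dim=1)

matching_loader = DataLoader(TensorDataset(inputs, own_predictions), batch_size=4)
shifted_loader = DataLoader(TensorDataset(inputs, (own_predictions + 1) % 5), batch_size=4)

accuracy_matching = top1_accuracy(toy_model, matching_loader, device)
accuracy_shifted = top1_accuracy(toy_model, shifted_loader, device)
print(accuracy_matching, accuracy_shifted)
```

Because fine-tuning modifies the model in place, the "before" accuracy has to be computed and stored before the training loop runs.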

# Note 
At this point, the performance of the pre-trained model may actually go down, because it was originally trained for a long time on a large dataset such as ImageNet. The solution would be to train the model on the large dataset AND the new pseudo-labeled dataset. However, that is not realistic in the time available for this exercise. Therefore, to see whether and how pseudo-labeling improves performance, here is one final exercise.

# Bonus Steps: 
1. Repeat the process, but this time do not use a pre-trained model; instead, train a model on a subset of MS COCO.
2. Fine-tune on the subset and the pseudo-labeled images.

Note: Training on all of MS COCO would take too much time, so use a subset. Below you will find some code to help you, but it is not in order. The point is to pseudo-label the Tiny ImageNet images and then fine-tune on the MS COCO subset together with the pseudo-labeled Tiny ImageNet images.

### This code can be used to create a subset of MS COCO. It needs to be heavily adjusted to work with your pipeline

In [None]:
from torchvision.datasets import CocoDetection
import random
from torch.utils.data import Dataset

class SubsetCocoDataset(Dataset):
    def __init__(self, coco_root, coco_ann_file, transform=None, subset_size=100):
        """
        Custom Dataset that loads a random subset of the MS COCO dataset.

        Args:
            coco_root (str): Path to the root directory of COCO images.
            coco_ann_file (str): Path to the annotation file.
            transform (callable, optional): A function/transform to apply to images.
            subset_size (int): Number of random samples to include in the subset.
        """
        self.coco = CocoDetection(root=coco_root, annFile=coco_ann_file, transform=transform)
        self.transform = transform
        self.subset_size = subset_size

        # Select a random subset of indices
        self.indices = random.sample(range(len(self.coco)), subset_size)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        # Load lazily so the subset is not decoded and held in memory up front
        return self.coco[self.indices[idx]]

# Example usage:
# transform = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
# coco_subset = SubsetCocoDataset(
#     coco_root='path/to/coco/images',
#     coco_ann_file='path/to/coco/annotations',
#     transform=transform,
#     subset_size=500  # Select 500 samples randomly
# )
# coco_loader = DataLoader(coco_subset, batch_size=16, shuffle=True, num_workers=2)

### The following code can be used to combine datasets but needs to be heavily adjusted

In [None]:
from torch.utils.data import Dataset

class CombinedDataset(Dataset):
    def __init__(self, datasets, transform=None):
        """
        Combines multiple datasets into one.

        Args:
            datasets (list): List of datasets to combine.
            transform (callable, optional): Transformations to apply to images.
        """
        self.datasets = datasets
        self.transform = transform

        # Store cumulative lengths for indexing
        self.cumulative_lengths = []
        total = 0
        for dataset in datasets:
            total += len(dataset)
            self.cumulative_lengths.append(total)

    def __len__(self):
        return sum(len(dataset) for dataset in self.datasets)

    def __getitem__(self, idx):
        # Determine which dataset the index falls into
        for i, cumulative_length in enumerate(self.cumulative_lengths):
            if idx < cumulative_length:
                # Adjust index to be relative to the dataset
                dataset_idx = idx if i == 0 else idx - self.cumulative_lengths[i - 1]
                image, label = self.datasets[i][dataset_idx]
                
                # Apply an additional transform if provided (runs after each dataset's own transform)
                if self.transform:
                    image = self.transform(image)
                
                return image, label

        raise IndexError("Index out of bounds for CombinedDataset")

# Example usage:
# coco_dataset = SubsetCocoDataset(coco_root='path/to/coco/images', coco_ann_file='path/to/coco/annotations', transform=coco_transform, subset_size=500)
# tiny_imagenet_dataset = ImageFolder(root='path/to/tiny-imagenet', transform=tiny_imagenet_transform)
# combined_dataset = CombinedDataset([coco_dataset, tiny_imagenet_dataset])

# combined_loader = DataLoader(combined_dataset, batch_size=16, shuffle=True, num_workers=2)
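As an alternative to a hand-written combined dataset, PyTorch ships `torch.utils.data.ConcatDataset`, which chains datasets in the same way. The sketch below uses synthetic stand-ins for the MS COCO subset and the pseudo-labeled Tiny ImageNet images. It assumes both datasets already return `(image_tensor, int_label)` pairs with identical image shapes and a shared label space, which the real datasets only do after the adjustments described above.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Synthetic stand-ins: both yield (image_tensor, int_label) with the same shape
coco_like = TensorDataset(torch.randn(6, 3, 8, 8), torch.randint(0, 5, (6,)))
pseudo_like = TensorDataset(torch.randn(10, 3, 8, 8), torch.randint(0, 5, (10,)))

# ConcatDataset indexes the first dataset, then continues into the second
combined = ConcatDataset([coco_like, pseudo_like])
combined_loader = DataLoader(combined, batch_size=4, shuffle=True)

images, labels = next(iter(combined_loader))
print(len(combined), images.shape, labels.shape)
```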