## Problem Definition



We are addressing an **image classification problem** with four distinct categories: *budgie*, *rubber duck*, *canary*, and *duckling*. The dataset consists of both **labeled** and **unlabeled images**. While the labeled data offers ground truth for model training, a significant portion of the dataset remains unlabeled, adding complexity to the task.



The objective is to develop a model that can leverage both the labeled and unlabeled data to enhance performance. The challenge lies in effectively utilizing the unlabeled data to improve classification accuracy and robustness.
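Pseudo-labeling is one common way to write such an objective. It is shown here as an illustration, not as something the task prescribes. The objective combines two terms. The first is the supervised cross-entropy on a labeled batch $B_\ell$. The second is a cross-entropy on an unlabeled batch $B_u$, where each unlabeled image is trained toward its own predicted class $\hat{y}(x)$, but only when the model's confidence reaches a threshold $\tau$. The weight $\lambda$ balances the two terms, and $p_\theta(\cdot \mid x)$ is the softmax over the four classes:

$$
\mathcal{L}(\theta) = \frac{1}{|B_\ell|}\sum_{(x,\,y) \in B_\ell} \mathrm{CE}\big(y,\ p_\theta(\cdot \mid x)\big) \;+\; \lambda\,\frac{1}{|B_u|}\sum_{x \in B_u} \mathbb{1}\!\left[\max_{c} p_\theta(c \mid x) \ge \tau\right] \mathrm{CE}\big(\hat{y}(x),\ p_\theta(\cdot \mid x)\big),
\qquad \hat{y}(x) = \arg\max_{c}\, p_\theta(c \mid x)
$$

The pseudo-label $\hat{y}(x)$ is treated as a fixed target, so no gradient flows through the $\arg\max$.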



### Requirements:

- The model must be implemented using **`torch`** and **`torchvision`** only (no other deep learning libraries are allowed for the model architecture).

- The main class for the model must be named <font color='red'>**`Model`**</font>, and participants <font color='red'>**must not change this name**</font>.

- Do not change the `__init__` method of the **`Model`** class.

- The size of your model should not exceed 70 MB.

- Instantiating your model must not require any parameters.
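The size and no-argument requirements can be checked before submitting. The sketch below assumes the limit applies to the float32 weights (parameters plus buffers) as they would be stored in a state dict, measured in MiB. The helper names and `TinyModel` are hypothetical. A saved `.pth` file also carries a little serialization overhead on top of this figure.

```python
import torch.nn as nn


def model_size_mb(model: nn.Module) -> float:
    """Approximate size of a model's state dict (parameters + buffers) in MiB."""
    n_bytes = sum(t.numel() * t.element_size() for t in model.state_dict().values())
    return n_bytes / (1024 ** 2)


def check_requirements(model_cls, limit_mb: float = 70.0) -> float:
    """Instantiates `model_cls` with no arguments and checks it against the size limit."""
    model = model_cls()  # raises TypeError if the constructor needs arguments
    size = model_size_mb(model)
    if size > limit_mb:
        raise ValueError(f"Model is {size:.1f} MiB, above the {limit_mb} MB limit")
    return size


class TinyModel(nn.Module):
    """Hypothetical stand-in for `Model`, used only to exercise the checks."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 4)  # 10*4 weights + 4 biases = 44 float32 values

    def forward(self, x):
        return self.fc(x)


print(f"{check_requirements(TinyModel) * 1024 ** 2:.0f} bytes")  # 176 bytes
```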


In [1]:
# from datasets import load_dataset

from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import f1_score
from torchvision import models, transforms
import os
import sys
from huggingface_hub import snapshot_download
from PIL import Image
from typing import Tuple, List
import random
from torch.optim.lr_scheduler import ReduceLROnPlateau

In [2]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

print(device)

cuda:0


In [3]:
import contextlib

from huggingface_hub.utils import disable_progress_bars

dataset_id = "RayanAi/Noisy_birds"

# Set the local directory where you want to store the dataset
local_dataset_dir = "./Noisy_birds"  # You can change this path to your desired location

# Create the directory if it doesn't exist
os.makedirs(local_dataset_dir, exist_ok=True)

# Download quietly. The progress bars do not go to stdout (tqdm writes to stderr
# or renders notebook widgets), so redirecting stdout alone does not hide them;
# turn them off explicitly as well.
disable_progress_bars()
with open(os.devnull, 'w') as fnull, contextlib.redirect_stdout(fnull):
    snapshot_download(repo_id=dataset_id, local_dir=local_dataset_dir, repo_type="dataset")

print("Dataset downloaded completely.")

# Calculate and print the total size of the downloaded files (in MB)
total_size = sum(
    os.path.getsize(os.path.join(dirpath, f))
    for dirpath, _, filenames in os.walk(local_dataset_dir)
    for f in filenames
)
print(f"Total size of downloaded files: {total_size / (1024 * 1024):.2f} MB")

# Get the absolute path of the dataset directory and print it
dataset_abs_path = os.path.abspath(local_dataset_dir)
print(f"Dataset has been saved at: [{dataset_abs_path}]")



Dataset downloaded completely.
Total size of downloaded files: 7.61 MB
Dataset has been saved at: [/kaggle/working/Noisy_birds]


In [4]:
!unzip -qo ./Noisy_birds/Noisy_birds.zip -d ./Noisy_birds/
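`!unzip` is a notebook shell escape and needs the `unzip` binary to be installed. Outside a notebook, or where `unzip` is missing, Python's standard-library `zipfile` module does the same job. The sketch below uses `extract_zip`, a hypothetical helper name, and runs against a throwaway archive so that it is self-contained.

```python
import os
import tempfile
import zipfile


def extract_zip(zip_path: str, dest_dir: str) -> list:
    """Extracts every member of `zip_path` into `dest_dir`, overwriting existing files (like `unzip -o`)."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


# Demo with a temporary archive; for the real data the call would be
# extract_zip("./Noisy_birds/Noisy_birds.zip", "./Noisy_birds/")
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("budgie/img_0.txt", "placeholder")
    names = extract_zip(demo_zip, tmp)
    print(names)  # ['budgie/img_0.txt']
    print(os.path.exists(os.path.join(tmp, "budgie", "img_0.txt")))  # True
```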

## Dataset

In this part, the downloaded dataset is loaded and the required augmentation transforms are applied. You only need to define the transform functions used for augmentation. At the end, you are provided with a `train_loader`, a `val_loader`, and an `unlabeled_loader`.

In [27]:
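# Baseline training transform: tensor conversion plus ImageNet mean/std normalization
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Evaluation transform: no augmentation, same normalization
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])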
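`ToTensor` scales 8-bit pixel values to [0, 1], and `Normalize` then computes `(x - mean) / std` per channel. The statistics used are the ImageNet ones that torchvision's pretrained backbones were trained with. The pure-Python sketch below walks through that arithmetic for a single pixel. `normalize_pixel` is a hypothetical helper and is not part of torchvision.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Mimics ToTensor + Normalize for one 8-bit RGB pixel: x / 255, then (x - mean) / std per channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


white = normalize_pixel((255, 255, 255))  # roughly (2.249, 2.429, 2.640)
black = normalize_pixel((0, 0, 0))        # roughly (-2.118, -2.036, -1.804)
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

Inputs therefore land roughly in the range [-2.1, 2.6] rather than [0, 1].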

In [28]:
import os
import random
from typing import List, Tuple

import torch
from PIL import Image
from torch.utils.data import Dataset

# Fraction of the labeled images assigned to the training split; the rest form the validation split
split_ratio = 0.6


class Birddataset(Dataset):

    def __init__(self, image_dir: str, allowed_classes: List[str], transform=None, dataset_type: str = None):
        """
        Args:
            image_dir (str): Root directory with one sub-directory per class.
            allowed_classes (List[str]): Class sub-directories to include. The position of a
                class in this list is its integer label.
            transform (callable, optional): Transform applied to each PIL image.
                If None, the global `transform_test` is used.
            dataset_type (str, optional): 'Train' keeps the first `split_ratio` of the shuffled
                samples, 'Test' keeps the rest, and any other value (default None) keeps all of them.
        """
        self.allowed_classes = allowed_classes
        self.image_dir = image_dir
        self.dataset_type = dataset_type
        self.transform = transform

        # Sort directory listings so the 'Train' and 'Test' instances see the same order on any filesystem
        self.classes = sorted(
            item for item in os.listdir(self.image_dir)
            if os.path.isdir(os.path.join(self.image_dir, item))
        )
        self.samples = []  # list of [file_name, class_name]
        for class_name in self.classes:
            if class_name in allowed_classes:
                for img in sorted(os.listdir(os.path.join(self.image_dir, class_name))):
                    self.samples.append([img, class_name])

        # Fixed-seed shuffle with a private RNG: reproducible split without reseeding the global `random` module
        random.Random(87).shuffle(self.samples)

        split_index = int(len(self.samples) * split_ratio)
        if dataset_type == 'Train':
            self.images = self.samples[:split_index]
        elif dataset_type == 'Test':
            self.images = self.samples[split_index:]
        else:
            self.images = self.samples

    def __len__(self) -> int:
        """Returns the number of images in this split."""
        return len(self.images)

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        """
        Args:
            index (int): Index of the image to retrieve.

        Returns:
            Tuple[torch.Tensor, int]: The transformed image tensor and its integer class label.
                For the unlabeled data the label is the index of "unlabeled" (0) and carries no information.
        """
        file_name, class_name = self.images[index]
        image_path = os.path.join(self.image_dir, class_name, file_name)

        # Force 3-channel RGB so the 3-channel Normalize also works for grayscale, RGBA or palette files
        image = Image.open(image_path).convert("RGB")
        if self.transform:
            transformed = self.transform(image)
        else:
            transformed = transform_test(image)

        class_id = self.allowed_classes.index(class_name)
        return transformed, class_id
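The split in `Birddataset` works because the 'Train' and 'Test' instances rebuild the same sample list and shuffle it with the same seed. They then take complementary slices at `int(len(samples) * split_ratio)`. The pure-Python sketch below reproduces that logic on hypothetical file names. It shows that the two slices are disjoint and together cover every sample. `split_samples` is an illustrative helper, not part of the notebook.

```python
import random


def split_samples(samples, dataset_type, ratio=0.6, seed=87):
    """Re-creates the Birddataset split: seeded shuffle, then a cut at int(len * ratio)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    if dataset_type == "Train":
        return shuffled[:cut]
    if dataset_type == "Test":
        return shuffled[cut:]
    return shuffled


# Hypothetical [file_name, class_name] pairs: 5 budgies and 5 canaries
samples = [[f"img_{i}.png", cls] for cls in ("budgie", "canary") for i in range(5)]
train = split_samples(samples, "Train")
test = split_samples(samples, "Test")
print(len(train), len(test))  # 6 4
```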




In [29]:
# Integer labels follow the order of this list
labeled_classes = ["budgie", "canary", "duckling", "rubber duck"]

train_dataset = Birddataset(
    image_dir="./Noisy_birds",
    allowed_classes=labeled_classes,
    transform=transform,
    dataset_type='Train',
)

val_dataset = Birddataset(
    image_dir="./Noisy_birds",
    allowed_classes=labeled_classes,
    transform=transform_test,
    dataset_type='Test',
)

# No transform is given, so Birddataset falls back to transform_test; the labels are placeholders
unlabeled_dataset = Birddataset(
    image_dir="./Noisy_birds",
    allowed_classes=["unlabeled"],
)
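Because `Birddataset` shuffles all labeled images together before cutting, the per-class proportions in `train_dataset` and `val_dataset` can drift away from 60/40. That drift matters more on a small dataset. A stratified split applies the same ratio within each class instead. This is a swapped-in alternative, not what `Birddataset` does. The pure-Python sketch below runs on hypothetical `[file_name, class_name]` pairs, and `stratified_split` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_split(samples, split_ratio=0.6, seed=87):
    """Splits [file_name, class_name] pairs so that each class is divided at `split_ratio`."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)
    train, val = [], []
    for class_name in sorted(by_class):
        items = sorted(by_class[class_name])  # deterministic order before shuffling
        rng.shuffle(items)
        cut = int(len(items) * split_ratio)
        train.extend(items[:cut])
        val.extend(items[cut:])
    return train, val


demo_samples = [[f"b_{i}.png", "budgie"] for i in range(10)] + [[f"c_{i}.png", "canary"] for i in range(5)]
strat_train, strat_val = stratified_split(demo_samples)
print(sum(s[1] == "budgie" for s in strat_train), sum(s[1] == "canary" for s in strat_train))  # 6 3
```

Because `__getitem__` reads from `self.images`, one way to plug this in is to split `train_dataset.samples` (which holds every labeled sample) and assign the two lists to `train_dataset.images` and `val_dataset.images`.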

In [30]:
batch_size = 128
num_workers = 2  # Increase if you have more CPU cores available

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)
unlabeled_loader = DataLoader(unlabeled_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)
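`unlabeled_loader` yields images with placeholder labels. Confidence-thresholded pseudo-labeling is one common way to put them to use. It is sketched here as an assumption, not as this notebook's method. A trained model predicts each unlabeled image, and only predictions whose top softmax probability reaches a threshold are kept as new labels. `pseudo_label` and `FixedLogits` are hypothetical names, and `threshold=0.9` is an arbitrary example value.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def pseudo_label(model: nn.Module, loader: DataLoader, threshold: float = 0.9, device: str = "cpu"):
    """Returns (images, labels) for samples whose top softmax probability is >= threshold.

    The loader's own labels are ignored, since they are placeholders for unlabeled data.
    """
    model.eval()
    kept_images, kept_labels = [], []
    for inputs, _ in loader:
        probs = torch.softmax(model(inputs.to(device)), dim=1)
        confidence, predicted = probs.max(dim=1)
        mask = confidence >= threshold
        kept_images.append(inputs[mask.cpu()])
        kept_labels.append(predicted[mask].cpu())
    if not kept_images:
        return torch.empty(0), torch.empty(0, dtype=torch.long)
    return torch.cat(kept_images), torch.cat(kept_labels)


class FixedLogits(nn.Module):
    """Hypothetical stand-in model: treats the first 4 input features as logits."""

    def forward(self, x):
        return x[:, :4]


x = torch.tensor([[5.0, 0.0, 0.0, 0.0],   # confident: class 0 (p ~ 0.98)
                  [0.0, 0.0, 0.1, 0.0],   # uncertain (p ~ 0.27), dropped
                  [0.0, 0.0, 0.0, 6.0]])  # confident: class 3 (p ~ 0.99)
demo_loader = DataLoader(TensorDataset(x, torch.zeros(3, dtype=torch.long)), batch_size=2)
pl_images, pl_labels = pseudo_label(FixedLogits(), demo_loader, threshold=0.9)
print(pl_labels.tolist())  # [0, 3]
```

In the notebook, the call would be `pseudo_label(trained_model, unlabeled_loader, device=device)`. The kept pairs could then serve as extra training data, for example through a `TensorDataset`. Two caveats apply. First, `unlabeled_dataset` uses `transform_test`, so the returned tensors are already normalized and will not be re-augmented. Second, an overconfident model can produce confidently wrong labels, so the threshold is worth tuning on the validation split.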

In [31]:
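# Updated training transform with data augmentation.
# `train_dataset` stored the previous `transform` object when it was created, so
# rebinding the name alone would not change it; the new pipeline is assigned explicitly below.
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.RandomResizedCrop(128, scale=(0.8, 1.0)),
    # hue=0.2 is a strong hue shift; color is likely a useful cue for these classes
    # (e.g. yellow canaries vs. budgies), so a smaller value may be safer
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
train_dataset.transform = transform  # train_loader reads from this same dataset object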
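Reassigning `transform` in a later cell only rebinds the global name. `train_dataset` still holds the object it received at construction, so the augmentation takes effect only once the new pipeline is assigned to `train_dataset.transform`. The Python behaviour is shown below with a hypothetical stand-in class. Strings stand in for the transform pipelines.

```python
class HoldsTransform:
    """Hypothetical stand-in for Birddataset: stores whatever transform it is given."""

    def __init__(self, transform):
        self.transform = transform


pipeline = "plain"
holder = HoldsTransform(pipeline)

pipeline = "augmented"        # rebinds the name only
print(holder.transform)       # plain

holder.transform = pipeline   # explicit update of the stored attribute
print(holder.transform)       # augmented
```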


## CNN

Define a CNN model. It should be small enough to need less than 2 GB of VRAM (GPU memory) when training with a batch size of 128.
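One way to check the 2 GB budget is to run a single training step at batch size 128 and read PyTorch's peak-allocation counter. This sketch assumes 3×128×128 inputs (the crop size used in the augmentation cell) and a CUDA device. `peak_training_memory_mb` is a hypothetical helper. `torch.cuda.max_memory_allocated` counts only tensors allocated by PyTorch, not the CUDA context or cached-but-free blocks, so `nvidia-smi` will report more.

```python
import torch
import torch.nn as nn


def peak_training_memory_mb(model: nn.Module, batch_size: int = 128, image_size: int = 128,
                            num_classes: int = 4, device: str = "cuda"):
    """Runs one forward/backward/Adam step and returns peak allocated GPU memory in MiB.

    On non-CUDA devices the step still runs, but nothing is measured and None is returned.
    """
    device = torch.device(device)
    model = model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters())
    inputs = torch.randn(batch_size, 3, image_size, image_size, device=device)
    labels = torch.randint(0, num_classes, (batch_size,), device=device)

    if device.type == "cuda":
        torch.cuda.reset_peak_memory_stats(device)  # peak restarts from current usage (model + batch)

    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()  # Adam allocates its moment buffers here

    if device.type == "cuda":
        torch.cuda.synchronize(device)
        return torch.cuda.max_memory_allocated(device) / (1024 ** 2)
    return None


# CPU smoke test with a tiny hypothetical network
tiny = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))
print(peak_training_memory_mb(tiny, batch_size=4, image_size=16, device="cpu"))  # None
```

On a GPU, a call such as `peak_training_memory_mb(get_model('resnet18'))` would report the figure to compare against 2048 MiB.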

In [None]:
import copy

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import f1_score, roc_auc_score
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchvision import models

# Assumes train_loader, val_loader and device are defined in the cells above

NUM_CLASSES = 4
MAX_MODEL_SIZE_MB = 70  # size limit from the requirements


def _imagenet_weights(weights_enum, pretrained: bool):
    # IMAGENET1K_V1 is what the deprecated `pretrained=True` flag used to load
    return weights_enum.IMAGENET1K_V1 if pretrained else None


# Function to create different models with a new NUM_CLASSES-way output layer.
# pretrained=False (the default) builds the architecture only, which is all
# `Model` needs before loading a saved state dict.
def get_model(model_name: str, pretrained: bool = False) -> nn.Module:
    if model_name == 'densenet121':
        base_model = models.densenet121(weights=_imagenet_weights(models.DenseNet121_Weights, pretrained))
        base_model.classifier = nn.Linear(base_model.classifier.in_features, NUM_CLASSES)
    elif model_name == 'resnet50':
        base_model = models.resnet50(weights=_imagenet_weights(models.ResNet50_Weights, pretrained))
        base_model.fc = nn.Linear(base_model.fc.in_features, NUM_CLASSES)
    elif model_name == 'efficientnet_b0':
        base_model = models.efficientnet_b0(weights=_imagenet_weights(models.EfficientNet_B0_Weights, pretrained))
        base_model.classifier[1] = nn.Linear(base_model.classifier[1].in_features, NUM_CLASSES)
    elif model_name == 'mobilenet_v3_large':
        base_model = models.mobilenet_v3_large(weights=_imagenet_weights(models.MobileNet_V3_Large_Weights, pretrained))
        base_model.classifier[3] = nn.Linear(base_model.classifier[3].in_features, NUM_CLASSES)
    elif model_name == 'resnet18':
        base_model = models.resnet18(weights=_imagenet_weights(models.ResNet18_Weights, pretrained))
        base_model.fc = nn.Linear(base_model.fc.in_features, NUM_CLASSES)
    else:
        raise ValueError(f"Unknown model name: {model_name}")
    return base_model


# Size of the weights (parameters + buffers) in MiB, roughly the size of the saved state dict
def model_size_mb(model: nn.Module) -> float:
    return sum(t.numel() * t.element_size() for t in model.state_dict().values()) / (1024 ** 2)


# Training function; keeps the weights from the epoch with the lowest validation loss
def train_model(model, train_loader, val_loader, epochs=30, lr=0.001):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    # `verbose` is deprecated in recent PyTorch releases, so it is not passed
    scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5)

    best_val_loss = float('inf')
    best_model_state = None

    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        val_loss = 0.0
        model.eval()
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                val_loss += criterion(outputs, labels).item()

        val_loss /= len(val_loader)
        scheduler.step(val_loss)

        print(f"Epoch {epoch+1}/{epochs}, Training Loss: {running_loss/len(train_loader):.4f}, Validation Loss: {val_loss:.4f}")

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            # state_dict() returns references to the live parameters, which later optimizer
            # steps overwrite in place; deep-copy so this epoch's weights are really kept
            best_model_state = copy.deepcopy(model.state_dict())

    model.load_state_dict(best_model_state)
    return model, best_val_loss


# Evaluation function with macro F1, accuracy and one-vs-rest AUC
def evaluate_model(model, val_loader):
    model.eval()
    all_preds, all_labels, all_probs = [], [], []
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs.to(device))
            probs = torch.softmax(outputs, dim=1)  # class probabilities for AUC
            preds = outputs.argmax(dim=1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            all_probs.extend(probs.cpu().numpy())

    f1 = f1_score(all_labels, all_preds, average='macro')
    accuracy = np.mean(np.array(all_preds) == np.array(all_labels))

    try:
        auc = roc_auc_score(all_labels, all_probs, multi_class='ovr')
    except ValueError:
        auc = float('nan')  # e.g. a class missing from the validation labels

    print(f"F1 Score: {f1:.4f}, Accuracy: {accuracy:.4f}, AUC: {auc:.4f}")
    print("Predictions:", all_preds)
    print("True Labels:", all_labels)
    return f1, accuracy, auc


# List of models to try
model_names = ['densenet121', 'resnet50', 'efficientnet_b0', 'mobilenet_v3_large', 'resnet18']

# Dictionary to store results
results = {}

# Train and evaluate each model, starting from ImageNet weights
for model_name in model_names:
    print(f"\nTraining {model_name}...")
    model = get_model(model_name, pretrained=True).to(device)
    trained_model, val_loss = train_model(model, train_loader, val_loader, epochs=30)
    f1, acc, auc = evaluate_model(trained_model, val_loader)
    results[model_name] = {'f1': f1, 'acc': acc, 'auc': auc, 'val_loss': val_loss,
                           'size_mb': model_size_mb(trained_model)}
    # Save CPU copies of the weights so `Model` can load them on a machine without a GPU
    torch.save({k: v.cpu() for k, v in trained_model.state_dict().items()}, f"{model_name}_model.pth")

# Choose the best model by F1 score among those within the size limit
# (resnet50, for example, is roughly 90 MiB in float32 even with a 4-class head).
# The same validation split drives early stopping and this choice, so the scores are optimistic.
eligible = {k: v for k, v in results.items() if v['size_mb'] <= MAX_MODEL_SIZE_MB}
best_model_name = max(eligible, key=lambda k: eligible[k]['f1'])
best = results[best_model_name]
print(f"\nBest model: {best_model_name} with F1 Score: {best['f1']:.4f}, Accuracy: {best['acc']:.4f}, AUC: {best['auc']:.4f}")

# Define the final Model class with the best model
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.model = get_model(best_model_name)
        self.model.load_state_dict(torch.load(f"{best_model_name}_model.pth"))

    def forward(self, x):
        return self.model(x)

# Save the best model
torch.save(Model().cpu().state_dict(), "best_model.pth")


Training densenet121...




Epoch 1/30, Training Loss: 1.4647, Validation Loss: 1.4107
Epoch 2/30, Training Loss: 0.2483, Validation Loss: 1.3515
Epoch 3/30, Training Loss: 0.0540, Validation Loss: 1.3133
Epoch 4/30, Training Loss: 0.0200, Validation Loss: 1.2333
Epoch 5/30, Training Loss: 0.0103, Validation Loss: 1.1474
Epoch 6/30, Training Loss: 0.0060, Validation Loss: 1.0629
Epoch 7/30, Training Loss: 0.0039, Validation Loss: 0.9865
Epoch 8/30, Training Loss: 0.0027, Validation Loss: 0.9280
Epoch 9/30, Training Loss: 0.0019, Validation Loss: 0.8879
Epoch 10/30, Training Loss: 0.0014, Validation Loss: 0.8667
Epoch 11/30, Training Loss: 0.0011, Validation Loss: 0.8588
Epoch 12/30, Training Loss: 0.0009, Validation Loss: 0.8641
Epoch 13/30, Training Loss: 0.0007, Validation Loss: 0.8770
Epoch 14/30, Training Loss: 0.0006, Validation Loss: 0.8940
Epoch 15/30, Training Loss: 0.0005, Validation Loss: 0.9130
Epoch 16/30, Training Loss: 0.0005, Validation Loss: 0.9332
Epoch 17/30, Training Loss: 0.0004, Validation Lo



Epoch 1/30, Training Loss: 1.5784, Validation Loss: 2.7072
Epoch 2/30, Training Loss: 0.6803, Validation Loss: 1.7645
Epoch 3/30, Training Loss: 0.1112, Validation Loss: 2.1827
Epoch 4/30, Training Loss: 0.0097, Validation Loss: 3.2508
Epoch 5/30, Training Loss: 0.0025, Validation Loss: 4.7228
Epoch 6/30, Training Loss: 0.0012, Validation Loss: 6.0738
Epoch 7/30, Training Loss: 0.0009, Validation Loss: 7.1014
Epoch 8/30, Training Loss: 0.0007, Validation Loss: 7.6567
Epoch 9/30, Training Loss: 0.0005, Validation Loss: 6.7316
Epoch 10/30, Training Loss: 0.0005, Validation Loss: 5.8794
Epoch 11/30, Training Loss: 0.0005, Validation Loss: 5.1335
Epoch 12/30, Training Loss: 0.0004, Validation Loss: 4.4860
Epoch 13/30, Training Loss: 0.0004, Validation Loss: 3.9496
Epoch 14/30, Training Loss: 0.0004, Validation Loss: 3.5022
Epoch 15/30, Training Loss: 0.0003, Validation Loss: 3.1088
Epoch 16/30, Training Loss: 0.0003, Validation Loss: 2.7899
Epoch 17/30, Training Loss: 0.0003, Validation Lo
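In the densenet121 log above, the validation loss is lowest at epoch 11 (0.8588) and climbs afterwards. This pattern is exactly where keeping the best checkpoint matters. Storing `model.state_dict()` directly does not snapshot anything. It returns tensors that share storage with the live parameters, so later optimizer steps change the "best" state too, and `load_state_dict` at the end simply restores the last epoch. The minimal sketch below shows the difference on a hypothetical one-layer model.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
demo_model = nn.Linear(2, 1)
demo_optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.1)

aliased = demo_model.state_dict()                   # shares storage with the parameters
snapshot = copy.deepcopy(demo_model.state_dict())   # independent copy
before = demo_model.weight.detach().clone()

# One optimizer step updates the parameters in place (the gradient of the sum is non-zero)
loss = demo_model(torch.ones(1, 2)).sum()
loss.backward()
demo_optimizer.step()

print(torch.equal(aliased["weight"], demo_model.weight))  # True: the aliased "snapshot" moved too
print(torch.equal(snapshot["weight"], before))            # True: the deep copy kept the old values
```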