<a href="https://colab.research.google.com/github/Mechanics-Mechatronics-and-Robotics/CV-2025/blob/main/Assignment_01/UQ_CIFAR-10N_Ensembling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Uncertainty Quantification with CIFAR-10N and Ensembling
By *First name* *Second name*.

*Month, Day, 2025.*

## Problem Statement

Re-annotated versions of the CIFAR-10 and CIFAR-100 data which contains real-world human annotation errors. We show how these noise patterns deviate from the classically assumed ones and what the new challenges are. The website of CIFAR-N is available at [cifar-10-100n
](https://github.com/UCSC-REAL/cifar-10-100n/tree/main) project.

# Preparation of simulation models

## Import and Install Libraries

In [1]:
# !pip install pytorch-lightning clearml
# !pip install nbconvert

In [2]:
#Pytorch modules
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split, TensorDataset
from torchvision.datasets import CIFAR10
from torchvision import datasets, transforms, models
#scipy
from scipy.stats import mode
#sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
#Numpy
import numpy as np
#Pandas
import pandas as pd
#Lightning & logging
import pytorch_lightning as pl
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
#Data observation
import os
import sys
import pickle
import requests
from pathlib import Path
#Plotting
import matplotlib.pyplot as plt
import seaborn as sns
#Logging
from clearml import Task

  Referenced from: <FB2FD416-6C4D-3621-B677-61F07C02A3C5> /Users/damirnurtdinov/miniconda3/envs/py39/lib/python3.9/site-packages/torchvision/image.so
  warn(
  from .autonotebook import tqdm as notebook_tqdm


## Set the Models

### Simulation Settings

Check the current directory

In [3]:
os.getcwd() #returns the current working directory

'/Users/damirnurtdinov/Desktop/My Courses/Classical CV/assignments/Assignment_01'

In [4]:
# Path to the folder where the pretrained models are saved
CHECKPOINT_PATH = os.environ.get("PATH_CHECKPOINT", "saved_models/")
print(f'CHECKPOINT_PATH: {CHECKPOINT_PATH}')

os.makedirs(CHECKPOINT_PATH, exist_ok=True)

CHECKPOINT_PATH: saved_models/


Set the reproducibility options

In [5]:
# Function for setting the seed to implement parallel tests
SEEDS = [42, 0, 17, 9, 3, 16, 2]
SEED = 42 # random seed by default
pl.seed_everything(SEED)

# Determine the device (GPU if available, otherwise CPU)
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device('mps') # I have M1 chip

# Prioritizes speed but may reduce precision
torch.set_float32_matmul_precision('high')

# # Ensure that all operations are deterministic on GPU (if used) for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True)

torch.manual_seed(SEED)
np.random.seed(SEED)

Seed set to 42


### Logging

To configure ClearML in your Colab environment, follow these steps:

---

*Step 1: Create a ClearML Account*
1. Go to the [ClearML website](https://clear.ml/).
2. Sign up for a free account if you don’t already have one.
3. Once registered, log in to your ClearML account.

---

*Step 2: Get Your ClearML Credentials*
1. After logging in, navigate to the **Settings** page (click on your profile icon in the top-right corner and select **Settings**).
2. Under the **Workspace** section, find your **+ Create new credentials**.
3. Copy these credentials for a Jupiter notebook into the code cell below.

---

*Step 3: Accessing the ClearML Dashboard*
1. Go to your ClearML dashboard (https://app.clear.ml).
2. Navigate to the **Projects** section to see your experiments.
3. Click on the experiment (e.g., `Lab_1`) to view detailed metrics, logs, and artifacts.

---

In [6]:
#Enter your code here to implement Step 2 of the logging instruction as it is shown below
%env CLEARML_WEB_HOST=https://app.clear.ml/
%env CLEARML_API_HOST=https://api.clear.ml
%env CLEARML_FILES_HOST=https://files.clear.ml
%env CLEARML_API_ACCESS_KEY=TE4F21KJIPICL6GUUCTSQ7K1OC3C48
%env CLEARML_API_SECRET_KEY=RqUK21J4VhHSpRyf5MTF6LBMnj6-C1V6pE-vfJFvZ7QFuGtz_MbojHgorRIEmrJHawA

env: CLEARML_WEB_HOST=https://app.clear.ml/
env: CLEARML_API_HOST=https://api.clear.ml
env: CLEARML_FILES_HOST=https://files.clear.ml
env: CLEARML_API_ACCESS_KEY=TE4F21KJIPICL6GUUCTSQ7K1OC3C48
env: CLEARML_API_SECRET_KEY=RqUK21J4VhHSpRyf5MTF6LBMnj6-C1V6pE-vfJFvZ7QFuGtz_MbojHgorRIEmrJHawA


### Dataset

Summary

In [7]:
DATASET = 'CIFAR10N' # dataset with the real-world noise
# Can be 'clean_label', 'worse_label', 'aggre_label', 'random_label1', 'random_label2', 'random_label3'
NOISE_TYPE = 'worse_label'

NS = {
    'train': 45000,
    'val': 5000,
    'test': 10000
} # for MNIST

SIZE = 32 #image size
NUM_CLASSES = 10
CLASS_NAMES = ['plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

Normalization parameters

In [8]:
#For the MNIST dataset
MEAN = np.array([0.491,0.482,0.447])
STD  = np.array([0.247,0.243,0.261])

Transforms

### Collect parameters

In [9]:
#Model parameters
LOSS_FUN = 'N' # 'CE','CELoss'(custom), 'N', 'B', etc.
ARCHITECTURE = 'CNN' # 'CNN, 'ResNet50', 'ViT', etc.

#Collect the parameters (hyperparams and others)
hparams = {
    "seed": SEED,
    "lr": 0.001,
    'weight_decay': 0.0,
    "dropout": 0.0,
    "bs": 128,
    "num_workers": 0, #set 2 in Colab, or 0 in InnoDataHub
    "num_epochs": 20,
    "criterion": LOSS_FUN,
    "architecture": ARCHITECTURE,
    "num_samples": NS,
    "im_size": SIZE,
    "mean": np.array([0.4914, 0.4822, 0.4465]),
    "std": np.array([0.2470, 0.2435, 0.2616]),
    'randResCrop': {'size': (SIZE, SIZE), 'scale': (0.8, 1.0), 'ratio': (0.9, 1.1)},
    "n_classes": NUM_CLASSES,
    "noise_path": './data/CIFAR-10_human.pt',
    "noise_type": NOISE_TYPE  # Can be 'clean_label', 'worse_label', 'aggre_label', etc.
}

#Visualization
vis_params = {
    'fig_size': 5,
    'num_samples': 5,
    'num_bins': 50,
}

## Functions

### Lightning

Data module

In [10]:
def download_file(url, save_path):
    """Download a file from a URL and save it to the specified path."""
    response = requests.get(url, stream=True)
    if response.status_code == 200:
        os.makedirs(os.path.dirname(save_path), exist_ok=True)  # Ensure directory exists
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"File downloaded and saved to {save_path}")
    else:
        raise Exception(f"Failed to download file from {url}. Status code: {response.status_code}")

In [11]:
class CIFAR10(datasets.CIFAR10):
    """CIFAR10 dataset with noisy labels."""
    def __init__(self, root, train=True, transform=None, target_transform=None,
                 download=False, noise_type=None, noise_path=None, is_human=True):
        super().__init__(root, train=train, transform=transform,
                         target_transform=target_transform, download=download)
        self.noise_type = noise_type
        self.noise_path = noise_path
        self.is_human = is_human

        if self.train and self.noise_type is not None:
            self.load_noisy_labels()

    def load_noisy_labels(self):
        noise_file = torch.load(self.noise_path)
        if isinstance(noise_file, dict):
            if "clean_label" in noise_file.keys():
                clean_label = torch.tensor(noise_file['clean_label'])
                assert torch.sum(torch.tensor(self.targets) - clean_label) == 0
                print(f'Loaded {self.noise_type} from {self.noise_path}.')
                print(f'The overall noise rate is {1 - np.mean(clean_label.numpy() == noise_file[self.noise_type])}')
            self.noisy_labels = noise_file[self.noise_type].reshape(-1)
        else:
            raise Exception('Input Error')

    def __getitem__(self, index):
        img, target = super().__getitem__(index)
        if self.train and self.noise_type is not None:
            target = self.noisy_labels[index]
        return img, target, index

In [12]:
class CIFAR10DataModule(pl.LightningDataModule):
    def __init__(self, params):
        super().__init__()
        self.seed = params['seed']
        self.batch_size = params['bs']
        self.num_workers = params['num_workers']
        self.mean = params['mean']
        self.std = params['std']
        self.ns = params['num_samples']
        self.rand_res_crop = params['randResCrop']
        self.noise_path = params.get('noise_path', './data/CIFAR-10_human.pt')
        self.noise_type = params.get('noise_type', 'worse_label')  # Default to 'worse_label'

        # Ensure the data directory exists
        os.makedirs(os.path.dirname(self.noise_path), exist_ok=True)

        # Download the CIFAR-10_human.pt file if it doesn't exist
        if not os.path.exists(self.noise_path):
            print(f"Downloading CIFAR-10_human.pt from GitHub...")
            download_file(
                url="https://github.com/UCSC-REAL/cifar-10-100n/raw/main/data/CIFAR-10_human.pt",
                save_path=self.noise_path
            )

        self.transform = transforms.Compose([
            transforms.RandomResizedCrop(size=self.rand_res_crop['size'],
                                         scale=self.rand_res_crop['scale'],
                                         ratio=self.rand_res_crop['ratio']),
            transforms.ToTensor(),
            transforms.Normalize(self.mean, self.std)
        ])

    def prepare_data(self):
        # Download CIFAR-10 dataset
        datasets.CIFAR10(root='./data', train=True, download=True)
        datasets.CIFAR10(root='./data', train=False, download=True)

    def setup(self, stage=None):
        # Load noisy labels
        noise_file = torch.load(self.noise_path)
        clean_label = noise_file['clean_label']
        noisy_label = noise_file[self.noise_type]

        # Split dataset into train and validation sets
        cifar10_full = CIFAR10(root='./data', train=True, transform=self.transform,
                               noise_type=self.noise_type, noise_path=self.noise_path, is_human=True)
        pl.seed_everything(self.seed)
        self.cifar10_train, self.cifar10_val = random_split(cifar10_full,
                                                            [self.ns['train'],
                                                             self.ns['val']])
        self.cifar10_test = CIFAR10(root='./data', train=False, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(self.cifar10_train, batch_size=self.batch_size,
                          num_workers=self.num_workers, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.cifar10_val, batch_size=self.batch_size,
                          num_workers=self.num_workers)

    def test_dataloader(self):
        return DataLoader(self.cifar10_test, batch_size=self.batch_size,
                          shuffle=False)

Training module

In [13]:
class train_model(pl.LightningModule):
    def __init__(self, model=None, loss=None, hparams=hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.model = model
        self.loss_fn = loss
        self.nc = hparams['n_classes']
        self.lr = hparams['lr']
        self.wd = hparams['weight_decay']

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y, _ = batch  # Unpack batch (ignore indices for now)
        logits = self(x)
        loss = self.loss_fn(logits, y)

        # Log training loss and accuracy
        # preds = torch.argmax(logits[:, :self.nc], dim=1)
        # acc = (preds == y).float().mean()
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
        # self.log('train_acc', acc, on_step=True, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y, _ = batch  # Unpack batch (ignore indices for now)
        logits = self(x)
        loss = self.loss_fn(logits, y)

        # Log validation loss and accuracy
        # preds = torch.argmax(logits[:, :self.nc], dim=1)
        # acc = (preds == y).float().mean()
        self.log('val_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
        # self.log('val_acc', acc, on_step=True, on_epoch=True, prog_bar=True)
        return loss

    def test_step(self, batch, batch_idx):
        x, y, _ = batch  # Unpack batch (ignore indices for now)
        logits = self(x)
        loss = self.loss_fn(logits, y)

        # Log test loss and accuracy
        preds = torch.argmax(logits[:, :self.nc], dim=1)
        acc = (preds == y).float().mean()
        self.log('test_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
        self.log('test_acc', acc, on_step=True, on_epoch=True, prog_bar=True)
        return {'loss': loss, 'preds': preds, 'y': y}

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.lr, weight_decay=self.wd)

        # Optionally, add a learning rate scheduler
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=1.0)
        return [optimizer], [scheduler]

### Models

CNN from paper by [Xia](https://arxiv.org/abs/2106.00445)

In [14]:
# #Copy the code from the paper
# class CNN(nn.Module):
#     def __init__(self, input_size, n_outputs=10):
#         super(CNN, self).__init__()
        
#         # First block
#         self.conv1 = nn.Conv2d(3, 128, kernel_size=3, padding=1)
#         self.conv2 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
#         self.conv3 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
#         self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
#         self.dropout1 = nn.Dropout(0.25)
        
#         # Second block
#         self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
#         self.conv5 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
#         self.conv6 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
#         self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
#         self.dropout2 = nn.Dropout(0.25)
        
#         # Third block
#         self.conv7 = nn.Conv2d(256, 512, kernel_size=3, padding=1)
#         self.conv8 = nn.Conv2d(512, 256, kernel_size=3, padding=1)
#         self.conv9 = nn.Conv2d(256, 128, kernel_size=3, padding=1)
#         self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        
#         # Fully connected layer
#         self.fc = nn.Linear(128, n_outputs)
        
#     def forward(self, x):
#         # First block
#         x = F.leaky_relu(self.conv1(x))
#         x = F.leaky_relu(self.conv2(x))
#         x = F.leaky_relu(self.conv3(x))
#         x = self.pool1(x)
#         x = self.dropout1(x)
        
#         # Second block
#         x = F.leaky_relu(self.conv4(x))
#         x = F.leaky_relu(self.conv5(x))
#         x = F.leaky_relu(self.conv6(x))
#         x = self.pool2(x)
#         x = self.dropout2(x)
        
#         # Third block
#         x = F.leaky_relu(self.conv7(x))
#         x = F.leaky_relu(self.conv8(x))
#         x = F.leaky_relu(self.conv9(x))
#         x = self.avgpool(x)
#         x = x.view(x.size(0), -1)
        
#         # Fully connected layer
#         x = self.fc(x)
#         return x

def call_bn(bn, x):
    return bn(x)

class CNN(nn.Module):
    def __init__(self, input_channel=3, n_outputs=10, dropout_rate=0.25, top_bn=False):
        self.dropout_rate = dropout_rate
        self.top_bn = top_bn
        super(CNN, self).__init__()
        self.c1=nn.Conv2d(input_channel,128,kernel_size=3,stride=1, padding=1)
        self.c2=nn.Conv2d(128,128,kernel_size=3,stride=1, padding=1)
        self.c3=nn.Conv2d(128,128,kernel_size=3,stride=1, padding=1)
        self.c4=nn.Conv2d(128,256,kernel_size=3,stride=1, padding=1)
        self.c5=nn.Conv2d(256,256,kernel_size=3,stride=1, padding=1)
        self.c6=nn.Conv2d(256,256,kernel_size=3,stride=1, padding=1)
        self.c7=nn.Conv2d(256,512,kernel_size=3,stride=1, padding=0)
        self.c8=nn.Conv2d(512,256,kernel_size=3,stride=1, padding=0)
        self.c9=nn.Conv2d(256,128,kernel_size=3,stride=1, padding=0)
        self.l_c1=nn.Linear(128,n_outputs)
        self.bn1=nn.BatchNorm2d(128)
        self.bn2=nn.BatchNorm2d(128)
        self.bn3=nn.BatchNorm2d(128)
        self.bn4=nn.BatchNorm2d(256)
        self.bn5=nn.BatchNorm2d(256)
        self.bn6=nn.BatchNorm2d(256)
        self.bn7=nn.BatchNorm2d(512)
        self.bn8=nn.BatchNorm2d(256)
        self.bn9=nn.BatchNorm2d(128)

    def forward(self, x,):
        h=x
        h=self.c1(h)
        h=F.leaky_relu(call_bn(self.bn1, h), negative_slope=0.01)
        h=self.c2(h)
        h=F.leaky_relu(call_bn(self.bn2, h), negative_slope=0.01)
        h=self.c3(h)
        h=F.leaky_relu(call_bn(self.bn3, h), negative_slope=0.01)
        h=F.max_pool2d(h, kernel_size=2, stride=2)
        h=F.dropout2d(h, p=self.dropout_rate)

        h=self.c4(h)
        h=F.leaky_relu(call_bn(self.bn4, h), negative_slope=0.01)
        h=self.c5(h)
        h=F.leaky_relu(call_bn(self.bn5, h), negative_slope=0.01)
        h=self.c6(h)
        h=F.leaky_relu(call_bn(self.bn6, h), negative_slope=0.01)
        h=F.max_pool2d(h, kernel_size=2, stride=2)
        h=F.dropout2d(h, p=self.dropout_rate)

        h=self.c7(h)
        h=F.leaky_relu(call_bn(self.bn7, h), negative_slope=0.01)
        h=self.c8(h)
        h=F.leaky_relu(call_bn(self.bn8, h), negative_slope=0.01)
        h=self.c9(h)
        h=F.leaky_relu(call_bn(self.bn9, h), negative_slope=0.01)
        h=F.avg_pool2d(h, kernel_size=h.data.shape[2])

        h = h.view(h.size(0), h.size(1))
        logit=self.l_c1(h)
        if self.top_bn:
            logit=call_bn(self.bn_c1, logit)
        return logit

ResNet50

In [15]:
class ResNet50(nn.Module):
    def __init__(self, n_outputs, freeze=False):
        """
        Args:
            n_outputs (int): Number of output classes.
            freeze (bool): If True, freeze all layers except the head.
        """
        super(ResNet50, self).__init__()
        self.n_outputs = n_outputs
        self.freeze = freeze

        # Load the pre-trained ResNet50 model
        self.resnet50 = models.resnet50(pretrained=True)

        # Modify the final layer to match the number of outputs
        self.resnet50.fc = nn.Linear(self.resnet50.fc.in_features, n_outputs)

        # Freeze all layers except the head if freeze=True
        if self.freeze:
            self._freeze_layers()

    def _freeze_layers(self):
        """
        Freeze all layers except the head.
        """
        # Freeze all parameters in the model
        for param in self.resnet50.parameters():
            param.requires_grad = False

        # Unfreeze the final classification layer (head)
        for param in self.resnet50.fc.parameters():
            param.requires_grad = True

    def forward(self, x):
        return self.resnet50(x)

def count_trainable_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

In [16]:
# class Bottleneck(nn.Module):
#     expansion = 4

#     def __init__(self, in_channels, out_channels, stride=1, downsample=None):
#         super(Bottleneck, self).__init__()
#         self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
#         self.bn1 = nn.BatchNorm2d(out_channels)
#         self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
#         self.bn2 = nn.BatchNorm2d(out_channels)
#         self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, bias=False)
#         self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
#         self.relu = nn.ReLU(inplace=True)
#         self.downsample = downsample
#         self.stride = stride

#     def forward(self, x):
#         identity = x

#         out = self.conv1(x)
#         out = self.bn1(out)
#         out = self.relu(out)

#         out = self.conv2(out)
#         out = self.bn2(out)
#         out = self.relu(out)

#         out = self.conv3(out)
#         out = self.bn3(out)

#         if self.downsample is not None:
#             identity = self.downsample(x)

#         out += identity
#         out = self.relu(out)

#         return out

# class ResNet50(nn.Module):
#     def __init__(self, n_outputs):
#         super(ResNet50, self).__init__()
#         self.in_channels = 64

#         self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
#         self.bn1 = nn.BatchNorm2d(64)
#         self.relu = nn.ReLU(inplace=True)
#         self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

#         self.layer1 = self._make_layer(64, 3)
#         self.layer2 = self._make_layer(128, 4, stride=2)
#         self.layer3 = self._make_layer(256, 6, stride=2)
#         self.layer4 = self._make_layer(512, 3, stride=2)

#         self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
#         self.fc = nn.Linear(512 * Bottleneck.expansion, n_outputs)

#     def _make_layer(self, out_channels, blocks, stride=1):
#         downsample = None
#         if stride != 1 or self.in_channels != out_channels * Bottleneck.expansion:
#             downsample = nn.Sequential(
#                 nn.Conv2d(self.in_channels, out_channels * Bottleneck.expansion, kernel_size=1, stride=stride, bias=False),
#                 nn.BatchNorm2d(out_channels * Bottleneck.expansion),
#             )

#         layers = []
#         layers.append(Bottleneck(self.in_channels, out_channels, stride, downsample))
#         self.in_channels = out_channels * Bottleneck.expansion

#         for _ in range(1, blocks):
#             layers.append(Bottleneck(self.in_channels, out_channels))

#         return nn.Sequential(*layers)

#     def forward(self, x):
#         x = self.conv1(x)
#         x = self.bn1(x)
#         x = self.relu(x)
#         x = self.maxpool(x)

#         x = self.layer1(x)
#         x = self.layer2(x)
#         x = self.layer3(x)
#         x = self.layer4(x)

#         x = self.avgpool(x)
#         x = torch.flatten(x, 1)
#         x = self.fc(x)

#         return x

ViT

In [17]:
class ViT(nn.Module):
    def __init__(self, n_outputs, freeze=False):
        """
        Args:
            n_outputs (int): Number of output classes.
            freeze (bool): If True, freeze all layers except the head.
        """
        super(ViT, self).__init__()
        self.n_outputs = n_outputs
        self.freeze = freeze

        # Load the pre-trained ViT model
        self.vit = models.vit_b_16(pretrained=True)

        # Modify the final layer to match the number of outputs
        self.vit.heads.head = nn.Linear(self.vit.heads.head.in_features, n_outputs)

        # Freeze all layers except the head if freeze=True
        if self.freeze:
            self._freeze_layers()

    def _freeze_layers(self):
        """
        Freeze all layers except the head.
        """
        # Freeze all parameters in the model
        for param in self.vit.parameters():
            param.requires_grad = False

        # Unfreeze the head (final classification layer)
        for param in self.vit.heads.head.parameters():
            param.requires_grad = True

    def forward(self, x):
        return self.vit(x)

def count_trainable_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


In [18]:
# import torch.nn.functional as F

# class PatchEmbedding(nn.Module):
#     def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
#         super(PatchEmbedding, self).__init__()
#         self.img_size = img_size
#         self.patch_size = patch_size
#         self.n_patches = (img_size // patch_size) ** 2

#         self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

#     def forward(self, x):
#         x = self.proj(x)  # (B, E, H, W)
#         x = x.flatten(2)  # (B, E, N)
#         x = x.transpose(1, 2)  # (B, N, E)
#         return x

# class MultiHeadAttention(nn.Module):
#     def __init__(self, embed_dim, num_heads):
#         super(MultiHeadAttention, self).__init__()
#         self.embed_dim = embed_dim
#         self.num_heads = num_heads
#         self.head_dim = embed_dim // num_heads

#         self.qkv = nn.Linear(embed_dim, embed_dim * 3)
#         self.proj = nn.Linear(embed_dim, embed_dim)

#     def forward(self, x):
#         B, N, E = x.shape
#         qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
#         q, k, v = qkv[0], qkv[1], qkv[2]

#         attn = (q @ k.transpose(-2, -1)) * (self.head_dim ** -0.5)
#         attn = F.softmax(attn, dim=-1)

#         x = (attn @ v).transpose(1, 2).reshape(B, N, E)
#         x = self.proj(x)
#         return x

# class MLP(nn.Module):
#     def __init__(self, embed_dim, hidden_dim):
#         super(MLP, self).__init__()
#         self.fc1 = nn.Linear(embed_dim, hidden_dim)
#         self.fc2 = nn.Linear(hidden_dim, embed_dim)
#         self.act = nn.GELU()

#     def forward(self, x):
#         x = self.fc1(x)
#         x = self.act(x)
#         x = self.fc2(x)
#         return x

# class TransformerBlock(nn.Module):
#     def __init__(self, embed_dim, num_heads, hidden_dim):
#         super(TransformerBlock, self).__init__()
#         self.attn = MultiHeadAttention(embed_dim, num_heads)
#         self.mlp = MLP(embed_dim, hidden_dim)
#         self.norm1 = nn.LayerNorm(embed_dim)
#         self.norm2 = nn.LayerNorm(embed_dim)

#     def forward(self, x):
#         x = x + self.attn(self.norm1(x))
#         x = x + self.mlp(self.norm2(x))
#         return x

# class ViT(nn.Module):
#     def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768, num_heads=12, num_layers=12, hidden_dim=3072, n_outputs=10):
#         super(ViT, self).__init__()
#         self.patch_embed = PatchEmbedding(img_size, patch_size, in_channels, embed_dim)
#         self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
#         self.pos_embed = nn.Parameter(torch.zeros(1, self.patch_embed.n_patches + 1, embed_dim))
#         self.blocks = nn.ModuleList([TransformerBlock(embed_dim, num_heads, hidden_dim) for _ in range(num_layers)])
#         self.norm = nn.LayerNorm(embed_dim)
#         self.fc = nn.Linear(embed_dim, n_outputs)

#     def forward(self, x):
#         B = x.shape[0]
#         x = self.patch_embed(x)

#         cls_tokens = self.cls_token.expand(B, -1, -1)
#         x = torch.cat((cls_tokens, x), dim=1)

#         x = x + self.pos_embed

#         for block in self.blocks:
#             x = block(x)

#         x = self.norm(x)
#         x = x[:, 0]
#         x = self.fc(x)
#         return x

### Loss functions

Create a loss function class, or use a standart one.

In [19]:
# Cross entropy loss maden from scratch (just in case)
class CELoss(nn.Module):
    def __init__(self, reduction='mean'):
        super(CELoss, self).__init__()
        self.reduction = reduction

    def forward(self, x, y):
        # Compute softmax probabilities
        prob = nn.functional.softmax(x, 1)
        # Compute log probabilities
        log_prob = -1.0 * torch.log(prob)
        # Gather the log probabilities for the true labels
        loss = log_prob.gather(1, y.unsqueeze(1))
        # Apply reduction
        if self.reduction == 'mean':
            loss = loss.mean()
        elif self.reduction == 'sum':
            loss = loss.sum()
        elif self.reduction == 'none':
            loss = loss.squeeze()  # Remove extra dimension for consistency
        else:
            raise ValueError("Invalid reduction option.")

        return loss

In [20]:
class NLoss(nn.Module):
    def __init__(self, params=hparams):
        super(NLoss, self).__init__()
        self.smoothing =   params.get('label_smoothing', 0.0)
        self.num_classes = params.get('n_classes', 10)
        self.inv_smoothing = 1.0 - self.smoothing  # Probability for the correct class

    def forward(self, x, y):
        """
        x: Model output (logits + log variance)
            - x[:, :self.num_classes]: Logits for class probabilities (h)
            - x[:, self.num_classes:]: Logarithmic variance (s)
        y: Labels
        """
        # Split the model output into predictions (h) and log variance (s)
        logits = x[:, :self.num_classes]  # Predictions (h)
        log_var = x[:, self.num_classes:]  # Logarithmic variance (s)

        # Apply label smoothing to the one-hot encoded labels
        with torch.no_grad():
            yoh = torch.zeros_like(logits)
            yoh.fill_(self.smoothing / (self.num_classes - 1))
            yoh.scatter_(1, y.data.unsqueeze(1), self.inv_smoothing)

        # Compute the squared differences between predictions and smoothed labels
        # logits = torch.softmax(logits, dim=1)  # Convert logits to probabilities
        squared_diff = torch.pow(yoh - logits, 2)  # (y_k - h_k)^2

        # Compute the exponential of the negative log variance (e^{-s})
        # log_var = torch.clamp(log_var, min=-10, max=10)  # Clamp log_var to a reasonable range
        exp_neg_log_var = torch.exp(-log_var)

        # Compute the first term of the loss: e^{-s} * sum((y_k - h_k)^2)
        term1 = exp_neg_log_var * squared_diff.sum(dim=1)

        # Compute the second term of the loss: N * s
        term2 = self.num_classes * log_var

        # Combine the terms and compute the mean over the batch
        loss = (term1 + term2).mean()

        return loss

In [21]:
class BLoss(nn.Module):
    def __init__(self, params=hparams):
        super(BLoss, self).__init__()
        self.smoothing = params.get('label_smoothing', 0.0)
        self.num_classes = params.get('n_classes', 10)
        self.inv_smoothing = 1.0 - self.smoothing  # Probability for the correct class
        self.eps = 1e-10  # Small epsilon for numerical stability

    def forward(self, x, y):
        """
        x: Model output (logits + certainty)
            - x[:, :self.num_classes]: Logits for class probabilities
            - x[:, self.num_classes:]: Certainty values
        y: Ground truth labels (class indices)
        """
        # Ensure y is a tensor of class indices
        if y.dtype != torch.long:
            y = y.long()

        # Extract certainty and probabilities from the model output
        certainty = torch.sigmoid(x[:, self.num_classes:])  # Certainty values (batch_size, 1)
        logits = x[:, :self.num_classes]  # Logits for class probabilities (batch_size, num_classes)
        prob = F.softmax(logits, dim=1)  # Softmax probabilities (batch_size, num_classes)

        # Apply label smoothing to the one-hot encoded labels
        with torch.no_grad():
            yoh = torch.zeros_like(logits)
            yoh.fill_(self.smoothing / (self.num_classes - 1))
            yoh.scatter_(1, y.unsqueeze(1), self.inv_smoothing)

        # Compute cosine similarity between predictions and labels
        cos = nn.CosineSimilarity(dim=1)
        cosyh = cos(yoh, prob)  # Cosine similarity (batch_size,)

        # Compute the terms of the loss
        delta = yoh * prob  # Element-wise product of one-hot labels and probabilities
        entropy_term = delta * torch.log(delta + self.eps)  # Entropy term (avoid log(0))

        # Loss terms
        loss0 = -cosyh * torch.log(certainty.squeeze() / self.num_classes + self.eps)  # First term
        loss1 = -(self.num_classes - 1) * (1 - cosyh) * torch.log((1 - certainty.squeeze()) / self.num_classes + self.eps)  # Second term

        # Combine the terms and compute the mean over the batch
        loss = (entropy_term.sum(dim=1) + loss0 + loss1).mean()

        return loss

### Models zoo

Architectures and loss functions

In [22]:
def get_arch_and_loss(hparams):
    """
    Returns the architecture and loss function based on the provided hparams.

    Args:
        hparams (dict): Hyperparameters dictionary, including 'ARCHITECTURE' and 'criterion'.

    Returns:
        arch: The model architecture.
        loss: The loss function.
    """
    # Determine the number of outputs based on the loss function
    if hparams['criterion'] in ['B', 'N']:
        n_outputs = hparams['n_classes'] + 1  # Add 1 output neuron for BLoss or NLoss
    else:
        n_outputs = hparams['n_classes']  # Default number of outputs

    # Define the architectures
    architectures = {
        'CNN': CNN(n_outputs=n_outputs),
        'ResNet50': ResNet50(n_outputs=n_outputs, freeze=hparams.get('freeze', False)),
        'ViT': ViT(n_outputs=n_outputs, freeze=hparams.get('freeze', False)),
    }

    # Define the loss functions
    losses = {
        'CE': nn.CrossEntropyLoss(),
        'B': BLoss(),
        'N': NLoss(),
    }

    # Get the architecture and loss based on hparams
    arch = architectures.get(hparams['architecture'])
    loss = losses.get(hparams['criterion'])

    if arch is None:
        raise ValueError(f"Architecture '{hparams['ARCHITECTURE']}' is not supported.")
    if loss is None:
        raise ValueError(f"Loss function '{hparams['criterion']}' is not supported.")

    return arch, loss


### Metrics

In [23]:
def metrics(dataloader,model,hparams=hparams,loss_fn_red=None):
    # Collect images, predictions, and losses
    # images = []
    preds  = []
    labels = []
    losses = []
    correct= 0
    total  = 0
    for batch in dataloader:
        x, y, _ = batch
        with torch.no_grad():
            logits = model(x)
            # loss = loss_fn_red(h,y)
            pred = torch.argmax(logits[:,:hparams['n_classes']], dim=1)
        correct += (pred == y).sum().item()  # Number of correct predictions
        total += y.size(0)  # Total number of samples

        # images.extend(x.cpu())
        preds.extend(pred.cpu().numpy())
        labels.extend(y.cpu().numpy())
        # losses.extend(loss.cpu().numpy())
    acc = correct / total
    return preds, labels, acc

# Ensembling
This approach is expected to give a robust ensemble model that leverages the diversity introduced by different seeds, potentially improving the overall accuracy on the test set.

## Create Dataset and Data Loaders

Initialization of the dataset, the dataloader, and the training module

In [24]:
data_module = CIFAR10DataModule(hparams)
data_module.prepare_data()
data_module.setup()

Files already downloaded and verified
Files already downloaded and verified


  noise_file = torch.load(self.noise_path)
  noise_file = torch.load(self.noise_path)
Seed set to 42


Loaded worse_label from ./data/CIFAR-10_human.pt.
The overall noise rate is 0.40208


## Train the Ensemble

Loop over different seeds

In [25]:
# List to store predictions from each model
all_predictions = []

In [26]:
for seed in SEEDS:
    # Set seed for reproducibility at the VERY BEGINNING
    pl.seed_everything(seed)

    # Reinitialize the model architecture for each seed
    # arch, loss_fn = CNN(SIZE*SIZE), nn.CrossEntropyLoss()
    arch, loss_fn = get_arch_and_loss(hparams)

    #checkpoint_callback_img = #Enter your code here
    checkpoint_callback_img = ModelCheckpoint(
        monitor='val_loss',
        dirpath='checkpoints/',
        filename=f'model-{hparams["architecture"]}-{hparams["criterion"]}-seed-{seed}-{{epoch}}-{{val_loss:.2f}}',
        save_top_k=1,
        mode='min'
    )

    task = Task.init(project_name="CV-2025", task_name=f'Assignment1=model-{hparams["architecture"]}-{hparams["criterion"]}-seed-{seed}-{{epoch}}_img')

    # Initialize the model with the reinitialized architecture
    model = train_model(model=arch,loss=loss_fn)

    # Log hyperparameters to ClearML
    task.connect(model.hparams)

    trainer = Trainer(max_epochs=hparams['num_epochs'],
                  callbacks=[checkpoint_callback_img],
                  accelerator="auto", devices="auto")
    # Train the model
    trainer.fit(model, data_module)

    # Get the path to the best model
    best_model_path = checkpoint_callback_img.best_model_path

    # Update the output model in the task
    task.update_output_model(model_path=best_model_path, auto_delete_file=False)

    # Test set
    test_dataloader = data_module.test_dataloader()

    # Move the model to the correct device
    # best_model = model.model.load_state_dict(torch.load(best_model_path))
    # state_dict = torch.load(best_model_path)['state_dict']
    # new_state_dict = {key.replace("model.", ""): value for key, value in state_dict.items()}
    # best_model = train_model(model=arch,loss=loss_fn)
    # best_model.model.load_state_dict(new_state_dict)
    # best_model = best_model.to(device)
    best_model = train_model.load_from_checkpoint(best_model_path, model=arch, loss=loss_fn)
    best_model = best_model.to(device)
    test_results = trainer.test(best_model, dataloaders=test_dataloader)
    
    # predictions = trainer.predict(best_model, dataloaders=test_dataloader)
    predictions = []
    for batch in test_dataloader:
        x, y, _ = batch  # Unpack batch (ignore indices for now)
        with torch.no_grad():
            logits = best_model(x)
            preds = torch.argmax(logits[:, :best_model.nc], dim=1)
            predictions.extend(preds.cpu().numpy())  # Store predictions as numpy array
    # Store predictions
    all_predictions.append(predictions)
    # predictions = trainer.predict(best_model, test_dataloader)

    if seed != SEEDS[-1]:
        task.close()
        del[model, best_model, task, arch, loss_fn]

Seed set to 42


ClearML Task: created new task id=d054b9b5958e4e6db53b92338ef4c202
2025-03-01 21:23:11,580 - clearml.Task - INFO - Storing jupyter notebook directly as code
ClearML results page: https://app.clear.ml/projects/edae844b1820483eb0d3e3030b2a943d/experiments/d054b9b5958e4e6db53b92338ef4c202/output/log


GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default



ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Files already downloaded and verified
Files already downloaded and verified



You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.



2025-03-01 21:23:17,638 - clearml.model - INFO - Selected model id: 1c572eccb1704c7aa0d671c090d9c8d9



You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.





Seed set to 42


Loaded worse_label from ./data/CIFAR-10_human.pt.
The overall noise rate is 0.40208



Checkpoint directory /Users/damirnurtdinov/Desktop/My Courses/Classical CV/assignments/Assignment_01/checkpoints exists and is not empty.


  | Name    | Type  | Params | Mode 
------------------------------------------
0 | model   | CNN   | 4.4 M  | train
1 | loss_fn | NLoss | 0      | train
------------------------------------------
4.4 M     Trainable params
0         Non-trainable params
4.4 M     Total params
17.739    Total estimated model params size (MB)
21        Modules in train mode
0         Modules in eval mode


Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]


The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.



                                                                           


The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.



Epoch 1:  68%|██████▊   | 238/352 [01:03<00:30,  3.74it/s, v_num=46, train_loss_step=-15.4, val_loss_step=-17.1, val_loss_epoch=-14.9, train_loss_epoch=-14.1]ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
Epoch 19: 100%|██████████| 352/352 [01:38<00:00,  3.59it/s, v_num=46, train_loss_step=-19.7, val_loss_step=-17.5, val_loss_epoch=-16.9, train_loss_epoch=-18.9]

`Trainer.fit` stopped: `max_epochs=20` reached.


Epoch 19: 100%|██████████| 352/352 [01:38<00:00,  3.59it/s, v_num=46, train_loss_step=-19.7, val_loss_step=-17.5, val_loss_epoch=-16.9, train_loss_epoch=-18.9]
2025-03-01 21:57:05,970 - clearml.storage - INFO - Uploading: 50.82MB to /Users/damirnurtdinov/Desktop/My Courses/Classical CV/assignments/Assignment_01/checkpoints/model-CNN-N-seed-42-epoch=13-val_loss=-17.20.ckpt



Attribute 'model' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['model'])`.


Attribute 'loss' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['loss'])`.

███████████████▏                 49% | 25.00/50.82 MB [00:02<00:02, 10.17MB/s]: 

Files already downloaded and verified


██████████████████▎              59% | 30.00/50.82 MB [00:02<00:02,  9.99MB/s]: 

Files already downloaded and verified



You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.


clamping frac to range [0, 1]

███████████████████████████████ 100% | 50.82/50.82 MB [00:05<00:00,  9.36MB/s]: 

2025-03-01 21:57:11,412 - clearml.Task - INFO - Completed model upload to https://files.clear.ml/CV-2025/Assignment1%3Dmodel-CNN-N-seed-42-%7Bepoch%7D_img.d054b9b5958e4e6db53b92338ef4c202/models/model-CNN-N-seed-42-epoch%3D13-val_loss%3D-17.20.ckpt




You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.

Seed set to 42


Loaded worse_label from ./data/CIFAR-10_human.pt.
The overall noise rate is 0.40208



The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.



Testing DataLoader 0: 100%|██████████| 79/79 [30:31<00:00,  0.04it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     test_acc_epoch         0.7394999861717224
     test_loss_epoch        -20.603679656982422
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


KeyboardInterrupt: 

In [None]:
predictions = []
for batch in test_dataloader:
        x, y, _ = batch  # Unpack batch (ignore indices for now)
        with torch.no_grad():
            logits = best_model(x)
            preds = torch.argmax(logits[:, :best_model.nc], dim=1)
            predictions.extend(preds.cpu().numpy())  # Store predictions as numpy array

In [None]:
predictions

[np.int64(3),
 np.int64(1),
 np.int64(1),
 np.int64(8),
 np.int64(6),
 np.int64(6),
 np.int64(1),
 np.int64(6),
 np.int64(3),
 np.int64(1),
 np.int64(0),
 np.int64(9),
 np.int64(5),
 np.int64(7),
 np.int64(9),
 np.int64(6),
 np.int64(5),
 np.int64(7),
 np.int64(8),
 np.int64(6),
 np.int64(7),
 np.int64(2),
 np.int64(2),
 np.int64(1),
 np.int64(4),
 np.int64(2),
 np.int64(6),
 np.int64(0),
 np.int64(9),
 np.int64(2),
 np.int64(6),
 np.int64(5),
 np.int64(2),
 np.int64(5),
 np.int64(9),
 np.int64(3),
 np.int64(4),
 np.int64(1),
 np.int64(9),
 np.int64(5),
 np.int64(4),
 np.int64(6),
 np.int64(2),
 np.int64(6),
 np.int64(8),
 np.int64(1),
 np.int64(3),
 np.int64(9),
 np.int64(7),
 np.int64(2),
 np.int64(8),
 np.int64(8),
 np.int64(7),
 np.int64(6),
 np.int64(8),
 np.int64(8),
 np.int64(7),
 np.int64(5),
 np.int64(3),
 np.int64(5),
 np.int64(7),
 np.int64(5),
 np.int64(6),
 np.int64(5),
 np.int64(5),
 np.int64(2),
 np.int64(1),
 np.int64(2),
 np.int64(5),
 np.int64(7),
 np.int64(0),
 np.in

## Test the models and the ensemble of the models

In [None]:
all_predictions

[[np.int64(3),
  np.int64(1),
  np.int64(1),
  np.int64(0),
  np.int64(6),
  np.int64(6),
  np.int64(1),
  np.int64(2),
  np.int64(3),
  np.int64(1),
  np.int64(0),
  np.int64(9),
  np.int64(5),
  np.int64(7),
  np.int64(9),
  np.int64(8),
  np.int64(5),
  np.int64(7),
  np.int64(8),
  np.int64(6),
  np.int64(7),
  np.int64(0),
  np.int64(0),
  np.int64(9),
  np.int64(4),
  np.int64(2),
  np.int64(3),
  np.int64(0),
  np.int64(9),
  np.int64(6),
  np.int64(6),
  np.int64(5),
  np.int64(2),
  np.int64(3),
  np.int64(9),
  np.int64(3),
  np.int64(7),
  np.int64(1),
  np.int64(9),
  np.int64(5),
  np.int64(0),
  np.int64(6),
  np.int64(5),
  np.int64(6),
  np.int64(0),
  np.int64(9),
  np.int64(3),
  np.int64(9),
  np.int64(7),
  np.int64(2),
  np.int64(9),
  np.int64(8),
  np.int64(7),
  np.int64(3),
  np.int64(8),
  np.int64(8),
  np.int64(5),
  np.int64(3),
  np.int64(3),
  np.int64(2),
  np.int64(7),
  np.int64(5),
  np.int64(6),
  np.int64(9),
  np.int64(6),
  np.int64(7),
  np.int64

Individual models

In [None]:
# List to store individual model accuracies
individual_accuracies = []

# Compute accuracy for each model
for i, predictions in enumerate(all_predictions):
    # Get predictions for the current model
    model_predictions = predictions  # Shape: (num_samples,)

    # Get true labels (already collected earlier)
    true_labels = np.array(data_module.cifar10_test.targets)

    # Calculate accuracy for the current model
    accuracy = accuracy_score(true_labels, model_predictions)
    individual_accuracies.append(accuracy)
    print(f'Model {i+1} Accuracy: {accuracy:.4f}')

# Convert to numpy array for easier calculations
individual_accuracies = np.array(individual_accuracies)

# Compute mean accuracy
mean_accuracy = np.mean(individual_accuracies)

# Compute standard deviation of accuracy
std_accuracy = np.std(individual_accuracies)

print(f'Mean Accuracy: {mean_accuracy:.4f}')
print(f'Standard Deviation of Accuracy: {std_accuracy:.4f}')

Model 1 Accuracy: 0.6981
Model 2 Accuracy: 0.6831
Mean Accuracy: 0.6906
Standard Deviation of Accuracy: 0.0075


Ensemble

In [None]:
# Stack predictions from all models
all_predictions = np.stack(all_predictions)  # Shape: (num_models, num_samples, num_classes)

# Ensemble predictions (e.g., by averaging)
ensemble_predictions = np.mean(all_predictions, axis=0)  # Shape: (num_samples, num_classes)
final_predictions, _ = mode(all_predictions, axis=0)  # Majority voting
final_predictions = final_predictions.flatten()  # Flatten to 1D array

# Get true labels from the CIFAR-10 data set
test_labels = np.array(data_module.cifar10_test.targets)
# test_labels = data_module.test_dataset.labels  # Adjust this based on your dataset

# Calculate accuracy
accuracy = accuracy_score(test_labels, final_predictions)
print(f'Ensemble Accuracy: {accuracy:.4f}')

# Compute confusion matrix
cm = confusion_matrix(test_labels, final_predictions)

Ensemble Accuracy: 0.6844


In [None]:
# Simulated test metrics
test_metrics = {
    "Mean Accuracy (individual)": mean_accuracy,
    "Standard Deviation of Accuracy (individual)": std_accuracy,
    "Ensemble Accuracy": accuracy,
}

task.connect(test_metrics)

{'Mean Accuracy (individual)': np.float64(0.6906000000000001),
 'Standard Deviation of Accuracy (individual)': np.float64(0.007500000000000007),
 'Ensemble Accuracy': 0.6844}

In [None]:
task.close()