KUL H02A5a Computer Vision: Group Assignment 2
---------------------------------------------------------------
Student numbers: <span style="color:red">r1, r2, r3, r4, r5</span>. (fill in your student numbers!)

In this group assignment your team will delve into some deep learning applications for computer vision. The assignment will be delivered in the same groups as in *Group assignment 1*, and you start from this template notebook. The notebook you submit for grading is the last notebook pinned as default and submitted to the [Kaggle competition](https://www.kaggle.com/t/90a3b6380ecb4700857b9e07a44ca41b) prior to the deadline on **Tuesday 20 May 23:59**. Closely follow [these instructions](https://github.com/gourie/kaggle_inclass) for joining the competition, sharing your notebook with the TAs and making a valid notebook submission to the competition.

A notebook submission not only produces a *submission.csv* file that is used to calculate your competition score, it also runs the entire notebook and saves its output as if it were a report. This way it becomes an all-in-one-place document for the TAs to review. As such, please make sure that your final submission notebook is self-contained and fully documented (e.g. provide strong arguments for the design choices that you make).

Most likely, this notebook format is not suited to running all your experiments at submission time (training CNNs is memory-hungry and time-consuming, and Kaggle resources are limited). It can be a good idea to run your experiments elsewhere and only summarize your findings, together with your final predictions, in the submission notebook. For example, you can substitute experiments with text and figures that you have produced "offline" (e.g. learning curves and results on your internal validation set or even the test set for different architectures, pre-processing pipelines, etc.).

We advise you to first go through the PDF of this assignment entirely before you really start. Then, it can be a good idea to go through this notebook and use it as your first notebook submission to the competition. You can make use of the *Group assignment 2* forum/discussion board on Toledo if you have any questions. Good luck and have fun!

---------------------------------------------------------------
NOTES:
* This notebook is just a template. Please keep the five main sections, but feel free to adjust further in any way you please!
* Clearly indicate the improvements that you make! You can for instance use subsections like: *3.1. Improvement: applying loss function f instead of g*.


# 1. Overview
This assignment consists of *three main parts* for which we expect you to provide code and extensive documentation in the notebook:
* Image classification (Sect. 2)
* Semantic segmentation (Sect. 3)
* Adversarial attacks (Sect. 4)

In the first part, you will train an end-to-end neural network for image classification. In the second part, you will do the same for semantic segmentation. For these two tasks we expect you to put significant effort into optimizing performance and, as such, into competing with fellow students via the Kaggle competition. In the third part, you will try to find and exploit the weaknesses of your classification and/or segmentation network. For the latter there is no competition format, but we do expect you to put significant effort into achieving good performance on the self-posed goal for that part. Finally, we ask you to reflect and produce an overall discussion with links to the lectures and "real world" computer vision (Sect. 5). It is important to note that only a small part of the grade will reflect the actual performance of your networks. However, we do expect all things to work! In general, we will evaluate the correctness of your approach and the understanding of your own work that you demonstrate in the descriptions and discussions in the final notebook.

## 1.1 Deep learning resources
If you did not yet explore this in *Group assignment 1 (Sect. 2)*, we recommend using the TensorFlow and/or Keras library for building deep learning models. You can find a nice crash course [here](https://colab.research.google.com/drive/1UCJt8EYjlzCs1H1d1X0iDGYJsHKwu-NO).

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
import numpy as np
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt

## 1.2 PASCAL VOC 2009
For this project you will be using the [PASCAL VOC 2009](http://host.robots.ox.ac.uk/pascal/VOC/voc2009/index.html) dataset. This dataset consists of colour images of various scenes with different object classes (e.g. animal: *bird, cat, ...*; vehicle: *aeroplane, bicycle, ...*), totalling 20 classes.

In [None]:
# Loading the training data
train_df = pd.read_csv('/kaggle/input/kul-computer-vision-ga-2-2025/train/train_set.csv', index_col="Id")
labels = train_df.columns
train_df["img"] = [np.load('/kaggle/input/kul-computer-vision-ga-2-2025/train/img/train_{}.npy'.format(idx)) for idx, _ in train_df.iterrows()]
train_df["seg"] = [np.load('/kaggle/input/kul-computer-vision-ga-2-2025/train/seg/train_{}.npy'.format(idx)) for idx, _ in train_df.iterrows()]
print("The training set contains {} examples.".format(len(train_df)))

# Show some examples
fig, axs = plt.subplots(2, 20, figsize=(10 * 20, 10 * 2))
for i, label in enumerate(labels):
    df = train_df.loc[train_df[label] == 1]
    axs[0, i].imshow(df.iloc[0]["img"], vmin=0, vmax=255)
    axs[0, i].set_title("\n".join(label for label in labels if df.iloc[0][label] == 1), fontsize=40)
    axs[0, i].axis("off")
    axs[1, i].imshow(df.iloc[0]["seg"], vmin=0, vmax=20)  # with the absolute color scale it will be clear that the arrays in the "seg" column are label maps (labels in [0, 20])
    axs[1, i].axis("off")
    
plt.show()

# For each image, the training dataframe contains 20 columns with the ground-truth classification labels, plus an "img" column with the image and a "seg" column with the ground-truth segmentation label map
train_df.head(1)

In [None]:
# Loading the test data
test_df = pd.read_csv('/kaggle/input/kul-computer-vision-ga-2-2025/test/test_set.csv', index_col="Id")
test_df["img"] = [np.load('/kaggle/input/kul-computer-vision-ga-2-2025/test/img/test_{}.npy'.format(idx)) for idx, _ in test_df.iterrows()]
test_df["seg"] = [-1 * np.ones(img.shape[:2], dtype=np.int8) for img in test_df["img"]]
print("The test set contains {} examples.".format(len(test_df)))

# The test dataframe is similar to the training dataframe, but here the values are -1 --> your task is to fill these in as well as possible in Sect. 2 and Sect. 3; generate_submission (Sect. 1.3) then transforms this dataframe into the submission CSV!
test_df.head(1)

## 1.3 Your Kaggle submission
Your filled test dataframe (during Sect. 2 and Sect. 3) must be converted to a submission.csv with two rows per example (one for classification and one for segmentation) and with only a single prediction column (the multi-class/label predictions, run-length encoded). You don't need to edit this section. Just make sure to call `generate_submission` at the right position in this notebook.

In [None]:
def _rle_encode(img):
    """
    Kaggle requires RLE encoded predictions for computation of the Dice score (https://www.kaggle.com/lifa08/run-length-encode-and-decode)

    Parameters
    ----------
    img: np.ndarray - binary img array
    
    Returns
    -------
    rle: String - run-length encoded version of img
    """
    pixels = img.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    rle = ' '.join(str(x) for x in runs)
    return rle

def generate_submission(df):
    """
    Make sure to call this function once after you completed Sect. 2 and Sect. 3! It transforms and writes your test dataframe into a submission.csv file.
    
    Parameters
    ----------
    df: pd.DataFrame - filled dataframe that needs to be converted
    
    Returns
    -------
    submission_df: pd.DataFrame - df in submission format.
    """
    df_dict = {"Id": [], "Predicted": []}
    for idx, _ in df.iterrows():
        df_dict["Id"].append(f"{idx}_classification")
        df_dict["Predicted"].append(_rle_encode(np.array(df.loc[idx, labels])))
        df_dict["Id"].append(f"{idx}_segmentation")
        df_dict["Predicted"].append(_rle_encode(np.array([df.loc[idx, "seg"] == j + 1 for j in range(len(labels))])))
    
    submission_df = pd.DataFrame(data=df_dict, dtype=str).set_index("Id")
    submission_df.to_csv("submission.csv")
    return submission_df
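
To make the encoding concrete, here is a small worked example. It is a sketch that copies the logic of `_rle_encode` into a standalone function; `rle_decode` is a hypothetical helper that is not part of the template and is only used to check the round trip. The output is a sequence of 1-indexed (start, length) pairs over the flattened binary array.

```python
import numpy as np

def rle_encode(mask):
    """Same logic as _rle_encode: 1-indexed (start, length) pairs over the flattened mask."""
    pixels = np.concatenate([[0], np.asarray(mask).flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return " ".join(str(x) for x in runs)

def rle_decode(rle, size):
    """Hypothetical inverse (not in the template): rebuild the flat binary array."""
    flat = np.zeros(size, dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat

mask = np.array([0, 1, 1, 0, 1])
encoded = rle_encode(mask)
print(encoded)  # 2 2 5 1  (a run of 2 starting at position 2, a run of 1 at position 5)
decoded = rle_decode(encoded, mask.size)
```

An all-zero mask encodes to the empty string, which is what a class that is absent from a prediction produces.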

# 2. Image classification
The goal here is simple: implement a classification CNN and train it to recognise all 20 classes (and/or background) using the training set and compete on the test set (by filling in the classification columns in the test dataframe).
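
Because an image can carry several labels at once, one common way to fill the 20 classification columns is to treat each class as an independent yes/no decision. The sketch below assumes a network that outputs one raw score (logit) per class; the function name `logits_to_labels` and the threshold of 0.5 are illustrative assumptions, not part of the template.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logits_to_labels(logits, threshold=0.5):
    """Turn (n, nb_classes) raw scores into independent 0/1 labels per class."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return (probs >= threshold).astype(int)

# Two hypothetical images, three classes (the real task has 20)
logits = np.array([[2.0, -1.0, 0.3],
                   [-3.0, -0.2, -0.5]])
preds = logits_to_labels(logits)
print(preds)
```

The first image is assigned two labels and the second none, which a softmax over classes could not express.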

In [None]:
class RandomClassificationModel:
    """
    Random classification model: 
        - generates random labels for the inputs based on the class distribution observed during training
        - assumes an input can have multiple labels
    """
    def fit(self, X, y):
        """
        Adjusts the class ratio variable to the one observed in y. 

        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
        y: list of arrays - n x (nb_classes)

        Returns
        -------
        self
        """
        self.distribution = np.mean(y, axis=0)
        print("Setting class distribution to:\n{}".format("\n".join(f"{label}: {p}" for label, p in zip(labels, self.distribution))))
        return self
        
    def predict(self, X):
        """
        Predicts for each input a label.
        
        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
            
        Returns
        -------
        y_pred: list of arrays - n x (nb_classes)
        """
        np.random.seed(0)
        return [np.array([int(np.random.rand() < p) for p in self.distribution]) for _ in X]
    
    def __call__(self, X):
        return self.predict(X)
    
model = RandomClassificationModel()
model.fit(train_df["img"], train_df[labels])
test_df.loc[:, labels] = model.predict(test_df["img"])
test_df.head(1)

# 3. Semantic segmentation
The goal here is to implement a segmentation CNN that labels every pixel in the image as belonging to one of the 20 classes (and/or background). Use the training set to train your CNN and compete on the test set (by filling in the segmentation column in the test dataframe).
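
In contrast to classification, every pixel here gets exactly one label. A minimal sketch of that last step, assuming a network that produces one score map per class with background as index 0 (the name `scores_to_label_map` is an assumption for illustration):

```python
import numpy as np

def scores_to_label_map(scores):
    """scores: (nb_classes + 1, height, width) -> (height, width) label map."""
    return np.argmax(scores, axis=0).astype(np.int8)

scores = np.zeros((21, 4, 5))   # 21 = background + 20 classes
scores[0] = 1.0                 # background wins by default
scores[7, 1:3, 2:4] = 5.0       # class 7 wins in a 2x2 patch
label_map = scores_to_label_map(scores)
print(label_map)
```

The result has the same layout as the arrays in the "seg" column: one integer in [0, 20] per pixel.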

In [None]:
class RandomSegmentationModel:
    """
    Random segmentation model: 
        - generates random label maps for the inputs based on the class distributions observed during training
        - every pixel in an input can only have one label
    """
    def fit(self, X, Y):
        """
        Adjusts the class ratio variable to the one observed in Y. 

        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
        Y: list of arrays - n x (height x width)

        Returns
        -------
        self
        """
        self.distribution = np.mean([[np.sum(Y_ == i) / Y_.size for i in range(len(labels) + 1)] for Y_ in Y], axis=0)
        print("Setting class distribution to:\nbackground: {}\n{}".format(self.distribution[0], "\n".join(f"{label}: {p}" for label, p in zip(labels, self.distribution[1:]))))
        return self
        
    def predict(self, X):
        """
        Predicts for each input a label map.
        
        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
            
        Returns
        -------
        Y_pred: list of arrays - n x (height x width)
        """
        np.random.seed(0)
        return [np.random.choice(np.arange(len(labels) + 1), size=X_.shape[:2], p=self.distribution) for X_ in X]
    
    def __call__(self, X):
        return self.predict(X)
    
model = RandomSegmentationModel()
model.fit(train_df["img"], train_df["seg"])
test_df.loc[:, "seg"] = model.predict(test_df["img"])
test_df.head(1)

## Transfer Learning

In [None]:
import albumentations as A
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torchvision.models.segmentation as models
import torchvision.transforms as transforms
from PIL import Image
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Dataset
from tqdm import tqdm

torch.cuda.empty_cache()

In [None]:
BATCH_SIZE = 16
NUM_WORKERS = 0
EPOCH = 50
LR = 1e-5

In [None]:
def get_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    elif torch.backends.mps.is_available():
        return torch.device('mps')
    # XLA devices require the separate torch_xla package (torch itself has no `xla` attribute), so they are not probed here
    else:
        return torch.device('cpu')

device = get_device()

In [None]:
# Loading the training data
train_df = pd.read_csv('/kaggle/input/kul-computer-vision-ga-2-2025/train/train_set.csv', index_col="Id")
labels = train_df.columns
train_df["img"] = [np.load('/kaggle/input/kul-computer-vision-ga-2-2025/train/img/train_{}.npy'.format(idx)) for idx, _ in train_df.iterrows()]
train_df["seg"] = [np.load('/kaggle/input/kul-computer-vision-ga-2-2025/train/seg/train_{}.npy'.format(idx)) for idx, _ in train_df.iterrows()]
print("The training set contains {} examples.".format(len(train_df)))

# Show some examples
fig, axs = plt.subplots(2, 20, figsize=(10 * 20, 10 * 2))
for i, label in enumerate(labels):
    df = train_df.loc[train_df[label] == 1]
    axs[0, i].imshow(df.iloc[0]["img"], vmin=0, vmax=255)
    axs[0, i].set_title("\n".join(label for label in labels if df.iloc[0][label] == 1), fontsize=40)
    axs[0, i].axis("off")
    axs[1, i].imshow(df.iloc[0]["seg"], vmin=0, vmax=20)  # with the absolute color scale it will be clear that the arrays in the "seg" column are label maps (labels in [0, 20])
    axs[1, i].axis("off")
    
plt.show()

# For each image, the training dataframe contains 20 columns with the ground-truth classification labels, plus an "img" column with the image and a "seg" column with the ground-truth segmentation label map
train_df.head(1)

In [None]:
# Loading the test data
test_df = pd.read_csv('/kaggle/input/kul-computer-vision-ga-2-2025/test/test_set.csv', index_col="Id")
test_df["img"] = [np.load('/kaggle/input/kul-computer-vision-ga-2-2025/test/img/test_{}.npy'.format(idx)) for idx, _ in test_df.iterrows()]
test_df["seg"] = [-1 * np.ones(img.shape[:2], dtype=np.int8) for img in test_df["img"]]
print("The test set contains {} examples.".format(len(test_df)))

# The test dataframe is similar to the training dataframe, but here the values are -1 --> your task is to fill these in as well as possible in Sect. 2 and Sect. 3; generate_submission (Sect. 1.3) then transforms this dataframe into the submission CSV!
test_df.head(1)

In [None]:
class VOC2009Dataset(Dataset):
    def __init__(self, dataframe, transform=None, target_transform=None, paired_transform=None, ignore_label=21):
        self.df = dataframe.reset_index()
        self.transform = transform
        self.target_transform = target_transform
        self.paired_transform = paired_transform
        
        self.label_columns = [col for col in self.df.columns if col not in ['img', 'seg', 'Id']]
        self.ignore_label = ignore_label
        self.classes = 22  # 20 classes + background + void

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        image = self.df.iloc[idx]['img'] 
        mask = self.df.iloc[idx]['seg']   

        image = Image.fromarray(image.astype(np.uint8))  
        mask = Image.fromarray(mask.astype(np.uint8))    

        if self.paired_transform:
            image, mask = self.paired_transform(image, mask)
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            mask = self.target_transform(mask)

        return image, mask

In [None]:
paired_transform = A.Compose([
    A.Resize(256, 256, interpolation=1)
], additional_targets={'mask': 'mask'})

image_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

mask_transform = transforms.Compose([
    transforms.Lambda(lambda x: torch.tensor(np.array(x), dtype=torch.long)),
    transforms.Lambda(lambda x: torch.where(x == 255, 21, x))
])

def apply_paired_transform(image, mask):
    image_np = np.array(image)
    mask_np = np.array(mask)
    
    augmented = paired_transform(image=image_np, mask=mask_np)
    
    image_aug = Image.fromarray(augmented['image'])
    mask_aug = Image.fromarray(augmented['mask'])
    
    return image_aug, mask_aug
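
Resizing a label map needs more care than resizing an image: linear interpolation would blend neighbouring labels (the average of background 0 and class 15 is 7.5, which is neither). Albumentations is understood to resize masks with nearest-neighbour interpolation by default, so the `interpolation=1` argument above should only affect the image. The NumPy sketch below illustrates the idea; `resize_nearest` is a hypothetical helper, not the library's implementation.

```python
import numpy as np

def resize_nearest(label_map, out_h, out_w):
    """Nearest-neighbour resize: every output pixel copies exactly one input label."""
    in_h, in_w = label_map.shape
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return label_map[rows[:, None], cols[None, :]]

VOID_RAW, VOID_TRAIN = 255, 21  # same remapping as mask_transform above

mask = np.array([[0, 0, 15, 15],
                 [0, 255, 15, 15]], dtype=np.uint8)
resized = resize_nearest(mask, 4, 8)
remapped = np.where(resized == VOID_RAW, VOID_TRAIN, resized)
print(remapped)
```

No new label values appear after resizing, and the void label ends up as index 21, which is the value the losses below ignore.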

In [None]:
def split_dataframe(df, val_split=0.2, random_state=42):
    df = df.reset_index()
    
    train_df, val_df = train_test_split(
        df,
        test_size=val_split,
        random_state=random_state,
        shuffle=True
    )
    
    train_df = train_df.reset_index(drop=True)
    val_df = val_df.reset_index(drop=True)
    
    return train_df, val_df

train_df, val_df = split_dataframe(train_df)

train_dataset = VOC2009Dataset(
    dataframe=train_df,
    transform=image_transform,
    target_transform=mask_transform,
    paired_transform=apply_paired_transform
    )
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKERS)

val_dataset = VOC2009Dataset(
    dataframe=val_df,
    transform=image_transform,
    target_transform=mask_transform,
    paired_transform=apply_paired_transform
)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKERS)

In [None]:
class EarlyStopping:
    def __init__(self, patience=5, delta=1e-4, verbose=False):
        self.patience = patience
        self.delta = delta # Minimum improvement
        self.verbose = verbose
        self.best_score = None
        self.early_stop = False
        self.counter = 0
        self.best_loss = float('inf')

    def __call__(self, val_loss, model):
        score = -val_loss  # Convert to negative if minimizing loss

        if self.best_score is None:
            self.best_score = score
            self.best_loss = val_loss
            self.save_checkpoint(val_loss, model)
        elif score < self.best_score + self.delta:
            self.counter += 1
            if self.verbose:
                print(f'EarlyStopping counter: {self.counter}/{self.patience}')
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
            self.counter = 0

    def save_checkpoint(self, val_loss, model):
        if self.verbose:
            print(f'Validation loss decreased ({self.best_loss:.4f} --> {val_loss:.4f}). Saving model...')
        torch.save(model.state_dict(), 'checkpoint.pt')
        self.best_loss = val_loss
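
The stopping rule can be checked without training anything. The following sketch reimplements only the counter logic of `EarlyStopping` (no checkpointing) as a hypothetical function `stopping_epoch` and replays a made-up list of validation losses through it.

```python
def stopping_epoch(val_losses, patience=5, delta=1e-4):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = None
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        # An epoch counts as an improvement only if the loss drops by at least delta
        if best is None or loss <= best - delta:
            best = loss
            counter = 0
        else:
            counter += 1
            if counter >= patience:
                return epoch
    return None

# Made-up losses: improvement until epoch 3, then a plateau
losses = [1.0, 0.8, 0.79, 0.81, 0.80, 0.795, 0.79]
print(stopping_epoch(losses, patience=3))  # 6
```

Note that the final 0.79 would not have counted as an improvement either, since it does not beat the best loss by `delta`.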

In [None]:
class DiceLoss(nn.Module):
    def __init__(self, smooth=1, ignore_index=21):
        super(DiceLoss, self).__init__()
        self.smooth = smooth
        self.ignore_index = ignore_index

    def forward(self, pred, target):
        pred = torch.softmax(pred, dim=1) 
        num_classes = pred.size(1) + 1
        
        mask = (target != self.ignore_index).float()
        
        target = torch.nn.functional.one_hot(target.long(), num_classes=num_classes)  # [batch_size, height, width, num_classes]
        target = target.permute(0, 3, 1, 2).float()  # [batch_size, num_classes, height, width]
        
        # Apply mask to target
        mask_target = mask.unsqueeze(1).expand_as(target)  # [batch_size, 22, height, width]
        target = target * mask_target  # Zero out ignored pixels
        target = target[:, :-1] # [batch_size, 21, height, width]
        
        # Apply mask to predictions
        mask_pred = mask.unsqueeze(1).expand_as(pred) # [batch_size, 21, height, width]
        pred = pred * mask_pred 

        # Sum over the batch and spatial dimensions so that each entry corresponds to one class
        # (flattening [batch, class, height, width] with view(-1, num_classes) would mix classes and pixels)
        dims = (0, 2, 3)
        
        # Compute Dice coefficient for each class
        intersection = (pred * target).sum(dim=dims)  # [num_classes]
        cardinality = pred.sum(dim=dims) + target.sum(dim=dims)  # [num_classes], denominator of the Dice score
        dice = (2. * intersection + self.smooth) / (cardinality + self.smooth + 1e-8)  # Dice score per class
        
        # Return 1 - mean Dice score as loss
        return 1 - dice.mean()
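
As a sanity check on the axis conventions, the per-class soft Dice score can be reproduced on a tiny example in NumPy. This is an illustrative sketch, not the loss used for training: `soft_dice_per_class` is a hypothetical name, and the input is assumed to be probabilities of shape (batch, classes, height, width) with integer targets of shape (batch, height, width).

```python
import numpy as np

def soft_dice_per_class(probs, target, num_classes, ignore_index, smooth=1.0):
    """Per-class Dice = (2 * intersection + smooth) / (|pred| + |target| + smooth)."""
    valid = (target != ignore_index)                         # (batch, H, W)
    safe_target = np.where(valid, target, 0)                 # placeholder index, masked out below
    one_hot = np.eye(num_classes)[safe_target]               # (batch, H, W, classes)
    one_hot = np.moveaxis(one_hot, -1, 1) * valid[:, None]   # (batch, classes, H, W)
    probs = probs * valid[:, None]                           # ignored pixels contribute nothing
    axes = (0, 2, 3)                                         # sum over batch and pixels, keep classes
    intersection = (probs * one_hot).sum(axis=axes)
    cardinality = probs.sum(axis=axes) + one_hot.sum(axis=axes)
    return (2.0 * intersection + smooth) / (cardinality + smooth)

# One 2x2 image, three classes, label 3 marks an ignored (void) pixel
target = np.array([[[0, 1],
                    [2, 3]]])
probs = np.zeros((1, 3, 2, 2))
probs[0, 0, 0, 0] = 1.0   # correct: class 0
probs[0, 1, 0, 1] = 1.0   # correct: class 1
probs[0, 2, 1, 0] = 1.0   # correct: class 2
probs[0, 0, 1, 1] = 1.0   # prediction on the void pixel, which is masked out
dice = soft_dice_per_class(probs, target, num_classes=3, ignore_index=3)
print(dice)
```

A perfect prediction on the valid pixels gives a score of 1 for every class, regardless of what is predicted on the void pixel.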

In [None]:
class WeightedCEDiceLoss(nn.Module):
    def __init__(self, smooth=1, ignore_index=21, alpha=0.5):
        super(WeightedCEDiceLoss, self).__init__()
        self.alpha = alpha
        self.diceloss_fn = DiceLoss(smooth, ignore_index)
        self.wceloss_fn = nn.CrossEntropyLoss(ignore_index=ignore_index)
        
    def forward(self, pred, target):
        # Call the modules directly (rather than .forward) so that any registered hooks run
        diceloss = self.diceloss_fn(pred, target)
        wceloss = self.wceloss_fn(pred, target)
        return self.alpha * diceloss + (1 - self.alpha) * wceloss

In [None]:
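# `pretrained=True` is deprecated in recent torchvision releases; `weights="DEFAULT"` loads the default pretrained weights (torchvision >= 0.13)
model = models.deeplabv3_resnet50(weights="DEFAULT", num_classes=21)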
criterion = WeightedCEDiceLoss(alpha=0.5)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)
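
It helps to see how quickly these settings shrink the learning rate. `ExponentialLR` multiplies the learning rate by `gamma` each time `scheduler.step()` is called, and it only has an effect if that call is actually made (typically once per epoch). The plain-Python sketch below assumes one step per epoch; `exponential_lr` is a hypothetical helper for illustration.

```python
def exponential_lr(initial_lr, gamma, num_epochs):
    """Learning rate used in each epoch when it is multiplied by gamma after every epoch."""
    return [initial_lr * gamma ** epoch for epoch in range(num_epochs)]

schedule = exponential_lr(initial_lr=1e-5, gamma=0.8, num_epochs=50)
print(f"epoch 1: {schedule[0]:.2e}, epoch 11: {schedule[10]:.2e}, epoch 50: {schedule[-1]:.2e}")
```

With `gamma=0.8` the rate falls to roughly a tenth of its starting value after ten epochs, so most of the learning happens early; a value closer to 1 would decay more gently.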

In [None]:
train_losses = []
val_losses = []

In [None]:
num_epochs = EPOCH
model = model.to(device)
early_stopping = EarlyStopping(patience=5, verbose=True)

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    with tqdm(train_dataloader, desc=f"Training epoch [{epoch+1}/{num_epochs}]", unit="batch") as pbar:
        for images, masks in pbar:
            images = images.to(device)
            masks = masks.to(device)
            
            # Forward pass
            logits = model(images)['out']
            loss = criterion(logits, masks)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
        
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_dataloader):.4f}")
        train_losses.append(running_loss/len(train_dataloader))
    
    # Validation phase
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        with tqdm(val_dataloader, desc=f"Validating", unit="batch") as pbar:
            for images, masks in pbar:
                images = images.to(device)
                masks = masks.to(device)
                logits = model(images)['out']
                loss = criterion(logits, masks)
                val_loss += loss.item()
    
    val_loss = val_loss / len(val_dataloader)
    print(f"Epoch [{epoch+1}/{num_epochs}], Val Loss: {val_loss:.4f}")
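    val_losses.append(val_loss)

    # Decay the learning rate once per epoch; without this call the ExponentialLR scheduler has no effect
    scheduler.step()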

    early_stopping(val_loss, model)
    if early_stopping.early_stop:
        print("Early stopping triggered.")
        break
    
# Load the best model
model.load_state_dict(torch.load('checkpoint.pt'))

In [None]:
dice_loss = DiceLoss(smooth=1)
    
model.eval()
total_loss = 0.0
num_batches = 0

with torch.no_grad():
    with tqdm(val_dataloader, desc=f"Validating", unit="batch") as pbar:
        for images, masks in pbar:
            images = images.to(device)
            masks = masks.to(device)
            logits = model(images)['out']
            loss = dice_loss(logits, masks)
            total_loss += loss.item()
            num_batches += 1
    
avg_loss = total_loss / num_batches
print(f'Final average DICE score: {1 - avg_loss}')

In [None]:
def visualize_segmentation(image, mask, pred):
    # Convert tensors to NumPy
    image = image.permute(1, 2, 0).cpu().numpy()  # Convert to HWC
    mask = mask.cpu().numpy()
    pred = pred.cpu().numpy()
    
    # Ensure mask and pred are 2D (H, W)
    if mask.ndim > 2:
        mask = mask.squeeze()
    if pred.ndim > 2:
        pred = pred.squeeze()
    
    # Initialize RGB images for masks
    height, width = mask.shape
    colored_mask = np.zeros((height, width, 3), dtype=np.uint8)
    colored_pred = np.zeros((height, width, 3), dtype=np.uint8)
    
    # Get the viridis colormap
    cmap = plt.get_cmap('viridis')
    norm = mcolors.Normalize(vmin=0, vmax=20)  # Scale for labels [0, 20]
    
    # Map class indices to colors for ground truth and prediction
    for class_idx in np.unique(np.concatenate([mask, pred])):
        if class_idx <= 20:
            # Convert normalized colormap value to RGB (0-255)
            color = cmap(norm(class_idx))[:3]  # Get RGB (ignore alpha)
            color = (np.array(color) * 255).astype(np.uint8)
            colored_mask[mask == class_idx] = color
            colored_pred[pred == class_idx] = color
        elif class_idx in (21, 255):
            # Void label (255 in the raw data, remapped to 21 by mask_transform) is shown in white
            colored_mask[mask == class_idx] = (255, 255, 255)
            colored_pred[pred == class_idx] = (255, 255, 255)
        else:
            print(f"Warning: Class index {class_idx} not in expected range [0, 20] or void (21/255). Using black.")
            colored_mask[mask == class_idx] = (0, 0, 0)
            colored_pred[pred == class_idx] = (0, 0, 0)
    
    # Visualize
    plt.figure(figsize=(15, 5))
    
    plt.subplot(1, 3, 1)
    plt.title("Input Image")
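    # Undo the ImageNet normalization applied in image_transform so the colors display correctly
    plt.imshow(np.clip(image * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406]), 0, 1))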
    plt.axis('off')
    
    plt.subplot(1, 3, 2)
    plt.title("Ground Truth")
    plt.imshow(colored_mask)
    plt.axis('off')
    
    plt.subplot(1, 3, 3)
    plt.title("Prediction")
    plt.imshow(colored_pred)
    plt.axis('off')
    
    plt.show()
    
    # Return colored masks as PIL Images
    return Image.fromarray(colored_mask), Image.fromarray(colored_pred)

# Example visualization
images, masks = next(iter(val_dataloader))
images = images.to(device)
masks = masks.to(device)
with torch.no_grad():
    outputs = model(images)['out']
    preds = torch.argmax(outputs, dim=1)
    preds = torch.where(preds == 21, 255, preds)

visualize_segmentation(images[8], masks[8], preds[8])

In [None]:
def visualize_loss(train_loss, val_loss, save_path=None):
    epochs = list(range(1, len(train_loss) + 1))

    plt.figure(figsize=(10, 6))
    plt.plot(epochs, train_loss, label='Training Loss', marker='o', color='blue')
    plt.plot(epochs, val_loss, label='Validation Loss', marker='s', color='orange')
    
    plt.xlabel('Epoch')
    plt.ylabel('Loss (Dice + Cross-Entropy)')
    plt.title('Training and Validation Loss Over Epochs')
    plt.legend()
    plt.grid(True)
    
    if save_path:
        plt.savefig(save_path)
        print(f"Plot saved to {save_path}")

    plt.show()
    plt.close()

visualize_loss(train_losses, val_losses, 'loss.png')
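
The trained network still has to write its predictions into the "seg" column of the test dataframe before the submission is generated. The sketch below shows one possible way to do this and is not the authors' pipeline: `predict_label_map` is a hypothetical helper that assumes a model returning a dict with an `'out'` entry (as torchvision's DeepLabV3 does), the same ImageNet normalization as `image_transform`, and a 256 x 256 working resolution. It resizes after normalizing, which differs slightly from the training pipeline.

```python
import numpy as np
import torch
import torch.nn.functional as F

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

@torch.no_grad()
def predict_label_map(model, image, device, size=256):
    """image: (H, W, 3) uint8 array -> (H, W) int8 label map at the original resolution."""
    model.eval()
    h, w = image.shape[:2]
    # HWC uint8 -> 1xCxHxW float in [0, 1], then normalize
    x = torch.from_numpy(image.astype(np.float32) / 255.0).permute(2, 0, 1).unsqueeze(0)
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    # Resize to the working resolution used during training
    x = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
    logits = model(x.to(device))["out"]  # (1, classes, size, size)
    # Upsample the class scores back to the original size before taking the argmax
    logits = F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
    return logits.argmax(dim=1)[0].cpu().numpy().astype(np.int8)
```

In this notebook it could be applied as `test_df["seg"] = [predict_label_map(model, img, device) for img in test_df["img"]]`, so that every label map matches the size of its input image.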

## Submit to competition
You don't need to edit this section. Just use it at the right position in the notebook. See the definition of this function in Sect. 1.3 for more details.

In [None]:
generate_submission(test_df)

# 4. Adversarial attack
For this part, your goal is to fool your classification and/or segmentation CNN using an *adversarial attack*. More specifically, the goal is to build a CNN that perturbs test images in such a way that (i) they look unperturbed to humans, but (ii) the CNN classifies/segments these images in line with the perturbations.
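
As a simple point of reference before building a perturbation network, the sketch below swaps in a different and much simpler technique: the Fast Gradient Sign Method (FGSM), which moves every pixel by at most `eps` in the direction that increases the loss. To stay self-contained it attacks a toy logistic model rather than a CNN; the weights `w`, the bias `b` and the 8 x 8 "image" are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_step(x, y, w, b, eps):
    """One FGSM step against a toy logistic model p = sigmoid(w . x + b)."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w                  # gradient of the binary cross-entropy w.r.t. x
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=64)                   # hypothetical weights of the toy model
b = 0.0
x = rng.uniform(0.2, 0.8, size=64)        # a flattened 8x8 "image" with values in [0, 1]
y = 1.0                                   # true label

x_adv = fgsm_step(x, y, w, b, eps=0.05)
p_clean = sigmoid(np.dot(w, x) + b)
p_adv = sigmoid(np.dot(w, x_adv) + b)
print(f"confidence in the true label: {p_clean:.3f} -> {p_adv:.3f}")
```

No pixel changes by more than `eps`, yet the model's confidence in the true label drops. For a CNN the same step would use the gradient obtained by backpropagation instead of the closed-form expression.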

# 5. Discussion
Finally, take some time to reflect on what you have learned during this assignment. Reflect and produce an overall discussion with links to the lectures and "real world" computer vision.