HHU Deep Learning, SS2025, Prof. Dr. Markus Kollmann

Lectures and tutoring are done by Nikolas Adaloglou and Felix Michels.

# Assignment 02 - Rotation prediction as a pretext task
---

Submit the solved notebook (not a zip), using your full name plus the assignment number as the filename, e.g. `max_mustermann_a1.ipynb` for assignment 1. If we feel that you have genuinely tried to solve the exercise, you will receive 1 point for this assignment, regardless of the quality of your solution.

## <center> DUE FRIDAY 02.05.2025 2:30 pm </center>

Drop-off link: [https://uni-duesseldorf.sciebo.de/s/e0cYundCC8FG8QS](https://uni-duesseldorf.sciebo.de/s/e0cYundCC8FG8QS)

---



## Introduction to rotation prediction


Rotation prediction provides a simple yet effective way to learn rich representations from unlabeled image data.

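The basic idea behind rotation prediction is that the network is trained to predict the orientation of a given image after it has been rotated by one of four angles (0°, 90°, 180°, or 270°). Each angle is treated as a class, so the pretext task is a 4-way classification problem whose labels come for free.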

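To solve this task, the network cannot be invariant to rotation: it has to recognize the objects in an image and their canonical upright orientation (for example, where the head of an animal is relative to its body). This forces it to learn semantic features that can be very useful for downstream tasks such as object recognition or image classification.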

Rotation prediction is also a relatively simple task that can be applied to large amounts of unlabeled data, which makes it a good choice for pretraining neural networks.

Therefore, it has become a popular choice for pretraining in many computer vision tasks.
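As a concrete illustration of the four-way labelling, here is a minimal NumPy sketch (not the reference solution) that expands a batch of square images into all four rotated copies and assigns rotation class id `k` to a counterclockwise rotation by `k * 90` degrees. The helper name `make_rotation_batch` and the channel-last `(N, H, W, C)` layout are assumptions made for this example; for a single channel-first `(C, H, W)` tensor, as produced by `T.ToTensor()`, `torch.rot90(img, k, dims=(1, 2))` performs the same rotation.

```python
import numpy as np

# Rotation class id k <-> counterclockwise rotation by k * 90 degrees.
ANGLES = (0, 90, 180, 270)


def make_rotation_batch(images):
    """Return all four rotated copies of each image plus their rotation labels.

    images: array of shape (N, H, W, C) with H == W (square images, as in STL10).
    Returns rotated images of shape (4 * N, H, W, C) and int labels of shape (4 * N,).
    """
    rotated, labels = [], []
    for k in range(len(ANGLES)):
        # Rotate in the (height, width) plane, i.e. axes 1 and 2.
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        labels.append(np.full(len(images), k, dtype=np.int64))
    return np.concatenate(rotated), np.concatenate(labels)


# Tiny example: two 2x2 single-channel "images".
imgs = np.arange(8).reshape(2, 2, 2, 1)
x, y = make_rotation_batch(imgs)
print(x.shape, y)  # (8, 2, 2, 1) [0 0 1 1 2 2 3 3]
```

Feeding all four rotations of every image keeps the four classes perfectly balanced in each batch, at the cost of a 4x larger batch; drawing one random rotation per image is a cheaper alternative.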

In this exercise, we will train a ResNet18 on the task of rotation prediction on the STL10 dataset.

Related paper: [Unsupervised Representation Learning by Predicting Image Rotations](https://arxiv.org/pdf/1803.07728.pdf)


# Part I. Basic imports


In [None]:
!wget -nc https://raw.githubusercontent.com/HHU-MMBS/RepresentationLearning_PUBLIC_2024/main/exercises/week02/utils.py

In [None]:
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.utils.data as data
import random
import matplotlib.pyplot as plt
from torchvision import transforms as T
from tqdm import tqdm

from utils import *

# Task 1: Preparing the data

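One way to apply rotations while loading the data is to create a custom `Dataset` class.
Instead of returning a tuple `(img, label)`, the dataset now returns `(img, rotation_class_id)`, where the rotation class id (0, 1, 2 or 3) indexes one of the four rotation angles 0°, 90°, 180° and 270°.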
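A minimal sketch of the wrapping logic, written against plain NumPy arrays so it runs without downloading STL10. `RotationWrapper` is a hypothetical name, `base` stands in for any indexable of `(img, label)` pairs with channel-last `(H, W, C)` images, and the `fixed` flag is an assumption added for illustration. In the notebook you would instead subclass `torch.utils.data.Dataset`, load the data with `torchvision.datasets.STL10`, and apply `transform` before rotating a `(C, H, W)` tensor.

```python
import random

import numpy as np


class RotationWrapper:
    """Hypothetical wrapper returning (rotated_img, rotation_class_id).

    base: indexable of (img, label) pairs with channel-last (H, W, C) images.
    The semantic label is discarded; the rotation id becomes the target.
    """

    def __init__(self, base, fixed=False, seed=None):
        self.base = base
        self.fixed = fixed              # True: deterministic rotation idx % 4
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, _ = self.base[idx]
        # Rotation class id in {0, 1, 2, 3}.
        k = idx % 4 if self.fixed else self.rng.randint(0, 3)
        # Counterclockwise rotation by k * 90 degrees in the (H, W) plane.
        # .copy() removes the negative strides of the rot90 view, which
        # torch.from_numpy would otherwise reject.
        return np.rot90(img, k=k, axes=(0, 1)).copy(), k


# Usage with dummy 96x96 RGB images (the size of STL10 images).
base = [(np.zeros((96, 96, 3)), 7) for _ in range(8)]
train_ds = RotationWrapper(base, seed=0)
val_ds = RotationWrapper(base, fixed=True)
print(len(train_ds), [val_ds[i][1] for i in range(8)])  # 8 [0, 1, 2, 3, 0, 1, 2, 3]
```

Drawing `k` at random gives a fresh rotation each epoch during training, while `fixed=True` makes the validation set reproducible and balanced across the four classes. With `num_workers > 0`, every DataLoader worker receives a copy of `self.rng` in the same state, so all workers would draw the same rotation sequence; drawing `k` with `torch.randint` inside `__getitem__` avoids this, since PyTorch seeds its generator differently in each worker.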



In [None]:
class STL10Rot(Dataset):
### START CODE HERE ### (> 10 lines)
### END CODE HERE ###


def load_data(batch_size=128, train_split="unlabeled", test_split="test"):
    # Returns a train and validation dataloader for STL10 dataset
    transf = T.Compose([T.ToTensor()])
    train_ds = STL10Rot(split=train_split, transform=transf)
    val_ds = STL10Rot(split=test_split, transform=transf)
    train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=2)
    val_dl = DataLoader(val_ds, batch_size=batch_size, shuffle=False, num_workers=2)
    return train_dl, val_dl

def test_load_data():
    train_dl, val_dl = load_data(batch_size=4)
    for i, (x, y) in enumerate(train_dl):
        print(x.shape, y)
        plt.figure(figsize=(6, 6))
        imshow(x[0,...])
        break
    for i, (x, y) in enumerate(val_dl):
        print(x.shape, y.shape)
        plt.figure(figsize=(6, 6))
        imshow(x[0,...])
        break

test_load_data()

# Task 2: Load and modify ResNet18

Your task is to load and modify the ResNet18 architecture.

Which layers do you need to modify?
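Whichever layers you change, the network has to output one logit per rotation class, i.e. a tensor of shape `(batch_size, 4)`, since `F.cross_entropy` expects logits of shape `(N, C)` and integer class targets of shape `(N,)`. As a sanity check for `test_load_resnet`: a perfectly uniform 4-way prediction has a loss of exactly ln 4 ≈ 1.386, and a freshly initialized head often lands in that vicinity. The NumPy function below is a sketch of the mean-reduced softmax cross-entropy that `F.cross_entropy` computes by default; `cross_entropy` and `NUM_ROTATIONS` are illustrative names.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean softmax cross-entropy; logits (N, num_classes), targets (N,) int."""
    # Subtracting the row-wise max does not change the softmax but avoids overflow.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for every sample.
    return -log_probs[np.arange(len(targets)), targets].mean()


NUM_ROTATIONS = 4
logits = np.zeros((8, NUM_ROTATIONS))          # a perfectly uniform head
targets = np.array([0, 1, 2, 3, 0, 1, 2, 3])
print(cross_entropy(logits, targets), np.log(NUM_ROTATIONS))  # both ≈ 1.386
```

A loss far above ln 4 right after initialization usually hints at a shape or label mismatch rather than a training problem.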

In [None]:

def load_resnet():
    ### START CODE HERE ### (4 lines)
    ### END CODE HERE ###

def test_load_resnet():
    model = load_resnet()
    train_dl , _ = load_data(batch_size=4)
    x, target = next(iter(train_dl))
    y = model(x)
    loss = F.cross_entropy(y, target)
    print(y.shape, target.shape, loss)

test_load_resnet()

# Task 3: Launch training!
Choose the model and hyperparameters and launch the training.\
You can use the `pretrain` function from `utils.py`.\
It will store the best checkpoint (based on validation loss) in `best_model_min_val_loss.pth`.

The training should not take more than 1h (~50 mins with our setup).

In [None]:
### START CODE HERE ###
### END CODE HERE ###

# Task 4: Loading the best model and visualizing predictions

Use the validation split to visualize the rotated images and their predicted angles vs. the applied ones.
Visualize 16 predictions in a 4x4 grid as illustrated below.
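One way to produce the subplot titles is sketched below in NumPy: map each rotation class id to its angle and arrange 16 `(predicted, applied)` pairs in a 4x4 layout. The helper `grid_titles` and the title format are assumptions; in the notebook the predicted ids would come from `logits.argmax(dim=1)` on a validation batch.

```python
import numpy as np

ANGLES = np.array([0, 90, 180, 270])  # rotation class id -> angle in degrees


def grid_titles(pred_ids, true_ids, rows=4, cols=4):
    """Arrange the first rows*cols (predicted, applied) pairs as subplot titles."""
    n = rows * cols
    pred_ids = np.asarray(pred_ids)[:n]
    true_ids = np.asarray(true_ids)[:n]
    titles = [f"pred {ANGLES[p]}° / true {ANGLES[t]}°" for p, t in zip(pred_ids, true_ids)]
    return np.array(titles, dtype=object).reshape(rows, cols)


pred = np.array([0, 1, 2, 3] * 4)
true = np.array([0, 1, 2, 2] * 4)
titles = grid_titles(pred, true)
print(titles[0, 3])  # pred 270° / true 180°
```

With Matplotlib, `fig, axes = plt.subplots(4, 4, figsize=(8, 8))` returns a matching 4x4 array of axes, so `axes[i, j].set_title(titles[i, j])` pairs each image with its title.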

In [None]:
### START CODE HERE ### (~10 lines)
### END CODE HERE ###

### Expected result

![](https://github.com/HHU-MMBS/RepresentationLearning_PUBLIC_2024/blob/main/exercises/week02/figs/viz_prediction_solution.png?raw=true)

# Task 5: Compute the validation rotation prediction accuracy

What percentage of image rotations is the model able to predict correctly? You should be able to achieve above 90%.

Use the validation split.
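A minimal sketch of the accuracy computation, using NumPy arrays in place of model outputs: the prediction is the argmax over the four rotation logits, and correct predictions are counted across all batches before dividing, so a smaller last batch is weighted correctly. `accumulate_accuracy` and the toy batches are hypothetical; in the notebook the loop would run over the validation DataLoader with `model.eval()` and `torch.no_grad()`.

```python
import numpy as np


def accumulate_accuracy(batches):
    """Percentage of correct rotation predictions over an iterable of batches.

    Each batch is (logits, targets): logits (B, 4) float, targets (B,) int.
    """
    correct, total = 0, 0
    for logits, targets in batches:
        preds = logits.argmax(axis=1)            # predicted rotation class id
        correct += int((preds == targets).sum())
        total += len(targets)
    return 100.0 * correct / total


# Toy batches of different sizes standing in for the validation DataLoader.
batches = [
    (np.array([[5.0, 0, 0, 0], [0, 3.0, 0, 0]]), np.array([0, 1])),
    (np.array([[0, 0, 2.0, 0]]), np.array([2])),
    (np.array([[1.0, 0, 0, 0]]), np.array([3])),
]
print(accumulate_accuracy(batches))  # 75.0 (averaging per-batch accuracies would give 66.7)
```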

In [None]:
### START CODE HERE ### (~6 lines of code)
### END CODE HERE ###

### Expected result

```
Val Accuracy 97.99 || Loss Val 0.070
```

# Task 6: Get the training features from the trained encoder

Save the representations and labels of the *labeled* train split and the validation split to disk (you can use `utils.get_features`).
The file names should be `"train_feats.pth", "val_feats.pth", "train_labels.pth", "val_labels.pth"`.
Note: here we only use the labeled (supervised) training split, not the unlabeled split used for pretraining.

In [None]:
### START CODE HERE ### (≈ 8 lines of code)
### END CODE HERE ###

# Task 7: Linear evaluation: Probing

For probing, you can use `linear_eval(classifier, optimizer, epochs, train_feat_dl, val_feat_dl, device)`.


Use the saved features and train a linear classifier on top of them on the **training split** of STL10.
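Linear probing keeps the encoder frozen and fits only a linear classifier on the saved features. The sketch below illustrates that idea with plain NumPy softmax regression trained by full-batch gradient descent on synthetic, well-separated clusters. It is not how `linear_eval` is implemented (judging by its signature, that function trains the `classifier` you pass with the given `optimizer`), and `train_linear_probe`, `probe_accuracy` and the toy data are hypothetical.

```python
import numpy as np


def train_linear_probe(feats, labels, num_classes, epochs=200, lr=0.5, seed=0):
    """Fit W, b of a softmax classifier on frozen features (full-batch GD)."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # d(mean CE) / d(logits)
        W -= lr * feats.T @ grad                      # only the linear layer is updated
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, feats, labels):
    """Percentage of samples classified correctly by the linear probe."""
    return 100.0 * np.mean((feats @ W + b).argmax(axis=1) == labels)


# Toy "frozen features": three well-separated clusters in 4-D.
rng = np.random.default_rng(0)
centers = 3.0 * np.eye(3, 4)
labels = rng.integers(0, 3, size=300)
feats = centers[labels] + 0.5 * rng.normal(size=(300, 4))
W, b = train_linear_probe(feats, labels, num_classes=3)
print(probe_accuracy(W, b, feats, labels))
```

For STL10, which has 10 classes, the PyTorch counterpart would be a single `nn.Linear(feat_dim, 10)` layer, where `feat_dim` is the dimension of the saved features.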


In [None]:
### START CODE HERE ### (≈ 10 lines of code)
### END CODE HERE ###

### Expected results

```
Ep 49/50: Accuracy : Train:98.32 	 Val:98.41 || Loss: Train 0.160 	 Val 0.156: 100%|██████████| 50/50 [00:04<00:00, 11.74it/s]
```

# Task 8: Visualize the computed features with TSNE
- Color the points based on ground truth labels
- Hint: Use `from sklearn.manifold import TSNE`

In [None]:
from sklearn.manifold import TSNE

### START CODE HERE ### (≈ 10 lines of code)
### END CODE HERE ###

val_feats, val_labels = torch.load("val_feats.pth"), torch.load("val_labels.pth")
class_names = torchvision.datasets.STL10(root='../data').classes
tsne_plot_embeddings(val_feats, val_labels, class_names)

### Expected result

![t-SNE of the validation features](https://github.com/HHU-MMBS/RepresentationLearning_PUBLIC_2024/blob/main/exercises/week02/figs/tsne_plot_embeddings_solution.png?raw=true)

# Conclusion and Bonus reads

Are the features visually separable in the 2D space? Why is that?

That's the end of this exercise. If you reached this point, congratulations!

If you are interested in delving further into this topic, here are some links:

- [Self-supervised learning and computer vision](https://www.fast.ai/posts/2020-01-13-self_supervised.html)
- [Self-Supervised Representation Learning: Introduction, Advances and Challenges](https://arxiv.org/abs/2110.09327)

