HHU Deep Learning, SS2022/23, 12.05.2023, Prof. Dr. Markus Kollmann

Lectures and tutoring are provided by Tim Kaiser, Nikolas Adaloglou and Felix Michels.

# Assignment 05 - Image Clustering


## Contents

1. Imports, basic utils, augmentations
2. Load the MoCo ResNet50 model pretrained on ImageNet
3. Compute the k-means clustering accuracy using the learned representations
4. t-SNE visualization of features
5. Compute the 50-NN
6. Write a new dataset class to load image pairs
7. Implement the SCAN loss
8. Implement the PMI loss. Train the clustering head and compute the validation accuracy
9. Pretraining code (provided, no changes needed)
10. Train with SCAN and PMI using the k-NN pairs
11. Get cluster assignments and evaluate cluster accuracy

# Introduction 

Image clustering in deep learning can be mathematically described as a process of partitioning a set of images, X, into K clusters, where K is a user-defined parameter representing the number of desired clusters.

Let V(X) be the visual feature representation of the images in X, obtained with a deep neural network such as a convolutional neural network (CNN). Each image in X is mapped to a feature vector in V(X) whose dimensions correspond to the features learned by the network.
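As a hedged illustration (our own notation, not a formulation taken from the papers below), the k-means baseline of Part III minimizes the following objective over the feature vectors v_i = V(x_i), with K centroids μ_k and an assignment c(i) of image i to a cluster. A trained clustering head g (Parts VII to X) instead outputs cluster probabilities and assigns each image to its most probable cluster:

```latex
\min_{\mu_1,\dots,\mu_K,\; c}\; \sum_{i=1}^{N} \bigl\lVert v_i - \mu_{c(i)} \bigr\rVert_2^2,
\qquad c(i) \in \{1,\dots,K\}, \qquad v_i = V(x_i) \in \mathbb{R}^d

p(k \mid x_i) = \operatorname{softmax}\bigl(g(v_i)\bigr)_k,
\qquad \hat{c}(i) = \arg\max_{k}\, p(k \mid x_i)
```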

The aim is to group images that are visually similar. Ideally, images with the same ground-truth label end up in the same cluster, even though the labels are not used to form the clusters; in this exercise they are used only for evaluation.

Image clustering can automatically organize large image collections into smaller, coherent subsets, which helps with managing and browsing large image datasets.

In practice, a pretrained network first maps every image to a feature vector, and the images are then grouped by the similarity of these vectors, either with a classical algorithm such as k-means or with a trained clustering head. This is useful in applications such as image recognition, image search, and content-based image retrieval.


- [SimCLR Paper](https://arxiv.org/abs/2002.05709)
- [MoCo Paper](https://arxiv.org/abs/1911.05722)
- [SCAN Paper](https://arxiv.org/abs/2005.12320v2)
- [TEMI Paper](https://arxiv.org/abs/2303.17896)

# Part I. Imports, basic utils, augmentations

In [None]:
import os

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.models as models
import torchvision.transforms as T
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision.datasets import STL10
from tqdm import tqdm  # Part IX calls tqdm(...) as a function
from sklearn.cluster import KMeans

# Local imports (e.g. compute_clustering_metrics)
from utils import *

os.makedirs("./figs", exist_ok=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Part II. Load the MoCo ResNet50 model pretrained on ImageNet

[The weights are available at this link](https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_800ep/moco_v2_800ep_pretrain.pth.tar)

You can download them from the terminal with:

`$ wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_800ep/moco_v2_800ep_pretrain.pth.tar`

In [None]:
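def load_moco_model(pretrained_path="./moco_v2_800ep_pretrain.pth.tar"):
    ### START CODE HERE ### (≈ 11 lines of code)
    ckpt = torch.load(pretrained_path, map_location="cpu")
    print(ckpt.keys(), ckpt["arch"], ckpt["epoch"])
    state_dict = ckpt["state_dict"]
    # Keep only the query encoder and strip its prefix so the keys match torchvision's ResNet
    prefix = "module.encoder_q."
    state_dict_new = {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}
    # Build the architecture stored in the checkpoint (e.g. resnet50) without ImageNet weights
    model = getattr(models, ckpt["arch"])(pretrained=False)
    # Drop the classifier so the encoder outputs the 2048-d pooled features
    model.fc = nn.Identity()
    # strict=False: the MoCo projection-head weights (fc.*) have no counterpart in the model
    msg = model.load_state_dict(state_dict_new, strict=False)
    print("Loaded model with message:", msg)
    ### END CODE HERE ###
    model.eval()
    return model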

encoder = load_moco_model() 

### Expected results

There should be no missing keys when loading the model. Depending on your implementation, there may be some unexpected keys.

```python
Loaded model with message: _IncompatibleKeys(missing_keys=[], unexpected_keys=['fc.0.weight', 'fc.0.bias', 'fc.2.weight', 'fc.2.bias'])
```
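To see why unexpected keys are harmless here, the toy example below mimics the loading logic on a tiny model. The prefix and layer names are illustrative assumptions, not the real checkpoint contents: only weights under the prefix are kept, and the extra `fc.weight` (standing in for the MoCo projection head) is reported as unexpected while nothing is missing.

```python
import torch
import torch.nn as nn

# Toy backbone with keys "0.weight" and "0.bias"
backbone = nn.Sequential(nn.Linear(4, 3))

# Toy "checkpoint" whose keys carry a MoCo-style prefix (illustrative only)
checkpoint = {
    "module.encoder_q.0.weight": torch.randn(3, 4),
    "module.encoder_q.0.bias": torch.randn(3),
    "module.encoder_q.fc.weight": torch.randn(2, 3),  # projection head, absent from backbone
}

prefix = "module.encoder_q."
# Keep only query-encoder weights and strip the prefix so names match the backbone
renamed = {k[len(prefix):]: v for k, v in checkpoint.items() if k.startswith(prefix)}

# strict=False reports mismatched names instead of raising
msg = backbone.load_state_dict(renamed, strict=False)
print(msg)
```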

# Part III. Compute the k-means clustering accuracy using the learned representations


- Compute the frozen feature representations of the backbone model.
- Compute the k-means clustering accuracy on both the `train` and `test` splits.

Hint: you may use the function `compute_clustering_metrics` defined in `utils.py`.


In [None]:
transf = T.Compose([
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
### START CODE HERE ### (>10 lines of code)

# compute features for train and val

# fit k-means ...


# compute clustering metrics for train and val
train_acc = compute_clustering_metrics(train_labels.cpu().numpy(), train_preds, min_samples_per_class=10)[0]
val_acc = compute_clustering_metrics(val_labels.cpu().numpy(), val_preds, min_samples_per_class=10)[0]

### END CODE HERE ###
print(f"Train acc: {train_acc:.2f}, Val acc: {val_acc:.2f}")

### Expected results

`Train acc: 53.64`; the validation accuracy on the `test` split should be of a similar magnitude.
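`compute_clustering_metrics` lives in `utils.py`, which is not shown here. Clustering accuracy is commonly computed by first matching cluster ids to class ids with the Hungarian algorithm and then counting correct assignments; we assume, without having checked `utils.py`, that its first return value is such an accuracy in percent. The sketch below fits scikit-learn's `KMeans` on synthetic, well-separated features and computes the matched accuracy with a hypothetical helper `hungarian_accuracy`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans


def hungarian_accuracy(y_true, y_pred):
    """Accuracy (in %) after optimally matching cluster ids to class ids (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    # confusion[i, j] = number of samples in cluster i with true class j
    confusion = np.zeros((n, n), dtype=np.int64)
    np.add.at(confusion, (y_pred, y_true), 1)
    # linear_sum_assignment minimizes cost, so negate to maximize matched counts
    rows, cols = linear_sum_assignment(-confusion)
    return 100.0 * confusion[rows, cols].sum() / len(y_true)


# Synthetic stand-in for the frozen ResNet50 features: 3 well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.repeat(np.arange(3), 50)
feats = centers[labels] + rng.normal(scale=0.5, size=(150, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(feats)
acc = hungarian_accuracy(labels, kmeans.labels_)
print(f"k-means accuracy: {acc:.2f}")
```

Cluster ids are arbitrary, so `hungarian_accuracy([0, 0, 1, 1], [1, 1, 0, 0])` counts as a perfect clustering.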

# Part IV. t-SNE visualization of features

As in the previous exercise, inspect the t-SNE visualization of the features (and, optionally, the linear-probing results on the labeled training split).

Code for the t-SNE visualization is provided in `utils.py`.
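The helper in `utils.py` is not shown, so here is a standalone sketch of the same idea using scikit-learn's `TSNE`. Synthetic 64-dimensional features stand in for the 2048-dimensional ResNet50 features, and the output file name `tsne_sketch.png` is arbitrary.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 40)
# Three Gaussian clouds, each shifted along a different axis
feats = rng.normal(size=(120, 64)) + 5.0 * np.eye(64)[labels]

# Embed into 2D; perplexity must be smaller than the number of samples
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)

plt.figure(figsize=(5, 5))
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)  # color by ground-truth label
plt.title("t-SNE of synthetic features")
plt.savefig("tsne_sketch.png", bbox_inches="tight")
plt.close()
```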

In [None]:
### START CODE HERE ### (≈ 3 lines of code)
# TSNE plot

### END CODE HERE ###

# Part V. Compute the 50-NN

- Load the train features.
- Use cosine similarity as the similarity measure.
- Compute the k=50 nearest neighbors (NN) in the feature space of the pretrained ResNet50.
- Save the indices of the k-NN.
- Visualize the top 5 NN for a few images (~10).
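A minimal sketch of the k-NN step, assuming the features fit in memory as a single tensor. After L2 normalization the dot product equals the cosine similarity, so one matrix product yields all pairwise similarities. Setting the diagonal to `-inf` keeps each image out of its own neighbor list. Random features stand in for the real ones; `knn_indices` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def knn_indices(feats: torch.Tensor, k: int = 50) -> torch.Tensor:
    """Indices of the k most cosine-similar samples for each row, excluding the row itself."""
    feats = F.normalize(feats, dim=1)   # unit norm, so dot product == cosine similarity
    sim = feats @ feats.T               # (N, N) similarity matrix
    sim.fill_diagonal_(float("-inf"))   # never select the sample itself as a neighbor
    return sim.topk(k, dim=1).indices   # (N, k), sorted by decreasing similarity


torch.manual_seed(0)
train_feats = torch.randn(200, 2048)  # stand-in for the frozen ResNet50 features
indices = knn_indices(train_feats, k=50)
print(indices.shape)
```

For the real run, `torch.save(indices, "./knn_indices.pth")` writes the result to the default path expected by `PairSTL10`.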

In [None]:
# Provided but optional to use!
class_names = torchvision.datasets.STL10(root='../data').classes
def vizualize_pairs(indices, true_labels, train_ds):
    # Show each reference image (first column) next to its nn_viz - 1 = 5 nearest neighbors.
    # Assumes `indices` excludes the image itself; otherwise the first neighbor repeats the reference.
    ref_ids = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
    nn_viz = 6
    plt.figure(figsize=(22, 22))
    plt.subplots_adjust(wspace=0.4, hspace=0.4)
    for c, ref in enumerate(ref_ids):
        knns = indices[ref, :nn_viz - 1]
        imgs_to_viz = [train_ds[ref][0]]
        labels = [train_ds[ref][1]]
        for i in knns:
            imgs_to_viz.append(train_ds[int(i)][0])
            labels.append(train_ds[int(i)][1])
        # show the reference image and its neighbors in one row
        for j in range(nn_viz):
            label = int(labels[j])
            plt.subplot(len(ref_ids), nn_viz, (c * nn_viz) + (j + 1))
            imshow(imgs_to_viz[j])
            plt.title(f"{class_names[label]}, Label {label}", fontsize=10)
            ax = plt.gca()
            ax.get_xaxis().set_visible(False)
            ax.get_yaxis().set_visible(False)
    plt.savefig('./figs/knn_viz.png', bbox_inches="tight", dpi=500)

### START CODE HERE ### (≈ 10 lines of code)

# compute the similarity matrix

# take top k similar images


# save the indices

### END CODE HERE ###

# Part VI. Write a new dataset class to load image pairs

- The new dataset class inherits from `torch.utils.data.Dataset`.
- For a given index, it returns the representation of that image together with the representation of one of its 50-NN, sampled at random.
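One way such a class could be structured is sketched below. `ToyKNNPairDataset` is hypothetical: it takes in-memory tensors instead of the `.pth` paths used by `PairSTL10` so that it runs on its own, and its toy neighbor table simply lists the next three indices.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset


class ToyKNNPairDataset(Dataset):
    """Hypothetical pair dataset: returns (anchor feature, feature of a random k-NN)."""

    def __init__(self, feats: torch.Tensor, knn_indices: torch.Tensor, l2_normalize: bool = True):
        self.feats = F.normalize(feats, dim=1) if l2_normalize else feats
        self.knn_indices = knn_indices  # (N, k) neighbor indices per sample

    def __len__(self):
        return self.feats.shape[0]

    def __getitem__(self, index):
        neighbors = self.knn_indices[index]
        # sample one of the k neighbors uniformly at random
        j = int(neighbors[torch.randint(len(neighbors), (1,)).item()])
        return self.feats[index], self.feats[j]


torch.manual_seed(0)
feats = torch.randn(10, 8)
# toy neighbor table: sample i has neighbors i+1, i+2, i+3 (mod 10)
knn = torch.stack([torch.roll(torch.arange(10), -s) for s in (1, 2, 3)], dim=1)
ds = ToyKNNPairDataset(feats, knn)
emb1, emb2 = ds[4]
print(emb1.shape, emb2.shape)
```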

In [None]:
### START CODE HERE (≈ 12 lines of code)
class PairSTL10(torch.utils.data.Dataset):
    def __init__(self, indices_path="./knn_indices.pth", embeds_path="./train_feats.pth", l2_normalize=True):

    def __len__(self):

    def __getitem__(self, index):

### END CODE HERE
    
def test_get_pair():
    dataset = PairSTL10()
    emb1, emb2 = dataset[16]
    print(emb1.shape, emb2.shape)
    assert emb1.shape==emb2.shape 

test_get_pair()
train_loader = torch.utils.data.DataLoader(PairSTL10(), batch_size=128, shuffle=True, num_workers=4)
data_batch = next(iter(train_loader))

# Part VII. Implement the SCAN loss

See the SCAN paper, specifically Eq. 2, for details.
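SCAN's Eq. 2 combines a consistency term, -log <Φ(X), Φ(k)> for an image X and a neighbor k, with a term Σ_c Φ'^c log Φ'^c on the mean cluster distribution Φ', weighted by λ (called `alpha` here). The function below is a hedged batch-wise sketch of that equation for (anchor, neighbor) logit pairs, with the entropy computed from the anchors only. `scan_loss_sketch` is a hypothetical name, and we have not checked it against the expected output of `test_scan`.

```python
import torch
import torch.nn.functional as F


def scan_loss_sketch(logits_1, logits_2, alpha=1.0, eps=1e-8):
    """Batch-wise sketch of SCAN Eq. 2 for (anchor, neighbor) logit pairs."""
    p1 = F.softmax(logits_1, dim=1)  # cluster probabilities of the anchors
    p2 = F.softmax(logits_2, dim=1)  # cluster probabilities of the neighbors
    # consistency: -log <p1, p2>, averaged over the batch
    dot = (p1 * p2).sum(dim=1)
    consistency = -torch.log(dot.clamp_min(eps)).mean()
    # sum_c p_mean * log(p_mean) = -entropy; minimizing it spreads samples over clusters
    p_mean = p1.mean(dim=0)
    neg_entropy = (p_mean * torch.log(p_mean.clamp_min(eps))).sum()
    return consistency + alpha * neg_entropy


torch.manual_seed(99)
loss = scan_loss_sketch(torch.randn(100, 128), torch.randn(100, 128), alpha=1.0)
print(loss)
```

With confident, consistent pairs spread evenly over K clusters, the consistency term vanishes and the loss approaches -alpha * log(K).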

In [None]:
class SCAN(torch.nn.Module):
    def __init__(self, alpha=1):
        super().__init__()
        self.alpha = alpha

    def forward(self, proj_1, proj_2):
        # START CODE HERE (≈ 6 lines of code)
        
        # dot product
        

        # self-entropy regularization

        ### END CODE HERE

def test_scan():
    torch.manual_seed(99)
    scan = SCAN(alpha=1)
    proj_1 = torch.randn(100, 128)
    proj_2 = torch.randn(100, 128)
    loss = scan(proj_1, proj_2)
    print(loss)
    assert loss.shape==torch.Size([])
test_scan()

### Expected results

For alpha=1, output = `tensor(0.0275)`

# Part VIII. Implement the PMI loss. Train the clustering head and compute the validation accuracy

Implement the PMI loss based on Eqs. 6, 7 and 8 of the [TEMI paper](https://arxiv.org/pdf/2303.17896.pdf).

Note that this exercise does not use the symmetrized version of the loss: Loss = -PMI. Don't forget the sign.
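As we read Eqs. 6 to 8 of the TEMI paper, the objective is pmi^γ(x, x') = log Σ_c (q(c|x) q(c|x'))^γ / q(c), where q(c|x) is a temperature-scaled softmax and the marginal q(c) is estimated with an exponential moving average of batch means. The class below is a hedged sketch of that reading, not the reference solution: `PMISketch` is a hypothetical name, and tracking q(c) from the second view (the EMA teacher in Part IX) is our assumption.

```python
import torch
import torch.nn.functional as F


class PMISketch(torch.nn.Module):
    """Hypothetical sketch of -PMI^gamma with an EMA estimate of the marginal q(c)."""

    def __init__(self, gamma=1.0, momentum=0.99, temp=0.1, eps=1e-8):
        super().__init__()
        self.gamma, self.momentum, self.temp, self.eps = gamma, momentum, temp, eps
        self.center = None  # EMA estimate of q(c), shape (1, C)

    @torch.no_grad()
    def update_ema(self, probs):
        batch_mean = probs.mean(dim=0, keepdim=True)
        if self.center is None:
            self.center = batch_mean
        else:
            self.center = self.momentum * self.center + (1 - self.momentum) * batch_mean

    def forward(self, logits_1, logits_2):
        q1 = F.softmax(logits_1 / self.temp, dim=1)
        q2 = F.softmax(logits_2 / self.temp, dim=1)
        if self.center is None:  # first call: initialize the marginal from this batch
            self.update_ema(q2)
        # pmi^gamma = log sum_c (q1_c * q2_c)^gamma / q(c)
        joint = (q1 * q2).pow(self.gamma) / self.center.clamp_min(self.eps)
        pmi = torch.log(joint.sum(dim=1).clamp_min(self.eps))
        self.update_ema(q2)  # assumption: marginal tracked from the second (teacher) view
        return -pmi.mean()   # Loss = -PMI


crit = PMISketch(gamma=1.0, momentum=0.9, temp=0.1)
logits = 100.0 * torch.eye(4)  # four confident samples, one per cluster
loss = crit(logits, logits)
print(loss)
```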

In [None]:
class PMI(torch.nn.Module):
    def __init__(self, gamma=1, momentum=0.99, temp=0.1):
        super().__init__()
        self.gamma = gamma
        self.temp = temp
        self.center  = None
        self.momentum = momentum
    
    # START CODE HERE (≈ 6 lines of code)
    @torch.no_grad()
    def update_ema(self, output):
        """
        Update exponential moving average of the center (denominator)
        """
        
    def forward(self, proj_1, proj_2):
        
    ### END CODE HERE

def test_pmi():
    torch.manual_seed(99)
    criterion = PMI(gamma=1)
    proj_1 = torch.rand(100, 128)
    proj_2 = torch.rand(100, 128)
    loss = criterion(proj_1, proj_2)
    print(loss)
    assert loss.shape==torch.Size([])
 
test_pmi()

### Expected results 

`tensor(0.0738)`

# Part IX. PROVIDED: Pretraining code

This part is provided, but please take a look and identify what changes compared to a standard training loop.

You don't need to write any code here unless it is inconsistent with the previous parts.

This code works with our reference solution; if it does not work well with your implementation of the previous parts, it is your job to adapt it.

In [None]:
import copy 


def pretrain(model, optimizer, num_epochs, train_loader, criterion, device, prefix="scan", model_ema=False):
    dict_log = {"train_loss":[]}
    best_loss = 1e8
    model = model.to(device)
    pbar = tqdm(range(num_epochs))
    for epoch in pbar:
        loss_curr_epoch = pretrain_one_epoch(model, optimizer, train_loader, criterion, device, model_ema=model_ema)
        msg = (f'Ep {epoch}/{num_epochs}: || Loss: Train {loss_curr_epoch:.3f}')
        pbar.set_description(msg)
        dict_log["train_loss"].append(loss_curr_epoch)
        if loss_curr_epoch < best_loss:
            best_loss = loss_curr_epoch
            save_model(model, f'{prefix}_best_model_min_val_loss.pth', epoch, optimizer, best_loss)   
    return dict_log

class EMA():
    def __init__(self, alpha, student):
        super().__init__()
        self.alpha = alpha
        self.teacher = copy.deepcopy(student)
        for p in self.teacher.parameters():
            p.requires_grad = False
    
    def update_average(self, old, new):
        if old is None:
            return new
        return old * self.alpha + (1 - self.alpha) * new
    
    def update_teacher(self, student):
        for ema_params, student_params in zip(self.teacher.parameters(), student.parameters()):
            old_weight, student_weight = ema_params.data, student_params.data
            ema_params.data = self.update_average(old_weight, student_weight)


def pretrain_one_epoch(model, optimizer, train_loader, criterion, device, model_ema=False):
    """
    model: the model to train
    optimizer: the optimizer to use
    train_loader: the train loader
    criterion: the loss function, PMI or SCAN
    device: the device to use
    model_ema: whether to use EMA or not
    """
    model.train()
    loss_step = []
    if model_ema:
        ema = EMA(0.99, model)
    for data in train_loader:
        # Move both feature batches to the selected device
        img1, img2 = data
        img1, img2 = img1.to(device), img2.to(device)
        p1 = model(img1)
        p2 = ema.teacher(img2) if model_ema else model(img2)
        loss = criterion(p1, p2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loss_step.append(loss.item())
        if model_ema:
            ema.update_teacher(model)
    loss_curr_epoch = np.mean(loss_step)
    return loss_curr_epoch
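One difference from a standard training loop is the teacher produced by `EMA`: with `model_ema=True`, the second view is encoded by a frozen copy of the head whose weights follow the student as an exponential moving average, teacher ← alpha · teacher + (1 - alpha) · student. Note that `pretrain_one_epoch` constructs `EMA(0.99, model)` at the start of every epoch, so the teacher is re-initialized from the student once per epoch. The toy example below performs a single EMA update with in-place ops instead of reassigning `.data`; the one-unit student step is made up for illustration.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
student = nn.Linear(2, 1)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False

alpha = 0.99
with torch.no_grad():
    for p in student.parameters():
        p.add_(1.0)  # pretend an optimizer step moved every student weight by +1
    # teacher <- alpha * teacher + (1 - alpha) * student, as in EMA.update_average
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

drift = student.weight - teacher.weight
print(drift)  # each entry is 0.99 up to float rounding: the teacher moved 1% of the way
```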

# Part X. Train with SCAN and PMI using the KNN pairs

- Load the data using the implemented dataloader
- Create a clustering head
- Train the head with the Adam optimizer (lr=1e-4, weight_decay=1e-6) for 150 epochs.
- Train with SCAN and PMI and compare them with k-means.

You can use the pretrain function:
```python
dict_log = pretrain(head, optimizer, num_epochs, train_loader, criterion, ......)
```

Training should **not** take more than 5 minutes for both models.

We used: `PMI(gamma=0.65, momentum=0.9, temp=0.1)` for PMI
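The notebook leaves the architecture of the clustering head open. A minimal assumption is a single linear layer mapping the 2048-dimensional ResNet50 features to `n_clusters` logits, optionally with a small hidden layer. `make_cluster_head` and the hidden size 512 below are hypothetical choices; the Adam settings are the ones stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_cluster_head(in_dim=2048, n_clusters=10, hidden_dim=None):
    """Hypothetical clustering head: features -> cluster logits."""
    if hidden_dim is None:
        return nn.Linear(in_dim, n_clusters)
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, n_clusters),
    )


head_linear = make_cluster_head()
head_mlp = make_cluster_head(hidden_dim=512)
optimizer = torch.optim.Adam(head_mlp.parameters(), lr=1e-4, weight_decay=1e-6)

feats = F.normalize(torch.randn(128, 2048), dim=1)  # a batch of L2-normalized features
print(head_linear(feats).shape, head_mlp(feats).shape)
```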

In [None]:
### START CODE HERE ### (>15 lines of code)
# SCAN
criterion = SCAN(alpha=......)

optimizer = torch.optim.Adam(scan_head.parameters(), lr=1e-4, weight_decay=1e-6)
dict_log_scan = pretrain(scan_head, optimizer, num_epochs, train_loader, criterion, device, prefix="scan")

# PMI
criterion = PMI(.....)

optimizer = torch.optim.Adam(pmi_head.parameters(), lr=1e-4, weight_decay=1e-6)
dict_log_pmi = pretrain(pmi_head, optimizer, num_epochs, train_loader, criterion, device, prefix="pmi", model_ema=True)
### END CODE HERE ###

# Part XI. Get cluster assignments and evaluate cluster accuracy

- Load the heads trained with each objective (SCAN and PMI).
- Predict cluster assignments.
- Compute the clustering accuracy using `compute_clustering_metrics`
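A sketch of the prediction step, assuming the head was trained on L2-normalized features (as `PairSTL10` does by default with `l2_normalize=True`): normalize, compute logits in batches, and take the argmax as the hard cluster assignment. `predict_clusters` is a hypothetical helper; the random head and features are placeholders for a loaded model and `val_feats.pth`.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def predict_clusters(head: nn.Module, feats: torch.Tensor, batch_size: int = 1024) -> np.ndarray:
    """Hard cluster assignment = argmax of the head's logits, computed in batches."""
    head.eval()
    preds = []
    for start in range(0, feats.shape[0], batch_size):
        # same normalization as during training (assumed)
        batch = F.normalize(feats[start:start + batch_size], dim=1)
        preds.append(head(batch).argmax(dim=1).cpu())
    return torch.cat(preds).numpy()


torch.manual_seed(0)
head = nn.Linear(2048, 10)           # stand-in for a trained SCAN/PMI head
val_feats = torch.randn(3000, 2048)  # stand-in for torch.load("val_feats.pth")
val_preds = predict_clusters(head, val_feats)
print(val_preds.shape, np.unique(val_preds))
```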

In [None]:
@torch.no_grad()
def evaluate_clustering(model):
    model.eval()
    val_feats, val_labels = torch.load("val_feats.pth"), torch.load("val_labels.pth")
    train_feats, train_labels = torch.load("train_feats.pth"), torch.load("train_labels.pth")
    ### START CODE HERE ### (≈ 10 lines of code)
    # normalize feats
    
    # load features and compute logits


    # compute metrics
    print("Unique preds", np.unique(train_preds), np.unique(val_preds))
    metrics_train = compute_clustering_metrics(train_labels.cpu().numpy(), train_preds, min_samples_per_class=10)
    metrics_val = compute_clustering_metrics(val_labels.cpu().numpy(), val_preds,min_samples_per_class=10)
    return metrics_train[0], metrics_val[0]
    ### END CODE HERE ###
    

# Provided, but you may need to MODIFY the paths!
n_clusters = 10
### START CODE HERE ### (4 lines of code)
model = ....
model_scan = load_model(model, "./scan_best_model_min_val_loss.pth")
model = ....
model_pmi = load_model(model, "./pmi_best_model_min_val_loss.pth")
### END CODE HERE ###
train_acc, val_acc = evaluate_clustering(model_scan)
print(f"SCAN: Train acc: {train_acc:.3f}, Val acc: {val_acc:.3f}")
train_acc, val_acc = evaluate_clustering(model_pmi)
print(f"PMI: Train acc: {train_acc:.3f}, Val acc: {val_acc:.3f}")

### Expected results:
These are our current best scores; results may vary slightly between runs.
```
Model ./scan_best_model_min_val_loss.pth is loaded from epoch 148 , loss -22.383880043029784
Model ./pmi_best_model_min_val_loss.pth is loaded from epoch 129 , loss -2.0719790697097777
Unique preds [0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4 5 6 7 8 9]
SCAN: Train acc: 74.380, Val acc: 74.450
Unique preds [0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4 5 6 7 8 9]
PMI: Train acc: 77.280, Val acc: 78.238
```

# Conclusion and Bonus reads

That's the end of this exercise. If you reached this point, congratulations!

Additional things to do (optional):

- Plot the histogram of cluster assignments for SCAN and PMI.
- Compute the mean and median maximum softmax probability for SCAN and PMI.
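For the optional extras, a sketch that computes cluster sizes with `np.bincount` and the mean and median maximum softmax probability from a matrix of logits. Random logits stand in for a trained head's outputs, and `assignment_stats` is a hypothetical helper. If your head was trained with a temperature (as in `PMI`), you may want to divide the logits by it first; that is an assumption, not something the exercise specifies.

```python
import numpy as np
import torch
import torch.nn.functional as F
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def assignment_stats(logits: torch.Tensor, n_clusters: int):
    """Cluster sizes plus mean and median of the max softmax probability."""
    probs = F.softmax(logits, dim=1)
    max_prob, preds = probs.max(dim=1)
    counts = np.bincount(preds.numpy(), minlength=n_clusters)  # images per cluster
    return counts, max_prob.mean().item(), max_prob.median().item()


torch.manual_seed(0)
logits = torch.randn(500, 10) * 3  # stand-in for head(val_feats)
counts, mean_p, median_p = assignment_stats(logits, n_clusters=10)

plt.bar(np.arange(10), counts)
plt.xlabel("cluster id")
plt.ylabel("number of images")
plt.savefig("cluster_hist_sketch.png", bbox_inches="tight")
plt.close()
print(counts, f"mean max-prob {mean_p:.3f}, median {median_p:.3f}")
```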
