<h1>DISCERN</h1>
<h2>Diversity-based Selection of Centroids and k-Estimation for Rapid Non-stochastic clustering</h2>
<h3>Applying DISCERN to an image dataset using ConvNet embeddings</h3>

Please note that running this notebook requires PyTorch and torchvision with CUDA support. You may also need to download the pretrained model and the dataset beforehand.

<h3>1. Import dependencies</h3>

We're going to do something a bit different from the paper: we use a pretrained MoCo encoder, which was trained on unlabeled data with a contrastive loss. We highly recommend you check out <a href="https://arxiv.org/abs/1911.05722">the MoCo paper</a> as well as the <a href="https://github.com/facebookresearch/moco/">GitHub repo</a>.

In [1]:
pretrained_path = "moco_v2_800ep_pretrain.pth.tar"  # Download link is available in MoCo's GitHub repo

In [2]:
import numpy as np

import torch
import torch.nn as nn

import torchvision
import torchvision.transforms as transforms

from sklearn.decomposition import PCA

from DISCERN import DISCERN
from utils import purity_score as purity

Loading the weights of the MoCo query encoder from the checkpoint we downloaded.

In [3]:
model = torchvision.models.resnet50()
checkpoint = torch.load(pretrained_path, map_location="cpu")

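state_dict = checkpoint['state_dict']
for k in list(state_dict.keys()):
    # Keep only the query encoder's backbone weights (skip its "fc" head)
    # and strip the prefix so the names match torchvision's ResNet50.
    if k.startswith('module.encoder_q') and not k.startswith('module.encoder_q.fc'):
        state_dict[k[len("module.encoder_q."):]] = state_dict[k]
    # Remove the original key in every case.
    del state_dict[k]

msg = model.load_state_dict(state_dict, strict=False)
# Only the classification head should be missing; it is replaced right below.
assert set(msg.missing_keys) == {"fc.weight", "fc.bias"}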
model.fc = nn.Identity()
model = model.cuda()
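
The key remapping in the cell above can be easier to follow on a toy dictionary. The sketch below uses made-up keys and placeholder string values (not the real checkpoint contents) to show which entries survive the loop and under what names.

```python
# Toy stand-in for the checkpoint's state dict; keys and values are illustrative only.
state_dict = {
    "module.encoder_q.conv1.weight": "w0",
    "module.encoder_q.layer1.0.conv1.weight": "w1",
    "module.encoder_q.fc.0.weight": "head",   # query encoder head, dropped
    "module.encoder_k.conv1.weight": "key",   # key encoder, dropped
    "module.queue": "queue",                  # anything else, dropped
}

prefix = "module.encoder_q."
for k in list(state_dict.keys()):
    # Keep query-encoder backbone weights under their plain ResNet names.
    if k.startswith(prefix) and not k.startswith(prefix + "fc"):
        state_dict[k[len(prefix):]] = state_dict[k]
    # Remove the original key in every case.
    del state_dict[k]

remapped = sorted(state_dict)
print(remapped)
```

Iterating over `list(state_dict.keys())` takes a snapshot of the keys first, so adding the renamed entries and deleting the old ones inside the loop is safe.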

<h3>2. Import data</h3>

We're going to try <a href="https://github.com/fastai/imagenette">ImageNette</a>, which was mentioned in the paper. The only difference is that there we used a ResNet101 pretrained on labeled ImageNet, whereas here we're using a ResNet50 that was trained in a completely unsupervised setting (MoCo).

Note that the following was done on a single Cloud GPU (Tesla T4), so you may need to adjust the batch sizes.

In [4]:
tr = transforms.Compose([
                         transforms.Resize((224, 224)),
                         transforms.ToTensor(),
                         transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                              std=[0.229, 0.224, 0.225])])

trainset = torchvision.datasets.ImageFolder("imagenette2-320/train", tr)
valset = torchvision.datasets.ImageFolder("imagenette2-320/val", tr)

train_loader = torch.utils.data.DataLoader(trainset, batch_size=1024, shuffle=False, num_workers=2)
val_loader = torch.utils.data.DataLoader(valset, batch_size=1024, shuffle=False, num_workers=2)

Now we're going to use the MoCo-trained ResNet50 model to get the latent embeddings of the training and validation sets into numpy ndarrays so that DISCERN can process them.

In [5]:
def extract_embeddings(loader, num_samples, dim=2048):
    """Run the encoder over a loader and collect embeddings and labels as NumPy arrays."""
    X = np.zeros((num_samples, dim))
    y = np.zeros(num_samples, dtype=int)
    ctr = 0
    with torch.no_grad():
        for images, target in loader:
            output = model(images.cuda())
            # Write this batch into its slice of the preallocated arrays.
            ctr_new = ctr + images.shape[0]
            X[ctr:ctr_new, :] = output.cpu().numpy()
            y[ctr:ctr_new] = target.numpy()
            ctr = ctr_new
    return X, y


model.eval()
X_train, y_train = extract_embeddings(train_loader, len(trainset))
X_val, y_val = extract_embeddings(val_loader, len(valset))

num_class = len(np.unique(y_train))

In [6]:
print("{} training samples, {} validation samples".format(X_train.shape[0], X_val.shape[0]))

9469 training samples, 3925 validation samples


<h3>3. Running DISCERN</h3>

Running DISCERN with an upper limit of 100 on the number of clusters it can find.

In [7]:
d = DISCERN(max_n_clusters=100)
d.fit(X_train)

In [8]:
c_val = d.predict(X_val)

In [9]:
val_accuracy = purity(y_val, c_val)*100
num_clusters = len(np.unique(d.labels_))

In [10]:
print("[Supervised   Performance] Accuracy: {} %".format(val_accuracy))
print("Predicted number of clusters: {}".format(num_clusters))
print("Number of classes: {}".format(num_class))

[Supervised   Performance] Accuracy: 50.318471337579616 %
Predicted number of clusters: 7
Number of classes: 10
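
The score reported above comes from `purity_score` in `utils`, whose source is not shown in this notebook. The following is a minimal sketch of the standard cluster purity definition, which we assume that helper follows; `purity_sketch` is a hypothetical name used only for illustration.

```python
import numpy as np

def purity_sketch(y_true, y_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes, class_idx = np.unique(y_true, return_inverse=True)
    clusters, cluster_idx = np.unique(y_pred, return_inverse=True)
    # Contingency table: rows are classes, columns are clusters.
    table = np.zeros((len(classes), len(clusters)), dtype=int)
    np.add.at(table, (class_idx, cluster_idx), 1)
    # Each cluster contributes the size of its largest class.
    return table.max(axis=0).sum() / len(y_true)

# Toy labels: cluster 0 holds two samples of class 0,
# cluster 1 mixes one sample of class 0, two of class 1 and one of class 2.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 1]
score = purity_sketch(y_true, y_pred)
print(score)
```

Under this definition several clusters may map to the same class, so purity tends to rise as the number of clusters grows, and it cannot reach 100% when there are fewer clusters than classes.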


As can be seen, the result is far from perfect: DISCERN finds 7 clusters for 10 classes. Reducing the dimensionality of the embeddings with PCA improves it further.

In [11]:
X_train_ds = PCA(1024, tol=1e-10, random_state=0).fit_transform(X_train)
X_val_ds = PCA(1024, tol=1e-10, random_state=0).fit_transform(X_val)
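
The cell above fits one PCA on the training embeddings and a second, independent PCA on the validation embeddings, so the two projections are not guaranteed to share a basis. A common alternative is to fit the projection once on the training set and reuse it for validation. The sketch below shows that pattern on small synthetic arrays; the `_toy` names and shapes are hypothetical, and this is not the procedure used to produce the numbers reported in this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Small synthetic stand-ins for X_train and X_val.
X_train_toy = rng.normal(size=(200, 32))
X_val_toy = rng.normal(size=(80, 32))

# Fit the projection on the training embeddings only...
pca = PCA(n_components=8, random_state=0).fit(X_train_toy)
# ...and reuse the same mean and components for both splits.
X_train_toy_ds = pca.transform(X_train_toy)
X_val_toy_ds = pca.transform(X_val_toy)

print(X_train_toy_ds.shape, X_val_toy_ds.shape)
```

With this pattern, `predict` on the validation set sees points expressed in the same coordinates as the ones the clustering was fitted on.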

In [12]:
d2 = DISCERN(max_n_clusters=100)
d2.fit(X_train_ds)
c_val_ds = d2.predict(X_val_ds)
val_accuracy_ds = purity(y_val, c_val_ds)*100
num_clusters_ds = len(np.unique(d2.labels_))

In [13]:
print("[Supervised   Performance] Accuracy: {} %".format(val_accuracy_ds))
print("Predicted number of clusters: {}".format(num_clusters_ds))
print("Number of classes: {}".format(num_class))

[Supervised   Performance] Accuracy: 67.87261146496816 %
Predicted number of clusters: 13
Number of classes: 10


<b>Note that NO LABELS were used to obtain these clusters, as MoCo was trained in a completely unsupervised setting. The labels are used only to evaluate the result.</b>

DISCERN has so far performed best on embeddings from semi-supervised-trained networks such as FaceNet for object re-identification. However, its complexity remains an issue, which we look forward to tackling.