Current State: <br/>

I copied over all modules needed for training and completing the tasks. I put everything into one notebook so that it is easier for you to read and understand. I really dislike my code style here, but it was rather rushed. Unfortunately, I don't have a stable connection here to debug this notebook on a GPU. The main things that need to be tested are:
1. Dataset module: I adapted the dataset from the original GitHub resource, which also contains a complete implementation of a model and everything else: https://github.com/CoinCheung/triplet-reid-pytorch/tree/master. If something is not working, it should only require small fixes. You would have to tune and change the data augmentations applied to the images, as I do not know if they are useful in our case. The original implementation of the dataset was used for bounding box detection.
2. The data mining strategy: I tried to adapt the triplet mining strategy, with major simplifications that we have to make because of the dataset and to keep data loading easy.
The original implementation can be found here: https://github.com/davidsandberg/facenet/blob/096ed770f163957c1e56efa7feeb194773920f6e/src/train_tripletloss.py#L271
I only skimmed it and did not find it that helpful, as it is written in rather complex TensorFlow code and I did not want to spend time fully understanding it.
3. The training runs: I think the training runs should work, as they are mainly copied from the session.

The things that still need to be added are the evaluations and visualizations, but I think they do not need to be too thorough.

To work with the dataset, please see the dataset module, where I explain how to download and pre-process the dataset for our use case.

I hope that it is not too much work to get this running and that I did a somewhat good job.

In [None]:
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision as tv
import torchvision.transforms as transforms
from torch.utils.data import Dataset
import numpy as np
from PIL import Image
import json
import matplotlib.pyplot as plt
import random
from tqdm import tqdm

# Utility Functions

In [None]:

def smooth(f, K=5):
    """ Smoothing a function using a low-pass filter (mean) of size K """
    kernel = np.ones(K) / K
    f = np.concatenate([f[:int(K//2)], f, f[int(-K//2):]])  # to account for boundaries
    smooth_f = np.convolve(f, kernel, mode="same")
    smooth_f = smooth_f[K//2: -K//2]  # removing boundary-fixes
    return smooth_f


def save_model(model, optimizer, epoch, stats, margin):
    """ Saving model checkpoint """
    
    if(not os.path.exists("checkpoints")):
        os.makedirs("checkpoints")
    savepath = f"checkpoints/checkpoint_epoch_{epoch}_margin_{margin}.pth"

    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'stats': stats
    }, savepath)
    return


def load_model(model, optimizer, savepath):
    """ Loading pretrained checkpoint """
    
    checkpoint = torch.load(savepath, map_location="cpu")
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    epoch = checkpoint["epoch"]
    stats = checkpoint["stats"]
    
    return model, optimizer, epoch, stats


def count_model_params(model):
    """ Counting the number of learnable parameters in a nn.Module """
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return num_params

def visualize_progress(train_loss, val_loss, start=0):
    """ Visualizing loss and accuracy """
    fig, ax = plt.subplots(1,3)
    fig.set_size_inches(24,5)

    smooth_train = smooth(train_loss, 31)
    ax[0].plot(train_loss, c="blue", label="Loss", linewidth=3, alpha=0.5)
    ax[0].plot(smooth_train, c="red", label="Smoothed Loss", linewidth=3, alpha=1)
    ax[0].legend(loc="best")
    ax[0].set_xlabel("Iteration")
    ax[0].set_ylabel("CE Loss")
    ax[0].set_yscale("linear")
    ax[0].set_title("Training Progress (linear)")
    
    ax[1].plot(train_loss, c="blue", label="Loss", linewidth=3, alpha=0.5)
    ax[1].plot(smooth_train, c="red", label="Smoothed Loss", linewidth=3, alpha=1)
    ax[1].legend(loc="best")
    ax[1].set_xlabel("Iteration")
    ax[1].set_ylabel("CE Loss")
    ax[1].set_yscale("log")
    ax[1].set_title("Training Progress (log)")

    smooth_val = smooth(val_loss, 31)
    N_ITERS = len(val_loss)
    x_val = np.arange(start, N_ITERS)  # indices of the plotted validation batches (no double offset)
    ax[2].plot(x_val, val_loss[start:], c="blue", label="Loss", linewidth=3, alpha=0.5)
    ax[2].plot(x_val, smooth_val[start:], c="red", label="Smoothed Loss", linewidth=3, alpha=1)
    ax[2].legend(loc="best")
    ax[2].set_xlabel("Validation batch")
    ax[2].set_ylabel("Triplet Loss")
    ax[2].set_yscale("log")
    ax[2].set_title("Validation Progress")

    return

def display_projections(points, labels, ax=None, legend=None):
    """ Displaying low-dimensional data projections """
    
    COLORS = ['r', 'b', 'g', 'y', 'purple', 'orange', 'k', 'brown', 'grey',
              'c', "gold", "fuchsia", "lime", "darkred", "tomato", "navy"]
    
    legend = [f"Class {l}" for l in np.unique(labels)] if legend is None else legend
    if(ax is None):
        _, ax = plt.subplots(1,1,figsize=(12,6))
    
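    labels = np.asarray(labels)
    for i, l in enumerate(np.unique(labels)):
        idx = np.where(labels == l)[0]
        # legend is indexed by position, so arbitrary label values (e.g. person ids) work
        ax.scatter(points[idx, 0], points[idx, 1], label=legend[i], c=COLORS[i % len(COLORS)])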
    ax.legend(loc="best")


# Dataset

In [None]:
class Market1501(Dataset):
    '''
        A dataset wrapper class for the Market1501 dataset.
        This class was adapted from https://github.com/CoinCheung/triplet-reid-pytorch/blob/master/datasets/Market1501.py

        The dataset first has to be downloaded using the script download_dataset.sh.
        Unfortunately, the server seems to be down.
        The alternative is to download it from Google Drive: https://drive.google.com/file/d/0B8-rUzbwVRk0c054eEozWG9COHM/view?pli=1&resourcekey=0-8nyl7K9_x37HlQm34MmrYQ

        The script build_meta_file.sh reads the data and builds a meta file that contains the image names of each person.
        This class builds the same person -> images lookup (lb_img_dict) from the file names in __init__
        and uses it to sample positives and negatives, so it does not read the meta file.

        The problem with the dataset is that the test set does not contain information about the persons.
        We therefore use the images from the query directory as the test set.
    '''
    def __init__(self, data_path, sample_triplets=True, is_train=True, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_train = is_train
        self.sample_triplets = sample_triplets
        self.data_path = os.path.join(data_path, 'bounding_box_train' if is_train else 'query')
        # File names look like 0001_c1s1_001051_00.jpg: <person id>_c<camera>s<sequence>_<frame>_<box>.jpg
        img_names = [el for el in os.listdir(self.data_path) if os.path.splitext(el)[1] == '.jpg']
        self.lb_ids = [int(el.split('_')[0]) for el in img_names]
        self.lb_cams = [int(el.split('_')[1][1]) for el in img_names]
        self.imgs = [os.path.join(self.data_path, el) for el in img_names]

        # Default data augmentation from the original work.
        # This could possibly be further tuned.
        normalize = transforms.Normalize((0.486, 0.459, 0.408), (0.229, 0.224, 0.225))
        if is_train:
            self.trans = transforms.Compose([
                transforms.Resize((288, 144)),
                transforms.RandomCrop((256, 128)),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                normalize,
            ])
        else:
            # The original test pipeline used TenCrop, which returns a list of 10 crops per image.
            # The siamese model expects a single (C, H, W) tensor per image, so we resize deterministically.
            self.trans = transforms.Compose([
                transforms.Resize((256, 128)),
                transforms.ToTensor(),
                normalize,
            ])

        # person id -> indices into self.imgs; used for triplet sampling (and useful for a batch sampler)
        self.lb_img_dict = dict()
        self.lb_ids_uniq = sorted(set(self.lb_ids))
        lb_array = np.array(self.lb_ids)
        for lb in self.lb_ids_uniq:
            self.lb_img_dict[lb] = np.where(lb_array == lb)[0]

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        if self.sample_triplets:
            return self._return_triplets(idx)
        return self._return_doubles(idx)

    def _load(self, idx):
        # Opening a complete JPEG file per sample could slow down training.
        # Pre-processing the images into PyTorch tensor files and swapping Image.open() for
        # torch.load() might reduce the data loading overhead.
        return self.trans(Image.open(self.imgs[idx]).convert('RGB'))

    def _sample_positive(self, idx):
        """ Index of another image of the same person (the anchor itself if there is none) """
        candidates = [i for i in self.lb_img_dict[self.lb_ids[idx]] if i != idx]
        return random.choice(candidates) if candidates else idx

    def _sample_negative(self, idx):
        """ Index of a random image of a different person """
        anchor_label = self.lb_ids[idx]
        neg_label = random.choice(self.lb_ids_uniq)
        while neg_label == anchor_label:
            neg_label = random.choice(self.lb_ids_uniq)
        return random.choice(self.lb_img_dict[neg_label])

    def _return_doubles(self, idx):
        # Labels are integer person ids, so the DataLoader collates them into a tensor.
        pos_idx = self._sample_positive(idx)
        return (self._load(idx), self._load(pos_idx)), (self.lb_ids[idx], self.lb_ids[pos_idx])

    def _return_triplets(self, idx):
        pos_idx = self._sample_positive(idx)
        neg_idx = self._sample_negative(idx)
        imgs = (self._load(idx), self._load(pos_idx), self._load(neg_idx))
        labels = (self.lb_ids[idx], self.lb_ids[pos_idx], self.lb_ids[neg_idx])
        return imgs, labels
    
    
train_dataset = Market1501(data_path = os.path.join(os.getcwd(),"Market-1501-v15.09.15"), is_train = True)
test_dataset = Market1501(data_path = os.path.join(os.getcwd(),"Market-1501-v15.09.15"), is_train = False)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True) 
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False) 
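The sampling logic above is mostly index bookkeeping on the file names. Since build_meta_file.sh is not part of this notebook, here is a minimal pure-Python sketch of the same idea, assuming the standard Market-1501 naming `<person id>_c<camera>s<sequence>_<frame>_<box>.jpg`. `build_id_index` and `sample_triplet` are hypothetical helpers; the key point is that the negative identity is drawn first from the *other* persons, which guarantees it never matches the anchor.

```python
import random
from collections import defaultdict

def build_id_index(filenames):
    """Group Market-1501 style file names by person id (first '_'-separated field)."""
    index = defaultdict(list)
    for name in filenames:
        if name.endswith(".jpg"):
            index[int(name.split("_")[0])].append(name)
    return dict(index)

def sample_triplet(index, anchor_name, rng=random):
    """Return (anchor, positive, negative) file names for a given anchor."""
    pid = int(anchor_name.split("_")[0])
    # positive: another image of the same person, or the anchor itself if there is none
    others = [n for n in index[pid] if n != anchor_name]
    positive = rng.choice(others) if others else anchor_name
    # negative: choose a different person first, then one of that person's images
    neg_pid = rng.choice([p for p in index if p != pid])
    negative = rng.choice(index[neg_pid])
    return anchor_name, positive, negative

# illustrative file names following the Market-1501 naming scheme
names = ["0002_c1s1_000451_03.jpg", "0002_c2s1_000301_01.jpg",
         "0007_c1s1_000051_01.jpg", "0007_c3s3_077419_03.jpg", "readme.txt"]
index = build_id_index(names)
anchor, positive, negative = sample_triplet(index, names[0], rng=random.Random(0))
```

Writing `index` with `json.dump` would give a file similar in spirit to the meta file (JSON turns the integer keys into strings); the exact layout produced by build_meta_file.sh is not known here.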

# Model

In [None]:
# Utils Modules 
# Here I simply adjusted the implementation of the model presented in the last session.

class NormLayer(nn.Module):
    """ Layer that computer embedding normalization """
    def __init__(self, l=2):
        """ Layer initializer """
        assert l in [1, 2]
        super().__init__()
        self.l = l
        return
    
    def forward(self, x):
        """ Normalizing embeddings x. The shape of x is (B,D) """
        # F.normalize divides by max(norm, eps), which avoids NaNs for all-zero embeddings
        x_normalized = F.normalize(x, p=self.l, dim=-1)
        return x_normalized

# An adapted version of the Siamese model that uses a ResNet-18 backbone that can be either pretrained or not.
# We remove the classifier layers from the backbone.

class SiameseModel(nn.Module):
    """ 
    Implementation of a simple siamese model 
    """
    def __init__(self, emb_dim=32, pretrained=False):
        """ Module initializer """
        super().__init__()
        
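        # convolutional feature extractor (ImageNet weights if pretrained=True; weights API needs torchvision >= 0.13)
        weights = tv.models.ResNet18_Weights.DEFAULT if pretrained else None
        resnet = tv.models.resnet18(weights=weights)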
        # Remove classification layer
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        # fully connected embedder
        self.fc = nn.Linear(512, emb_dim)
        
        # auxiliary layers
        self.flatten = nn.Flatten()
        self.norm = NormLayer()
    
        return
    
    def forward_one(self, x):
        """ Forwarding just one sample through the model """
        x = self.backbone(x)
        x_flat = self.flatten(x)
        x_emb = self.fc(x_flat)
        x_emb_norm = self.norm(x_emb)
        return x_emb_norm
    
    def forward(self, anchor, positive, negative=None):
        """ Forwarding a triplet """
        anchor_emb = self.forward_one(anchor)
        positive_emb = self.forward_one(positive)
        if negative is not None:
            negative_emb = self.forward_one(negative)
        
        # We could also do the more efficient version as seen below.
        # It really depends on the GPU and batch size...
        
        # imgs = torch.concat([anchor, positive, negative], dim=0)
        # embs = self.forward_one(imgs)
        # anchor_emb, positive_emb, negative_emb = torch.chunk(embs, 3, dim=0)
        if negative is not None:
            return anchor_emb, positive_emb, negative_emb
        else:
            return anchor_emb, positive_emb
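The commented-out "more efficient version" in `forward` pushes anchors, positives and negatives through the network in one batch. A small sketch of that idea, using a hypothetical tiny encoder in place of the ResNet-18 backbone: in eval mode both variants give the same embeddings. One caveat worth knowing: in train mode, BatchNorm (which ResNet-18 contains) computes its statistics over the whole concatenated batch, so the fused and separate forward passes are not numerically equivalent during training.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hypothetical tiny encoder standing in for the ResNet-18 backbone; it contains BatchNorm on purpose
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))

def embed_triplet_fused(encoder, anchor, positive, negative):
    """Embed a triplet with a single forward pass over the concatenated batch."""
    embs = encoder(torch.cat([anchor, positive, negative], dim=0))
    return torch.chunk(embs, 3, dim=0)

anchor, positive, negative = (torch.randn(2, 3, 16, 8) for _ in range(3))

encoder.eval()  # BatchNorm uses running statistics, so samples do not influence each other
with torch.no_grad():
    fused = embed_triplet_fused(encoder, anchor, positive, negative)
    separate = [encoder(x) for x in (anchor, positive, negative)]
print(all(torch.allclose(f, s, atol=1e-5) for f, s in zip(fused, separate)))
```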

# Loss Functions

In [None]:

class TripletLoss(nn.Module):
    """ Implementation of the triplet loss function """
    def __init__(self, margin=0.2, reduce="mean"):
        """ Module initializer """
        assert reduce in ["mean", "sum"]
        super().__init__()
        self.margin = margin
        self.reduce = reduce
        return
        
    def forward(self, anchor, positive, negative):
        """ Computing pairwise distances and loss functions """
        # squared L2 distances
        d_ap = (anchor - positive).pow(2).sum(dim=-1)
        d_an = (anchor - negative).pow(2).sum(dim=-1)
        
        # triplet loss function
        loss = (d_ap - d_an + self.margin)
        loss = torch.maximum(loss, torch.zeros_like(loss))
        
        # averaging or summing      
        loss = torch.mean(loss) if(self.reduce == "mean") else torch.sum(loss)
      
        return loss

class TripletLossWithMining(nn.Module):
    def __init__(self, margin=0.2, reduce="mean"):
        super().__init__()
        self.margin = margin
        self.reduce = reduce
        return
    
    def forward(self, anchor, positive, labels):
        """
            The idea here is that we sample an anchor and a positive example from the dataset
            and compute the distance between them.
            Then, for each anchor, we compute the distances to the other anchors in the batch
            and choose a random negative (an anchor with a different label)
            whose distance is larger than that of the positive example.
            If no such negative exists, we fall back to the farthest negative in the batch.
            Note: FaceNet's semi-hard negatives additionally satisfy d_an < d_ap + margin;
            negatives beyond that bound contribute zero loss.

            Since we have a large number of persons with only a few images per person,
            we cannot meet the requirements of a minimum number of persons per batch etc.
            So I decided to use this simple approach, which also makes data loading much easier.

            Implementation Comment:
                Still, I am not sure how well this works in practice,
                so I tried to make the implementation expressive enough to debug it further.
        """
        labels = torch.as_tensor(labels, device=anchor.device)
        # squared L2 distances: anchor-positive (B,) and anchor-anchor (B, B)
        d_ap = (anchor - positive).pow(2).sum(dim=-1)
        d_aa = (anchor.unsqueeze(1) - anchor.unsqueeze(0)).pow(2).sum(dim=-1)

        # valid negatives have a different label; preferred ones are farther away than the positive
        diff_label = labels.unsqueeze(0) != labels.unsqueeze(1)
        farther = diff_label & (d_aa > d_ap.unsqueeze(1))

        # random choice among the preferred negatives: random scores, everything else masked out
        scores = torch.rand_like(d_aa).masked_fill(~farther, -1.0)
        neg_idx = scores.argmax(dim=1)
        # fallback for anchors without such a negative: the farthest anchor with a different label
        farthest_idx = d_aa.masked_fill(~diff_label, -1.0).argmax(dim=1)
        neg_idx = torch.where(farther.any(dim=1), neg_idx, farthest_idx)
        d_an = d_aa.gather(1, neg_idx.unsqueeze(1)).squeeze(1)

        # triplet loss function, ignoring anchors without any negative in the batch
        loss = F.relu(d_ap - d_an + self.margin)[diff_label.any(dim=1)]

        # averaging or summing
        loss = torch.mean(loss) if(self.reduce == "mean") else torch.sum(loss)

        return loss

# TODO: Implement the other two loss functions
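Both losses use squared distances $d_{ap} = \lVert a - p \rVert^2$ and $d_{an} = \lVert a - n \rVert^2$ with the hinge $\max(d_{ap} - d_{an} + m, 0)$. In FaceNet terminology a negative is *hard* if $d_{an} \le d_{ap}$, *semi-hard* if $d_{ap} < d_{an} < d_{ap} + m$, and *easy* otherwise (zero loss). The criterion in `TripletLossWithMining` accepts any negative with $d_{an} > d_{ap}$, i.e. semi-hard and easy ones. A small sketch for checking which category the mined negatives fall into; `negative_category` and `triplet_terms` are hypothetical debugging helpers.

```python
import torch

def triplet_terms(d_ap, d_an, margin=0.2):
    """Per-triplet hinge max(d_ap - d_an + margin, 0) on squared distances."""
    return torch.clamp(d_ap - d_an + margin, min=0.0)

def negative_category(d_ap, d_an, margin=0.2):
    """Label each negative as 'hard', 'semi-hard' or 'easy' (FaceNet terminology)."""
    cats = []
    for ap, an in zip(d_ap.tolist(), d_an.tolist()):
        if an <= ap:
            cats.append("hard")
        elif an < ap + margin:
            cats.append("semi-hard")
        else:
            cats.append("easy")
    return cats

d_ap = torch.tensor([0.50, 0.50, 0.50])
d_an = torch.tensor([0.40, 0.60, 0.90])
print(negative_category(d_ap, d_an))  # ['hard', 'semi-hard', 'easy']
print(triplet_terms(d_ap, d_an))      # approximately [0.3, 0.1, 0.0]
```

Logging the share of easy negatives per batch would show how much of the batch actually contributes to the gradient.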

# Training Functions

In [None]:

class Trainer:
    """
    Class for training and validating a siamese model
    """
    
    def __init__(self, model, criterion, train_loader, valid_loader=None, n_iters=1e4, mining_strategy=False):
        """ Trainer initializer """
        self.model = model
        self.criterion = criterion
        self.train_loader = train_loader
        self.valid_loader = valid_loader
        
        self.n_iters = int(n_iters)
        self.optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-5)
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        
        self.train_loss = []
        self.valid_loss = []

        self.mining_strategy = mining_strategy  
        return
    
    @torch.no_grad()
    def valid_step(self, val_iters=100):
        """ Some validation iterations """
        self.model.eval()
        cur_losses = []
        for i, ((anchors, positives, negatives),_) in enumerate(self.valid_loader):   
            # setting inputs to GPU
            anchors = anchors.to(self.device)
            positives = positives.to(self.device)
            negatives = negatives.to(self.device)
            
            # forward pass and triplet loss
            anchor_emb, positive_emb, negative_emb = self.model(anchors, positives, negatives)
            loss = self.criterion(anchor_emb, positive_emb, negative_emb)
            cur_losses.append(loss.item())
            
            if(i >= val_iters):
                break
    
        self.valid_loss += cur_losses
        self.model.train()
        
        return cur_losses
    
    @torch.no_grad()
    def valid_step_with_mining(self, val_iters=100):
        """ Some validation iterations """
        self.model.eval()
        cur_losses = []
        for i, ((anchors, positives),(lbl, pos_lbl)) in enumerate(self.valid_loader):   
            # setting inputs to GPU
            anchors = anchors.to(self.device)
            positives = positives.to(self.device)
            
            # forward pass and triplet loss
            anchor_emb, positive_emb = self.model(anchors, positives)
            loss = self.criterion(anchor_emb, positive_emb, lbl)
            cur_losses.append(loss.item())
            
            if(i >= val_iters):
                break
    
        self.valid_loss += cur_losses
        self.model.train()
        
        return cur_losses
    
    def fit(self):
        """ Train/Validation loop """
        if self.mining_strategy:
            self._fit_with_mining()
        else:
            self._fit_without_mining()
    
 

    def _fit_without_mining(self):
        self.iter_ = 0
        progress_bar = tqdm(total=self.n_iters, initial=0)

        for i in range(self.n_iters):
            for (anchors, positives, negatives), _ in self.train_loader:     
                # setting inputs to GPU
                anchors = anchors.to(self.device)
                positives = positives.to(self.device)
                negatives = negatives.to(self.device)

                # forward pass and triplet loss

                anchor_emb, positive_emb, negative_emb = self.model(anchors, positives, negatives)
                loss = self.criterion(anchor_emb, positive_emb, negative_emb)
                self.train_loss.append(loss.item())

                # optimization
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

                # updating progress bar
                progress_bar.update(1)
                progress_bar.set_description(f"Train Iter {self.iter_}: Loss={round(loss.item(),5)}")

                # doing some validation every once in a while
                if(self.iter_ % 250 == 0):
                    cur_losses = self.valid_step()
                    print(f"Valid loss @ iteration {self.iter_}: Loss={np.mean(cur_losses)}")

                self.iter_ = self.iter_+1 
                if(self.iter_ >= self.n_iters):
                    break
            if(self.iter_ >= self.n_iters):
                break
        progress_bar.close()
        return
    

    def _fit_with_mining(self):
        self.iter_ = 0
        progress_bar = tqdm(total=self.n_iters, initial=0)

        for i in range(self.n_iters):
            for (anchors, positives), (lbl, pos_lbl) in self.train_loader:     
                # setting inputs to GPU
                anchors = anchors.to(self.device)
                positives = positives.to(self.device)

                # forward pass and triplet loss

                anchor_emb, positive_emb = self.model(anchors, positives)
                loss = self.criterion(anchor_emb, positive_emb, lbl)
                self.train_loss.append(loss.item())

                # optimization
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

                # updating progress bar
                progress_bar.update(1)
                progress_bar.set_description(f"Train Iter {self.iter_}: Loss={round(loss.item(),5)}")

                # doing some validation every once in a while
                if(self.iter_ % 250 == 0):
                    cur_losses = self.valid_step_with_mining()
                    print(f"Valid loss @ iteration {self.iter_}: Loss={np.mean(cur_losses)}")

                self.iter_ = self.iter_+1 
                if(self.iter_ >= self.n_iters):
                    break
            if(self.iter_ >= self.n_iters):
                break
        progress_bar.close()
        return
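Both `_fit_*` methods budget training in iterations rather than epochs, which is why they need two nested loops and two `break` checks. The same behaviour can be sketched as a generator that restarts the loader whenever it runs out; `iterate_for` is a hypothetical helper, and a `DataLoader` with `shuffle=True` reshuffles on each new pass.

```python
def iterate_for(loader, n_iters):
    """Yield exactly n_iters batches, restarting the loader whenever it is exhausted."""
    yielded = 0
    while yielded < n_iters:
        produced_this_pass = False
        for batch in loader:
            produced_this_pass = True
            yield batch
            yielded += 1
            if yielded >= n_iters:
                return
        if not produced_this_pass:
            raise ValueError("loader is empty, cannot reach n_iters")

batches = list(iterate_for([1, 2, 3], 7))  # any iterable works in place of a DataLoader
```

With it, the training loop would reduce to a single `for (anchors, positives, negatives), _ in iterate_for(self.train_loader, self.n_iters):`.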

# Training: Pre-trained ResNet-18

In [None]:
model = SiameseModel(pretrained=True)
criterion = TripletLoss(margin=0.2)
trainer = Trainer(model=model, criterion=criterion, train_loader=train_loader, valid_loader=test_loader, n_iters=1000)

In [None]:
trainer.fit()


In [None]:
visualize_progress(trainer.train_loss, trainer.valid_loss, start=120)


In [None]:
stats = {
    "train_loss": trainer.train_loss,
    "valid_loss": trainer.valid_loss
}
save_model(trainer.model, trainer.optimizer, trainer.iter_, stats, margin=0.2)
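Note that `trainer.iter_` is the iteration counter (1000 after these runs), and all three training runs in this notebook call `save_model` with the same count and `margin=0.2`, so every run would write `checkpoints/checkpoint_epoch_1000_margin_0.2.pth` and overwrite the previous one. A minimal sketch of a path that also encodes a run name; `checkpoint_path` and the run names are hypothetical.

```python
import os

def checkpoint_path(run_name, step, margin, root="checkpoints"):
    """Build a checkpoint path that is unique per run, step and margin."""
    return os.path.join(root, f"checkpoint_{run_name}_iter_{step}_margin_{margin}.pth")

paths = {run: checkpoint_path(run, 1000, 0.2)
         for run in ("resnet18_pretrained", "resnet18_scratch", "resnet18_semihard")}
```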

# Training: ResNet-18 from Scratch

In [None]:

model = SiameseModel(pretrained=False)
criterion = TripletLoss(margin=0.2)
trainer = Trainer(model=model, criterion=criterion, train_loader=train_loader, valid_loader=test_loader, n_iters=1000)

In [None]:
trainer.fit()


In [None]:
visualize_progress(trainer.train_loss, trainer.valid_loss, start=120)


In [None]:
stats = {
    "train_loss": trainer.train_loss,
    "valid_loss": trainer.valid_loss
}
save_model(trainer.model, trainer.optimizer, trainer.iter_, stats, margin=0.2)

# Comparison of the two models

In [None]:
# TODO: Compare models qualitatively.

# Training: Best model using semi-hard negative mining strategy

In [None]:
# The semi-hard negative mining strategy is implemented within the loss function.
# The triplet loss function with mining was not properly tested, so you might need to change something.
# I am not that happy with the implementation; if you want, you can also change it completely.
# sample_triplets is an argument of the dataset (not of the DataLoader): here it returns (anchor, positive) pairs.
train_dataset2 = Market1501(data_path=os.path.join(os.getcwd(), "Market-1501-v15.09.15"), sample_triplets=False, is_train=True)
test_dataset2 = Market1501(data_path=os.path.join(os.getcwd(), "Market-1501-v15.09.15"), sample_triplets=False, is_train=False)

train_loader2 = torch.utils.data.DataLoader(dataset=train_dataset2, batch_size=128, shuffle=True)
test_loader2 = torch.utils.data.DataLoader(dataset=test_dataset2, batch_size=128, shuffle=False)

model = SiameseModel()
criterion = TripletLossWithMining(margin=0.2)
trainer = Trainer(model=model, criterion=criterion, train_loader=train_loader2, valid_loader=test_loader2, n_iters=1000, mining_strategy=True)
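With plain shuffling, a batch may contain only one image for many persons. If the per-batch constraint turns out to be manageable (the Market-1501 training split has 12,936 images of 751 persons, roughly 17 per person), the `lb_img_dict` built in `Market1501.__init__` is exactly what a P×K batch sampler needs: P persons with K images each, so every anchor has positives and negatives in the same batch. A hedged sketch; `PKBatchSampler` is a hypothetical class, not part of the original code.

```python
import random
from torch.utils.data import Sampler

class PKBatchSampler(Sampler):
    """Hypothetical P x K batch sampler: P identities with K images each per batch."""
    def __init__(self, label_to_indices, p=32, k=4, seed=None):
        self.label_to_indices = {lb: list(idx) for lb, idx in label_to_indices.items()}
        self.p, self.k = p, k
        self.rng = random.Random(seed)

    def __iter__(self):
        labels = list(self.label_to_indices)
        self.rng.shuffle(labels)
        for start in range(0, len(labels) - self.p + 1, self.p):
            batch = []
            for lb in labels[start:start + self.p]:
                idx = self.label_to_indices[lb]
                # sample without replacement unless the person has fewer than K images
                if len(idx) >= self.k:
                    batch.extend(self.rng.sample(idx, self.k))
                else:
                    batch.extend(self.rng.choices(idx, k=self.k))
            yield batch

    def __len__(self):
        return len(self.label_to_indices) // self.p

# toy person id -> dataset indices mapping (same structure as lb_img_dict)
label_to_indices = {0: [0, 1, 2, 3, 4], 1: [5, 6], 2: [7, 8, 9, 10], 3: [11, 12, 13, 14]}
sampler = PKBatchSampler(label_to_indices, p=2, k=3, seed=0)
batches = list(sampler)
# map each dataset index back to its person id to inspect the batches
index_to_label = {i: lb for lb, idxs in label_to_indices.items() for i in idxs}
```

It would be passed as `DataLoader(train_dataset2, batch_sampler=PKBatchSampler(train_dataset2.lb_img_dict, p=32, k=4))`; `batch_sampler` cannot be combined with `batch_size` or `shuffle`.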

In [None]:
trainer.fit()


In [None]:
visualize_progress(trainer.train_loss, trainer.valid_loss, start=120)


In [None]:
stats = {
    "train_loss": trainer.train_loss,
    "valid_loss": trainer.valid_loss
}
save_model(trainer.model, trainer.optimizer, trainer.iter_, stats, margin=0.2)

# Evaluation

In [None]:
# TODO: Evaluate the models and compare them. One might need to rename the previous trainers such that they do not overwrite each other and we can compare them.
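For evaluation, re-identification is usually reported as rank-k accuracy (CMC) and mAP, where each query is matched against a gallery of embeddings. A simplified rank-k sketch on precomputed embeddings; `rank_k_accuracy` is a hypothetical helper, and it ignores the Market-1501 protocol rule that discards gallery images of the same person taken by the same camera as the query (the camera ids are available as `lb_cams`). Computing the embeddings would additionally need a loader that returns single images, e.g. via `model.forward_one`.

```python
import torch

def rank_k_accuracy(query_emb, query_ids, gallery_emb, gallery_ids, k=1):
    """Fraction of queries whose k nearest gallery embeddings contain the correct id.

    Simplified: no same-camera filtering as in the official Market-1501 protocol.
    """
    dists = torch.cdist(query_emb, gallery_emb)             # (Q, G) Euclidean distances
    nearest = dists.topk(k, dim=1, largest=False).indices   # (Q, k) closest gallery indices
    hits = (gallery_ids[nearest] == query_ids.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

gallery_emb = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
gallery_ids = torch.tensor([0, 1, 2])
query_emb = torch.tensor([[0.1, 0.0], [0.9, 0.1], [0.1, 0.9]])
query_ids = torch.tensor([0, 1, 1])  # the third query is closest to a wrong identity
rank1 = rank_k_accuracy(query_emb, query_ids, gallery_emb, gallery_ids, k=1)
rank3 = rank_k_accuracy(query_emb, query_ids, gallery_emb, gallery_ids, k=3)
```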

In [None]:
# Embedding Visualization

In [None]:
# TODO: Visualize the embeddings. I copied the functions from the session already into the notebook.
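`display_projections` expects 2-D points, while the model produces `emb_dim`-dimensional (32 by default) normalized embeddings. A minimal PCA sketch with NumPy's SVD to get there; `pca_2d` is a hypothetical helper, and t-SNE or UMAP are common non-linear alternatives for this kind of plot.

```python
import numpy as np

def pca_2d(embeddings):
    """Project (N, D) embeddings onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # rows of vt are the principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 32))  # stand-in for model embeddings
proj = pca_2d(emb)
```

The result can then be passed as `display_projections(proj, labels)` with the person ids of the embedded images as `labels`.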